[RFC PATCH] vhost_net: should not use max_queue_pairs for non-mq guest

2022-03-18 Thread Si-Wei Liu
With an MQ-enabled vdpa device and a guest that doesn't support MQ,
e.g. booting vdpa with mq=on over OVMF with a single vqp, it's easy
to hit an assert failure like the following:

../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= 
dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.

0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x7f8ce3fec252 in  () at /lib64/libc.so.6
4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, 
idx=) at ../hw/virtio/vhost-vdpa.c:563
5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, 
idx=) at ../hw/virtio/vhost-vdpa.c:558
6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, 
vdev=0x558f568f91f0, n=2, mask=) at ../hw/virtio/vhost.c:1557
7  0x558f52c6b89a in virtio_pci_set_guest_notifier 
(d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, 
with_irqfd=with_irqfd@entry=false)
   at ../hw/virtio/virtio-pci.c:974
8  0x558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, 
nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, 
ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1)
   at ../hw/net/vhost_net.c:361
10 0x558f52d4e5e7 in virtio_net_set_status (status=, 
n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 
'\017') at ../hw/net/virtio-net.c:370
12 0x558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, 
val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, 
addr=, val=, size=) at 
../hw/virtio/virtio-pci.c:1292
14 0x558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, 
addr=20, value=, size=1, shift=, mask=, attrs=...)
   at ../softmmu/memory.c:492
15 0x558f52d127de in access_with_adjusted_size (addr=addr@entry=20, 
value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=, access_size_max=, access_fn=0x558f52d15cf0 
, mr=0x558f568f19d0, attrs=...) at 
../softmmu/memory.c:554
16 0x558f52d157ef in memory_region_dispatch_write 
(mr=mr@entry=0x558f568f19d0, addr=20, data=, op=, 
attrs=attrs@entry=...)
   at ../softmmu/memory.c:1504
17 0x558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, 
addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, 
len=len@entry=1, addr1=, l=, mr=0x558f568f19d0) 
at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
18 0x558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, 
attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x558f52d0b36b in address_space_write (as=, 
addr=, attrs=..., buf=buf@entry=0x7f8ce6300028, len=)
   at ../softmmu/physmem.c:2914
20 0x558f52d0b3da in address_space_rw (as=, addr=, attrs=...,
   attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=, 
is_write=) at ../softmmu/physmem.c:2924
21 0x558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at 
../accel/kvm/kvm-all.c:2903
22 0x558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at 
../accel/kvm/kvm-accel-ops.c:49
23 0x558f52f9f04a in qemu_thread_start (args=) at 
../util/qemu-thread-posix.c:556
24 0x7f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x7f8ce40bb9fd in clone () at /lib64/libc.so.6

The assertion fails because the vhost_dev index for the ctrl vq was
not aligned with the one actually in use by the guest. Upon
multiqueue feature negotiation in virtio_net_set_multiqueue(), if the
guest doesn't support multiqueue, the guest vq layout shrinks to a
single queue pair with 3 vqs in total (rx, tx and ctrl). This results
in the ctrl_vq taking a different vhost_dev group index than the
default n->max_queue_pairs, the latter of which is only valid for a
multiqueue guest. Meanwhile, on those additional vqs not exposed to
the guest, vhost_net_set_vq_index() never populated vq_index
properly, hence the assert failure.

A possible fix is to pick the correct vhost_dev group for the control
vq according to this table [*]:

vdpa tool / QEMU arg / guest config   / ctrl_vq group index

max_vqp 8 / mq=on    / mq=off (UEFI)  => data_queue_pairs
max_vqp 8 / mq=on    / mq=on  (Linux) => n->max_queue_pairs (>1)
max_vqp 8 / mq=off   / mq=on  (Linux) => n->max_queue_pairs (=1)

[*] Please see FIXME in the code for open question and discussion
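The selection logic in the table above could be sketched like this (a
simplified standalone sketch, not the actual QEMU code; the function
name and parameters are hypothetical illustrations):

```python
def ctrl_vq_group_index(guest_has_mq: bool, data_queue_pairs: int,
                        max_queue_pairs: int) -> int:
    """Pick the vhost_dev group index that owns the control vq.

    For a guest that negotiated multiqueue, the ctrl vq sits after
    n->max_queue_pairs data queue pairs; for a non-MQ guest the layout
    shrinks to a single queue pair, so the ctrl vq follows
    data_queue_pairs instead.
    """
    return max_queue_pairs if guest_has_mq else data_queue_pairs

# Cases from the table: (guest mq?, data_queue_pairs, max_queue_pairs)
assert ctrl_vq_group_index(False, 1, 8) == 1  # mq=on,  mq=off guest (UEFI)
assert ctrl_vq_group_index(True, 8, 8) == 8   # mq=on,  mq=on guest (Linux)
assert ctrl_vq_group_index(True, 1, 1) == 1   # mq=off, mq=on guest (Linux)
```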

Signed-off-by: Si-Wei Liu 
---
 hw/net/vhost_net.c | 13 +
 hw/virtio/vhost-vdpa.c | 25 -
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 30379d2..9a4479b 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -322,6 +322,7 @@ int 

Re: [PATCH] gitattributes: Cover Objective-C source files

2022-03-18 Thread Akihiko Odaki

On 2022/03/19 1:14, Philippe Mathieu-Daudé wrote:

Commit 29cf16db23 says:

Since commits 0979ed017f0 ("meson: rename .inc.h files to .h.inc")
and 139c1837db7 ("meson: rename included C source files to .c.inc")
'git-diff --function-context' stopped displaying C function context
correctly.


So I suspect Git has some knowledge of common file extensions like .c, 
.h and .m, although I couldn't find it in the source code of Git.


'git-diff --function-context' doesn't work for me without this change.


With some debugging, I found Apple's Git distribution actually carries a 
default gitattributes file which annotates *.m.

https://github.com/apple-opensource/Git/blob/master/gitattributes

However, it does not annotate *.c or *.h. Apparently there is no "c" 
diff pattern; those files are handled by the "default" diff pattern, 
which is actually designed for C. In fact, the "c" diff pattern is not 
present in the documentation:

https://git-scm.com/docs/gitattributes#_defining_an_external_diff_driver

In conclusion, *.m should be listed in gitattributes but *.c.inc and 
*.h.inc should not be, if my understanding is correct.
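Concretely, that conclusion would correspond to a gitattributes entry
along these lines (one possible form; `objc` is one of Git's built-in
diff drivers documented in gitattributes):

```
# Annotate only Objective-C sources; *.c / *.h (and *.c.inc / *.h.inc)
# fall back to the built-in default diff driver, which already produces
# C function context for --function-context.
*.m diff=objc
```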


Paolo Bonzini, I found you are the author of commit 29cf16db23. Can you 
test the above conclusion?


Regards,
Akihiko Odaki



Re: [PATCH 2/2] iotests/207: Filter host fingerprint

2022-03-18 Thread John Snow
On Fri, Mar 18, 2022 at 8:53 AM Hanna Reitz  wrote:
>
> Commit e3296cc796aeaf319f3ed4e064ec309baf5e4da4 made the ssh block
> driver's error message for fingerprint mismatches more verbose, so it
> now prints the actual host key fingerprint and the key type.
>
> iotest 207 tests such errors, but was not amended to filter that
> fingerprint (which is host-specific), so do it now.  Filter the key
> type, too, because I guess this too can differ depending on the host
> configuration.
>

Oh, neat.

(Not that neat.)

> Fixes: e3296cc796aeaf319f3ed4e064ec309baf5e4da4
>("block: print the server key type and fingerprint on failure")
> Reported-by: John Snow 
> Signed-off-by: Hanna Reitz 
> ---
>  tests/qemu-iotests/207 | 7 ++-
>  tests/qemu-iotests/207.out | 6 +++---
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
> index 0f5c4bc8a0..41dcf3ff55 100755
> --- a/tests/qemu-iotests/207
> +++ b/tests/qemu-iotests/207
> @@ -35,7 +35,12 @@ def filter_hash(qmsg):
>  if key == 'hash' and re.match('[0-9a-f]+', value):
>  return 'HASH'
>  return value
> -return iotests.filter_qmp(qmsg, _filter)
> +if isinstance(qmsg, str):
> +# Strip key type and fingerprint
> +p = r"\S+ (key fingerprint) '(md5|sha1|sha256):[0-9a-f]+'"
> +return re.sub(p, r"\1 '\2:HASH'", qmsg)
> +else:
> +return iotests.filter_qmp(qmsg, _filter)
>
>  def blockdev_create(vm, options):
>  vm.blockdev_create(options, filters=[iotests.filter_qmp_testfiles, 
> filter_hash])
> diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
> index aeb8569d77..05cf753283 100644
> --- a/tests/qemu-iotests/207.out
> +++ b/tests/qemu-iotests/207.out
> @@ -42,7 +42,7 @@ virtual size: 4 MiB (4194304 bytes)
>
>  {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": 
> {"driver": "ssh", "location": {"host-key-check": {"hash": "wrong", "mode": 
> "hash", "type": "md5"}, "path": "TEST_DIR/PID-t.img", "server": {"host": 
> "127.0.0.1", "port": "22"}}, "size": 2097152}}}
>  {"return": {}}
> -Job failed: remote host key does not match host_key_check 'wrong'
> +Job failed: remote host key fingerprint 'md5:HASH' does not match 
> host_key_check 'md5:wrong'
>  {"execute": "job-dismiss", "arguments": {"id": "job0"}}
>  {"return": {}}
>
> @@ -59,7 +59,7 @@ virtual size: 8 MiB (8388608 bytes)
>
>  {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": 
> {"driver": "ssh", "location": {"host-key-check": {"hash": "wrong", "mode": 
> "hash", "type": "sha1"}, "path": "TEST_DIR/PID-t.img", "server": {"host": 
> "127.0.0.1", "port": "22"}}, "size": 2097152}}}
>  {"return": {}}
> -Job failed: remote host key does not match host_key_check 'wrong'
> +Job failed: remote host key fingerprint 'sha1:HASH' does not match 
> host_key_check 'sha1:wrong'
>  {"execute": "job-dismiss", "arguments": {"id": "job0"}}
>  {"return": {}}
>
> @@ -76,7 +76,7 @@ virtual size: 4 MiB (4194304 bytes)
>
>  {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": 
> {"driver": "ssh", "location": {"host-key-check": {"hash": "wrong", "mode": 
> "hash", "type": "sha256"}, "path": "TEST_DIR/PID-t.img", "server": {"host": 
> "127.0.0.1", "port": "22"}}, "size": 2097152}}}
>  {"return": {}}
> -Job failed: remote host key does not match host_key_check 'wrong'
> +Job failed: remote host key fingerprint 'sha256:HASH' does not match 
> host_key_check 'sha256:wrong'
>  {"execute": "job-dismiss", "arguments": {"id": "job0"}}
>  {"return": {}}
>
> --
> 2.35.1
>

sankyuu~

Reviewed-by: John Snow 
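For reference, the string branch of the filter in the patch above can
be exercised standalone (a minimal sketch using the same regex as the
patch; the sample message is a made-up illustration):

```python
import re

def filter_fingerprint(msg: str) -> str:
    # Strip the key-type token and replace the hex fingerprint with HASH,
    # keeping the hash-algorithm prefix (same pattern as the 207 patch).
    p = r"\S+ (key fingerprint) '(md5|sha1|sha256):[0-9a-f]+'"
    return re.sub(p, r"\1 '\2:HASH'", msg)

msg = ("Job failed: remote host ssh-rsa key fingerprint 'md5:0f1e2d3c' "
       "does not match host_key_check 'md5:wrong'")
print(filter_fingerprint(msg))
# Job failed: remote host key fingerprint 'md5:HASH' does not match host_key_check 'md5:wrong'
```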




Re: [PATCH 1/2] iotests.py: Filters for VM.run_job()

2022-03-18 Thread John Snow
On Fri, Mar 18, 2022 at 8:53 AM Hanna Reitz  wrote:
>
> Allow filters for VM.run_job(), and pass the filters given to
> VM.blockdev_create() to it.
>
> (Use this opportunity to annotate VM.run_job()'s parameter types;
> unfortunately, for the filter, I could not come up with anything better
> than Callable[[Any], Any] that would pass mypy's scrutiny.)
>

Yeah, I wrote some of this stuff ... before I started using mypy, and
I'd do it differently if I had to again.

(And might still do so: pulling out things like a generalized Job
Runner is still on my Someday pile, especially now that I have an
async QMP module to play with.)

Long story short: Yeah, sure, cool, I don't want to do any better than
this right now either.

> At one point, a plain string is logged, so the filters passed to it must
> work fine with plain strings.  The only filters passed to it at this
> point are the ones from VM.blockdev_create(), which are
> filter_qmp_test_files() (by default) and 207's filter_hash().  Both
> cannot handle plain strings yet, but we can make them by amending
> filter_qmp() to treat them as plain values with a None key.
>
> Signed-off-by: Hanna Reitz 

Looks fine enough to me for now.

Reviewed-by: John Snow 

> ---
>  tests/qemu-iotests/iotests.py | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index 508adade9e..ad62d1f641 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -521,8 +521,10 @@ def filter_qmp(qmsg, filter_fn):
>  # Iterate through either lists or dicts;
>  if isinstance(qmsg, list):
>  items = enumerate(qmsg)
> -else:
> +elif isinstance(qmsg, dict):
>  items = qmsg.items()
> +else:
> +return filter_fn(None, qmsg)
>
>  for k, v in items:
>  if isinstance(v, (dict, list)):
> @@ -858,8 +860,12 @@ def qmp_log(self, cmd, filters=(), indent=None, 
> **kwargs):
>  return result
>
>  # Returns None on success, and an error string on failure
> -def run_job(self, job, auto_finalize=True, auto_dismiss=False,
> -pre_finalize=None, cancel=False, wait=60.0):
> +def run_job(self, job: str, auto_finalize: bool = True,
> +auto_dismiss: bool = False,
> +pre_finalize: Optional[Callable[[], None]] = None,
> +cancel: bool = False, wait: float = 60.0,
> +filters: Iterable[Callable[[Any], Any]] = (),
> +) -> Optional[str]:
>  """
>  run_job moves a job from creation through to dismissal.
>
> @@ -889,7 +895,7 @@ def run_job(self, job, auto_finalize=True, 
> auto_dismiss=False,
>  while True:
>  ev = filter_qmp_event(self.events_wait(events, timeout=wait))
>  if ev['event'] != 'JOB_STATUS_CHANGE':
> -log(ev)
> +log(ev, filters=filters)
>  continue
>  status = ev['data']['status']
>  if status == 'aborting':
> @@ -897,18 +903,18 @@ def run_job(self, job, auto_finalize=True, 
> auto_dismiss=False,
>  for j in result['return']:
>  if j['id'] == job:
>  error = j['error']
> -log('Job failed: %s' % (j['error']))
> +log('Job failed: %s' % (j['error']), filters=filters)
>  elif status == 'ready':
> -self.qmp_log('job-complete', id=job)
> +self.qmp_log('job-complete', id=job, filters=filters)
>  elif status == 'pending' and not auto_finalize:
>  if pre_finalize:
>  pre_finalize()
>  if cancel:
> -self.qmp_log('job-cancel', id=job)
> +self.qmp_log('job-cancel', id=job, filters=filters)
>  else:
> -self.qmp_log('job-finalize', id=job)
> +self.qmp_log('job-finalize', id=job, filters=filters)
>  elif status == 'concluded' and not auto_dismiss:
> -self.qmp_log('job-dismiss', id=job)
> +self.qmp_log('job-dismiss', id=job, filters=filters)
>  elif status == 'null':
>  return error
>
> @@ -921,7 +927,7 @@ def blockdev_create(self, options, job_id='job0', 
> filters=None):
>
>  if 'return' in result:
>  assert result['return'] == {}
> -job_result = self.run_job(job_id)
> +job_result = self.run_job(job_id, filters=filters)
>  else:
>  job_result = result['error']
>
> --
> 2.35.1
>
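The plain-string fallback the patch adds to filter_qmp() can be seen
in isolation with a simplified standalone sketch (not the iotests
code itself):

```python
def filter_qmp(qmsg, filter_fn):
    """Simplified sketch of iotests.filter_qmp() after the patch:
    recurse into lists and dicts, and treat any other value as a
    plain value with a None key."""
    if isinstance(qmsg, list):
        items = enumerate(qmsg)
    elif isinstance(qmsg, dict):
        items = qmsg.items()
    else:
        return filter_fn(None, qmsg)

    for k, v in items:
        if isinstance(v, (dict, list)):
            qmsg[k] = filter_qmp(v, filter_fn)
        else:
            qmsg[k] = filter_fn(k, v)
    return qmsg

def redact_hash(key, value):
    return 'HASH' if key == 'hash' else value

assert filter_qmp([{'hash': 'deadbeef'}], redact_hash) == [{'hash': 'HASH'}]
# Plain strings now pass through the filter instead of crashing:
assert filter_qmp('plain string', redact_hash) == 'plain string'
```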




Re: [PATCH v4 00/18] iotests: add enhanced debugging info to qemu-img failures

2022-03-18 Thread John Snow
On Fri, Mar 18, 2022 at 9:36 AM Hanna Reitz  wrote:
>
> On 18.03.22 00:49, John Snow wrote:
> > Hiya!
> >
> > This series effectively replaces qemu_img_pipe_and_status() with a
> > rewritten function named qemu_img() that raises an exception on non-zero
> > return code by default. By the end of the series, every last invocation
> > of the qemu-img binary ultimately goes through qemu_img().
> >
> > The exception that this function raises includes stdout/stderr output
> > when the traceback is printed in a a little decorated text box so that
> > it stands out from the jargony Python traceback readout.
> >
> > (You can test what this looks like for yourself, or at least you could,
> > by disabling ztsd support and then running qcow2 iotest 065.)
> >
> > Negative tests are still possible in two ways:
> >
> > - Passing check=False to qemu_img, qemu_img_log, or img_info_log
> > - Catching and handling the CalledProcessError exception at the callsite.
>
> Thanks!  Applied to my block branch:
>
> https://gitlab.com/hreitz/qemu/-/commits/block
>
> Hanna
>

Actually, hold it -- this looks like it is causing problems with the
Gitlab CI. I need to investigate these.
https://gitlab.com/jsnow/qemu/-/pipelines/495155073/failures

... and, ugh, naturally the nice error diagnostics are suppressed here
so I can't see them. Well, there's one more thing to try and fix
somehow.

--js




Re: How to backtrace an separate stack?

2022-03-18 Thread Tom Tromey
>> You can play with this if you want.  It's on 'submit/green-threads' on
>> my github.  Be warned that I rebase a lot.

Stefan> This looks cool! Would it be useful to see a port of QEMU's coroutine.py
Stefan> script to your green threads API?

Wouldn't hurt :)

Stefan> QEMU's coroutines aren't in a scheduler list so there is no way to
Stefan> enumerate all coroutines. The Python script can register a GDB command
Stefan> (e.g. "qemu coroutine 0x12345678") that makes GDB aware of the
Stefan> coroutine.

On the one hand, maybe this means the model is wrong.

On the other, I suppose qemu could also have a new command to create a
temporary "thread", given a ucontext_t (or whatever), and switch to it.
Then when the user "continue"s, the thread could be deleted again.

Tom



[PATCH qemu] target/riscv: rvv: Add missing early exit condition for whole register load/store

2022-03-18 Thread ~eopxd
From: Yueh-Ting (eop) Chen 

According to v-spec (section 7.9):
The instructions operate with an effective vector length, evl=NFIELDS*VLEN/EEW,
regardless of current settings in vtype and vl. The usual property that no
elements are written if vstart ≥ vl does not apply to these instructions.
Instead, no elements are written if vstart ≥ evl.

Signed-off-by: eop Chen 
Reviewed-by: Frank Chang 
---
 target/riscv/insn_trans/trans_rvv.c.inc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/riscv/insn_trans/trans_rvv.c.inc 
b/target/riscv/insn_trans/trans_rvv.c.inc
index 275fded6e4..4ea7e41e1a 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -1121,6 +1121,10 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
  gen_helper_ldst_whole *fn, DisasContext *s,
  bool is_store)
 {
+uint32_t evl = (s->cfg_ptr->vlen / 8) * nf / (1 << s->sew);
+TCGLabel *over = gen_new_label();
+tcg_gen_brcondi_tl(TCG_COND_GEU, cpu_vstart, evl, over);
+
 TCGv_ptr dest;
 TCGv base;
 TCGv_i32 desc;
@@ -1140,6 +1144,7 @@ static bool ldst_whole_trans(uint32_t vd, uint32_t rs1, 
uint32_t nf,
 if (!is_store) {
 mark_vs_dirty(s);
 }
+gen_set_label(over);
 
 return true;
 }
-- 
2.34.1
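The evl computation in the patch can be checked against the spec's
formula with a quick sketch (mirroring the patch's arithmetic, where
sew encodes log2(EEW / 8)):

```python
def whole_reg_evl(vlen_bits: int, nfields: int, sew: int) -> int:
    """Effective vector length for whole-register load/store:
    evl = NFIELDS * VLEN / EEW, written as the patch does it,
    (vlen / 8) * nf / (1 << sew)."""
    return (vlen_bits // 8) * nfields // (1 << sew)

def whole_reg_is_nop(vstart: int, vlen_bits: int, nfields: int,
                     sew: int) -> bool:
    # Per v-spec section 7.9: no elements are written if vstart >= evl.
    return vstart >= whole_reg_evl(vlen_bits, nfields, sew)

# VLEN=128, NFIELDS=2, EEW=32 (sew=2): evl = 2 * 128 / 32 = 8
assert whole_reg_evl(128, 2, 2) == 8
assert whole_reg_is_nop(8, 128, 2, 2)
assert not whole_reg_is_nop(7, 128, 2, 2)
```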



Re: [PATCH v8 19/46] hw/cxl/device: Add some trivial commands

2022-03-18 Thread Alison Schofield
On Fri, Mar 18, 2022 at 03:06:08PM +, Jonathan Cameron wrote:
> From: Ben Widawsky 
> 
> GET_FW_INFO and GET_PARTITION_INFO, for this emulation, is equivalent to
> info already returned in the IDENTIFY command. To have a more robust
> implementation, add those.
> 
> Signed-off-by: Ben Widawsky 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-mailbox-utils.c | 69 ++
>  1 file changed, 69 insertions(+)
> 

snip

>  
> +static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
> +   CXLDeviceState *cxl_dstate,
> +   uint16_t *len)
> +{
> +struct {
> +uint64_t active_vmem;
> +uint64_t active_pmem;
> +uint64_t next_vmem;
> +uint64_t next_pmem;
> +} QEMU_PACKED *part_info = (void *)cmd->payload;
> +QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
> +uint64_t size = cxl_dstate->pmem_size;
> +
> +if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
> +return CXL_MBOX_INTERNAL_ERROR;
> +}
> +
> +/* PMEM only */
> +part_info->active_vmem = 0;
> +part_info->next_vmem = 0;
> +part_info->active_pmem = size / (256 << 20);
> +part_info->next_pmem = part_info->active_pmem;

Setting next like this is logical, but it's not per the CXL spec:

8.2.9.5.2.1
"Next Persistent Capacity: If non-zero, this value shall become the
Active Persistent Capacity on the next cold reset. If both this field and the
Next Volatile Capacity field are zero, there is no pending change to the
partitioning."

next_(vmem|pmem) should start as zero and only change as the result
of a successful set_partition_info command.

From your cover letter:
* Volatile memory devices (easy but it's more code so left for now).
Wondering if this is something I could do, and follow that with
set_partition support. Does that sound reasonable? 

Alison

> +
> +*len = sizeof(*part_info);
> +return CXL_MBOX_SUCCESS;
> +}
> +

snip
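The spec behaviour described above could be modelled roughly like
this (a hypothetical sketch of the 8.2.9.5.2.1 semantics for the
pmem-only case, ignoring volatile capacity; not the QEMU code):

```python
class PartitionInfo:
    """Next Persistent Capacity stays zero (no pending change) until a
    set_partition_info command schedules one; the scheduled value only
    becomes active on the next cold reset."""
    def __init__(self, active_pmem: int):
        self.active_pmem = active_pmem
        self.next_pmem = 0  # zero => no pending change

    def set_partition_info(self, next_pmem: int) -> None:
        self.next_pmem = next_pmem

    def cold_reset(self) -> None:
        if self.next_pmem:
            self.active_pmem = self.next_pmem
            self.next_pmem = 0

p = PartitionInfo(active_pmem=4)
assert p.next_pmem == 0          # no pending change on a fresh device
p.set_partition_info(8)
p.cold_reset()
assert p.active_pmem == 8 and p.next_pmem == 0
```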




Re: [PATCH] sh4: Replace TAB indentations with spaces

2022-03-18 Thread Ahmed Abouzied
Hello,

I remember this PR. It was a long time ago. I'll take a look at it and
propose a fix.

Thanks,
Ahmed

On Fri, 18 Mar 2022 at 19:25, Thomas Huth  wrote:

> On 20/06/2021 19.54, Ahmed Abouzied wrote:
> > Replaces TABs with spaces, making sure to have a consistent coding style
> > of 4 space indentations in the SH4 subsystem.
> >
> > Signed-off-by: Ahmed Abouzied 
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/376
> > ---
> ...
> > @@ -1705,101 +1705,101 @@ static void _decode_opc(DisasContext * ctx)
> >   }
> >   return;
> >   case 0xf00d: /* fsts FPUL,FRn - FPSCR: Nothing */
> > - CHECK_FPU_ENABLED
> > +CHECK_FPU_ENABLED
> >   tcg_gen_mov_i32(FREG(B11_8), cpu_fpul);
> > - return;
> > +return;
> >   case 0xf01d: /* flds FRm,FPUL - FPSCR: Nothing */
> > - CHECK_FPU_ENABLED
> > +CHECK_FPU_ENABLED
> >   tcg_gen_mov_i32(cpu_fpul, FREG(B11_8));
> > - return;
> > +return;
>
> Sorry, it's a very late reply ... but in case you're still interested in
> fixing this: It seems like at least some of these files used TABs as 8
> spaces, not as 4 spaces, so after applying your patch, the indentation
> seems
> to be wrong in all places. Please double-check the look of the files
> before
> sending! Thanks!
>
>   Thomas
>
>


[PATCH 07/15] iotests/030: fixup

2022-03-18 Thread John Snow
(Merge into prior patch.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/030 | 85 --
 1 file changed, 49 insertions(+), 36 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 567bf1da67..3a2de920a3 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -64,16 +64,18 @@ class TestSingleDrive(iotests.QMPTestCase):
 self.assert_no_active_block_jobs()
 self.vm.shutdown()
 
-self.assertEqual(qemu_io('-f', 'raw', '-c', 'map', backing_img),
- qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
- 'image file map does not match backing file after 
streaming')
+self.assertEqual(
+qemu_io('-f', 'raw', '-c', 'map', backing_img).stdout,
+qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img).stdout,
+'image file map does not match backing file after streaming')
 
 def test_stream_intermediate(self):
 self.assert_no_active_block_jobs()
 
-self.assertNotEqual(qemu_io('-f', 'raw', '-rU', '-c', 'map', 
backing_img),
-qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', 
mid_img),
-'image file map matches backing file before 
streaming')
+self.assertNotEqual(
+qemu_io('-f', 'raw', '-rU', '-c', 'map', backing_img).stdout,
+qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', mid_img).stdout,
+'image file map matches backing file before streaming')
 
 result = self.vm.qmp('block-stream', device='mid', job_id='stream-mid')
 self.assert_qmp(result, 'return', {})
@@ -83,9 +85,10 @@ class TestSingleDrive(iotests.QMPTestCase):
 self.assert_no_active_block_jobs()
 self.vm.shutdown()
 
-self.assertEqual(qemu_io('-f', 'raw', '-c', 'map', backing_img),
- qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img),
- 'image file map does not match backing file after 
streaming')
+self.assertEqual(
+qemu_io('-f', 'raw', '-c', 'map', backing_img).stdout,
+qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img).stdout,
+'image file map does not match backing file after streaming')
 
 def test_stream_pause(self):
 self.assert_no_active_block_jobs()
@@ -113,15 +116,17 @@ class TestSingleDrive(iotests.QMPTestCase):
 self.assert_no_active_block_jobs()
 self.vm.shutdown()
 
-self.assertEqual(qemu_io('-f', 'raw', '-c', 'map', backing_img),
- qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
- 'image file map does not match backing file after 
streaming')
+self.assertEqual(
+qemu_io('-f', 'raw', '-c', 'map', backing_img).stdout,
+qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img).stdout,
+'image file map does not match backing file after streaming')
 
 def test_stream_no_op(self):
 self.assert_no_active_block_jobs()
 
 # The image map is empty before the operation
-empty_map = qemu_io('-f', iotests.imgfmt, '-rU', '-c', 'map', test_img)
+empty_map = qemu_io(
+'-f', iotests.imgfmt, '-rU', '-c', 'map', test_img).stdout
 
 # This is a no-op: no data should ever be copied from the base image
 result = self.vm.qmp('block-stream', device='drive0', base=mid_img)
@@ -132,8 +137,9 @@ class TestSingleDrive(iotests.QMPTestCase):
 self.assert_no_active_block_jobs()
 self.vm.shutdown()
 
-self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
- empty_map, 'image file map changed after a no-op')
+self.assertEqual(
+qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img).stdout,
+empty_map, 'image file map changed after a no-op')
 
 def test_stream_partial(self):
 self.assert_no_active_block_jobs()
@@ -146,9 +152,10 @@ class TestSingleDrive(iotests.QMPTestCase):
 self.assert_no_active_block_jobs()
 self.vm.shutdown()
 
-self.assertEqual(qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img),
- qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img),
- 'image file map does not match backing file after 
streaming')
+self.assertEqual(
+qemu_io('-f', iotests.imgfmt, '-c', 'map', mid_img).stdout,
+qemu_io('-f', iotests.imgfmt, '-c', 'map', test_img).stdout,
+'image file map does not match backing file after streaming')
 
 def test_device_not_found(self):
 result = self.vm.qmp('block-stream', device='nonexistent')
@@ -236,9 +243,10 @@ class TestParallelOps(iotests.QMPTestCase):
 
 # Check that the maps don't match before the streaming operations
 for i in range(2, self.num_imgs, 2):
-

[PATCH 15/15] iotests: make qemu_io_log() check return codes by default

2022-03-18 Thread John Snow
Just like qemu_img_log(), upgrade qemu_io_log() to enforce a return code
of zero by default.

Affected tests: 242 245 255 274 303 307 nbd-reconnect-on-open

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py  | 5 +++--
 tests/qemu-iotests/tests/nbd-reconnect-on-open | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 0b631e1f8c..c05637bd57 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -365,8 +365,9 @@ def qemu_io(*args: str, check: bool = True, combine_stdio: 
bool = True
 return qemu_tool(*qemu_io_wrap_args(args),
  check=check, combine_stdio=combine_stdio)
 
-def qemu_io_log(*args: str) -> subprocess.CompletedProcess[str]:
-result = qemu_io(*args, check=False)
+def qemu_io_log(*args: str, check: bool = True
+) -> subprocess.CompletedProcess[str]:
+result = qemu_io(*args, check=check)
 log(result.stdout, filters=[filter_testfiles, filter_qemu_io])
 return result
 
diff --git a/tests/qemu-iotests/tests/nbd-reconnect-on-open 
b/tests/qemu-iotests/tests/nbd-reconnect-on-open
index 8be721a24f..d0b401b060 100755
--- a/tests/qemu-iotests/tests/nbd-reconnect-on-open
+++ b/tests/qemu-iotests/tests/nbd-reconnect-on-open
@@ -39,7 +39,7 @@ def check_fail_to_connect(open_timeout):
 log(f'Check fail to connect with {open_timeout} seconds of timeout')
 
 start_t = time.time()
-qemu_io_log(*create_args(open_timeout))
+qemu_io_log(*create_args(open_timeout), check=False)
 delta_t = time.time() - start_t
 
 max_delta = open_timeout + 0.2
-- 
2.34.1




[PATCH 13/15] iotests: remove qemu_io_pipe_and_status()

2022-03-18 Thread John Snow
I know we just added it, sorry. This is done in favor of qemu_io() which
*also* returns the console output and status, but with more robust error
handling on failure.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py   |  3 ---
 tests/qemu-iotests/tests/image-fleecing | 12 +++-
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 58ea766568..e8f38e7ad3 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -365,9 +365,6 @@ def qemu_io(*args: str, check: bool = True, combine_stdio: 
bool = True
 return qemu_tool(*qemu_io_wrap_args(args),
  check=check, combine_stdio=combine_stdio)
 
-def qemu_io_pipe_and_status(*args):
-return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))
-
 def qemu_io_log(*args: str) -> subprocess.CompletedProcess[str]:
 result = qemu_io(*args, check=False)
 log(result.stdout, filters=[filter_testfiles, filter_qemu_io])
diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index b7e5076104..07a4ea7bc4 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -23,8 +23,7 @@
 # Creator/Owner: John Snow 
 
 import iotests
-from iotests import log, qemu_img, qemu_io, qemu_io_silent, \
-qemu_io_pipe_and_status
+from iotests import log, qemu_img, qemu_io, qemu_io_silent
 
 iotests.script_initialize(
 supported_fmts=['qcow2'],
@@ -185,10 +184,7 @@ def do_test(vm, use_cbw, use_snapshot_access_filter, 
base_img_path,
 for p in patterns + zeroes:
 cmd = 'read -P%s %s %s' % p
 log(cmd)
-out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd,
-   nbd_uri)
-if ret != 0:
-print(out)
+qemu_io('-r', '-f', 'raw', '-c', cmd, nbd_uri)
 
 log('')
 log('--- Testing COW ---')
@@ -228,9 +224,7 @@ def do_test(vm, use_cbw, use_snapshot_access_filter, 
base_img_path,
 args += [target_img_path]
 else:
 args += ['-f', 'raw', nbd_uri]
-out, ret = qemu_io_pipe_and_status(*args)
-if ret != 0:
-print(out)
+qemu_io(*args)
 
 log('')
 log('--- Cleanup ---')
-- 
2.34.1




[PATCH 10/15] iotests/245: fixup

2022-03-18 Thread John Snow
(Merge with prior patch.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/242 | 2 +-
 tests/qemu-iotests/245 | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
index 4b7ec16af6..ecc851582a 100755
--- a/tests/qemu-iotests/242
+++ b/tests/qemu-iotests/242
@@ -22,7 +22,7 @@
 import iotests
 import json
 import struct
-from iotests import qemu_img_create, qemu_io, qemu_img_info, \
+from iotests import qemu_img_create, qemu_io_log, qemu_img_info, \
 file_path, img_info_log, log, filter_qemu_io
 
 iotests.script_initialize(supported_fmts=['qcow2'],
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 8cbed7821b..efdad1a0c4 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -217,7 +217,7 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 # Reopen an image several times changing some of its options
 def test_reopen(self):
 # Check whether the filesystem supports O_DIRECT
-if 'O_DIRECT' in qemu_io('-f', 'raw', '-t', 'none', '-c', 'quit', 
hd_path[0]):
+if 'O_DIRECT' in qemu_io('-f', 'raw', '-t', 'none', '-c', 'quit', 
hd_path[0]).stdout:
 supports_direct = False
 else:
 supports_direct = True
-- 
2.34.1




[PATCH 05/15] iotests: create generic qemu_tool() function

2022-03-18 Thread John Snow
Reimplement qemu_img() in terms of qemu_tool(), in preparation for
doing the same with qemu_io().

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 37 +++
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 6cd8374c81..974a2b0c8d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -207,15 +207,13 @@ def qemu_img_create_prepare_args(args: List[str]) -> 
List[str]:
 
 return result
 
-def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
+
+def qemu_tool(*args: str, check: bool = True, combine_stdio: bool = True
  ) -> subprocess.CompletedProcess[str]:
 """
-Run qemu_img and return the status code and console output.
+Run a qemu tool and return its status code and console output.
 
-This function always prepends QEMU_IMG_OPTIONS and may further alter
-the args for 'create' commands.
-
-:param args: command-line arguments to qemu-img.
+:param args: command-line arguments to a QEMU cli tool.
 :param check: Enforce a return code of zero.
 :param combine_stdio: set to False to keep stdout/stderr separated.
 
@@ -227,14 +225,13 @@ def qemu_img(*args: str, check: bool = True, 
combine_stdio: bool = True
 handled, the command-line, return code, and all console output
 will be included at the bottom of the stack trace.
 
-:return: a CompletedProcess. This object has args, returncode, and
-stdout properties. If streams are not combined, it will also
-have a stderr property.
+:return:
+A CompletedProcess. This object has args, returncode, and stdout
+properties. If streams are not combined, it will also have a
+stderr property.
 """
-full_args = qemu_img_args + qemu_img_create_prepare_args(list(args))
-
 subp = subprocess.run(
-full_args,
+args,
 stdout=subprocess.PIPE,
 stderr=subprocess.STDOUT if combine_stdio else subprocess.PIPE,
 universal_newlines=True,
@@ -243,7 +240,7 @@ def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
 
 if check and subp.returncode or (subp.returncode < 0):
 raise VerboseProcessError(
-subp.returncode, full_args,
+subp.returncode, args,
 output=subp.stdout,
 stderr=subp.stderr,
 )
@@ -251,6 +248,20 @@ def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
 return subp
 
 
+def qemu_img(*args: str, check: bool = True, combine_stdio: bool = True
+ ) -> subprocess.CompletedProcess[str]:
+"""
+Run QEMU_IMG_PROG and return its status code and console output.
+
+This function always prepends QEMU_IMG_OPTIONS and may further alter
+the args for 'create' commands.
+
+See `qemu_tool()` for greater detail.
+"""
+full_args = qemu_img_args + qemu_img_create_prepare_args(list(args))
+return qemu_tool(*full_args, check=check, combine_stdio=combine_stdio)
+
+
 def ordered_qmp(qmsg, conv_keys=True):
 # Dictionaries are not ordered prior to 3.6, therefore:
 if isinstance(qmsg, list):
-- 
2.34.1




[PATCH 08/15] iotests/149: fixup

2022-03-18 Thread John Snow
(Merge into prior patch.)

Notes: I don't quite like this change, but I'm at a loss for what would
be cleaner. This is a funky test.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/149 | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149
index 9bb96d6a1d..2ae318f16f 100755
--- a/tests/qemu-iotests/149
+++ b/tests/qemu-iotests/149
@@ -295,7 +295,8 @@ def qemu_io_write_pattern(config, pattern, offset_mb, size_mb, dev=False):
 args = ["-c", "write -P 0x%x %dM %dM" % (pattern, offset_mb, size_mb)]
 args.extend(qemu_io_image_args(config, dev))
 iotests.log("qemu-io " + " ".join(args), filters=[iotests.filter_test_dir])
-iotests.log(check_cipher_support(config, iotests.qemu_io(*args)),
+output = iotests.qemu_io(*args, check=False).stdout
+iotests.log(check_cipher_support(config, output),
 filters=[iotests.filter_test_dir, iotests.filter_qemu_io])
 
 
@@ -307,7 +308,8 @@ def qemu_io_read_pattern(config, pattern, offset_mb, size_mb, dev=False):
 args = ["-c", "read -P 0x%x %dM %dM" % (pattern, offset_mb, size_mb)]
 args.extend(qemu_io_image_args(config, dev))
 iotests.log("qemu-io " + " ".join(args), filters=[iotests.filter_test_dir])
-iotests.log(check_cipher_support(config, iotests.qemu_io(*args)),
+output = iotests.qemu_io(*args, check=False).stdout
+iotests.log(check_cipher_support(config, output),
 filters=[iotests.filter_test_dir, iotests.filter_qemu_io])
 
 
-- 
2.34.1




[PATCH 01/15] iotests: replace calls to log(qemu_io(...)) with qemu_io_log()

2022-03-18 Thread John Snow
This makes these callsites a little simpler, but the real motivation is
that a forthcoming commit will change the return type of qemu_io(), so
removing users of the return value now is helpful.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/242 | 2 +-
 tests/qemu-iotests/255 | 4 +---
 tests/qemu-iotests/303 | 4 ++--
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
index b3afd36d72..4b7ec16af6 100755
--- a/tests/qemu-iotests/242
+++ b/tests/qemu-iotests/242
@@ -61,7 +61,7 @@ def add_bitmap(bitmap_number, persistent, disabled):
 
 def write_to_disk(offset, size):
 write = 'write {} {}'.format(offset, size)
-log(qemu_io('-c', write, disk), filters=[filter_qemu_io])
+qemu_io_log('-c', write, disk)
 
 
 def toggle_flag(offset):
diff --git a/tests/qemu-iotests/255 b/tests/qemu-iotests/255
index f86fa851b6..88b29d64b4 100755
--- a/tests/qemu-iotests/255
+++ b/tests/qemu-iotests/255
@@ -95,9 +95,7 @@ with iotests.FilePath('src.qcow2') as src_path, \
 iotests.qemu_img_create('-f', iotests.imgfmt, src_path, size_str)
 iotests.qemu_img_create('-f', iotests.imgfmt, dst_path, size_str)
 
-iotests.log(iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 0 1M',
-src_path),
-filters=[iotests.filter_test_dir, iotests.filter_qemu_io])
+iotests.qemu_io_log('-f', iotests.imgfmt, '-c', 'write 0 1M', src_path)
 
 vm.add_object('throttle-group,x-bps-read=4096,id=throttle0')
 
diff --git a/tests/qemu-iotests/303 b/tests/qemu-iotests/303
index 93aa5ce9b7..32128b1d32 100755
--- a/tests/qemu-iotests/303
+++ b/tests/qemu-iotests/303
@@ -21,7 +21,7 @@
 
 import iotests
 import subprocess
-from iotests import qemu_img_create, qemu_io, file_path, log, filter_qemu_io
+from iotests import qemu_img_create, qemu_io_log, file_path, log
 
 iotests.script_initialize(supported_fmts=['qcow2'],
   unsupported_imgopts=['refcount_bits', 'compat'])
@@ -43,7 +43,7 @@ def create_bitmap(bitmap_number, disabled):
 
 def write_to_disk(offset, size):
 write = f'write {offset} {size}'
-log(qemu_io('-c', write, disk), filters=[filter_qemu_io])
+qemu_io_log('-c', write, disk)
 
 
 def add_bitmap(num, begin, end, disabled):
-- 
2.34.1




[PATCH 14/15] iotests: remove qemu_io_silent() and qemu_io_silent_check().

2022-03-18 Thread John Snow
Like qemu-img, qemu-io returning 0 should be the norm and not the
exception. Remove all calls to qemu_io_silent() that just assert the
return code is zero (that's every last call, as it turns out), and
replace them with a normal qemu_io() call.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/216| 12 +-
 tests/qemu-iotests/218|  5 ++---
 tests/qemu-iotests/224|  4 ++--
 tests/qemu-iotests/258| 12 +-
 tests/qemu-iotests/298| 16 ++
 tests/qemu-iotests/310| 22 +--
 tests/qemu-iotests/iotests.py | 16 --
 tests/qemu-iotests/tests/image-fleecing   |  4 ++--
 .../tests/mirror-ready-cancel-error   |  2 +-
 .../qemu-iotests/tests/stream-error-on-reset  |  4 ++--
 10 files changed, 39 insertions(+), 58 deletions(-)

diff --git a/tests/qemu-iotests/216 b/tests/qemu-iotests/216
index 88b385afa3..97de1cda61 100755
--- a/tests/qemu-iotests/216
+++ b/tests/qemu-iotests/216
@@ -21,7 +21,7 @@
 # Creator/Owner: Max Reitz 
 
 import iotests
-from iotests import log, qemu_img, qemu_io_silent
+from iotests import log, qemu_img, qemu_io
 
 # Need backing file support
 iotests.script_initialize(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk'],
@@ -52,10 +52,10 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('')
 
 qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M')
-assert qemu_io_silent(base_img_path, '-c', 'write -P 1 0M 1M') == 0
+qemu_io(base_img_path, '-c', 'write -P 1 0M 1M')
 qemu_img('create', '-f', iotests.imgfmt, '-b', base_img_path,
  '-F', iotests.imgfmt, top_img_path)
-assert qemu_io_silent(top_img_path,  '-c', 'write -P 2 1M 1M') == 0
+qemu_io(top_img_path,  '-c', 'write -P 2 1M 1M')
 
 log('Done')
 
@@ -110,8 +110,8 @@ with iotests.FilePath('base.img') as base_img_path, \
 log('--- Checking COR result ---')
 log('')
 
-assert qemu_io_silent(base_img_path, '-c', 'discard 0 64M') == 0
-assert qemu_io_silent(top_img_path,  '-c', 'read -P 1 0M 1M') == 0
-assert qemu_io_silent(top_img_path,  '-c', 'read -P 2 1M 1M') == 0
+qemu_io(base_img_path, '-c', 'discard 0 64M')
+qemu_io(top_img_path,  '-c', 'read -P 1 0M 1M')
+qemu_io(top_img_path,  '-c', 'read -P 2 1M 1M')
 
 log('Done')
diff --git a/tests/qemu-iotests/218 b/tests/qemu-iotests/218
index 853ed52b34..0c717c9b7f 100755
--- a/tests/qemu-iotests/218
+++ b/tests/qemu-iotests/218
@@ -28,7 +28,7 @@
 # Creator/Owner: Max Reitz 
 
 import iotests
-from iotests import log, qemu_img, qemu_io_silent
+from iotests import log, qemu_img, qemu_io
 
 iotests.script_initialize(supported_fmts=['qcow2', 'raw'])
 
@@ -146,8 +146,7 @@ with iotests.VM() as vm, \
  iotests.FilePath('src.img') as src_img_path:
 
 qemu_img('create', '-f', iotests.imgfmt, src_img_path, '64M')
-assert qemu_io_silent('-f', iotests.imgfmt, src_img_path,
-  '-c', 'write -P 42 0M 64M') == 0
+qemu_io('-f', iotests.imgfmt, src_img_path, '-c', 'write -P 42 0M 64M')
 
 vm.launch()
 
diff --git a/tests/qemu-iotests/224 b/tests/qemu-iotests/224
index c31c55b49d..8eb3ceb8d1 100755
--- a/tests/qemu-iotests/224
+++ b/tests/qemu-iotests/224
@@ -22,7 +22,7 @@
 # Creator/Owner: Max Reitz 
 
 import iotests
-from iotests import log, qemu_img, qemu_io_silent, filter_qmp_testfiles, \
+from iotests import log, qemu_img, qemu_io, filter_qmp_testfiles, \
 filter_qmp_imgfmt
 import json
 
@@ -54,7 +54,7 @@ for filter_node_name in False, True:
  '-F', iotests.imgfmt, top_img_path)
 
 # Something to commit
-assert qemu_io_silent(mid_img_path, '-c', 'write -P 1 0 1M') == 0
+qemu_io(mid_img_path, '-c', 'write -P 1 0 1M')
 
 vm.launch()
 
diff --git a/tests/qemu-iotests/258 b/tests/qemu-iotests/258
index 7798a04d7d..e11a0e09b3 100755
--- a/tests/qemu-iotests/258
+++ b/tests/qemu-iotests/258
@@ -21,7 +21,7 @@
 # Creator/Owner: Max Reitz 
 
 import iotests
-from iotests import log, qemu_img, qemu_io_silent, \
+from iotests import log, qemu_img, qemu_io, \
 filter_qmp_testfiles, filter_qmp_imgfmt
 
 # Returns a node for blockdev-add
@@ -86,15 +86,15 @@ def test_concurrent_finish(write_to_stream_node):
 if write_to_stream_node:
 # This is what (most of the time) makes commit finish
 # earlier and then pull in stream
-assert qemu_io_silent(node2_path,
-  '-c', 'write %iK 64K' % (65536 - 192),
-  '-c', 'write %iK 64K' % (65536 -  64)) == 0
+qemu_io(node2_path,
+'-c', 'write %iK 64K' % (65536 - 192),
+'-c', 'write %iK 64K' % (65536 -  64))
 
 stream_throttle='tg'
 else:
 # And this makes stream finish earlier

[PATCH 12/15] iotests/migration-permissions: use assertRaises() for qemu_io() negative test

2022-03-18 Thread John Snow
Modify this test to use assertRaises for its negative testing of
qemu_io. If the exception raised does not match the one we tell it to
expect, we get *that* exception unhandled. If we get no exception, we
get a unittest assertion failure and the provided emsg printed to
screen.

If we get the CalledProcessError exception but the output is not what we
expect, we re-raise the original CalledProcessError.

Tidy.
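Reduced to a self-contained sketch, the pattern looks like this (failing_tool is a hypothetical stand-in for the qemu_io() negative case; the message string mirrors the one the test greps for):

```python
import subprocess
import unittest
from subprocess import CalledProcessError

def failing_tool() -> str:
    # Hypothetical stand-in for qemu_io(): a command that exits non-zero
    # after printing a recognizable error message.
    return subprocess.run(
        ['sh', '-c', 'echo "Is another process using the image?"; exit 1'],
        stdout=subprocess.PIPE, universal_newlines=True, check=True).stdout

class TestNegative(unittest.TestCase):
    def test_expected_failure(self):
        emsg = 'tool should have failed, but it reported no error'
        # No exception at all -> unittest assertion failure carrying emsg.
        with self.assertRaises(CalledProcessError, msg=emsg) as ctx:
            failing_tool()
        # Failed, but not in the way we expected -> re-raise the original.
        if 'Is another process using the image' not in ctx.exception.stdout:
            raise ctx.exception
```

Because subprocess.run(check=True) stores the captured console output on the exception, ctx.exception.stdout is available for the secondary check.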

Signed-off-by: John Snow 
---
 .../qemu-iotests/tests/migration-permissions  | 28 +--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/tests/qemu-iotests/tests/migration-permissions b/tests/qemu-iotests/tests/migration-permissions
index c7afb1bd2c..4e1da369c9 100755
--- a/tests/qemu-iotests/tests/migration-permissions
+++ b/tests/qemu-iotests/tests/migration-permissions
@@ -18,6 +18,8 @@
 #
 
 import os
+from subprocess import CalledProcessError
+
 import iotests
 from iotests import imgfmt, qemu_img_create, qemu_io
 
@@ -69,13 +71,12 @@ class TestMigrationPermissions(iotests.QMPTestCase):
 def test_post_migration_permissions(self):
 # Try to access the image R/W, which should fail because virtio-blk
 # has not been configured with share-rw=on
-log = qemu_io('-f', imgfmt, '-c', 'quit', test_img, check=False).stdout
-if not log.strip():
-print('ERROR (pre-migration): qemu-io should not be able to '
-  'access this image, but it reported no error')
-else:
-# This is the expected output
-assert 'Is another process using the image' in log
+emsg = ('ERROR (pre-migration): qemu-io should not be able to '
+'access this image, but it reported no error')
+with self.assertRaises(CalledProcessError, msg=emsg) as ctx:
+qemu_io('-f', imgfmt, '-c', 'quit', test_img)
+if 'Is another process using the image' not in ctx.exception.stdout:
+raise ctx.exception
 
 # Now migrate the VM
 self.vm_s.qmp('migrate', uri=f'unix:{mig_sock}')
@@ -84,13 +85,12 @@ class TestMigrationPermissions(iotests.QMPTestCase):
 
 # Try the same qemu-io access again, verifying that the WRITE
 # permission remains unshared
-log = qemu_io('-f', imgfmt, '-c', 'quit', test_img, check=False).stdout
-if not log.strip():
-print('ERROR (post-migration): qemu-io should not be able to '
-  'access this image, but it reported no error')
-else:
-# This is the expected output
-assert 'Is another process using the image' in log
+emsg = ('ERROR (post-migration): qemu-io should not be able to '
+'access this image, but it reported no error')
+with self.assertRaises(CalledProcessError, msg=emsg) as ctx:
+qemu_io('-f', imgfmt, '-c', 'quit', test_img)
+if 'Is another process using the image' not in ctx.exception.stdout:
+raise ctx.exception
 
 
 if __name__ == '__main__':
-- 
2.34.1




[PATCH 09/15] iotests/205: fixup

2022-03-18 Thread John Snow
(Merge into prior patch.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/205 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
index c0e107328f..15f798288a 100755
--- a/tests/qemu-iotests/205
+++ b/tests/qemu-iotests/205
@@ -85,13 +85,13 @@ class TestNbdServerRemove(iotests.QMPTestCase):
 
 def do_test_connect_after_remove(self, mode=None):
 args = ('-r', '-f', 'raw', '-c', 'read 0 512', nbd_uri)
-self.assertReadOk(qemu_io(*args))
+self.assertReadOk(qemu_io(*args).stdout)
 
 result = self.remove_export('exp', mode)
 self.assert_qmp(result, 'return', {})
 
 self.assertExportNotFound('exp')
-self.assertConnectFailed(qemu_io(*args))
+self.assertConnectFailed(qemu_io(*args, check=False).stdout)
 
 def test_connect_after_remove_default(self):
 self.do_test_connect_after_remove()
-- 
2.34.1




[PATCH 04/15] iotests/040: Don't check image pattern on zero-length image

2022-03-18 Thread John Snow
qemu-io fails on read/write with zero-length raw images, so skip these
when running the zero-length image tests.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/040 | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index adf5815781..c4a90937dc 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -86,8 +86,10 @@ class TestSingleDrive(ImageCommitTestCase):
 qemu_img('create', '-f', iotests.imgfmt,
  '-o', 'backing_file=%s' % mid_img,
  '-F', iotests.imgfmt, test_img)
-qemu_io('-f', 'raw', '-c', 'write -P 0xab 0 524288', backing_img)
-qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', mid_img)
+if self.image_len:
+qemu_io('-f', 'raw', '-c', 'write -P 0xab 0 524288', backing_img)
+qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288',
+mid_img)
self.vm = iotests.VM().add_drive(test_img, "node-name=top,backing.node-name=mid,backing.backing.node-name=base", interface="none")
 self.vm.add_device('virtio-scsi')
 self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
@@ -101,11 +103,15 @@ class TestSingleDrive(ImageCommitTestCase):
 
 def test_commit(self):
 self.run_commit_test(mid_img, backing_img)
+if not self.image_len:
+return
 qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
 qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
 def test_commit_node(self):
 self.run_commit_test("mid", "base", node_names=True)
+if not self.image_len:
+return
 qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
 qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
@@ -192,11 +198,15 @@ class TestSingleDrive(ImageCommitTestCase):
 
 def test_top_is_active(self):
 self.run_commit_test(test_img, backing_img, need_ready=True)
+if not self.image_len:
+return
 qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
 qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
 def test_top_is_default_active(self):
 self.run_default_commit_test()
+if not self.image_len:
+return
 qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
 qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
-- 
2.34.1




[PATCH 00/15] iotests: add enhanced debugging info to qemu-io failures

2022-03-18 Thread John Snow
Howdy,

This series does for qemu_io() what we've already done for qemu_img(): it
makes the function check the return code by default and raise an exception
when things do not go according to plan.

This series removes qemu_io_pipe_and_status(), qemu_io_silent(), and
qemu_io_silent_check() in favor of just qemu_io().
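The gist of the "enhanced debugging info", sketched: an exception subclass whose message carries the captured console output (VerboseError below imitates what qemu.utils.VerboseProcessError appears to do, judging from the traceback further down; the details here are illustrative):

```python
import subprocess

class VerboseError(subprocess.CalledProcessError):
    # Append the captured console output to the exception text, so an
    # unhandled failure is debuggable straight from the stack trace.
    def __str__(self) -> str:
        body = ''.join(f'  ┃ {line}\n'
                       for line in (self.output or '').splitlines())
        return f'{super().__str__()}\n  ┏━ output\n{body}  ┗━'

# A contrived failing command standing in for qemu-io:
proc = subprocess.run(['sh', '-c', 'echo boom; exit 2'],
                      stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                      universal_newlines=True)
if proc.returncode:
    err = VerboseError(proc.returncode, proc.args, output=proc.stdout)
```

A bare `raise err` then puts the tool's console output right in the test log.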

RFC:

- There are a few remaining uses of qemu-io that don't go through qemu_io;
QemuIoInteractive is a user that is used in 205, 298, 299, and 307. It
... did not appear worth it to morph qemu_tool_popen into something that
could be used by both QemuIoInteractive *and* qemu_io(), so I left it
alone. It's probably fine for now. (But it does bother me, a little.)

- qemu_io_popen itself is used by the nbd-reconnect-on-open test, and it
seems like a legitimate use -- it wants concurrency. Like the above
problem, I couldn't find a way to bring it into the fold, so I
didn't. (Meh.) I eventually plan to add asyncio subprocess management to
machine.py, and I could tackle stuff like this then. It's not worth it
now.

- Several patches in this patchset ("fixup" in the title) are designed to
be merged-on-commit. I know that's not usually how we do things, but I
thought it was actually nicer than pre-squashing it because it gives me
more flexibility on re-spin.

- Uh, actually, test 040 fails with this patchset and I don't understand
  if it's intentional, harmless, a test design problem, or worse:

==
ERROR: test_filterless_commit (__main__.TestCommitWithFilters)
--
Traceback (most recent call last):
  File "/home/jsnow/src/qemu/tests/qemu-iotests/040", line 822, in tearDown
self.do_test_io('read')
  File "/home/jsnow/src/qemu/tests/qemu-iotests/040", line 751, in do_test_io
qemu_io('-f', iotests.imgfmt,
  File "/home/jsnow/src/qemu/tests/qemu-iotests/iotests.py", line 365, in qemu_io
    return qemu_tool(*qemu_io_wrap_args(args),
  File "/home/jsnow/src/qemu/tests/qemu-iotests/iotests.py", line 242, in qemu_tool
    raise VerboseProcessError(

qemu.utils.VerboseProcessError: Command
  '('/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/../../qemu-io',
  '--cache', 'writeback', '--aio', 'threads', '-f', 'qcow2', '-c',
  'read -P 4 3M 1M',
  '/home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/3.img')'
  returned non-zero exit status 1.
  ┏━ output 
  ┃ qemu-io: can't open device
  ┃ /home/jsnow/src/qemu/bin/git/tests/qemu-iotests/scratch/3.img:
  ┃ Could not open backing file: Could not open backing file: Throttle
  ┃ group 'tg' does not exist
  ┗━

It looks like we start with the img chain 3->2->1->0, then we commit 2
down into 1, but checking '3' fails somewhere in the backing
chain. Maybe a real bug?

John Snow (15):
  iotests: replace calls to log(qemu_io(...)) with qemu_io_log()
  iotests/163: Fix broken qemu-io invocation
  iotests: Don't check qemu_io() output for specific error strings
  iotests/040: Don't check image pattern on zero-length image
  iotests: create generic qemu_tool() function
  iotests: rebase qemu_io() on top of qemu_tool()
  iotests/030: fixup
  iotests/149: fixup
  iotests/205: fixup
  iotests/245: fixup
  iotests/migration-permissions: fixup
  iotests/migration-permissions: use assertRaises() for qemu_io()
negative test
  iotests: remove qemu_io_pipe_and_status()
  iotests: remove qemu_io_silent() and qemu_io_silent_check().
  iotests: make qemu_io_log() check return codes by default

 tests/qemu-iotests/030| 85 +++
 tests/qemu-iotests/040| 47 +-
 tests/qemu-iotests/056|  2 +-
 tests/qemu-iotests/149|  6 +-
 tests/qemu-iotests/163|  5 +-
 tests/qemu-iotests/205|  4 +-
 tests/qemu-iotests/216| 12 +--
 tests/qemu-iotests/218|  5 +-
 tests/qemu-iotests/224|  4 +-
 tests/qemu-iotests/242|  4 +-
 tests/qemu-iotests/245|  2 +-
 tests/qemu-iotests/255|  4 +-
 tests/qemu-iotests/258| 12 +--
 tests/qemu-iotests/298| 16 ++--
 tests/qemu-iotests/303|  4 +-
 tests/qemu-iotests/310| 22 ++---
 tests/qemu-iotests/iotests.py | 74 
 tests/qemu-iotests/tests/image-fleecing   | 14 +--
 .../qemu-iotests/tests/migration-permissions  | 28 +++---
 .../tests/mirror-ready-cancel-error   |  2 +-
 .../qemu-iotests/tests/nbd-reconnect-on-open  |  2 +-
 .../qemu-iotests/tests/stream-error-on-reset  |  4 +-
 22 files changed, 184 insertions(+), 174 deletions(-)

-- 
2.34.1





[PATCH 11/15] iotests/migration-permissions: fixup

2022-03-18 Thread John Snow
(Merge into prior commit)

Note, this is a quick hack band-aid, but a follow-up patch spends the
time to refactor it a bit. This is just the quick stop-gap to prevent
bisection failures.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/tests/migration-permissions | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/tests/migration-permissions b/tests/qemu-iotests/tests/migration-permissions
index 6be02581c7..c7afb1bd2c 100755
--- a/tests/qemu-iotests/tests/migration-permissions
+++ b/tests/qemu-iotests/tests/migration-permissions
@@ -69,7 +69,7 @@ class TestMigrationPermissions(iotests.QMPTestCase):
 def test_post_migration_permissions(self):
 # Try to access the image R/W, which should fail because virtio-blk
 # has not been configured with share-rw=on
-log = qemu_io('-f', imgfmt, '-c', 'quit', test_img)
+log = qemu_io('-f', imgfmt, '-c', 'quit', test_img, check=False).stdout
 if not log.strip():
 print('ERROR (pre-migration): qemu-io should not be able to '
   'access this image, but it reported no error')
@@ -84,7 +84,7 @@ class TestMigrationPermissions(iotests.QMPTestCase):
 
 # Try the same qemu-io access again, verifying that the WRITE
 # permission remains unshared
-log = qemu_io('-f', imgfmt, '-c', 'quit', test_img)
+log = qemu_io('-f', imgfmt, '-c', 'quit', test_img, check=False).stdout
 if not log.strip():
 print('ERROR (post-migration): qemu-io should not be able to '
   'access this image, but it reported no error')
-- 
2.34.1




[PATCH 06/15] iotests: rebase qemu_io() on top of qemu_tool()

2022-03-18 Thread John Snow
Rework qemu_io() to be analogous to qemu_img(); a function that requires
a return code of zero by default unless disabled explicitly.

Tests that use qemu_io():
030 040 041 044 055 056 093 124 129 132 136 148 149 151 152 163 165 205
209 219 236 245 248 254 255 257 260 264 280 298 300 302 304
image-fleecing migrate-bitmaps-postcopy-test migrate-bitmaps-test
migrate-during-backup migration-permissions

Test that use qemu_io_log():
242 245 255 274 303 307 nbd-reconnect-on-open

Signed-off-by: John Snow 

---

Note: This breaks several tests at this point. I'll be fixing each
broken test one by one in the subsequent commits. We can squash them all
on merge to avoid test regressions.

(Seems like a way to have your cake and eat it too with regards to
maintaining bisectability while also having nice mailing list patches.)

Copy-pastables:

./check -qcow2 030 040 041 044 055 056 124 129 132 151 152 163 165 209 \
   219 236 242 245 248 254 255 257 260 264 274 \
   280 298 300 302 303 304 307 image-fleecing \
   migrate-bitmaps-postcopy-test migrate-bitmaps-test \
   migrate-during-backup nbd-reconnect-on-open

./check -raw 093 136 148 migration-permissions

./check -nbd 205

# ./configure --disable-gnutls --enable-gcrypt
# this ALSO requires passwordless sudo.
./check -luks 149


# Just the ones that fail:
./check -qcow2 030 040 242 245
./check -raw migration-permissions
./check -nbd 205
./check -luks 149

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 974a2b0c8d..58ea766568 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -354,16 +354,23 @@ def qemu_io_wrap_args(args: Sequence[str]) -> List[str]:
 def qemu_io_popen(*args):
 return qemu_tool_popen(qemu_io_wrap_args(args))
 
-def qemu_io(*args):
-'''Run qemu-io and return the stdout data'''
-return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))[0]
+def qemu_io(*args: str, check: bool = True, combine_stdio: bool = True
+) -> subprocess.CompletedProcess[str]:
+"""
+Run QEMU_IO_PROG and return the status code and console output.
+
+This function always prepends either QEMU_IO_OPTIONS or
+QEMU_IO_OPTIONS_NO_FMT.
+"""
+return qemu_tool(*qemu_io_wrap_args(args),
+ check=check, combine_stdio=combine_stdio)
 
 def qemu_io_pipe_and_status(*args):
 return qemu_tool_pipe_and_status('qemu-io', qemu_io_wrap_args(args))
 
-def qemu_io_log(*args):
-result = qemu_io(*args)
-log(result, filters=[filter_testfiles, filter_qemu_io])
+def qemu_io_log(*args: str) -> subprocess.CompletedProcess[str]:
+result = qemu_io(*args, check=False)
+log(result.stdout, filters=[filter_testfiles, filter_qemu_io])
 return result
 
 def qemu_io_silent(*args):
-- 
2.34.1




[PATCH 02/15] iotests/163: Fix broken qemu-io invocation

2022-03-18 Thread John Snow
The 'read' commands to qemu-io were malformed, and this invocation only
worked by coincidence because the error messages were identical. Oops.

There's no point in checking the patterning of the reference image, so
just check the empty image by itself instead.

(Note: as of this commit, nothing actually enforces that this command
completes successfully, but a forthcoming commit in this series will
enforce that qemu_io() must have a zero status code.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/163 | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
index e4cd4b230f..c94ad16f4a 100755
--- a/tests/qemu-iotests/163
+++ b/tests/qemu-iotests/163
@@ -113,10 +113,7 @@ class ShrinkBaseClass(iotests.QMPTestCase):
 qemu_img('resize',  '-f', iotests.imgfmt, '--shrink', test_img,
  self.shrink_size)
 
-self.assertEqual(
-qemu_io('-c', 'read -P 0x00 %s'%self.shrink_size, test_img),
-qemu_io('-c', 'read -P 0x00 %s'%self.shrink_size, check_img),
-"Verifying image content")
+qemu_io('-c', f"read -P 0x00 0 {self.shrink_size}", test_img)
 
 self.image_verify()
 
-- 
2.34.1




[PATCH 03/15] iotests: Don't check qemu_io() output for specific error strings

2022-03-18 Thread John Snow
A forthcoming commit updates qemu_io() to raise an exception on non-zero
return by default, and changes its return type.

In preparation, simplify some calls to qemu_io() that assert that
specific error message strings do not appear in qemu-io's
output. Asserting that all of these calls return a status code of zero
will be a more robust way to guard against failure.
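The difference between the two styles, in miniature (the shell command is a contrived stand-in for a qemu-io read whose pattern check fails):

```python
import subprocess

# Contrived stand-in for a failing qemu-io pattern read.
read_cmd = ['sh', '-c', 'echo "Pattern verification failed"; exit 1']

# Fragile style: grep the output for one specific message.  A different
# failure mode (bad option, unreadable image, crash) prints different
# text and sails straight past the string check.
proc = subprocess.run(read_cmd, stdout=subprocess.PIPE,
                      universal_newlines=True)
string_check_tripped = 'Pattern verification failed' in proc.stdout

# Robust style: insist on exit status zero; *every* failure mode trips it.
status_check_tripped = proc.returncode != 0
```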

Signed-off-by: John Snow 
---
 tests/qemu-iotests/040 | 33 -
 tests/qemu-iotests/056 |  2 +-
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 0e1cfd7e49..adf5815781 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -101,13 +101,13 @@ class TestSingleDrive(ImageCommitTestCase):
 
 def test_commit(self):
 self.run_commit_test(mid_img, backing_img)
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
+qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
 def test_commit_node(self):
 self.run_commit_test("mid", "base", node_names=True)
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
+qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
 @iotests.skip_if_unsupported(['throttle'])
 def test_commit_with_filter_and_quit(self):
@@ -192,13 +192,13 @@ class TestSingleDrive(ImageCommitTestCase):
 
 def test_top_is_active(self):
 self.run_commit_test(test_img, backing_img, need_ready=True)
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
+qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
 def test_top_is_default_active(self):
 self.run_default_commit_test()
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', backing_img)
+qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', backing_img)
 
 def test_top_and_base_reversed(self):
 self.assert_no_active_block_jobs()
@@ -334,8 +334,8 @@ class TestRelativePaths(ImageCommitTestCase):
 
 def test_commit(self):
 self.run_commit_test(self.mid_img, self.backing_img)
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', self.backing_img_abs).find("verification failed"))
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', self.backing_img_abs).find("verification failed"))
+qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', self.backing_img_abs)
+qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', self.backing_img_abs)
 
 def test_device_not_found(self):
result = self.vm.qmp('block-commit', device='nonexistent', top='%s' % self.mid_img)
@@ -361,8 +361,8 @@ class TestRelativePaths(ImageCommitTestCase):
 
 def test_top_is_active(self):
 self.run_commit_test(self.test_img, self.backing_img)
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', self.backing_img_abs).find("verification failed"))
-self.assertEqual(-1, qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', self.backing_img_abs).find("verification failed"))
+qemu_io('-f', 'raw', '-c', 'read -P 0xab 0 524288', self.backing_img_abs)
+qemu_io('-f', 'raw', '-c', 'read -P 0xef 524288 524288', self.backing_img_abs)
 
 def test_top_and_base_reversed(self):
 self.assert_no_active_block_jobs()
@@ -738,11 +738,10 @@ class TestCommitWithFilters(iotests.QMPTestCase):
 
 def do_test_io(self, read_or_write):
 for index, pattern_file in enumerate(self.pattern_files):
-result = qemu_io('-f', iotests.imgfmt,
- '-c',
- f'{read_or_write} -P {index + 1} {index}M 1M',
- pattern_file)
-self.assertFalse('Pattern verification failed' in result)
+qemu_io('-f', iotests.imgfmt,
+'-c',

Re: [PATCH] target/i386: kvm: do not access uninitialized variable on older kernels

2022-03-18 Thread Volker Rümelin

Am 18.03.22 um 16:26 schrieb Paolo Bonzini:

KVM support for AMX includes a new system attribute, KVM_X86_XCOMP_GUEST_SUPP.
Commit 19db68ca68 ("x86: Grant AMX permission for guest", 2022-03-15) however
did not fully consider the behavior on older kernels.  First, it warns
too aggressively.  Second, it invokes the KVM_GET_DEVICE_ATTR ioctl
unconditionally and then uses the "bitmask" variable, which remains
uninitialized if the ioctl fails.

While at it, explain why the ioctl is needed and KVM_GET_SUPPORTED_CPUID
is not enough.

Signed-off-by: Paolo Bonzini 
---
  target/i386/kvm/kvm.c | 17 +
  1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index ce0e8a4042..f2c9f7b5ca 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -412,6 +412,12 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
  }
  } else if (function == 0xd && index == 0 &&
 (reg == R_EAX || reg == R_EDX)) {
+/*
+ * The value returned by KVM_GET_SUPPORTED_CPUID does not include
+ * features that still have to be enabled with the arch_prctl
+ * system call.  QEMU needs the full value, which is retrieved
+ * with KVM_GET_DEVICE_ATTR.
+ */
  struct kvm_device_attr attr = {
  .group = 0,
  .attr = KVM_X86_XCOMP_GUEST_SUPP,
@@ -420,13 +426,16 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
  
  bool sys_attr = kvm_check_extension(s, KVM_CAP_SYS_ATTRIBUTES);

  if (!sys_attr) {
-warn_report("cannot get sys attribute capabilities %d", sys_attr);
+return ret;
  }
  
int rc = kvm_ioctl(s, KVM_GET_DEVICE_ATTR, &attr);

-if (rc == -1 && (errno == ENXIO || errno == EINVAL)) {
-warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
-"error: %d", rc);
+if (rc == -1) {


Hi Paolo,

this is kvm_ioctl(), not ioctl(). kvm_ioctl() returns -errno on error.

With best regards,
Volker


+if (errno != ENXIO) {
+warn_report("KVM_GET_DEVICE_ATTR(0, KVM_X86_XCOMP_GUEST_SUPP) "
+"error: %d", rc);
+}
+return ret;
  }
  ret = (reg == R_EAX) ? bitmask : bitmask >> 32;
  } else if (function == 0x8001 && reg == R_ECX) {





Re: Question about vmstate_register(), dc->vmsd and instance_id

2022-03-18 Thread Daniel Henrique Barboza




On 3/18/22 00:43, David Gibson wrote:

On Thu, Mar 17, 2022 at 04:29:14PM +, Dr. David Alan Gilbert wrote:

* Peter Maydell (peter.mayd...@linaro.org) wrote:

On Thu, 17 Mar 2022 at 14:03, Daniel Henrique Barboza
 wrote:

I've been looking into converting some vmstate_register() calls to use dc->vmsd,
using as a base the docs in docs/devel/migration.rst. This doc mentions that we
can either register the vmsd by using vmstate_register() or we can use dc->vmsd
for qdev-based devices.

When trying to convert this vmstate() call for the qdev alternative
(hw/ppc/spapr_drc.c, drc_realize()) I found this:

  vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc),
                   &vmstate_spapr_drc, drc);

spapr_drc_index() is a unique identifier for these DRC devices and it's
being used as the instance_id. It is not clear to me how we can keep
using this same instance_id when using the dc->vmsd alternative. By
looking a bit into the migration files I understood that if dc->vmsd is
being used the instance_id is always autogenerated. Is that correct?


Not entirely. It is the intended common setup, but because changing
the ID value breaks migration compatibility there is a mechanism
for saying "my device is special and needs to set the instance ID
to something else" -- qdev_set_legacy_instance_id().


Yes, this is normally only an issue for 'system' or memory mapped
devices;  for things hung off a bus that has its own device naming,
each instance of a device has its own name due to the bus name,
so instance_id's aren't used.  Where you've got a few of the
same device with the same name, and no bus for them to be named by, then
the instance_id is used to uniquify them.



Thanks for the info. qdev_set_legacy_instance_id() was the missing piece I was
looking for to continue with the dc->vmsd transition I'd like to do.




Thanks for the information.  I remember deciding at the time that just
using vmsd wouldn't work for the DRCs because we needed this fixed
index.  At the time either qdev_set_legacy_instance_id() didn't exist,
or I didn't know about it, hence the explicit vmstate_register() call
so that an explicit instance id could be supplied.



This is the commit that introduced DRC migration:


commit a50919dddf148b0a2008db4a0593dbe69e1059c0
Author: Daniel Henrique Barboza 
Date:   Mon May 22 16:35:49 2017 -0300

hw/ppc: migrating the DRC state of hotplugged devices


I'd say you can cut yourself some slack this time. Blame that guy instead.




Thanks,


Daniel







[PATCH v7 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

PCI device capable of SR-IOV support is a new, still-experimental
feature with only a single working example of the Nvme device.

This patch is an attempt to fix a double-free problem when an
SR-IOV-capable Nvme device is hot-unplugged. The problem and the
reproduction steps can be found in this thread:

https://patchew.org/QEMU/20220217174504.1051716-1-lukasz.man...@linux.intel.com/20220217174504.1051716-14-lukasz.man...@linux.intel.com/

Details of the proposed solution are, for convenience, included below.

1) The current SR-IOV implementation assumes it’s the PhysicalFunction
   that creates and deletes VirtualFunctions.
2) It’s a design decision (for the Nvme device, at least) for the VFs
   to be of the same class as the PF. Effectively, they share the
   dc->hotpluggable value.
3) When a VF is created, it’s added as a child node to PF’s PCI bus
   slot.
4) Monitor/device_del triggers the ACPI mechanism. The implementation is
   not aware of SR/IOV and ejects PF’s PCI slot, directly unrealizing all
   hot-pluggable (!acpi_pcihp_pc_no_hotplug) children nodes.
5) VFs are unrealized directly, and it doesn’t work well with (1).
   SR/IOV structures are not updated, so when it’s PF’s turn to be
   unrealized, it works on stale pointers to already-deleted VFs.

Signed-off-by: Łukasz Gieryk 
---
 hw/acpi/pcihp.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 6351bd3424d..248839e1110 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -192,8 +192,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
  * ACPI doesn't allow hotplug of bridge devices.  Don't allow
  * hot-unplug of bridge devices unless they were added by hotplug
  * (and so, not described by acpi).
+ *
+ * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
+ * will be removed implicitly, when Physical Function is unplugged.
  */
-return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
+return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
+   pci_is_vf(dev);
 }
 
 static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slots)
-- 
2.25.1




[PATCH v7 11/12] hw/nvme: Update the initialization place for the AER queue

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

This patch updates the initialization place for the AER queue, so it’s
initialized once, at controller initialization, and not every time the
controller is enabled.

While the original version works for a non-SR-IOV device, as it’s hard
to interact with the controller if it’s not enabled, the multiple
reinitialization is not necessarily correct.

With the SR/IOV feature enabled a segfault can happen: a VF can have its
controller disabled, while a namespace can still be attached to the
controller through the parent PF. An event generated in such a case ends
up on an uninitialized queue.

While it’s an interesting question whether a VF should support AER in
the first place, I don’t think it must be answered today.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 247c09882dd..b0862b1d96c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6326,8 +6326,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
 nvme_set_timestamp(n, 0ULL);
 
-QTAILQ_INIT(&n->aer_queue);
-
 nvme_select_iocs(n);
 
 return 0;
@@ -6987,6 +6985,7 @@ static void nvme_init_state(NvmeCtrl *n)
 n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+QTAILQ_INIT(&n->aer_queue);
 
 list->numcntl = cpu_to_le16(max_vfs);
 for (i = 0; i < max_vfs; i++) {
-- 
2.25.1




[PATCH v7 07/12] hw/nvme: Calculate BAR attributes in a function

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

An NVMe device with SR-IOV capability calculates the BAR size
differently for PF and VF, so it makes sense to extract the common code
to a separate function.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f34d73a00c8..f0554a07c40 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6728,6 +6728,34 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
+static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
+  unsigned *msix_table_offset,
+  unsigned *msix_pba_offset)
+{
+uint64_t bar_size, msix_table_size, msix_pba_size;
+
+bar_size = sizeof(NvmeBar) + 2 * total_queues * NVME_DB_SIZE;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+if (msix_table_offset) {
+*msix_table_offset = bar_size;
+}
+
+msix_table_size = PCI_MSIX_ENTRY_SIZE * total_irqs;
+bar_size += msix_table_size;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+if (msix_pba_offset) {
+*msix_pba_offset = bar_size;
+}
+
+msix_pba_size = QEMU_ALIGN_UP(total_irqs, 64) / 8;
+bar_size += msix_pba_size;
+
+bar_size = pow2ceil(bar_size);
+return bar_size;
+}
+
 static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
 uint64_t bar_size)
 {
@@ -6767,7 +6795,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
-uint64_t bar_size, msix_table_size, msix_pba_size;
+uint64_t bar_size;
 unsigned msix_table_offset, msix_pba_offset;
 int ret;
 
@@ -6793,19 +6821,8 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
-bar_size = sizeof(NvmeBar) +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
-bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-msix_table_offset = bar_size;
-msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
-
-bar_size += msix_table_size;
-bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-msix_pba_offset = bar_size;
-msix_pba_size = QEMU_ALIGN_UP(n->params.msix_qsize, 64) / 8;
-
-bar_size += msix_pba_size;
-bar_size = pow2ceil(bar_size);
+bar_size = nvme_bar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
+                             &msix_table_offset, &msix_pba_offset);
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-- 
2.25.1




[PATCH v7 02/12] hw/nvme: Add support for Primary Controller Capabilities

2022-03-18 Thread Lukasz Maniak
Implementation of Primary Controller Capabilities data
structure (Identify command with CNS value of 14h).

Currently, the command returns only the ID of the primary controller.
Handling of the remaining fields is added in subsequent patches
implementing virtualization enhancements.

Signed-off-by: Lukasz Maniak 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 23 ++-
 hw/nvme/nvme.h   |  2 ++
 hw/nvme/trace-events |  1 +
 include/block/nvme.h | 23 +++
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 0e1d8d03c87..ea9d5af3545 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4799,6 +4799,14 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, NvmeRequest *req,
 return nvme_c2h(n, (uint8_t *)list, sizeof(list), req);
 }
 
+static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, NvmeRequest *req)
+{
+trace_pci_nvme_identify_pri_ctrl_cap(le16_to_cpu(n->pri_ctrl_cap.cntlid));
+
+return nvme_c2h(n, (uint8_t *)&n->pri_ctrl_cap,
+sizeof(NvmePriCtrlCap), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
  bool active)
 {
@@ -5018,6 +5026,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 return nvme_identify_ctrl_list(n, req, true);
 case NVME_ID_CNS_CTRL_LIST:
 return nvme_identify_ctrl_list(n, req, false);
+case NVME_ID_CNS_PRIMARY_CTRL_CAP:
+return nvme_identify_pri_ctrl_cap(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
@@ -6609,6 +6619,8 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 
 static void nvme_init_state(NvmeCtrl *n)
 {
+NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
+
 /* add one to max_ioqpairs to account for the admin queue pair */
 n->reg_size = pow2ceil(sizeof(NvmeBar) +
2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
@@ -6618,6 +6630,8 @@ static void nvme_init_state(NvmeCtrl *n)
 n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+
+cap->cntlid = cpu_to_le16(n->cntlid);
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -6919,15 +6933,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS,
   &pci_dev->qdev, n->parent_obj.qdev.id);
 
-nvme_init_state(n);
-if (nvme_init_pci(n, pci_dev, errp)) {
-return;
-}
-
 if (nvme_init_subsys(n, errp)) {
 error_propagate(errp, local_err);
 return;
 }
+nvme_init_state(n);
+if (nvme_init_pci(n, pci_dev, errp)) {
+return;
+}
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 89ca6e96401..e58bab841e2 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -477,6 +477,8 @@ typedef struct NvmeCtrl {
 uint32_tasync_config;
 NvmeHostBehaviorSupport hbs;
 } features;
+
+NvmePriCtrlCap  pri_ctrl_cap;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index ff1b4589692..1834b17cf21 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -56,6 +56,7 @@ pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_ctrl_list(uint8_t cns, uint16_t cntid) "cns 0x%"PRIx8" cntid %"PRIu16""
+pci_nvme_identify_pri_ctrl_cap(uint16_t cntlid) "identify primary controller capabilities cntlid=%"PRIu16""
 pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", csi=0x%"PRIx8""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", csi=0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3737351cc81..524a04fb94e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1033,6 +1033,7 @@ enum NvmeIdCns {
 NVME_ID_CNS_NS_PRESENT= 0x11,
 NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12,
 NVME_ID_CNS_CTRL_LIST = 0x13,
+NVME_ID_CNS_PRIMARY_CTRL_CAP  = 0x14,
 NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
 NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
 NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
@@ -1553,6 +1554,27 @@ typedef enum NvmeZoneState {
 NVME_ZONE_STATE_OFFLINE  = 0x0f,
 } NvmeZoneState;
 
+typedef struct QEMU_PACKED NvmePriCtrlCap {
+uint16_tcntlid;
+uint16_tportid;
+uint8_t crt;
+uint8_t rsvd5[27];
+uint32_tvqfrt;
+uint32_tvqrfa;
+uint16_t   

[PATCH v7 09/12] hw/nvme: Add support for the Virtualization Management command

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

With the new command one can:
 - assign flexible resources (queues, interrupts) to primary and
   secondary controllers,
 - toggle the online/offline state of given controller.

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 257 ++-
 hw/nvme/nvme.h   |  20 
 hw/nvme/trace-events |   3 +
 include/block/nvme.h |  17 +++
 4 files changed, 295 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 011231ab5a6..247c09882dd 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -188,6 +188,7 @@
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
+#include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
@@ -262,6 +263,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
+[NVME_ADM_CMD_VIRT_MNGMT]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
@@ -293,6 +295,7 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 };
 
 static void nvme_process_sq(void *opaque);
+static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst);
 
 static uint16_t nvme_sqid(NvmeRequest *req)
 {
@@ -5838,6 +5841,167 @@ out:
 return status;
 }
 
+static void nvme_get_virt_res_num(NvmeCtrl *n, uint8_t rt, int *num_total,
+  int *num_prim, int *num_sec)
+{
+*num_total = le32_to_cpu(rt ?
+ n->pri_ctrl_cap.vifrt : n->pri_ctrl_cap.vqfrt);
+*num_prim = le16_to_cpu(rt ?
+n->pri_ctrl_cap.virfap : n->pri_ctrl_cap.vqrfap);
+*num_sec = le16_to_cpu(rt ? n->pri_ctrl_cap.virfa : n->pri_ctrl_cap.vqrfa);
+}
+
+static uint16_t nvme_assign_virt_res_to_prim(NvmeCtrl *n, NvmeRequest *req,
+ uint16_t cntlid, uint8_t rt,
+ int nr)
+{
+int num_total, num_prim, num_sec;
+
+if (cntlid != n->cntlid) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+nvme_get_virt_res_num(n, rt, &num_total, &num_prim, &num_sec);
+
+if (nr > num_total) {
+return NVME_INVALID_NUM_RESOURCES | NVME_DNR;
+}
+
+if (nr > num_total - num_sec) {
+return NVME_INVALID_RESOURCE_ID | NVME_DNR;
+}
+
+if (rt) {
+n->next_pri_ctrl_cap.virfap = cpu_to_le16(nr);
+} else {
+n->next_pri_ctrl_cap.vqrfap = cpu_to_le16(nr);
+}
+
+req->cqe.result = cpu_to_le32(nr);
+return req->status;
+}
+
+static void nvme_update_virt_res(NvmeCtrl *n, NvmeSecCtrlEntry *sctrl,
+ uint8_t rt, int nr)
+{
+int prev_nr, prev_total;
+
+if (rt) {
+prev_nr = le16_to_cpu(sctrl->nvi);
+prev_total = le32_to_cpu(n->pri_ctrl_cap.virfa);
+sctrl->nvi = cpu_to_le16(nr);
+n->pri_ctrl_cap.virfa = cpu_to_le32(prev_total + nr - prev_nr);
+} else {
+prev_nr = le16_to_cpu(sctrl->nvq);
+prev_total = le32_to_cpu(n->pri_ctrl_cap.vqrfa);
+sctrl->nvq = cpu_to_le16(nr);
+n->pri_ctrl_cap.vqrfa = cpu_to_le32(prev_total + nr - prev_nr);
+}
+}
+
+static uint16_t nvme_assign_virt_res_to_sec(NvmeCtrl *n, NvmeRequest *req,
+                                            uint16_t cntlid, uint8_t rt, int nr)
+{
+int num_total, num_prim, num_sec, num_free, diff, limit;
+NvmeSecCtrlEntry *sctrl;
+
+sctrl = nvme_sctrl_for_cntlid(n, cntlid);
+if (!sctrl) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+if (sctrl->scs) {
+return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR;
+}
+
+limit = le16_to_cpu(rt ? n->pri_ctrl_cap.vifrsm : n->pri_ctrl_cap.vqfrsm);
+if (nr > limit) {
+return NVME_INVALID_NUM_RESOURCES | NVME_DNR;
+}
+
+nvme_get_virt_res_num(n, rt, &num_total, &num_prim, &num_sec);
+num_free = num_total - num_prim - num_sec;
+diff = nr - le16_to_cpu(rt ? sctrl->nvi : sctrl->nvq);
+
+if (diff > num_free) {
+return NVME_INVALID_RESOURCE_ID | NVME_DNR;
+}
+
+nvme_update_virt_res(n, sctrl, rt, nr);
+req->cqe.result = cpu_to_le32(nr);
+
+return req->status;
+}
+
+static uint16_t nvme_virt_set_state(NvmeCtrl *n, uint16_t cntlid, bool online)
+{
+NvmeCtrl *sn = NULL;
+NvmeSecCtrlEntry *sctrl;
+int vf_index;
+
+sctrl = nvme_sctrl_for_cntlid(n, cntlid);
+if (!sctrl) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+if (!pci_is_vf(&n->parent_obj)) {
+vf_index = le16_to_cpu(sctrl->vfn) - 1;
+sn = NVME(pcie_sriov_get_vf_at_index(&n->parent_obj, vf_index));
+}
+
+if (online) {
+if (!sctrl->nvi || (le16_to_cpu(sctrl->nvq) < 2) || !sn) {
+return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR;
+}
+
+if 

[PATCH v7 06/12] hw/nvme: Remove reg_size variable and update BAR0 size calculation

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

The n->reg_size parameter unnecessarily splits the BAR0 size calculation
in two phases; removed to simplify the code.

With all the calculations done in one place, it seems the pow2ceil,
applied originally to reg_size, is unnecessary. The rounding should
happen as the last step, when BAR size includes Nvme registers, queue
registers, and MSIX-related space.

Finally, the size of the mmio memory region is extended to cover the 1st
4KiB padding (see the map below). Access to this range is handled as
interaction with a non-existing queue and generates an error trace, so
actually nothing changes, while the reg_size variable is no longer needed.


|  BAR0|

[Nvme Registers]
[Queues]
[power-of-2 padding] - removed in this patch
[4KiB padding (1)  ]
[MSIX TABLE]
[4KiB padding (2)  ]
[MSIX PBA  ]
[power-of-2 padding]

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 10 +-
 hw/nvme/nvme.h |  1 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 12372038075..f34d73a00c8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6669,9 +6669,6 @@ static void nvme_init_state(NvmeCtrl *n)
 n->conf_ioqpairs = n->params.max_ioqpairs;
 n->conf_msix_qsize = n->params.msix_qsize;
 
-/* add one to max_ioqpairs to account for the admin queue pair */
-n->reg_size = pow2ceil(sizeof(NvmeBar) +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
 n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
 n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
 n->temperature = NVME_TEMPERATURE;
@@ -6795,7 +6792,10 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 pcie_ari_init(pci_dev, 0x100, 1);
 }
 
-bar_size = QEMU_ALIGN_UP(n->reg_size, 4 * KiB);
+/* add one to max_ioqpairs to account for the admin queue pair */
+bar_size = sizeof(NvmeBar) +
+   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
 msix_table_offset = bar_size;
 msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
 
@@ -6809,7 +6809,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-  n->reg_size);
+  msix_table_offset);
 memory_region_add_subregion(>bar0, 0, >iomem);
 
 if (pci_is_vf(pci_dev)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5bd6ac698bc..adde718105b 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -428,7 +428,6 @@ typedef struct NvmeCtrl {
 uint16_tmax_prp_ents;
 uint16_tcqe_size;
 uint16_tsqe_size;
-uint32_treg_size;
 uint32_tmax_q_ents;
 uint8_t outstanding_aers;
 uint32_tirq_status;
-- 
2.25.1




[PATCH v7 01/12] hw/nvme: Add support for SR-IOV

2022-03-18 Thread Lukasz Maniak
This patch implements initial support for Single Root I/O Virtualization
on an NVMe device.

Essentially, it allows one to define the maximum number of virtual
functions supported by the NVMe controller via the sriov_max_vfs
parameter.

Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
capability by a physical controller and ARI capability by both the
physical and virtual function devices.

NVMe controllers created via virtual functions functionally mirror the
physical controller, which may not entirely be the desired behavior,
so consideration will be needed on how to limit the capabilities of
the VF.

An NVMe subsystem is required for the use of SR-IOV.

Signed-off-by: Lukasz Maniak 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 85 ++--
 hw/nvme/nvme.h   |  3 +-
 include/hw/pci/pci_ids.h |  1 +
 3 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 03760ddeae8..0e1d8d03c87 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -35,6 +35,7 @@
  *  mdts=,vsl=, \
  *  zoned.zasl=, \
  *  zoned.auto_transition=, \
+ *  sriov_max_vfs= \
  *  subsys=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -106,6 +107,12 @@
  *   transitioned to zone state closed for resource management purposes.
  *   Defaults to 'on'.
  *
+ * - `sriov_max_vfs`
+ *   Indicates the maximum number of PCIe virtual functions supported
+ *   by the controller. The default value is 0. Specifying a non-zero value
+ *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
+ *   Virtual function controllers will not report SR-IOV capability.
+ *
  * nvme namespace device parameters
  * 
  * - `shared`
@@ -160,6 +167,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
+#include "hw/pci/pcie_sriov.h"
 #include "migration/vmstate.h"
 
 #include "nvme.h"
@@ -176,6 +184,9 @@
 #define NVME_TEMPERATURE_CRITICAL 0x175
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+#define NVME_MAX_VFS 127
+#define NVME_VF_OFFSET 0x1
+#define NVME_VF_STRIDE 1
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -5886,6 +5897,10 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 g_free(event);
 }
 
+if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
+pcie_sriov_pf_disable_vfs(&n->parent_obj);
+}
+
 n->aer_queued = 0;
 n->outstanding_aers = 0;
 n->qs_created = false;
@@ -6567,6 +6582,29 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 error_setg(errp, "vsl must be non-zero");
 return;
 }
+
+if (params->sriov_max_vfs) {
+if (!n->subsys) {
+error_setg(errp, "subsystem is required for the use of SR-IOV");
+return;
+}
+
+if (params->sriov_max_vfs > NVME_MAX_VFS) {
+error_setg(errp, "sriov_max_vfs must be between 0 and %d",
+   NVME_MAX_VFS);
+return;
+}
+
+if (params->cmb_size_mb) {
+error_setg(errp, "CMB is not supported with SR-IOV");
+return;
+}
+
+if (n->pmr.dev) {
+error_setg(errp, "PMR is not supported with SR-IOV");
+return;
+}
+}
 }
 
 static void nvme_init_state(NvmeCtrl *n)
@@ -6624,6 +6662,20 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
+static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+uint64_t bar_size)
+{
+uint16_t vf_dev_id = n->params.use_intel_id ?
+ PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
+
+pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+   n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+   NVME_VF_OFFSET, NVME_VF_STRIDE);
+
+pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+  PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+}
+
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -6638,7 +6690,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 
 if (n->params.use_intel_id) {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
-pci_config_set_device_id(pci_conf, 0x5845);
+pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_NVME);
 } else {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME);
@@ -6646,6 +6698,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 

[PATCH v7 10/12] docs: Add documentation for SR-IOV and Virtualization Enhancements

2022-03-18 Thread Lukasz Maniak
Signed-off-by: Lukasz Maniak 
---
 docs/system/devices/nvme.rst | 82 
 1 file changed, 82 insertions(+)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index b5acb2a9c19..aba253304e4 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -239,3 +239,85 @@ The virtual namespace device supports DIF- and DIX-based protection information
   to ``1`` to transfer protection information as the first eight bytes of
   metadata. Otherwise, the protection information is transferred as the last
   eight bytes.
+
+Virtualization Enhancements and SR-IOV (Experimental Support)
+--------------------------------------------------------------
+
+The ``nvme`` device supports Single Root I/O Virtualization and Sharing
+along with Virtualization Enhancements. The controller has to be linked to
+an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.
+
+A number of parameters are present (**please note that they may be
+subject to change**):
+
+``sriov_max_vfs`` (default: ``0``)
+  Indicates the maximum number of PCIe virtual functions supported
+  by the controller. Specifying a non-zero value enables reporting of both
+  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
+  by the NVMe device. Virtual function controllers will not report SR-IOV.
+
+``sriov_vq_flexible``
+  Indicates the total number of flexible queue resources assignable to all
+  the secondary controllers. Implicitly sets the number of primary
+  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.
+
+``sriov_vi_flexible``
+  Indicates the total number of flexible interrupt resources assignable to
+  all the secondary controllers. Implicitly sets the number of primary
+  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.
+
+``sriov_max_vi_per_vf`` (default: ``0``)
+  Indicates the maximum number of virtual interrupt resources assignable
+  to a secondary controller. The default ``0`` resolves to
+  ``(sriov_vi_flexible / sriov_max_vfs)``
+
+``sriov_max_vq_per_vf`` (default: ``0``)
+  Indicates the maximum number of virtual queue resources assignable to
+  a secondary controller. The default ``0`` resolves to
+  ``(sriov_vq_flexible / sriov_max_vfs)``
+
+The simplest possible invocation enables the capability to set up one VF
+controller and assign an admin queue, an IO queue, and an MSI-X interrupt.
+
+.. code-block:: console
+
+   -device nvme-subsys,id=subsys0
+   -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,
+sriov_vq_flexible=2,sriov_vi_flexible=1
+
+The minimum steps required to configure a functional NVMe secondary
+controller are:
+
+  * unbind flexible resources from the primary controller
+
+.. code-block:: console
+
+   nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
+   nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
+
+  * perform a Function Level Reset on the primary controller to actually
+release the resources
+
+.. code-block:: console
+
+   echo 1 > /sys/bus/pci/devices/:01:00.0/reset
+
+  * enable VF
+
+.. code-block:: console
+
+   echo 1 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs
+
+  * assign the flexible resources to the VF and set it ONLINE
+
+.. code-block:: console
+
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
+
+  * bind the NVMe driver to the VF
+
+.. code-block:: console
+
+   echo :01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file
-- 
2.25.1




[PATCH v7 04/12] hw/nvme: Implement the Function Level Reset

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

This patch implements the Function Level Reset, a feature currently not
implemented for the Nvme device, though listed as mandatory ("shall")
in the 1.4 spec.

The implementation reuses FLR-related building blocks defined for the
pci-bridge module, and follows the same logic:
- FLR capability is advertised in the PCIE config,
- custom pci_write_config callback detects a write to the trigger
  register and performs the PCI reset,
- which, eventually, calls the custom dc->reset handler.

Depending on reset type, parts of the state should (or should not) be
cleared. To distinguish the type of reset, an additional parameter is
passed to the reset function.

This patch also enables advertisement of the Power Management PCI
capability. The main reason behind it is to announce the no_soft_reset=1
bit, to signal SR-IOV support where each VF can be reset individually.

The implementation purposely ignores writes to the PMCS.PS register,
as even such naïve behavior is enough to correctly handle the D3->D0
transition.

It's worth noting that the power state transition back to D3, with
all the corresponding side effects, wasn't and still isn't handled
properly.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 52 
 hw/nvme/nvme.h   |  5 +
 hw/nvme/trace-events |  1 +
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index b1b1bebbaf2..e6d6e5840af 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5901,7 +5901,7 @@ static void nvme_process_sq(void *opaque)
 }
 }
 
-static void nvme_ctrl_reset(NvmeCtrl *n)
+static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
 {
 NvmeNamespace *ns;
 int i;
@@ -5933,7 +5933,9 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 }
 
 if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
-pcie_sriov_pf_disable_vfs(&n->parent_obj);
+if (rst != NVME_RESET_CONTROLLER) {
+pcie_sriov_pf_disable_vfs(&n->parent_obj);
+}
 }
 
 n->aer_queued = 0;
@@ -6167,7 +6169,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
 }
 } else if (!NVME_CC_EN(data) && NVME_CC_EN(cc)) {
 trace_pci_nvme_mmio_stopped();
-nvme_ctrl_reset(n);
+nvme_ctrl_reset(n, NVME_RESET_CONTROLLER);
 cc = 0;
 csts &= ~NVME_CSTS_READY;
 }
@@ -6725,6 +6727,28 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice 
*pci_dev, uint16_t offset,
   PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
 }
 
+static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
+{
+Error *err = NULL;
+int ret;
+
+ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
+ PCI_PM_SIZEOF, &err);
+if (err) {
+error_report_err(err);
+return ret;
+}
+
+pci_set_word(pci_dev->config + offset + PCI_PM_PMC,
+ PCI_PM_CAP_VER_1_2);
+pci_set_word(pci_dev->config + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_NO_SOFT_RESET);
+pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+
+return 0;
+}
+
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -6746,7 +6770,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 }
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+nvme_add_pm_capability(pci_dev, 0x60);
 pcie_endpoint_cap_init(pci_dev, 0x80);
+pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
 pcie_ari_init(pci_dev, 0x100, 1);
 }
@@ -6997,7 +7023,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 NvmeNamespace *ns;
 int i;
 
-nvme_ctrl_reset(n);
+nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
 
 if (n->subsys) {
 for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
@@ -7096,6 +7122,22 @@ static void nvme_set_smart_warning(Object *obj, Visitor 
*v, const char *name,
 }
 }
 
+static void nvme_pci_reset(DeviceState *qdev)
+{
+PCIDevice *pci_dev = PCI_DEVICE(qdev);
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_pci_reset();
+nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
+}
+
+static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
+  uint32_t val, int len)
+{
+pci_default_write_config(dev, address, val, len);
+pcie_cap_flr_write_config(dev, address, val, len);
+}
+
 static const VMStateDescription nvme_vmstate = {
 .name = "nvme",
 .unmigratable = 1,
@@ -7107,6 +7149,7 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
 pc->realize = nvme_realize;
+pc->config_write = nvme_pci_write_config;
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;

[PATCH v7 08/12] hw/nvme: Initialize capability structures for primary/secondary controllers

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

With four new properties:
 - sriov_v{i,q}_flexible,
 - sriov_max_v{i,q}_per_vf,
one can configure the number of available flexible resources, as well as
the limits. The primary and secondary controller capability structures
are initialized accordingly.

Since the number of available queues (interrupts) now varies between
VF/PF, BAR size calculation is also adjusted.

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 141 ---
 hw/nvme/nvme.h   |   4 ++
 include/block/nvme.h |   5 ++
 3 files changed, 143 insertions(+), 7 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f0554a07c40..011231ab5a6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -36,6 +36,10 @@
  *  zoned.zasl=, \
  *  zoned.auto_transition=, \
  *  sriov_max_vfs= \
+ *  sriov_vq_flexible= \
+ *  sriov_vi_flexible= \
+ *  sriov_max_vi_per_vf= \
+ *  sriov_max_vq_per_vf= \
  *  subsys=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -113,6 +117,29 @@
  *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
  *   Virtual function controllers will not report SR-IOV capability.
  *
+ *   NOTE: Single Root I/O Virtualization support is experimental.
+ *   All the related parameters may be subject to change.
+ *
+ * - `sriov_vq_flexible`
+ *   Indicates the total number of flexible queue resources assignable to all
+ *   the secondary controllers. Implicitly sets the number of primary
+ *   controller's private resources to `(max_ioqpairs - sriov_vq_flexible)`.
+ *
+ * - `sriov_vi_flexible`
+ *   Indicates the total number of flexible interrupt resources assignable to
+ *   all the secondary controllers. Implicitly sets the number of primary
+ *   controller's private resources to `(msix_qsize - sriov_vi_flexible)`.
+ *
+ * - `sriov_max_vi_per_vf`
+ *   Indicates the maximum number of virtual interrupt resources assignable
+ *   to a secondary controller. The default 0 resolves to
+ *   `(sriov_vi_flexible / sriov_max_vfs)`.
+ *
+ * - `sriov_max_vq_per_vf`
+ *   Indicates the maximum number of virtual queue resources assignable to
+ *   a secondary controller. The default 0 resolves to
+ *   `(sriov_vq_flexible / sriov_max_vfs)`.
+ *
  * nvme namespace device parameters
  * 
  * - `shared`
@@ -185,6 +212,7 @@
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 #define NVME_MAX_VFS 127
+#define NVME_VF_RES_GRANULARITY 1
 #define NVME_VF_OFFSET 0x1
 #define NVME_VF_STRIDE 1
 
@@ -6656,6 +6684,53 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 error_setg(errp, "PMR is not supported with SR-IOV");
 return;
 }
+
+if (!params->sriov_vq_flexible || !params->sriov_vi_flexible) {
+error_setg(errp, "both sriov_vq_flexible and sriov_vi_flexible"
+   " must be set for the use of SR-IOV");
+return;
+}
+
+if (params->sriov_vq_flexible < params->sriov_max_vfs * 2) {
+error_setg(errp, "sriov_vq_flexible must be greater than or equal"
+   " to %d (sriov_max_vfs * 2)", params->sriov_max_vfs * 
2);
+return;
+}
+
+if (params->max_ioqpairs < params->sriov_vq_flexible + 2) {
+error_setg(errp, "(max_ioqpairs - sriov_vq_flexible) must be"
+   " greater than or equal to 2");
+return;
+}
+
+if (params->sriov_vi_flexible < params->sriov_max_vfs) {
+error_setg(errp, "sriov_vi_flexible must be greater than or equal"
+   " to %d (sriov_max_vfs)", params->sriov_max_vfs);
+return;
+}
+
+if (params->msix_qsize < params->sriov_vi_flexible + 1) {
+error_setg(errp, "(msix_qsize - sriov_vi_flexible) must be"
+   " greater than or equal to 1");
+return;
+}
+
+if (params->sriov_max_vi_per_vf &&
+(params->sriov_max_vi_per_vf - 1) % NVME_VF_RES_GRANULARITY) {
+error_setg(errp, "sriov_max_vi_per_vf must meet:"
+   " (sriov_max_vi_per_vf - 1) %% %d == 0 and"
+   " sriov_max_vi_per_vf >= 1", NVME_VF_RES_GRANULARITY);
+return;
+}
+
+if (params->sriov_max_vq_per_vf &&
+(params->sriov_max_vq_per_vf < 2 ||
+ (params->sriov_max_vq_per_vf - 1) % NVME_VF_RES_GRANULARITY)) {
+error_setg(errp, "sriov_max_vq_per_vf must meet:"
+   " (sriov_max_vq_per_vf - 1) %% %d == 0 and"
+   " sriov_max_vq_per_vf >= 2", NVME_VF_RES_GRANULARITY);
+return;
+}
 }
 }
 
@@ -6664,10 +6739,19 @@ static void nvme_init_state(NvmeCtrl *n)
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
 

[PATCH v7 05/12] hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having
them as constants is problematic for SR-IOV support.

SR-IOV introduces virtual resources (queues, interrupts) that can be
assigned to PF and its dependent VFs. Each device, following a reset,
should work with the configured number of queues. A single constant is
no longer sufficient to hold the whole state.

This patch tries to solve the problem by introducing additional
variables in NvmeCtrl’s state. The variables for, e.g., managing queues
are therefore organized as:
 - n->params.max_ioqpairs – no changes, constant set by the user
 - n->(mutable_state) – (not a part of this patch) user-configurable,
specifies number of queues available _after_
reset
 - n->conf_ioqpairs - (new) used in all the places instead of the ‘old’
  n->params.max_ioqpairs; initialized in realize()
  and updated during reset() to reflect user’s
  changes to the mutable state

Since the number of available I/O queues and interrupts can change at
runtime, the buffers for sq/cqs and the MSI-X-related structures are
allocated large enough to handle the limits, completely avoiding the
complicated reallocation. A helper function (nvme_update_msixcap_ts)
updates the corresponding capability register, to signal configuration
changes.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 52 ++
 hw/nvme/nvme.h |  2 ++
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e6d6e5840af..12372038075 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -448,12 +448,12 @@ static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
+return sqid < n->conf_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
+return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -4290,8 +4290,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_sq_cqid(cqid);
 return NVME_INVALID_CQID | NVME_DNR;
 }
-if (unlikely(!sqid || sqid > n->params.max_ioqpairs ||
-n->sq[sqid] != NULL)) {
+if (unlikely(!sqid || sqid > n->conf_ioqpairs || n->sq[sqid] != NULL)) {
 trace_pci_nvme_err_invalid_create_sq_sqid(sqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
@@ -4643,8 +4642,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_create_cq(prp1, cqid, vector, qsize, qflags,
  NVME_CQ_FLAGS_IEN(qflags) != 0);
 
-if (unlikely(!cqid || cqid > n->params.max_ioqpairs ||
-n->cq[cqid] != NULL)) {
+if (unlikely(!cqid || cqid > n->conf_ioqpairs || n->cq[cqid] != NULL)) {
 trace_pci_nvme_err_invalid_create_cq_cqid(cqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
@@ -4660,7 +4658,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
-if (unlikely(vector >= n->params.msix_qsize)) {
+if (unlikely(vector >= n->conf_msix_qsize)) {
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
@@ -5261,13 +5259,12 @@ defaults:
 
 break;
 case NVME_NUMBER_OF_QUEUES:
-result = (n->params.max_ioqpairs - 1) |
-((n->params.max_ioqpairs - 1) << 16);
+result = (n->conf_ioqpairs - 1) | ((n->conf_ioqpairs - 1) << 16);
 trace_pci_nvme_getfeat_numq(result);
 break;
 case NVME_INTERRUPT_VECTOR_CONF:
 iv = dw11 & 0xffff;
-if (iv >= n->params.max_ioqpairs + 1) {
+if (iv >= n->conf_ioqpairs + 1) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -5423,10 +5420,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, 
NvmeRequest *req)
 
 trace_pci_nvme_setfeat_numq((dw11 & 0xffff) + 1,
 ((dw11 >> 16) & 0xffff) + 1,
-n->params.max_ioqpairs,
-n->params.max_ioqpairs);
-req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
-  ((n->params.max_ioqpairs - 1) << 16));
+n->conf_ioqpairs,
+n->conf_ioqpairs);
+req->cqe.result = cpu_to_le32((n->conf_ioqpairs - 1) |
+  ((n->conf_ioqpairs - 1) << 16));
   

[PATCH v7 00/12] hw/nvme: SR-IOV with Virtualization Enhancements

2022-03-18 Thread Lukasz Maniak
Resubmitting v6 as v7 since Patchew got lost with my sophisticated CC of
all maintainers just for the cover letter.

Changes since v5:
- Fixed PCI hotplug issue related to deleting VF twice
- Corrected error messages for SR-IOV parameters
- Rebased on master, patches for PCI got pulled into the tree
- Added Reviewed-by labels

Lukasz Maniak (4):
  hw/nvme: Add support for SR-IOV
  hw/nvme: Add support for Primary Controller Capabilities
  hw/nvme: Add support for Secondary Controller List
  docs: Add documentation for SR-IOV and Virtualization Enhancements

Łukasz Gieryk (8):
  hw/nvme: Implement the Function Level Reset
  hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime
  hw/nvme: Remove reg_size variable and update BAR0 size calculation
  hw/nvme: Calculate BAR attributes in a function
  hw/nvme: Initialize capability structures for primary/secondary
controllers
  hw/nvme: Add support for the Virtualization Management command
  hw/nvme: Update the initialization place for the AER queue
  hw/acpi: Make the PCI hot-plug aware of SR-IOV

 docs/system/devices/nvme.rst |  82 +
 hw/acpi/pcihp.c  |   6 +-
 hw/nvme/ctrl.c   | 673 ---
 hw/nvme/ns.c |   2 +-
 hw/nvme/nvme.h   |  55 ++-
 hw/nvme/subsys.c |  75 +++-
 hw/nvme/trace-events |   6 +
 include/block/nvme.h |  65 
 include/hw/pci/pci_ids.h |   1 +
 9 files changed, 909 insertions(+), 56 deletions(-)

-- 
2.25.1




[PATCH v7 03/12] hw/nvme: Add support for Secondary Controller List

2022-03-18 Thread Lukasz Maniak
Introduce handling for Secondary Controller List (Identify command with
CNS value of 15h).

Secondary controller IDs are unique within the subsystem; hence, upon
initialization of the primary controller, the subsystem reserves a
number of them equal to sriov_max_vfs.

ID reservation requires the addition of an intermediate controller slot
state, so the reserved controller has the address 0xFFFF.
A secondary controller is in the reserved state when it has no virtual
function assigned, but its primary controller is realized.
Secondary controller reservations are released to NULL when its primary
controller is unregistered.

Signed-off-by: Lukasz Maniak 
---
 hw/nvme/ctrl.c   | 35 +
 hw/nvme/ns.c |  2 +-
 hw/nvme/nvme.h   | 18 +++
 hw/nvme/subsys.c | 75 ++--
 hw/nvme/trace-events |  1 +
 include/block/nvme.h | 20 
 6 files changed, 141 insertions(+), 10 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ea9d5af3545..b1b1bebbaf2 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4807,6 +4807,29 @@ static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, 
NvmeRequest *req)
 sizeof(NvmePriCtrlCap), req);
 }
 
+static uint16_t nvme_identify_sec_ctrl_list(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint16_t pri_ctrl_id = le16_to_cpu(n->pri_ctrl_cap.cntlid);
+uint16_t min_id = le16_to_cpu(c->ctrlid);
+uint8_t num_sec_ctrl = n->sec_ctrl_list.numcntl;
+NvmeSecCtrlList list = {0};
+uint8_t i;
+
+for (i = 0; i < num_sec_ctrl; i++) {
+if (n->sec_ctrl_list.sec[i].scid >= min_id) {
+list.numcntl = num_sec_ctrl - i;
+memcpy(&list.sec, n->sec_ctrl_list.sec + i,
+   list.numcntl * sizeof(NvmeSecCtrlEntry));
+break;
+}
+}
+
+trace_pci_nvme_identify_sec_ctrl_list(pri_ctrl_id, list.numcntl);
+
+return nvme_c2h(n, (uint8_t *)&list, sizeof(list), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
  bool active)
 {
@@ -5028,6 +5051,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_identify_ctrl_list(n, req, false);
 case NVME_ID_CNS_PRIMARY_CTRL_CAP:
 return nvme_identify_pri_ctrl_cap(n, req);
+case NVME_ID_CNS_SECONDARY_CTRL_LIST:
+return nvme_identify_sec_ctrl_list(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
@@ -6620,6 +6645,9 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 static void nvme_init_state(NvmeCtrl *n)
 {
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
+NvmeSecCtrlList *list = &n->sec_ctrl_list;
+NvmeSecCtrlEntry *sctrl;
+int i;
 
 /* add one to max_ioqpairs to account for the admin queue pair */
 n->reg_size = pow2ceil(sizeof(NvmeBar) +
@@ -6631,6 +6659,13 @@ static void nvme_init_state(NvmeCtrl *n)
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 
+list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
+for (i = 0; i < n->params.sriov_max_vfs; i++) {
+sctrl = >sec[i];
+sctrl->pcid = cpu_to_le16(n->cntlid);
+sctrl->vfn = cpu_to_le16(i + 1);
+}
+
 cap->cntlid = cpu_to_le16(n->cntlid);
 }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 8a3613d9ab0..cfd232bb147 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -596,7 +596,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
 NvmeCtrl *ctrl = subsys->ctrls[i];
 
-if (ctrl) {
+if (ctrl && ctrl != SUBSYS_SLOT_RSVD) {
 nvme_attach_ns(ctrl, ns);
 }
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index e58bab841e2..7581ef26fdb 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -43,6 +43,7 @@ typedef struct NvmeBus {
 #define TYPE_NVME_SUBSYS "nvme-subsys"
 #define NVME_SUBSYS(obj) \
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+#define SUBSYS_SLOT_RSVD (void *)0xFFFF
 
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
@@ -67,6 +68,10 @@ static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem 
*subsys,
 return NULL;
 }
 
+if (subsys->ctrls[cntlid] == SUBSYS_SLOT_RSVD) {
+return NULL;
+}
+
 return subsys->ctrls[cntlid];
 }
 
@@ -479,6 +484,7 @@ typedef struct NvmeCtrl {
 } features;
 
 NvmePriCtrlCap  pri_ctrl_cap;
+NvmeSecCtrlList sec_ctrl_list;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
@@ -513,6 +519,18 @@ static inline uint16_t nvme_cid(NvmeRequest *req)
 return le16_to_cpu(req->cqe.cid);
 }
 
+static inline NvmeSecCtrlEntry *nvme_sctrl(NvmeCtrl *n)
+{
+PCIDevice *pci_dev = &n->parent_obj;
+NvmeCtrl 

Re: [RFC PATCH 3/3] tests/qtest/intel-hda-test: Add reproducer for issue #542

2022-03-18 Thread Thomas Huth

On 18/12/2021 17.09, Philippe Mathieu-Daudé wrote:

Include the qtest reproducer provided by Alexander Bulekov
in https://gitlab.com/qemu-project/qemu/-/issues/542.
Without the previous commit, we get:

   $ make check-qtest-i386
   ...
   Running test tests/qtest/intel-hda-test
   AddressSanitizer:DEADLYSIGNAL
   =
   ==1580408==ERROR: AddressSanitizer: stack-overflow on address 0x7ffc3d566fe0
   #0 0x63d297cf in address_space_translate_internal softmmu/physmem.c:356
   #1 0x63d27260 in flatview_do_translate softmmu/physmem.c:499:15
   #2 0x63d27af5 in flatview_translate softmmu/physmem.c:565:15
   #3 0x63d4ce84 in flatview_write softmmu/physmem.c:2850:10
   #4 0x63d4cb18 in address_space_write softmmu/physmem.c:2950:18
   #5 0x63d4d387 in address_space_rw softmmu/physmem.c:2960:16
   #6 0x62ae12f2 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
   #7 0x62ae104a in dma_memory_rw include/sysemu/dma.h:132:12
   #8 0x62ae6157 in dma_memory_write include/sysemu/dma.h:173:12
   #9 0x62ae5ec0 in stl_le_dma include/sysemu/dma.h:275:1
   #10 0x62ae5ba2 in stl_le_pci_dma include/hw/pci/pci.h:871:1
   #11 0x62ad59a6 in intel_hda_response hw/audio/intel-hda.c:372:12
   #12 0x62ad2afb in hda_codec_response hw/audio/intel-hda.c:107:5
   #13 0x62aec4e1 in hda_audio_command hw/audio/hda-codec.c:655:5
   #14 0x62ae05d9 in intel_hda_send_command hw/audio/intel-hda.c:307:5
   #15 0x62adff54 in intel_hda_corb_run hw/audio/intel-hda.c:342:9
   #16 0x62adc13b in intel_hda_set_corb_wp hw/audio/intel-hda.c:548:5
   #17 0x62ae5942 in intel_hda_reg_write hw/audio/intel-hda.c:977:9
   #18 0x62ada10a in intel_hda_mmio_write hw/audio/intel-hda.c:1054:5
   #19 0x63d8f383 in memory_region_write_accessor softmmu/memory.c:492:5
   #20 0x63d8ecc1 in access_with_adjusted_size softmmu/memory.c:554:18
   #21 0x63d8d5d6 in memory_region_dispatch_write softmmu/memory.c:1504:16
   #22 0x63d5e85e in flatview_write_continue softmmu/physmem.c:2812:23
   #23 0x63d4d05b in flatview_write softmmu/physmem.c:2854:12
   #24 0x63d4cb18 in address_space_write softmmu/physmem.c:2950:18
   #25 0x63d4d387 in address_space_rw softmmu/physmem.c:2960:16
   #26 0x62ae12f2 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
   #27 0x62ae104a in dma_memory_rw include/sysemu/dma.h:132:12
   #28 0x62ae6157 in dma_memory_write include/sysemu/dma.h:173:12
   #29 0x62ae5ec0 in stl_le_dma include/sysemu/dma.h:275:1
   #30 0x62ae5ba2 in stl_le_pci_dma include/hw/pci/pci.h:871:1
   #31 0x62ad59a6 in intel_hda_response hw/audio/intel-hda.c:372:12
   #32 0x62ad2afb in hda_codec_response hw/audio/intel-hda.c:107:5
   #33 0x62aec4e1 in hda_audio_command hw/audio/hda-codec.c:655:5
   #34 0x62ae05d9 in intel_hda_send_command hw/audio/intel-hda.c:307:5
   #35 0x62adff54 in intel_hda_corb_run hw/audio/intel-hda.c:342:9
   #36 0x62adc13b in intel_hda_set_corb_wp hw/audio/intel-hda.c:548:5
   #37 0x62ae5942 in intel_hda_reg_write hw/audio/intel-hda.c:977:9
   #38 0x62ada10a in intel_hda_mmio_write hw/audio/intel-hda.c:1054:5
   #39 0x63d8f383 in memory_region_write_accessor softmmu/memory.c:492:5
   #40 0x63d8ecc1 in access_with_adjusted_size softmmu/memory.c:554:18
   #41 0x63d8d5d6 in memory_region_dispatch_write softmmu/memory.c:1504:16
   #42 0x63d5e85e in flatview_write_continue softmmu/physmem.c:2812:23
   #43 0x63d4d05b in flatview_write softmmu/physmem.c:2854:12
   #44 0x63d4cb18 in address_space_write softmmu/physmem.c:2950:18
   #45 0x63d4d387 in address_space_rw softmmu/physmem.c:2960:16
   #46 0x62ae12f2 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
   #47 0x62ae104a in dma_memory_rw include/sysemu/dma.h:132:12
   #48 0x62ae6157 in dma_memory_write include/sysemu/dma.h:173:12
   ...
   SUMMARY: AddressSanitizer: stack-overflow softmmu/physmem.c:356 in 
address_space_translate_internal
   ==1580408==ABORTING
   Broken pipe
   Aborted (core dumped)

Signed-off-by: Philippe Mathieu-Daudé 
---
  tests/qtest/intel-hda-test.c | 34 ++
  1 file changed, 34 insertions(+)

diff --git a/tests/qtest/intel-hda-test.c b/tests/qtest/intel-hda-test.c
index fc25ccc33cc..a58c98e4d11 100644
--- a/tests/qtest/intel-hda-test.c
+++ b/tests/qtest/intel-hda-test.c
@@ -29,11 +29,45 @@ static void ich9_test(void)
  qtest_end();
  }
  
+/*

+ * https://gitlab.com/qemu-project/qemu/-/issues/542
+ * Used to trigger:
+ *  AddressSanitizer: stack-overflow
+ */
+static void test_issue542_ich6(void)
+{
+QTestState *s;
+
+s = qtest_init("-nographic -nodefaults -M pc-q35-6.2 "
+   "-device intel-hda,id=" HDA_ID CODEC_DEVICES);
+
+qtest_outl(s, 0xcf8, 0x8804);
+qtest_outw(s, 0xcfc, 0x06);
+qtest_bufwrite(s, 0xff0d060f, "\x03", 1);
+qtest_bufwrite(s, 0x0, 

Re: [RFC PATCH 2/3] hw/audio/intel-hda: Restrict DMA engine to memories (not MMIO devices) [CVE-2021-3611]

2022-03-18 Thread Thomas Huth

On 18/12/2021 17.09, Philippe Mathieu-Daudé wrote:

Issue #542 reports a reentrancy problem when the DMA engine accesses
the HDA controller I/O registers. Fix by restricting the DMA engine
to memory regions (forbidding MMIO devices such as the HDA controller).

Reported-by: OSS-Fuzz (Issue 28435)
Reported-by: Alexander Bulekov 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/542
Signed-off-by: Philippe Mathieu-Daudé 
---
Likely intel_hda_xfer() and intel_hda_corb_run() should be restricted
too.
---
  hw/audio/intel-hda.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 0c1017edbbf..3aa57d274e6 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -345,7 +345,7 @@ static void intel_hda_corb_run(IntelHDAState *d)
  
  static void intel_hda_response(HDACodecDevice *dev, bool solicited, uint32_t response)

  {
-const MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
+const MemTxAttrs attrs = { .memory = true };
  HDACodecBus *bus = HDA_BUS(dev->qdev.parent_bus);
  IntelHDAState *d = container_of(bus, IntelHDAState, codecs);
  hwaddr addr;


That's maybe the best we can do right now to fix CVE-2021-3611 !

Reviewed-by: Thomas Huth 




[PATCH v6 10/12] docs: Add documentation for SR-IOV and Virtualization Enhancements

2022-03-18 Thread Lukasz Maniak
Signed-off-by: Lukasz Maniak 
---
 docs/system/devices/nvme.rst | 82 
 1 file changed, 82 insertions(+)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index b5acb2a9c19..aba253304e4 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -239,3 +239,85 @@ The virtual namespace device supports DIF- and DIX-based 
protection information
   to ``1`` to transfer protection information as the first eight bytes of
   metadata. Otherwise, the protection information is transferred as the last
   eight bytes.
+
+Virtualization Enhancements and SR-IOV (Experimental Support)
+-------------------------------------------------------------
+
+The ``nvme`` device supports Single Root I/O Virtualization and Sharing
+along with Virtualization Enhancements. The controller has to be linked to
+an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.
+
+A number of parameters are present (**please note that they may be
+subject to change**):
+
+``sriov_max_vfs`` (default: ``0``)
+  Indicates the maximum number of PCIe virtual functions supported
+  by the controller. Specifying a non-zero value enables reporting of both
+  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
+  by the NVMe device. Virtual function controllers will not report SR-IOV.
+
+``sriov_vq_flexible``
+  Indicates the total number of flexible queue resources assignable to all
+  the secondary controllers. Implicitly sets the number of primary
+  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.
+
+``sriov_vi_flexible``
+  Indicates the total number of flexible interrupt resources assignable to
+  all the secondary controllers. Implicitly sets the number of primary
+  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.
+
+``sriov_max_vi_per_vf`` (default: ``0``)
+  Indicates the maximum number of virtual interrupt resources assignable
+  to a secondary controller. The default ``0`` resolves to
+  ``(sriov_vi_flexible / sriov_max_vfs)``
+
+``sriov_max_vq_per_vf`` (default: ``0``)
+  Indicates the maximum number of virtual queue resources assignable to
+  a secondary controller. The default ``0`` resolves to
+  ``(sriov_vq_flexible / sriov_max_vfs)``
+
+The simplest possible invocation enables the capability to set up one VF
+controller and assign an admin queue, an IO queue, and an MSI-X interrupt.
+
+.. code-block:: console
+
+   -device nvme-subsys,id=subsys0
+   -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,
+sriov_vq_flexible=2,sriov_vi_flexible=1
+
+The minimum steps required to configure a functional NVMe secondary
+controller are:
+
+  * unbind flexible resources from the primary controller
+
+.. code-block:: console
+
+   nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
+   nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
+
+  * perform a Function Level Reset on the primary controller to actually
+release the resources
+
+.. code-block:: console
+
+   echo 1 > /sys/bus/pci/devices/:01:00.0/reset
+
+  * enable VF
+
+.. code-block:: console
+
+   echo 1 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs
+
+  * assign the flexible resources to the VF and set it ONLINE
+
+.. code-block:: console
+
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
+
+  * bind the NVMe driver to the VF
+
+.. code-block:: console
+
+   echo :01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file
-- 
2.25.1




[PATCH v6 09/12] hw/nvme: Add support for the Virtualization Management command

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

With the new command one can:
 - assign flexible resources (queues, interrupts) to primary and
   secondary controllers,
 - toggle the online/offline state of given controller.

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 257 ++-
 hw/nvme/nvme.h   |  20 
 hw/nvme/trace-events |   3 +
 include/block/nvme.h |  17 +++
 4 files changed, 295 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 011231ab5a6..247c09882dd 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -188,6 +188,7 @@
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
+#include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
@@ -262,6 +263,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
+[NVME_ADM_CMD_VIRT_MNGMT]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
@@ -293,6 +295,7 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 };
 
 static void nvme_process_sq(void *opaque);
+static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst);
 
 static uint16_t nvme_sqid(NvmeRequest *req)
 {
@@ -5838,6 +5841,167 @@ out:
 return status;
 }
 
+static void nvme_get_virt_res_num(NvmeCtrl *n, uint8_t rt, int *num_total,
+  int *num_prim, int *num_sec)
+{
+*num_total = le32_to_cpu(rt ?
+ n->pri_ctrl_cap.vifrt : n->pri_ctrl_cap.vqfrt);
+*num_prim = le16_to_cpu(rt ?
+n->pri_ctrl_cap.virfap : n->pri_ctrl_cap.vqrfap);
+*num_sec = le16_to_cpu(rt ? n->pri_ctrl_cap.virfa : n->pri_ctrl_cap.vqrfa);
+}
+
+static uint16_t nvme_assign_virt_res_to_prim(NvmeCtrl *n, NvmeRequest *req,
+ uint16_t cntlid, uint8_t rt,
+ int nr)
+{
+int num_total, num_prim, num_sec;
+
+if (cntlid != n->cntlid) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+nvme_get_virt_res_num(n, rt, &num_total, &num_prim, &num_sec);
+
+if (nr > num_total) {
+return NVME_INVALID_NUM_RESOURCES | NVME_DNR;
+}
+
+if (nr > num_total - num_sec) {
+return NVME_INVALID_RESOURCE_ID | NVME_DNR;
+}
+
+if (rt) {
+n->next_pri_ctrl_cap.virfap = cpu_to_le16(nr);
+} else {
+n->next_pri_ctrl_cap.vqrfap = cpu_to_le16(nr);
+}
+
+req->cqe.result = cpu_to_le32(nr);
+return req->status;
+}
+
+static void nvme_update_virt_res(NvmeCtrl *n, NvmeSecCtrlEntry *sctrl,
+ uint8_t rt, int nr)
+{
+int prev_nr, prev_total;
+
+if (rt) {
+prev_nr = le16_to_cpu(sctrl->nvi);
+prev_total = le32_to_cpu(n->pri_ctrl_cap.virfa);
+sctrl->nvi = cpu_to_le16(nr);
+n->pri_ctrl_cap.virfa = cpu_to_le32(prev_total + nr - prev_nr);
+} else {
+prev_nr = le16_to_cpu(sctrl->nvq);
+prev_total = le32_to_cpu(n->pri_ctrl_cap.vqrfa);
+sctrl->nvq = cpu_to_le16(nr);
+n->pri_ctrl_cap.vqrfa = cpu_to_le32(prev_total + nr - prev_nr);
+}
+}
+
+static uint16_t nvme_assign_virt_res_to_sec(NvmeCtrl *n, NvmeRequest *req,
+uint16_t cntlid, uint8_t rt, int nr)
+{
+int num_total, num_prim, num_sec, num_free, diff, limit;
+NvmeSecCtrlEntry *sctrl;
+
+sctrl = nvme_sctrl_for_cntlid(n, cntlid);
+if (!sctrl) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+if (sctrl->scs) {
+return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR;
+}
+
+limit = le16_to_cpu(rt ? n->pri_ctrl_cap.vifrsm : n->pri_ctrl_cap.vqfrsm);
+if (nr > limit) {
+return NVME_INVALID_NUM_RESOURCES | NVME_DNR;
+}
+
+nvme_get_virt_res_num(n, rt, &num_total, &num_prim, &num_sec);
+num_free = num_total - num_prim - num_sec;
+diff = nr - le16_to_cpu(rt ? sctrl->nvi : sctrl->nvq);
+
+if (diff > num_free) {
+return NVME_INVALID_RESOURCE_ID | NVME_DNR;
+}
+
+nvme_update_virt_res(n, sctrl, rt, nr);
+req->cqe.result = cpu_to_le32(nr);
+
+return req->status;
+}
+
+static uint16_t nvme_virt_set_state(NvmeCtrl *n, uint16_t cntlid, bool online)
+{
+NvmeCtrl *sn = NULL;
+NvmeSecCtrlEntry *sctrl;
+int vf_index;
+
+sctrl = nvme_sctrl_for_cntlid(n, cntlid);
+if (!sctrl) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+if (!pci_is_vf(&n->parent_obj)) {
+vf_index = le16_to_cpu(sctrl->vfn) - 1;
+sn = NVME(pcie_sriov_get_vf_at_index(&n->parent_obj, vf_index));
+}
+
+if (online) {
+if (!sctrl->nvi || (le16_to_cpu(sctrl->nvq) < 2) || !sn) {
+return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR;
+}
+
+if 

[PATCH v6 07/12] hw/nvme: Calculate BAR attributes in a function

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

An NVMe device with SR-IOV capability calculates the BAR size
differently for PF and VF, so it makes sense to extract the common code
to a separate function.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f34d73a00c8..f0554a07c40 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6728,6 +6728,34 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice 
*pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
+static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
+  unsigned *msix_table_offset,
+  unsigned *msix_pba_offset)
+{
+uint64_t bar_size, msix_table_size, msix_pba_size;
+
+bar_size = sizeof(NvmeBar) + 2 * total_queues * NVME_DB_SIZE;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+if (msix_table_offset) {
+*msix_table_offset = bar_size;
+}
+
+msix_table_size = PCI_MSIX_ENTRY_SIZE * total_irqs;
+bar_size += msix_table_size;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+if (msix_pba_offset) {
+*msix_pba_offset = bar_size;
+}
+
+msix_pba_size = QEMU_ALIGN_UP(total_irqs, 64) / 8;
+bar_size += msix_pba_size;
+
+bar_size = pow2ceil(bar_size);
+return bar_size;
+}
+
 static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
 uint64_t bar_size)
 {
@@ -6767,7 +6795,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, 
uint8_t offset)
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
-uint64_t bar_size, msix_table_size, msix_pba_size;
+uint64_t bar_size;
 unsigned msix_table_offset, msix_pba_offset;
 int ret;
 
@@ -6793,19 +6821,8 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
-bar_size = sizeof(NvmeBar) +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
-bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-msix_table_offset = bar_size;
-msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
-
-bar_size += msix_table_size;
-bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-msix_pba_offset = bar_size;
-msix_pba_size = QEMU_ALIGN_UP(n->params.msix_qsize, 64) / 8;
-
-bar_size += msix_pba_size;
-bar_size = pow2ceil(bar_size);
+bar_size = nvme_bar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
+ &msix_table_offset, &msix_pba_offset);
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-- 
2.25.1




[PATCH v6 12/12] hw/acpi: Make the PCI hot-plug aware of SR-IOV

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

PCI devices capable of SR-IOV support are a new, still-experimental
feature, with the Nvme device as the only working example so far.

This patch is an attempt to fix a double-free problem when an
SR-IOV-capable Nvme device is hot-unplugged. The problem and the
reproduction steps can be found in this thread:

https://patchew.org/QEMU/20220217174504.1051716-1-lukasz.man...@linux.intel.com/20220217174504.1051716-14-lukasz.man...@linux.intel.com/

Details of the proposed solution are, for convenience, included below.

1) The current SR-IOV implementation assumes it’s the PhysicalFunction
   that creates and deletes VirtualFunctions.
2) It’s a design decision (the Nvme device at least) for the VFs to be
   of the same class as PF. Effectively, they share the dc->hotpluggable
   value.
3) When a VF is created, it’s added as a child node to PF’s PCI bus
   slot.
4) Monitor/device_del triggers the ACPI mechanism. The implementation is
   not aware of SR-IOV and ejects PF’s PCI slot, directly unrealizing all
   hot-pluggable (!acpi_pcihp_pc_no_hotplug) child nodes.
5) VFs are unrealized directly, and that doesn’t work well with (1).
   SR-IOV structures are not updated, so when it’s PF’s turn to be
   unrealized, it works on stale pointers to already-deleted VFs.

Signed-off-by: Łukasz Gieryk 
---
 hw/acpi/pcihp.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 6351bd3424d..248839e1110 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -192,8 +192,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, 
PCIDevice *dev)
  * ACPI doesn't allow hotplug of bridge devices.  Don't allow
  * hot-unplug of bridge devices unless they were added by hotplug
  * (and so, not described by acpi).
+ *
+ * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
+ * will be removed implicitly, when Physical Function is unplugged.
  */
-return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
+return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
+   pci_is_vf(dev);
 }
 
 static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned 
slots)
-- 
2.25.1




[PATCH v6 08/12] hw/nvme: Initialize capability structures for primary/secondary controllers

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

With four new properties:
 - sriov_v{i,q}_flexible,
 - sriov_max_v{i,q}_per_vf,
one can configure the number of available flexible resources, as well as
the limits. The primary and secondary controller capability structures
are initialized accordingly.

Since the number of available queues (interrupts) now varies between
VF/PF, BAR size calculation is also adjusted.

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 141 ---
 hw/nvme/nvme.h   |   4 ++
 include/block/nvme.h |   5 ++
 3 files changed, 143 insertions(+), 7 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f0554a07c40..011231ab5a6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -36,6 +36,10 @@
  *  zoned.zasl=, \
  *  zoned.auto_transition=, \
  *  sriov_max_vfs= \
+ *  sriov_vq_flexible= \
+ *  sriov_vi_flexible= \
+ *  sriov_max_vi_per_vf= \
+ *  sriov_max_vq_per_vf= \
  *  subsys=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -113,6 +117,29 @@
  *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
  *   Virtual function controllers will not report SR-IOV capability.
  *
+ *   NOTE: Single Root I/O Virtualization support is experimental.
+ *   All the related parameters may be subject to change.
+ *
+ * - `sriov_vq_flexible`
+ *   Indicates the total number of flexible queue resources assignable to all
+ *   the secondary controllers. Implicitly sets the number of primary
+ *   controller's private resources to `(max_ioqpairs - sriov_vq_flexible)`.
+ *
+ * - `sriov_vi_flexible`
+ *   Indicates the total number of flexible interrupt resources assignable to
+ *   all the secondary controllers. Implicitly sets the number of primary
+ *   controller's private resources to `(msix_qsize - sriov_vi_flexible)`.
+ *
+ * - `sriov_max_vi_per_vf`
+ *   Indicates the maximum number of virtual interrupt resources assignable
+ *   to a secondary controller. The default 0 resolves to
+ *   `(sriov_vi_flexible / sriov_max_vfs)`.
+ *
+ * - `sriov_max_vq_per_vf`
+ *   Indicates the maximum number of virtual queue resources assignable to
+ *   a secondary controller. The default 0 resolves to
+ *   `(sriov_vq_flexible / sriov_max_vfs)`.
+ *
  * nvme namespace device parameters
  * 
  * - `shared`
@@ -185,6 +212,7 @@
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 #define NVME_MAX_VFS 127
+#define NVME_VF_RES_GRANULARITY 1
 #define NVME_VF_OFFSET 0x1
 #define NVME_VF_STRIDE 1
 
@@ -6656,6 +6684,53 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 error_setg(errp, "PMR is not supported with SR-IOV");
 return;
 }
+
+if (!params->sriov_vq_flexible || !params->sriov_vi_flexible) {
+error_setg(errp, "both sriov_vq_flexible and sriov_vi_flexible"
+   " must be set for the use of SR-IOV");
+return;
+}
+
+if (params->sriov_vq_flexible < params->sriov_max_vfs * 2) {
+error_setg(errp, "sriov_vq_flexible must be greater than or equal"
+   " to %d (sriov_max_vfs * 2)", params->sriov_max_vfs * 2);
+return;
+}
+
+if (params->max_ioqpairs < params->sriov_vq_flexible + 2) {
+error_setg(errp, "(max_ioqpairs - sriov_vq_flexible) must be"
+   " greater than or equal to 2");
+return;
+}
+
+if (params->sriov_vi_flexible < params->sriov_max_vfs) {
+error_setg(errp, "sriov_vi_flexible must be greater than or equal"
+   " to %d (sriov_max_vfs)", params->sriov_max_vfs);
+return;
+}
+
+if (params->msix_qsize < params->sriov_vi_flexible + 1) {
+error_setg(errp, "(msix_qsize - sriov_vi_flexible) must be"
+   " greater than or equal to 1");
+return;
+}
+
+if (params->sriov_max_vi_per_vf &&
+(params->sriov_max_vi_per_vf - 1) % NVME_VF_RES_GRANULARITY) {
+error_setg(errp, "sriov_max_vi_per_vf must meet:"
+   " (sriov_max_vi_per_vf - 1) %% %d == 0 and"
+   " sriov_max_vi_per_vf >= 1", NVME_VF_RES_GRANULARITY);
+return;
+}
+
+if (params->sriov_max_vq_per_vf &&
+(params->sriov_max_vq_per_vf < 2 ||
+ (params->sriov_max_vq_per_vf - 1) % NVME_VF_RES_GRANULARITY)) {
+error_setg(errp, "sriov_max_vq_per_vf must meet:"
+   " (sriov_max_vq_per_vf - 1) %% %d == 0 and"
+   " sriov_max_vq_per_vf >= 2", NVME_VF_RES_GRANULARITY);
+return;
+}
 }
 }
 
@@ -6664,10 +6739,19 @@ static void nvme_init_state(NvmeCtrl *n)
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
 

[PATCH v6 11/12] hw/nvme: Update the initialization place for the AER queue

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

This patch updates the initialization place for the AER queue, so it’s
initialized once, at controller initialization, and not every time the
controller is enabled.

While the original version works for a non-SR-IOV device, as it’s hard
to interact with the controller if it’s not enabled, the multiple
reinitialization is not necessarily correct.

With the SR-IOV feature enabled a segfault can happen: a VF can have its
controller disabled, while a namespace can still be attached to the
controller through the parent PF. An event generated in such a case ends
up on an uninitialized queue.

While it’s an interesting question whether a VF should support AER in
the first place, I don’t think it must be answered today.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 247c09882dd..b0862b1d96c 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6326,8 +6326,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
 nvme_set_timestamp(n, 0ULL);
 
-QTAILQ_INIT(&n->aer_queue);
-
 nvme_select_iocs(n);
 
 return 0;
@@ -6987,6 +6985,7 @@ static void nvme_init_state(NvmeCtrl *n)
 n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+QTAILQ_INIT(&n->aer_queue);
 
 list->numcntl = cpu_to_le16(max_vfs);
 for (i = 0; i < max_vfs; i++) {
-- 
2.25.1




[PATCH v6 02/12] hw/nvme: Add support for Primary Controller Capabilities

2022-03-18 Thread Lukasz Maniak
Implementation of Primary Controller Capabilities data
structure (Identify command with CNS value of 14h).

Currently, the command returns only the ID of the primary controller.
Handling of the remaining fields is added in subsequent patches
implementing virtualization enhancements.

Signed-off-by: Lukasz Maniak 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 23 ++-
 hw/nvme/nvme.h   |  2 ++
 hw/nvme/trace-events |  1 +
 include/block/nvme.h | 23 +++
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 0e1d8d03c87..ea9d5af3545 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4799,6 +4799,14 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, 
NvmeRequest *req,
 return nvme_c2h(n, (uint8_t *)list, sizeof(list), req);
 }
 
+static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, NvmeRequest *req)
+{
+trace_pci_nvme_identify_pri_ctrl_cap(le16_to_cpu(n->pri_ctrl_cap.cntlid));
+
+return nvme_c2h(n, (uint8_t *)&n->pri_ctrl_cap,
+sizeof(NvmePriCtrlCap), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
  bool active)
 {
@@ -5018,6 +5026,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_identify_ctrl_list(n, req, true);
 case NVME_ID_CNS_CTRL_LIST:
 return nvme_identify_ctrl_list(n, req, false);
+case NVME_ID_CNS_PRIMARY_CTRL_CAP:
+return nvme_identify_pri_ctrl_cap(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
@@ -6609,6 +6619,8 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 
 static void nvme_init_state(NvmeCtrl *n)
 {
+NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
+
 /* add one to max_ioqpairs to account for the admin queue pair */
 n->reg_size = pow2ceil(sizeof(NvmeBar) +
2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
@@ -6618,6 +6630,8 @@ static void nvme_init_state(NvmeCtrl *n)
 n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+
+cap->cntlid = cpu_to_le16(n->cntlid);
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -6919,15 +6933,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS,
   &pci_dev->qdev, n->parent_obj.qdev.id);
 
-nvme_init_state(n);
-if (nvme_init_pci(n, pci_dev, errp)) {
-return;
-}
-
 if (nvme_init_subsys(n, errp)) {
 error_propagate(errp, local_err);
 return;
 }
+nvme_init_state(n);
+if (nvme_init_pci(n, pci_dev, errp)) {
+return;
+}
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 89ca6e96401..e58bab841e2 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -477,6 +477,8 @@ typedef struct NvmeCtrl {
 uint32_tasync_config;
 NvmeHostBehaviorSupport hbs;
 } features;
+
+NvmePriCtrlCap  pri_ctrl_cap;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index ff1b4589692..1834b17cf21 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -56,6 +56,7 @@ pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_ctrl_list(uint8_t cns, uint16_t cntid) "cns 0x%"PRIx8" cntid 
%"PRIu16""
+pci_nvme_identify_pri_ctrl_cap(uint16_t cntlid) "identify primary controller capabilities cntlid=%"PRIu16""
 pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", 
csi=0x%"PRIx8""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", 
csi=0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3737351cc81..524a04fb94e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1033,6 +1033,7 @@ enum NvmeIdCns {
 NVME_ID_CNS_NS_PRESENT= 0x11,
 NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12,
 NVME_ID_CNS_CTRL_LIST = 0x13,
+NVME_ID_CNS_PRIMARY_CTRL_CAP  = 0x14,
 NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
 NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
 NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
@@ -1553,6 +1554,27 @@ typedef enum NvmeZoneState {
 NVME_ZONE_STATE_OFFLINE  = 0x0f,
 } NvmeZoneState;
 
+typedef struct QEMU_PACKED NvmePriCtrlCap {
+uint16_tcntlid;
+uint16_tportid;
+uint8_t crt;
+uint8_t rsvd5[27];
+uint32_tvqfrt;
+uint32_tvqrfa;
+uint16_t   

[PATCH v6 06/12] hw/nvme: Remove reg_size variable and update BAR0 size calculation

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

The n->reg_size parameter unnecessarily splits the BAR0 size calculation
into two phases; it is removed to simplify the code.

With all the calculations done in one place, it seems the pow2ceil,
applied originally to reg_size, is unnecessary. The rounding should
happen as the last step, when BAR size includes Nvme registers, queue
registers, and MSIX-related space.

Finally, the size of the mmio memory region is extended to cover the 1st
4KiB padding (see the map below). Access to this range is handled as
interaction with a non-existing queue and generates an error trace, so
actually nothing changes, while the reg_size variable is no longer needed.


|  BAR0|

[Nvme Registers]
[Queues]
[power-of-2 padding] - removed in this patch
[4KiB padding (1)  ]
[MSIX TABLE]
[4KiB padding (2)  ]
[MSIX PBA  ]
[power-of-2 padding]

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 10 +-
 hw/nvme/nvme.h |  1 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 12372038075..f34d73a00c8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6669,9 +6669,6 @@ static void nvme_init_state(NvmeCtrl *n)
 n->conf_ioqpairs = n->params.max_ioqpairs;
 n->conf_msix_qsize = n->params.msix_qsize;
 
-/* add one to max_ioqpairs to account for the admin queue pair */
-n->reg_size = pow2ceil(sizeof(NvmeBar) +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
 n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
 n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
 n->temperature = NVME_TEMPERATURE;
@@ -6795,7 +6792,10 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_ari_init(pci_dev, 0x100, 1);
 }
 
-bar_size = QEMU_ALIGN_UP(n->reg_size, 4 * KiB);
+/* add one to max_ioqpairs to account for the admin queue pair */
+bar_size = sizeof(NvmeBar) +
+   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
 msix_table_offset = bar_size;
 msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
 
@@ -6809,7 +6809,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-  n->reg_size);
+  msix_table_offset);
 memory_region_add_subregion(&n->bar0, 0, &n->iomem);
 
 if (pci_is_vf(pci_dev)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 5bd6ac698bc..adde718105b 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -428,7 +428,6 @@ typedef struct NvmeCtrl {
 uint16_tmax_prp_ents;
 uint16_tcqe_size;
 uint16_tsqe_size;
-uint32_treg_size;
 uint32_tmax_q_ents;
 uint8_t outstanding_aers;
 uint32_tirq_status;
-- 
2.25.1




[PATCH v6 04/12] hw/nvme: Implement the Function Level Reset

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

This patch implements the Function Level Reset, a feature currently not
implemented for the Nvme device, while listed as mandatory ("shall")
in the 1.4 spec.

The implementation reuses FLR-related building blocks defined for the
pci-bridge module, and follows the same logic:
- FLR capability is advertised in the PCIE config,
- custom pci_write_config callback detects a write to the trigger
  register and performs the PCI reset,
- which, eventually, calls the custom dc->reset handler.

Depending on reset type, parts of the state should (or should not) be
cleared. To distinguish the type of reset, an additional parameter is
passed to the reset function.

This patch also enables advertisement of the Power Management PCI
capability. The main reason behind it is to announce the no_soft_reset=1
bit, to signal SR-IOV support where each VF can be reset individually.

The implementation purposely ignores writes to the PMCS.PS register,
as even such naïve behavior is enough to correctly handle the D3->D0
transition.

It’s worth noting that the power state transition back to D3, with
all the corresponding side effects, wasn't and still isn't handled
properly.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 52 
 hw/nvme/nvme.h   |  5 +
 hw/nvme/trace-events |  1 +
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index b1b1bebbaf2..e6d6e5840af 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5901,7 +5901,7 @@ static void nvme_process_sq(void *opaque)
 }
 }
 
-static void nvme_ctrl_reset(NvmeCtrl *n)
+static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
 {
 NvmeNamespace *ns;
 int i;
@@ -5933,7 +5933,9 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 }
 
 if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
-pcie_sriov_pf_disable_vfs(&n->parent_obj);
+if (rst != NVME_RESET_CONTROLLER) {
+pcie_sriov_pf_disable_vfs(&n->parent_obj);
+}
 }
 
 n->aer_queued = 0;
@@ -6167,7 +6169,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 }
 } else if (!NVME_CC_EN(data) && NVME_CC_EN(cc)) {
 trace_pci_nvme_mmio_stopped();
-nvme_ctrl_reset(n);
+nvme_ctrl_reset(n, NVME_RESET_CONTROLLER);
 cc = 0;
 csts &= ~NVME_CSTS_READY;
 }
@@ -6725,6 +6727,28 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice 
*pci_dev, uint16_t offset,
   PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
 }
 
+static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
+{
+Error *err = NULL;
+int ret;
+
+ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
+ PCI_PM_SIZEOF, &err);
+if (err) {
+error_report_err(err);
+return ret;
+}
+
+pci_set_word(pci_dev->config + offset + PCI_PM_PMC,
+ PCI_PM_CAP_VER_1_2);
+pci_set_word(pci_dev->config + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_NO_SOFT_RESET);
+pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+
+return 0;
+}
+
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -6746,7 +6770,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 }
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+nvme_add_pm_capability(pci_dev, 0x60);
 pcie_endpoint_cap_init(pci_dev, 0x80);
+pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
 pcie_ari_init(pci_dev, 0x100, 1);
 }
@@ -6997,7 +7023,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 NvmeNamespace *ns;
 int i;
 
-nvme_ctrl_reset(n);
+nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
 
 if (n->subsys) {
 for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
@@ -7096,6 +7122,22 @@ static void nvme_set_smart_warning(Object *obj, Visitor 
*v, const char *name,
 }
 }
 
+static void nvme_pci_reset(DeviceState *qdev)
+{
+PCIDevice *pci_dev = PCI_DEVICE(qdev);
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_pci_reset();
+nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
+}
+
+static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
+  uint32_t val, int len)
+{
+pci_default_write_config(dev, address, val, len);
+pcie_cap_flr_write_config(dev, address, val, len);
+}
+
 static const VMStateDescription nvme_vmstate = {
 .name = "nvme",
 .unmigratable = 1,
@@ -7107,6 +7149,7 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
 pc->realize = nvme_realize;
+pc->config_write = nvme_pci_write_config;
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;

[PATCH v6 03/12] hw/nvme: Add support for Secondary Controller List

2022-03-18 Thread Lukasz Maniak
Introduce handling for Secondary Controller List (Identify command with
CNS value of 15h).

Secondary controller ids are unique in the subsystem; hence, upon
initialization of the primary controller, the subsystem reserves as
many of them as sriov_max_vfs.

ID reservation requires the addition of an intermediate controller slot
state, so the reserved controller has the address 0xFFFF.
A secondary controller is in the reserved state when it has no virtual
function assigned, but its primary controller is realized.
Secondary controller reservations are released to NULL when their
primary controller is unregistered.

Signed-off-by: Lukasz Maniak 
---
 hw/nvme/ctrl.c   | 35 +
 hw/nvme/ns.c |  2 +-
 hw/nvme/nvme.h   | 18 +++
 hw/nvme/subsys.c | 75 ++--
 hw/nvme/trace-events |  1 +
 include/block/nvme.h | 20 
 6 files changed, 141 insertions(+), 10 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ea9d5af3545..b1b1bebbaf2 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4807,6 +4807,29 @@ static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, 
NvmeRequest *req)
 sizeof(NvmePriCtrlCap), req);
 }
 
+static uint16_t nvme_identify_sec_ctrl_list(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeIdentify *c = (NvmeIdentify *)>cmd;
+uint16_t pri_ctrl_id = le16_to_cpu(n->pri_ctrl_cap.cntlid);
+uint16_t min_id = le16_to_cpu(c->ctrlid);
+uint8_t num_sec_ctrl = n->sec_ctrl_list.numcntl;
+NvmeSecCtrlList list = {0};
+uint8_t i;
+
+for (i = 0; i < num_sec_ctrl; i++) {
+if (n->sec_ctrl_list.sec[i].scid >= min_id) {
+list.numcntl = num_sec_ctrl - i;
+memcpy(&list.sec, n->sec_ctrl_list.sec + i,
+   list.numcntl * sizeof(NvmeSecCtrlEntry));
+break;
+}
+}
+
+trace_pci_nvme_identify_sec_ctrl_list(pri_ctrl_id, list.numcntl);
+
+return nvme_c2h(n, (uint8_t *)&list, sizeof(list), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
  bool active)
 {
@@ -5028,6 +5051,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_identify_ctrl_list(n, req, false);
 case NVME_ID_CNS_PRIMARY_CTRL_CAP:
 return nvme_identify_pri_ctrl_cap(n, req);
+case NVME_ID_CNS_SECONDARY_CTRL_LIST:
+return nvme_identify_sec_ctrl_list(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
@@ -6620,6 +6645,9 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 static void nvme_init_state(NvmeCtrl *n)
 {
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
+NvmeSecCtrlList *list = &n->sec_ctrl_list;
+NvmeSecCtrlEntry *sctrl;
+int i;
 
 /* add one to max_ioqpairs to account for the admin queue pair */
 n->reg_size = pow2ceil(sizeof(NvmeBar) +
@@ -6631,6 +6659,13 @@ static void nvme_init_state(NvmeCtrl *n)
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 
+list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
+for (i = 0; i < n->params.sriov_max_vfs; i++) {
+sctrl = &list->sec[i];
+sctrl->pcid = cpu_to_le16(n->cntlid);
+sctrl->vfn = cpu_to_le16(i + 1);
+}
+
 cap->cntlid = cpu_to_le16(n->cntlid);
 }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 8a3613d9ab0..cfd232bb147 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -596,7 +596,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
 NvmeCtrl *ctrl = subsys->ctrls[i];
 
-if (ctrl) {
+if (ctrl && ctrl != SUBSYS_SLOT_RSVD) {
 nvme_attach_ns(ctrl, ns);
 }
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index e58bab841e2..7581ef26fdb 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -43,6 +43,7 @@ typedef struct NvmeBus {
 #define TYPE_NVME_SUBSYS "nvme-subsys"
 #define NVME_SUBSYS(obj) \
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+#define SUBSYS_SLOT_RSVD (void *)0xFFFF
 
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
@@ -67,6 +68,10 @@ static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem 
*subsys,
 return NULL;
 }
 
+if (subsys->ctrls[cntlid] == SUBSYS_SLOT_RSVD) {
+return NULL;
+}
+
 return subsys->ctrls[cntlid];
 }
 
@@ -479,6 +484,7 @@ typedef struct NvmeCtrl {
 } features;
 
 NvmePriCtrlCap  pri_ctrl_cap;
+NvmeSecCtrlList sec_ctrl_list;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
@@ -513,6 +519,18 @@ static inline uint16_t nvme_cid(NvmeRequest *req)
 return le16_to_cpu(req->cqe.cid);
 }
 
+static inline NvmeSecCtrlEntry *nvme_sctrl(NvmeCtrl *n)
+{
+PCIDevice *pci_dev = &n->parent_obj;
+NvmeCtrl 

[PATCH v6 05/12] hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime

2022-03-18 Thread Lukasz Maniak
From: Łukasz Gieryk 

The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having
them as constants is problematic for SR-IOV support.

SR-IOV introduces virtual resources (queues, interrupts) that can be
assigned to PF and its dependent VFs. Each device, following a reset,
should work with the configured number of queues. A single constant is
no longer sufficient to hold the whole state.

This patch tries to solve the problem by introducing additional
variables in NvmeCtrl’s state. The variables for, e.g., managing queues
are therefore organized as:
 - n->params.max_ioqpairs – no changes, constant set by the user
 - n->(mutable_state) – (not a part of this patch) user-configurable,
specifies number of queues available _after_
reset
 - n->conf_ioqpairs - (new) used in all the places instead of the ‘old’
  n->params.max_ioqpairs; initialized in realize()
  and updated during reset() to reflect user’s
  changes to the mutable state

Since the number of available i/o queues and interrupts can change in
runtime, buffers for sq/cqs and the MSIX-related structures are
allocated big enough to handle the limits, to completely avoid the
complicated reallocation. A helper function (nvme_update_msixcap_ts)
updates the corresponding capability register, to signal configuration
changes.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 52 ++
 hw/nvme/nvme.h |  2 ++
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index e6d6e5840af..12372038075 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -448,12 +448,12 @@ static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
+return sqid < n->conf_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
+return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -4290,8 +4290,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_sq_cqid(cqid);
 return NVME_INVALID_CQID | NVME_DNR;
 }
-if (unlikely(!sqid || sqid > n->params.max_ioqpairs ||
-n->sq[sqid] != NULL)) {
+if (unlikely(!sqid || sqid > n->conf_ioqpairs || n->sq[sqid] != NULL)) {
 trace_pci_nvme_err_invalid_create_sq_sqid(sqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
@@ -4643,8 +4642,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_create_cq(prp1, cqid, vector, qsize, qflags,
  NVME_CQ_FLAGS_IEN(qflags) != 0);
 
-if (unlikely(!cqid || cqid > n->params.max_ioqpairs ||
-n->cq[cqid] != NULL)) {
+if (unlikely(!cqid || cqid > n->conf_ioqpairs || n->cq[cqid] != NULL)) {
 trace_pci_nvme_err_invalid_create_cq_cqid(cqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
@@ -4660,7 +4658,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
-if (unlikely(vector >= n->params.msix_qsize)) {
+if (unlikely(vector >= n->conf_msix_qsize)) {
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
@@ -5261,13 +5259,12 @@ defaults:
 
 break;
 case NVME_NUMBER_OF_QUEUES:
-result = (n->params.max_ioqpairs - 1) |
-((n->params.max_ioqpairs - 1) << 16);
+result = (n->conf_ioqpairs - 1) | ((n->conf_ioqpairs - 1) << 16);
 trace_pci_nvme_getfeat_numq(result);
 break;
 case NVME_INTERRUPT_VECTOR_CONF:
 iv = dw11 & 0x;
-if (iv >= n->params.max_ioqpairs + 1) {
+if (iv >= n->conf_ioqpairs + 1) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -5423,10 +5420,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, 
NvmeRequest *req)
 
 trace_pci_nvme_setfeat_numq((dw11 & 0x) + 1,
 ((dw11 >> 16) & 0x) + 1,
-n->params.max_ioqpairs,
-n->params.max_ioqpairs);
-req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
-  ((n->params.max_ioqpairs - 1) << 16));
+n->conf_ioqpairs,
+n->conf_ioqpairs);
+req->cqe.result = cpu_to_le32((n->conf_ioqpairs - 1) |
+  ((n->conf_ioqpairs - 1) << 16));
   

[PATCH v6 01/12] hw/nvme: Add support for SR-IOV

2022-03-18 Thread Lukasz Maniak
This patch implements initial support for Single Root I/O Virtualization
on an NVMe device.

Essentially, it allows one to define the maximum number of virtual
functions supported by the NVMe controller via the sriov_max_vfs parameter.

Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
capability by a physical controller and ARI capability by both the
physical and virtual function devices.

NVMe controllers created via virtual functions mirror the physical
controller functionally, which may not be entirely desirable, so
consideration is needed of ways to limit the capabilities of the VF.

An NVMe subsystem is required for the use of SR-IOV.

Signed-off-by: Lukasz Maniak 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 85 ++--
 hw/nvme/nvme.h   |  3 +-
 include/hw/pci/pci_ids.h |  1 +
 3 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 03760ddeae8..0e1d8d03c87 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -35,6 +35,7 @@
  *  mdts=,vsl=, \
  *  zoned.zasl=, \
  *  zoned.auto_transition=, \
+ *  sriov_max_vfs= \
  *  subsys=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -106,6 +107,12 @@
  *   transitioned to zone state closed for resource management purposes.
  *   Defaults to 'on'.
  *
+ * - `sriov_max_vfs`
+ *   Indicates the maximum number of PCIe virtual functions supported
+ *   by the controller. The default value is 0. Specifying a non-zero value
+ *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
+ *   Virtual function controllers will not report SR-IOV capability.
+ *
  * nvme namespace device parameters
  * 
  * - `shared`
@@ -160,6 +167,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
+#include "hw/pci/pcie_sriov.h"
 #include "migration/vmstate.h"
 
 #include "nvme.h"
@@ -176,6 +184,9 @@
 #define NVME_TEMPERATURE_CRITICAL 0x175
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+#define NVME_MAX_VFS 127
+#define NVME_VF_OFFSET 0x1
+#define NVME_VF_STRIDE 1
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -5886,6 +5897,10 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 g_free(event);
 }
 
+if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
+pcie_sriov_pf_disable_vfs(&n->parent_obj);
+}
+
 n->aer_queued = 0;
 n->outstanding_aers = 0;
 n->qs_created = false;
@@ -6567,6 +6582,29 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 error_setg(errp, "vsl must be non-zero");
 return;
 }
+
+if (params->sriov_max_vfs) {
+if (!n->subsys) {
+error_setg(errp, "subsystem is required for the use of SR-IOV");
+return;
+}
+
+if (params->sriov_max_vfs > NVME_MAX_VFS) {
+error_setg(errp, "sriov_max_vfs must be between 0 and %d",
+   NVME_MAX_VFS);
+return;
+}
+
+if (params->cmb_size_mb) {
+error_setg(errp, "CMB is not supported with SR-IOV");
+return;
+}
+
+if (n->pmr.dev) {
+error_setg(errp, "PMR is not supported with SR-IOV");
+return;
+}
+}
 }
 
 static void nvme_init_state(NvmeCtrl *n)
@@ -6624,6 +6662,20 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
+static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+uint64_t bar_size)
+{
+uint16_t vf_dev_id = n->params.use_intel_id ?
+ PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
+
+pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+   n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+   NVME_VF_OFFSET, NVME_VF_STRIDE);
+
+pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+  PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+}
+
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -6638,7 +6690,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 
 if (n->params.use_intel_id) {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
-pci_config_set_device_id(pci_conf, 0x5845);
+pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_NVME);
 } else {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME);
@@ -6646,6 +6698,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 

[PATCH v6 00/12] hw/nvme: SR-IOV with Virtualization Enhancements

2022-03-18 Thread Lukasz Maniak
Changes since v5:
- Fixed PCI hotplug issue related to deleting VF twice
- Corrected error messages for SR-IOV parameters
- Rebased on master, patches for PCI got pulled into the tree
- Added Reviewed-by labels

Lukasz Maniak (4):
  hw/nvme: Add support for SR-IOV
  hw/nvme: Add support for Primary Controller Capabilities
  hw/nvme: Add support for Secondary Controller List
  docs: Add documentation for SR-IOV and Virtualization Enhancements

Łukasz Gieryk (8):
  hw/nvme: Implement the Function Level Reset
  hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime
  hw/nvme: Remove reg_size variable and update BAR0 size calculation
  hw/nvme: Calculate BAR attributes in a function
  hw/nvme: Initialize capability structures for primary/secondary
controllers
  hw/nvme: Add support for the Virtualization Management command
  hw/nvme: Update the initialization place for the AER queue
  hw/acpi: Make the PCI hot-plug aware of SR-IOV

 docs/system/devices/nvme.rst |  82 +
 hw/acpi/pcihp.c  |   6 +-
 hw/nvme/ctrl.c   | 673 ---
 hw/nvme/ns.c |   2 +-
 hw/nvme/nvme.h   |  55 ++-
 hw/nvme/subsys.c |  75 +++-
 hw/nvme/trace-events |   6 +
 include/block/nvme.h |  65 
 include/hw/pci/pci_ids.h |   1 +
 9 files changed, 909 insertions(+), 56 deletions(-)

-- 
2.25.1




Re: [PATCH-for-6.2 0/2] hw/block/fdc: Fix CVE-2021-3507

2022-03-18 Thread Thomas Huth

On 10/03/2022 18.53, Jon Maloy wrote:


On 3/10/22 12:14, Thomas Huth wrote:

On 06/02/2022 20.19, Jon Maloy wrote:

Trying again with correct email address.
///jon

On 2/6/22 14:15, Jon Maloy wrote:



On 1/27/22 15:14, Jon Maloy wrote:


On 11/18/21 06:57, Philippe Mathieu-Daudé wrote:

Trivial fix for CVE-2021-3507.

Philippe Mathieu-Daudé (2):
   hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
   tests/qtest/fdc-test: Add a regression test for CVE-2021-3507

  hw/block/fdc.c |  8 
  tests/qtest/fdc-test.c | 20 
  2 files changed, 28 insertions(+)


Series
Acked-by: Jon Maloy 


Philippe,
I hear from other sources that you earlier qualified this one as
"incomplete".
I am of course aware that this one, just like my own patch, is just a 
mitigation and not a complete correction of the erroneous calculation.

Or did you have anything else in mind?


Any news on this one? It would be nice to get the CVE fixed for 7.0 ?

 Thomas


The ball is currently with John Snow, as I understand it.
The concern is that this fix may not take the driver back to a consistent 
state, so that we may have other problems later.

Maybe Philippe can chip in with a comment here?


John, Philippe, any ideas how to move this forward?

 Thomas




Re: [RFC PATCH 2/3] hw/sd/sdhci: Prohibit DMA accesses to devices

2022-03-18 Thread Thomas Huth

On 15/12/2021 21.56, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé 

The issue reported by OSS-Fuzz produces the following backtrace:

   ==447470==ERROR: AddressSanitizer: heap-buffer-overflow
   READ of size 1 at 0x6152a080 thread T0
   #0 0x71766d47 in sdhci_read_dataport hw/sd/sdhci.c:474:18
   #1 0x7175f139 in sdhci_read hw/sd/sdhci.c:1022:19
   #2 0x721b937b in memory_region_read_accessor softmmu/memory.c:440:11
   #3 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
   #4 0x7216f47c in memory_region_dispatch_read1 softmmu/memory.c:1424:16
   #5 0x7216ebb9 in memory_region_dispatch_read softmmu/memory.c:1452:9
   #6 0x7212db5d in flatview_read_continue softmmu/physmem.c:2879:23
   #7 0x7212f958 in flatview_read softmmu/physmem.c:2921:12
   #8 0x7212f418 in address_space_read_full softmmu/physmem.c:2934:18
   #9 0x721305a9 in address_space_rw softmmu/physmem.c:2962:16
   #10 0x7175a392 in dma_memory_rw_relaxed include/sysemu/dma.h:89:12
   #11 0x7175a0ea in dma_memory_rw include/sysemu/dma.h:132:12
   #12 0x71759684 in dma_memory_read include/sysemu/dma.h:152:12
   #13 0x7175518c in sdhci_do_adma hw/sd/sdhci.c:823:27
   #14 0x7174bf69 in sdhci_data_transfer hw/sd/sdhci.c:935:13
   #15 0x7176aaa7 in sdhci_send_command hw/sd/sdhci.c:376:9
   #16 0x717629ee in sdhci_write hw/sd/sdhci.c:1212:9
   #17 0x72172513 in memory_region_write_accessor softmmu/memory.c:492:5
   #18 0x72171e51 in access_with_adjusted_size softmmu/memory.c:554:18
   #19 0x72170766 in memory_region_dispatch_write softmmu/memory.c:1504:16
   #20 0x721419ee in flatview_write_continue softmmu/physmem.c:2812:23
   #21 0x721301eb in flatview_write softmmu/physmem.c:2854:12
   #22 0x7212fca8 in address_space_write softmmu/physmem.c:2950:18
   #23 0x721d9a53 in qtest_process_command softmmu/qtest.c:727:9

A DMA descriptor is first filled in RAM. An I/O access to the
device (frames #22 to #16) starts the DMA engine (frame #13). The
engine fetches the descriptor and executes the request, which itself
accesses the SDHCI I/O registers (frames #1 and #0), triggering a
re-entrancy issue.

Fix this by prohibiting DMA transactions to devices. The DMA engine
is thus restricted to memories.

Reported-by: OSS-Fuzz (Issue 36391)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/451
Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/sd/sdhci.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index fe2f21f0c37..0e5e988927e 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -741,6 +741,7 @@ static void sdhci_do_adma(SDHCIState *s)
  {
  unsigned int begin, length;
  const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK;
+const MemTxAttrs attrs = { .memory = true };
  ADMADescr dscr = {};
  MemTxResult res;
  int i;
@@ -794,7 +795,7 @@ static void sdhci_do_adma(SDHCIState *s)
  res = dma_memory_write(s->dma_as, dscr.addr,
 &s->fifo_buffer[begin],
 s->data_count - begin,
-   MEMTXATTRS_UNSPECIFIED);
+   attrs);
  if (res != MEMTX_OK) {
  break;
  }
@@ -823,7 +824,7 @@ static void sdhci_do_adma(SDHCIState *s)
  res = dma_memory_read(s->dma_as, dscr.addr,
&s->fifo_buffer[begin],
s->data_count - begin,
-  MEMTXATTRS_UNSPECIFIED);
+  attrs);
  if (res != MEMTX_OK) {
  break;
  }


Looks sane to me!

Reviewed-by: Thomas Huth 




Re: [PATCH] Fix 'writeable' typos

2022-03-18 Thread Philippe Mathieu-Daudé

On 18/3/22 18:30, Peter Maydell wrote:

We have about 25 instances of the typo/variant spelling 'writeable',
and over 500 of the more common 'writable'.  Standardize on the
latter.


Amusingly I was looking yesterday at the difference between both.

Reviewed-by: Philippe Mathieu-Daudé 


Change produced with:

  sed -i -e 's/writeable/writable/g' $(git grep -l writeable)

and then hand-undoing the instance in linux-headers/linux/kvm.h.

All these changes are in comments or documentation, except for the
two local variables in accel/hvf/hvf-accel-ops.c and
accel/kvm/kvm-all.c.


Signed-off-by: Peter Maydell 
---
  docs/interop/vhost-user.rst| 2 +-
  docs/specs/vmgenid.txt | 4 ++--
  hw/scsi/mfi.h  | 2 +-
  target/arm/internals.h | 2 +-
  accel/hvf/hvf-accel-ops.c  | 4 ++--
  accel/kvm/kvm-all.c| 4 ++--
  accel/tcg/user-exec.c  | 6 +++---
  hw/acpi/ghes.c | 2 +-
  hw/intc/arm_gicv3_cpuif.c  | 2 +-
  hw/intc/arm_gicv3_dist.c   | 2 +-
  hw/intc/arm_gicv3_redist.c | 2 +-
  hw/intc/riscv_aclint.c | 2 +-
  hw/intc/riscv_aplic.c  | 2 +-
  hw/pci/shpc.c  | 2 +-
  hw/timer/sse-timer.c   | 2 +-
  target/arm/gdbstub.c   | 2 +-
  target/i386/cpu-sysemu.c   | 2 +-
  target/s390x/ioinst.c  | 2 +-
  python/qemu/machine/machine.py | 2 +-
  tests/tcg/x86_64/system/boot.S | 2 +-
  20 files changed, 25 insertions(+), 25 deletions(-)




Re: [RFC PATCH 1/3] hw/sd/sdhci: Honor failed DMA transactions

2022-03-18 Thread Thomas Huth

On 15/12/2021 21.56, Philippe Mathieu-Daudé wrote:

From: Philippe Mathieu-Daudé 

DMA transactions might fail. The DMA API returns a MemTxResult,
indicating such failures. Do not ignore it. On failure, raise
the ADMA error flag, eventually triggering an IRQ (see spec
chapter 1.13.5: "ADMA2 States").

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/sd/sdhci.c | 34 +-
  1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index e0bbc903446..fe2f21f0c37 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -742,6 +742,7 @@ static void sdhci_do_adma(SDHCIState *s)
  unsigned int begin, length;
  const uint16_t block_size = s->blksize & BLOCK_SIZE_MASK;
  ADMADescr dscr = {};
+MemTxResult res;
  int i;
  
  if (s->trnmod & SDHC_TRNS_BLK_CNT_EN && !s->blkcnt) {

@@ -790,10 +791,13 @@ static void sdhci_do_adma(SDHCIState *s)
  s->data_count = block_size;
  length -= block_size - begin;
  }
-dma_memory_write(s->dma_as, dscr.addr,
- &s->fifo_buffer[begin],
- s->data_count - begin,
- MEMTXATTRS_UNSPECIFIED);
+res = dma_memory_write(s->dma_as, dscr.addr,
+   &s->fifo_buffer[begin],
+   s->data_count - begin,
+   MEMTXATTRS_UNSPECIFIED);
+if (res != MEMTX_OK) {
+break;
+}
  dscr.addr += s->data_count - begin;
  if (s->data_count == block_size) {
  s->data_count = 0;
@@ -816,10 +820,13 @@ static void sdhci_do_adma(SDHCIState *s)
  s->data_count = block_size;
  length -= block_size - begin;
  }
-dma_memory_read(s->dma_as, dscr.addr,
-&s->fifo_buffer[begin],
-s->data_count - begin,
-MEMTXATTRS_UNSPECIFIED);
+res = dma_memory_read(s->dma_as, dscr.addr,
+  &s->fifo_buffer[begin],
+  s->data_count - begin,
+  MEMTXATTRS_UNSPECIFIED);
+if (res != MEMTX_OK) {
+break;
+}
  dscr.addr += s->data_count - begin;
  if (s->data_count == block_size) {
  sdbus_write_data(&s->sdbus, s->fifo_buffer, block_size);
@@ -833,7 +840,16 @@ static void sdhci_do_adma(SDHCIState *s)
  }
  }
  }
-s->admasysaddr += dscr.incr;
+if (res != MEMTX_OK) {
+if (s->errintstsen & SDHC_EISEN_ADMAERR) {
+trace_sdhci_error("Set ADMA error flag");
+s->errintsts |= SDHC_EIS_ADMAERR;
+s->norintsts |= SDHC_NIS_ERR;
+}
+sdhci_update_irq(s);
+} else {
+s->admasysaddr += dscr.incr;
+}
  break;
  case SDHC_ADMA_ATTR_ACT_LINK:   /* link to next descriptor table */
  s->admasysaddr = dscr.addr;


Patch looks sane to me:

Reviewed-by: Thomas Huth 

Are you still considering it or did you drop this from your TODO list? 
(since it was just marked as RFC?)


 Thomas




Re: [PATCH 0/4] iotests: finalize switch to async QMP

2022-03-18 Thread John Snow
On Fri, Mar 18, 2022 at 12:32 PM Hanna Reitz  wrote:
>
> On 08.02.22 20:52, John Snow wrote:
> > Squeak Squeak...
> >
> > ...Any objections to me staging this?
> >
> > (This patchset removes the accommodations in iotests for allowing
> > either library to run and always forces the new one. Point of no
> > return for iotests.)
>
> I took this as “if I don’t reply, that’ll be reply enough” :)
>
> Looks to me like the rebase is minimal (just shuffling the imports in
> patch 4 a bit), so I guess this’ll help even before you resend:
>
> Acked-by: Hanna Reitz 
>

Great, thanks! I just didn't want to pull the rug out from under
anyone on this and really wanted an explicit "yes, sure".

You're the best!

--js




[PATCH for-7.1 9/9] hw/ppc/spapr_drc.c: remove spapr_drc_index()

2022-03-18 Thread Daniel Henrique Barboza
The only remaining caller of this function is the initialization of
drc->index in spapr_dr_connector_new().

Open code the body of the function inside spapr_dr_connector_new() and
remove spapr_drc_index().

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 23 ++-
 include/hw/ppc/spapr_drc.h |  1 -
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 11a49620c8..8c8654121c 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -39,18 +39,6 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc)
 return 1 << drck->typeshift;
 }
 
-uint32_t spapr_drc_index(SpaprDrc *drc)
-{
-SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-
-/* no set format for a drc index: it only needs to be globally
- * unique. this is how we encode the DRC type on bare-metal
- * however, so might as well do that here
- */
-return (drck->typeshift << DRC_INDEX_TYPE_SHIFT)
-| (drc->id & DRC_INDEX_ID_MASK);
-}
-
 static void spapr_drc_release(SpaprDrc *drc)
 {
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
@@ -546,11 +534,20 @@ SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
  uint32_t id)
 {
 SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
+SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 g_autofree char *prop_name = NULL;
 
 drc->id = id;
 drc->owner = owner;
-drc->index = spapr_drc_index(drc);
+
+/*
+ * No set format for a drc index: it only needs to be globally
+ * unique. This is how we encode the DRC type on bare-metal
+ * however, so might as well do that here.
+ */
+drc->index = (drck->typeshift << DRC_INDEX_TYPE_SHIFT) |
+ (drc->id & DRC_INDEX_ID_MASK);
+
 prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", drc->index);
 object_property_add_child(owner, prop_name, OBJECT(drc));
 object_unref(OBJECT(drc));
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 93825e47a6..33cdb3cc20 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -228,7 +228,6 @@ static inline bool spapr_drc_hotplugged(DeviceState *dev)
 /* Returns true if an unplug request completed */
 bool spapr_drc_reset(SpaprDrc *drc);
 
-uint32_t spapr_drc_index(SpaprDrc *drc);
 SpaprDrcType spapr_drc_type(SpaprDrc *drc);
 
 SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
-- 
2.35.1




[PATCH for-7.1 6/9] hw/ppc/spapr_events.c: use drc->index

2022-03-18 Thread Daniel Henrique Barboza
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_events.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 630e86282c..d41f4e47c0 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -675,7 +675,7 @@ void spapr_hotplug_req_add_by_index(SpaprDrc *drc)
 SpaprDrcType drc_type = spapr_drc_type(drc);
 union drc_identifier drc_id;
 
-drc_id.index = spapr_drc_index(drc);
+drc_id.index = drc->index;
 spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
 RTAS_LOG_V6_HP_ACTION_ADD, drc_type, &drc_id);
 }
@@ -685,7 +685,7 @@ void spapr_hotplug_req_remove_by_index(SpaprDrc *drc)
 SpaprDrcType drc_type = spapr_drc_type(drc);
 union drc_identifier drc_id;
 
-drc_id.index = spapr_drc_index(drc);
+drc_id.index = drc->index;
 spapr_hotplug_req_event(RTAS_LOG_V6_HP_ID_DRC_INDEX,
 RTAS_LOG_V6_HP_ACTION_REMOVE, drc_type, &drc_id);
 }
-- 
2.35.1




[PATCH for-7.1 5/9] hw/ppc/spapr.c: use drc->index

2022-03-18 Thread Daniel Henrique Barboza
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 953fc65fa8..6aab04787d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -419,7 +419,7 @@ static int spapr_dt_dynamic_memory_v2(SpaprMachineState *spapr, void *fdt,
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, cur_addr / lmb_size);
 g_assert(drc);
 elem = spapr_get_drconf_cell((addr - cur_addr) / lmb_size,
- cur_addr, spapr_drc_index(drc), -1, 0);
+ cur_addr, drc->index, -1, 0);
 QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
 nr_entries++;
 }
@@ -428,7 +428,7 @@ static int spapr_dt_dynamic_memory_v2(SpaprMachineState *spapr, void *fdt,
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, addr / lmb_size);
 g_assert(drc);
 elem = spapr_get_drconf_cell(size / lmb_size, addr,
- spapr_drc_index(drc), node,
+ drc->index, node,
  (SPAPR_LMB_FLAGS_ASSIGNED |
   SPAPR_LMB_FLAGS_HOTREMOVABLE));
 QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
@@ -441,7 +441,7 @@ static int spapr_dt_dynamic_memory_v2(SpaprMachineState *spapr, void *fdt,
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB, cur_addr / lmb_size);
 g_assert(drc);
 elem = spapr_get_drconf_cell((mem_end - cur_addr) / lmb_size,
- cur_addr, spapr_drc_index(drc), -1, 0);
+ cur_addr, drc->index, -1, 0);
 QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
 nr_entries++;
 }
@@ -497,7 +497,7 @@ static int spapr_dt_dynamic_memory(SpaprMachineState *spapr, void *fdt,
 
 dynamic_memory[0] = cpu_to_be32(addr >> 32);
 dynamic_memory[1] = cpu_to_be32(addr & 0x);
-dynamic_memory[2] = cpu_to_be32(spapr_drc_index(drc));
+dynamic_memory[2] = cpu_to_be32(drc->index);
 dynamic_memory[3] = cpu_to_be32(0); /* reserved */
 dynamic_memory[4] = cpu_to_be32(spapr_pc_dimm_node(dimms, addr));
 if (memory_region_present(get_system_memory(), addr)) {
@@ -663,14 +663,12 @@ static void spapr_dt_cpu(CPUState *cs, void *fdt, int offset,
 uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
 int compat_smt = MIN(smp_threads, ppc_compat_max_vthreads(cpu));
 SpaprDrc *drc;
-int drc_index;
 uint32_t radix_AP_encodings[PPC_PAGE_SIZES_MAX_SZ];
 int i;
 
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_CPU, index);
 if (drc) {
-drc_index = spapr_drc_index(drc);
-_FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
+_FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc->index)));
 }
 
 _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
@@ -3448,7 +3446,7 @@ int spapr_lmb_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
 uint64_t addr;
 uint32_t node;
 
-addr = spapr_drc_index(drc) * SPAPR_MEMORY_BLOCK_SIZE;
+addr = drc->index * SPAPR_MEMORY_BLOCK_SIZE;
 node = object_property_get_uint(OBJECT(drc->dev), PC_DIMM_NODE_PROP,
 &error_abort);
 *fdt_start_offset = spapr_dt_memory_node(spapr, fdt, node, addr,
@@ -3491,7 +3489,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
 g_assert(drc);
 spapr_hotplug_req_add_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
nr_lmbs,
-   spapr_drc_index(drc));
+   drc->index);
 } else {
 spapr_hotplug_req_add_by_count(SPAPR_DR_CONNECTOR_TYPE_LMB,
nr_lmbs);
@@ -3791,7 +3789,7 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
   addr_start / SPAPR_MEMORY_BLOCK_SIZE);
 spapr_hotplug_req_remove_by_count_indexed(SPAPR_DR_CONNECTOR_TYPE_LMB,
-  nr_lmbs, spapr_drc_index(drc));
+  nr_lmbs, drc->index);
 }
 
 /* Callback to be called during DRC release. */
-- 
2.35.1




[PATCH for-7.1 8/9] hw/ppc/spapr_pci.c: use drc->index

2022-03-18 Thread Daniel Henrique Barboza
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_pci.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5bfd4aa9e5..f9338af071 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1419,8 +1419,7 @@ static int spapr_dt_pci_device(SpaprPhbState *sphb, PCIDevice *dev,
 g_free(loc_code);
 
 if (drc) {
-_FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index",
-  spapr_drc_index(drc)));
+_FDT(fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc->index));
 }
 
 if (msi_present(dev)) {
@@ -2429,7 +2428,7 @@ int spapr_dt_phb(SpaprMachineState *spapr, SpaprPhbState *phb,
 
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, phb->index);
 if (drc) {
-uint32_t drc_index = cpu_to_be32(spapr_drc_index(drc));
+uint32_t drc_index = cpu_to_be32(drc->index);
 
 _FDT(fdt_setprop(fdt, bus_off, "ibm,my-drc-index", &drc_index,
  sizeof(drc_index)));
-- 
2.35.1




[PATCH for-7.1 4/9] hw/ppc/spapr_drc.c: use drc->index

2022-03-18 Thread Daniel Henrique Barboza
After this patch, the only place where spapr_drc_index() is still being
used in this file is in the drc->index initialization.

We can't get rid of spapr_drc_index() yet because of external callers.
We'll handle them next.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 1d751fe9cc..11a49620c8 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -509,7 +509,7 @@ static const VMStateDescription vmstate_spapr_drc = {
 static void drc_realize(DeviceState *d, Error **errp)
 {
 SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
-g_autofree gchar *link_name = g_strdup_printf("%x", spapr_drc_index(drc));
+g_autofree gchar *link_name = g_strdup_printf("%x", drc->index);
 Object *root_container;
 const char *child_name;
 
@@ -526,15 +526,14 @@ static void drc_realize(DeviceState *d, Error **errp)
 trace_spapr_drc_realize_child(drc->index, child_name);
 object_property_add_alias(root_container, link_name,
   drc->owner, child_name);
-vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
- drc);
+vmstate_register(VMSTATE_IF(drc), drc->index, &vmstate_spapr_drc, drc);
 trace_spapr_drc_realize_complete(drc->index);
 }
 
 static void drc_unrealize(DeviceState *d)
 {
 SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
-g_autofree gchar *name = g_strdup_printf("%x", spapr_drc_index(drc));
+g_autofree gchar *name = g_strdup_printf("%x", drc->index);
 Object *root_container;
 
 trace_spapr_drc_unrealize(drc->index);
@@ -552,8 +551,7 @@ SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
 drc->id = id;
 drc->owner = owner;
 drc->index = spapr_drc_index(drc);
-prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
-spapr_drc_index(drc));
+prop_name = g_strdup_printf("dr-connector[%"PRIu32"]", drc->index);
 object_property_add_child(owner, prop_name, OBJECT(drc));
 object_unref(OBJECT(drc));
 qdev_realize(DEVICE(drc), NULL, NULL);
@@ -633,8 +631,7 @@ static void realize_physical(DeviceState *d, Error **errp)
 return;
 }
 
-vmstate_register(VMSTATE_IF(drcp),
- spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
+vmstate_register(VMSTATE_IF(drcp), SPAPR_DR_CONNECTOR(drcp)->index,
  &vmstate_spapr_drc_physical, drcp);
 qemu_register_reset(drc_physical_reset, drcp);
 }
@@ -883,7 +880,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
 drc_count++;
 
 /* ibm,drc-indexes */
-drc_index = cpu_to_be32(spapr_drc_index(drc));
+drc_index = cpu_to_be32(drc->index);
 g_array_append_val(drc_indexes, drc_index);
 
 /* ibm,drc-power-domains */
-- 
2.35.1




Re: [PATCH v4] tests: Do not treat the iotests as separate meson test target anymore

2022-03-18 Thread Thomas Huth

On 18/03/2022 18.04, Hanna Reitz wrote:

On 10.03.22 08:50, Thomas Huth wrote:

If there is a failing iotest, the output is currently not logged to
the console anymore. To get this working again, we need to run the
meson test runner with "--print-errorlogs" (and without "--verbose"
due to a current meson bug that will be fixed here:
https://github.com/mesonbuild/meson/commit/c3f145ca2b9f5.patch ).
We could update the "meson test" call in tests/Makefile.include,
but actually it's nicer and easier if we simply do not treat the
iotests as separate test target anymore and integrate them along
with the other test suites. This has the disadvantage of not getting
the detailed progress indication there anymore, but since that was
only working right in single-threaded "make -j1" mode anyway, it's
not a huge loss right now.

Signed-off-by: Thomas Huth 
---
  v4: updated commit description

  meson.build    | 6 +++---
  scripts/mtest2make.py  | 4 
  tests/Makefile.include | 9 +
  3 files changed, 4 insertions(+), 15 deletions(-)


I can’t really say I understand what’s going on in this patch and around it,
but I can confirm that before this patch, fail diffs aren’t printed; but
afterwards, they are.


It's a bug in Meson. It will be fixed in 0.61.3 and later (so this patch
won't be needed there anymore), but the update to meson 0.61.3 caused other
problems, so we also can't do that right now... So I'm not sure whether we
now want to include this patch, wait for a better version of Meson, or
rather revert the TAP support / meson integration again for 7.0?


 Thomas




[PATCH for-7.1 7/9] hw/ppc/spapr_nvdimm.c: use drc->index

2022-03-18 Thread Daniel Henrique Barboza
Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_nvdimm.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c
index c4c97da5de..5acb761220 100644
--- a/hw/ppc/spapr_nvdimm.c
+++ b/hw/ppc/spapr_nvdimm.c
@@ -145,7 +145,6 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt,
 int child_offset;
 char *buf;
 SpaprDrc *drc;
-uint32_t drc_idx;
 uint32_t node = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_NODE_PROP,
  &error_abort);
 uint64_t slot = object_property_get_uint(OBJECT(nvdimm), PC_DIMM_SLOT_PROP,
@@ -157,15 +156,13 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt,
 drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PMEM, slot);
 g_assert(drc);
 
-drc_idx = spapr_drc_index(drc);
-
-buf = g_strdup_printf("ibm,pmemory@%x", drc_idx);
+buf = g_strdup_printf("ibm,pmemory@%x", drc->index);
 child_offset = fdt_add_subnode(fdt, parent_offset, buf);
 g_free(buf);
 
 _FDT(child_offset);
 
-_FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc_idx)));
+_FDT((fdt_setprop_cell(fdt, child_offset, "reg", drc->index)));
 _FDT((fdt_setprop_string(fdt, child_offset, "compatible", "ibm,pmemory")));
 _FDT((fdt_setprop_string(fdt, child_offset, "device_type", 
"ibm,pmemory")));
 
@@ -175,7 +172,8 @@ static int spapr_dt_nvdimm(SpaprMachineState *spapr, void *fdt,
 _FDT((fdt_setprop_string(fdt, child_offset, "ibm,unit-guid", buf)));
 g_free(buf);
 
-_FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index", drc_idx)));
+_FDT((fdt_setprop_cell(fdt, child_offset, "ibm,my-drc-index",
+   drc->index)));
 
 _FDT((fdt_setprop_u64(fdt, child_offset, "ibm,block-size",
   SPAPR_MINIMUM_SCM_BLOCK_SIZE)));
-- 
2.35.1




[PATCH for-7.1 3/9] hw/ppc/spapr_drc.c: use drc->index in trace functions

2022-03-18 Thread Daniel Henrique Barboza
All the trace calls in the file are using spapr_drc_index(). Let's
convert them to use drc->index.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 1a5e9003b2..1d751fe9cc 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -81,8 +81,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
 drc->state = SPAPR_DRC_STATE_PHYSICAL_POWERON;
 
 if (drc->unplug_requested) {
-uint32_t drc_index = spapr_drc_index(drc);
-trace_spapr_drc_set_isolation_state_finalizing(drc_index);
+trace_spapr_drc_set_isolation_state_finalizing(drc->index);
 spapr_drc_release(drc);
 }
 
@@ -247,8 +246,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
 
 drc->state = SPAPR_DRC_STATE_LOGICAL_UNUSABLE;
 if (drc->unplug_requested) {
-uint32_t drc_index = spapr_drc_index(drc);
-trace_spapr_drc_set_allocation_state_finalizing(drc_index);
+trace_spapr_drc_set_allocation_state_finalizing(drc->index);
 spapr_drc_release(drc);
 }
 
@@ -390,7 +388,7 @@ static void prop_get_fdt(Object *obj, Visitor *v, const char *name,
 
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
 {
-trace_spapr_drc_attach(spapr_drc_index(drc));
+trace_spapr_drc_attach(drc->index);
 
 g_assert(!drc->dev);
 g_assert((drc->state == SPAPR_DRC_STATE_LOGICAL_UNUSABLE)
@@ -408,14 +406,14 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
 {
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
-trace_spapr_drc_unplug_request(spapr_drc_index(drc));
+trace_spapr_drc_unplug_request(drc->index);
 
 g_assert(drc->dev);
 
 drc->unplug_requested = true;
 
 if (drc->state != drck->empty_state) {
-trace_spapr_drc_awaiting_quiesce(spapr_drc_index(drc));
+trace_spapr_drc_awaiting_quiesce(drc->index);
 return;
 }
 
@@ -427,7 +425,7 @@ bool spapr_drc_reset(SpaprDrc *drc)
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 bool unplug_completed = false;
 
-trace_spapr_drc_reset(spapr_drc_index(drc));
+trace_spapr_drc_reset(drc->index);
 
 /* immediately upon reset we can safely assume DRCs whose devices
  * are pending removal can be safely removed.
@@ -515,7 +513,7 @@ static void drc_realize(DeviceState *d, Error **errp)
 Object *root_container;
 const char *child_name;
 
-trace_spapr_drc_realize(spapr_drc_index(drc));
+trace_spapr_drc_realize(drc->index);
 /* NOTE: we do this as part of realize/unrealize due to the fact
  * that the guest will communicate with the DRC via RTAS calls
  * referencing the global DRC index. By unlinking the DRC
@@ -525,12 +523,12 @@ static void drc_realize(DeviceState *d, Error **errp)
  */
 root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
 child_name = object_get_canonical_path_component(OBJECT(drc));
-trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name);
+trace_spapr_drc_realize_child(drc->index, child_name);
 object_property_add_alias(root_container, link_name,
   drc->owner, child_name);
 vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
  drc);
-trace_spapr_drc_realize_complete(spapr_drc_index(drc));
+trace_spapr_drc_realize_complete(drc->index);
 }
 
 static void drc_unrealize(DeviceState *d)
@@ -539,7 +537,7 @@ static void drc_unrealize(DeviceState *d)
 g_autofree gchar *name = g_strdup_printf("%x", spapr_drc_index(drc));
 Object *root_container;
 
-trace_spapr_drc_unrealize(spapr_drc_index(drc));
+trace_spapr_drc_unrealize(drc->index);
 vmstate_unregister(VMSTATE_IF(drc), _spapr_drc, drc);
 root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
 object_property_del(root_container, name);
@@ -986,7 +984,7 @@ static uint32_t rtas_set_isolation_state(uint32_t idx, uint32_t state)
 return RTAS_OUT_NO_SUCH_INDICATOR;
 }
 
-trace_spapr_drc_set_isolation_state(spapr_drc_index(drc), state);
+trace_spapr_drc_set_isolation_state(drc->index, state);
 
 drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
@@ -1010,7 +1008,7 @@ static uint32_t rtas_set_allocation_state(uint32_t idx, uint32_t state)
 return RTAS_OUT_NO_SUCH_INDICATOR;
 }
 
-trace_spapr_drc_set_allocation_state(spapr_drc_index(drc), state);
+trace_spapr_drc_set_allocation_state(drc->index, state);
 
 switch (state) {
 case SPAPR_DR_ALLOCATION_STATE_USABLE:
@@ -1232,10 +1230,8 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 case FDT_END_NODE:
 drc->ccs_depth--;
 if (drc->ccs_depth == 0) {
-uint32_t drc_index = spapr_drc_index(drc);
-
 /* done sending the device tree, move to configured state */
-

[PATCH for-7.1 1/9] hw/ppc/spapr_drc.c: add drc->index

2022-03-18 Thread Daniel Henrique Barboza
The DRC index is a unique identifier that is used across all the spapr
code. Its value is given by spapr_drc_index() as follows:

return (drck->typeshift << DRC_INDEX_TYPE_SHIFT)
| (drc->id & DRC_INDEX_ID_MASK);

We see that nothing in spapr_drc_index() varies with the machine or
device state: drc->id is set in spapr_dr_connector_new() and is read
only, drck->typeshift depends only on the DRC class type and doesn't
change either, and the two macros are constants. Nevertheless,
spapr_drc_index() is called multiple times across the spapr files,
meaning that we're always recalculating the same value.

This patch adds a new SpaprDrc attribute called 'index'. drc->index is
initialized with spapr_drc_index() at creation time and replaces the
repetitive spapr_drc_index() usage we have today.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 1 +
 include/hw/ppc/spapr_drc.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 76bc5d42a0..1b8c797192 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -561,6 +561,7 @@ SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
 
 drc->id = id;
 drc->owner = owner;
+drc->index = spapr_drc_index(drc);
 prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
 spapr_drc_index(drc));
 object_property_add_child(owner, prop_name, OBJECT(drc));
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 02a63b3666..93825e47a6 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -173,6 +173,7 @@ typedef struct SpaprDrc {
 DeviceState parent;
 
 uint32_t id;
+uint32_t index;
 Object *owner;
 
 uint32_t state;
-- 
2.35.1




[PATCH for-7.1 2/9] hw/ppc/spapr_drc.c: redefine 'index' SpaprDRC property

2022-03-18 Thread Daniel Henrique Barboza
'index' is currently defined as a uint32 retrieved by prop_get_index().
Change it to instead return the value of drc->index directly, the same
way it's done with the 'id' property that returns drc->id.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_drc.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 1b8c797192..1a5e9003b2 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -316,14 +316,6 @@ static SpaprDREntitySense logical_entity_sense(SpaprDrc *drc)
 }
 }
 
-static void prop_get_index(Object *obj, Visitor *v, const char *name,
-   void *opaque, Error **errp)
-{
-SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj);
-uint32_t value = spapr_drc_index(drc);
-visit_type_uint32(v, name, , errp);
-}
-
 static void prop_get_fdt(Object *obj, Visitor *v, const char *name,
  void *opaque, Error **errp)
 {
@@ -577,8 +569,8 @@ static void spapr_dr_connector_instance_init(Object *obj)
 SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
 object_property_add_uint32_ptr(obj, "id", >id, OBJ_PROP_FLAG_READ);
-object_property_add(obj, "index", "uint32", prop_get_index,
-NULL, NULL, NULL);
+object_property_add_uint32_ptr(obj, "index", >index,
+   OBJ_PROP_FLAG_READ);
 object_property_add(obj, "fdt", "struct", prop_get_fdt,
 NULL, NULL, NULL);
 drc->state = drck->empty_state;
-- 
2.35.1




[PATCH for-7.1 0/9] spapr: add drc->index, remove spapr_drc_index()

2022-03-18 Thread Daniel Henrique Barboza
Hi,

I decided to make this change after realizing that (1) spapr_drc_index()
always returns the same index value for a given DRC regardless of
machine or device state and (2) we call spapr_drc_index() a lot
throughout the spapr code.

This means that a new attribute storing the generated index, set at DRC
object creation time, will spare us from calling a function that always
returns the same value.

No functional changes were made.

 
Daniel Henrique Barboza (9):
  hw/ppc/spapr_drc.c: add drc->index
  hw/ppc/spapr_drc.c: redefine 'index' SpaprDRC property
  hw/ppc/spapr_drc.c: use drc->index in trace functions
  hw/ppc/spapr_drc.c: use drc->index
  hw/ppc/spapr.c: use drc->index
  hw/ppc/spapr_events.c: use drc->index
  hw/ppc/spapr_nvdimm.c: use drc->index
  hw/ppc/spapr_pci.c: use drc->index
  hw/ppc/spapr_drc.c: remove spapr_drc_index()

 hw/ppc/spapr.c | 18 -
 hw/ppc/spapr_drc.c | 79 +++---
 hw/ppc/spapr_events.c  |  4 +-
 hw/ppc/spapr_nvdimm.c  | 10 ++---
 hw/ppc/spapr_pci.c |  5 +--
 include/hw/ppc/spapr_drc.h |  2 +-
 6 files changed, 48 insertions(+), 70 deletions(-)

-- 
2.35.1




[PATCH] Fix 'writeable' typos

2022-03-18 Thread Peter Maydell
We have about 25 instances of the typo/variant spelling 'writeable',
and over 500 of the more common 'writable'.  Standardize on the
latter.

Change produced with:

 sed -i -e 's/writeable/writable/g' $(git grep -l writeable)

and then hand-undoing the instance in linux-headers/linux/kvm.h.

All these changes are in comments or documentation, except for the
two local variables in accel/hvf/hvf-accel-ops.c and
accel/kvm/kvm-all.c.


Signed-off-by: Peter Maydell 
---
 docs/interop/vhost-user.rst| 2 +-
 docs/specs/vmgenid.txt | 4 ++--
 hw/scsi/mfi.h  | 2 +-
 target/arm/internals.h | 2 +-
 accel/hvf/hvf-accel-ops.c  | 4 ++--
 accel/kvm/kvm-all.c| 4 ++--
 accel/tcg/user-exec.c  | 6 +++---
 hw/acpi/ghes.c | 2 +-
 hw/intc/arm_gicv3_cpuif.c  | 2 +-
 hw/intc/arm_gicv3_dist.c   | 2 +-
 hw/intc/arm_gicv3_redist.c | 2 +-
 hw/intc/riscv_aclint.c | 2 +-
 hw/intc/riscv_aplic.c  | 2 +-
 hw/pci/shpc.c  | 2 +-
 hw/timer/sse-timer.c   | 2 +-
 target/arm/gdbstub.c   | 2 +-
 target/i386/cpu-sysemu.c   | 2 +-
 target/s390x/ioinst.c  | 2 +-
 python/qemu/machine/machine.py | 2 +-
 tests/tcg/x86_64/system/boot.S | 2 +-
 20 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 4dbc84fd001..09dad5aea9f 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -222,7 +222,7 @@ Virtio device config space
 :size: a 32-bit configuration space access size in bytes
 
 :flags: a 32-bit value:
-  - 0: Vhost master messages used for writeable fields
+  - 0: Vhost master messages used for writable fields
   - 1: Vhost master messages used for live migration
 
 :payload: Size bytes array holding the contents of the virtio
diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
index aa9f5186767..80ff69f31cc 100644
--- a/docs/specs/vmgenid.txt
+++ b/docs/specs/vmgenid.txt
@@ -153,7 +153,7 @@ change the contents of the memory at runtime, specifically when starting a
 backed-up or snapshotted image.  In order to do this, QEMU must know the
 address that has been allocated.
 
-The mechanism chosen for this memory sharing is writeable fw_cfg blobs.
+The mechanism chosen for this memory sharing is writable fw_cfg blobs.
 These are data object that are visible to both QEMU and guests, and are
 addressable as sequential files.
 
@@ -164,7 +164,7 @@ Two fw_cfg blobs are used in this case:
 /etc/vmgenid_guid - contains the actual VM Generation ID GUID
   - read-only to the guest
 /etc/vmgenid_addr - contains the address of the downloaded vmgenid blob
-  - writeable by the guest
+  - writable by the guest
 
 
 QEMU sends the following commands to the guest at startup:
diff --git a/hw/scsi/mfi.h b/hw/scsi/mfi.h
index e67a5c0b477..0b4ee53dfc0 100644
--- a/hw/scsi/mfi.h
+++ b/hw/scsi/mfi.h
@@ -633,7 +633,7 @@ struct mfi_ctrl_props {
   * metadata and user data
   * 1=5%, 2=10%, 3=15% and so on
   */
-uint8_t viewSpace;   /* snapshot writeable VIEWs
+uint8_t viewSpace;   /* snapshot writable VIEWs
   * capacity as a % of source LD
   * capacity. 0=READ only
   * 1=5%, 2=10%, 3=15% and so on
diff --git a/target/arm/internals.h b/target/arm/internals.h
index a34be2e4595..3f573d53d66 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1255,7 +1255,7 @@ enum MVEECIState {
 #define PMCRP   0x2
 #define PMCRE   0x1
 /*
- * Mask of PMCR bits writeable by guest (not including WO bits like C, P,
+ * Mask of PMCR bits writable by guest (not including WO bits like C, P,
  * which can be written as 1 to trigger behaviour but which stay RAZ).
  */
 #define PMCR_WRITEABLE_MASK (PMCRLC | PMCRDP | PMCRX | PMCRD | PMCRE)
diff --git a/accel/hvf/hvf-accel-ops.c b/accel/hvf/hvf-accel-ops.c
index 54457c76c2f..684b30dd26e 100644
--- a/accel/hvf/hvf-accel-ops.c
+++ b/accel/hvf/hvf-accel-ops.c
@@ -120,12 +120,12 @@ static void hvf_set_phys_mem(MemoryRegionSection *section, bool add)
 {
 hvf_slot *mem;
 MemoryRegion *area = section->mr;
-bool writeable = !area->readonly && !area->rom_device;
+bool writable = !area->readonly && !area->rom_device;
 hv_memory_flags_t flags;
 uint64_t page_size = qemu_real_host_page_size;
 
 if (!memory_region_is_ram(area)) {
-if (writeable) {
+if (writable) {
 return;
 } else if (!memory_region_is_romd(area)) {
 /*
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 27864dfaeaa..52c52accede 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1346,13 +1346,13 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 KVMSlot *mem;
 int err;
 MemoryRegion *mr = 

Re: [RFC PATCH v3 33/36] i386/tdx: Only configure MSR_IA32_UCODE_REV in kvm_init_msrs() for TDs

2022-03-18 Thread Isaku Yamahata
On Thu, Mar 17, 2022 at 09:59:10PM +0800,
Xiaoyao Li  wrote:

> For TDs, only MSR_IA32_UCODE_REV in kvm_init_msrs() can be configured
> by VMM, while the features enumerated/controlled by other MSRs except
> MSR_IA32_UCODE_REV in kvm_init_msrs() are not under control of VMM.
> 
> Only configure MSR_IA32_UCODE_REV for TDs.

non-TDs?
-- 
Isaku Yamahata 



Re: [PATCH] sh4: Replace TAB indentations with spaces

2022-03-18 Thread Thomas Huth

On 20/06/2021 19.54, Ahmed Abouzied wrote:

Replaces TABs with spaces, making sure to have a consistent coding style
of 4 space indentations in the SH4 subsystem.

Signed-off-by: Ahmed Abouzied 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/376
---

...

@@ -1705,101 +1705,101 @@ static void _decode_opc(DisasContext * ctx)
  }
  return;
  case 0xf00d: /* fsts FPUL,FRn - FPSCR: Nothing */
-   CHECK_FPU_ENABLED
+CHECK_FPU_ENABLED
  tcg_gen_mov_i32(FREG(B11_8), cpu_fpul);
-   return;
+return;
  case 0xf01d: /* flds FRm,FPUL - FPSCR: Nothing */
-   CHECK_FPU_ENABLED
+CHECK_FPU_ENABLED
  tcg_gen_mov_i32(cpu_fpul, FREG(B11_8));
-   return;
+return;


Sorry, it's a very late reply ... but in case you're still interested in 
fixing this: It seems like at least some of these files used TABs as 8 
spaces, not as 4 spaces, so after applying your patch, the indentation seems 
to be wrong in all places. Please double-check the look of the files before 
sending! Thanks!


 Thomas




Re: [RFC PATCH v3 18/36] i386/tdvf: Introduce function to parse TDVF metadata

2022-03-18 Thread Isaku Yamahata
On Thu, Mar 17, 2022 at 09:58:55PM +0800,
Xiaoyao Li  wrote:

> diff --git a/hw/i386/tdvf.c b/hw/i386/tdvf.c
> new file mode 100644
> index ..02da1d2c12dd
> --- /dev/null
> +++ b/hw/i386/tdvf.c
> @@ -0,0 +1,196 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> +
> + * Copyright (c) 2020 Intel Corporation
> + * Author: Isaku Yamahata 
> + *
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see .
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/i386/pc.h"
> +#include "hw/i386/tdvf.h"
> +#include "sysemu/kvm.h"
> +
> +#define TDX_METADATA_GUID "e47a6535-984a-4798-865e-4685a7bf8ec2"
> +#define TDX_METADATA_VERSION1
> +#define TDVF_SIGNATURE_LE32 0x46564454 /* TDVF as little endian */

_LE32 doesn't make sense: QEMU doesn't provide a macro version of the
byteswap helpers. Let's do the conversion at the usage point instead.


> +
> +typedef struct {
> +uint32_t DataOffset;
> +uint32_t RawDataSize;
> +uint64_t MemoryAddress;
> +uint64_t MemoryDataSize;
> +uint32_t Type;
> +uint32_t Attributes;
> +} TdvfSectionEntry;
> +
> +typedef struct {
> +uint32_t Signature;
> +uint32_t Length;
> +uint32_t Version;
> +uint32_t NumberOfSectionEntries;
> +TdvfSectionEntry SectionEntries[];
> +} TdvfMetadata;
> +
> +struct tdx_metadata_offset {
> +uint32_t offset;
> +};
> +
> +static TdvfMetadata *tdvf_get_metadata(void *flash_ptr, int size)
> +{
> +TdvfMetadata *metadata;
> +uint32_t offset = 0;
> +uint8_t *data;
> +
> +if ((uint32_t) size != size) {
> +return NULL;
> +}
> +
> +if (pc_system_ovmf_table_find(TDX_METADATA_GUID, , NULL)) {
> +offset = size - le32_to_cpu(((struct tdx_metadata_offset *)data)->offset);
> +
> +if (offset + sizeof(*metadata) > size) {
> +return NULL;
> +}
> +} else {
> +error_report("Cannot find TDX_METADATA_GUID\n");
> +return NULL;
> +}
> +
> +metadata = flash_ptr + offset;
> +
> +/* Finally, verify the signature to determine if this is a TDVF image. */
> +   if (metadata->Signature != TDVF_SIGNATURE_LE32) {


For consistency:

metadata->Signature = le32_to_cpu(metadata->Signature);
if (metadata->Signature != TDVF_SIGNATURE) ...

-- 
Isaku Yamahata 



Re: QEMU device refcounting when device creates a container MR

2022-03-18 Thread Igor Mammedov
On Thu, 10 Mar 2022 17:11:14 +
Peter Maydell  wrote:

> On Thu, 10 Mar 2022 at 16:30, Igor Mammedov  wrote:
> >
> > Do On Thu, 10 Mar 2022 16:05:24 +
> > Peter Maydell  wrote:
> >  
> > > On Thu, 10 Mar 2022 at 15:36, Igor Mammedov  wrote:  
> > > >
> > > > On Wed, 9 Mar 2022 16:56:21 +
> > > > Peter Maydell  wrote:  
> > > > > ...also, in the device-introspect-test where I see this problem,
> > > > > unrealize is never going to be called anyway, because the device
> > > > > is only put through "instance_init" and then dereffed (which
> > > > > does not result in instance_finalize being called, because the
> > > > > refcount is still non-zero).  
> > > >
> > > > question is why introspected device is deferred instead of being
> > > > destroyed if it's no longer needed?  
> > >
> > > ...because the reference count is not zero.
> > >
> > > What is supposed to happen is:
> > >  * device is created (inited), and has refcount of 1
> > >  * introspection code does its thing
> > >  * introspection code derefs the device, and it gets deinited
> > >
> > > This bug means that when the device is inited it has a refcount
> > > that is too high, and so despite the code that creates it
> > > correctly dereffing it, it's still lying around.  
> >
> > looks like ref count leak somewhere, instance_finalize() take care
> > of cleaning up instance_init() actions.  
> 
> If you read the rest of the thread, we know why the refcount
> is too high. And instance_finalize *is never called*, so it
> cannot clean up what instance_init has done.
> 
> > Do you have an example/reproducer?  
> 
> Yes, see the thread -- device-introspect-test shows it.
> (You can put printfs in ehci_sysbus_init and ehci_sysbus_finalize
> and see that for some devices we don't ever call finalize.)

Something like the following might work.

The basic idea is to avoid cyclic references when the subregion and the
container have the same owner, and to properly handle references to the
subregion itself when it's added to a container. The latter is necessary
to prevent the subregion from being freed (when it's removed as a child
property) while the container still exists and references it, so that
when the container is later finalized it can still call
memory_region_del_subregion() on a still-alive subregion.


diff --git a/softmmu/memory.c b/softmmu/memory.c
index 8060c6de78..499c20fcef 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2527,8 +2527,11 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
 MemoryRegion *other;
 
 memory_region_transaction_begin();
+object_ref(subregion);
 
-memory_region_ref(subregion);
+if (subregion->container->owner != subregion->owner) {
+memory_region_ref(subregion);
+}
 QTAILQ_FOREACH(other, >subregions, subregions_link) {
 if (subregion->priority >= other->priority) {
 QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2580,14 +2583,17 @@ void memory_region_del_subregion(MemoryRegion *mr,
 
 memory_region_transaction_begin();
 assert(subregion->container == mr);
-subregion->container = NULL;
 for (alias = subregion->alias; alias; alias = alias->alias) {
 alias->mapped_via_alias--;
 assert(alias->mapped_via_alias >= 0);
 }
 QTAILQ_REMOVE(>subregions, subregion, subregions_link);
-memory_region_unref(subregion);
+if (subregion->container->owner != subregion->owner) {
+memory_region_unref(subregion);
+}
+subregion->container = NULL;
 memory_region_update_pending |= mr->enabled && subregion->enabled;
+object_unref(subregion);
 memory_region_transaction_commit();
 }

> 
> -- PMM
> 




Re: [RFC PATCH v3 16/36] i386/tdx: Set kvm_readonly_mem_enabled to false for TDX VM

2022-03-18 Thread Isaku Yamahata
On Thu, Mar 17, 2022 at 09:58:53PM +0800,
Xiaoyao Li  wrote:

> TDX only supports readonly for shared memory but not for private memory.
> 
> In the view of QEMU, it has no idea whether a memslot is used by shared
> memory of private. Thus just mark kvm_readonly_mem_enabled to false to
> TDX VM for simplicity.
> 
> Note, pflash has dependency on readonly capability from KVM while TDX
> wants to reuse pflash interface to load TDVF (as OVMF). Excuse TDX VM
> for readonly check in pflash.
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  hw/i386/pc_sysfw.c| 2 +-
>  target/i386/kvm/tdx.c | 9 +
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index c8b17af95353..75b34d02cb4f 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -245,7 +245,7 @@ void pc_system_firmware_init(PCMachineState *pcms,
>  /* Machine property pflash0 not set, use ROM mode */
>  x86_bios_rom_init(MACHINE(pcms), "bios.bin", rom_memory, false);
>  } else {
> -if (kvm_enabled() && !kvm_readonly_mem_enabled()) {
> +if (kvm_enabled() && (!kvm_readonly_mem_enabled() && !is_tdx_vm())) {

Is this called before tdx_kvm_init()?

Thanks,


>  /*
>   * Older KVM cannot execute from device memory. So, flash
>   * memory cannot be used unless the readonly memory kvm
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 94a9c1ea7e9c..1bb8211e74e6 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -115,6 +115,15 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
>  get_tdx_capabilities();
>  }
>  
> +/*
> + * Set kvm_readonly_mem_allowed to false, because TDX only supports readonly
> + * memory for shared memory but not for private memory. Besides, whether a
> + * memslot is private or shared is not determined by QEMU.
> + *
> + * Thus, just mark readonly memory not supported for simplicity.
> + */
> +kvm_readonly_mem_allowed = false;
> +
>  tdx_guest = tdx;
>  
>  return 0;
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata 



Re: [PATCH v2 5/5] i386/cpu: Free env->xsave_buf in x86_cpu_unrealizefn()

2022-03-18 Thread Mark Kanda

On 3/18/2022 11:32 AM, Philippe Mathieu-Daudé wrote:

On 18/3/22 16:15, Mark Kanda wrote:

vCPU hotunplug related leak reported by Valgrind:

==132362== 4,096 bytes in 1 blocks are definitely lost in loss record 8,440 of 8,549

==132362==    at 0x4C3B15F: memalign (vg_replace_malloc.c:1265)
==132362==    by 0x4C3B288: posix_memalign (vg_replace_malloc.c:1429)
==132362==    by 0xB41195: qemu_try_memalign (memalign.c:53)
==132362==    by 0xB41204: qemu_memalign (memalign.c:73)
==132362==    by 0x7131CB: kvm_init_xsave (kvm.c:1601)
==132362==    by 0x7148ED: kvm_arch_init_vcpu (kvm.c:2031)
==132362==    by 0x91D224: kvm_init_vcpu (kvm-all.c:516)
==132362==    by 0x9242C9: kvm_vcpu_thread_fn (kvm-accel-ops.c:40)
==132362==    by 0xB2EB26: qemu_thread_start (qemu-thread-posix.c:556)
==132362==    by 0x7EB2159: start_thread (in /usr/lib64/libpthread-2.28.so)
==132362==    by 0x9D45DD2: clone (in /usr/lib64/libc-2.28.so)

Signed-off-by: Mark Kanda 
---
  target/i386/cpu.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a88d6554c8..014a716c36 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6572,6 +6572,11 @@ static void x86_cpu_unrealizefn(DeviceState *dev)
  }
    xcc->parent_unrealize(dev);
+
+#if defined(CONFIG_KVM) || defined(CONFIG_HVF)
+    CPUX86State *env = >env;
+    g_free(env->xsave_buf);


This belongs in hvf_arch_vcpu_destroy().

And for KVM, in the (currently missing) kvm_arch_destroy_vcpu().



Will fix in v3.

Thanks Philippe,
-Mark



Re: [PATCH v4] tests: Do not treat the iotests as separate meson test target anymore

2022-03-18 Thread Hanna Reitz

On 10.03.22 08:50, Thomas Huth wrote:

If there is a failing iotest, the output is currently not logged to
the console anymore. To get this working again, we need to run the
meson test runner with "--print-errorlogs" (and without "--verbose"
due to a current meson bug that will be fixed here:
https://github.com/mesonbuild/meson/commit/c3f145ca2b9f5.patch ).
We could update the "meson test" call in tests/Makefile.include,
but actually it's nicer and easier if we simply do not treat the
iotests as separate test target anymore and integrate them along
with the other test suites. This has the disadvantage of not getting
the detailed progress indication there anymore, but since that was
only working right in single-threaded "make -j1" mode anyway, it's
not a huge loss right now.

Signed-off-by: Thomas Huth 
---
  v4: updated commit description

  meson.build| 6 +++---
  scripts/mtest2make.py  | 4 
  tests/Makefile.include | 9 +
  3 files changed, 4 insertions(+), 15 deletions(-)


I can’t really say I understand what’s going on in this patch and around 
it, but I can confirm that before this patch, fail diffs aren’t printed; 
afterwards, they are.  So I’m afraid all I can give is a


Tested-by: Hanna Reitz 

If no one else steps up and you need a tree for this to go in, I’d be up 
for it.


Hanna




Re: [PATCH v2 4/5] cpu: Free cpu->cpu_ases in cpu_exec_unrealizefn()

2022-03-18 Thread Mark Kanda

On 3/18/2022 11:26 AM, Philippe Mathieu-Daudé wrote:

On 18/3/22 16:15, Mark Kanda wrote:

vCPU hotunplug related leak reported by Valgrind:

==132362== 216 bytes in 1 blocks are definitely lost in loss record 7,119 of 8,549

==132362==    at 0x4C3ADBB: calloc (vg_replace_malloc.c:1117)
==132362==    by 0x69EE4CD: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.5600.4)
==132362==    by 0x7E34AF: cpu_address_space_init (physmem.c:751)
==132362==    by 0x45053E: qemu_init_vcpu (cpus.c:635)
==132362==    by 0x76B4A7: x86_cpu_realizefn (cpu.c:6520)
==132362==    by 0x9343ED: device_set_realized (qdev.c:531)
==132362==    by 0x93E26F: property_set_bool (object.c:2273)
==132362==    by 0x93C23E: object_property_set (object.c:1408)
==132362==    by 0x9406DC: object_property_set_qobject (qom-qobject.c:28)
==132362==    by 0x93C5A9: object_property_set_bool (object.c:1477)
==132362==    by 0x933C81: qdev_realize (qdev.c:333)
==132362==    by 0x455E9A: qdev_device_add_from_qdict (qdev-monitor.c:713)

Signed-off-by: Mark Kanda 
---
  cpu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/cpu.c b/cpu.c
index be1f8b074c..6a3475022f 100644
--- a/cpu.c
+++ b/cpu.c
@@ -173,6 +173,7 @@ void cpu_exec_unrealizefn(CPUState *cpu)
  if (tcg_enabled()) {
  tcg_exec_unrealizefn(cpu);
  }
+    g_free(cpu->cpu_ases);


There is an API mismatch here. We miss cpu_address_space_destroy().

cpu_exec_unrealizefn() then calls cpu_address_space_destroy(),
and cpu_address_space_destroy() frees cpu_ases.

Otherwise other cpu_address_space_init() calls will keep leaking.



Will fix in v3.

Thanks Philippe,
-Mark




Re: [RFC PATCH v3 09/36] KVM: Introduce kvm_arch_pre_create_vcpu()

2022-03-18 Thread Isaku Yamahata
On Thu, Mar 17, 2022 at 09:58:46PM +0800,
Xiaoyao Li  wrote:

> Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
> work prior to create any vcpu. This is for i386 TDX because it needs
> call TDX_INIT_VM before creating any vcpu.
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  accel/kvm/kvm-all.c| 7 +++
>  include/sysemu/kvm.h   | 1 +
>  target/arm/kvm64.c | 5 +
>  target/i386/kvm/kvm.c  | 5 +
>  target/mips/kvm.c  | 5 +
>  target/ppc/kvm.c   | 5 +
>  target/s390x/kvm/kvm.c | 5 +
>  7 files changed, 33 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 27864dfaeaaa..a4bb449737a6 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -465,6 +465,13 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
>  
>  trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
>  
> +ret = kvm_arch_pre_create_vcpu(cpu);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret,
> + "kvm_init_vcpu: kvm_arch_pre_create_vcpu() failed");
> +goto err;
> +}
> +
>  ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
>  if (ret < 0) {
> +error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index a783c7886811..0e94031ab7c7 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -373,6 +373,7 @@ int kvm_arch_put_registers(CPUState *cpu, int level);
>  
>  int kvm_arch_init(MachineState *ms, KVMState *s);
>  
> +int kvm_arch_pre_create_vcpu(CPUState *cpu);
>  int kvm_arch_init_vcpu(CPUState *cpu);
>  int kvm_arch_destroy_vcpu(CPUState *cpu);
>  
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index ccadfbbe72be..ae7336851c62 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -935,6 +935,11 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  return kvm_arm_init_cpreg_list(cpu);
>  }
>  
> +int kvm_arch_pre_create_vcpu(CPUState *cpu)
> +{
> +return 0;
> +}
> +

A weak symbol can be used to avoid updating all the arches.

Thanks,
-- 
Isaku Yamahata 



Re: [RFC PATCH v3 08/36] i386/tdx: Adjust get_supported_cpuid() for TDX VM

2022-03-18 Thread Isaku Yamahata
On Thu, Mar 17, 2022 at 09:58:45PM +0800,
Xiaoyao Li  wrote:

> For TDX, the allowable CPUID configuration differs from what KVM
> reports for KVM scope via KVM_GET_SUPPORTED_CPUID.
> 
> - Some CPUID bits are not supported for TDX VM while KVM reports the
>   support. Mask them off for TDX VM. e.g., CPUID_EXT_VMX, some PV
>   featues.
> 
> - The supported XCR0 and XSS bits needs to be caped by tdx_caps, because
>   KVM uses them to setup XFAM of TD.
> 
> Introduce tdx_get_supported_cpuid() to adjust the
> kvm_arch_get_supported_cpuid() for TDX VM.
> 
> Signed-off-by: Xiaoyao Li 
> ---
>  target/i386/cpu.h |  5 +
>  target/i386/kvm/kvm.c |  4 
>  target/i386/kvm/tdx.c | 39 +++
>  target/i386/kvm/tdx.h |  2 ++
>  4 files changed, 50 insertions(+)
> 
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index 5e406088a91a..7fa30f4ed7db 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -566,6 +566,11 @@ typedef enum X86Seg {
>  #define ESA_FEATURE_XFD_MASK(1U << ESA_FEATURE_XFD_BIT)
>  
>  
> +#define XCR0_MASK   (XSTATE_FP_MASK | XSTATE_SSE_MASK | XSTATE_YMM_MASK | \
> + XSTATE_BNDREGS_MASK | XSTATE_BNDCSR_MASK | \
> + XSTATE_OPMASK_MASK | XSTATE_ZMM_Hi256_MASK | \
> + XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK)
> +
>  /* CPUID feature words */
>  typedef enum FeatureWord {
>  FEAT_1_EDX, /* CPUID[1].EDX */
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 26ed5faf07b8..ddbe8f64fadb 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -486,6 +486,10 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
>  ret |= 1U << KVM_HINTS_REALTIME;
>  }
>  
> +if (is_tdx_vm()) {
> +tdx_get_supported_cpuid(function, index, reg, );
> +}
> +
>  return ret;
>  }
>  
> diff --git a/target/i386/kvm/tdx.c b/target/i386/kvm/tdx.c
> index 846511b299f4..e4ee55f30c79 100644
> --- a/target/i386/kvm/tdx.c
> +++ b/target/i386/kvm/tdx.c
> @@ -14,6 +14,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qom/object_interfaces.h"
> +#include "standard-headers/asm-x86/kvm_para.h"
>  #include "sysemu/kvm.h"
>  
>  #include "hw/i386/x86.h"
> @@ -110,6 +111,44 @@ int tdx_kvm_init(MachineState *ms, Error **errp)
>  return 0;
>  }
>  
> +void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
> + uint32_t *ret)
> +{
> +switch (function) {
> +case 1:
> +if (reg == R_ECX) {
> +*ret &= ~CPUID_EXT_VMX;
> +}
> +break;
> +case 0xd:
> +if (index == 0) {
> +if (reg == R_EAX) {
> +*ret &= (uint32_t)tdx_caps->xfam_fixed0 & XCR0_MASK;
> +*ret |= (uint32_t)tdx_caps->xfam_fixed1 & XCR0_MASK;
> +} else if (reg == R_EDX) {
> +*ret &= (tdx_caps->xfam_fixed0 & XCR0_MASK) >> 32;
> +*ret |= (tdx_caps->xfam_fixed1 & XCR0_MASK) >> 32;
> +}
> +} else if (index == 1) {
> +/* TODO: Adjust XSS when it's supported. */
> +}
> +break;
> +case KVM_CPUID_FEATURES:
> +if (reg == R_EAX) {
> +*ret &= ~((1ULL << KVM_FEATURE_CLOCKSOURCE) |
> +  (1ULL << KVM_FEATURE_CLOCKSOURCE2) |
> +  (1ULL << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
> +  (1ULL << KVM_FEATURE_ASYNC_PF) |
> +  (1ULL << KVM_FEATURE_ASYNC_PF_VMEXIT) |
> +  (1ULL << KVM_FEATURE_ASYNC_PF_INT));

Because a new feature bit may be introduced in the future (though it's
unlikely), *ret &= (supported_bits) is better than *ret &= ~(unsupported_bits).

Thanks,

> +}
> +break;
> +default:
> +/* TODO: Use tdx_caps to adjust CPUID leafs. */
> +break;
> +}
> +}
> +
>  /* tdx guest */
>  OBJECT_DEFINE_TYPE_WITH_INTERFACES(TdxGuest,
> tdx_guest,
> diff --git a/target/i386/kvm/tdx.h b/target/i386/kvm/tdx.h
> index 4036ca2f3f99..06599b65b827 100644
> --- a/target/i386/kvm/tdx.h
> +++ b/target/i386/kvm/tdx.h
> @@ -27,5 +27,7 @@ bool is_tdx_vm(void);
>  #endif /* CONFIG_TDX */
>  
>  int tdx_kvm_init(MachineState *ms, Error **errp);
> +void tdx_get_supported_cpuid(uint32_t function, uint32_t index, int reg,
> + uint32_t *ret);
>  
>  #endif /* QEMU_I386_TDX_H */
> -- 
> 2.27.0
> 
> 

-- 
Isaku Yamahata 



Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Stefano Garzarella

On Fri, Mar 18, 2022 at 04:48:18PM +0100, Peter Lieven wrote:




On 18.03.2022 at 09:25, Stefano Garzarella wrote:

On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:




On 17.03.2022 at 17:26, Stefano Garzarella wrote:


Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
block/rbd.c | 26 ++
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 8f183eba2a..6caf35cbba 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1107,6 +1107,20 @@ static int coroutine_fn 
qemu_rbd_start_co(BlockDriverState *bs,

   assert(!qiov || qiov->size == bytes);

+if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
+/*
+ * RBD APIs don't allow us to write more than actual size, so in order
+ * to support growing images, we resize the image before write
+ * operations that exceed the current size.
+ */
+if (offset + bytes > s->image_size) {
+int r = qemu_rbd_resize(bs, offset + bytes);
+if (r < 0) {
+return r;
+}
+}
+}
+
   r = rbd_aio_create_completion(&task,
 (rbd_callback_t) qemu_rbd_completion_cb, &c);
   if (r < 0) {
@@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState *bs, 
int64_t offset,
int64_t bytes, QEMUIOVector *qiov,
BdrvRequestFlags flags)
{
-BDRVRBDState *s = bs->opaque;
-/*
- * RBD APIs don't allow us to write more than actual size, so in order
- * to support growing images, we resize the image before write
- * operations that exceed the current size.
- */
-if (offset + bytes > s->image_size) {
-int r = qemu_rbd_resize(bs, offset + bytes);
-if (r < 0) {
-return r;
-}
-}
   return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
}

--
2.35.1



Do we really have a use case for growing rbd images?


The use case is to have a qcow2 image on rbd.
I don't think it's very common, but some people use it and here [1] we had a 
little discussion about features that could be interesting (e.g.  persistent 
dirty bitmaps for incremental backup).

In any case the support is quite simple and does not affect other use cases 
since we only increase the size when we go beyond the current size.
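That monotonic-growth behaviour can be sketched with a stubbed resize (Python,
illustrative names only -- not the QEMU API): the guard that the patch moves
into qemu_rbd_start_co() fires only when a write or write-zeroes request
crosses the current end of the image.

```python
class FakeRBDImage:
    """Stand-in for the RBD image state; resize() mimics qemu_rbd_resize()."""
    def __init__(self, size):
        self.image_size = size
        self.resizes = 0

    def resize(self, new_size):
        self.resizes += 1
        self.image_size = new_size
        return 0

def start_write(img, offset, nbytes):
    # Resize only when the request goes beyond the current size,
    # mirroring the check moved into qemu_rbd_start_co().
    if offset + nbytes > img.image_size:
        r = img.resize(offset + nbytes)
        if r < 0:
            return r
    return 0  # the actual rbd_aio_* call would follow here

img = FakeRBDImage(1024)
start_write(img, 0, 512)       # within bounds: no resize
start_write(img, 1000, 100)    # crosses the end: image grows to 1100
assert (img.image_size, img.resizes) == (1100, 1)
```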

IMHO we can have it in :-)



The QCOW2 alone doesn’t make much sense, but additional metadata might 
be a use case.


Yep.

Be aware that the current approach will serialize requests. If there is 
a real use case, we might think of a better solution.


Good point, but it only happens when we have to resize, so maybe it's 
okay for now, but I agree we could do better ;-)


Thanks,
Stefano




Re: [PATCH] tests/qemu-iotests: Use GNU sed in two more spots where it is necessary

2022-03-18 Thread Hanna Reitz

On 09.03.22 11:16, Thomas Huth wrote:

These two spots have been missed in commit 9086c7639822 ("Rework the
checks and spots using GNU sed") - they need GNU sed, too, since they
are using the "+" address form.

Signed-off-by: Thomas Huth 
---
  tests/qemu-iotests/common.filter | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
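For context, the GNU-only construct in question is the addr,+N address form
(e.g. sed -n '2,+2p' prints line 2 and the two lines after it), which BSD sed
rejects. A rough Python rendering of that semantic:

```python
def gnu_sed_plus_range(lines, start, n):
    """Emulate GNU sed's 'start,+n' address: line `start` (1-based)
    plus the next `n` lines."""
    return lines[start - 1:start - 1 + n + 1]

lines = ["one", "two", "three", "four", "five"]
# Equivalent of: printf '%s\n' one two three four five | sed -n '2,+2p'
assert gnu_sed_plus_range(lines, 2, 2) == ["two", "three", "four"]
```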


Thanks, applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna




Re: [PATCH v4 00/18] iotests: add enhanced debugging info to qemu-img failures

2022-03-18 Thread Hanna Reitz

On 18.03.22 16:08, John Snow wrote:



On Fri, Mar 18, 2022, 9:36 AM Hanna Reitz  wrote:

On 18.03.22 00:49, John Snow wrote:
> Hiya!
>
> This series effectively replaces qemu_img_pipe_and_status() with a
> rewritten function named qemu_img() that raises an exception on
> non-zero return code by default. By the end of the series, every last
> invocation of the qemu-img binary ultimately goes through qemu_img().
>
> The exception that this function raises includes stdout/stderr output
> when the traceback is printed in a little decorated text box so that
> it stands out from the jargony Python traceback readout.
>
> (You can test what this looks like for yourself, or at least you
> could, by disabling zstd support and then running qcow2 iotest 065.)
>
> Negative tests are still possible in two ways:
>
> - Passing check=False to qemu_img, qemu_img_log, or img_info_log
> - Catching and handling the CalledProcessError exception at the
>   callsite.
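The negative-test shapes described above can be sketched with a simplified
stand-in for the rewritten helper (illustrative names only, not the actual
iotests code; the decorated text box is reduced to plain exception payload):

```python
import subprocess
import sys

def run_tool(args, check=True):
    """Run a command; raise CalledProcessError carrying captured output
    unless check=False (the negative-test escape hatch)."""
    res = subprocess.run(args, capture_output=True, text=True)
    if check and res.returncode != 0:
        # In the real series the output is printed in a decorated box;
        # here it simply rides along on the exception.
        raise subprocess.CalledProcessError(
            res.returncode, args, output=res.stdout, stderr=res.stderr)
    return res

# Negative test, style 1: opt out of checking.
res = run_tool([sys.executable, "-c", "import sys; sys.exit(3)"], check=False)
assert res.returncode == 3

# Negative test, style 2: catch the exception at the callsite.
try:
    run_tool([sys.executable, "-c", "import sys; sys.exit(3)"])
    raised = False
except subprocess.CalledProcessError as e:
    raised = (e.returncode == 3)
assert raised
```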

Thanks!  Applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna


Thanks so much!

I have more works-in-progress, but I want to be kind to your time. 
(And tolerance level for Python.)


Important:

- 4 patches that switch to async qmp permanently. Almost no code, it's 
just a policy thing, but it could affect iotests. Not for this freeze 
now, but it'd help me a lot if you could take the time to ack it next 
week so I can stage them and push forward with splitting the qmp 
library out of the tree. I need to rebase and resend, which I'll do in 
just a bit.


Not urgent:

- Another 15ish patches for unifying qemu-io calls like i did for 
qemu-img here. Stalled somewhat because I couldn't convincingly unify 
the qemu_io calls that keep the pipe open, so I will probably just 
leave those calls alone for now, unless I get a New Idea. Should I 
send them to the list and you'll get to them whenever you get to them, 
or would you prefer I wait a while?


I don’t mind you sending them.

- Another 15ish patches that split the "skip files" list for 
pylint/mypy into separate skip-lists per tool and then drastically 
reduces their size such that only a handful of files remain in each 
skiplist. Same question here: Should I send these to the list and 
someone'll get to it whenever they do, or would you prefer I wait?


Same reply. :)

Hanna




Re: [PATCH v2 for-7.1] vfio/common: remove spurious tpm-crb-cmd misalignment warning

2022-03-18 Thread Eric Auger
Hi,

On 3/18/22 5:18 PM, Philippe Mathieu-Daudé wrote:
> On 18/3/22 16:01, Eric Auger wrote:
>> The CRB command buffer currently is a RAM MemoryRegion and given
>> its base address alignment, it causes an error report on
>> vfio_listener_region_add(). This region could have been a RAM device
>> region, easing the detection of such safe situation but this option
>> was not well received. So let's add a helper function that uses the
>> memory region owner type to detect that the situation is safe with
>> respect to the assignment. Other device types can be checked here if
>> this kind of problem occurs again.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v1 -> v2:
>> - do not check the MR name but rather the owner type
>> ---
>>   hw/vfio/common.c | 27 ++-
>>   hw/vfio/trace-events |  1 +
>>   2 files changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 080046e3f51..98b0b6fb8c7 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -40,6 +40,7 @@
>>   #include "trace.h"
>>   #include "qapi/error.h"
>>   #include "migration/migration.h"
>> +#include "sysemu/tpm.h"
>
>> +static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>> +{
>> +    MemoryRegion *mr = section->mr;
>> +
>> +    if (!object_dynamic_cast(mr->owner, TYPE_TPM_CRB)) {
>
> Using TPM_IS_CRB() instead:
> Reviewed-by: Philippe Mathieu-Daudé 

Hum yes, missed that define. Alex, with that fix, does it match your
expectations? Do we really want to add the dynamic trace point for safe
misalignments, or don't we care?

Thanks

Eric




Re: [PATCH 0/4] iotests: finalize switch to async QMP

2022-03-18 Thread Hanna Reitz

On 08.02.22 20:52, John Snow wrote:

Squeak Squeak...

...Any objections to me staging this?

(This patchset removes the accommodations in iotests for allowing
either library to run and always forces the new one. Point of no
return for iotests.)


I took this as “if I don’t reply, that’ll be reply enough” :)

Looks to me like the rebase is minimal (just shuffling the imports in 
patch 4 a bit), so I guess this’ll help even before you resend:


Acked-by: Hanna Reitz 




Re: [PATCH v2 5/5] i386/cpu: Free env->xsave_buf in x86_cpu_unrealizefn()

2022-03-18 Thread Philippe Mathieu-Daudé

On 18/3/22 16:15, Mark Kanda wrote:

vCPU hotunplug related leak reported by Valgrind:

==132362== 4,096 bytes in 1 blocks are definitely lost in loss record 8,440 of 
8,549
==132362==at 0x4C3B15F: memalign (vg_replace_malloc.c:1265)
==132362==by 0x4C3B288: posix_memalign (vg_replace_malloc.c:1429)
==132362==by 0xB41195: qemu_try_memalign (memalign.c:53)
==132362==by 0xB41204: qemu_memalign (memalign.c:73)
==132362==by 0x7131CB: kvm_init_xsave (kvm.c:1601)
==132362==by 0x7148ED: kvm_arch_init_vcpu (kvm.c:2031)
==132362==by 0x91D224: kvm_init_vcpu (kvm-all.c:516)
==132362==by 0x9242C9: kvm_vcpu_thread_fn (kvm-accel-ops.c:40)
==132362==by 0xB2EB26: qemu_thread_start (qemu-thread-posix.c:556)
==132362==by 0x7EB2159: start_thread (in /usr/lib64/libpthread-2.28.so)
==132362==by 0x9D45DD2: clone (in /usr/lib64/libc-2.28.so)

Signed-off-by: Mark Kanda 
---
  target/i386/cpu.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a88d6554c8..014a716c36 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6572,6 +6572,11 @@ static void x86_cpu_unrealizefn(DeviceState *dev)
  }
  
  xcc->parent_unrealize(dev);

+
+#if defined(CONFIG_KVM) || defined(CONFIG_HVF)
+CPUX86State *env = &cpu->env;
+g_free(env->xsave_buf);


This belong to hvf_arch_vcpu_destroy().

And for KVM, in the missing kvm_arch_destroy_vcpu().


+#endif
  }
  
  typedef struct BitProperty {





Re: [PATCH v2 4/5] cpu: Free cpu->cpu_ases in cpu_exec_unrealizefn()

2022-03-18 Thread Philippe Mathieu-Daudé

On 18/3/22 16:15, Mark Kanda wrote:

vCPU hotunplug related leak reported by Valgrind:

==132362== 216 bytes in 1 blocks are definitely lost in loss record 7,119 of 
8,549
==132362==at 0x4C3ADBB: calloc (vg_replace_malloc.c:1117)
==132362==by 0x69EE4CD: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.5600.4)
==132362==by 0x7E34AF: cpu_address_space_init (physmem.c:751)
==132362==by 0x45053E: qemu_init_vcpu (cpus.c:635)
==132362==by 0x76B4A7: x86_cpu_realizefn (cpu.c:6520)
==132362==by 0x9343ED: device_set_realized (qdev.c:531)
==132362==by 0x93E26F: property_set_bool (object.c:2273)
==132362==by 0x93C23E: object_property_set (object.c:1408)
==132362==by 0x9406DC: object_property_set_qobject (qom-qobject.c:28)
==132362==by 0x93C5A9: object_property_set_bool (object.c:1477)
==132362==by 0x933C81: qdev_realize (qdev.c:333)
==132362==by 0x455E9A: qdev_device_add_from_qdict (qdev-monitor.c:713)

Signed-off-by: Mark Kanda 
---
  cpu.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/cpu.c b/cpu.c
index be1f8b074c..6a3475022f 100644
--- a/cpu.c
+++ b/cpu.c
@@ -173,6 +173,7 @@ void cpu_exec_unrealizefn(CPUState *cpu)
  if (tcg_enabled()) {
  tcg_exec_unrealizefn(cpu);
  }
+g_free(cpu->cpu_ases);


There is an API mismatch here. We miss cpu_address_space_destroy().

cpu_exec_unrealizefn() then calls cpu_address_space_destroy(),
and cpu_address_space_destroy() frees cpu_ases.

Otherwise other cpu_address_space_init() calls will keep leaking.
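The API symmetry being asked for can be modelled with a toy allocation
tracker (illustrative names; the real code pairs g_new0() in
cpu_address_space_init() with a g_free() in a matching destroy hook) --
pairing every init with a destroy is what keeps Valgrind quiet:

```python
class AddressSpaceTracker:
    """Toy model of cpu_address_space_init()/_destroy() pairing."""
    def __init__(self):
        self.live = set()

    def address_space_init(self, cpu_id):
        self.live.add(cpu_id)       # models the g_new0() allocation

    def address_space_destroy(self, cpu_id):
        self.live.discard(cpu_id)   # models g_free(cpu->cpu_ases)

    def leaks(self):
        return sorted(self.live)

t = AddressSpaceTracker()
for cpu in range(4):
    t.address_space_init(cpu)
for cpu in range(4):
    t.address_space_destroy(cpu)    # unrealize path must mirror realize
assert t.leaks() == []              # Valgrind-clean: nothing left live
```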


  cpu_list_remove(cpu);
  }





Re: [PATCH v2 1/5] accel: Introduce AccelOpsClass::destroy_vcpu_thread()

2022-03-18 Thread Philippe Mathieu-Daudé

On 18/3/22 16:15, Mark Kanda wrote:

Add destroy_vcpu_thread() to AccelOps as a method for vcpu thread cleanup.
This will be used in subsequent patches.

Suggested-by: Philippe Mathieu-Daude 


Thanks, but preferably:
Suggested-by: Philippe Mathieu-Daudé 


Signed-off-by: Mark Kanda 
---
  include/sysemu/accel-ops.h | 1 +
  softmmu/cpus.c | 3 +++
  2 files changed, 4 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH v2 for-7.1] vfio/common: remove spurious tpm-crb-cmd misalignment warning

2022-03-18 Thread Philippe Mathieu-Daudé

On 18/3/22 16:01, Eric Auger wrote:

The CRB command buffer currently is a RAM MemoryRegion and given
its base address alignment, it causes an error report on
vfio_listener_region_add(). This region could have been a RAM device
region, easing the detection of such safe situation but this option
was not well received. So let's add a helper function that uses the
memory region owner type to detect the situation is safe wrt
the assignment. Other device types can be checked here if such kind
of problem occurs again.

Signed-off-by: Eric Auger 

---

v1 -> v2:
- do not check the MR name but rather the owner type
---
  hw/vfio/common.c | 27 ++-
  hw/vfio/trace-events |  1 +
  2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f51..98b0b6fb8c7 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@
  #include "trace.h"
  #include "qapi/error.h"
  #include "migration/migration.h"
+#include "sysemu/tpm.h"



+static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
+{
+MemoryRegion *mr = section->mr;
+
+if (!object_dynamic_cast(mr->owner, TYPE_TPM_CRB)) {


Using TPM_IS_CRB() instead:
Reviewed-by: Philippe Mathieu-Daudé 



Re: [PATCH for-7.1] hw: Add compat machines for 7.1

2022-03-18 Thread Eric Farman
On Wed, 2022-03-16 at 15:55 +0100, Cornelia Huck wrote:
> Add 7.1 machine types for arm/i440fx/m68k/q35/s390x/spapr.
> 
> Signed-off-by: Cornelia Huck 
> ---
>  hw/arm/virt.c  |  9 -
>  hw/core/machine.c  |  3 +++
>  hw/i386/pc.c   |  3 +++
>  hw/i386/pc_piix.c  | 14 +-
>  hw/i386/pc_q35.c   | 13 -
>  hw/m68k/virt.c |  9 -
>  hw/ppc/spapr.c | 15 +--
>  hw/s390x/s390-virtio-ccw.c | 14 +-
>  include/hw/boards.h|  3 +++
>  include/hw/i386/pc.h   |  3 +++
>  10 files changed, 79 insertions(+), 7 deletions(-)
> 

For s390x:

Reviewed-by: Eric Farman 

..snip...




Re: [PATCH] gitattributes: Cover Objective-C source files

2022-03-18 Thread Philippe Mathieu-Daudé

On 18/3/22 15:42, Akihiko Odaki wrote:
I don't think this is needed. I could see a diff annotated with a method 
name even without this change:

% git diff
diff --git a/ui/cocoa.m b/ui/cocoa.m
index cb6e7c41dc6..14a4416cc8b 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -1264,6 +1264,7 @@ - (id) init
  [pauseLabel setTextColor: [NSColor blackColor]];
  [pauseLabel sizeToFit];
  }
+    //
  return self;
  }

Commit 29cf16db23 says:

Since commits 0979ed017f0 ("meson: rename .inc.h files to .h.inc")
and 139c1837db7 ("meson: rename included C source files to .c.inc")
'git-diff --function-context' stopped displaying C function context
correctly.


So I suspect Git has some knowledge of common file extensions like .c,
.h and .m, although I couldn't find it in the source code of Git.


'git-diff --function-context' doesn't work for me without this change.



Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Peter Lieven



> Am 18.03.2022 um 09:25 schrieb Stefano Garzarella :
> 
> On Thu, Mar 17, 2022 at 07:27:05PM +0100, Peter Lieven wrote:
>> 
>> 
 Am 17.03.2022 um 17:26 schrieb Stefano Garzarella :
>>> 
>>> Commit d24f80234b ("block/rbd: increase dynamically the image size")
>>> added a workaround to support growing images (eg. qcow2), resizing
>>> the image before write operations that exceed the current size.
>>> 
>>> We recently added support for write zeroes and without the
>>> workaround we can have problems with qcow2.
>>> 
>>> So let's move the resize into qemu_rbd_start_co() and do it when
>>> the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.
>>> 
>>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
>>> Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
>>> Signed-off-by: Stefano Garzarella 
>>> ---
>>> block/rbd.c | 26 ++
>>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>> 
>>> diff --git a/block/rbd.c b/block/rbd.c
>>> index 8f183eba2a..6caf35cbba 100644
>>> --- a/block/rbd.c
>>> +++ b/block/rbd.c
>>> @@ -1107,6 +1107,20 @@ static int coroutine_fn 
>>> qemu_rbd_start_co(BlockDriverState *bs,
>>> 
>>>assert(!qiov || qiov->size == bytes);
>>> 
>>> +if (cmd == RBD_AIO_WRITE || cmd == RBD_AIO_WRITE_ZEROES) {
>>> +/*
>>> + * RBD APIs don't allow us to write more than actual size, so in 
>>> order
>>> + * to support growing images, we resize the image before write
>>> + * operations that exceed the current size.
>>> + */
>>> +if (offset + bytes > s->image_size) {
>>> +int r = qemu_rbd_resize(bs, offset + bytes);
>>> +if (r < 0) {
>>> +return r;
>>> +}
>>> +}
>>> +}
>>> +
>>>r = rbd_aio_create_completion(&task,
>>>  (rbd_callback_t) qemu_rbd_completion_cb,
>>> &c);
>>>if (r < 0) {
>>> @@ -1182,18 +1196,6 @@ coroutine_fn qemu_rbd_co_pwritev(BlockDriverState 
>>> *bs, int64_t offset,
>>> int64_t bytes, QEMUIOVector *qiov,
>>> BdrvRequestFlags flags)
>>> {
>>> -BDRVRBDState *s = bs->opaque;
>>> -/*
>>> - * RBD APIs don't allow us to write more than actual size, so in order
>>> - * to support growing images, we resize the image before write
>>> - * operations that exceed the current size.
>>> - */
>>> -if (offset + bytes > s->image_size) {
>>> -int r = qemu_rbd_resize(bs, offset + bytes);
>>> -if (r < 0) {
>>> -return r;
>>> -}
>>> -}
>>>return qemu_rbd_start_co(bs, offset, bytes, qiov, flags, RBD_AIO_WRITE);
>>> }
>>> 
>>> --
>>> 2.35.1
>>> 
>> 
>> Do we really have a use case for growing rbd images?
> 
> The use case is to have a qcow2 image on rbd.
> I don't think it's very common, but some people use it and here [1] we had a 
> little discussion about features that could be interesting (e.g.  persistent 
> dirty bitmaps for incremental backup).
> 
> In any case the support is quite simple and does not affect other use cases 
> since we only increase the size when we go beyond the current size.
> 
> IMHO we can have it in :-)
> 

The QCOW2 alone doesn’t make much sense, but additional metadata might be a use 
case.
Be aware that the current approach will serialize requests. If there is a real 
use case, we might think of a better solution.

Peter

> Thanks,
> Stefano
> 
> [1] https://lore.kernel.org/all/20190415080452.GA6031@localhost.localdomain/
> 





[PATCH v8 38/46] tests/acpi: Add tables for CXL emulation.

2022-03-18 Thread Jonathan Cameron via
Tables that differ from normal Q35 tables when running the CXL test.

Signed-off-by: Jonathan Cameron 
---
 tests/data/acpi/q35/CEDT.cxl| Bin 0 -> 184 bytes
 tests/data/acpi/q35/DSDT.cxl| Bin 0 -> 9615 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   2 --
 3 files changed, 2 deletions(-)

diff --git a/tests/data/acpi/q35/CEDT.cxl b/tests/data/acpi/q35/CEDT.cxl
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..b8fa06b00e65712e91e0a5ea0d9277e0146d1c00
 100644
GIT binary patch
literal 184
zcmZ>EbqU$Qz`(%x(aGQ0BUr*U{GMV2P7eE5T6mshKVRJ@Sw=U
r)I#JL88kqeKtKSd14gp~1^Iy(qF)E31_T6{AT-z>kXmGQAh!SjnYIc6

literal 0
HcmV?d1

diff --git a/tests/data/acpi/q35/DSDT.cxl b/tests/data/acpi/q35/DSDT.cxl
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..c1206defed0154e9024702bba88453b2790a306d
 100644
GIT binary patch
literal 9615
[base85 binary patch body for DSDT.cxl, truncated in the archive]

Re: [PATCH] block/rbd: fix write zeroes with growing images

2022-03-18 Thread Hanna Reitz

On 17.03.22 17:26, Stefano Garzarella wrote:

Commit d24f80234b ("block/rbd: increase dynamically the image size")
added a workaround to support growing images (eg. qcow2), resizing
the image before write operations that exceed the current size.

We recently added support for write zeroes and without the
workaround we can have problems with qcow2.

So let's move the resize into qemu_rbd_start_co() and do it when
the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993
Fixes: c56ac27d2a ("block/rbd: add write zeroes support")
Signed-off-by: Stefano Garzarella 
---
  block/rbd.c | 26 ++
  1 file changed, 14 insertions(+), 12 deletions(-)


Thanks, applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna




[PATCH v8 29/46] hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl

2022-03-18 Thread Jonathan Cameron via
This adds code to instantiate the slightly extended ACPI root port
description in DSDT as per the CXL 2.0 specification.

Basically a cut and paste job from the i386/pc code.

Signed-off-by: Jonathan Cameron 
Signed-off-by: Ben Widawsky 
Reviewed-by: Alex Bennée 
---
 hw/arm/Kconfig  |  1 +
 hw/pci-host/gpex-acpi.c | 20 +---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 97f3b38019..219262a8da 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -29,6 +29,7 @@ config ARM_VIRT
 select ACPI_APEI
 select ACPI_VIOT
 select VIRTIO_MEM_SUPPORTED
+select ACPI_CXL
 
 config CHEETAH
 bool
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index e7e162a00a..7c7316bc96 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -5,6 +5,7 @@
 #include "hw/pci/pci_bus.h"
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_host.h"
+#include "hw/acpi/cxl.h"
 
 static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq)
 {
@@ -139,6 +140,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 QLIST_FOREACH(bus, >child, sibling) {
 uint8_t bus_num = pci_bus_num(bus);
 uint8_t numa_node = pci_bus_numa_node(bus);
+bool is_cxl = pci_bus_is_cxl(bus);
 
 if (!pci_bus_is_root(bus)) {
 continue;
@@ -154,8 +156,16 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 }
 
 dev = aml_device("PC%.02X", bus_num);
-aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
-aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+if (is_cxl) {
+struct Aml *pkg = aml_package(2);
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0016")));
+aml_append(pkg, aml_eisaid("PNP0A08"));
+aml_append(pkg, aml_eisaid("PNP0A03"));
+aml_append(dev, aml_name_decl("_CID", pkg));
+} else {
+aml_append(dev, aml_name_decl("_HID", aml_string("PNP0A08")));
+aml_append(dev, aml_name_decl("_CID", aml_string("PNP0A03")));
+}
 aml_append(dev, aml_name_decl("_BBN", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 aml_append(dev, aml_name_decl("_STR", aml_unicode("pxb Device")));
@@ -175,7 +185,11 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
 cfg->pio.base, 0, 0, 0);
 aml_append(dev, aml_name_decl("_CRS", crs));
 
-acpi_dsdt_add_pci_osc(dev);
+if (is_cxl) {
+build_cxl_osc_method(dev);
+} else {
+acpi_dsdt_add_pci_osc(dev);
+}
 
 aml_append(scope, dev);
 }
-- 
2.32.0




[PATCH v8 46/46] docs/cxl: Add switch documentation

2022-03-18 Thread Jonathan Cameron via
Switches were already introduced; now that we support them, update
the documentation to provide an example in both diagram and
QEMU command line parameter forms.

Signed-off-by: Jonathan Cameron 
---
 docs/system/devices/cxl.rst | 88 -
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst
index 6871c26efd..ebc5fb3862 100644
--- a/docs/system/devices/cxl.rst
+++ b/docs/system/devices/cxl.rst
@@ -118,8 +118,6 @@ and associated component register access via PCI bars.
 
 CXL Switch
 ~~
-Not yet implemented in QEMU.
-
 Here we consider a simple CXL switch with only a single
 virtual hierarchy. Whilst more complex devices exist, their
 visibility to a particular host is generally the same as for
@@ -137,6 +135,10 @@ BARs.  The Upstream Port has the configuration interfaces 
for
 the HDM decoders which route incoming memory accesses to the
 appropriate downstream port.
 
+A CXL switch is created in a similar fashion to PCI switches
+by creating an upstream port (cxl-upstream) and a number of
+downstream ports on the internal switch bus (cxl-downstream).
+
 CXL Memory Devices - Type 3
 ~~~
 CXL type 3 devices use a PCI class code and are intended to be supported
@@ -240,6 +242,62 @@ Notes:
 they will take the Host Physical Addresses of accesses and map
 them to their own local Device Physical Address Space (DPA).
 
+Example topology involving a switch::
+
+  |<--SYSTEM PHYSICAL ADDRESS MAP (1)->|
+  |__   __   __|
+  |   |  | |  | |  |   |
+  |   | CFMW 0   | |  CXL Fixed Memory Window 1   | | CFMW 1   |   |
+  |   | HB0 only | |  Configured to interleave memory | | HB1 only |   |
+  |   |  | |  memory accesses across HB0/HB1  | |  |   |
+  |   |x_| |__| |__|   |
+   | | | |
+   | | | |
+   | | |
+  Interleave Decoder | | |
+   Matches this HB   | | |
+   \_| |_/
+   __|__  _|___
+  | || |
+  | CXL HB 0|| CXL HB 1|
+  | HB IntLv Decoders   || HB IntLv Decoders   |
+  | PCI/CXL Root Bus 0c || PCI/CXL Root Bus 0d |
+  | || |
+  |___x_||_|
+  |  |  |   |
+  |
+   A HB 0 HDM Decoder
+   matches this Port
+   ___|___
+  |  Root Port 0  |
+  |  Appears in   |
+  |  PCI topology |
+  |  As 0c:00.0   |
+  |___x___|
+  |
+  |
+  \_
+|
+|
+---
+   |Switch 0  USP as PCI 0d:00.0   |
+   |USP has HDM decoder which directs traffic to   |
+   |appropriate downstream port|
+   |Switch BUS appears as 0e   |
+   |x__|
+|  |   |  |
+|  |   |  |
+   _|_   __|__   __|_   __|___
+   (4)| x | | | || |  |
+  | CXL Type3 0   | | CXL Type3 1 | | CXL type3 2| | CLX Type 3 3 |
+  |   | | | || |  |
+  | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...)  |
+  | Decoder to go | | | || |  |
+  | from host PA  | | PCI 10:00.0 | | PCI 11:00.0| | PCI 12:00.0  |
+  | to device PA  | | | || |  |
+  | PCI as 0f:00.0| | | || |  |
+  |___| |_| || |__|
+
 Example command lines
 -
 A very simple setup with just one directly attached CXL Type 3 device::
@@ -279,6 +337,32 @@ the CXL Type3 device directly attached (no switches).::
   -device 
cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
   -cxl-fixed-memory-window 
targets.0=cxl.1,targets.1=cxl.2,size=4G,interleave-granularity=8k
 
+An example of 4 devices below a switch 

[PATCH v8 28/46] acpi/cxl: Introduce CFMWS structures in CEDT

2022-03-18 Thread Jonathan Cameron via
From: Ben Widawsky 

The CEDT CXL Fixed Memory Window Structures (CFMWS)
define regions of the host physical address map which
(via an implementation-defined means) are configured such that
they have a particular interleave setup across one or more CXL
Host Bridges.

Reported-by: Alison Schofield 
Signed-off-by: Ben Widawsky 
Signed-off-by: Jonathan Cameron 
Reviewed-by: Alex Bennée 
---
 hw/acpi/cxl.c | 59 +++
 1 file changed, 59 insertions(+)

diff --git a/hw/acpi/cxl.c b/hw/acpi/cxl.c
index aa4af86a4c..31d5235136 100644
--- a/hw/acpi/cxl.c
+++ b/hw/acpi/cxl.c
@@ -60,6 +60,64 @@ static void cedt_build_chbs(GArray *table_data, PXBDev *cxl)
 build_append_int_noprefix(table_data, memory_region_size(mr), 8);
 }
 
+/*
+ * CFMWS entries in CXL 2.0 ECN: CEDT CFMWS & QTG _DSM.
+ * Interleave ways encoding in CXL 2.0 ECN: 3, 6, 12 and 16-way memory
+ * interleaving.
+ */
+static void cedt_build_cfmws(GArray *table_data, MachineState *ms)
+{
+CXLState *cxls = ms->cxl_devices_state;
+GList *it;
+
+for (it = cxls->fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+int i;
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Record Length */
+build_append_int_noprefix(table_data, 36 + 4 * fw->num_targets, 2);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Base HPA */
+build_append_int_noprefix(table_data, fw->mr.addr, 8);
+
+/* Window Size */
+build_append_int_noprefix(table_data, fw->size, 8);
+
+/* Host Bridge Interleave Ways */
+build_append_int_noprefix(table_data, fw->enc_int_ways, 1);
+
+/* Host Bridge Interleave Arithmetic */
+build_append_int_noprefix(table_data, 0, 1);
+
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge Interleave Granularity */
+build_append_int_noprefix(table_data, fw->enc_int_gran, 4);
+
+/* Window Restrictions */
+build_append_int_noprefix(table_data, 0x0f, 2); /* No restrictions */
+
+/* QTG ID */
+build_append_int_noprefix(table_data, 0, 2);
+
+/* Host Bridge List (list of UIDs - currently bus_nr) */
+for (i = 0; i < fw->num_targets; i++) {
+g_assert(fw->target_hbs[i]);
+build_append_int_noprefix(table_data, fw->target_hbs[i]->bus_nr, 
4);
+}
+}
+}
+
 static int cxl_foreach_pxb_hb(Object *obj, void *opaque)
 {
 Aml *cedt = opaque;
@@ -86,6 +144,7 @@ void cxl_build_cedt(MachineState *ms, GArray *table_offsets, 
GArray *table_data,
 /* reserve space for CEDT header */
 
 object_child_foreach_recursive(object_get_root(), cxl_foreach_pxb_hb, 
cedt);
+cedt_build_cfmws(cedt->buf, ms);
 
 /* copy AML table into ACPI tables blob and patch header there */
 g_array_append_vals(table_data, cedt->buf->data, cedt->buf->len);
-- 
2.32.0
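The CFMWS record layout that cedt_build_cfmws() emits field-by-field above can
be cross-checked with a short sketch that packs the same fields little-endian
and verifies the advertised record length of 36 + 4 * targets (values below
are illustrative, for length-checking only):

```python
import struct

def pack_cfmws(base_hpa, window_size, enc_ways, enc_gran, target_uids):
    """Pack one CEDT CFMWS entry with the field sizes used in
    cedt_build_cfmws()."""
    n = len(target_uids)
    rec_len = 36 + 4 * n
    body = struct.pack(
        "<BBHIQQBBHIHH",
        1,            # Type: CFMWS
        0,            # Reserved
        rec_len,      # Record Length
        0,            # Reserved
        base_hpa,     # Base HPA
        window_size,  # Window Size
        enc_ways,     # Host Bridge Interleave Ways (encoded)
        0,            # Host Bridge Interleave Arithmetic
        0,            # Reserved
        enc_gran,     # Host Bridge Interleave Granularity (encoded)
        0x0F,         # Window Restrictions: none
        0,            # QTG ID
    )
    body += struct.pack("<%dI" % n, *target_uids)  # Host Bridge List (UIDs)
    assert len(body) == rec_len
    return body

entry = pack_cfmws(0x4C0000000, 4 << 30, 1, 0, [12, 13])  # two host bridges
assert len(entry) == 36 + 4 * 2
```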




[PATCH v8 45/46] cxl/cxl-host: Support interleave decoding with one level of switches.

2022-03-18 Thread Jonathan Cameron via
Extend the walk of the CXL bus during interleave decoding to take
into account one layer of switches.

Whilst theoretically CXL 2.0 allows multiple switch levels, in the
vast majority of use cases only one level is expected, and currently
that is all the proposed Linux support provides.

Signed-off-by: Jonathan Cameron 
---
 hw/cxl/cxl-host.c | 44 ++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index a1eafa89bb..ac20d9e2f5 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -130,8 +130,9 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr 
addr,
 
 static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
 {
-CXLComponentState *hb_cstate;
+CXLComponentState *hb_cstate, *usp_cstate;
 PCIHostState *hb;
+CXLUpstreamPort *usp;
 int rb_index;
 uint32_t *cache_mem;
 uint8_t target;
@@ -166,7 +167,46 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow 
*fw, hwaddr addr)
 
 d = pci_bridge_get_sec_bus(PCI_BRIDGE(rp))->devices[0];
 
-if (!d || !object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3_DEV)) {
+if (!d) {
+return NULL;
+}
+
+if (object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3_DEV)) {
+return d;
+}
+
+/*
+ * Could also be a switch.  Note only one level of switching currently
+ * supported.
+ */
+if (!object_dynamic_cast(OBJECT(d), TYPE_CXL_USP)) {
+return NULL;
+}
+usp = CXL_USP(d);
+
+usp_cstate = cxl_usp_to_cstate(usp);
+if (!usp_cstate) {
+return NULL;
+}
+
+cache_mem = usp_cstate->crb.cache_mem_registers;
+
+target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+if (!target_found) {
+return NULL;
+}
+
+d = pcie_find_port_by_pn(&PCI_BRIDGE(d)->sec_bus, target);
+if (!d) {
+return NULL;
+}
+
+d = pci_bridge_get_sec_bus(PCI_BRIDGE(d))->devices[0];
+if (!d) {
+return NULL;
+}
+
+if (!object_dynamic_cast(OBJECT(d), TYPE_CXL_TYPE3_DEV)) {
 return NULL;
 }
 
-- 
2.32.0
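The walk this patch implements -- host bridge decoder first, then at most one
switch decoder, stopping as soon as a type-3 device is reached -- can be
modelled with plain dictionaries (a toy decode function, not the HDM register
math; all names below are hypothetical):

```python
def find_device(topology, addr):
    """Toy one-level walk: each node either is a type-3 device or has
    a 'decode' function selecting the next port by address."""
    node = topology
    for _level in range(2):          # host bridge, then at most one switch
        if node.get("type3"):
            return node["name"]
        port = node["decode"](addr)
        node = node["ports"].get(port)
        if node is None:
            return None
    return node["name"] if node and node.get("type3") else None

dev0 = {"type3": True, "name": "mem0"}
dev1 = {"type3": True, "name": "mem1"}
switch = {"type3": False,
          "decode": lambda a: (a >> 12) & 1,   # toy interleave select
          "ports": {0: dev0, 1: dev1}}
hb = {"type3": False, "decode": lambda a: 0, "ports": {0: switch}}

assert find_device(hb, 0x0000) == "mem0"
assert find_device(hb, 0x1000) == "mem1"
```

A directly attached device still resolves in one hop, matching the early
return in the patch.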



