Re: Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt

2021-07-26 Thread Pavel Dovgalyuk

On 27.07.2021 03:39, Cleber Rosa wrote:


Hi everyone,

tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
is currently failing consistently (first found that in [1]).

I've bisected it down to the following commit:


Thanks for bisecting.
I didn't try to understand why the bug happens, but it can be solved 
with the following patch:


--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1428,7 +1428,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

 max_insns = cflags & CF_COUNT_MASK;
 if (max_insns == 0) {
-max_insns = TCG_MAX_INSNS;
+max_insns = CF_COUNT_MASK;
 }
 QEMU_BUILD_BUG_ON(CF_COUNT_MASK + 1 != TCG_MAX_INSNS);




---

78ff82bb1b67c0d79113688e4b3427fc99cab9d4 is the first bad commit
commit 78ff82bb1b67c0d79113688e4b3427fc99cab9d4
Author: Richard Henderson 

 accel/tcg: Reduce CF_COUNT_MASK to match TCG_MAX_INSNS
 
 The space reserved for CF_COUNT_MASK was overly large.

 Reduce to free up cflags bits and eliminate an extra test.
 
 Tested-by: Mark Cave-Ayland 

 Signed-off-by: Richard Henderson 
 Reviewed-by: Alex Bennée 
 Reviewed-by: Peter Maydell 
 Message-Id: <20210717221851.2124573-2-richard.hender...@linaro.org>

  accel/tcg/translate-all.c | 5 ++---
  include/exec/exec-all.h   | 4 +++-
  2 files changed, 5 insertions(+), 4 deletions(-)

---

To reproduce it:

1. configure --target-list=aarch64-softmmu
2. meson compile
3. make check-venv
4. ./tests/venv/bin/avocado --show=test run 
tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt

PS: I haven't had the time yet to scan the mailing list for possible
discussions about it.

[1] https://gitlab.com/qemu-project/qemu/-/jobs/1445513133#L268






Re: aarch64 efi boot failures with qemu 6.0+

2021-07-26 Thread Guenter Roeck

On 7/26/21 9:45 PM, Michael S. Tsirkin wrote:

On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:

(cc Bjorn)

On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  wrote:


On 7/26/21 12:56 AM, Guenter Roeck wrote:

On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:

On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:

Hi all,

starting with qemu v6.0, some of my aarch64 efi boot tests no longer
work. Analysis shows that PCI devices with IO ports do not instantiate
in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
(at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
affects
aarch64, not x86/x86_64.

I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
keep firmware resource map"). Since this commit, PCI device BAR
allocation has changed. Taking tulip as example, the kernel reports
the following PCI bar assignments when running qemu v5.2.

[3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
[3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
[3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]


IIUC, these lines are read back from the BARs


[3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
[3.927455] pci :00:01.0: BAR 1: assigned [mem
0x1000-0x107f]



... and this is the assignment created by the kernel.


With qemu v6.0, the assignment is reported as follows.

[3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
[3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
[3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]



The problem here is that Linux, for legacy reasons, does not support
I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
rejected.

This might make sense on x86, where legacy I/O ports may exist, but on
other architectures, this makes no sense.



Fixing Linux makes sense but OTOH EFI probably shouldn't create mappings
that trip up existing guests, right?



I think it is difficult to draw a line. Sure, maybe EFI should not create
such mappings, but then maybe qemu should not suddenly start to enforce
those mappings for existing guests either.

For my own testing, I simply reverted commit 0cf8882fd0 in my copy of
qemu. That solves my immediate problem, giving us time to find a solution
that is acceptable for everyone. After all, it doesn't look like anyone
else has noticed the problem, so there is no real urgency.

Thanks,
Guenter



[PATCH] hw/i386/ich9: add comment explaining an argument to acpi_pcihp_reset call

2021-07-26 Thread Ani Sinha
acpi_pcihp_reset() call from ich9/pm_reset() passes an unconditional truth value
as the second argument. Added a commnet here to explain the reason why the
argument is being passed unconditionally.

Signed-off-by: Ani Sinha 
---
 hw/acpi/ich9.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 778e27b659..b2e3c46075 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -281,6 +281,11 @@ static void pm_reset(void *opaque)
 pm->smi_en_wmask = ~0;
 
 if (pm->use_acpi_hotplug_bridge) {
+/*
+ * PCI Express root buses do not support hot-plug, for
+ * details see docs/pcie.txt. Hence, the second argument
+ * is unconditionally true.
+ */
 acpi_pcihp_reset(>acpi_pci_hotplug, true);
 }
 
-- 
2.25.1




Re: aarch64 efi boot failures with qemu 6.0+

2021-07-26 Thread Michael S. Tsirkin
On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:
> (cc Bjorn)
> 
> On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  
> wrote:
> >
> > On 7/26/21 12:56 AM, Guenter Roeck wrote:
> > > On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
> > >> On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
> > >>> Hi all,
> > >>>
> > >>> starting with qemu v6.0, some of my aarch64 efi boot tests no longer
> > >>> work. Analysis shows that PCI devices with IO ports do not instantiate
> > >>> in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
> > >>> (at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
> > >>> affects
> > >>> aarch64, not x86/x86_64.
> > >>>
> > >>> I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
> > >>> keep firmware resource map"). Since this commit, PCI device BAR
> > >>> allocation has changed. Taking tulip as example, the kernel reports
> > >>> the following PCI bar assignments when running qemu v5.2.
> > >>>
> > >>> [3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
> > >>> [3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > >>> [3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> 
> IIUC, these lines are read back from the BARs
> 
> > >>> [3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
> > >>> [3.927455] pci :00:01.0: BAR 1: assigned [mem
> > >>> 0x1000-0x107f]
> > >>>
> 
> ... and this is the assignment created by the kernel.
> 
> > >>> With qemu v6.0, the assignment is reported as follows.
> > >>>
> > >>> [3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
> > >>> [3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > >>> [3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> > >>>
> 
> The problem here is that Linux, for legacy reasons, does not support
> I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
> rejected.
> 
> This might make sense on x86, where legacy I/O ports may exist, but on
> other architectures, this makes no sense.


Fixing Linux makes sense but OTOH EFI probably shouldn't create mappings
that trip up existing guests, right?

> 
> > >>> and the controller does not instantiate. The problem disapears after
> > >>> reverting commit 0cf8882fd0.
> > >>>
> > >>> Attached is a summary of test runs with various devices and qemu v5.2
> > >>> as well as qemu v6.0, and the command line I use for efi boots.
> > >>>
> > >>> Did commit 0cf8882fd0 introduce a bug, do I now need need some different
> > >>> command line to instantiate PCI devices with io ports, or are such
> > >>> devices
> > >>> simply no longer supported if the system is booted with efi support ?
> > >>>
> > >>> Thanks,
> > >>> Guenter
> > >>
> > >>
> > >> So that commit basically just says don't ignore what efi did.
> > >>
> > >> The issue's thus likely efi.
> > >>
> > >
> > > I don't see the problem with efi boots on x86 and x86_64.
> > > Any idea why that might be the case ?
> > >
> > > Thanks,
> > > Guenter
> > >
> > >> Cc the maintainer. Philippe can you comment pls?
> >
> > I'll have a look. Cc'ing Ard for EDK2/Aarch64.
> >
> 
> So a potential workaround would be to use a different I/O resource
> window for ArmVirtPkg, that starts at 0x1000. But I would prefer to
> fix Linux instead.
> 
> 
> > >>
> > >>> ---
> > >>> Command line (tulip network interface):
> > >>>
> > >>> CMDLINE="root=/dev/vda console=ttyAMA0"
> > >>> ROOTFS="rootfs.ext2"
> > >>>
> > >>> qemu-system-aarch64 -M virt -kernel arch/arm64/boot/Image -no-reboot \
> > >>>  -m 512 -cpu cortex-a57 -no-reboot \
> > >>>  -device tulip,netdev=net0 -netdev user,id=net0 \
> > >>>  -bios QEMU_EFI-aarch64.fd \
> > >>>  -snapshot \
> > >>>  -device virtio-blk-device,drive=d0 \
> > >>>  -drive file=${ROOTFS},if=none,id=d0,format=raw \
> > >>>  -nographic -serial stdio -monitor none \
> > >>>  --append "${CMDLINE}"
> > >>>
> > >>> ---
> > >>> Boot tests with various devices known to work in qemu v5.2.
> > >>>
> > >>> v5.2v6.0v6.0
> > >>> efinon-efiefi
> > >>> e1000passpasspass
> > >>> e1000-82544gcpasspasspass
> > >>> e1000-82545empasspasspass
> > >>> e1000epasspasspass
> > >>> i82550passpasspass
> > >>> i82557apasspasspass
> > >>> i82557bpasspasspass
> > >>> i82557cpasspasspass
> > >>> i82558apasspasspass
> > >>> i82559bpasspasspass
> > >>> i82559cpasspasspass
> > >>> i82559erpasspasspass
> > >>> i82562passpasspass
> > >>> i82801passpasspass
> > >>> ne2k_pcipasspassfail<--
> > >>> pcnetpasspasspass
> > >>> rtl8139passpasspass
> > >>> tulippasspassfail<--
> > >>> usb-net  

Re: aarch64 efi boot failures with qemu 6.0+

2021-07-26 Thread Guenter Roeck

On 7/26/21 2:31 PM, Bjorn Helgaas wrote:

[+cc linux-pci]

On Mon, Jul 26, 2021 at 04:16:29PM -0500, Bjorn Helgaas wrote:

On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:

On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  wrote:

On 7/26/21 12:56 AM, Guenter Roeck wrote:

On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:

On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:

Hi all,

starting with qemu v6.0, some of my aarch64 efi boot tests no longer
work. Analysis shows that PCI devices with IO ports do not instantiate
in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
(at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
affects
aarch64, not x86/x86_64.

I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
keep firmware resource map"). Since this commit, PCI device BAR
allocation has changed. Taking tulip as example, the kernel reports
the following PCI bar assignments when running qemu v5.2.

[3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
[3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
[3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]


IIUC, these lines are read back from the BARs


[3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
[3.927455] pci :00:01.0: BAR 1: assigned [mem
0x1000-0x107f]



... and this is the assignment created by the kernel.


With qemu v6.0, the assignment is reported as follows.

[3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
[3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
[3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]


The problem here is that Linux, for legacy reasons, does not support
I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
rejected.

This might make sense on x86, where legacy I/O ports may exist, but on
other architectures, this makes no sense.


I guess this is the "#define PCIBIOS_MIN_IO 0x1000" in
arm64/include/asm/pci.h.  From a PCI point of view, I'm not opposed to
changing that to 0, as it is on csky, riscv, sh, sparc, um.  But it's
really an arch question, so the arm64 folks would have to weigh in.

But I don't think that would fix this.  PCIBIOS_MIN_IO is mainly used
when we assign or reassign resources to a BAR, and if firmware tells
us to preserve the assignments done by firmware, Linux shouldn't be
doing any assignment or reassignment.

Linux received 00:01.0 BAR 0 as [io 0x-0x007f], and Guenter didn't
report any reassignment, so I assume Linux saw the
DSM_PCI_PRESERVE_BOOT_CONFIG [1] and didn't change anything.

Could this be due to drivers assuming that an I/O BAR of 0 is invalid?
I see that at least ne2k_pci_init_one() [2] seems to assume that.  And


Correct, and ne2k_pci is known to already fail on architectures where the
IO address range starts at 0, such as riscv. Not that it helps to fix the
code - doing so only results in a crash elsewhere when running a riscv
emulation (when executing outsl, suggesting that there may be a problem
with that emulation or its use). But that is a different problem.


tulip_init_one() [3] and pci_esp_probe_one() (am53c974.c, [4]) use
pci_iomap() [5], which fails if the resource starts at 0.

So pci_iomap() is probably already broken on the arches above that
allow I/O BARs to be zero.  Maybe pci_iomap() should only fail on
"!start" for *memory* BARs, e.g.,

diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index 2d3eb1cb73b8..77455e702a3e 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -34,7 +34,9 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
resource_size_t len = pci_resource_len(dev, bar);
unsigned long flags = pci_resource_flags(dev, bar);
  
-	if (len <= offset || !start)

+   if (flags & IORESOURCE_MEM && !start)
+   return NULL;


I am far out of my league here, but what is the purpose of the !start
check given the PCIBIOS_MIN_MEM define which can also be 0 ? Shouldn't
the check be against PCIBIOS_MIN_MEM and PCIBIOS_MIN_IO ?

But, anyway, the above change fixes the problem for 'tulip', though
obviously not for 'ne2k_pci'. 'ne2k_pci' starts working if I remove
the "!ioaddr" check in ne2k_pci_init_one().

Thanks,
Guenter


+   if (len <= offset)
return NULL;
len -= offset;
start += offset;


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v5.13#n915
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/8390/ne2k-pci.c?id=v5.13#n247
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/dec/tulip/tulip_core.c?id=v5.13#n1418
[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/am53c974.c?id=v5.13#n431
[5] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/pci_iomap.c?id=v5.13#n37



[PATCH] hw/char/virtio-serial-bus: fix: Unpop throttled VirtQueueElement to queue before discard vq data

2021-07-26 Thread AIERPATIJIANG1 [艾尔帕提江·阿布都赛买提]
Ports enter a "throttled" state when writing to the chardev would block.
The current output VirtQueueElement is kept around until the chardev
becomes writable again.
 
Because closing the virtio serial device does not reset the queue, we cannot
directly discard this element,  otherwise the control variables of the front
and back ends of the queue are inconsistent such as used_index. We should unpop 
the
VirtQueueElement to queue, let discard_vq_data process it.
 
The test environment:
kernel: linux-5.12
Qemu command:
Qemu-system-x86 -machine pc,accel=kvm \
    -cpu host,host-phys-bits \
    -smp 4 \
    -m 4G \
    -kernel ./kernel \
    -display none \
    -nodefaults \
    -serial mon:stdio \
    -append "panic=1 no_timer_check noreplace-smp rootflags=data=ordered 
rootfstype=ext4 console=ttyS0 reboot=k root=/dev/vda1 rw" \
    -drive id=os,file=./disk,if=none \
    -device virtio-blk-pci,drive=os \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 \
    -chardev socket,id=charchannel0,path=/tmp/char-dev-test,server,nowait \
  -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 
full up virtio queue after VM started:
Cat /large-file > /dev/vport1p1
 
Host side:
Open and close character device sockets repeatedly
 
After awhile we can’t write any request to /dev/vport1p1 at VM side, VM kernel 
soft lockup at drivers/char/virtio_console.c: __send_to_port
 
 
Signed-off-by: Arafatms 
 
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index dd6bc27b3b..36236defdf 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -150,8 +150,12 @@ static void discard_vq_data(VirtQueue *vq, VirtIODevice 
*vdev)
 
static void discard_throttle_data(VirtIOSerialPort *port)
{
+    if (!virtio_queue_ready(port->ovq)) {
+    return;
+    }
+
 if (port->elem) {
-    virtqueue_detach_element(port->ovq, port->elem, 0);
+    virtqueue_unpop(port->ovq, port->elem, 0);
 g_free(port->elem);
 port->elem = NULL;
 }



Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation

2021-07-26 Thread Song Gao
Hi, Richard.

On 07/26/2021 11:53 PM, Richard Henderson wrote:
> On 7/26/21 1:56 AM, Song Gao wrote:
>> Hi, Richard.
>>
>> On 07/23/2021 08:46 AM, Richard Henderson wrote:
>>> On 7/20/21 11:53 PM, Song Gao wrote:
 +/* Fixed point arithmetic operation instruction translation */
 +static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
 +{
 +    TCGv Rd = cpu_gpr[a->rd];
 +    TCGv Rj = cpu_gpr[a->rj];
 +    TCGv Rk = cpu_gpr[a->rk];
 +
 +    if (a->rd == 0) {
 +    /* Nop */
 +    return true;
 +    }
 +
 +    if (a->rj != 0 && a->rk != 0) {
 +    tcg_gen_add_tl(Rd, Rj, Rk);
 +    tcg_gen_ext32s_tl(Rd, Rd);
 +    } else if (a->rj == 0 && a->rk != 0) {
 +    tcg_gen_mov_tl(Rd, Rk);
 +    } else if (a->rj != 0 && a->rk == 0) {
 +    tcg_gen_mov_tl(Rd, Rj);
 +    } else {
 +    tcg_gen_movi_tl(Rd, 0);
 +    }
 +
 +    return true;
 +}
>>>
>>> Do not do all of this "if reg(n) zero" testing.
>>>
>>> Use a common function to perform the gpr lookup, and a small callback 
>>> function for the operation.  Often, the callback function already exists 
>>> within include/tcg/tcg-op.h.
>>>
>>> Please see my riscv cleanup patch set I referenced vs patch 6.
>>
>> I am not sure  that 'riscv cleanup' patchs at:
>>        
>> https://patchew.org/QEMU/20210709042608.883256-1-richard.hender...@linaro.org
>>
>> It seems that  gpr_dst/gpr_src are common function to perform the gpr 
>> lookup. is that right?
> 
> More than that.  The gen_arith() function, for example, performs all of the 
> bookkeeping for a binary operation.
> 
> For example,
> 
> static bool gen_arith(DisasContext *ctx, arg_fmt_rdrjrk *a,
>   void (*func)(TCGv, TCGv, TCGv))
> {
>    TCGv dest = gpr_dst(ctx, a->rd);
>    TCGv src1 = gpr_src(ctx, a->rj);
>    TCGv src2 = gpr_src(ctx, a->rk);
> 
>     func(dest, src1, src2);
>     return true;
> }
> 
> #define TRANS(NAME, FUNC, ...) \
>     static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
>     { return FUNC(ctx, a, __VA_ARGS__); }
> 
> static void gen_add_w(TCGv dest, TCGv src1, TCGv src2)
> {
>     tcg_gen_add_tl(dest, src1, src2);
>     tcg_gen_ext32s_tl(dest, dest);
> }
> 
> TRANS(add_w, gen_arith, gen_add_w)
> TRANS(add_d, gen_arith, tcg_gen_add_tl)
> 
> 
OK

Again, thank you kindly help.

Thanks
Song Gao.




Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation

2021-07-26 Thread Song Gao
Hi, Richard.

On 07/27/2021 12:42 AM, Richard Henderson wrote:
> On 7/26/21 2:57 AM, Song Gao wrote:
>>
>> Hi, Richard.
>>
>> On 07/23/2021 01:12 PM, Richard Henderson wrote:
>>> On 7/20/21 11:53 PM, Song Gao wrote:
 +target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
 +{
 +    target_ulong r = 0;
 +
 +    switch (rj) {
 +    case 0:
 +    r = env->CSR_MCSR0 & 0x;
 +    break;
 +    case 1:
 +    r = (env->CSR_MCSR0 & 0x) >> 32;
 +    break;
>>>
>>> Why do you represent all of these as high and low portions of a 64-bit 
>>> internal value, when the manual describes them as 32-bit values?
>>>
>> This method can reduce variables on env.
> 
> The number of variables may increase, but the memory consumed -- which is the 
> metric that is more important -- is still the same.
> 
> Also, it is much less confusing to match the description in the manual.
> 
OK.

Thanks
Song Gao.
> 
> r~




Regression caught by replay_kernel.py:ReplayKernelNormal.test_aarch64_virt

2021-07-26 Thread Cleber Rosa


Hi everyone,

tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt
is currently failing consistently (first found that in [1]).

I've bisected it down to the following commit:

---

78ff82bb1b67c0d79113688e4b3427fc99cab9d4 is the first bad commit
commit 78ff82bb1b67c0d79113688e4b3427fc99cab9d4
Author: Richard Henderson 

accel/tcg: Reduce CF_COUNT_MASK to match TCG_MAX_INSNS

The space reserved for CF_COUNT_MASK was overly large.
Reduce to free up cflags bits and eliminate an extra test.

Tested-by: Mark Cave-Ayland 
Signed-off-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Reviewed-by: Peter Maydell 
Message-Id: <20210717221851.2124573-2-richard.hender...@linaro.org>

 accel/tcg/translate-all.c | 5 ++---
 include/exec/exec-all.h   | 4 +++-
 2 files changed, 5 insertions(+), 4 deletions(-)

---

To reproduce it:

1. configure --target-list=aarch64-softmmu
2. meson compile
3. make check-venv
4. ./tests/venv/bin/avocado --show=test run 
tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt

PS: I haven't had the time yet to scan the mailing list for possible
discussions about it.

[1] https://gitlab.com/qemu-project/qemu/-/jobs/1445513133#L268

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework - avocado-framework.github.io ]
[  7ABB 96EB 8B46 B94D 5E0F  E9BB 657E 8D33 A5F2 09F3  ]




Re: [PATCH for-6.1 00/10] docs: Format literals correctly in rST

2021-07-26 Thread Richard Henderson

On 7/26/21 4:23 AM, Peter Maydell wrote:

Peter Maydell (10):
   docs/devel/build-system.rst: Format literals correctly
   docs/devel/build-system.rst: Correct typo in example code
   docs/devel/ebpf_rss.rst: Format literals correctly
   docs/devel/migration.rst: Format literals correctly
   docs/devel: Format literals correctly
   docs/system/s390x/protvirt.rst: Format literals correctly
   docs/system/arm/cpu-features.rst: Format literals correctly
   docs: Format literals correctly
   docs/about/removed-features: Fix markup error
   docs/tools/virtiofsd.rst: Delete stray backtick


Reviewed-by: Richard Henderson 

r~



Re: [PATCH for-6.1] hw/arm/boot: Report error if there is no fw_cfg device in the machine

2021-07-26 Thread Richard Henderson

On 7/26/21 6:33 AM, Peter Maydell wrote:

If the user provides both a BIOS/firmware image and also a guest
kernel filename, arm_setup_firmware_boot() will pass the
kernel image to the firmware via the fw_cfg device. However we
weren't checking whether there really was a fw_cfg device present,
and if there wasn't we would crash.

This crash can be provoked with a command line such as
  qemu-system-aarch64 -M raspi3 -kernel /dev/null -bios /dev/null -display none

It is currently only possible on the raspi3 machine, because unless
the machine sets info->firmware_loaded we won't call
arm_setup_firmware_boot(), and the only machines which set that are:
  * virt (has a fw-cfg device)
  * sbsa-ref (checks itself for kernel_filename && firmware_loaded)
  * raspi3 (crashes)

But this is an unfortunate beartrap to leave for future machine
model implementors, so we should handle this situation in boot.c.

Check in arm_setup_firmware_boot() whether the fw-cfg device exists
before trying to load files into it, and if it doesn't exist then
exit with a hopefully helpful error message.

Because we now handle this check in a machine-agnostic way, we
can remove the check from sbsa-ref.

Resolves:https://gitlab.com/qemu-project/qemu/-/issues/503
Signed-off-by: Peter Maydell
---


Reviewed-by: Richard Henderson 

r~



[PATCH 3/3] virtio-gpu: call dpy_gl_frame_counter at every guest scanout flush

2021-07-26 Thread Dongwon Kim
dpy_gl_frame_counter needs to be called for guest scanout frame count
to calculate and display the performance figure - ups, the guest FB
update per seconds.

Signed-off-by: Dongwon Kim 
---
 hw/display/virtio-gpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index e183f4ecda..722869864a 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -523,6 +523,7 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
 console_has_gl(scanout->con)) {
 dpy_gl_update(scanout->con, 0, 0, scanout->width,
   scanout->height);
+dpy_gl_count_frame(scanout->con, true);
 return;
 }
 }
@@ -566,6 +567,7 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
extents->x1, extents->y1,
extents->x2 - extents->x1,
extents->y2 - extents->y1);
+dpy_gl_count_frame(scanout->con, true);
 
 pixman_region_fini();
 pixman_region_fini();
-- 
2.17.1




[PATCH 2/3] ui/gtk: calling gd_gl_frame_counter at every draw/swap

2021-07-26 Thread Dongwon Kim
For FPS calculation, gd_gl_frame_counter is called at every
draw(gtk-gl-area) or swap(gtk-egl) activity.

Signed-off-by: Dongwon Kim 
---
 ui/gtk-egl.c | 2 ++
 ui/gtk-gl-area.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 2a2e6d3a17..ac56f5b9f4 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -89,6 +89,7 @@ void gd_egl_draw(VirtualConsole *vc)
 
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 
+gd_gl_count_frame(>gfx.dcl, false);
 vc->gfx.scale_x = (double)ww / surface_width(vc->gfx.ds);
 vc->gfx.scale_y = (double)wh / surface_height(vc->gfx.ds);
 }
@@ -290,6 +291,7 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 }
 
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
+gd_gl_count_frame(>gfx.dcl, false);
 }
 
 void gtk_egl_init(DisplayGLMode mode)
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index dd5783fec7..a18b6ff425 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -72,6 +72,7 @@ void gd_gl_area_draw(VirtualConsole *vc)
 }
 
 glFlush();
+gd_gl_count_frame(>gfx.dcl, false);
 graphic_hw_gl_flushed(vc->gfx.dcl.con);
 }
 
-- 
2.17.1




[PATCH 1/3] ui/gtk: adds status bar for expressing ups and fps

2021-07-26 Thread Dongwon Kim
With a display option, "show-fps=on", qemu adds a status bar and print
following performance numbers on the bar,

ups = update per seconds - the rate the guest scanout is updated.
fps = frame per seconds - the frame rate of VC's GL drawing area

One function, gd_gl_count_frame is added to count # frames
and calculate ups and fps every 100 frames or guest scanout updates.

Signed-off-by: Dongwon Kim 
---
 include/ui/console.h |  4 +++-
 include/ui/gtk.h |  2 ++
 qapi/ui.json |  6 +-
 ui/console.c |  6 ++
 ui/gtk.c | 51 
 5 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/include/ui/console.h b/include/ui/console.h
index b30b63976a..3ca8d07220 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -243,7 +243,8 @@ typedef struct DisplayChangeListenerOps {
 /* required if GL */
 void (*dpy_gl_update)(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h);
-
+/* optional */
+void (*dpy_gl_count_frame)(DisplayChangeListener *dcl, bool ups);
 } DisplayChangeListenerOps;
 
 struct DisplayChangeListener {
@@ -314,6 +315,7 @@ void dpy_gl_release_dmabuf(QemuConsole *con,
QemuDmaBuf *dmabuf);
 void dpy_gl_update(QemuConsole *con,
uint32_t x, uint32_t y, uint32_t w, uint32_t h);
+void dpy_gl_count_frame(QemuConsole *con, bool ups);
 
 QEMUGLContext dpy_gl_ctx_create(QemuConsole *con,
 QEMUGLParams *params);
diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 9516670ebc..8d1a6c2cef 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -118,6 +118,7 @@ struct GtkDisplayState {
 GtkWidget *show_tabs_item;
 GtkWidget *untabify_item;
 GtkWidget *show_menubar_item;
+GtkWidget *status_bar;
 
 GtkWidget *vbox;
 GtkWidget *notebook;
@@ -152,6 +153,7 @@ extern bool gtk_use_gl_area;
 /* ui/gtk.c */
 void gd_update_windowsize(VirtualConsole *vc);
 int gd_monitor_update_interval(GtkWidget *widget);
+void gd_gl_count_frame(DisplayChangeListener *dcl, bool ups);
 
 /* ui/gtk-egl.c */
 void gd_egl_init(VirtualConsole *vc);
diff --git a/qapi/ui.json b/qapi/ui.json
index 1052ca9c38..a0882d6428 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1035,13 +1035,17 @@
 #   assuming the guest will resize the display to match
 #   the window size then.  Otherwise it defaults to "off".
 #   Since 3.1
+# @show-fps:Enable showing Guest Scanout's update rate (UPS) and
+#   Surface render swap rate (FPS) on a status bar (default: off).
+#   Since 6.0
 #
 # Since: 2.12
 #
 ##
 { 'struct'  : 'DisplayGTK',
   'data': { '*grab-on-hover' : 'bool',
-'*zoom-to-fit'   : 'bool'  } }
+'*zoom-to-fit'   : 'bool',
+'*show-fps'  : 'bool'  } }
 
 ##
 # @DisplayEGLHeadless:
diff --git a/ui/console.c b/ui/console.c
index 2de5f4105b..31efdb0512 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1924,6 +1924,12 @@ void dpy_gl_update(QemuConsole *con,
 con->gl->ops->dpy_gl_update(con->gl, x, y, w, h);
 }
 
+void dpy_gl_count_frame(QemuConsole *con, bool ups)
+{
+assert(con->gl);
+con->gl->ops->dpy_gl_count_frame(con->gl, ups);
+}
+
 /***/
 /* register display */
 
diff --git a/ui/gtk.c b/ui/gtk.c
index 98046f577b..091da08028 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -549,6 +549,47 @@ static void gd_switch(DisplayChangeListener *dcl,
 }
 }
 
+void gd_gl_count_frame(DisplayChangeListener *dcl, bool ups)
+{
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+gchar ups_fps_str[100];
+static guint prev, curr;
+static guint ups_cnt, fps_cnt, status_bar_id;
+struct timeval tv;
+
+if (!vc->s->opts->show_fps) {
+return;
+}
+
+if (prev == 0) {
+gettimeofday(, NULL);
+prev = tv.tv_sec * 100 + tv.tv_usec;
+}
+
+if (ups) {
+ups_cnt++;
+} else {
+fps_cnt++;
+}
+
+/* update rate is calculated for every 200 frames */
+if (ups_cnt == 200 || fps_cnt == 200) {
+gettimeofday(, NULL);
+curr = tv.tv_sec * 100 + tv.tv_usec;
+prev = curr - prev;
+sprintf(ups_fps_str, "UPS : %0.2f u/s  FPS : %0.2f f/s",
+ups_cnt * 100/(gfloat)prev, fps_cnt * 
100/(gfloat)prev);
+
+status_bar_id = 
gtk_statusbar_get_context_id(GTK_STATUSBAR(vc->s->status_bar),
+ "ups_fps_info");
+gtk_statusbar_pop(GTK_STATUSBAR(vc->s->status_bar), status_bar_id);
+gtk_statusbar_push(GTK_STATUSBAR(vc->s->status_bar), status_bar_id, 
ups_fps_str);
+prev = curr;
+fps_cnt = 0;
+ups_cnt = 0;
+ }
+}
+
 static const DisplayChangeListenerOps dcl_ops = {
 .dpy_name = "gtk",
 

Re: aarch64 efi boot failures with qemu 6.0+

2021-07-26 Thread Bjorn Helgaas
[+cc linux-pci]

On Mon, Jul 26, 2021 at 04:16:29PM -0500, Bjorn Helgaas wrote:
> On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:
> > On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  
> > wrote:
> > > On 7/26/21 12:56 AM, Guenter Roeck wrote:
> > > > On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
> > > >> On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> starting with qemu v6.0, some of my aarch64 efi boot tests no longer
> > > >>> work. Analysis shows that PCI devices with IO ports do not instantiate
> > > >>> in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem 
> > > >>> affects
> > > >>> (at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
> > > >>> affects
> > > >>> aarch64, not x86/x86_64.
> > > >>>
> > > >>> I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
> > > >>> keep firmware resource map"). Since this commit, PCI device BAR
> > > >>> allocation has changed. Taking tulip as example, the kernel reports
> > > >>> the following PCI bar assignments when running qemu v5.2.
> > > >>>
> > > >>> [3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
> > > >>> [3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > > >>> [3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> > 
> > IIUC, these lines are read back from the BARs
> > 
> > > >>> [3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
> > > >>> [3.927455] pci :00:01.0: BAR 1: assigned [mem
> > > >>> 0x1000-0x107f]
> > > >>>
> > 
> > ... and this is the assignment created by the kernel.
> > 
> > > >>> With qemu v6.0, the assignment is reported as follows.
> > > >>>
> > > >>> [3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
> > > >>> [3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > > >>> [3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> > 
> > The problem here is that Linux, for legacy reasons, does not support
> > I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
> > rejected.
> > 
> > This might make sense on x86, where legacy I/O ports may exist, but on
> > other architectures, this makes no sense.
> 
> I guess this is the "#define PCIBIOS_MIN_IO 0x1000" in
> arm64/include/asm/pci.h.  From a PCI point of view, I'm not opposed to
> changing that to 0, as it is on csky, riscv, sh, sparc, um.  But it's
> really an arch question, so the arm64 folks would have to weigh in.
> 
> But I don't think that would fix this.  PCIBIOS_MIN_IO is mainly used
> when we assign or reassign resources to a BAR, and if firmware tells
> us to preserve the assignments done by firmware, Linux shouldn't be
> doing any assignment or reassignment.
> 
> Linux received 00:01.0 BAR 0 as [io 0x-0x007f], and Guenter didn't
> report any reassignment, so I assume Linux saw the
> DSM_PCI_PRESERVE_BOOT_CONFIG [1] and didn't change anything.
> 
> Could this be due to drivers assuming that an I/O BAR of 0 is invalid?
> I see that at least ne2k_pci_init_one() [2] seems to assume that.  And
> tulip_init_one() [3] and pci_esp_probe_one() (am53c974.c, [4]) use
> pci_iomap() [5], which fails if the resource starts at 0.
> 
> So pci_iomap() is probably already broken on the arches above that
> allow I/O BARs to be zero.  Maybe pci_iomap() should only fail on
> "!start" for *memory* BARs, e.g.,
> 
> diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
> index 2d3eb1cb73b8..77455e702a3e 100644
> --- a/lib/pci_iomap.c
> +++ b/lib/pci_iomap.c
> @@ -34,7 +34,9 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
>   resource_size_t len = pci_resource_len(dev, bar);
>   unsigned long flags = pci_resource_flags(dev, bar);
>  
> - if (len <= offset || !start)
> + if (flags & IORESOURCE_MEM && !start)
> + return NULL;
> + if (len <= offset)
>   return NULL;
>   len -= offset;
>   start += offset;
> 
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v5.13#n915
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/8390/ne2k-pci.c?id=v5.13#n247
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/dec/tulip/tulip_core.c?id=v5.13#n1418
> [4] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/am53c974.c?id=v5.13#n431
> [5] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/pci_iomap.c?id=v5.13#n37
> 
> > > >>> and the controller does not instantiate. The problem disapears after
> > > >>> reverting commit 0cf8882fd0.
> > > >>>
> > > >>> Attached is a summary of test runs with various devices and qemu v5.2
> > > >>> as well as qemu v6.0, and the command line I use for efi boots.
> > > >>>
> > > >>> Did commit 0cf8882fd0 introduce a bug, do I now need need some 
> > > >>> 

Re: aarch64 efi boot failures with qemu 6.0+

2021-07-26 Thread Bjorn Helgaas
On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:
> (cc Bjorn)
> 
> On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  
> wrote:
> > On 7/26/21 12:56 AM, Guenter Roeck wrote:
> > > On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
> > >> On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
> > >>> Hi all,
> > >>>
> > >>> starting with qemu v6.0, some of my aarch64 efi boot tests no longer
> > >>> work. Analysis shows that PCI devices with IO ports do not instantiate
> > >>> in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
> > >>> (at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
> > >>> affects
> > >>> aarch64, not x86/x86_64.
> > >>>
> > >>> I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
> > >>> keep firmware resource map"). Since this commit, PCI device BAR
> > >>> allocation has changed. Taking tulip as example, the kernel reports
> > >>> the following PCI bar assignments when running qemu v5.2.
> > >>>
> > >>> [3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
> > >>> [3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > >>> [3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> 
> IIUC, these lines are read back from the BARs
> 
> > >>> [3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
> > >>> [3.927455] pci :00:01.0: BAR 1: assigned [mem
> > >>> 0x1000-0x107f]
> > >>>
> 
> ... and this is the assignment created by the kernel.
> 
> > >>> With qemu v6.0, the assignment is reported as follows.
> > >>>
> > >>> [3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
> > >>> [3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > >>> [3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> 
> The problem here is that Linux, for legacy reasons, does not support
> I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
> rejected.
> 
> This might make sense on x86, where legacy I/O ports may exist, but on
> other architectures, this makes no sense.

I guess this is the "#define PCIBIOS_MIN_IO 0x1000" in
arm64/include/asm/pci.h.  From a PCI point of view, I'm not opposed to
changing that to 0, as it is on csky, riscv, sh, sparc, um.  But it's
really an arch question, so the arm64 folks would have to weigh in.

But I don't think that would fix this.  PCIBIOS_MIN_IO is mainly used
when we assign or reassign resources to a BAR, and if firmware tells
us to preserve the assignments done by firmware, Linux shouldn't be
doing any assignment or reassignment.

Linux received 00:01.0 BAR 0 as [io 0x-0x007f], and Guenter didn't
report any reassignment, so I assume Linux saw the
DSM_PCI_PRESERVE_BOOT_CONFIG [1] and didn't change anything.

Could this be due to drivers assuming that an I/O BAR of 0 is invalid?
I see that at least ne2k_pci_init_one() [2] seems to assume that.  And
tulip_init_one() [3] and pci_esp_probe_one() (am53c974.c, [4]) use
pci_iomap() [5], which fails if the resource starts at 0.

So pci_iomap() is probably already broken on the arches above that
allow I/O BARs to be zero.  Maybe pci_iomap() should only fail on
"!start" for *memory* BARs, e.g.,

diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c
index 2d3eb1cb73b8..77455e702a3e 100644
--- a/lib/pci_iomap.c
+++ b/lib/pci_iomap.c
@@ -34,7 +34,9 @@ void __iomem *pci_iomap_range(struct pci_dev *dev,
resource_size_t len = pci_resource_len(dev, bar);
unsigned long flags = pci_resource_flags(dev, bar);
 
-   if (len <= offset || !start)
+   if (flags & IORESOURCE_MEM && !start)
+   return NULL;
+   if (len <= offset)
return NULL;
len -= offset;
start += offset;


[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/acpi/pci_root.c?id=v5.13#n915
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/8390/ne2k-pci.c?id=v5.13#n247
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/dec/tulip/tulip_core.c?id=v5.13#n1418
[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/am53c974.c?id=v5.13#n431
[5] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/pci_iomap.c?id=v5.13#n37

> > >>> and the controller does not instantiate. The problem disapears after
> > >>> reverting commit 0cf8882fd0.
> > >>>
> > >>> Attached is a summary of test runs with various devices and qemu v5.2
> > >>> as well as qemu v6.0, and the command line I use for efi boots.
> > >>>
> > >>> Did commit 0cf8882fd0 introduce a bug, do I now need need some different
> > >>> command line to instantiate PCI devices with io ports, or are such
> > >>> devices
> > >>> simply no longer supported if the system is booted with efi support ?
> > >>>
> > >>> Thanks,
> > >>> Guenter
> > >>
> > >>
> > >> So that commit basically just says 

[PATCH v2 1/2] virtio-gpu: splitting one extended mode guest fb into n-scanouts

2021-07-26 Thread Dongwon Kim
When guest is running Linux/X11 with extended multiple displays mode enabled,
the guest shares one scanout resource each time containing whole surface
rather than sharing individual display output separately. This extended frame
is properly splited and rendered on the corresponding scanout surfaces but
not in case of blob-resource (zero copy).

This code change lets the qemu split this one large surface data into multiple
in case of blob-resource as well so that each sub frame then can be blitted
properly to each scanout.

v2: resizing qemu console in virtio_gpu_update_dmabuf to scanout's width and
height

Signed-off-by: Dongwon Kim 
---
 hw/display/virtio-gpu-udmabuf.c | 21 ++---
 hw/display/virtio-gpu.c |  4 ++--
 include/hw/virtio/virtio-gpu.h  |  5 +++--
 include/ui/console.h|  4 
 stubs/virtio-gpu-udmabuf.c  |  3 ++-
 5 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index 3c01a415e7..aadf1221ef 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -171,7 +171,8 @@ static VGPUDMABuf
 *virtio_gpu_create_dmabuf(VirtIOGPU *g,
   uint32_t scanout_id,
   struct virtio_gpu_simple_resource *res,
-  struct virtio_gpu_framebuffer *fb)
+  struct virtio_gpu_framebuffer *fb,
+  struct virtio_gpu_rect *r)
 {
 VGPUDMABuf *dmabuf;
 
@@ -183,6 +184,10 @@ static VGPUDMABuf
 dmabuf->buf.width = fb->width;
 dmabuf->buf.height = fb->height;
 dmabuf->buf.stride = fb->stride;
+dmabuf->buf.x = r->x;
+dmabuf->buf.y = r->y;
+dmabuf->buf.scanout_width = r->width;
+dmabuf->buf.scanout_height = r->height;
 dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
 dmabuf->buf.fd = res->dmabuf_fd;
 
@@ -195,24 +200,26 @@ static VGPUDMABuf
 int virtio_gpu_update_dmabuf(VirtIOGPU *g,
  uint32_t scanout_id,
  struct virtio_gpu_simple_resource *res,
- struct virtio_gpu_framebuffer *fb)
+ struct virtio_gpu_framebuffer *fb,
+ struct virtio_gpu_rect *r)
 {
 struct virtio_gpu_scanout *scanout = >parent_obj.scanout[scanout_id];
 VGPUDMABuf *new_primary, *old_primary = NULL;
 
-new_primary = virtio_gpu_create_dmabuf(g, scanout_id, res, fb);
+new_primary = virtio_gpu_create_dmabuf(g, scanout_id, res, fb, r);
 if (!new_primary) {
 return -EINVAL;
 }
 
 if (g->dmabuf.primary) {
-old_primary = g->dmabuf.primary;
+old_primary = g->dmabuf.primary[scanout_id];
 }
 
-g->dmabuf.primary = new_primary;
+g->dmabuf.primary[scanout_id] = new_primary;
+
 qemu_console_resize(scanout->con,
-new_primary->buf.width,
-new_primary->buf.height);
+new_primary->buf.scanout_width,
+new_primary->buf.scanout_height);
 dpy_gl_scanout_dmabuf(scanout->con, _primary->buf);
 
 if (old_primary) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index e183f4ecda..60db480719 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -523,9 +523,9 @@ static void virtio_gpu_resource_flush(VirtIOGPU *g,
 console_has_gl(scanout->con)) {
 dpy_gl_update(scanout->con, 0, 0, scanout->width,
   scanout->height);
-return;
 }
 }
+return;
 }
 
 if (!res->blob &&
@@ -633,7 +633,7 @@ static void virtio_gpu_do_set_scanout(VirtIOGPU *g,
 
 if (res->blob) {
 if (console_has_gl(scanout->con)) {
-if (!virtio_gpu_update_dmabuf(g, scanout_id, res, fb)) {
+if (!virtio_gpu_update_dmabuf(g, scanout_id, res, fb, r)) {
 virtio_gpu_update_scanout(g, scanout_id, res, r);
 return;
 }
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index bcf54d970f..6372f4bbb5 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -187,7 +187,7 @@ struct VirtIOGPU {
 
 struct {
 QTAILQ_HEAD(, VGPUDMABuf) bufs;
-VGPUDMABuf *primary;
+VGPUDMABuf *primary[VIRTIO_GPU_MAX_SCANOUTS];
 } dmabuf;
 };
 
@@ -273,7 +273,8 @@ void virtio_gpu_fini_udmabuf(struct 
virtio_gpu_simple_resource *res);
 int virtio_gpu_update_dmabuf(VirtIOGPU *g,
  uint32_t scanout_id,
  struct virtio_gpu_simple_resource *res,
- struct virtio_gpu_framebuffer *fb);
+ struct virtio_gpu_framebuffer *fb,
+ struct virtio_gpu_rect *r);
 
 /* virtio-gpu-3d.c */
 void 

[PATCH v2 2/2] ui/gtk-egl: blitting partial guest fb to the proper scanout surface

2021-07-26 Thread Dongwon Kim
eb_fb_blit should be able to blit partial image that represent a specific
guest display in case there are multiple displays connected to the guest
(One guest scanout FB contains all display outputs in a single dmabuf (blob-
resource).).

v2: egl_fb includes dmabuf info then make egl_fb_blit position and size
parameters programmed in dmabuf structure (previously position/size
parameters were given to egl_fb_blit separately)
(Vivek Kasireddy)

changed the commit message as there is no interface change to egl_fb_blit

Signed-off-by: Dongwon Kim 
---
 include/ui/egl-helpers.h |  1 +
 ui/egl-helpers.c | 25 +
 ui/gtk-egl.c |  4 
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index f1bf8f97fc..b6fced7062 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -19,6 +19,7 @@ typedef struct egl_fb {
 GLuint texture;
 GLuint framebuffer;
 bool delete_texture;
+QemuDmaBuf *dmabuf;
 } egl_fb;
 
 void egl_fb_destroy(egl_fb *fb);
diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 6d0cb2b5cb..80bc61fb70 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -90,14 +90,31 @@ void egl_fb_setup_new_tex(egl_fb *fb, int width, int height)
 
 void egl_fb_blit(egl_fb *dst, egl_fb *src, bool flip)
 {
-GLuint y1, y2;
+GLuint x1 = 0;
+GLuint y1 = 0;
+GLuint x2, y2;
+GLuint w = src->width;
+GLuint h = src->height;
 
 glBindFramebuffer(GL_READ_FRAMEBUFFER, src->framebuffer);
 glBindFramebuffer(GL_DRAW_FRAMEBUFFER, dst->framebuffer);
 glViewport(0, 0, dst->width, dst->height);
-y1 = flip ? src->height : 0;
-y2 = flip ? 0 : src->height;
-glBlitFramebuffer(0, y1, src->width, y2,
+
+if (src->dmabuf) {
+x1 = src->dmabuf->x;
+y1 = src->dmabuf->y;
+w = src->dmabuf->scanout_width;
+h = src->dmabuf->scanout_height;
+}
+
+w = (x1 + w) > src->width ? src->width - x1 : w;
+h = (y1 + h) > src->height ? src->height - y1 : h;
+
+y2 = flip ? y1 : h + y1;
+y1 = flip ? h + y1 : y1;
+x2 = x1 + w;
+
+glBlitFramebuffer(x1, y1, x2, y2,
   0, 0, dst->width, dst->height,
   GL_COLOR_BUFFER_BIT, GL_LINEAR);
 }
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 2a2e6d3a17..636b2aa02c 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -209,6 +209,8 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
QemuDmaBuf *dmabuf)
 {
 #ifdef CONFIG_GBM
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+
 egl_dmabuf_import_texture(dmabuf);
 if (!dmabuf->texture) {
 return;
@@ -217,6 +219,8 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 gd_egl_scanout_texture(dcl, dmabuf->texture,
false, dmabuf->width, dmabuf->height,
0, 0, dmabuf->width, dmabuf->height);
+
+vc->gfx.guest_fb.dmabuf = dmabuf;
 #endif
 }
 
-- 
2.17.1




[PULL for-6.1 11/12] linux-user/syscall: Remove unused variable from execve

2021-07-26 Thread Richard Henderson
>From clang-13:
linux-user/syscall.c:8503:17: error: variable 'total_size' set but not used \
[-Werror,-Wunused-but-set-variable]

Acked-by: Laurent Vivier 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 376629c689..ccd3892b2d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8364,7 +8364,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 abi_ulong guest_envp;
 abi_ulong addr;
 char **q;
-int total_size = 0;
 
 argc = 0;
 guest_argp = arg2;
@@ -8396,7 +8395,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 break;
 if (!(*q = lock_user_string(addr)))
 goto execve_efault;
-total_size += strlen(*q) + 1;
 }
 *q = NULL;
 
@@ -8408,7 +8406,6 @@ static abi_long do_syscall1(void *cpu_env, int num, 
abi_long arg1,
 break;
 if (!(*q = lock_user_string(addr)))
 goto execve_efault;
-total_size += strlen(*q) + 1;
 }
 *q = NULL;
 
-- 
2.25.1




[PATCH v2 1/2] ui/gtk: detach_all option for making all VCs detached upon starting

2021-07-26 Thread Dongwon Kim
With "detach-all=on" for display, all VCs are detached from the beginning.
This is useful when there are multiple displays assigned to a guest OS.

v2: Move "detach-all" option to under DisplayGTK as it's GTK specific
(Thomas Huth)

Signed-off-by: Dongwon Kim 
Signed-off-by: Khairul Anuar Romli 
---
 qapi/ui.json | 5 -
 ui/gtk.c | 7 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/qapi/ui.json b/qapi/ui.json
index 1052ca9c38..e37af91683 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1035,13 +1035,16 @@
 #   assuming the guest will resize the display to match
 #   the window size then.  Otherwise it defaults to "off".
 #   Since 3.1
+# @detach-all:  Detach all VirtualConsoles when starting Qemu (default: off).
+#   Since 6.0
 #
 # Since: 2.12
 #
 ##
 { 'struct'  : 'DisplayGTK',
   'data': { '*grab-on-hover' : 'bool',
-'*zoom-to-fit'   : 'bool'  } }
+'*zoom-to-fit'   : 'bool',
+'*detach-all': 'bool'  } }
 
 ##
 # @DisplayEGLHeadless:
diff --git a/ui/gtk.c b/ui/gtk.c
index 98046f577b..48c1fff54f 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -2189,6 +2189,7 @@ static void gtk_display_init(DisplayState *ds, 
DisplayOptions *opts)
 GdkDisplay *window_display;
 GtkIconTheme *theme;
 char *dir;
+int i;
 
 if (!gtkinit) {
 fprintf(stderr, "gtk initialization failed\n");
@@ -2268,6 +2269,12 @@ static void gtk_display_init(DisplayState *ds, 
DisplayOptions *opts)
 gtk_menu_item_activate(GTK_MENU_ITEM(s->grab_on_hover_item));
 }
 gd_clipboard_init(s);
+
+if (opts->u.gtk.detach_all) {
+for (i = 0; i < s->nb_vcs - 1; i++) {
+gtk_menu_item_activate(GTK_MENU_ITEM(s->untabify_item));
+}
+}
 }
 
 static void early_gtk_display_init(DisplayOptions *opts)
-- 
2.17.1




[PULL for-6.1 08/12] hw/audio/adlib: Remove unused variable in adlib_callback

2021-07-26 Thread Richard Henderson
>From clang-13:
hw/audio/adlib.c:189:18: error: variable 'net' set but not used \
[-Werror,-Wunused-but-set-variable]

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 hw/audio/adlib.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/audio/adlib.c b/hw/audio/adlib.c
index 42d50d2fdc..5f979b1487 100644
--- a/hw/audio/adlib.c
+++ b/hw/audio/adlib.c
@@ -186,7 +186,7 @@ static int write_audio (AdlibState *s, int samples)
 static void adlib_callback (void *opaque, int free)
 {
 AdlibState *s = opaque;
-int samples, net = 0, to_play, written;
+int samples, to_play, written;
 
 samples = free >> SHIFT;
 if (!(s->active && s->enabled) || !samples) {
@@ -219,7 +219,6 @@ static void adlib_callback (void *opaque, int free)
 written = write_audio (s, samples);
 
 if (written) {
-net += written;
 samples -= written;
 s->pos = (s->pos + written) % s->samples;
 }
-- 
2.25.1




[PULL for-6.1 12/12] tests/unit: Remove unused variable from test_io

2021-07-26 Thread Richard Henderson
>From clang-13:
tests/unit/test-iov.c:161:26: error: variable 't' set but not used \
[-Werror,-Wunused-but-set-variable]

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tests/unit/test-iov.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/unit/test-iov.c b/tests/unit/test-iov.c
index 9c415e2f1f..5371066fb6 100644
--- a/tests/unit/test-iov.c
+++ b/tests/unit/test-iov.c
@@ -158,7 +158,7 @@ static void test_io(void)
 
 int sv[2];
 int r;
-unsigned i, j, k, s, t;
+unsigned i, j, k, s;
 fd_set fds;
 unsigned niov;
 struct iovec *iov, *siov;
@@ -182,7 +182,6 @@ static void test_io(void)
 
 FD_ZERO();
 
-t = 0;
 if (fork() == 0) {
/* writer */
 
@@ -201,7 +200,6 @@ static void test_io(void)
g_assert(memcmp(iov, siov, sizeof(*iov)*niov) == 0);
if (r >= 0) {
k += r;
-   t += r;
usleep(g_test_rand_int_range(0, 30));
} else if (errno == EAGAIN) {
select(sv[1]+1, NULL, , NULL, NULL);
@@ -238,7 +236,6 @@ static void test_io(void)
g_assert(memcmp(iov, siov, sizeof(*iov)*niov) == 0);
if (r > 0) {
k += r;
-   t += r;
} else if (!r) {
if (s) {
break;
-- 
2.25.1




[PULL for-6.1 10/12] hw/pci-hist/pnv_phb4: Fix typo in pnv_phb4_ioda_write

2021-07-26 Thread Richard Henderson
>From clang-13:
hw/pci-host/pnv_phb4.c:375:18: error: variable 'v' set but not used \
[-Werror,-Wunused-but-set-variable]

It's pretty clear that we meant to write back 'v' after
all that computation and not 'val'.

Acked-by: David Gibson 
Acked-by: Benjamin Herrenschmidt 
Reviewed-by: Cédric Le Goater 
Signed-off-by: Richard Henderson 
---
 hw/pci-host/pnv_phb4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci-host/pnv_phb4.c b/hw/pci-host/pnv_phb4.c
index 54f57c660a..5c375a9f28 100644
--- a/hw/pci-host/pnv_phb4.c
+++ b/hw/pci-host/pnv_phb4.c
@@ -392,7 +392,7 @@ static void pnv_phb4_ioda_write(PnvPHB4 *phb, uint64_t val)
 v &= 0xull;
 v |= 0xcfffull & val;
 }
-*tptr = val;
+*tptr = v;
 break;
 }
 case IODA3_TBL_MBT:
-- 
2.25.1




[PULL for-6.1 06/12] util/selfmap: Discard mapping on error

2021-07-26 Thread Richard Henderson
>From clang-13:
util/selfmap.c:26:21: error: variable 'errors' set but not used \
[-Werror,-Wunused-but-set-variable]

Quite right of course, but there's no reason not to check errors.

First, incrementing errors is incorrect, because qemu_strtoul
returns an errno not a count -- just or them together so that
we have a non-zero value at the end.

Second, if we have an error, do not add the struct to the list,
but free it instead.

Cc: Alex Bennée 
Reviewed-by: Eric Blake 
Signed-off-by: Richard Henderson 
---
 util/selfmap.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/util/selfmap.c b/util/selfmap.c
index 2ec99dfdda..2c14f019ce 100644
--- a/util/selfmap.c
+++ b/util/selfmap.c
@@ -23,29 +23,34 @@ GSList *read_self_maps(void)
 gchar **fields = g_strsplit(lines[i], " ", 6);
 if (g_strv_length(fields) > 4) {
 MapInfo *e = g_new0(MapInfo, 1);
-int errors;
+int errors = 0;
 const char *end;
 
-errors  = qemu_strtoul(fields[0], , 16, >start);
-errors += qemu_strtoul(end + 1, NULL, 16, >end);
+errors |= qemu_strtoul(fields[0], , 16, >start);
+errors |= qemu_strtoul(end + 1, NULL, 16, >end);
 
 e->is_read  = fields[1][0] == 'r';
 e->is_write = fields[1][1] == 'w';
 e->is_exec  = fields[1][2] == 'x';
 e->is_priv  = fields[1][3] == 'p';
 
-errors += qemu_strtoul(fields[2], NULL, 16, >offset);
+errors |= qemu_strtoul(fields[2], NULL, 16, >offset);
 e->dev = g_strdup(fields[3]);
-errors += qemu_strtou64(fields[4], NULL, 10, >inode);
+errors |= qemu_strtou64(fields[4], NULL, 10, >inode);
 
-/*
- * The last field may have leading spaces which we
- * need to strip.
- */
-if (g_strv_length(fields) == 6) {
-e->path = g_strdup(g_strchug(fields[5]));
+if (!errors) {
+/*
+ * The last field may have leading spaces which we
+ * need to strip.
+ */
+if (g_strv_length(fields) == 6) {
+e->path = g_strdup(g_strchug(fields[5]));
+}
+map_info = g_slist_prepend(map_info, e);
+} else {
+g_free(e->dev);
+g_free(e);
 }
-map_info = g_slist_prepend(map_info, e);
 }
 
 g_strfreev(fields);
-- 
2.25.1




[PULL for-6.1 09/12] hw/ppc/spapr_events: Remove unused variable from check_exception

2021-07-26 Thread Richard Henderson
>From clang-13:
hw/ppc/spapr_events.c:937:14: error: variable 'xinfo' set but not used \
[-Werror,-Wunused-but-set-variable]

Acked-by: David Gibson 
Signed-off-by: Richard Henderson 
---
 hw/ppc/spapr_events.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 0cfc19be19..23e2e2fff1 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -934,7 +934,6 @@ static void check_exception(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 uint32_t nret, target_ulong rets)
 {
 uint32_t mask, buf, len, event_len;
-uint64_t xinfo;
 SpaprEventLogEntry *event;
 struct rtas_error_log header;
 int i;
@@ -944,13 +943,9 @@ static void check_exception(PowerPCCPU *cpu, 
SpaprMachineState *spapr,
 return;
 }
 
-xinfo = rtas_ld(args, 1);
 mask = rtas_ld(args, 2);
 buf = rtas_ld(args, 4);
 len = rtas_ld(args, 5);
-if (nargs == 7) {
-xinfo |= (uint64_t)rtas_ld(args, 6) << 32;
-}
 
 event = rtas_event_log_dequeue(spapr, mask);
 if (!event) {
-- 
2.25.1




[PULL for-6.1 07/12] net/checksum: Remove unused variable in net_checksum_add_iov

2021-07-26 Thread Richard Henderson
>From clang-13:
../qemu/net/checksum.c:189:23: error: variable 'buf_off' set but not used \
[-Werror,-Wunused-but-set-variable]

Reviewed-by: Eric Blake 
Signed-off-by: Richard Henderson 
---
 net/checksum.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/checksum.c b/net/checksum.c
index 70f4eaeb3a..68245fd748 100644
--- a/net/checksum.c
+++ b/net/checksum.c
@@ -186,12 +186,11 @@ uint32_t
 net_checksum_add_iov(const struct iovec *iov, const unsigned int iov_cnt,
  uint32_t iov_off, uint32_t size, uint32_t csum_offset)
 {
-size_t iovec_off, buf_off;
+size_t iovec_off;
 unsigned int i;
 uint32_t res = 0;
 
 iovec_off = 0;
-buf_off = 0;
 for (i = 0; i < iov_cnt && size; i++) {
 if (iov_off < (iovec_off + iov[i].iov_len)) {
 size_t len = MIN((iovec_off + iov[i].iov_len) - iov_off , size);
@@ -200,7 +199,6 @@ net_checksum_add_iov(const struct iovec *iov, const 
unsigned int iov_cnt,
 res += net_checksum_add_cont(len, chunk_buf, csum_offset);
 csum_offset += len;
 
-buf_off += len;
 iov_off += len;
 size -= len;
 }
-- 
2.25.1




[PULL for-6.1 02/12] accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb()

2021-07-26 Thread Richard Henderson
From: Peter Maydell 

In cpu_loop_exec_tb(), we decide whether to look for a TB with
exactly insns_left instructions in it using the condition
 (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)

The check for icount_extra == 0 is unnecessary, because we just set
  insns_left = MIN(0x, cpu->icount_budget);
  icount_extra = icount_budget - insns_left;
and so icount_extra can only be non-zero if icount_budget > 0x
and insns_left == 0x. But in that case insns_left >= tb->icount
because 0x is much larger than TCG_MAX_INSNS, so the condition
will be false anyway.

Remove the unnecessary check, and instead assert:
 * that we are only going to execute a partial TB here if the
   icount budget has run out (ie icount_extra == 0)
 * that the number of insns we're going to execute does fit into
   the CF_COUNT_MASK

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-Id: <20210725174405.24568-3-peter.mayd...@linaro.org>
Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 6e8dc29119..5aa42fbff3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -843,7 +843,9 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
  * execute we need to ensure we find/generate a TB with exactly
  * insns_left instructions in it.
  */
-if (!cpu->icount_extra && insns_left > 0 && insns_left < tb->icount)  {
+if (insns_left > 0 && insns_left < tb->icount)  {
+assert(insns_left <= CF_COUNT_MASK);
+assert(cpu->icount_extra == 0);
 cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
 }
 #endif
-- 
2.25.1




[PULL for-6.1 04/12] nbd/server: Mark variable unused in nbd_negotiate_meta_queries

2021-07-26 Thread Richard Henderson
>From clang-13:
nbd/server.c:976:22: error: variable 'bitmaps' set but not used \
[-Werror,-Wunused-but-set-variable]

which is incorrect; see //bugs.llvm.org/show_bug.cgi?id=3888.

Reviewed-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Richard Henderson 
---
 nbd/server.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/nbd/server.c b/nbd/server.c
index b60ebc3ab6..3927f7789d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -973,7 +973,8 @@ static int nbd_negotiate_meta_queries(NBDClient *client,
 {
 int ret;
 g_autofree char *export_name = NULL;
-g_autofree bool *bitmaps = NULL;
+/* Mark unused to work around https://bugs.llvm.org/show_bug.cgi?id=3888 */
+g_autofree G_GNUC_UNUSED bool *bitmaps = NULL;
 NBDExportMetaContexts local_meta = {0};
 uint32_t nb_queries;
 size_t i;
-- 
2.25.1




[PULL for-6.1 05/12] accel/tcg: Remove unused variable in cpu_exec

2021-07-26 Thread Richard Henderson
>From clang-13:
accel/tcg/cpu-exec.c:783:15: error: variable 'cc' set but not used \
[-Werror,-Wunused-but-set-variable]

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5aa42fbff3..e5c0ccd1a2 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -855,7 +855,6 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 
 int cpu_exec(CPUState *cpu)
 {
-CPUClass *cc = CPU_GET_CLASS(cpu);
 int ret;
 SyncClocks sc = { 0 };
 
@@ -889,19 +888,14 @@ int cpu_exec(CPUState *cpu)
  * that we support, but is still unfixed in clang:
  *   https://bugs.llvm.org/show_bug.cgi?id=21183
  *
- * Reload essential local variables here for those compilers.
+ * Reload an essential local variable here for those compilers.
  * Newer versions of gcc would complain about this code (-Wclobbered),
  * so we only perform the workaround for clang.
  */
 cpu = current_cpu;
-cc = CPU_GET_CLASS(cpu);
 #else
-/*
- * Non-buggy compilers preserve these locals; assert that
- * they have the correct value.
- */
+/* Non-buggy compilers preserve this; assert the correct value. */
 g_assert(cpu == current_cpu);
-g_assert(cc == CPU_GET_CLASS(cpu));
 #endif
 
 #ifndef CONFIG_SOFTMMU
-- 
2.25.1




[PULL for-6.1 03/12] bitops.h: revert db1ffc32dd ("qemu/bitops.h: add bitrev8 implementation")

2021-07-26 Thread Richard Henderson
From: Mark Cave-Ayland 

Commit db1ffc32dd ("qemu/bitops.h: add bitrev8 implementation") introduced
a bitrev8() function to reverse the bit ordering required for storing the
MAC address in the q800 PROM.

This function is not required since QEMU implements its own revbit8()
function which does exactly the same thing. Remove the extraneous
bitrev8() function and switch its only caller in hw/m68k/q800.c to
use revbit8() instead.

Signed-off-by: Mark Cave-Ayland 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20210725110557.3007-1-mark.cave-ayl...@ilande.co.uk>
Signed-off-by: Richard Henderson 
---
 include/qemu/bitops.h | 22 --
 hw/m68k/q800.c|  2 +-
 2 files changed, 1 insertion(+), 23 deletions(-)

diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h
index 110c56e099..03213ce952 100644
--- a/include/qemu/bitops.h
+++ b/include/qemu/bitops.h
@@ -618,26 +618,4 @@ static inline uint64_t half_unshuffle64(uint64_t x)
 return x;
 }
 
-/**
- * bitrev8:
- * @x: 8-bit value to be reversed
- *
- * Given an input value with bits::
- *
- *   ABCDEFGH
- *
- * return the value with its bits reversed from left to right::
- *
- *   HGFEDCBA
- *
- * Returns: the bit-reversed value.
- */
-static inline uint8_t bitrev8(uint8_t x)
-{
-x = ((x >> 1) & 0x55) | ((x << 1) & 0xaa);
-x = ((x >> 2) & 0x33) | ((x << 2) & 0xcc);
-x = (x >> 4) | (x << 4) ;
-return x;
-}
-
 #endif
diff --git a/hw/m68k/q800.c b/hw/m68k/q800.c
index 6817c8b5d1..ac0a13060b 100644
--- a/hw/m68k/q800.c
+++ b/hw/m68k/q800.c
@@ -334,7 +334,7 @@ static void q800_init(MachineState *machine)
 prom = memory_region_get_ram_ptr(dp8393x_prom);
 checksum = 0;
 for (i = 0; i < 6; i++) {
-prom[i] = bitrev8(nd_table[0].macaddr.a[i]);
+prom[i] = revbit8(nd_table[0].macaddr.a[i]);
 checksum ^= prom[i];
 }
 prom[7] = 0xff - checksum;
-- 
2.25.1




[PULL for-6.1 01/12] accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low

2021-07-26 Thread Richard Henderson
From: Peter Maydell 

In cpu_loop_exec_tb() we were bounding the number of insns we might
try to execute in a TB using CF_COUNT_MASK.  This is incorrect,
because we can validly put up to 0x into icount_decr.u16.low.  In
particular, since commit 78ff82bb1b67c0d7 reduced CF_COUNT_MASK to
511 this meant that we would incorrectly only try to execute 511
instructions in a 512-instruction TB, which could result in QEMU
hanging when in icount mode.

Use the actual maximum value, which is 0x. (This brings this code
in to line with the similar logic in icount_prepare_for_run() in
tcg-accel-ops-icount.c.)

Fixes: 78ff82bb1b67c0d7
Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/499
Message-Id: <20210725174405.24568-2-peter.mayd...@linaro.org>
Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index fc895cf51e..6e8dc29119 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -834,7 +834,7 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 /* Ensure global icount has gone forward */
 icount_update(cpu);
 /* Refill decrementer and continue execution.  */
-insns_left = MIN(CF_COUNT_MASK, cpu->icount_budget);
+insns_left = MIN(0x, cpu->icount_budget);
 cpu_neg(cpu)->icount_decr.u16.low = insns_left;
 cpu->icount_extra = cpu->icount_budget - insns_left;
 
-- 
2.25.1




[PULL for-6.1 00/12] tcg and misc patch queue

2021-07-26 Thread Richard Henderson
The following changes since commit 34fd92ab4142bde5b54adacd16e6682f4ea83da1:

  Merge remote-tracking branch 'remotes/stsquad/tags/pull-for-6.1-rc1-230721-1' 
into staging (2021-07-26 11:00:15 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20210726

for you to fetch changes up to 2bf07e788eb69bee843be274386fb20f4ab6b0f6:

  tests/unit: Remove unused variable from test_io (2021-07-26 07:07:28 -1000)


Fix icount accounting.
Replace bitrev8 with revbit8.
Fixes for set but not used warnings.


Mark Cave-Ayland (1):
  bitops.h: revert db1ffc32dd ("qemu/bitops.h: add bitrev8 implementation")

Peter Maydell (2):
  accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low
  accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb()

Richard Henderson (9):
  nbd/server: Mark variable unused in nbd_negotiate_meta_queries
  accel/tcg: Remove unused variable in cpu_exec
  util/selfmap: Discard mapping on error
  net/checksum: Remove unused variable in net_checksum_add_iov
  hw/audio/adlib: Remove unused variable in adlib_callback
  hw/ppc/spapr_events: Remove unused variable from check_exception
  hw/pci-hist/pnv_phb4: Fix typo in pnv_phb4_ioda_write
  linux-user/syscall: Remove unused variable from execve
  tests/unit: Remove unused variable from test_io

 include/qemu/bitops.h  | 22 --
 accel/tcg/cpu-exec.c   | 16 ++--
 hw/audio/adlib.c   |  3 +--
 hw/m68k/q800.c |  2 +-
 hw/pci-host/pnv_phb4.c |  2 +-
 hw/ppc/spapr_events.c  |  5 -
 linux-user/syscall.c   |  3 ---
 nbd/server.c   |  3 ++-
 net/checksum.c |  4 +---
 tests/unit/test-iov.c  |  5 +
 util/selfmap.c | 29 +
 11 files changed, 30 insertions(+), 64 deletions(-)



Re: [PATCH for-6.1? v2 4/7] jobs: Give Job.force_cancel more meaning

2021-07-26 Thread Eric Blake
On Mon, Jul 26, 2021 at 04:46:10PM +0200, Max Reitz wrote:
> We largely have two cancel modes for jobs:
> 
> First, there is actual cancelling.  The job is terminated as soon as
> possible, without trying to reach a consistent result.
> 
> Second, we have mirror in the READY state.  Technically, the job is not
> really cancelled, but it just is a different completion mode.  The job
> can still run for an indefinite amount of time while it tries to reach a
> consistent result.
> 
> We want to be able to clearly distinguish which cancel mode a job is in
> (when it has been cancelled).  We can use Job.force_cancel for this, but
> right now it only reflects cancel requests from the user with
> force=true, but clearly, jobs that do not even distinguish between
> force=false and force=true are effectively always force-cancelled.
> 
> So this patch has Job.force_cancel signify whether the job will
> terminate as soon as possible (force_cancel=true) or whether it will
> effectively remain running despite being "cancelled"
> (force_cancel=false).
> 
> To this end, we let jobs that provide JobDriver.cancel() tell the
> generic job code whether they will terminate as soon as possible or not,
> and for jobs that do not provide that method we assume they will.
> 
> Signed-off-by: Max Reitz 
> ---
>  include/qemu/job.h | 11 ++-
>  block/backup.c |  3 ++-
>  block/mirror.c | 24 ++--
>  job.c  |  6 +-
>  4 files changed, 35 insertions(+), 9 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH] Fix CPUID_Fn8000001E_EBX for AMD

2021-07-26 Thread Eduardo Habkost
CCing the original author of that code (Babu Moger).

On Wed, Jun 30, 2021 at 04:25:51PM +0800, Jade Cheng wrote:
> According to AMD64 Arch Programmer's Manual Appendix D,
> bits 7:0 in Fn8000_001E_EBX should be physical core(s) per logical processor, 
> not per die.

Do you mean physical cores per package/socket?

> 
> Signed-off-by: Jade Cheng 

Do you have a pointer to the specific paragraph of the
documentation that states that?

https://www.amd.com/system/files/TechDocs/24594.pdf
page 634 says:

  CPUID Fn8000_001E_EBX Compute Unit Identifiers
  [...]
  7:0 ComputeUnitId Compute unit ID. Identifies a Compute Unit,
which may be one or more physical cores that each implement
one or more logical processors

I don't see any content referencing physical cores per logical
processor, or physical cores per package/socket.

Which problem are you trying to fix here?


> ---
>  target/i386/cpu.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index a9fe1662d3..417f5ba81f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -381,7 +381,13 @@ static void encode_topo_cpuid801e(X86CPU *cpu, 
> X86CPUTopoInfo *topo_info,
>   *  NOTE: CoreId is already part of apic_id. Just use it. We can
>   *  use all the 8 bits to represent the core_id here.
>   */
> -*ebx = ((topo_info->threads_per_core - 1) << 8) | (topo_ids.core_id & 
> 0xFF);
> +uint32_t core_id = topo_ids.core_id;
> +
> +if (IS_AMD_CPU(>env)) {
> +core_id = topo_ids.core_id + topo_ids.die_id * 
> topo_info->cores_per_die;
> +}
> +
> +*ebx = ((topo_info->threads_per_core - 1) << 8) | (core_id & 0xFF);
>  
>  /*
>   * CPUID_Fn801E_ECX [Node Identifiers] (NodeId)
> -- 
> 2.24.3 (Apple Git-128)
> 

-- 
Eduardo




Re: [PATCH] fixup! target/arm: Add sve-default-vector-length cpu property

2021-07-26 Thread Peter Maydell
On Mon, 26 Jul 2021 at 21:11, Richard Henderson
 wrote:
>
> ---
>  docs/system/arm/cpu-features.rst | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/docs/system/arm/cpu-features.rst 
> b/docs/system/arm/cpu-features.rst
> index 79b87f7c5f..67847a5cc1 100644
> --- a/docs/system/arm/cpu-features.rst
> +++ b/docs/system/arm/cpu-features.rst
> @@ -385,5 +385,6 @@ defined to mirror the Linux kernel parameter file
>  is in units of bytes and must be between 16 and 8192.
>  If not specified, the default vector length is 64.
>
> -If the default length is larger than the maximum vector length enabled
> -with ``sve`` properties, the actual vector length will be reduced.
> +If the default length is larger than the maximum vector length enabled,
> +the actual vector length will be reduced.  Note that the maximum vector
> +length supported by QEMU is 256.

Thanks. I've squashed that into the patch in target-arm.next.

I plan to send this as part of a pullreq for rc1 tomorrow probably
midday-ish UK time, unless somebody speaks up now to say they'd
rather have more time for review/testing/whatever. (That wouldn't
mean it missed 6.1, as a bugfix-of-sorts this can go in rc2 if
necessary.)

-- PMM



[PATCH] fixup! target/arm: Add sve-default-vector-length cpu property

2021-07-26 Thread Richard Henderson
---
 docs/system/arm/cpu-features.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/cpu-features.rst b/docs/system/arm/cpu-features.rst
index 79b87f7c5f..67847a5cc1 100644
--- a/docs/system/arm/cpu-features.rst
+++ b/docs/system/arm/cpu-features.rst
@@ -385,5 +385,6 @@ defined to mirror the Linux kernel parameter file
 is in units of bytes and must be between 16 and 8192.  
 If not specified, the default vector length is 64.
 
-If the default length is larger than the maximum vector length enabled
-with ``sve`` properties, the actual vector length will be reduced.
+If the default length is larger than the maximum vector length enabled,
+the actual vector length will be reduced.  Note that the maximum vector
+length supported by QEMU is 256.
-- 
2.25.1




Re: [PATCH v5 09/10] ACPI ERST: qtest for ERST

2021-07-26 Thread Eric DeVolder




On 7/26/21 6:45 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 11:18:44 -0500
Eric DeVolder  wrote:


On 7/20/21 8:38 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 15:07:20 -0400
Eric DeVolder  wrote:
   

This change provides a qtest that locates and then does a simple
interrogation of the ERST feature within the guest.

Signed-off-by: Eric DeVolder 
---
   tests/qtest/erst-test.c | 129 

   tests/qtest/meson.build |   2 +
   2 files changed, 131 insertions(+)
   create mode 100644 tests/qtest/erst-test.c

diff --git a/tests/qtest/erst-test.c b/tests/qtest/erst-test.c
new file mode 100644
index 000..ce014c1
--- /dev/null
+++ b/tests/qtest/erst-test.c
@@ -0,0 +1,129 @@
+/*
+ * QTest testcase for ACPI ERST
+ *
+ * Copyright (c) 2021 Oracle
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/acpi-defs.h"
+#include "boot-sector.h"
+#include "acpi-utils.h"
+#include "libqos/libqtest.h"
+#include "qapi/qmp/qdict.h"
+
+#define RSDP_ADDR_INVALID 0x10 /* RSDP must be below this address */
+
+static uint64_t acpi_find_erst(QTestState *qts)
+{
+uint32_t rsdp_offset;
+uint8_t rsdp_table[36 /* ACPI 2.0+ RSDP size */];
+uint32_t rsdt_len, table_length;
+uint8_t *rsdt, *ent;
+uint64_t base = 0;
+
+/* Wait for guest firmware to finish and start the payload. */
+boot_sector_test(qts);
+
+/* Tables should be initialized now. */
+rsdp_offset = acpi_find_rsdp_address(qts);
+
+g_assert_cmphex(rsdp_offset, <, RSDP_ADDR_INVALID);
+
+acpi_fetch_rsdp_table(qts, rsdp_offset, rsdp_table);
+acpi_fetch_table(qts, , _len, _table[16 /* RsdtAddress */],
+ 4, "RSDT", true);
+
+ACPI_FOREACH_RSDT_ENTRY(rsdt, rsdt_len, ent, 4 /* Entry size */) {
+uint8_t *table_aml;
+acpi_fetch_table(qts, _aml, _length, ent, 4, NULL, true);
+if (!memcmp(table_aml + 0 /* Header Signature */, "ERST", 4)) {
+/*
+ * Picking up ERST base address from the Register Region
+ * specified as part of the first Serialization Instruction
+ * Action (which is a Begin Write Operation).
+ */
+memcpy(, _aml[56], sizeof(base));
+g_free(table_aml);
+break;
+}
+g_free(table_aml);
+}
+g_free(rsdt);
+return base;
+}

I'd drop this, bios-tables-test should do ACPI table check
as for PCI device itself you can test it with qtest accelerator
that allows to instantiate it and access registers directly
without overhead of running actual guest.

Yes, bios-tables-test checks the ACPI table, but not the functionality.
This test has actually caught a problem/bug during development.


What I'm saying is not to drop test, but rather use qtest
accelerator to test PCI hardware registers. Which doesn't run
guest code. but lets you directly program/access PCI device.

So instead of searching/parsing ERST table, you'd program BARs
manually on behalf of BIOS, and then test that it works as expected.

As for ACPI tables, we don't have complete testing infrastructure
in tree, bios-tables-test, only tests matching to committed
reference blobs. And verifying that reference blob is correct,
is manual process currently.

To test whole stack one could write an optional acceptance test,
which would run actual guest (if you wish to add that, you can look at
docs/devel/testing.rst "Acceptance tests ...").



I've reworked this to pattern it after ivshmem test.





As example you can look into megasas-test.c, ivshmem-test.c
or other PCI device tests.

But I'll look at these and see about migrating to this approach.

   

+static char disk[] = "tests/erst-test-disk-XX";
+
+#define ERST_CMD()  \
+"-accel kvm -accel tcg "\
+"-object memory-backend-file," \
+  "id=erstnvram,mem-path=tests/acpi-erst-XX,size=0x1,share=on " \
+"-device acpi-erst,memdev=erstnvram " \
+"-drive id=hd0,if=none,file=%s,format=raw " \
+"-device ide-hd,drive=hd0 ", disk
+
+static void erst_get_error_log_address_range(void)
+{
+QTestState *qts;
+uint64_t log_address_range = 0;
+unsigned log_address_length = 0;
+unsigned log_address_attr = 0;
+
+qts = qtest_initf(ERST_CMD());
+
+uint64_t base = acpi_find_erst(qts);
+g_assert(base != 0);
+
+/* Issue GET_ERROR_LOG_ADDRESS_RANGE command */
+qtest_writel(qts, base + 0, 0xD);
+/* Read GET_ERROR_LOG_ADDRESS_RANGE result */
+log_address_range = qtest_readq(qts, base + 8);\
+
+/* Issue GET_ERROR_LOG_ADDRESS_RANGE_LENGTH command */
+qtest_writel(qts, base + 0, 0xE);
+/* Read GET_ERROR_LOG_ADDRESS_RANGE_LENGTH result */
+log_address_length = qtest_readq(qts, base + 8);\
+
+/* Issue 

Re: [PATCH v5 06/10] ACPI ERST: build the ACPI ERST table

2021-07-26 Thread Eric DeVolder




On 7/26/21 6:00 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 11:12:41 -0500
Eric DeVolder  wrote:


On 7/20/21 8:16 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 15:07:17 -0400
Eric DeVolder  wrote:
   

This code is called from the machine code (if ACPI supported)
to generate the ACPI ERST table.

should be along lines:
This builds ACPI ERST table /spec ref/ to inform OSMP
how to communicate with ... device.


Like this?
This builds the ACPI ERST table to inform OSMP how to communicate

  ^ [1]

with the acpi-erst device.



1) ACPI spec vX.Y, chapter foo

done





   


Signed-off-by: Eric DeVolder 
---
   hw/acpi/erst.c | 214 
+
   1 file changed, 214 insertions(+)

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
index 6e9bd2e..1f1dbbc 100644
--- a/hw/acpi/erst.c
+++ b/hw/acpi/erst.c
@@ -555,6 +555,220 @@ static const MemoryRegionOps erst_mem_ops = {
   /***/
   /***/
   
+/* ACPI 4.0: 17.4.1.2 Serialization Instruction Entries */

+static void build_serialization_instruction_entry(GArray *table_data,
+uint8_t serialization_action,
+uint8_t instruction,
+uint8_t flags,
+uint8_t register_bit_width,
+uint64_t register_address,
+uint64_t value,
+uint64_t mask)

like I mentioned in previous patch, It could be simplified
a lot if it's possible to use fixed 64-bit access with every
action and the same width mask.

See previous response.

lets see if it's a guest OS issue first, and then decide what to do with it.



   

+{
+/* ACPI 4.0: Table 17-18 Serialization Instruction Entry */
+struct AcpiGenericAddress gas;
+
+/* Serialization Action */
+build_append_int_noprefix(table_data, serialization_action, 1);
+/* Instruction */
+build_append_int_noprefix(table_data, instruction , 1);
+/* Flags */
+build_append_int_noprefix(table_data, flags   , 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0   , 1);
+/* Register Region */
+gas.space_id = AML_SYSTEM_MEMORY;
+gas.bit_width = register_bit_width;
+gas.bit_offset = 0;
+switch (register_bit_width) {
+case 8:
+gas.access_width = 1;
+break;
+case 16:
+gas.access_width = 2;
+break;
+case 32:
+gas.access_width = 3;
+break;
+case 64:
+gas.access_width = 4;
+break;
+default:
+gas.access_width = 0;
+break;
+}
+gas.address = register_address;
+build_append_gas_from_struct(table_data, );
+/* Value */
+build_append_int_noprefix(table_data, value  , 8);
+/* Mask */
+build_append_int_noprefix(table_data, mask   , 8);
+}
+
+/* ACPI 4.0: 17.4.1 Serialization Action Table */
+void build_erst(GArray *table_data, BIOSLinker *linker, Object *erst_dev,
+const char *oem_id, const char *oem_table_id)
+{
+ERSTDeviceState *s = ACPIERST(erst_dev);


globals are not welcomed in new code,
pass erst_dev as argument here (ex: find_vmgenid_dev)
   

+unsigned action;
+unsigned erst_start = table_data->len;
+
   

+s->bar0 = (hwaddr)pci_get_bar_addr(PCI_DEVICE(erst_dev), 0);
+trace_acpi_erst_pci_bar_0(s->bar0);
+s->bar1 = (hwaddr)pci_get_bar_addr(PCI_DEVICE(erst_dev), 1);


just store pci_get_bar_addr(PCI_DEVICE(erst_dev), 0) in local variable,
Bar 1 is not used in this function so you don't need it here.

Corrected



   

+trace_acpi_erst_pci_bar_1(s->bar1);
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+/* serialization_header_length */

comments documenting table entries should be verbatim copy from spec,
see build_amd_iommu() as example of preferred style.

Corrected

   

+build_append_int_noprefix(table_data, 48, 4);
+/* reserved */
+build_append_int_noprefix(table_data,  0, 4);
+/*
+ * instruction_entry_count - changes to the number of serialization
+ * instructions in the ACTIONs below must be reflected in this
+ * pre-computed value.
+ */
+build_append_int_noprefix(table_data, 29, 4);

a bit fragile as it can easily diverge from actual number later on.
maybe instead of building instruction entries in place, build it
in separate array and when done, use actual count to fill 
instruction_entry_count.
pseudo code could look like:

   /* prepare instructions in advance because ... */
   GArray table_instruction_data;
   build_serialization_instruction_entry(table_instruction_data,...);;
   ...
   build_serialization_instruction_entry(table_instruction_data,...);
   /* instructions count */
   build_append_int_noprefix(table_data, 
table_instruction_data.len/entry_size, 4);
   /* copy prepared in advance instructions */
   g_array_append_vals(table_data, table_instruction_data.data, 

Re: [PATCH v5 08/10] ACPI ERST: create ACPI ERST table for pc/x86 machines.

2021-07-26 Thread Eric DeVolder




On 7/26/21 6:30 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 11:16:42 -0500
Eric DeVolder  wrote:


On 7/20/21 8:19 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 15:07:19 -0400
Eric DeVolder  wrote:
   

This change exposes ACPI ERST support for x86 guests.

Signed-off-by: Eric DeVolder 

looks good to me, maybe move find_erst_dev() impl. here as well
if it's the patch it's first used.


I've followed your previous suggestion of mimicking find_vmgenid_dev(), which
declares it in its header file. I've done the same, find_erst_dev() is
declared in its header file and used in these files.


it's fine doing like this but
it would be easier to follow if this were part of [6/10],
so that function is introduced and used in the same patch.

Done




   

---
   hw/i386/acpi-build.c   | 9 +
   hw/i386/acpi-microvm.c | 9 +
   2 files changed, 18 insertions(+)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index de98750..d2026cc 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -43,6 +43,7 @@
   #include "sysemu/tpm.h"
   #include "hw/acpi/tpm.h"
   #include "hw/acpi/vmgenid.h"
+#include "hw/acpi/erst.h"
   #include "hw/boards.h"
   #include "sysemu/tpm_backend.h"
   #include "hw/rtc/mc146818rtc_regs.h"
@@ -2327,6 +2328,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState 
*machine)
   GArray *tables_blob = tables->table_data;
   AcpiSlicOem slic_oem = { .id = NULL, .table_id = NULL };
   Object *vmgenid_dev;
+Object *erst_dev;
   char *oem_id;
   char *oem_table_id;
   
@@ -2388,6 +2390,13 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)

   ACPI_DEVICE_IF(x86ms->acpi_dev), x86ms->oem_id,
   x86ms->oem_table_id);
   
+erst_dev = find_erst_dev();

+if (erst_dev) {
+acpi_add_table(table_offsets, tables_blob);
+build_erst(tables_blob, tables->linker, erst_dev,
+   x86ms->oem_id, x86ms->oem_table_id);
+}
+
   vmgenid_dev = find_vmgenid_dev();
   if (vmgenid_dev) {
   acpi_add_table(table_offsets, tables_blob);
diff --git a/hw/i386/acpi-microvm.c b/hw/i386/acpi-microvm.c
index ccd3303..0099b13 100644
--- a/hw/i386/acpi-microvm.c
+++ b/hw/i386/acpi-microvm.c
@@ -30,6 +30,7 @@
   #include "hw/acpi/bios-linker-loader.h"
   #include "hw/acpi/generic_event_device.h"
   #include "hw/acpi/utils.h"
+#include "hw/acpi/erst.h"
   #include "hw/boards.h"
   #include "hw/i386/fw_cfg.h"
   #include "hw/i386/microvm.h"
@@ -160,6 +161,7 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
   X86MachineState *x86ms = X86_MACHINE(mms);
   GArray *table_offsets;
   GArray *tables_blob = tables->table_data;
+Object *erst_dev;
   unsigned dsdt, xsdt;
   AcpiFadtData pmfadt = {
   /* ACPI 5.0: 4.1 Hardware-Reduced ACPI */
@@ -209,6 +211,13 @@ static void acpi_build_microvm(AcpiBuildTables *tables,
   ACPI_DEVICE_IF(x86ms->acpi_dev), x86ms->oem_id,
   x86ms->oem_table_id);
   
+erst_dev = find_erst_dev();

+if (erst_dev) {
+acpi_add_table(table_offsets, tables_blob);
+build_erst(tables_blob, tables->linker, erst_dev,
+   x86ms->oem_id, x86ms->oem_table_id);
+}
+
   xsdt = tables_blob->len;
   build_xsdt(tables_blob, tables->linker, table_offsets, x86ms->oem_id,
  x86ms->oem_table_id);
   








Re: [PATCH v5 07/10] ACPI ERST: trace support

2021-07-26 Thread Eric DeVolder




On 7/26/21 6:08 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 11:14:37 -0500
Eric DeVolder  wrote:


On 7/20/21 8:15 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 15:07:18 -0400
Eric DeVolder  wrote:
   

Provide the definitions needed to support tracing in ACPI ERST.

trace points should be introduced in patches that use them for the first time,
as it stands now series breaks bisection.


Are you asking to move this patch before the patch that introduces erst.c (which
uses these trace points)?
Or are you asking to include this patch with the patch that introduces erst.c?


I'd squash it into patch that introduces corresponding functions.

Done




Also, you requested I separate the building of ERST table from the implemenation
of the erst device as separate patches. Doesn't that also break bisection?


it should be possible to compile series patch by patch and not break 'make 
check'
after each patch.

ACPI table is not part of device, it's separate part that describes to OSPM
how to work with device. So if code split correctly between patches
it shouldn't break bisection.







   


Signed-off-by: Eric DeVolder 
---
   hw/acpi/trace-events | 14 ++
   1 file changed, 14 insertions(+)

diff --git a/hw/acpi/trace-events b/hw/acpi/trace-events
index dcc1438..a5c2755 100644
--- a/hw/acpi/trace-events
+++ b/hw/acpi/trace-events
@@ -55,3 +55,17 @@ piix4_gpe_writeb(uint64_t addr, unsigned width, uint64_t val) 
"addr: 0x%" PRIx64
   # tco.c
   tco_timer_reload(int ticks, int msec) "ticks=%d (%d ms)"
   tco_timer_expired(int timeouts_no, bool strap, bool no_reboot) "timeouts_no=%d 
no_reboot=%d/%d"
+
+# erst.c
+acpi_erst_reg_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%04" PRIx64 " <== 
0x%016" PRIx64 " (size: %u)"
+acpi_erst_reg_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%04" PRIx64 " ==> 
0x%016" PRIx64 " (size: %u)"
+acpi_erst_mem_write(uint64_t addr, uint64_t val, unsigned size) "addr: 0x%06" PRIx64 " <== 
0x%016" PRIx64 " (size: %u)"
+acpi_erst_mem_read(uint64_t addr, uint64_t val, unsigned size) " addr: 0x%06" PRIx64 " ==> 
0x%016" PRIx64 " (size: %u)"
+acpi_erst_pci_bar_0(uint64_t addr) "BAR0: 0x%016" PRIx64
+acpi_erst_pci_bar_1(uint64_t addr) "BAR1: 0x%016" PRIx64
+acpi_erst_realizefn_in(void)
+acpi_erst_realizefn_out(unsigned size) "total nvram size %u bytes"
+acpi_erst_reset_in(unsigned record_count) "record_count %u"
+acpi_erst_reset_out(unsigned record_count) "record_count %u"
+acpi_erst_class_init_in(void)
+acpi_erst_class_init_out(void)
   








Re: [PATCH v5 05/10] ACPI ERST: support for ACPI ERST feature

2021-07-26 Thread Eric DeVolder




On 7/26/21 5:42 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 11:07:40 -0500
Eric DeVolder  wrote:


On 7/20/21 7:17 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 15:07:16 -0400
Eric DeVolder  wrote:
   

This change implements the support for the ACPI ERST feature.

Drop this

Done

   


This implements a PCI device for ACPI ERST. This implments the

s/implments/implements/

Corrected

   

non-NVRAM "mode" of operation for ERST.

add here why non-NVRAM "mode" is implemented.

How about:
This implements a PCI device for ACPI ERST. This implments the
non-NVRAM "mode" of operation for ERST as it is supported by
Linux and Windows and aligns with ERST support in most BIOS.


modulo typos looks good to me.
pls consider using a spell checker to check commit messages/comments.

done







Also even if this non-NVRAM implementation, there is still
a lot of not necessary data copying (see below) so drop it
or justify why it's there.
 

This change also includes erst.c in the build of general ACPI support.

Drop this as well

Done



   

Signed-off-by: Eric DeVolder 
---
   hw/acpi/erst.c  | 704 

   hw/acpi/meson.build |   1 +
   2 files changed, 705 insertions(+)
   create mode 100644 hw/acpi/erst.c

diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
new file mode 100644
index 000..6e9bd2e
--- /dev/null
+++ b/hw/acpi/erst.c
@@ -0,0 +1,704 @@
+/*
+ * ACPI Error Record Serialization Table, ERST, Implementation
+ *
+ * Copyright (c) 2021 Oracle and/or its affiliates.
+ *
+ * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
+ * ACPI Platform Error Interfaces : Error Serialization
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 


consider adding SPDX-License-Identifier to all new files.

Done

  

+ */
+
+#include 
+#include 
+#include 
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-core.h"
+#include "exec/memory.h"
+#include "qom/object.h"
+#include "hw/pci/pci.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
+#include "exec/address-spaces.h"
+#include "sysemu/hostmem.h"
+#include "hw/acpi/erst.h"
+#include "trace.h"
+
+/* UEFI 2.1: Append N Common Platform Error Record */
+#define UEFI_CPER_RECORD_MIN_SIZE 128U
+#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
+#define UEFI_CPER_RECORD_ID_OFFSET 96U
+#define IS_UEFI_CPER_RECORD(ptr) \
+(((ptr)[0] == 'C') && \
+ ((ptr)[1] == 'P') && \
+ ((ptr)[2] == 'E') && \
+ ((ptr)[3] == 'R'))
+#define THE_UEFI_CPER_RECORD_ID(ptr) \
+(*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
+
+/*
+ * This implementation is an ACTION (cmd) and VALUE (data)
+ * interface consisting of just two 64-bit registers.
+ */
+#define ERST_REG_SIZE (2UL * sizeof(uint64_t))
   

+#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
+#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */

what's meaning of CRS?

CSR = control status register

Looking at patch both should be called ERST_[ACTION|VALUE]_OFFSET

Done

pls use explicit offset values instead of shifting bit.

Done


   

+/*
+ * ERST_RECORD_SIZE is the buffer size for exchanging ERST
+ * record contents. Thus, it defines the maximum record size.
+ * As this is mapped through a PCI BAR, it must be a power of
+ * two, and should be at least PAGE_SIZE.
+ * Records are stored in the backing file in a simple fashion.
+ * The backing file is essentially divided into fixed size
+ * "slots", ERST_RECORD_SIZE in length, with each "slot"
+ * storing a single record. No attempt at optimizing storage
+ * through compression, compaction, etc is attempted.
+ * NOTE that any change to this value will make any pre-
+ * existing backing files, not of the same ERST_RECORD_SIZE,
+ * unusable to the guest.
+ */
+/* 8KiB records, not too small, not too big */
+#define ERST_RECORD_SIZE (2UL * 4096)
+
+#define ERST_INVALID_RECORD_ID (~0UL)
+#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
+
+/*
+ * Object cast macro
+ */
+#define ACPIERST(obj) \
+OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
+
+/*
+ * Main ERST device state structure
+ */
+typedef struct {
+PCIDevice parent_obj;
+
+HostMemoryBackend *hostmem;
+MemoryRegion *hostmem_mr;
+
+

Re: [PATCH v5 02/10] ACPI ERST: specification for ERST support

2021-07-26 Thread Eric DeVolder




On 7/26/21 5:06 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 10:42:33 -0500
Eric DeVolder  wrote:


On 7/19/21 10:02 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 19:26:39 +
Eric DeVolder  wrote:
   

Oops, at the end of the 4th paragraph, I meant to state that "Linux does not support 
the NVRAM mode."
rather than "non-NVRAM mode", which contradicts everything I stated prior.
Eric.

From: Eric DeVolder 
Sent: Wednesday, June 30, 2021 2:07 PM
To: qemu-devel@nongnu.org 
Cc: m...@redhat.com ; imamm...@redhat.com ; marcel.apfelb...@gmail.com 
; pbonz...@redhat.com ; r...@twiddle.net ; 
ehabk...@redhat.com ; Konrad Wilk ; Boris Ostrovsky 

Subject: [PATCH v5 02/10] ACPI ERST: specification for ERST support

Information on the implementation of the ACPI ERST support.

Signed-off-by: Eric DeVolder 
---
   docs/specs/acpi_erst.txt | 152 
+++
   1 file changed, 152 insertions(+)
   create mode 100644 docs/specs/acpi_erst.txt

diff --git a/docs/specs/acpi_erst.txt b/docs/specs/acpi_erst.txt
new file mode 100644
index 000..79f8eb9
--- /dev/null
+++ b/docs/specs/acpi_erst.txt
@@ -0,0 +1,152 @@
+ACPI ERST DEVICE
+
+
+The ACPI ERST device is utilized to support the ACPI Error Record
+Serialization Table, ERST, functionality. The functionality is
+designed for storing error records in persistent storage for
+future reference/debugging.
+
+The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
+(APEI)", and specifically subsection "Error Serialization", outlines
+a method for storing error records into persistent storage.
+
+The format of error records is described in the UEFI specification[2],
+in Appendix N "Common Platform Error Record".
+
+While the ACPI specification allows for an NVRAM "mode" (see
+GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
+directly exposed for direct access by the OS/guest, this implements
+the non-NVRAM "mode". This non-NVRAM "mode" is what is implemented
+by most BIOS (since flash memory requires programming operations
+in order to update its contents). Furthermore, as of the time of this
+writing, Linux does not support the non-NVRAM "mode".


shouldn't it be s/non-NVRAM/NVRAM/ ?


Yes, it has been corrected.

   

+
+
+Background/Motivation
+-
+Linux uses the persistent storage filesystem, pstore, to record
+information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
+independent of, and runs before, kdump.  In certain scenarios (ie.
+hosts/guests with root filesystems on NFS/iSCSI where networking
+software and/or hardware fails), pstore may contain the only
+information available for post-mortem debugging.


well,
it's not the only way, one can use existing pvpanic device to notify
mgmt layer about crash and mgmt layer can take appropriate measures
to for post-mortem debugging, including dumping guest state,
which is superior to anything pstore can offer as VM is still exists
and mgmt layer can inspect VMs crashed state directly or dump
necessary parts of it.

So ERST shouldn't be portrayed as the only way here but rather
as limited alternative to pvpanic in regards to post-mortem debugging
(it's the only way only on bare-metal).

It would be better to describe here other use-cases you've mentioned
in earlier reviews, that justify adding alternative to pvpanic.


I'm not sure how I would change this. I do say "may contain", which means it
is not the only way. Pvpanic is a way to notify the mgmt layer/host, but
this is a method solely with the guest. Each serves a different purpose;
plugs a different hole.



I'd suggest edit  "pstore may contain the only information" as "pstore may contain 
information"


Done


As noted in a separate message, my company has intentions of storing other
data in ERST beyond panics.

perhaps add your use cases here as well.
  


+Two common storage backends for the pstore filesystem are ACPI ERST
+and UEFI. Most BIOS implement ACPI ERST.  UEFI is not utilized in
+all guests. With QEMU supporting ACPI ERST, it becomes a viable
+pstore storage backend for virtual machines (as it is now for
+bare metal machines).
+
   

+Enabling support for ACPI ERST facilitates a consistent method to
+capture kernel panic information in a wide range of guests: from
+resource-constrained microvms to very large guests, and in
+particular, in direct-boot environments (which would lack UEFI
+run-time services).

this hunk probably not necessary
   

+
+Note that Microsoft Windows also utilizes the ACPI ERST for certain
+crash information, if available.

a pointer to a relevant source would be helpful here.


I've included the reference, here for your benefit.
Windows Hardware Error Architecutre, specifically Persistence Mechanism
https://docs.microsoft.com/en-us/windows-hardware/drivers/whea/error-record-persistence-mechanism

   

+Invocation

s/^^/Configuration|Usage/


Corrected

   

+--
+
+To utilize 

[PULL for-6.1 11/11] tests/qtest/nvme-test: add mmio read test

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

Add a regression test for mmio read on big-endian hosts.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
---
 tests/qtest/nvme-test.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/tests/qtest/nvme-test.c b/tests/qtest/nvme-test.c
index 47e757d7e2af..f8bafb5d70fb 100644
--- a/tests/qtest/nvme-test.c
+++ b/tests/qtest/nvme-test.c
@@ -67,6 +67,30 @@ static void nvmetest_oob_cmb_test(void *obj, void *data, 
QGuestAllocator *alloc)
 g_assert_cmpint(qpci_io_readl(pdev, bar, cmb_bar_size - 1), !=, 
0x44332211);
 }
 
+static void nvmetest_reg_read_test(void *obj, void *data, QGuestAllocator 
*alloc)
+{
+QNvme *nvme = obj;
+QPCIDevice *pdev = >dev;
+QPCIBar bar;
+uint32_t cap_lo, cap_hi;
+uint64_t cap;
+
+qpci_device_enable(pdev);
+bar = qpci_iomap(pdev, 0, NULL);
+
+cap_lo = qpci_io_readl(pdev, bar, 0x0);
+g_assert_cmpint(NVME_CAP_MQES(cap_lo), ==, 0x7ff);
+
+cap_hi = qpci_io_readl(pdev, bar, 0x4);
+g_assert_cmpint(NVME_CAP_MPSMAX((uint64_t)cap_hi << 32), ==, 0x4);
+
+cap = qpci_io_readq(pdev, bar, 0x0);
+g_assert_cmpint(NVME_CAP_MQES(cap), ==, 0x7ff);
+g_assert_cmpint(NVME_CAP_MPSMAX(cap), ==, 0x4);
+
+qpci_iounmap(pdev, bar);
+}
+
 static void nvmetest_pmr_reg_test(void *obj, void *data, QGuestAllocator 
*alloc)
 {
 QNvme *nvme = obj;
@@ -142,6 +166,8 @@ static void nvme_register_nodes(void)
  &(QOSGraphTestOptions) {
 .edge.extra_device_opts = "pmrdev=pmr0"
 });
+
+qos_add_test("reg-read", "nvme", nvmetest_reg_read_test, NULL);
 }
 
 libqos_init(nvme_register_nodes);
-- 
2.32.0




Re: [PATCH-for-6.1 v4 2/4] gitlab-ci: Fix 'when:' condition in acceptance_test_job_template

2021-07-26 Thread Willian Rampazzo
On Mon, Jul 26, 2021 at 12:06 PM Philippe Mathieu-Daudé
 wrote:
>
> Jobs depending on another should not use the 'when: always'
> condition, because if a dependency failed we should not keep
> running jobs depending on it. The correct condition is
> 'when: on_success'.
>
> Fixes: f56bf4caf71 ("gitlab: Run Avocado tests manually (except mainstream 
> CI)")
> Reported-by: Daniel P. Berrangé 
> Reviewed-by: Daniel P. Berrangé 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  .gitlab-ci.d/buildtest-template.yml | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>

Reviewed-by: Willian Rampazzo 




Re: [PATCH v2 3/3] target/arm: Add sve-default-vector-length cpu property

2021-07-26 Thread Peter Maydell
On Mon, 26 Jul 2021 at 19:34, Richard Henderson
 wrote:
>
> On 7/26/21 4:59 AM, Andrew Jones wrote:
> >> +SVE User-mode Default Vector Length Property
> >> +
> >> +
> >> +For qemu-aarch64, the cpu property `sve-default-vector-length=N` is
> >> +defined to mirror the Linux kernel parameter file
> >> +`/proc/sys/abi/sve_default_vector_length`.  The default length, `N`,
> >> +is in units of bytes and must be between 16 and 8192.
> >
> > Hmm. If a user inputs anything greater than 256, then won't it get
> > silently reduced to 256?
>
> Yes.
>
> >> +If not specified, the default vector length is 64.
> >> +
> >> +If the default length is larger than the maximum vector length enabled
> >> +with `sve` properties, the actual vector length will be reduced.
> >
> > Here it's pointed out that the default may be reduced, but it implies that
> > that only happens if an sve property is also given. Won't users wonder
> > why they're only getting vectors that are 256 bytes large even when they
> > ask for more?
>
> I guess adding that 256 is the maximum length supported by qemu should be 
> sufficient?

Could you either provide a fixup diff for me to fold into this patch
or else repost a new version? (I don't mind which; if it's just a
docs tweak fixup diff is probably simplest.)

thanks
-- PMM



[PULL for-6.1 09/11] hw/nvme: fix out-of-bounds reads

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

Peter noticed that mmio access may read into the NvmeParams member in
the NvmeCtrl struct.

Fix the bounds check.

Reported-by: Peter Maydell 
Signed-off-by: Klaus Jensen 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Peter Maydell 
---
 hw/nvme/ctrl.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 23ff71f65c0e..10c2363c1d4d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5969,23 +5969,26 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr 
addr, unsigned size)
 /* should RAZ, fall through for now */
 }
 
-if (addr < sizeof(n->bar)) {
-/*
- * When PMRWBM bit 1 is set then read from
- * from PMRSTS should ensure prior writes
- * made it to persistent media
- */
-if (addr == NVME_REG_PMRSTS &&
-(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02)) {
-memory_region_msync(>pmr.dev->mr, 0, n->pmr.dev->size);
-}
-memcpy(, ptr + addr, size);
-} else {
+if (addr > sizeof(n->bar) - size) {
 NVME_GUEST_ERR(pci_nvme_ub_mmiord_invalid_ofs,
"MMIO read beyond last register,"
" offset=0x%"PRIx64", returning 0", addr);
+
+return 0;
 }
 
+/*
+ * When PMRWBM bit 1 is set then read from
+ * from PMRSTS should ensure prior writes
+ * made it to persistent media
+ */
+if (addr == NVME_REG_PMRSTS &&
+(NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02)) {
+memory_region_msync(>pmr.dev->mr, 0, n->pmr.dev->size);
+}
+
+memcpy(, ptr + addr, size);
+
 return val;
 }
 
-- 
2.32.0




Re: [PULL for-6.1 0/1] Block patches

2021-07-26 Thread Peter Maydell
On Mon, 26 Jul 2021 at 09:53, Stefan Hajnoczi  wrote:
>
> The following changes since commit a2376507f615495b1d16685449ce0ea78c2caf9d:
>
>   Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' 
> into staging (2021-07-24 11:04:57 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to 15a730e7a3aaac180df72cd5730e0617bcf44a5a:
>
>   block/nvme: Fix VFIO_MAP_DMA failed: No space left on device (2021-07-26 
> 09:38:12 +0100)
>
> 
> Pull request
>
> Phil's block/nvme.c ENOSPC fix for newer Linux kernels that return this errno.
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Re: [PATCH-for-6.1 v4 3/4] gitlab-ci: Fix 'when:' condition in EDK2 jobs

2021-07-26 Thread Willian Rampazzo
On Mon, Jul 26, 2021 at 12:07 PM Philippe Mathieu-Daudé
 wrote:
>
> Jobs depending on another should not use the 'when: always'
> condition, because if a dependency failed we should not keep
> running jobs depending on it. The correct condition is
> 'when: on_success'.
>
> Fixes: 71920809cea ("gitlab-ci.yml: Add jobs to build EDK2 firmware binaries")
> Reported-by: Daniel P. Berrangé 
> Reviewed-by: Daniel P. Berrangé 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  .gitlab-ci.d/edk2.yml | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>

Reviewed-by: Willian Rampazzo 




Re: [PATCH-for-6.1 v4 1/4] docs: Document GitLab custom CI/CD variables

2021-07-26 Thread Willian Rampazzo
On Mon, Jul 26, 2021 at 12:08 PM Philippe Mathieu-Daudé
 wrote:
>
> We introduced the QEMU_CI_AVOCADO_TESTING variable in commit f56bf4caf
> ("gitlab: Run Avocado tests manually (except mainstream CI)"), but
> forgot to document it properly. Do it now.
>
> Suggested-by: Thomas Huth 
> Reviewed-by: Daniel P. Berrangé 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  docs/devel/ci.rst | 40 
>  .gitlab-ci.yml| 19 ++-
>  2 files changed, 42 insertions(+), 17 deletions(-)
>

Reviewed-by: Willian Rampazzo 




[PULL for-6.1 10/11] hw/nvme: fix mmio read

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

The new PMR test unearthed a long-standing issue with MMIO reads on
big-endian hosts.

Fix this by unconditionally storing all controller registers in little
endian.

Cc: Gollu Appalanaidu 
Reported-by: Peter Maydell 
Signed-off-by: Klaus Jensen 
Reviewed-by: Peter Maydell 
---
 hw/nvme/ctrl.c | 291 +++--
 1 file changed, 162 insertions(+), 129 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 10c2363c1d4d..43dfaeac9f54 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -439,10 +439,12 @@ static uint8_t nvme_sq_empty(NvmeSQueue *sq)
 
 static void nvme_irq_check(NvmeCtrl *n)
 {
+uint32_t intms = ldl_le_p(>bar.intms);
+
 if (msix_enabled(&(n->parent_obj))) {
 return;
 }
-if (~n->bar.intms & n->irq_status) {
+if (~intms & n->irq_status) {
 pci_irq_assert(>parent_obj);
 } else {
 pci_irq_deassert(>parent_obj);
@@ -1289,7 +1291,7 @@ static void nvme_post_cqes(void *opaque)
 if (ret) {
 trace_pci_nvme_err_addr_write(addr);
 trace_pci_nvme_err_cfs();
-n->bar.csts = NVME_CSTS_FAILED;
+stl_le_p(>bar.csts, NVME_CSTS_FAILED);
 break;
 }
 QTAILQ_REMOVE(>req_list, req, entry);
@@ -4022,7 +4024,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_sq_sqid(sqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
-if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
+if (unlikely(!qsize || qsize > NVME_CAP_MQES(ldq_le_p(>bar.cap {
 trace_pci_nvme_err_invalid_create_sq_size(qsize);
 return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
 }
@@ -4208,7 +4210,7 @@ static uint16_t nvme_cmd_effects(NvmeCtrl *n, uint8_t 
csi, uint32_t buf_len,
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-switch (NVME_CC_CSS(n->bar.cc)) {
+switch (NVME_CC_CSS(ldl_le_p(>bar.cc))) {
 case NVME_CC_CSS_NVM:
 src_iocs = nvme_cse_iocs_nvm;
 /* fall through */
@@ -4370,7 +4372,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest 
*req)
 trace_pci_nvme_err_invalid_create_cq_cqid(cqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
-if (unlikely(!qsize || qsize > NVME_CAP_MQES(n->bar.cap))) {
+if (unlikely(!qsize || qsize > NVME_CAP_MQES(ldq_le_p(>bar.cap {
 trace_pci_nvme_err_invalid_create_cq_size(qsize);
 return NVME_MAX_QSIZE_EXCEEDED | NVME_DNR;
 }
@@ -5163,17 +5165,19 @@ static void nvme_update_dmrsl(NvmeCtrl *n)
 
 static void nvme_select_iocs_ns(NvmeCtrl *n, NvmeNamespace *ns)
 {
+uint32_t cc = ldl_le_p(>bar.cc);
+
 ns->iocs = nvme_cse_iocs_none;
 switch (ns->csi) {
 case NVME_CSI_NVM:
-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+if (NVME_CC_CSS(cc) != NVME_CC_CSS_ADMIN_ONLY) {
 ns->iocs = nvme_cse_iocs_nvm;
 }
 break;
 case NVME_CSI_ZONED:
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+if (NVME_CC_CSS(cc) == NVME_CC_CSS_CSI) {
 ns->iocs = nvme_cse_iocs_zoned;
-} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+} else if (NVME_CC_CSS(cc) == NVME_CC_CSS_NVM) {
 ns->iocs = nvme_cse_iocs_nvm;
 }
 break;
@@ -5510,7 +5514,7 @@ static void nvme_process_sq(void *opaque)
 if (nvme_addr_read(n, addr, (void *), sizeof(cmd))) {
 trace_pci_nvme_err_addr_read(addr);
 trace_pci_nvme_err_cfs();
-n->bar.csts = NVME_CSTS_FAILED;
+stl_le_p(>bar.csts, NVME_CSTS_FAILED);
 break;
 }
 nvme_inc_sq_head(sq);
@@ -5565,8 +5569,6 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 n->aer_queued = 0;
 n->outstanding_aers = 0;
 n->qs_created = false;
-
-n->bar.cc = 0;
 }
 
 static void nvme_ctrl_shutdown(NvmeCtrl *n)
@@ -5605,7 +5607,12 @@ static void nvme_select_iocs(NvmeCtrl *n)
 
 static int nvme_start_ctrl(NvmeCtrl *n)
 {
-uint32_t page_bits = NVME_CC_MPS(n->bar.cc) + 12;
+uint64_t cap = ldq_le_p(>bar.cap);
+uint32_t cc = ldl_le_p(>bar.cc);
+uint32_t aqa = ldl_le_p(>bar.aqa);
+uint64_t asq = ldq_le_p(>bar.asq);
+uint64_t acq = ldq_le_p(>bar.acq);
+uint32_t page_bits = NVME_CC_MPS(cc) + 12;
 uint32_t page_size = 1 << page_bits;
 
 if (unlikely(n->cq[0])) {
@@ -5616,73 +5623,72 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 trace_pci_nvme_err_startfail_sq();
 return -1;
 }
-if (unlikely(!n->bar.asq)) {
+if (unlikely(!asq)) {
 trace_pci_nvme_err_startfail_nbarasq();
 return -1;
 }
-if (unlikely(!n->bar.acq)) {
+if (unlikely(!acq)) {
 trace_pci_nvme_err_startfail_nbaracq();
 return -1;
 }
-if (unlikely(n->bar.asq & (page_size - 1))) {
-trace_pci_nvme_err_startfail_asq_misaligned(n->bar.asq);
+if (unlikely(asq & 

[PULL for-6.1 08/11] hw/nvme: use symbolic names for registers

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

Add the NvmeBarRegs enum and use these instead of explicit register
offsets.

Signed-off-by: Klaus Jensen 
Reviewed-by: Gollu Appalanaidu 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Keith Busch 
---
 include/block/nvme.h | 29 -
 hw/nvme/ctrl.c   | 44 ++--
 2 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 84053b68b987..77aae0117494 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -9,7 +9,7 @@ typedef struct QEMU_PACKED NvmeBar {
 uint32_tcc;
 uint8_t rsvd24[4];
 uint32_tcsts;
-uint32_tnssrc;
+uint32_tnssr;
 uint32_taqa;
 uint64_tasq;
 uint64_tacq;
@@ -31,6 +31,33 @@ typedef struct QEMU_PACKED NvmeBar {
 uint8_t css[484];
 } NvmeBar;
 
+enum NvmeBarRegs {
+NVME_REG_CAP = offsetof(NvmeBar, cap),
+NVME_REG_VS  = offsetof(NvmeBar, vs),
+NVME_REG_INTMS   = offsetof(NvmeBar, intms),
+NVME_REG_INTMC   = offsetof(NvmeBar, intmc),
+NVME_REG_CC  = offsetof(NvmeBar, cc),
+NVME_REG_CSTS= offsetof(NvmeBar, csts),
+NVME_REG_NSSR= offsetof(NvmeBar, nssr),
+NVME_REG_AQA = offsetof(NvmeBar, aqa),
+NVME_REG_ASQ = offsetof(NvmeBar, asq),
+NVME_REG_ACQ = offsetof(NvmeBar, acq),
+NVME_REG_CMBLOC  = offsetof(NvmeBar, cmbloc),
+NVME_REG_CMBSZ   = offsetof(NvmeBar, cmbsz),
+NVME_REG_BPINFO  = offsetof(NvmeBar, bpinfo),
+NVME_REG_BPRSEL  = offsetof(NvmeBar, bprsel),
+NVME_REG_BPMBL   = offsetof(NvmeBar, bpmbl),
+NVME_REG_CMBMSC  = offsetof(NvmeBar, cmbmsc),
+NVME_REG_CMBSTS  = offsetof(NvmeBar, cmbsts),
+NVME_REG_PMRCAP  = offsetof(NvmeBar, pmrcap),
+NVME_REG_PMRCTL  = offsetof(NvmeBar, pmrctl),
+NVME_REG_PMRSTS  = offsetof(NvmeBar, pmrsts),
+NVME_REG_PMREBS  = offsetof(NvmeBar, pmrebs),
+NVME_REG_PMRSWTP = offsetof(NvmeBar, pmrswtp),
+NVME_REG_PMRMSCL = offsetof(NvmeBar, pmrmscl),
+NVME_REG_PMRMSCU = offsetof(NvmeBar, pmrmscu),
+};
+
 enum NvmeCapShift {
 CAP_MQES_SHIFT = 0,
 CAP_CQR_SHIFT  = 16,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 070d9f6a962d..23ff71f65c0e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5740,7 +5740,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 }
 
 switch (offset) {
-case 0xc:   /* INTMS */
+case NVME_REG_INTMS:
 if (unlikely(msix_enabled(&(n->parent_obj {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_intmask_with_msix,
"undefined access to interrupt mask set"
@@ -5752,7 +5752,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 trace_pci_nvme_mmio_intm_set(data & 0x, n->bar.intmc);
 nvme_irq_check(n);
 break;
-case 0x10:  /* INTMC */
+case NVME_REG_INTMC:
 if (unlikely(msix_enabled(&(n->parent_obj {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_intmask_with_msix,
"undefined access to interrupt mask clr"
@@ -5764,7 +5764,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 trace_pci_nvme_mmio_intm_clr(data & 0x, n->bar.intmc);
 nvme_irq_check(n);
 break;
-case 0x14:  /* CC */
+case NVME_REG_CC:
 trace_pci_nvme_mmio_cfg(data & 0x);
 /* Windows first sends data, then sends enable bit */
 if (!NVME_CC_EN(data) && !NVME_CC_EN(n->bar.cc) &&
@@ -5798,7 +5798,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 n->bar.cc = data;
 }
 break;
-case 0x1c:  /* CSTS */
+case NVME_REG_CSTS:
 if (data & (1 << 4)) {
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_ssreset_w1c_unsupported,
"attempted to W1C CSTS.NSSRO"
@@ -5809,7 +5809,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
" of controller status");
 }
 break;
-case 0x20:  /* NSSR */
+case NVME_REG_NSSR:
 if (data == 0x4e564d65) {
 trace_pci_nvme_ub_mmiowr_ssreset_unsupported();
 } else {
@@ -5817,38 +5817,38 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 return;
 }
 break;
-case 0x24:  /* AQA */
+case NVME_REG_AQA:
 n->bar.aqa = data & 0x;
 trace_pci_nvme_mmio_aqattr(data & 0x);
 break;
-case 0x28:  /* ASQ */
+case NVME_REG_ASQ:
 n->bar.asq = size == 8 ? data :
 (n->bar.asq & ~0xULL) | (data & 0x);
 trace_pci_nvme_mmio_asqaddr(data);
 break;
-case 0x2c:  /* ASQ hi */
+case NVME_REG_ASQ + 4:
 n->bar.asq = (n->bar.asq & 0x) | (data << 32);
 trace_pci_nvme_mmio_asqaddr_hi(data, 

[PULL for-6.1 06/11] hw/nvme: fix controller hot unplugging

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

Prior to this patch the nvme-ns devices are always children of the
NvmeBus owned by the NvmeCtrl. This causes the namespaces to be
unrealized when the parent device is removed. However, when subsystems
are involved, this is not what we want since the namespaces may be
attached to other controllers as well.

This patch adds an additional NvmeBus on the subsystem device. When
nvme-ns devices are realized, if the parent controller device is linked
to a subsystem, the parent bus is set to the subsystem one instead. This
makes sure that namespaces are kept alive and not unrealized.

Reviewed-by: Hannes Reinecke 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/nvme.h   | 15 ---
 hw/nvme/ctrl.c   | 14 ++
 hw/nvme/ns.c | 18 ++
 hw/nvme/subsys.c |  3 +++
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index c4065467d877..83ffabade4cf 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -33,12 +33,20 @@ QEMU_BUILD_BUG_ON(NVME_MAX_NAMESPACES > NVME_NSID_BROADCAST 
- 1);
 typedef struct NvmeCtrl NvmeCtrl;
 typedef struct NvmeNamespace NvmeNamespace;
 
+#define TYPE_NVME_BUS "nvme-bus"
+OBJECT_DECLARE_SIMPLE_TYPE(NvmeBus, NVME_BUS)
+
+typedef struct NvmeBus {
+BusState parent_bus;
+} NvmeBus;
+
 #define TYPE_NVME_SUBSYS "nvme-subsys"
 #define NVME_SUBSYS(obj) \
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
 
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
+NvmeBus bus;
 uint8_t subnqn[256];
 
 NvmeCtrl  *ctrls[NVME_MAX_CONTROLLERS];
@@ -365,13 +373,6 @@ typedef struct NvmeCQueue {
 QTAILQ_HEAD(, NvmeRequest) req_list;
 } NvmeCQueue;
 
-#define TYPE_NVME_BUS "nvme-bus"
-#define NVME_BUS(obj) OBJECT_CHECK(NvmeBus, (obj), TYPE_NVME_BUS)
-
-typedef struct NvmeBus {
-BusState parent_bus;
-} NvmeBus;
-
 #define TYPE_NVME "nvme"
 #define NVME(obj) \
 OBJECT_CHECK(NvmeCtrl, (obj), TYPE_NVME)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ead7531bde5e..2f0524e12a36 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6527,16 +6527,14 @@ static void nvme_exit(PCIDevice *pci_dev)
 
 nvme_ctrl_reset(n);
 
-for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
-ns = nvme_ns(n, i);
-if (!ns) {
-continue;
+if (n->subsys) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
+ns = nvme_ns(n, i);
+if (ns) {
+ns->attached--;
+}
 }
 
-nvme_ns_cleanup(ns);
-}
-
-if (n->subsys) {
 nvme_subsys_unregister_ctrl(n->subsys, n);
 }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 3c4f5b8c714a..b7cf1494e75b 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -441,6 +441,15 @@ void nvme_ns_cleanup(NvmeNamespace *ns)
 }
 }
 
+static void nvme_ns_unrealize(DeviceState *dev)
+{
+NvmeNamespace *ns = NVME_NS(dev);
+
+nvme_ns_drain(ns);
+nvme_ns_shutdown(ns);
+nvme_ns_cleanup(ns);
+}
+
 static void nvme_ns_realize(DeviceState *dev, Error **errp)
 {
 NvmeNamespace *ns = NVME_NS(dev);
@@ -462,6 +471,14 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
"linked to an nvme-subsys device");
 return;
 }
+} else {
+/*
+ * If this namespace belongs to a subsystem (through a link on the
+ * controller device), reparent the device.
+ */
+if (!qdev_set_parent_bus(dev, >bus.parent_bus, errp)) {
+return;
+}
 }
 
 if (nvme_ns_setup(ns, errp)) {
@@ -552,6 +569,7 @@ static void nvme_ns_class_init(ObjectClass *oc, void *data)
 
 dc->bus_type = TYPE_NVME_BUS;
 dc->realize = nvme_ns_realize;
+dc->unrealize = nvme_ns_unrealize;
 device_class_set_props(dc, nvme_ns_props);
 dc->desc = "Virtual NVMe namespace";
 }
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 92caa604a280..93c35950d69d 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -50,6 +50,9 @@ static void nvme_subsys_realize(DeviceState *dev, Error 
**errp)
 {
 NvmeSubsystem *subsys = NVME_SUBSYS(dev);
 
+qbus_create_inplace(>bus, sizeof(NvmeBus), TYPE_NVME_BUS, dev,
+dev->id);
+
 nvme_subsys_setup(subsys);
 }
 
-- 
2.32.0




[PULL for-6.1 07/11] hw/nvme: split pmrmsc register into upper and lower

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to
make up the 64 bit logical PMRMSC register.

Make it so.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 include/block/nvme.h | 31 ---
 hw/nvme/ctrl.c   | 10 ++
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 527105fafc0b..84053b68b987 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -26,7 +26,8 @@ typedef struct QEMU_PACKED NvmeBar {
 uint32_tpmrsts;
 uint32_tpmrebs;
 uint32_tpmrswtp;
-uint64_tpmrmsc;
+uint32_tpmrmscl;
+uint32_tpmrmscu;
 uint8_t css[484];
 } NvmeBar;
 
@@ -475,25 +476,25 @@ enum NvmePmrswtpMask {
 #define NVME_PMRSWTP_SET_PMRSWTV(pmrswtp, val)   \
 (pmrswtp |= (uint64_t)(val & PMRSWTP_PMRSWTV_MASK) << 
PMRSWTP_PMRSWTV_SHIFT)
 
-enum NvmePmrmscShift {
-PMRMSC_CMSE_SHIFT   = 1,
-PMRMSC_CBA_SHIFT= 12,
+enum NvmePmrmsclShift {
+PMRMSCL_CMSE_SHIFT   = 1,
+PMRMSCL_CBA_SHIFT= 12,
 };
 
-enum NvmePmrmscMask {
-PMRMSC_CMSE_MASK   = 0x1,
-PMRMSC_CBA_MASK= 0xf,
+enum NvmePmrmsclMask {
+PMRMSCL_CMSE_MASK   = 0x1,
+PMRMSCL_CBA_MASK= 0xf,
 };
 
-#define NVME_PMRMSC_CMSE(pmrmsc)\
-((pmrmsc >> PMRMSC_CMSE_SHIFT)   & PMRMSC_CMSE_MASK)
-#define NVME_PMRMSC_CBA(pmrmsc) \
-((pmrmsc >> PMRMSC_CBA_SHIFT)   & PMRMSC_CBA_MASK)
+#define NVME_PMRMSCL_CMSE(pmrmscl)\
+((pmrmscl >> PMRMSCL_CMSE_SHIFT)   & PMRMSCL_CMSE_MASK)
+#define NVME_PMRMSCL_CBA(pmrmscl) \
+((pmrmscl >> PMRMSCL_CBA_SHIFT)   & PMRMSCL_CBA_MASK)
 
-#define NVME_PMRMSC_SET_CMSE(pmrmsc, val)   \
-(pmrmsc |= (uint64_t)(val & PMRMSC_CMSE_MASK) << PMRMSC_CMSE_SHIFT)
-#define NVME_PMRMSC_SET_CBA(pmrmsc, val)   \
-(pmrmsc |= (uint64_t)(val & PMRMSC_CBA_MASK) << PMRMSC_CBA_SHIFT)
+#define NVME_PMRMSCL_SET_CMSE(pmrmscl, val)   \
+(pmrmscl |= (uint32_t)(val & PMRMSCL_CMSE_MASK) << PMRMSCL_CMSE_SHIFT)
+#define NVME_PMRMSCL_SET_CBA(pmrmscl, val)   \
+(pmrmscl |= (uint32_t)(val & PMRMSCL_CBA_MASK) << PMRMSCL_CBA_SHIFT)
 
 enum NvmeSglDescriptorType {
 NVME_SGL_DESCR_TYPE_DATA_BLOCK  = 0x0,
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2f0524e12a36..070d9f6a962d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5916,11 +5916,13 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 return;
 }
 
-n->bar.pmrmsc = (n->bar.pmrmsc & ~0x) | (data & 0x);
+n->bar.pmrmscl = data;
 n->pmr.cmse = false;
 
-if (NVME_PMRMSC_CMSE(n->bar.pmrmsc)) {
-hwaddr cba = NVME_PMRMSC_CBA(n->bar.pmrmsc) << PMRMSC_CBA_SHIFT;
+if (NVME_PMRMSCL_CMSE(n->bar.pmrmscl)) {
+uint64_t pmrmscu = n->bar.pmrmscu;
+hwaddr cba = (pmrmscu << 32) |
+(NVME_PMRMSCL_CBA(n->bar.pmrmscl) << PMRMSCL_CBA_SHIFT);
 if (cba + int128_get64(n->pmr.dev->mr.size) < cba) {
 NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 1);
 return;
@@ -5936,7 +5938,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 return;
 }
 
-n->bar.pmrmsc = (n->bar.pmrmsc & 0x) | (data << 32);
+n->bar.pmrmscu = data;
 return;
 default:
 NVME_GUEST_ERR(pci_nvme_ub_mmiowr_invalid,
-- 
2.32.0




[PULL for-6.1 04/11] hw/nvme: error handling for too many mappings

2021-07-26 Thread Klaus Jensen
From: Padmakar Kalghatgi 

If the number of PRP/SGL mappings exceed 1024, reads and writes will
fail because of an internal QEMU limitation of max 1024 vectors.

Signed-off-by: Padmakar Kalghatgi 
Reviewed-by: Klaus Jensen 
[k.jensen: changed the error message to be more generic]
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 13 +
 hw/nvme/trace-events |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 90e3ee2b70ee..ead7531bde5e 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -623,6 +623,10 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, 
hwaddr addr, size_t len)
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
+if (sg->iov.niov + 1 > IOV_MAX) {
+goto max_mappings_exceeded;
+}
+
 if (cmb) {
 return nvme_map_addr_cmb(n, >iov, addr, len);
 } else {
@@ -634,9 +638,18 @@ static uint16_t nvme_map_addr(NvmeCtrl *n, NvmeSg *sg, 
hwaddr addr, size_t len)
 return NVME_INVALID_USE_OF_CMB | NVME_DNR;
 }
 
+if (sg->qsg.nsg + 1 > IOV_MAX) {
+goto max_mappings_exceeded;
+}
+
 qemu_sglist_add(>qsg, addr, len);
 
 return NVME_SUCCESS;
+
+max_mappings_exceeded:
+NVME_GUEST_ERR(pci_nvme_ub_too_many_mappings,
+   "number of mappings exceed 1024");
+return NVME_INTERNAL_DEV_ERROR | NVME_DNR;
 }
 
 static inline bool nvme_addr_is_dma(NvmeCtrl *n, hwaddr addr)
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index f9a1f14e2638..430eeb395b24 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -199,3 +199,4 @@ pci_nvme_ub_db_wr_invalid_cqhead(uint32_t qid, uint16_t 
new_head) "completion qu
 pci_nvme_ub_db_wr_invalid_sq(uint32_t qid) "submission queue doorbell write 
for nonexistent queue, sqid=%"PRIu32", ignoring"
 pci_nvme_ub_db_wr_invalid_sqtail(uint32_t qid, uint16_t new_tail) "submission 
queue doorbell write value beyond queue size, sqid=%"PRIu32", 
new_head=%"PRIu16", ignoring"
 pci_nvme_ub_unknown_css_value(void) "unknown value in cc.css field"
+pci_nvme_ub_too_many_mappings(void) "too many prp/sgl mappings"
-- 
2.32.0




[PULL for-6.1 05/11] tests/qtest/nvme-test: add persistent memory region test

2021-07-26 Thread Klaus Jensen
From: Gollu Appalanaidu 

This will test the PMR functionality.

Signed-off-by: Gollu Appalanaidu 
Reviewed-by: Klaus Jensen 
[k.jensen: replaced memory-backend-file with memory-backend-ram]
Signed-off-by: Klaus Jensen 
---
 tests/qtest/nvme-test.c | 61 -
 1 file changed, 60 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/nvme-test.c b/tests/qtest/nvme-test.c
index d32c953a3824..47e757d7e2af 100644
--- a/tests/qtest/nvme-test.c
+++ b/tests/qtest/nvme-test.c
@@ -13,6 +13,7 @@
 #include "libqos/libqtest.h"
 #include "libqos/qgraph.h"
 #include "libqos/pci.h"
+#include "include/block/nvme.h"
 
 typedef struct QNvme QNvme;
 
@@ -66,12 +67,65 @@ static void nvmetest_oob_cmb_test(void *obj, void *data, 
QGuestAllocator *alloc)
 g_assert_cmpint(qpci_io_readl(pdev, bar, cmb_bar_size - 1), !=, 
0x44332211);
 }
 
+static void nvmetest_pmr_reg_test(void *obj, void *data, QGuestAllocator 
*alloc)
+{
+QNvme *nvme = obj;
+QPCIDevice *pdev = >dev;
+QPCIBar pmr_bar, nvme_bar;
+uint32_t pmrcap, pmrsts;
+
+qpci_device_enable(pdev);
+pmr_bar = qpci_iomap(pdev, 4, NULL);
+
+/* Without Enabling PMRCTL check bar enablemet */
+qpci_io_writel(pdev, pmr_bar, 0, 0xccbbaa99);
+g_assert_cmpint(qpci_io_readb(pdev, pmr_bar, 0), !=, 0x99);
+g_assert_cmpint(qpci_io_readw(pdev, pmr_bar, 0), !=, 0xaa99);
+
+/* Map NVMe Bar Register to Enable the Mem Region */
+nvme_bar = qpci_iomap(pdev, 0, NULL);
+
+pmrcap = qpci_io_readl(pdev, nvme_bar, 0xe00);
+g_assert_cmpint(NVME_PMRCAP_RDS(pmrcap), ==, 0x1);
+g_assert_cmpint(NVME_PMRCAP_WDS(pmrcap), ==, 0x1);
+g_assert_cmpint(NVME_PMRCAP_BIR(pmrcap), ==, 0x4);
+g_assert_cmpint(NVME_PMRCAP_PMRWBM(pmrcap), ==, 0x2);
+g_assert_cmpint(NVME_PMRCAP_CMSS(pmrcap), ==, 0x1);
+
+/* Enable PMRCTRL */
+qpci_io_writel(pdev, nvme_bar, 0xe04, 0x1);
+
+qpci_io_writel(pdev, pmr_bar, 0, 0x44332211);
+g_assert_cmpint(qpci_io_readb(pdev, pmr_bar, 0), ==, 0x11);
+g_assert_cmpint(qpci_io_readw(pdev, pmr_bar, 0), ==, 0x2211);
+g_assert_cmpint(qpci_io_readl(pdev, pmr_bar, 0), ==, 0x44332211);
+
+pmrsts = qpci_io_readl(pdev, nvme_bar, 0xe08);
+g_assert_cmpint(NVME_PMRSTS_NRDY(pmrsts), ==, 0x0);
+
+/* Disable PMRCTRL */
+qpci_io_writel(pdev, nvme_bar, 0xe04, 0x0);
+
+qpci_io_writel(pdev, pmr_bar, 0, 0x88776655);
+g_assert_cmpint(qpci_io_readb(pdev, pmr_bar, 0), !=, 0x55);
+g_assert_cmpint(qpci_io_readw(pdev, pmr_bar, 0), !=, 0x6655);
+g_assert_cmpint(qpci_io_readl(pdev, pmr_bar, 0), !=, 0x88776655);
+
+pmrsts = qpci_io_readl(pdev, nvme_bar, 0xe08);
+g_assert_cmpint(NVME_PMRSTS_NRDY(pmrsts), ==, 0x1);
+
+qpci_iounmap(pdev, nvme_bar);
+qpci_iounmap(pdev, pmr_bar);
+}
+
 static void nvme_register_nodes(void)
 {
 QOSGraphEdgeOptions opts = {
 .extra_device_opts = "addr=04.0,drive=drv0,serial=foo",
 .before_cmd_line = "-drive id=drv0,if=none,file=null-co://,"
-   "file.read-zeroes=on,format=raw",
+   "file.read-zeroes=on,format=raw "
+   "-object memory-backend-ram,id=pmr0,"
+   "share=on,size=8",
 };
 
 add_qpci_address(, &(QPCIAddress) { .devfn = QPCI_DEVFN(4, 0) });
@@ -83,6 +137,11 @@ static void nvme_register_nodes(void)
 qos_add_test("oob-cmb-access", "nvme", nvmetest_oob_cmb_test, 
&(QOSGraphTestOptions) {
 .edge.extra_device_opts = "cmb_size_mb=2"
 });
+
+qos_add_test("pmr-test-access", "nvme", nvmetest_pmr_reg_test,
+ &(QOSGraphTestOptions) {
+.edge.extra_device_opts = "pmrdev=pmr0"
+});
 }
 
 libqos_init(nvme_register_nodes);
-- 
2.32.0




[PULL for-6.1 01/11] hw/nvme: remove NvmeCtrl parameter from ns setup/check functions

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

The nvme_ns_setup and nvme_ns_check_constraints should not depend on the
controller state. Refactor and remove it.

Reviewed-by: Hannes Reinecke 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/nvme.h |  2 +-
 hw/nvme/ctrl.c |  2 +-
 hw/nvme/ns.c   | 37 ++---
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 56f8eceed2ad..0868359a1e86 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -246,7 +246,7 @@ static inline void nvme_aor_dec_active(NvmeNamespace *ns)
 }
 
 void nvme_ns_init_format(NvmeNamespace *ns);
-int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
+int nvme_ns_setup(NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_shutdown(NvmeNamespace *ns);
 void nvme_ns_cleanup(NvmeNamespace *ns);
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 629b0d38c2a2..dd1801510032 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6498,7 +6498,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 ns = >namespace;
 ns->params.nsid = 1;
 
-if (nvme_ns_setup(n, ns, errp)) {
+if (nvme_ns_setup(ns, errp)) {
 return;
 }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 4275c3db6301..3c4f5b8c714a 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -346,8 +346,7 @@ static void nvme_zoned_ns_shutdown(NvmeNamespace *ns)
 assert(ns->nr_open_zones == 0);
 }
 
-static int nvme_ns_check_constraints(NvmeCtrl *n, NvmeNamespace *ns,
- Error **errp)
+static int nvme_ns_check_constraints(NvmeNamespace *ns, Error **errp)
 {
 if (!ns->blkconf.blk) {
 error_setg(errp, "block backend not configured");
@@ -366,20 +365,6 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, 
NvmeNamespace *ns,
 return -1;
 }
 
-if (!n->subsys) {
-if (ns->params.detached) {
-error_setg(errp, "detached requires that the nvme device is "
-   "linked to an nvme-subsys device");
-return -1;
-}
-
-if (ns->params.shared) {
-error_setg(errp, "shared requires that the nvme device is "
-   "linked to an nvme-subsys device");
-return -1;
-}
-}
-
 if (ns->params.zoned) {
 if (ns->params.max_active_zones) {
 if (ns->params.max_open_zones > ns->params.max_active_zones) {
@@ -411,9 +396,9 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, 
NvmeNamespace *ns,
 return 0;
 }
 
-int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
+int nvme_ns_setup(NvmeNamespace *ns, Error **errp)
 {
-if (nvme_ns_check_constraints(n, ns, errp)) {
+if (nvme_ns_check_constraints(ns, errp)) {
 return -1;
 }
 
@@ -465,7 +450,21 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 uint32_t nsid = ns->params.nsid;
 int i;
 
-if (nvme_ns_setup(n, ns, errp)) {
+if (!n->subsys) {
+if (ns->params.detached) {
+error_setg(errp, "detached requires that the nvme device is "
+   "linked to an nvme-subsys device");
+return;
+}
+
+if (ns->params.shared) {
+error_setg(errp, "shared requires that the nvme device is "
+   "linked to an nvme-subsys device");
+return;
+}
+}
+
+if (nvme_ns_setup(ns, errp)) {
 return;
 }
 
-- 
2.32.0




[PULL for-6.1 03/11] hw/nvme: unregister controller with subsystem at exit

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

Make sure the controller is unregistered from the subsystem when device
is removed.

Reviewed-by: Hannes Reinecke 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/nvme.h   | 1 +
 hw/nvme/ctrl.c   | 4 
 hw/nvme/subsys.c | 5 +
 3 files changed, 10 insertions(+)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 0868359a1e86..c4065467d877 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -50,6 +50,7 @@ typedef struct NvmeSubsystem {
 } NvmeSubsystem;
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp);
+void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n);
 
 static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
  uint32_t cntlid)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dd1801510032..90e3ee2b70ee 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6523,6 +6523,10 @@ static void nvme_exit(PCIDevice *pci_dev)
 nvme_ns_cleanup(ns);
 }
 
+if (n->subsys) {
+nvme_subsys_unregister_ctrl(n->subsys, n);
+}
+
 g_free(n->cq);
 g_free(n->sq);
 g_free(n->aer_reqs);
diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index dc7a96862f37..92caa604a280 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -32,6 +32,11 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
 return cntlid;
 }
 
+void nvme_subsys_unregister_ctrl(NvmeSubsystem *subsys, NvmeCtrl *n)
+{
+subsys->ctrls[n->cntlid] = NULL;
+}
+
 static void nvme_subsys_setup(NvmeSubsystem *subsys)
 {
 const char *nqn = subsys->params.nqn ?
-- 
2.32.0




[PULL for-6.1 02/11] hw/nvme: mark nvme-subsys non-hotpluggable

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

We currently lack the infrastructure to handle subsystem hotplugging, so
disable it.

Reviewed-by: Hannes Reinecke 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/subsys.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
index 192223d17ca1..dc7a96862f37 100644
--- a/hw/nvme/subsys.c
+++ b/hw/nvme/subsys.c
@@ -61,6 +61,7 @@ static void nvme_subsys_class_init(ObjectClass *oc, void 
*data)
 
 dc->realize = nvme_subsys_realize;
 dc->desc = "Virtual NVMe subsystem";
+dc->hotpluggable = false;
 
 device_class_set_props(dc, nvme_subsystem_props);
 }
-- 
2.32.0




[PULL for-6.1 00/11] hw/nvme fixes

2021-07-26 Thread Klaus Jensen
From: Klaus Jensen 

Hi Peter,

The following changes since commit 1d6f147f043bece029a795c6eb9d43c1abd909b6:

  Merge remote-tracking branch 'remotes/quic/tags/pull-hex-20210725' into 
staging (2021-07-26 13:36:51 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request

for you to fetch changes up to 9631a8ab21679e3d605f7f540dd8c692b9593e02:

  tests/qtest/nvme-test: add mmio read test (2021-07-26 21:09:39 +0200)


hw/nvme fixes

* new PMR test (Gollu Appalanaidu)
* pmr/sgl mapping fix (Padmakar Kalghatgi)
* hotplug fixes (me)
* mmio out-of-bound read fix (me)
* big-endian host fixes (me)



Gollu Appalanaidu (1):
  tests/qtest/nvme-test: add persistent memory region test

Klaus Jensen (9):
  hw/nvme: remove NvmeCtrl parameter from ns setup/check functions
  hw/nvme: mark nvme-subsys non-hotpluggable
  hw/nvme: unregister controller with subsystem at exit
  hw/nvme: fix controller hot unplugging
  hw/nvme: split pmrmsc register into upper and lower
  hw/nvme: use symbolic names for registers
  hw/nvme: fix out-of-bounds reads
  hw/nvme: fix mmio read
  tests/qtest/nvme-test: add mmio read test

Padmakar Kalghatgi (1):
  hw/nvme: error handling for too many mappings

 hw/nvme/nvme.h  |  18 +-
 include/block/nvme.h|  60 +--
 hw/nvme/ctrl.c  | 379 +++-
 hw/nvme/ns.c|  55 --
 hw/nvme/subsys.c|   9 +
 tests/qtest/nvme-test.c |  87 -
 hw/nvme/trace-events|   1 +
 7 files changed, 402 insertions(+), 207 deletions(-)

-- 
2.32.0




Re: [PATCH v2 3/3] target/arm: Add sve-default-vector-length cpu property

2021-07-26 Thread Andrew Jones
On Mon, Jul 26, 2021 at 08:33:52AM -1000, Richard Henderson wrote:
> On 7/26/21 4:59 AM, Andrew Jones wrote:
> > > +SVE User-mode Default Vector Length Property
> > > +
> > > +
> > > +For qemu-aarch64, the cpu property `sve-default-vector-length=N` is
> > > +defined to mirror the Linux kernel parameter file
> > > +`/proc/sys/abi/sve_default_vector_length`.  The default length, `N`,
> > > +is in units of bytes and must be between 16 and 8192.
> > 
> > Hmm. If a user inputs anything greater than 256, then won't it get
> > silently reduced to 256?
> 
> Yes.
> 
> > > +If not specified, the default vector length is 64.
> > > +
> > > +If the default length is larger than the maximum vector length enabled
> > > +with `sve` properties, the actual vector length will be reduced.
> > 
> > Here it's pointed out that the default may be reduced, but it implies that
> > that only happens if an sve property is also given. Won't users wonder
> > why they're only getting vectors that are 256 bytes large even when they
> > ask for more?
> 
> I guess adding that 256 is the maximum length supported by qemu should be 
> sufficient?
>

Works for me.

Thanks,
drew 




Re: [PATCH v2 3/3] target/arm: Add sve-default-vector-length cpu property

2021-07-26 Thread Richard Henderson

On 7/26/21 4:59 AM, Andrew Jones wrote:

+SVE User-mode Default Vector Length Property
+
+
+For qemu-aarch64, the cpu property `sve-default-vector-length=N` is
+defined to mirror the Linux kernel parameter file
+`/proc/sys/abi/sve_default_vector_length`.  The default length, `N`,
+is in units of bytes and must be between 16 and 8192.


Hmm. If a user inputs anything greater than 256, then won't it get
silently reduced to 256?


Yes.


+If not specified, the default vector length is 64.
+
+If the default length is larger than the maximum vector length enabled
+with `sve` properties, the actual vector length will be reduced.


Here it's pointed out that the default may be reduced, but it implies that
that only happens if an sve property is also given. Won't users wonder
why they're only getting vectors that are 256 bytes large even when they
ask for more?


I guess adding that 256 is the maximum length supported by qemu should be 
sufficient?


r~



Re: [PULL v2 0/2] Hexagon (target/hexagon) remove put_user_*/get_user_*

2021-07-26 Thread Peter Maydell
On Sun, 25 Jul 2021 at 22:42, Taylor Simpson  wrote:
>
> The following changes since commit 7457b407edd6e8555e4b46488aab2f13959fccf8:
>
>   Merge remote-tracking branch 
> 'remotes/thuth-gitlab/tags/pull-request-2021-07-19' into staging (2021-07-19 
> 11:34:08 +0100)
>
> are available in the git repository at:
>
>   https://github.com/quic/qemu tags/pull-hex-20210725
>
> for you to fetch changes up to 25fc9b79cd057e394f35d7afc18493becd515797:
>
>   target/hexagon: Drop include of qemu.h (2021-07-21 15:54:02 -0500)
>
> 
> The Hexagon target was silently failing the SIGSEGV test because
> the signal handler was not called.
>
> Patch 1/2 fixes the Hexagon target
> Patch 2/2 drops include qemu.h from target/hexagon/op_helper.c
>
>  Changes in v2 
> Drop changes to linux-test.c due to intermittent failures on riscv
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Re: [PATCH] hw/arm/nseries: Display hexadecimal value with '0x' prefix

2021-07-26 Thread Richard Henderson

On 7/26/21 5:09 AM, Philippe Mathieu-Daudé wrote:

Signed-off-by: Philippe Mathieu-Daudé
---
  hw/arm/nseries.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH for-6.1? v2 0/9] Fixes for clang-13

2021-07-26 Thread Richard Henderson

On 7/25/21 2:24 AM, Richard Henderson wrote:


Richard Henderson (9):
   nbd/server: Mark variable unused in nbd_negotiate_meta_queries
   accel/tcg: Remove unused variable in cpu_exec
   util/selfmap: Discard mapping on error
   net/checksum: Remove unused variable in net_checksum_add_iov
   hw/audio/adlib: Remove unused variable in adlib_callback
   hw/ppc/spapr_events: Remove unused variable from check_exception
   hw/pci-hist/pnv_phb4: Fix typo in pnv_phb4_ioda_write
   linux-user/syscall: Remove unused variable from execve
   tests/unit: Remove unused variable from test_io


Queued for 6.1.
Thanks for the reviews, all.

r~



Re: [PATCH v3 1/4] tpm: mark correct memory region range dirty when clearing RAM

2021-07-26 Thread David Hildenbrand

On 26.07.21 18:57, Peter Xu wrote:

On Mon, Jul 26, 2021 at 06:03:43PM +0200, David Hildenbrand wrote:

We might not start at the beginning of the memory region. Let's
calculate the offset into the memory region via the difference in the
host addresses.

Acked-by: Stefan Berger 
Fixes: ffab1be70692 ("tpm: clear RAM when "memory overwrite" requested")
Cc: Marc-André Lureau 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Alex Williamson 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Claudio Fontana 
Cc: Thomas Huth 
Cc: "Alex Bennée" 
Cc: Peter Xu 
Cc: Laurent Vivier 
Cc: Stefan Berger 
Signed-off-by: David Hildenbrand 
---
  hw/tpm/tpm_ppi.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/tpm/tpm_ppi.c b/hw/tpm/tpm_ppi.c
index 362edcc5c9..f243d9d0f6 100644
--- a/hw/tpm/tpm_ppi.c
+++ b/hw/tpm/tpm_ppi.c
@@ -30,11 +30,14 @@ void tpm_ppi_reset(TPMPPI *tpmppi)
  guest_phys_blocks_init(_phys_blocks);
  guest_phys_blocks_append(_phys_blocks);
  QTAILQ_FOREACH(block, _phys_blocks.head, next) {
+hwaddr mr_offs = (uint8_t *)memory_region_get_ram_ptr(block->mr) -
+ block->host_addr;


Didn't look closely previous - should it be reversed instead?

   block->host_addr - memory_region_get_ram_ptr(block->mr)


Of course it should :(

Thanks! :)

--
Thanks,

David / dhildenb




[PATCH v3] hw/acpi: add an assertion check for non-null return from acpi_get_i386_pci_host

2021-07-26 Thread Ani Sinha
All existing code using acpi_get_i386_pci_host() checks for a non-null
return value from this function call. Instead of returning early when the value
returned is NULL, assert instead. Since there are only two possible host buses
for i386 - q35 and i440fx, a null value return from the function does not make
sense in most cases and is likely an error situation.

Fixes: c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")

Signed-off-by: Ani Sinha 
---
 hw/acpi/pcihp.c  |  8 
 hw/i386/acpi-build.c | 15 ++-
 2 files changed, 14 insertions(+), 9 deletions(-)

changelog:
v1: initial patch
v2: removed comment addition - that can be sent as a separate patch.
v3: added assertion for null host values for all cases except one.

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f4d706e47d..054ee8cbc5 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -116,6 +116,12 @@ static void acpi_set_pci_info(void)
 bsel_is_set = true;
 
 if (!host) {
+/*
+ * This function can be eventually called from
+ * qemu_devices_reset() -> acpi_pcihp_reset() even
+ * for architectures other than i386. Hence, we need
+ * to ignore null values for host here.
+ */
 return;
 }
 
@@ -136,6 +142,8 @@ static void acpi_pcihp_disable_root_bus(void)
 return;
 }
 
+assert(host);
+
 bus = PCI_HOST_BRIDGE(host)->bus;
 if (bus) {
 /* setting the hotplug handler to NULL makes the bus non-hotpluggable 
*/
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 17836149fe..83fb1d55c0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -321,9 +321,7 @@ static void acpi_get_pci_holes(Range *hole, Range *hole64)
 
 pci_host = acpi_get_i386_pci_host();
 
-if (!pci_host) {
-return;
-}
+assert(pci_host);
 
 range_set_bounds1(hole,
   object_property_get_uint(pci_host,
@@ -1769,9 +1767,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 
 pci_host = acpi_get_i386_pci_host();
 
-if (pci_host) {
-bus = PCI_HOST_BRIDGE(pci_host)->bus;
-}
+assert(pci_host);
+
+bus = PCI_HOST_BRIDGE(pci_host)->bus;
 
 if (bus) {
 Aml *scope = aml_scope("PCI0");
@@ -2389,9 +2387,8 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
 QObject *o;
 
 pci_host = acpi_get_i386_pci_host();
-if (!pci_host) {
-return false;
-}
+
+assert(pci_host);
 
 o = object_property_get_qobject(pci_host, PCIE_HOST_MCFG_BASE, NULL);
 if (!o) {
-- 
2.25.1




Re: [PATCH v3 1/4] tpm: mark correct memory region range dirty when clearing RAM

2021-07-26 Thread Peter Xu
On Mon, Jul 26, 2021 at 06:03:43PM +0200, David Hildenbrand wrote:
> We might not start at the beginning of the memory region. Let's
> calculate the offset into the memory region via the difference in the
> host addresses.
> 
> Acked-by: Stefan Berger 
> Fixes: ffab1be70692 ("tpm: clear RAM when "memory overwrite" requested")
> Cc: Marc-André Lureau 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Eduardo Habkost 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Claudio Fontana 
> Cc: Thomas Huth 
> Cc: "Alex Bennée" 
> Cc: Peter Xu 
> Cc: Laurent Vivier 
> Cc: Stefan Berger 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/tpm/tpm_ppi.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/tpm/tpm_ppi.c b/hw/tpm/tpm_ppi.c
> index 362edcc5c9..f243d9d0f6 100644
> --- a/hw/tpm/tpm_ppi.c
> +++ b/hw/tpm/tpm_ppi.c
> @@ -30,11 +30,14 @@ void tpm_ppi_reset(TPMPPI *tpmppi)
>  guest_phys_blocks_init(_phys_blocks);
>  guest_phys_blocks_append(_phys_blocks);
>  QTAILQ_FOREACH(block, _phys_blocks.head, next) {
> +hwaddr mr_offs = (uint8_t *)memory_region_get_ram_ptr(block->mr) 
> -
> + block->host_addr;

Didn't look closely previous - should it be reversed instead?

  block->host_addr - memory_region_get_ram_ptr(block->mr)

Thanks,

> +
>  trace_tpm_ppi_memset(block->host_addr,
>   block->target_end - block->target_start);
>  memset(block->host_addr, 0,
> block->target_end - block->target_start);
> -memory_region_set_dirty(block->mr, 0,
> +memory_region_set_dirty(block->mr, mr_offs,
>  block->target_end - block->target_start);
>  }
>  guest_phys_blocks_free(_phys_blocks);
> -- 
> 2.31.1
> 

-- 
Peter Xu




Re: [PATCH for-6.1?] bitops.h: revert db1ffc32dd ("qemu/bitops.h: add bitrev8 implementation")

2021-07-26 Thread Richard Henderson

On 7/25/21 1:05 AM, Mark Cave-Ayland wrote:

Commit db1ffc32dd ("qemu/bitops.h: add bitrev8 implementation") introduced a
bitrev8() function to reverse the bit ordering required for storing the MAC
address in the q800 PROM.

This function is not required since QEMU implements its own revbit8() function
which does exactly the same thing. Remove the extraneous bitrev8() function and
switch its only caller in hw/m68k/q800.c to use revbit8() instead.

Signed-off-by: Mark Cave-Ayland
---
  hw/m68k/q800.c|  2 +-
  include/qemu/bitops.h | 22 --
  2 files changed, 1 insertion(+), 23 deletions(-)

---
I picked this up reading the loongarch thread where I realised that QEMU
already has a revbit8() function - I was searching for bitrev8() before
deciding that this needed to be added since this was the name of the equivalent
function in Linux.

I think this is a good candidate for 6.1 still because a) it only has 1 caller
which is easy for me to test and b) it prevents anyone else coming along and
accidentally using it later.

MCA.


Queued for 6.1.


r~



Re: [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode

2021-07-26 Thread Richard Henderson

On 7/25/21 8:11 AM, Richard Henderson wrote:

On 7/25/21 7:44 AM, Peter Maydell wrote:

This patchset fixes the intermittent hang seen when running a guest in
icount mode, as reported in
   https://gitlab.com/qemu-project/qemu/-/issues/499 .

The underlying cause of the hang is that code in cpu_loop_exec_tb()
was using CF_COUNT_MASK as the maximum possible number of instructions
it would try to execute from a TB when it set the icount_decr.u16.low
field. This is wrong, because (a) that field can validly be set to any
unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
number of insns in the TB.

Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
value for icount_decr.u16.low, which is 0x.  Patch two adjusts the
"should we ask for a TB with exactly this many insns in it?" condition
so that instead of testing "cpu->icount_extra == 0", which should be
always true if (insns_left > 0 && insns_left < tb->icount), we assert
it instead.  This assertion would have caught the bug fixed in patch
one.

Tested using the same iterating loop test described in the bug report;
without the fix QEMU hangs within a handful of iterations. With the
fix it managed 175 successful iterations before I got bored and hit ^C.

thanks
-- PMM

Peter Maydell (2):
   accel/tcg: Don't use CF_COUNT_MASK as the max value of
 icount_decr.u16.low
   accel/tcg: Remove unnecessary check on icount_extra in
 cpu_loop_exec_tb()


Nice one.
Reviewed-by: Richard Henderson 


Queued for 6.1.

r~




Re: [PATCH v2 12/22] target/loongarch: Add fixed point extra instruction translation

2021-07-26 Thread Richard Henderson

On 7/26/21 2:57 AM, Song Gao wrote:


Hi, Richard.

On 07/23/2021 01:12 PM, Richard Henderson wrote:

On 7/20/21 11:53 PM, Song Gao wrote:

+target_ulong helper_cpucfg(CPULoongArchState *env, target_ulong rj)
+{
+    target_ulong r = 0;
+
+    switch (rj) {
+    case 0:
+    r = env->CSR_MCSR0 & 0x;
+    break;
+    case 1:
+    r = (env->CSR_MCSR0 & 0x) >> 32;
+    break;


Why do you represent all of these as high and low portions of a 64-bit internal 
value, when the manual describes them as 32-bit values?


This method can reduce variables on env.


The number of variables may increase, but the memory consumed -- which is the metric that 
is more important -- is still the same.


Also, it is much less confusing to match the description in the manual.


r~



Re: [PATCH v2 09/22] target/loongarch: Add fixed point bit instruction translation

2021-07-26 Thread Richard Henderson

On 7/26/21 2:22 AM, Song Gao wrote:

Hi, Richard.

On 07/23/2021 09:29 AM, Richard Henderson wrote:

On 7/20/21 11:53 PM, Song Gao wrote:

This patch implement fixed point bit instruction translation.

This includes:
- EXT.W.{B/H}
- CL{O/Z}.{W/D}, CT{O/Z}.{W/D}
- BYTEPICK.{W/D}
- REVB.{2H/4H/2W/D}
- REVH.{2W/D}
- BITREV.{4B/8B}, BITREV.{W/D}
- BSTRINS.{W/D}, BSTRPICK.{W/D}
- MASKEQZ, MASKNEZ

Signed-off-by: Song Gao 
---
   target/loongarch/helper.h |  10 +
   target/loongarch/insns.decode |  45 +++
   target/loongarch/op_helper.c  | 119 
   target/loongarch/trans.inc.c  | 665 
++
   4 files changed, 839 insertions(+)

diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6c7e19b..bbbcc26 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -8,3 +8,13 @@
     DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
   DEF_HELPER_2(raise_exception, noreturn, env, i32)
+
+DEF_HELPER_2(cto_w, tl, env, tl)
+DEF_HELPER_2(ctz_w, tl, env, tl)
+DEF_HELPER_2(cto_d, tl, env, tl)
+DEF_HELPER_2(ctz_d, tl, env, tl)


The count leading and trailing zero operations are built into tcg.  Count 
leading and trailing one simply needs a NOT operation to convert it to zero.



My understanding is this:

   cto -> NOT operation (tcg_gen_not_tl)  -> ctz,


   is right?


Yes.


r~



Re: [PATCH v2 3/3] qemu-ga/msi: fix w32 libgcc name

2021-07-26 Thread Marc-André Lureau
Hi

On Mon, Jul 26, 2021 at 7:56 PM Gerd Hoffmann  wrote:

> This is what I find on my Fedora 34 mingw install.
>
> Signed-off-by: Gerd Hoffmann 
> ---
>  qga/installer/qemu-ga.wxs | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/qga/installer/qemu-ga.wxs b/qga/installer/qemu-ga.wxs
> index 9cb4c3d73302..ce7b25b5e16f 100644
> --- a/qga/installer/qemu-ga.wxs
> +++ b/qga/installer/qemu-ga.wxs
> @@ -31,7 +31,7 @@
>
>
>
> -
> +
>  
>
>
>
msitools/wixl has Fedora WXI (wxs includes files) for that
https://gitlab.gnome.org/GNOME/msitools/-/blob/master/data/wixl/gcc.wxi

In theory, all you need is: "" at the top level, and
reference the component group ""

That's how we handle the dozens for dependencies in virt-viewer/spice.

-- 
Marc-André Lureau


[PATCH for-6.1] hw/arm/boot: Report error if there is no fw_cfg device in the machine

2021-07-26 Thread Peter Maydell
If the user provides both a BIOS/firmware image and also a guest
kernel filename, arm_setup_firmware_boot() will pass the
kernel image to the firmware via the fw_cfg device. However we
weren't checking whether there really was a fw_cfg device present,
and if there wasn't we would crash.

This crash can be provoked with a command line such as
 qemu-system-aarch64 -M raspi3 -kernel /dev/null -bios /dev/null -display none

It is currently only possible on the raspi3 machine, because unless
the machine sets info->firmware_loaded we won't call
arm_setup_firmware_boot(), and the only machines which set that are:
 * virt (has a fw-cfg device)
 * sbsa-ref (checks itself for kernel_filename && firmware_loaded)
 * raspi3 (crashes)

But this is an unfortunate beartrap to leave for future machine
model implementors, so we should handle this situation in boot.c.

Check in arm_setup_firmware_boot() whether the fw-cfg device exists
before trying to load files into it, and if it doesn't exist then
exit with a hopefully helpful error message.

Because we now handle this check in a machine-agnostic way, we
can remove the check from sbsa-ref.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/503
Signed-off-by: Peter Maydell 
---
I didn't reuse exactly the same wording sbsa-ref used to have,
because the bit about loading an OS from hard disk might not
apply to all machine types.
---
 hw/arm/boot.c | 9 +
 hw/arm/sbsa-ref.c | 7 ---
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d7b059225e6..57efb61ee41 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -1243,6 +1243,15 @@ static void arm_setup_firmware_boot(ARMCPU *cpu, struct 
arm_boot_info *info)
 bool try_decompressing_kernel;
 
 fw_cfg = fw_cfg_find();
+
+if (!fw_cfg) {
+error_report("This machine type does not support loading both "
+ "a guest firmware/BIOS image and a guest kernel at "
+ "the same time. You should change your QEMU command "
+ "line to specify one or the other, but not both.");
+exit(1);
+}
+
 try_decompressing_kernel = arm_feature(>env,
ARM_FEATURE_AARCH64);
 
diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
index 43c19b49234..c1629df6031 100644
--- a/hw/arm/sbsa-ref.c
+++ b/hw/arm/sbsa-ref.c
@@ -691,13 +691,6 @@ static void sbsa_ref_init(MachineState *machine)
 
 firmware_loaded = sbsa_firmware_init(sms, sysmem, secure_sysmem);
 
-if (machine->kernel_filename && firmware_loaded) {
-error_report("sbsa-ref: No fw_cfg device on this machine, "
- "so -kernel option is not supported when firmware loaded, 
"
- "please load OS from hard disk instead");
-exit(1);
-}
-
 /*
  * This machine has EL3 enabled, external firmware should supply PSCI
  * implementation, so the QEMU's internal PSCI is disabled.
-- 
2.20.1




[PATCH] hw/net/can: sja1000 fix buff2frame_bas for dlc out of std CAN 8 bytes

2021-07-26 Thread Pavel Pisa
Problem reported by openEuler fuzz-sig group.

The buff2frame_bas function (hw\net\can\can_sja1000.c)
infoleak(qemu5.x~qemu6.x) or stack-overflow(qemu 4.x).

Reported-by: Qiang Ning 
Signed-off-by: Pavel Pisa 
---
 hw/net/can/can_sja1000.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/net/can/can_sja1000.c b/hw/net/can/can_sja1000.c
index 42d2f99dfb..64e81bff58 100644
--- a/hw/net/can/can_sja1000.c
+++ b/hw/net/can/can_sja1000.c
@@ -311,6 +311,10 @@ static void buff2frame_bas(const uint8_t *buff, 
qemu_can_frame *frame)
 }
 frame->can_dlc = buff[1] & 0x0f;
 
+if (frame->can_dlc > 8) {
+frame->can_dlc = 8;
+}
+
 for (i = 0; i < frame->can_dlc; i++) {
 frame->data[i] = buff[2 + i];
 }
-- 
2.20.1




Re: [PATCH v6 2/2] tests/tcg/s390x: Test SIGILL and SIGSEGV handling

2021-07-26 Thread Ilya Leoshkevich
On Mon, 2021-07-05 at 23:04 +0200, Ilya Leoshkevich wrote:
> Verify that s390x-specific uc_mcontext.psw.addr is reported correctly
> and that signal handling interacts properly with debugging.
> 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  tests/tcg/s390x/Makefile.target   |  18 +-
>  tests/tcg/s390x/gdbstub/test-signals-s390x.py |  76 
>  tests/tcg/s390x/signals-s390x.c   | 165
> ++
>  3 files changed, 258 insertions(+), 1 deletion(-)
>  create mode 100644 tests/tcg/s390x/gdbstub/test-signals-s390x.py
>  create mode 100644 tests/tcg/s390x/signals-s390x.c
> 
> diff --git a/tests/tcg/s390x/Makefile.target
> b/tests/tcg/s390x/Makefile.target
> index 0036b8a505..0a5b25c156 100644
> --- a/tests/tcg/s390x/Makefile.target
> +++ b/tests/tcg/s390x/Makefile.target
> @@ -1,4 +1,5 @@
> -VPATH+=$(SRC_PATH)/tests/tcg/s390x
> +S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
> +VPATH+=$(S390X_SRC)
>  CFLAGS+=-march=zEC12 -m64
>  TESTS+=hello-s390x
>  TESTS+=csst
> @@ -12,3 +13,18 @@ TESTS+=mvc
>  # This triggers failures on s390x hosts about 4% of the time
>  run-signals: signals
> $(call skip-test, $<, "BROKEN awaiting sigframe clean-ups")
> +
> +TESTS+=signals-s390x
> +
> +ifneq ($(HAVE_GDB_BIN),)
> +GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
> +
> +run-gdbstub-signals-s390x: signals-s390x
> +   $(call run-test, $@, $(GDB_SCRIPT) \
> +   --gdb $(HAVE_GDB_BIN) \
> +   --qemu $(QEMU) --qargs "$(QEMU_OPTS)" \
> +   --bin $< --test $(S390X_SRC)/gdbstub/test-signals-
> s390x.py, \
> +   "mixing signals and debugging on s390x")
> +
> +EXTRA_RUNS += run-gdbstub-signals-s390x
> +endif
> diff --git a/tests/tcg/s390x/gdbstub/test-signals-s390x.py
> b/tests/tcg/s390x/gdbstub/test-signals-s390x.py
> new file mode 100644
> index 00..80a284b475
> --- /dev/null
> +++ b/tests/tcg/s390x/gdbstub/test-signals-s390x.py
> @@ -0,0 +1,76 @@
> +from __future__ import print_function
> +
> +#
> +# Test that signals and debugging mix well together on s390x.
> +#
> +# This is launched via tests/guest-debug/run-test.py
> +#
> +
> +import gdb
> +import sys
> +
> +failcount = 0
> +
> +
> +def report(cond, msg):
> +    """Report success/fail of test"""
> +    if cond:
> +    print("PASS: %s" % (msg))
> +    else:
> +    print("FAIL: %s" % (msg))
> +    global failcount
> +    failcount += 1
> +
> +
> +def run_test():
> +    """Run through the tests one by one"""
> +    illegal_op = gdb.Breakpoint("illegal_op")
> +    stg = gdb.Breakpoint("stg")
> +    mvc_8 = gdb.Breakpoint("mvc_8")
> +
> +    # Expect the following events:
> +    # 1x illegal_op breakpoint
> +    # 2x stg breakpoint, segv, breakpoint
> +    # 2x mvc_8 breakpoint, segv, breakpoint
> +    for _ in range(14):
> +    gdb.execute("c")
> +    report(illegal_op.hit_count == 1, "illegal_op.hit_count == 1")
> +    report(stg.hit_count == 4, "stg.hit_count == 4")
> +    report(mvc_8.hit_count == 4, "mvc_8.hit_count == 4")
> +
> +    # The test must succeed.
> +    gdb.Breakpoint("_exit")
> +    gdb.execute("c")
> +    status = int(gdb.parse_and_eval("$r2"))
> +    report(status == 0, "status == 0");
> +
> +
> +#
> +# This runs as the script it sourced (via -x, via run-test.py)
> +#
> +try:
> +    inferior = gdb.selected_inferior()
> +    arch = inferior.architecture()
> +    print("ATTACHED: %s" % arch.name())
> +except (gdb.error, AttributeError):
> +    print("SKIPPING (not connected)", file=sys.stderr)
> +    exit(0)
> +
> +if gdb.parse_and_eval("$pc") == 0:
> +    print("SKIP: PC not set")
> +    exit(0)
> +
> +try:
> +    # These are not very useful in scripts
> +    gdb.execute("set pagination off")
> +    gdb.execute("set confirm off")
> +
> +    # Run the actual tests
> +    run_test()
> +except (gdb.error):
> +    print("GDB Exception: %s" % (sys.exc_info()[0]))
> +    failcount += 1
> +    pass
> +
> +print("All tests complete: %d failures" % failcount)
> +exit(failcount)
> diff --git a/tests/tcg/s390x/signals-s390x.c
> b/tests/tcg/s390x/signals-s390x.c
> new file mode 100644
> index 00..dc2f8ee59a
> --- /dev/null
> +++ b/tests/tcg/s390x/signals-s390x.c
> @@ -0,0 +1,165 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Various instructions that generate SIGILL and SIGSEGV. They could
> have been
> + * defined in a separate .s file, but this would complicate the
> build, so the
> + * inline asm is used instead.
> + */
> +
> +void illegal_op(void);
> +void after_illegal_op(void);
> +asm(".globl\tillegal_op\n"
> +    "illegal_op:\t.byte\t0x00,0x00\n"
> +    "\t.globl\tafter_illegal_op\n"
> +    "after_illegal_op:\tbr\t%r14");
> +
> +void stg(void *dst, unsigned long src);
> +asm(".globl\tstg\n"
> +    "stg:\tstg\t%r3,0(%r2)\n"
> +    "\tbr\t%r14");
> +
> +void mvc_8(void *dst, void *src);
> +asm(".globl\tmvc_8\n"
> +    "mvc_8:\tmvc\t0(8,%r2),0(%r3)\n"
> +    "\tbr\t%r14");
> +
> +static void 

[PATCH v3 4/4] softmmu/memory_mapping: optimize for RamDiscardManager sections

2021-07-26 Thread David Hildenbrand
virtio-mem logically plugs/unplugs memory within a sparse memory region
and notifies via the RamDiscardManager interface when parts become
plugged (populated) or unplugged (discarded).

Currently, we end up (via the two users)
1) zeroing all logically unplugged/discarded memory during TPM resets.
2) reading all logically unplugged/discarded memory when dumping, to
   figure out the content is zero.

1) is always bad, because we assume unplugged memory stays discarded
   (and is already implicitly zero).
2) isn't that bad with anonymous memory, we end up reading the zero
   page (slow and unnecessary, though). However, once we use some
   file-backed memory (future use case), even reading will populate memory.

Let's cut out all parts marked as not-populated (discarded) via the
RamDiscardManager. As virtio-mem is the single user, this now means that
logically unplugged memory ranges will no longer be included in the
dump, which results in smaller dump files and faster dumping.

virtio-mem has a minimum granularity of 1 MiB (and the default is usually
2 MiB). Theoretically, we can see quite some fragmentation, in practice
we won't have it completely fragmented in 1 MiB pieces. Still, we might
end up with many physical ranges.

Both, the ELF format and kdump seem to be ready to support many
individual ranges (e.g., for ELF it seems to be UINT32_MAX, kdump has a
linear bitmap).

Reviewed-by: Peter Xu 
Cc: Marc-André Lureau 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Alex Williamson 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Claudio Fontana 
Cc: Thomas Huth 
Cc: "Alex Bennée" 
Cc: Peter Xu 
Cc: Laurent Vivier 
Cc: Stefan Berger 
Signed-off-by: David Hildenbrand 
---
 softmmu/memory_mapping.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
index a2af02c41c..a62eaa49cc 100644
--- a/softmmu/memory_mapping.c
+++ b/softmmu/memory_mapping.c
@@ -246,6 +246,15 @@ static void guest_phys_block_add_section(GuestPhysListener 
*g,
 #endif
 }
 
+static int guest_phys_ram_populate_cb(MemoryRegionSection *section,
+  void *opaque)
+{
+GuestPhysListener *g = opaque;
+
+guest_phys_block_add_section(g, section);
+return 0;
+}
+
 static void guest_phys_blocks_region_add(MemoryListener *listener,
  MemoryRegionSection *section)
 {
@@ -257,6 +266,17 @@ static void guest_phys_blocks_region_add(MemoryListener 
*listener,
 memory_region_is_nonvolatile(section->mr)) {
 return;
 }
+
+/* for special sparse regions, only add populated parts */
+if (memory_region_has_ram_discard_manager(section->mr)) {
+RamDiscardManager *rdm;
+
+rdm = memory_region_get_ram_discard_manager(section->mr);
+ram_discard_manager_replay_populated(rdm, section,
+ guest_phys_ram_populate_cb, g);
+return;
+}
+
 guest_phys_block_add_section(g, section);
 }
 
-- 
2.31.1




[PATCH v3 2/4] softmmu/memory_mapping: never merge ranges accross memory regions

2021-07-26 Thread David Hildenbrand
Let's make sure to not merge when different memory regions are involved.
Unlikely, but theoretically possible.

Acked-by: Stefan Berger 
Reviewed-by: Peter Xu 
Cc: Marc-André Lureau 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Alex Williamson 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Claudio Fontana 
Cc: Thomas Huth 
Cc: "Alex Bennée" 
Cc: Peter Xu 
Cc: Laurent Vivier 
Cc: Stefan Berger 
Signed-off-by: David Hildenbrand 
---
 softmmu/memory_mapping.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
index e7af276546..d401ca7e31 100644
--- a/softmmu/memory_mapping.c
+++ b/softmmu/memory_mapping.c
@@ -229,7 +229,8 @@ static void guest_phys_blocks_region_add(MemoryListener 
*listener,
 
 /* we want continuity in both guest-physical and host-virtual memory */
 if (predecessor->target_end < target_start ||
-predecessor->host_addr + predecessor_size != host_addr) {
+predecessor->host_addr + predecessor_size != host_addr ||
+predecessor->mr != section->mr) {
 predecessor = NULL;
 }
 }
-- 
2.31.1




[PATCH v3 3/4] softmmu/memory_mapping: factor out adding physical memory ranges

2021-07-26 Thread David Hildenbrand
Let's factor out adding a MemoryRegionSection to the list, to be reused in
RamDiscardManager context next.

Reviewed-by: Stefan Berger 
Reviewed-by: Peter Xu 
Cc: Marc-André Lureau 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Alex Williamson 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Claudio Fontana 
Cc: Thomas Huth 
Cc: "Alex Bennée" 
Cc: Peter Xu 
Cc: Laurent Vivier 
Cc: Stefan Berger 
Signed-off-by: David Hildenbrand 
---
 softmmu/memory_mapping.c | 41 
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/softmmu/memory_mapping.c b/softmmu/memory_mapping.c
index d401ca7e31..a2af02c41c 100644
--- a/softmmu/memory_mapping.c
+++ b/softmmu/memory_mapping.c
@@ -193,29 +193,14 @@ typedef struct GuestPhysListener {
 MemoryListener listener;
 } GuestPhysListener;
 
-static void guest_phys_blocks_region_add(MemoryListener *listener,
+static void guest_phys_block_add_section(GuestPhysListener *g,
  MemoryRegionSection *section)
 {
-GuestPhysListener *g;
-uint64_t section_size;
-hwaddr target_start, target_end;
-uint8_t *host_addr;
-GuestPhysBlock *predecessor;
-
-/* we only care about RAM */
-if (!memory_region_is_ram(section->mr) ||
-memory_region_is_ram_device(section->mr) ||
-memory_region_is_nonvolatile(section->mr)) {
-return;
-}
-
-g= container_of(listener, GuestPhysListener, listener);
-section_size = int128_get64(section->size);
-target_start = section->offset_within_address_space;
-target_end   = target_start + section_size;
-host_addr= memory_region_get_ram_ptr(section->mr) +
-   section->offset_within_region;
-predecessor  = NULL;
+const hwaddr target_start = section->offset_within_address_space;
+const hwaddr target_end = target_start + int128_get64(section->size);
+uint8_t *host_addr = memory_region_get_ram_ptr(section->mr) +
+ section->offset_within_region;
+GuestPhysBlock *predecessor = NULL;
 
 /* find continuity in guest physical address space */
 if (!QTAILQ_EMPTY(>list->head)) {
@@ -261,6 +246,20 @@ static void guest_phys_blocks_region_add(MemoryListener 
*listener,
 #endif
 }
 
+static void guest_phys_blocks_region_add(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+GuestPhysListener *g = container_of(listener, GuestPhysListener, listener);
+
+/* we only care about RAM */
+if (!memory_region_is_ram(section->mr) ||
+memory_region_is_ram_device(section->mr) ||
+memory_region_is_nonvolatile(section->mr)) {
+return;
+}
+guest_phys_block_add_section(g, section);
+}
+
 void guest_phys_blocks_append(GuestPhysBlockList *list)
 {
 GuestPhysListener g = { 0 };
-- 
2.31.1




[PATCH v3 0/4] softmmu/memory_mapping: optimize dump/tpm for virtio-mem

2021-07-26 Thread David Hildenbrand
Minor fixes and cleanups, followed by an optimization for virtio-mem
regarding guest dumps and tpm.

virtio-mem logically plugs/unplugs memory within a sparse memory region
and notifies via the RamDiscardMgr interface when parts become
plugged (populated) or unplugged (discarded).

Currently, guest_phys_blocks_append() appends the whole (sparse)
virtio-mem managed region and therefore tpm code might zero the hole
region and dump code will dump the whole region. Let's only add logically
plugged (populated) parts of that region, skipping over logically
unplugged (discarded) parts by reusing the RamDiscardMgr infrastructure
introduced to handle virtio-mem + VFIO properly.

v2 -> v4:
- "tpm: mark correct memory region range dirty when clearing RAM"
-- Fix calculation of offset into memory region (thanks Peter!)
- "softmmu/memory_mapping: reuse qemu_get_guest_simple_memory_mapping()"
-- Dropped

v1 -> v2:
- "softmmu/memory_mapping: factor out adding physical memory ranges"
-- Simplify based on RamDiscardManager changes: add using a
   MemoryRegionSection
- "softmmu/memory_mapping: optimize for RamDiscardManager sections"
-- Simplify based on RamDiscardManager changes

David Hildenbrand (4):
  tpm: mark correct memory region range dirty when clearing RAM
  softmmu/memory_mapping: never merge ranges accross memory regions
  softmmu/memory_mapping: factor out adding physical memory ranges
  softmmu/memory_mapping: optimize for RamDiscardManager sections

 hw/tpm/tpm_ppi.c |  5 +++-
 softmmu/memory_mapping.c | 64 ++--
 2 files changed, 46 insertions(+), 23 deletions(-)

-- 
2.31.1




[PATCH v3 1/4] tpm: mark correct memory region range dirty when clearing RAM

2021-07-26 Thread David Hildenbrand
We might not start at the beginning of the memory region. Let's
calculate the offset into the memory region via the difference in the
host addresses.

Acked-by: Stefan Berger 
Fixes: ffab1be70692 ("tpm: clear RAM when "memory overwrite" requested")
Cc: Marc-André Lureau 
Cc: Paolo Bonzini 
Cc: "Michael S. Tsirkin" 
Cc: Eduardo Habkost 
Cc: Alex Williamson 
Cc: Dr. David Alan Gilbert 
Cc: Igor Mammedov 
Cc: Claudio Fontana 
Cc: Thomas Huth 
Cc: "Alex Bennée" 
Cc: Peter Xu 
Cc: Laurent Vivier 
Cc: Stefan Berger 
Signed-off-by: David Hildenbrand 
---
 hw/tpm/tpm_ppi.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/tpm/tpm_ppi.c b/hw/tpm/tpm_ppi.c
index 362edcc5c9..f243d9d0f6 100644
--- a/hw/tpm/tpm_ppi.c
+++ b/hw/tpm/tpm_ppi.c
@@ -30,11 +30,14 @@ void tpm_ppi_reset(TPMPPI *tpmppi)
 guest_phys_blocks_init(_phys_blocks);
 guest_phys_blocks_append(_phys_blocks);
 QTAILQ_FOREACH(block, _phys_blocks.head, next) {
+hwaddr mr_offs = (uint8_t *)memory_region_get_ram_ptr(block->mr) -
+ block->host_addr;
+
 trace_tpm_ppi_memset(block->host_addr,
  block->target_end - block->target_start);
 memset(block->host_addr, 0,
block->target_end - block->target_start);
-memory_region_set_dirty(block->mr, 0,
+memory_region_set_dirty(block->mr, mr_offs,
 block->target_end - block->target_start);
 }
 guest_phys_blocks_free(_phys_blocks);
-- 
2.31.1




Re: aarch64 efi boot failures with qemu 6.0+

2021-07-26 Thread Ard Biesheuvel
(cc Bjorn)

On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  wrote:
>
> On 7/26/21 12:56 AM, Guenter Roeck wrote:
> > On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
> >> On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
> >>> Hi all,
> >>>
> >>> starting with qemu v6.0, some of my aarch64 efi boot tests no longer
> >>> work. Analysis shows that PCI devices with IO ports do not instantiate
> >>> in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
> >>> (at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
> >>> affects
> >>> aarch64, not x86/x86_64.
> >>>
> >>> I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
> >>> keep firmware resource map"). Since this commit, PCI device BAR
> >>> allocation has changed. Taking tulip as example, the kernel reports
> >>> the following PCI bar assignments when running qemu v5.2.
> >>>
> >>> [3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
> >>> [3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> >>> [3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]

IIUC, these lines are read back from the BARs

> >>> [3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
> >>> [3.927455] pci :00:01.0: BAR 1: assigned [mem
> >>> 0x1000-0x107f]
> >>>

... and this is the assignment created by the kernel.

> >>> With qemu v6.0, the assignment is reported as follows.
> >>>
> >>> [3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
> >>> [3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> >>> [3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]
> >>>

The problem here is that Linux, for legacy reasons, does not support
I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
rejected.

This might make sense on x86, where legacy I/O ports may exist, but on
other architectures, this makes no sense.


> >>> and the controller does not instantiate. The problem disapears after
> >>> reverting commit 0cf8882fd0.
> >>>
> >>> Attached is a summary of test runs with various devices and qemu v5.2
> >>> as well as qemu v6.0, and the command line I use for efi boots.
> >>>
> >>> Did commit 0cf8882fd0 introduce a bug, do I now need need some different
> >>> command line to instantiate PCI devices with io ports, or are such
> >>> devices
> >>> simply no longer supported if the system is booted with efi support ?
> >>>
> >>> Thanks,
> >>> Guenter
> >>
> >>
> >> So that commit basically just says don't ignore what efi did.
> >>
> >> The issue's thus likely efi.
> >>
> >
> > I don't see the problem with efi boots on x86 and x86_64.
> > Any idea why that might be the case ?
> >
> > Thanks,
> > Guenter
> >
> >> Cc the maintainer. Philippe can you comment pls?
>
> I'll have a look. Cc'ing Ard for EDK2/Aarch64.
>

So a potential workaround would be to use a different I/O resource
window for ArmVirtPkg, that starts at 0x1000. But I would prefer to
fix Linux instead.


> >>
> >>> ---
> >>> Command line (tulip network interface):
> >>>
> >>> CMDLINE="root=/dev/vda console=ttyAMA0"
> >>> ROOTFS="rootfs.ext2"
> >>>
> >>> qemu-system-aarch64 -M virt -kernel arch/arm64/boot/Image -no-reboot \
> >>>  -m 512 -cpu cortex-a57 -no-reboot \
> >>>  -device tulip,netdev=net0 -netdev user,id=net0 \
> >>>  -bios QEMU_EFI-aarch64.fd \
> >>>  -snapshot \
> >>>  -device virtio-blk-device,drive=d0 \
> >>>  -drive file=${ROOTFS},if=none,id=d0,format=raw \
> >>>  -nographic -serial stdio -monitor none \
> >>>  --append "${CMDLINE}"
> >>>
> >>> ---
> >>> Boot tests with various devices known to work in qemu v5.2.
> >>>
> >>> v5.2v6.0v6.0
> >>> efinon-efiefi
> >>> e1000passpasspass
> >>> e1000-82544gcpasspasspass
> >>> e1000-82545empasspasspass
> >>> e1000epasspasspass
> >>> i82550passpasspass
> >>> i82557apasspasspass
> >>> i82557bpasspasspass
> >>> i82557cpasspasspass
> >>> i82558apasspasspass
> >>> i82559bpasspasspass
> >>> i82559cpasspasspass
> >>> i82559erpasspasspass
> >>> i82562passpasspass
> >>> i82801passpasspass
> >>> ne2k_pcipasspassfail<--
> >>> pcnetpasspasspass
> >>> rtl8139passpasspass
> >>> tulippasspassfail<--
> >>> usb-netpasspasspass
> >>> virtio-net-device
> >>> passpasspass
> >>> virtio-net-pcipasspasspass
> >>> virtio-net-pci-non-transitional
> >>> passpasspass
> >>>
> >>> usb-xhcipasspasspass
> >>> usb-ehcipasspasspass
> >>> usb-ohcipasspasspass
> >>> usb-uas-xhcipasspasspass
> >>> virtiopasspasspass
> >>> 

Re: [PATCH] raw-format: drop WRITE and RESIZE child perms when possible

2021-07-26 Thread Stefan Hajnoczi
On Mon, Jul 26, 2021 at 05:42:47PM +0200, Kevin Wolf wrote:
> Am 26.07.2021 um 16:41 hat Vladimir Sementsov-Ogievskiy geschrieben:
> > 26.07.2021 15:28, Stefan Hajnoczi wrote:
> > > The following command-line fails due to a permissions conflict:
> > > 
> > >$ qemu-storage-daemon \
> > >--blockdev 
> > > driver=nvme,node-name=nvme0,device=:08:00.0,namespace=1 \
> > >--blockdev 
> > > driver=raw,node-name=l1-1,file=nvme0,offset=0,size=1073741824 \
> > >--blockdev 
> > > driver=raw,node-name=l1-2,file=nvme0,offset=1073741824,size=1073741824 \
> > >--nbd-server 
> > > addr.type=unix,addr.path=/tmp/nbd.sock,max-connections=2 \
> > >--export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on 
> > > \
> > >--export type=nbd,id=nbd-l1-2,node-name=l1-2,name=l1-2,writable=on
> > > 
> > >qemu-storage-daemon: --export 
> > > type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on: Permission 
> > > conflict on node 'nvme0': permissions 'resize' are both required by node 
> > > 'l1-1' (uses node 'nvme0' as 'file' child) and unshared by node 'l1-2' 
> > > (uses node 'nvme0' as 'file' child).
> > > 
> > > The problem is that block/raw-format.c relies on bdrv_default_perms() to
> > > set permissions on the nvme node. The default permissions add RESIZE in
> > > anticipation of a format driver like qcow2 that needs to grow the image
> > > file. This fails because RESIZE is unshared, so we cannot get the RESIZE
> > > permission.
> > > 
> > > Max Reitz pointed out that block/crypto.c already handles this case by
> > > implementing a custom ->bdrv_child_perm() function that adjusts the
> > > result of bdrv_default_perms().
> > > 
> > > This patch takes the same approach in block/raw-format.c so that RESIZE
> > > is only required if it's actually necessary (e.g. the parent is qcow2).
> > > 
> > > Cc: Max Reitz 
> > > Cc: Kevin Wolf 
> > > Signed-off-by: Stefan Hajnoczi 
> > > ---
> > > This is not a bug fix, so I didn't mark it for QEMU 6.1. It's new
> > > behavior that hasn't been supported before. I want to split an NVMe
> > > drive using the raw format's offset=/size= feature.
> > > ---
> > >   block/raw-format.c | 21 -
> > >   1 file changed, 20 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/raw-format.c b/block/raw-format.c
> > > index 7717578ed6..c26f493688 100644
> > > --- a/block/raw-format.c
> > > +++ b/block/raw-format.c
> > > @@ -580,6 +580,25 @@ static void raw_cancel_in_flight(BlockDriverState 
> > > *bs)
> > >   bdrv_cancel_in_flight(bs->file->bs);
> > >   }
> > > +static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
> > > +   BdrvChildRole role,
> > > +   BlockReopenQueue *reopen_queue,
> > > +   uint64_t parent_perm, uint64_t parent_shared,
> > > +   uint64_t *nperm, uint64_t *nshared)
> > > +{
> > > +bdrv_default_perms(bs, c, role, reopen_queue, parent_perm,
> > > +   parent_shared, nperm, nshared);
> > > +
> > > +/*
> > > + * bdrv_default_perms() may add WRITE and/or RESIZE (see comment in
> > > + * bdrv_default_perms_for_storage() for an explanation) but we only 
> > > need
> > > + * them if they are in parent_perm. Drop WRITE and RESIZE whenever 
> > > possible
> > > + * to avoid permission conflicts.
> > > + */
> > > +*nperm &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
> > > +*nperm |= parent_perm & (BLK_PERM_WRITE | BLK_PERM_RESIZE);
> > > +}
> > > +
> > >   BlockDriver bdrv_raw = {
> > >   .format_name  = "raw",
> > >   .instance_size= sizeof(BDRVRawState),
> > > @@ -588,7 +607,7 @@ BlockDriver bdrv_raw = {
> > >   .bdrv_reopen_commit   = _reopen_commit,
> > >   .bdrv_reopen_abort= _reopen_abort,
> > >   .bdrv_open= _open,
> > > -.bdrv_child_perm  = bdrv_default_perms,
> > > +.bdrv_child_perm  = raw_child_perm,
> > >   .bdrv_co_create_opts  = _co_create_opts,
> > >   .bdrv_co_preadv   = _co_preadv,
> > >   .bdrv_co_pwritev  = _co_pwritev,
> > > 
> > 
> > I think it's OK:
> > 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > 
> > 
> > Still, did you consider an alternative of making
> > bdrv_filter_default_perm() function public and just do
> > ".bdrv_child_perm = bdrv_filter_default_perm," here?
> > 
> > raw_format is not considered to be filter, but for it's permissions I
> > think it works exactly like filter.
> 
> I had the same thought, but then commit 69dca43d6b6 explicitly made the
> opposite change. I seem to remember that Max never liked raw being
> treated like a filter much.

Additionally:

  static int raw_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
  {
  ...
  /*
   * Without offset and a size limit, this driver behaves very much
   * like a filter.  With any such limit, it does not.
 

[PATCH v2 3/3] qemu-ga/msi: fix w32 libgcc name

2021-07-26 Thread Gerd Hoffmann
This is what I find on my Fedora 34 mingw install.

Signed-off-by: Gerd Hoffmann 
---
 qga/installer/qemu-ga.wxs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/installer/qemu-ga.wxs b/qga/installer/qemu-ga.wxs
index 9cb4c3d73302..ce7b25b5e16f 100644
--- a/qga/installer/qemu-ga.wxs
+++ b/qga/installer/qemu-ga.wxs
@@ -31,7 +31,7 @@
   
 
   
-
+
 
   
 
-- 
2.31.1




[PATCH v2 2/3] ci: build & store guest agent msi

2021-07-26 Thread Gerd Hoffmann
Build guest agent windows msi install package in gitlab CI,
store the result as artifact.

Signed-off-by: Gerd Hoffmann 
---
 .gitlab-ci.d/crossbuild-template.yml   | 3 ++-
 .gitlab-ci.d/crossbuilds.yml   | 2 ++
 tests/docker/dockerfiles/fedora-win32-cross.docker | 1 +
 tests/docker/dockerfiles/fedora-win64-cross.docker | 1 +
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/.gitlab-ci.d/crossbuild-template.yml 
b/.gitlab-ci.d/crossbuild-template.yml
index 7d3ad00a1eb9..12f78e81de0d 100644
--- a/.gitlab-ci.d/crossbuild-template.yml
+++ b/.gitlab-ci.d/crossbuild-template.yml
@@ -12,9 +12,10 @@
   mips64-softmmu ppc-softmmu sh4-softmmu xtensa-softmmu"
 - make -j$(expr $(nproc) + 1) all check-build $MAKE_CHECK_ARGS
 - if grep -q "EXESUF=.exe" config-host.mak;
-  then make installer;
+  then make installer msi;
   version="$(git describe --match v[0-9]*)";
   mv -v qemu-setup*.exe qemu-setup-${version}.exe;
+  mv -v qga/*.msi $(basename qga/*.msi .msi)-${version}.msi;
   fi
 
 # Job to cross-build specific accelerators.
diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 4ff3aa3cfcdd..fc14a1cf5c10 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -163,6 +163,7 @@ cross-win32-system:
   artifacts:
 paths:
   - build/qemu-setup*.exe
+  - build/qemu-ga*.msi
 
 cross-win64-system:
   extends: .cross_system_build_job
@@ -173,6 +174,7 @@ cross-win64-system:
   artifacts:
 paths:
   - build/qemu-setup*.exe
+  - build/qemu-ga*.msi
 
 cross-amd64-xen-only:
   extends: .cross_accel_build_job
diff --git a/tests/docker/dockerfiles/fedora-win32-cross.docker 
b/tests/docker/dockerfiles/fedora-win32-cross.docker
index 5a03e1af43ac..f198927ef56b 100644
--- a/tests/docker/dockerfiles/fedora-win32-cross.docker
+++ b/tests/docker/dockerfiles/fedora-win32-cross.docker
@@ -28,6 +28,7 @@ ENV PACKAGES \
 mingw32-pixman \
 mingw32-pkg-config \
 mingw32-SDL2 \
+msitools \
 perl \
 perl-Test-Harness \
 python3 \
diff --git a/tests/docker/dockerfiles/fedora-win64-cross.docker 
b/tests/docker/dockerfiles/fedora-win64-cross.docker
index d3f13666e82e..fc0b3bf02eff 100644
--- a/tests/docker/dockerfiles/fedora-win64-cross.docker
+++ b/tests/docker/dockerfiles/fedora-win64-cross.docker
@@ -25,6 +25,7 @@ ENV PACKAGES \
 mingw64-libtasn1 \
 mingw64-pixman \
 mingw64-pkg-config \
+msitools \
 perl \
 perl-Test-Harness \
 python3 \
-- 
2.31.1




Re: [PATCH v2 07/22] target/loongarch: Add fixed point arithmetic instruction translation

2021-07-26 Thread Richard Henderson

On 7/26/21 1:56 AM, Song Gao wrote:

Hi, Richard.

On 07/23/2021 08:46 AM, Richard Henderson wrote:

On 7/20/21 11:53 PM, Song Gao wrote:

+/* Fixed point arithmetic operation instruction translation */
+static bool trans_add_w(DisasContext *ctx, arg_add_w *a)
+{
+    TCGv Rd = cpu_gpr[a->rd];
+    TCGv Rj = cpu_gpr[a->rj];
+    TCGv Rk = cpu_gpr[a->rk];
+
+    if (a->rd == 0) {
+    /* Nop */
+    return true;
+    }
+
+    if (a->rj != 0 && a->rk != 0) {
+    tcg_gen_add_tl(Rd, Rj, Rk);
+    tcg_gen_ext32s_tl(Rd, Rd);
+    } else if (a->rj == 0 && a->rk != 0) {
+    tcg_gen_mov_tl(Rd, Rk);
+    } else if (a->rj != 0 && a->rk == 0) {
+    tcg_gen_mov_tl(Rd, Rj);
+    } else {
+    tcg_gen_movi_tl(Rd, 0);
+    }
+
+    return true;
+}


Do not do all of this "if reg(n) zero" testing.

Use a common function to perform the gpr lookup, and a small callback function 
for the operation.  Often, the callback function already exists within 
include/tcg/tcg-op.h.

Please see my riscv cleanup patch set I referenced vs patch 6.


I am not sure  that 'riscv cleanup' patchs at:
   
    https://patchew.org/QEMU/20210709042608.883256-1-richard.hender...@linaro.org


It seems that  gpr_dst/gpr_src are common function to perform the gpr lookup. 
is that right?


More than that.  The gen_arith() function, for example, performs all of the bookkeeping 
for a binary operation.


For example,

static bool gen_arith(DisasContext *ctx, arg_fmt_rdrjrk *a,
  void (*func)(TCGv, TCGv, TCGv))
{
   TCGv dest = gpr_dst(ctx, a->rd);
   TCGv src1 = gpr_src(ctx, a->rj);
   TCGv src2 = gpr_src(ctx, a->rk);

func(dest, src1, src2);
return true;
}

#define TRANS(NAME, FUNC, ...) \
static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a) \
{ return FUNC(ctx, a, __VA_ARGS__); }

static void gen_add_w(TCGv dest, TCGv src1, TCGv src2)
{
tcg_gen_add_tl(dest, src1, src2);
tcg_gen_ext32s_tl(dest, dest);
}

TRANS(add_w, gen_arith, gen_add_w)
TRANS(add_d, gen_arith, tcg_gen_add_tl)


r~



[PATCH v2 1/3] nsis.py: create dlldir automatically

2021-07-26 Thread Gerd Hoffmann
Use objdump do figure which dlls are needed.  Copy them to a temporary
dlldir and pass that directory to makensis.

This patch removes the need to manually copy dlls to $srcdir/dll/w{32,64}
to get functional windows installers via "make installer".

Signed-off-by: Gerd Hoffmann 
---
 scripts/nsis.py | 27 +++
 meson.build |  1 +
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/scripts/nsis.py b/scripts/nsis.py
index 5135a0583167..5a3cc7c09628 100644
--- a/scripts/nsis.py
+++ b/scripts/nsis.py
@@ -7,6 +7,7 @@
 import argparse
 import glob
 import os
+import sys
 import shutil
 import subprocess
 import tempfile
@@ -19,16 +20,35 @@ def signcode(path):
 subprocess.run([cmd, path])
 
 
+def copydlls(binary, srcdir, dstdir):
+cmdline = [ "objdump", "-p", binary ]
+result = subprocess.run(cmdline, stdout = subprocess.PIPE,
+universal_newlines = True)
+if result.returncode != 0:
+sys.exit(result.returncode)
+for line in result.stdout.split('\n'):
+if line.find('DLL Name') != -1:
+dll = line.split()[2]
+src = os.path.join(srcdir, dll)
+dst = os.path.join(dstdir, dll)
+if os.path.isfile(src) and not os.path.isfile(dst):
+print("nsis.py: copy " + src)
+shutil.copyfile(src, dst)
+copydlls(src, srcdir, dstdir)
+
+
 def main():
 parser = argparse.ArgumentParser(description="QEMU NSIS build helper.")
 parser.add_argument("outfile")
 parser.add_argument("prefix")
 parser.add_argument("srcdir")
 parser.add_argument("cpu")
+parser.add_argument("dllsrc")
 parser.add_argument("nsisargs", nargs="*")
 args = parser.parse_args()
 
 destdir = tempfile.mkdtemp()
+dlldir = tempfile.mkdtemp()
 try:
 subprocess.run(["make", "install", "DESTDIR=" + destdir + os.path.sep])
 with open(
@@ -52,6 +72,7 @@ def main():
 
 for exe in glob.glob(os.path.join(destdir + args.prefix, "*.exe")):
 signcode(exe)
+copydlls(exe, args.dllsrc, dlldir)
 
 makensis = [
 "makensis",
@@ -59,19 +80,17 @@ def main():
 "-NOCD",
 "-DSRCDIR=" + args.srcdir,
 "-DBINDIR=" + destdir + args.prefix,
+"-DDLLDIR=" + dlldir,
 ]
-dlldir = "w32"
 if args.cpu == "x86_64":
-dlldir = "w64"
 makensis += ["-DW64"]
-if os.path.exists(os.path.join(args.srcdir, "dll")):
-makensis += ["-DDLLDIR={0}/dll/{1}".format(args.srcdir, dlldir)]
 
 makensis += ["-DOUTFILE=" + args.outfile] + args.nsisargs
 subprocess.run(makensis)
 signcode(args.outfile)
 finally:
 shutil.rmtree(destdir)
+shutil.rmtree(dlldir)
 
 
 if __name__ == "__main__":
diff --git a/meson.build b/meson.build
index f2e148eaf98e..4a2d54fbae3f 100644
--- a/meson.build
+++ b/meson.build
@@ -2790,6 +2790,7 @@ if host_machine.system() == 'windows'
 get_option('prefix'),
 meson.current_source_dir(),
 host_machine.cpu(),
+config_host['QEMU_GA_MSI_MINGW_DLL_PATH'],
 '--',
 '-DDISPLAYVERSION=' + meson.project_version(),
   ]
-- 
2.31.1




[PATCH v2 0/3] build windows installers in ci

2021-07-26 Thread Gerd Hoffmann
With 8619b5ddb56f ("ci: build & store windows installer") merged at
least patch 1/3 should go into 6.1 too so the installers created by
CI do actually work.

Patches 2+3 are for the guest agent installer.

Gerd Hoffmann (3):
  nsis.py: create dlldir automatically
  ci: build & store guest agent msi
  qemu-ga/msi: fix w32 libgcc name

 scripts/nsis.py   | 27 ---
 .gitlab-ci.d/crossbuild-template.yml  |  3 ++-
 .gitlab-ci.d/crossbuilds.yml  |  2 ++
 meson.build   |  1 +
 qga/installer/qemu-ga.wxs |  2 +-
 .../dockerfiles/fedora-win32-cross.docker |  1 +
 .../dockerfiles/fedora-win64-cross.docker |  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

-- 
2.31.1





Re: [PATCH] raw-format: drop WRITE and RESIZE child perms when possible

2021-07-26 Thread Kevin Wolf
Am 26.07.2021 um 16:41 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 26.07.2021 15:28, Stefan Hajnoczi wrote:
> > The following command-line fails due to a permissions conflict:
> > 
> >$ qemu-storage-daemon \
> >--blockdev 
> > driver=nvme,node-name=nvme0,device=:08:00.0,namespace=1 \
> >--blockdev 
> > driver=raw,node-name=l1-1,file=nvme0,offset=0,size=1073741824 \
> >--blockdev 
> > driver=raw,node-name=l1-2,file=nvme0,offset=1073741824,size=1073741824 \
> >--nbd-server 
> > addr.type=unix,addr.path=/tmp/nbd.sock,max-connections=2 \
> >--export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on \
> >--export type=nbd,id=nbd-l1-2,node-name=l1-2,name=l1-2,writable=on
> > 
> >qemu-storage-daemon: --export 
> > type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on: Permission 
> > conflict on node 'nvme0': permissions 'resize' are both required by node 
> > 'l1-1' (uses node 'nvme0' as 'file' child) and unshared by node 'l1-2' 
> > (uses node 'nvme0' as 'file' child).
> > 
> > The problem is that block/raw-format.c relies on bdrv_default_perms() to
> > set permissions on the nvme node. The default permissions add RESIZE in
> > anticipation of a format driver like qcow2 that needs to grow the image
> > file. This fails because RESIZE is unshared, so we cannot get the RESIZE
> > permission.
> > 
> > Max Reitz pointed out that block/crypto.c already handles this case by
> > implementing a custom ->bdrv_child_perm() function that adjusts the
> > result of bdrv_default_perms().
> > 
> > This patch takes the same approach in block/raw-format.c so that RESIZE
> > is only required if it's actually necessary (e.g. the parent is qcow2).
> > 
> > Cc: Max Reitz 
> > Cc: Kevin Wolf 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> > This is not a bug fix, so I didn't mark it for QEMU 6.1. It's new
> > behavior that hasn't been supported before. I want to split an NVMe
> > drive using the raw format's offset=/size= feature.
> > ---
> >   block/raw-format.c | 21 -
> >   1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/raw-format.c b/block/raw-format.c
> > index 7717578ed6..c26f493688 100644
> > --- a/block/raw-format.c
> > +++ b/block/raw-format.c
> > @@ -580,6 +580,25 @@ static void raw_cancel_in_flight(BlockDriverState *bs)
> >   bdrv_cancel_in_flight(bs->file->bs);
> >   }
> > +static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
> > +   BdrvChildRole role,
> > +   BlockReopenQueue *reopen_queue,
> > +   uint64_t parent_perm, uint64_t parent_shared,
> > +   uint64_t *nperm, uint64_t *nshared)
> > +{
> > +bdrv_default_perms(bs, c, role, reopen_queue, parent_perm,
> > +   parent_shared, nperm, nshared);
> > +
> > +/*
> > + * bdrv_default_perms() may add WRITE and/or RESIZE (see comment in
> > + * bdrv_default_perms_for_storage() for an explanation) but we only 
> > need
> > + * them if they are in parent_perm. Drop WRITE and RESIZE whenever 
> > possible
> > + * to avoid permission conflicts.
> > + */
> > +*nperm &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
> > +*nperm |= parent_perm & (BLK_PERM_WRITE | BLK_PERM_RESIZE);
> > +}
> > +
> >   BlockDriver bdrv_raw = {
> >   .format_name  = "raw",
> >   .instance_size= sizeof(BDRVRawState),
> > @@ -588,7 +607,7 @@ BlockDriver bdrv_raw = {
> >   .bdrv_reopen_commit   = _reopen_commit,
> >   .bdrv_reopen_abort= _reopen_abort,
> >   .bdrv_open= _open,
> > -.bdrv_child_perm  = bdrv_default_perms,
> > +.bdrv_child_perm  = raw_child_perm,
> >   .bdrv_co_create_opts  = _co_create_opts,
> >   .bdrv_co_preadv   = _co_preadv,
> >   .bdrv_co_pwritev  = _co_pwritev,
> > 
> 
> I think it's OK:
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 
> 
> Still, did you consider an alternative of making
> bdrv_filter_default_perm() function public and just do
> ".bdrv_child_perm = bdrv_filter_default_perm," here?
> 
> raw_format is not considered to be filter, but for it's permissions I
> think it works exactly like filter.

I had the same thought, but then commit 69dca43d6b6 explicitly made the
opposite change. I seem to remember that Max never liked raw being
treated like a filter much.

Kevin




Re: [PATCH v2 0/3] target/arm: Add sve-default-vector-length cpu property

2021-07-26 Thread Peter Maydell
On Mon, 26 Jul 2021 at 16:01, Andrew Jones  wrote:
>
> On Mon, Jul 26, 2021 at 01:42:45PM +0100, Peter Maydell wrote:
> > On Fri, 23 Jul 2021 at 21:34, Richard Henderson
> >  wrote:
> > >
> > > This is intended to resolve #482.
> > >
> > > Changes for v2:
> > >   * Split out length bounding fix to new patch.
> > >   * Use byte units for sve-default-vector-length.
> > >   * Support undocumented -1 "maximum".
> > >   * Add documentation.
> >
> > I'm going to apply this to target-arm.next with the following
> > docs tweak squashed into patch 3:
> >
> > diff --git a/docs/system/arm/cpu-features.rst 
> > b/docs/system/arm/cpu-features.rst
> > index 4ff36cc83f0..7b97df442aa 100644
> > --- a/docs/system/arm/cpu-features.rst
> > +++ b/docs/system/arm/cpu-features.rst
> > @@ -379,11 +379,14 @@ example's (1), (4), and (6) exhibit recommended
> > uses of the properties.
> >  SVE User-mode Default Vector Length Property
> >  
> >
> > -For qemu-aarch64, the cpu property `sve-default-vector-length=N` is
> > +For qemu-aarch64, the cpu property ``sve-default-vector-length=N`` is
> >  defined to mirror the Linux kernel parameter file
> > -`/proc/sys/abi/sve_default_vector_length`.  The default length, `N`,
> > -is in units of bytes and must be between 16 and 8192.
> > +``/proc/sys/abi/sve_default_vector_length``.  The default length, ``N``,
> > +is in units of bytes and must be between 16 and 8192.
> >  If not specified, the default vector length is 64.
> >
> >  If the default length is larger than the maximum vector length enabled
> > -with `sve` properties, the actual vector length will be reduced.
> > +with ``sve`` properties, the actual vector length will be reduced.
> > +
> > +If this property is set to ``-1`` then the default vector length
> > +is set to the maximum possible length.
>
> This file is full of single backtick usage. Isn't it better to stay
> consistent? Or do we need a patch that converts all the rest now?

I just sent one of those:
https://patchew.org/QEMU/20210726142338.31872-1-peter.mayd...@linaro.org/

-- PMM



Re: [PATCH for-6.1? v2 4/9] net/checksum: Remove unused variable in net_checksum_add_iov

2021-07-26 Thread Eric Blake
On Sun, Jul 25, 2021 at 02:24:11AM -1000, Richard Henderson wrote:
> From clang-13:
> ../qemu/net/checksum.c:189:23: error: variable 'buf_off' set but not used \
> [-Werror,-Wunused-but-set-variable]
> 
> Signed-off-by: Richard Henderson 
> ---
>  net/checksum.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH for-6.1? v2 3/9] util/selfmap: Discard mapping on error

2021-07-26 Thread Eric Blake
On Sun, Jul 25, 2021 at 02:24:10AM -1000, Richard Henderson wrote:
> From clang-13:
> util/selfmap.c:26:21: error: variable 'errors' set but not used \
> [-Werror,-Wunused-but-set-variable]
> 
> Quite right of course, but there's no reason not to check errors.
> 
> First, incrementing errors is incorrect, because qemu_strtoul
> returns an errno not a count -- just or them together so that
> we have a non-zero value at the end.
> 
> Second, if we have an error, do not add the struct to the list,
> but free it instead.
> 
> Cc: Alex Bennée 
> Signed-off-by: Richard Henderson 
> ---
>  util/selfmap.c | 29 +
>  1 file changed, 17 insertions(+), 12 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 06/22] target/loongarch: Add main translation routines

2021-07-26 Thread Richard Henderson

On 7/25/21 11:39 PM, Song Gao wrote:

+void gen_base_offset_addr(TCGv addr, int base, int offset)
+{
+    if (base == 0) {
+    tcg_gen_movi_tl(addr, offset);
+    } else if (offset == 0) {
+    gen_load_gpr(addr, base);
+    } else {
+    tcg_gen_movi_tl(addr, offset);
+    gen_op_addr_add(addr, cpu_gpr[base], addr);
+    }
+}


Using the interfaces I quote above from my riscv cleanup,
this can be tidied to

     tcg_gen_addi_tl(addr, gpr_src(base), offset);



'riscv cleanup' series at 
https://patchew.org/QEMU/20210709042608.883256-1-richard.hender...@linaro.org/ 
, Right?


Yes.


r~



Re: [PATCH v2 03/22] target/loongarch: Add core definition

2021-07-26 Thread Richard Henderson

On 7/25/21 10:47 PM, Song Gao wrote:

Hmm,  but where can we declared in ? such as ARM architecture declared in 
internals.h, is that OK?


Yes.

It is preferable that only things that are used outside of target/arch/ go into cpu.h, and 
that everything that is private to target/arch/ go into some other local header (with 
internals.h being a good catch-all).



r~



Re: [PATCH for-6.1? v2 2/7] mirror: Drop s->synced

2021-07-26 Thread Eric Blake
On Mon, Jul 26, 2021 at 04:46:08PM +0200, Max Reitz wrote:
> As of HEAD^, there is no meaning to s->synced other than whether the job
> is READY or not.  job_is_ready() gives us that information, too.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 
> ---
>  block/mirror.c | 19 +--
>  1 file changed, 9 insertions(+), 10 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v6 1/5] hw/nvme: split pmrmsc register into upper and lower

2021-07-26 Thread Keith Busch
On Wed, Jul 21, 2021 at 09:48:32AM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to
> make up the 64 bit logical PMRMSC register.
> 
> Make it so.

Looks good.

Reviewed-by: Keith Busch 



Re: [PATCH resend v2 5/5] softmmu/memory_mapping: optimize for RamDiscardManager sections

2021-07-26 Thread Peter Xu
On Tue, Jul 20, 2021 at 03:03:04PM +0200, David Hildenbrand wrote:
> virtio-mem logically plugs/unplugs memory within a sparse memory region
> and notifies via the RamDiscardManager interface when parts become
> plugged (populated) or unplugged (discarded).
> 
> Currently, we end up (via the two users)
> 1) zeroing all logically unplugged/discarded memory during TPM resets.
> 2) reading all logically unplugged/discarded memory when dumping, to
>figure out the content is zero.
> 
> 1) is always bad, because we assume unplugged memory stays discarded
>(and is already implicitly zero).
> 2) isn't that bad with anonymous memory, we end up reading the zero
>page (slow and unnecessary, though). However, once we use some
>file-backed memory (future use case), even reading will populate memory.
> 
> Let's cut out all parts marked as not-populated (discarded) via the
> RamDiscardManager. As virtio-mem is the single user, this now means that
> logically unplugged memory ranges will no longer be included in the
> dump, which results in smaller dump files and faster dumping.
> 
> virtio-mem has a minimum granularity of 1 MiB (and the default is usually
> 2 MiB). Theoretically, we can see quite some fragmentation, in practice
> we won't have it completely fragmented in 1 MiB pieces. Still, we might
> end up with many physical ranges.
> 
> Both, the ELF format and kdump seem to be ready to support many
> individual ranges (e.g., for ELF it seems to be UINT32_MAX, kdump has a
> linear bitmap).
> 
> Cc: Marc-André Lureau 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Eduardo Habkost 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Claudio Fontana 
> Cc: Thomas Huth 
> Cc: "Alex Bennée" 
> Cc: Peter Xu 
> Cc: Laurent Vivier 
> Cc: Stefan Berger 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH v6 1/5] hw/nvme: split pmrmsc register into upper and lower

2021-07-26 Thread Klaus Jensen
Keith, Appala, any chance one of you could review this? This really
needs to get to an -rc sooner rather than later :)

On Jul 21 09:48, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> The specification uses a set of 32 bit PMRMSCL and PMRMSCU registers to
> make up the 64 bit logical PMRMSC register.
> 
> Make it so.
> 
> Signed-off-by: Klaus Jensen 
> ---
>  include/block/nvme.h | 31 ---
>  hw/nvme/ctrl.c   | 10 ++
>  2 files changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/include/block/nvme.h b/include/block/nvme.h
> index 527105fafc0b..84053b68b987 100644
> --- a/include/block/nvme.h
> +++ b/include/block/nvme.h
> @@ -26,7 +26,8 @@ typedef struct QEMU_PACKED NvmeBar {
>  uint32_tpmrsts;
>  uint32_tpmrebs;
>  uint32_tpmrswtp;
> -uint64_tpmrmsc;
> +uint32_tpmrmscl;
> +uint32_tpmrmscu;
>  uint8_t css[484];
>  } NvmeBar;
>  
> @@ -475,25 +476,25 @@ enum NvmePmrswtpMask {
>  #define NVME_PMRSWTP_SET_PMRSWTV(pmrswtp, val)   \
>  (pmrswtp |= (uint64_t)(val & PMRSWTP_PMRSWTV_MASK) << 
> PMRSWTP_PMRSWTV_SHIFT)
>  
> -enum NvmePmrmscShift {
> -PMRMSC_CMSE_SHIFT   = 1,
> -PMRMSC_CBA_SHIFT= 12,
> +enum NvmePmrmsclShift {
> +PMRMSCL_CMSE_SHIFT   = 1,
> +PMRMSCL_CBA_SHIFT= 12,
>  };
>  
> -enum NvmePmrmscMask {
> -PMRMSC_CMSE_MASK   = 0x1,
> -PMRMSC_CBA_MASK= 0xf,
> +enum NvmePmrmsclMask {
> +PMRMSCL_CMSE_MASK   = 0x1,
> +PMRMSCL_CBA_MASK= 0xf,
>  };
>  
> -#define NVME_PMRMSC_CMSE(pmrmsc)\
> -((pmrmsc >> PMRMSC_CMSE_SHIFT)   & PMRMSC_CMSE_MASK)
> -#define NVME_PMRMSC_CBA(pmrmsc) \
> -((pmrmsc >> PMRMSC_CBA_SHIFT)   & PMRMSC_CBA_MASK)
> +#define NVME_PMRMSCL_CMSE(pmrmscl)\
> +((pmrmscl >> PMRMSCL_CMSE_SHIFT)   & PMRMSCL_CMSE_MASK)
> +#define NVME_PMRMSCL_CBA(pmrmscl) \
> +((pmrmscl >> PMRMSCL_CBA_SHIFT)   & PMRMSCL_CBA_MASK)
>  
> -#define NVME_PMRMSC_SET_CMSE(pmrmsc, val)   \
> -(pmrmsc |= (uint64_t)(val & PMRMSC_CMSE_MASK) << PMRMSC_CMSE_SHIFT)
> -#define NVME_PMRMSC_SET_CBA(pmrmsc, val)   \
> -(pmrmsc |= (uint64_t)(val & PMRMSC_CBA_MASK) << PMRMSC_CBA_SHIFT)
> +#define NVME_PMRMSCL_SET_CMSE(pmrmscl, val)   \
> +(pmrmscl |= (uint32_t)(val & PMRMSCL_CMSE_MASK) << PMRMSCL_CMSE_SHIFT)
> +#define NVME_PMRMSCL_SET_CBA(pmrmscl, val)   \
> +(pmrmscl |= (uint32_t)(val & PMRMSCL_CBA_MASK) << PMRMSCL_CBA_SHIFT)
>  
>  enum NvmeSglDescriptorType {
>  NVME_SGL_DESCR_TYPE_DATA_BLOCK  = 0x0,
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 2f0524e12a36..070d9f6a962d 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -5916,11 +5916,13 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr 
> offset, uint64_t data,
>  return;
>  }
>  
> -n->bar.pmrmsc = (n->bar.pmrmsc & ~0x) | (data & 0x);
> +n->bar.pmrmscl = data;
>  n->pmr.cmse = false;
>  
> -if (NVME_PMRMSC_CMSE(n->bar.pmrmsc)) {
> -hwaddr cba = NVME_PMRMSC_CBA(n->bar.pmrmsc) << PMRMSC_CBA_SHIFT;
> +if (NVME_PMRMSCL_CMSE(n->bar.pmrmscl)) {
> +uint64_t pmrmscu = n->bar.pmrmscu;
> +hwaddr cba = (pmrmscu << 32) |
> +(NVME_PMRMSCL_CBA(n->bar.pmrmscl) << PMRMSCL_CBA_SHIFT);
>  if (cba + int128_get64(n->pmr.dev->mr.size) < cba) {
>  NVME_PMRSTS_SET_CBAI(n->bar.pmrsts, 1);
>  return;
> @@ -5936,7 +5938,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
> uint64_t data,
>  return;
>  }
>  
> -n->bar.pmrmsc = (n->bar.pmrmsc & 0x) | (data << 32);
> +n->bar.pmrmscu = data;
>  return;
>  default:
>  NVME_GUEST_ERR(pci_nvme_ub_mmiowr_invalid,
> -- 
> 2.32.0
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.


signature.asc
Description: PGP signature


Re: [PATCH RFC server 06/11] vfio-user: handle PCI config space accesses

2021-07-26 Thread John Levon
On Mon, Jul 19, 2021 at 04:00:08PM -0400, Jagannathan Raman wrote:

> Define and register handlers for PCI config space accesses
> 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
>  hw/remote/vfio-user-obj.c | 41 +
>  hw/remote/trace-events|  2 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 6a2d0f5..60d9fa8 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -36,6 +36,7 @@
>  #include "sysemu/runstate.h"
>  #include "qemu/notify.h"
>  #include "qemu/thread.h"
> +#include "qemu/main-loop.h"
>  #include "qapi/error.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/qdev-core.h"
> @@ -131,6 +132,35 @@ static void *vfu_object_ctx_run(void *opaque)
>  return NULL;
>  }
>  
> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> + size_t count, loff_t offset,
> + const bool is_write)
> +{
> +VfuObject *o = vfu_get_private(vfu_ctx);
> +uint32_t val = 0;
> +int i;
> +
> +qemu_mutex_lock_iothread();
> +
> +for (i = 0; i < count; i++) {
> +if (is_write) {
> +val = *((uint8_t *)(buf + i));
> +trace_vfu_cfg_write((offset + i), val);
> +pci_default_write_config(PCI_DEVICE(o->pci_dev),
> + (offset + i), val, 1);
> +} else {
> +val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
> +  (offset + i), 1);
> +*((uint8_t *)(buf + i)) = (uint8_t)val;
> +trace_vfu_cfg_read((offset + i), val);
> +}
> +}

Is it always OK to split up the access into single bytes like this?

regards
john



[PATCH] hw/arm/nseries: Display hexadecimal value with '0x' prefix

2021-07-26 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/arm/nseries.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
index 906c915df78..af3164c5519 100644
--- a/hw/arm/nseries.c
+++ b/hw/arm/nseries.c
@@ -692,7 +692,7 @@ static uint32_t mipid_txrx(void *opaque, uint32_t cmd, int 
len)
 default:
 bad_cmd:
 qemu_log_mask(LOG_GUEST_ERROR,
-  "%s: unknown command %02x\n", __func__, s->cmd);
+  "%s: unknown command 0x%02x\n", __func__, s->cmd);
 break;
 }
 
-- 
2.31.1




[PATCH-for-6.1 v4 4/4] gitlab-ci: Fix 'when:' condition in OpenSBI jobs

2021-07-26 Thread Philippe Mathieu-Daudé
Jobs depending on another should not use the 'when: always'
condition, because if a dependency failed we should not keep
running jobs depending on it. The correct condition is
'when: on_success'.

Fixes: c6fc0fc1a71 ("gitlab-ci.yml: Add jobs to build OpenSBI firmware 
binaries")
Reported-by: Daniel P. Berrangé 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
---
 .gitlab-ci.d/opensbi.yml | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.gitlab-ci.d/opensbi.yml b/.gitlab-ci.d/opensbi.yml
index d8a0456679e..5e0a2477c5d 100644
--- a/.gitlab-ci.d/opensbi.yml
+++ b/.gitlab-ci.d/opensbi.yml
@@ -6,14 +6,14 @@
- .gitlab-ci.d/opensbi.yml
# or the Dockerfile is modified
- .gitlab-ci.d/opensbi/Dockerfile
-   when: always
+   when: on_success
  - changes: # or roms/opensbi/ is modified (submodule updated)
- roms/opensbi/*
-   when: always
+   when: on_success
  - if: '$CI_COMMIT_REF_NAME =~ /^opensbi/' # or the branch/tag starts with 
'opensbi'
-   when: always
+   when: on_success
  - if: '$CI_COMMIT_MESSAGE =~ /opensbi/i' # or last commit description 
contains 'OpenSBI'
-   when: always
+   when: on_success
 
 docker-opensbi:
  extends: .opensbi_job_rules
-- 
2.31.1




  1   2   3   >