Re: [RFC 00/10] hw/mos6522: VIA timer emulation fixes and improvements

2021-08-27 Thread Finn Thain
On Wed, 25 Aug 2021, Mark Cave-Ayland wrote:

> On 24/08/2021 11:09, Finn Thain wrote:
> 
> > This is a patch series that I started last year. The aim was to try to
> > get a monotonic clocksource for Linux/m68k guests. That aim hasn't been
> > achieved yet (for q800 machines) but I'm submitting the patch series as
> > an RFC because,
> > 
> >   - It does improve 6522 emulation fidelity.
> > 
> >   - It allows Linux/m68k to make use of the additional timer that the
> > hardware indeed offers but which QEMU omits. This has several
> > benefits for Linux guests [1].
> > 
> >   - I see that Mark has been working on timer emulation issues in his
> > github repo [2] and it seems likely that MacOS, NetBSD or A/UX guests
> > will also require better 6522 emulation.
> > 
> > To make collaboration easier these patches can also be fetched from
> > github [3].
> > 
> > On a real Quadra, accesses to the SY6522 chips are slow because they are
> > synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow
> > only because of the division operation in the timer count calculation.
> > 
> > This patch series improves the fidelity of the emulated chip, but the
> > price is more division ops. I haven't tried to measure this yet.
> > 
> > The emulated 6522 still deviates from the behaviour of the real thing,
> > however. For example, two consecutive accesses to a real 6522 timer
> > counter can never yield the same value. This is not true of the 6522 in
> > QEMU 6 wherein two consecutive accesses to a timer count register have
> > been observed to yield the same value.
> > 
> > Linux is not particularly robust in the face of a 6522 that deviates
> > from the usual behaviour. The problem presently affecting a Linux guest
> > is that its 'via' clocksource is prone to monotonicity failure. That is,
> > the clocksource counter can jump backwards. This can be observed by
> > patching Linux like so:
> > 
> > diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> > --- a/arch/m68k/mac/via.c
> > +++ b/arch/m68k/mac/via.c
> > @@ -606,6 +606,8 @@ void __init via_init_clock(void)
> > clocksource_register_hz(_clk, VIA_CLOCK_FREQ);
> >   }
> >   +static u32 prev_ticks;
> > +
> >   static u64 mac_read_clk(struct clocksource *cs)
> >   {
> > unsigned long flags;
> > @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
> > count = count_high << 8;
> > ticks = VIA_TIMER_CYCLES - count;
> > ticks += clk_offset + clk_total;
> > +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks,
> > prev_ticks);
> > +prev_ticks = ticks;
> > local_irq_restore(flags);
> > return ticks;
> > 
> > This problem can be partly blamed on a 6522 design limitation, which is
> > that the timer counter has no overflow register. Hence, if a timer
> > counter wraps around and the kernel is late to handle the subsequent
> > interrupt, the kernel can't account for any missed ticks.
> > 
> > On a real Quadra, the kernel mitigates this limitation by minimizing
> > interrupt latency. But on QEMU, interrupt latency is unbounded. This
> > can't be mitigated by the guest kernel at all and leads to clock drift.
> > This can be observed by patching QEMU like so:
> > 
> > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > --- a/hw/misc/mos6522.c
> > +++ b/hw/misc/mos6522.c
> > @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t
> > val, unsigned size)
> >   s->pcr = val;
> >   break;
> >   case VIA_REG_IFR:
> > +if (val & T1_INT) {
> > +static int64_t last_t1_int_cleared;
> > +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > +if (now - last_t1_int_cleared > 2000) printf("\t%s: t1 int
> > clear is late\n", __func__);
> > +last_t1_int_cleared = now;
> > +}
> >   /* reset bits */
> >   s->ifr &= ~val;
> >   mos6522_update_irq(s);
> > 
> > This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100,
> > the emulator will theoretically see each timer 1 interrupt cleared
> > within 20 ms of the previous one. But that deadline is often missed on
> > my QEMU host [4].
> > 
> > On real Mac hardware you could observe the same scenario if a high
> > priority interrupt were to sufficiently delay the timer interrupt
> > handler. (This is the reason why the VIA1 interrupt priority gets
> > increased from level 1 to level 5 when running on Quadras.)
> > 
> > Anyway, for now, the clocksource monotonicity problem in Linux/mac68k
> > guests is still unresolved. Nonetheless, I think this patch series does
> > improve the situation.
> > 
> > [1] I've also been working on some improvements to Linux/m68k based on
> > Arnd Bergman's clockevent RFC patch,
> > https://lore.kernel.org/linux-m68k/20201008154651.1901126-14-a...@arndb.de/
> > The idea is to add a oneshot clockevent device by making use of the
> > second VIA1 timer. This approach should help mitigate 

Re: [PATCH v2 8/8] tests/tcg: Add arm and aarch64 pc alignment tests

2021-08-27 Thread Richard Henderson

On 8/26/21 6:54 AM, Peter Maydell wrote:

On Sat, 21 Aug 2021 at 21:09, Richard Henderson
 wrote:


+/*
+ * From v8, it is CONSTRAINED UNPREDICTABLE whether BXWritePC aligns
+ * the address or not.  If so, we can legitimately fall through.
+ */
+return EXIT_SUCCESS;
+}


Can we get the test harness to run the test on a cortex-a9 guest CPU
so we can avoid the UNPREDICTABLE and can always check the signal
happened ? The test is a bit weak if it doesn't actually test that
we take an exception...


Well, it'll always raise an exception under qemu.

I wrote it this way so that it would also always succeed when run on real hardware.  Even 
then, I have no specific knowledge of a cpu that does not raise an exception, but I only 
tested cortex-a57.



r~



Re: [RFC 00/10] hw/mos6522: VIA timer emulation fixes and improvements

2021-08-27 Thread Finn Thain
On Tue, 24 Aug 2021, Philippe Mathieu-Daudé wrote:

> On 8/24/21 12:09 PM, Finn Thain wrote:
> 
> > On a real Quadra, accesses to the SY6522 chips are slow because they are 
> > synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow 
> > only because of the division operation in the timer count calculation.
> > 
> > This patch series improves the fidelity of the emulated chip, but the 
> > price is more division ops. I haven't tried to measure this yet.
> > 
> > The emulated 6522 still deviates from the behaviour of the real thing, 
> > however. For example, two consecutive accesses to a real 6522 timer 
> > counter can never yield the same value. This is not true of the 6522 in 
> > QEMU 6 wherein two consecutive accesses to a timer count register have 
> > been observed to yield the same value.
> > 
> > Linux is not particularly robust in the face of a 6522 that deviates 
> > from the usual behaviour. The problem presently affecting a Linux guest 
> > is that its 'via' clocksource is prone to monotonicity failure. That is, 
> > the clocksource counter can jump backwards. This can be observed by 
> > patching Linux like so:
> > 
> > diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
> > --- a/arch/m68k/mac/via.c
> > +++ b/arch/m68k/mac/via.c
> > @@ -606,6 +606,8 @@ void __init via_init_clock(void)
> > clocksource_register_hz(_clk, VIA_CLOCK_FREQ);
> >  }
> >  
> > +static u32 prev_ticks;
> > +
> >  static u64 mac_read_clk(struct clocksource *cs)
> >  {
> > unsigned long flags;
> > @@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
> > count = count_high << 8;
> > ticks = VIA_TIMER_CYCLES - count;
> > ticks += clk_offset + clk_total;
> > +if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, 
> > prev_ticks);
> > +prev_ticks = ticks;
> > local_irq_restore(flags);
> >  
> > return ticks;
> > 
> > This problem can be partly blamed on a 6522 design limitation, which is 
> > that the timer counter has no overflow register. Hence, if a timer 
> > counter wraps around and the kernel is late to handle the subsequent 
> > interrupt, the kernel can't account for any missed ticks.
> > 
> > On a real Quadra, the kernel mitigates this limitation by minimizing 
> > interrupt latency. But on QEMU, interrupt latency is unbounded. This 
> > can't be mitigated by the guest kernel at all and leads to clock drift. 
> > This can be observed by patching QEMU like so:
> > 
> > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > --- a/hw/misc/mos6522.c
> > +++ b/hw/misc/mos6522.c
> > @@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t 
> > val, unsigned size)
> >  s->pcr = val;
> >  break;
> >  case VIA_REG_IFR:
> > +if (val & T1_INT) {
> > +static int64_t last_t1_int_cleared;
> > +int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > +if (now - last_t1_int_cleared > 2000) printf("\t%s: t1 int 
> > clear is late\n", __func__);
> > +last_t1_int_cleared = now;
> > +}
> >  /* reset bits */
> >  s->ifr &= ~val;
> >  mos6522_update_irq(s);
> > 
> > This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100, 
> > the emulator will theoretically see each timer 1 interrupt cleared 
> > within 20 ms of the previous one. But that deadline is often missed on 
> > my QEMU host [4].
> 
> I wonder if using QEMU ptimer wouldn't help here, instead of
> calling qemu_clock_get_ns() and doing the math by hand:
> 
> /* Starting to run with/setting counter to "0" won't trigger immediately,
>  * but after a one period for both oneshot and periodic modes.  */
> #define PTIMER_POLICY_NO_IMMEDIATE_TRIGGER  (1 << 2)
> 
> /* Starting to run with/setting counter to "0" won't re-load counter
>  * immediately, but after a one period.  */
> #define PTIMER_POLICY_NO_IMMEDIATE_RELOAD   (1 << 3)
> 
> /* Make counter value of the running timer represent the actual value and
>  * not the one less.  */
> #define PTIMER_POLICY_NO_COUNTER_ROUND_DOWN (1 << 4)
> 

It's often the case that a conversion to a new API is trivial for someone 
who understands that API. But I still haven't got my head around the 
implementation (hw/core/ptimer.c).

So I agree the ptimer API could simplify mos6522.c but adopting it is not 
trivial for me.

QEMU's 6522 device does not attempt to model the relationship between the 
"phase two" clock and timer counters and timer interrupts. I haven't 
attempted to fix these deviations at all, as that's not trivial either.

But using the ptimer API could potentially make it easier to address those 
fidelity issues.

Re: [RFC 01/10] hw/mos6522: Remove get_load_time() methods and functions

2021-08-27 Thread Finn Thain
On Wed, 25 Aug 2021, Mark Cave-Ayland wrote:

> On 24/08/2021 11:09, Finn Thain wrote:
> 
> > This code appears to be unnecessary.
> > 
> > Signed-off-by: Finn Thain 
> > ---
> >   hw/misc/mos6522.c | 22 +-
> >   1 file changed, 1 insertion(+), 21 deletions(-)
> > 
> > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > index 1c57332b40..a478c1ca43 100644
> > --- a/hw/misc/mos6522.c
> > +++ b/hw/misc/mos6522.c
> > @@ -63,17 +63,6 @@ static uint64_t get_counter_value(MOS6522State *s,
> > MOS6522Timer *ti)
> >   }
> >   }
> >   -static uint64_t get_load_time(MOS6522State *s, MOS6522Timer *ti)
> > -{
> > -MOS6522DeviceClass *mdc = MOS6522_GET_CLASS(s);
> > -
> > -if (ti->index == 0) {
> > -return mdc->get_timer1_load_time(s, ti);
> > -} else {
> > -return mdc->get_timer2_load_time(s, ti);
> > -}
> > -}
> > -
> >   static unsigned int get_counter(MOS6522State *s, MOS6522Timer *ti)
> >   {
> >   int64_t d;
> > @@ -98,7 +87,7 @@ static unsigned int get_counter(MOS6522State *s,
> > MOS6522Timer *ti)
> >   static void set_counter(MOS6522State *s, MOS6522Timer *ti, unsigned int
> > val)
> >   {
> >   trace_mos6522_set_counter(1 + ti->index, val);
> > -ti->load_time = get_load_time(s, ti);
> > +ti->load_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> >   ti->counter_value = val;
> >   if (ti->index == 0) {
> >   mos6522_timer1_update(s, ti, ti->load_time);
> > @@ -208,13 +197,6 @@ static uint64_t mos6522_get_counter_value(MOS6522State
> > *s, MOS6522Timer *ti)
> >   ti->frequency, NANOSECONDS_PER_SECOND);
> >   }
> >   -static uint64_t mos6522_get_load_time(MOS6522State *s, MOS6522Timer *ti)
> > -{
> > -uint64_t load_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> > -
> > -return load_time;
> > -}
> > -
> >   static void mos6522_portA_write(MOS6522State *s)
> >   {
> >   qemu_log_mask(LOG_UNIMP, "portA_write unimplemented\n");
> > @@ -518,8 +500,6 @@ static void mos6522_class_init(ObjectClass *oc, void
> > *data)
> >   mdc->update_irq = mos6522_update_irq;
> >   mdc->get_timer1_counter_value = mos6522_get_counter_value;
> >   mdc->get_timer2_counter_value = mos6522_get_counter_value;
> > -mdc->get_timer1_load_time = mos6522_get_load_time;
> > -mdc->get_timer2_load_time = mos6522_get_load_time;
> >   }
> > static const TypeInfo mos6522_type_info = {
> 
> Both the get_counter_value() and get_load_time() callbacks are used as part of
> the CUDA emulation in hw/misc/macio/cuda.c as per the comment:
> 
> /* MacOS uses timer 1 for calibration on startup, so we use
>  * the timebase frequency and cuda_get_counter_value() with
>  * cuda_get_load_time() to steer MacOS to calculate calibrate its timers
>  * correctly for both TCG and KVM (see commit b981289c49 "PPC: Cuda: Use cuda
>  * timer to expose tbfreq to guest" for more information) */
> 
> Certainly for the 6522 device it is worth configuring with
> --target-list="ppc-softmmu m68k-softmmu" to make sure that you don't
> inadvertently break anything in the PPC world.
> 

No build failure here. Maybe your tree is different?

> A bit of history here: the original mos6522.c was extracted from
> hw/misc/macio/cuda.c when Laurent presented his initial q800 patches since
> they also had their own implementation of the 6522, and it was better to move
> the implementation into a separate QEMU device so that the logic could be
> shared.
> 
> The Darwin kernel timer calibration loop is quite hard to get right: see
> https://opensource.apple.com/source/xnu/xnu-123.5/pexpert/ppc/pe_clock_speed_asm.s.auto.html
> and
> https://opensource.apple.com/source/xnu/xnu-123.5/pexpert/ppc/pe_clock_speed.c.auto.html.
> Ben/Alex came up with the current mechanism to fool the calibration routine,
> and I simply added in those callbacks to allow it to be implemented as part of
> the now-generic 6522 device.
> 

I didn't find any references to these methods (get_timer1_counter_value, 
get_timer2_counter_value, get_timer1_load_time and get_timer2_load_time).
It appears to be dead code, and it adds complexity and harms readability. 
The Linux kernel project has a policy that no code is added if it lacks 
any in-tree usage. Does QEMU have the same policy?



Re: [RFC 07/10] hw/mos6522: Fix initial timer counter reload

2021-08-27 Thread Finn Thain
On Wed, 25 Aug 2021, Mark Cave-Ayland wrote:

> On 24/08/2021 11:09, Finn Thain wrote:
> 
> > The first reload of timer 1 is early by half of a clock cycle as it gets
> > measured from a falling edge. By contrast, the succeeding reloads are
> > measured from rising edge to rising edge.
> > 
> > Neglecting that complication, the behaviour of the counter should be the
> > same from one reload to the next. The sequence is always:
> > 
> > N, N-1, N-2, ... 2, 1, 0, -1, N, N-1, N-2, ...
> > 
> > But at the first reload, the present driver does this instead:
> > 
> > N, N-1, N-2, ... 2, 1, 0, -1, N-1, N-2, ...
> > 
> > Fix this deviation for both timer 1 and timer 2, and allow for the fact
> > that on a real 6522 the timer 2 counter is not reloaded when it wraps.
> > 
> > Signed-off-by: Finn Thain 
> > ---
> >   hw/misc/mos6522.c | 19 +++
> >   1 file changed, 11 insertions(+), 8 deletions(-)
> > 
> > diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> > index 5b1657ac0d..0a241fe9f8 100644
> > --- a/hw/misc/mos6522.c
> > +++ b/hw/misc/mos6522.c
> > @@ -63,15 +63,16 @@ static unsigned int get_counter(MOS6522State *s,
> > MOS6522Timer *ti)
> >   if (ti->index == 0) {
> >   /* the timer goes down from latch to -1 (period of latch + 2) */
> >   if (d <= (ti->counter_value + 1)) {
> > -counter = (ti->counter_value - d) & 0x;
> > +counter = ti->counter_value - d;
> >   } else {
> > -counter = (d - (ti->counter_value + 1)) % (ti->latch + 2);
> > -counter = (ti->latch - counter) & 0x;
> > +int64_t d_post_reload = d - (ti->counter_value + 2);
> > +/* XXX this calculation assumes that ti->latch has not changed
> > */
> > +counter = ti->latch - (d_post_reload % (ti->latch + 2));
> >   }
> >   } else {
> > -counter = (ti->counter_value - d) & 0x;
> > +counter = ti->counter_value - d;
> >   }
> > -return counter;
> > +return counter & 0x;
> >   }
> > static void set_counter(MOS6522State *s, MOS6522Timer *ti, unsigned int
> > val)
> > @@ -103,11 +104,13 @@ static int64_t get_next_irq_time(MOS6522State *s,
> > MOS6522Timer *ti,
> > /* the timer goes down from latch to -1 (period of latch + 2) */
> >   if (d <= (ti->counter_value + 1)) {
> > -counter = (ti->counter_value - d) & 0x;
> > +counter = ti->counter_value - d;
> >   } else {
> > -counter = (d - (ti->counter_value + 1)) % (ti->latch + 2);
> > -counter = (ti->latch - counter) & 0x;
> > +int64_t d_post_reload = d - (ti->counter_value + 2);
> > +/* XXX this calculation assumes that ti->latch has not changed */
> > +counter = ti->latch - (d_post_reload % (ti->latch + 2));
> >   }
> > +counter &= 0x;
> > /* Note: we consider the irq is raised on 0 */
> >   if (counter == 0x) {
> 
> I think the code looks right, but I couldn't see an explicit reference to this
> behaviour in
> http://archive.6502.org/datasheets/mos_6522_preliminary_nov_1977.pdf.

Yes, that datasheet is missing a lot of information.

> Presumably this matches what you've observed on real hardware?
> 

Yes.



Re: [PATCH 1/5] hw/arm/aspeed: Add get_irq to AspeedSoCClass

2021-08-27 Thread Peter Delevoryas
Just noticed, I forgot to initialize get_irq for the AST2500. I didn’t notice 
it because I hadn’t tested booting an image for -machine ast2500-evb. I’ll make 
sure to test with images for all 3 chip revisions, then I’ll resubmit with 
PATCH v2. I’ll wait a little while though, for comments on the rest of the 
series.

From: p...@fb.com 
Date: Friday, August 27, 2021 at 2:06 PM
To:
Cc: c...@kaod.org , j...@jms.id.au , 
qemu-devel@nongnu.org , qemu-...@nongnu.org 
, Peter Delevoryas 
Subject: [PATCH 1/5] hw/arm/aspeed: Add get_irq to AspeedSoCClass
From: Peter Delevoryas 

The AST2500 uses different logic than the AST2600 for getting IRQ's.
This adds a virtual `get_irq` function to the Aspeed SOC class, so that
the shared initialization code in `hw/arm/aspeed.c` can retrieve IRQ's.

Signed-off-by: Peter Delevoryas 
---
 hw/arm/aspeed_ast2600.c | 1 +
 hw/arm/aspeed_soc.c | 1 +
 include/hw/arm/aspeed_soc.h | 1 +
 3 files changed, 3 insertions(+)

diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index e3013128c6..56e1adb343 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -527,6 +527,7 @@ static void aspeed_soc_ast2600_class_init(ObjectClass *oc, 
void *data)
 sc->irqmap   = aspeed_soc_ast2600_irqmap;
 sc->memmap   = aspeed_soc_ast2600_memmap;
 sc->num_cpus = 2;
+sc->get_irq  = aspeed_soc_get_irq;
 }

 static const TypeInfo aspeed_soc_ast2600_type_info = {
diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c
index 3ad6c56fa9..c373182299 100644
--- a/hw/arm/aspeed_soc.c
+++ b/hw/arm/aspeed_soc.c
@@ -476,6 +476,7 @@ static void aspeed_soc_ast2400_class_init(ObjectClass *oc, 
void *data)
 sc->irqmap   = aspeed_soc_ast2400_irqmap;
 sc->memmap   = aspeed_soc_ast2400_memmap;
 sc->num_cpus = 1;
+sc->get_irq  = aspeed_soc_get_irq;
 }

 static const TypeInfo aspeed_soc_ast2400_type_info = {
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index d9161d26d6..ca16e1e383 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -84,6 +84,7 @@ struct AspeedSoCClass {
 const int *irqmap;
 const hwaddr *memmap;
 uint32_t num_cpus;
+qemu_irq (*get_irq)(AspeedSoCState *s, int ctrl);
 };


--
2.30.2


[PATCH 2/5] hw/arm/aspeed: Select console UART from machine

2021-08-27 Thread pdel
From: Peter Delevoryas 

This change replaces the UART serial device initialization code with machine
configuration data, making it so that we have a single code path for console
UART initialization, but allowing different machines to use different
UART's. This is relevant because the Aspeed chips have 2 debug UART's, UART5
and UART1, and while most machines just use UART5, some use UART1.

Signed-off-by: Peter Delevoryas 
---
 hw/arm/aspeed.c | 7 +++
 hw/arm/aspeed_ast2600.c | 5 -
 hw/arm/aspeed_soc.c | 5 -
 include/hw/arm/aspeed.h | 1 +
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 9d43e26c51..ff53d12395 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -14,6 +14,7 @@
 #include "hw/arm/boot.h"
 #include "hw/arm/aspeed.h"
 #include "hw/arm/aspeed_soc.h"
+#include "hw/char/serial.h"
 #include "hw/i2c/i2c_mux_pca954x.h"
 #include "hw/i2c/smbus_eeprom.h"
 #include "hw/misc/pca9552.h"
@@ -21,6 +22,7 @@
 #include "hw/misc/led.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/block-backend.h"
+#include "sysemu/sysemu.h"
 #include "hw/loader.h"
 #include "qemu/error-report.h"
 #include "qemu/units.h"
@@ -352,6 +354,10 @@ static void aspeed_machine_init(MachineState *machine)
 }
 qdev_realize(DEVICE(>soc), NULL, _abort);
 
+serial_mm_init(get_system_memory(), sc->memmap[amc->serial_dev], 2,
+   sc->get_irq(>soc, amc->serial_dev), 38400,
+   serial_hd(0), DEVICE_LITTLE_ENDIAN);
+
 memory_region_add_subregion(get_system_memory(),
 sc->memmap[ASPEED_DEV_SDRAM],
 >ram_container);
@@ -804,6 +810,7 @@ static void aspeed_machine_class_init(ObjectClass *oc, void 
*data)
 mc->no_parallel = 1;
 mc->default_ram_id = "ram";
 amc->macs_mask = ASPEED_MAC0_ON;
+amc->serial_dev = ASPEED_DEV_UART5;
 
 aspeed_machine_class_props_init(oc);
 }
diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index 56e1adb343..a27b0de482 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -322,11 +322,6 @@ static void aspeed_soc_ast2600_realize(DeviceState *dev, 
Error **errp)
 sysbus_connect_irq(SYS_BUS_DEVICE(>timerctrl), i, irq);
 }
 
-/* UART - attach an 8250 to the IO space as our UART5 */
-serial_mm_init(get_system_memory(), sc->memmap[ASPEED_DEV_UART5], 2,
-   aspeed_soc_get_irq(s, ASPEED_DEV_UART5),
-   38400, serial_hd(0), DEVICE_LITTLE_ENDIAN);
-
 /* I2C */
 object_property_set_link(OBJECT(>i2c), "dram", OBJECT(s->dram_mr),
  _abort);
diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c
index c373182299..0c09d1e5b4 100644
--- a/hw/arm/aspeed_soc.c
+++ b/hw/arm/aspeed_soc.c
@@ -287,11 +287,6 @@ static void aspeed_soc_realize(DeviceState *dev, Error 
**errp)
 sysbus_connect_irq(SYS_BUS_DEVICE(>timerctrl), i, irq);
 }
 
-/* UART - attach an 8250 to the IO space as our UART5 */
-serial_mm_init(get_system_memory(), sc->memmap[ASPEED_DEV_UART5], 2,
-   aspeed_soc_get_irq(s, ASPEED_DEV_UART5), 38400,
-   serial_hd(0), DEVICE_LITTLE_ENDIAN);
-
 /* I2C */
 object_property_set_link(OBJECT(>i2c), "dram", OBJECT(s->dram_mr),
  _abort);
diff --git a/include/hw/arm/aspeed.h b/include/hw/arm/aspeed.h
index c9747b15fc..9f5b9f04d6 100644
--- a/include/hw/arm/aspeed.h
+++ b/include/hw/arm/aspeed.h
@@ -38,6 +38,7 @@ struct AspeedMachineClass {
 uint32_t num_cs;
 uint32_t macs_mask;
 void (*i2c_init)(AspeedMachineState *bmc);
+uint32_t serial_dev;
 };
 
 
-- 
2.30.2




[PATCH 1/5] hw/arm/aspeed: Add get_irq to AspeedSoCClass

2021-08-27 Thread pdel
From: Peter Delevoryas 

The AST2500 uses different logic than the AST2600 for getting IRQ's.
This adds a virtual `get_irq` function to the Aspeed SOC class, so that
the shared initialization code in `hw/arm/aspeed.c` can retrieve IRQ's.

Signed-off-by: Peter Delevoryas 
---
 hw/arm/aspeed_ast2600.c | 1 +
 hw/arm/aspeed_soc.c | 1 +
 include/hw/arm/aspeed_soc.h | 1 +
 3 files changed, 3 insertions(+)

diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index e3013128c6..56e1adb343 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -527,6 +527,7 @@ static void aspeed_soc_ast2600_class_init(ObjectClass *oc, 
void *data)
 sc->irqmap   = aspeed_soc_ast2600_irqmap;
 sc->memmap   = aspeed_soc_ast2600_memmap;
 sc->num_cpus = 2;
+sc->get_irq  = aspeed_soc_get_irq;
 }
 
 static const TypeInfo aspeed_soc_ast2600_type_info = {
diff --git a/hw/arm/aspeed_soc.c b/hw/arm/aspeed_soc.c
index 3ad6c56fa9..c373182299 100644
--- a/hw/arm/aspeed_soc.c
+++ b/hw/arm/aspeed_soc.c
@@ -476,6 +476,7 @@ static void aspeed_soc_ast2400_class_init(ObjectClass *oc, 
void *data)
 sc->irqmap   = aspeed_soc_ast2400_irqmap;
 sc->memmap   = aspeed_soc_ast2400_memmap;
 sc->num_cpus = 1;
+sc->get_irq  = aspeed_soc_get_irq;
 }
 
 static const TypeInfo aspeed_soc_ast2400_type_info = {
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index d9161d26d6..ca16e1e383 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -84,6 +84,7 @@ struct AspeedSoCClass {
 const int *irqmap;
 const hwaddr *memmap;
 uint32_t num_cpus;
+qemu_irq (*get_irq)(AspeedSoCState *s, int ctrl);
 };
 
 
-- 
2.30.2




[PATCH 3/5] hw/arm/aspeed: Add fuji machine type

2021-08-27 Thread pdel
From: Peter Delevoryas 

Fuji uses the AST2600, so it's very similar to `ast2600-evb`, but it has
a few slight differences, such as using UART1 for the serial console.

Signed-off-by: Peter Delevoryas 
---
 hw/arm/aspeed.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index ff53d12395..d2eb516a1d 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -1029,6 +1029,15 @@ static void 
aspeed_machine_rainier_class_init(ObjectClass *oc, void *data)
 aspeed_soc_num_cpus(amc->soc_name);
 };
 
+static void aspeed_machine_fuji_class_init(ObjectClass *oc, void *data)
+{
+MachineClass *mc = MACHINE_CLASS(oc);
+AspeedMachineClass *amc = ASPEED_MACHINE_CLASS(oc);
+
+mc->desc = "Facebook Fuji BMC (Aspeed AST2600, Cortex-A7)";
+amc->serial_dev = ASPEED_DEV_UART1;
+}
+
 static const TypeInfo aspeed_machine_types[] = {
 {
 .name  = MACHINE_TYPE_NAME("palmetto-bmc"),
@@ -1078,6 +1087,10 @@ static const TypeInfo aspeed_machine_types[] = {
 .name  = MACHINE_TYPE_NAME("rainier-bmc"),
 .parent= TYPE_ASPEED_MACHINE,
 .class_init= aspeed_machine_rainier_class_init,
+}, {
+.name  = MACHINE_TYPE_NAME("fuji"),
+.parent= MACHINE_TYPE_NAME("ast2600-evb"),
+.class_init= aspeed_machine_fuji_class_init,
 }, {
 .name  = TYPE_ASPEED_MACHINE,
 .parent= TYPE_MACHINE,
-- 
2.30.2




[PATCH 4/5] hw/arm/aspeed: Fix AST2600_CLK_SEL3 address

2021-08-27 Thread pdel
From: Peter Delevoryas 

This register address is not actually used anywhere, and the datasheet
specifies that it's zero-initialized by default anyways, but the address
is incorrect. This just corrects the address.

Fixes: e09cf36321f6 ("hw: aspeed_scu: Add AST2600 support")
Signed-off-by: Peter Delevoryas 
---
 hw/misc/aspeed_scu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index 40a38ebd85..c373e678f0 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -108,7 +108,7 @@
 #define AST2600_EPLL_EXT  TO_REG(0x244)
 #define AST2600_CLK_SEL   TO_REG(0x300)
 #define AST2600_CLK_SEL2  TO_REG(0x304)
-#define AST2600_CLK_SEL3  TO_REG(0x310)
+#define AST2600_CLK_SEL3  TO_REG(0x308)
 #define AST2600_HW_STRAP1 TO_REG(0x500)
 #define AST2600_HW_STRAP1_CLR TO_REG(0x504)
 #define AST2600_HW_STRAP1_PROTTO_REG(0x508)
-- 
2.30.2




[PATCH 5/5] hw/arm/aspeed: Initialize AST2600 clock selection registers

2021-08-27 Thread pdel
From: Peter Delevoryas 

UART5 is typically used as the default debug UART on the AST2600, but
UART1 is also designed to be a debug UART. All the AST2600 UART's have
semi-configurable clock rates through registers in the System Control
Unit (SCU), but only UART5 works out of the box with zero-initialized
values. The rest of the UART's expect a few of the registers to be
initialized to non-zero values, or else the clock rate calculation will
yield zero or undefined (due to a divide-by-zero).

For reference, the U-Boot clock rate driver here shows the calculation:


https://github.com/facebook/openbmc-uboot/blob/main/drivers/clk/aspeed/clk_ast2600.c#L357)

To summarize, UART5 allows selection from 4 rates: 24 MHz, 192 MHz, 24 /
13 MHz, and 192 / 13 MHz. The other UART's allow selecting either the
"low" rate (UARTCLK) or the "high" rate (HUARTCLK). UARTCLK and HUARTCLK
are configurable themselves:

UARTCLK = UXCLK * R / (N * 2)
HUARTCLK = HUXCLK * HR / (HN * 2)

UXCLK and HUXCLK are also configurable, and depend on the APLL and/or
HPLL clock rates, which also derive from complicated calculations. Long
story short, there's lots of multiplication and division from
configurable registers, and most of these registers are zero-initialized
in QEMU, which at best is unexpected and at worst causes this clock rate
driver to hang from divide-by-zero's. This can also be difficult to
diagnose, because it may cause U-Boot to hang before serial console
initialization completes, requiring intervention from gdb.

This change just initializes all of these registers with default values
from the datasheet.

Signed-off-by: Peter Delevoryas 
---
 hw/misc/aspeed_scu.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index c373e678f0..d51fe8564d 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -104,11 +104,16 @@
 #define AST2600_SDRAM_HANDSHAKE   TO_REG(0x100)
 #define AST2600_HPLL_PARAMTO_REG(0x200)
 #define AST2600_HPLL_EXT  TO_REG(0x204)
+#define AST2600_APLL_PARAMTO_REG(0x210)
 #define AST2600_MPLL_EXT  TO_REG(0x224)
 #define AST2600_EPLL_EXT  TO_REG(0x244)
 #define AST2600_CLK_SEL   TO_REG(0x300)
 #define AST2600_CLK_SEL2  TO_REG(0x304)
 #define AST2600_CLK_SEL3  TO_REG(0x308)
+#define AST2600_CLK_SEL4  TO_REG(0x310)
+#define AST2600_CLK_SEL5  TO_REG(0x314)
+#define AST2600_UARTCLK_PARAM TO_REG(0x338)
+#define AST2600_HUARTCLK_PARAMTO_REG(0x33C)
 #define AST2600_HW_STRAP1 TO_REG(0x500)
 #define AST2600_HW_STRAP1_CLR TO_REG(0x504)
 #define AST2600_HW_STRAP1_PROTTO_REG(0x508)
@@ -658,9 +663,15 @@ static const uint32_t 
ast2600_a1_resets[ASPEED_AST2600_SCU_NR_REGS] = {
 [AST2600_CLK_STOP_CTRL2]= 0xFFF0FFF0,
 [AST2600_SDRAM_HANDSHAKE]   = 0x,
 [AST2600_HPLL_PARAM]= 0x1000405F,
+[AST2600_APLL_PARAM]= 0x1000405F,
 [AST2600_CHIP_ID0]  = 0x1234ABCD,
 [AST2600_CHIP_ID1]  = 0x,
-
+[AST2600_CLK_SEL2]  = 0x0070,
+[AST2600_CLK_SEL3]  = 0x,
+[AST2600_CLK_SEL4]  = 0xF3F4,
+[AST2600_CLK_SEL5]  = 0x3000,
+[AST2600_UARTCLK_PARAM] = 0x00014506,
+[AST2600_HUARTCLK_PARAM]= 0x000145C0,
 };
 
 static void aspeed_ast2600_scu_reset(DeviceState *dev)
-- 
2.30.2




[PATCH 0/5] hw/arm/aspeed: Add fuji machine type

2021-08-27 Thread pdel
From: Peter Delevoryas 

Hello!

This patch series creates an Aspeed machine type for Facebook's OpenBMC
platform "fuji".

The first 2 commits do some refactoring, to allow Aspeed machines to
configure the first serial device. Most board configurations use UART5
for the console, but fuji uses UART1. Neither of these should change the
behavior for any machine types.

The third commit adds the fuji machine type definition, utilizing the
"serial_dev" option from the previous two commits to configure the
console device as UART1 instead of UART5.

After the third commit, you can test booting a fuji image as follows:

# Build a fuji image from Facebook's OpenBMC repository.
git clone https://github.com/facebook/openbmc
cd openbmc
./sync_yocto.sh
source openbmc-init-build-env fuji build-fuji
bitbake fuji-image

dd if=/dev/zero of=/tmp/fuji-image.mtd bs=1M count=128
dd if=./tmp/deploy/images/fuji/flash-fuji of=/tmp/fuji-image.mtd \
bs=1k conv=notrunc

git clone https://github.com/peterdelevoryas/qemu
cd qemu
./configure --target-list=arm-softmmu
make -j $(nproc)

# Attempt to boot the fuji image: you should not see any console
# output.
./build/arm-softmmu/qemu-system-arm -machine fuji \
-drive file=/tmp/fuji-image.mtd,format=raw,if=mtd -serial stdio

You shouldn't see any serial console output, because U-Boot hangs in
clock rate initialization due to a divide-by-zero. The last 2 commits
fixup the clock registers to avoid the divide-by-zero, and fuji boots
successfully after that.

I organized the patch series with the clock rate fixes last because it
was more natural to test the behavior before and after the fix, but I
can understand if you'd like those patches to come first, or to even be
added completely independently from the fuji patch series.

This is my first contribution to QEMU, and I tried to follow the
wiki/etc as best as possible, but I'm sure I probably made some
mistakes, so let me know how best to submit this.

Peter Delevoryas (5):
  hw/arm/aspeed: Add get_irq to AspeedSoCClass
  hw/arm/aspeed: Select console UART from machine
  hw/arm/aspeed: Add fuji machine type
  hw/arm/aspeed: Fix AST2600_CLK_SEL3 address
  hw/arm/aspeed: Initialize AST2600 clock selection registers

 hw/arm/aspeed.c | 20 
 hw/arm/aspeed_ast2600.c |  6 +-
 hw/arm/aspeed_soc.c |  6 +-
 hw/misc/aspeed_scu.c| 15 +--
 include/hw/arm/aspeed.h |  1 +
 include/hw/arm/aspeed_soc.h |  1 +
 6 files changed, 37 insertions(+), 12 deletions(-)

-- 
2.30.2




[PATCH] block/export/fuse.c: fix fuse-lseek on uclibc or musl

2021-08-27 Thread Fabrice Fontaine
Include linux/fs.h to avoid the following build failure on uclibc or
musl raised since version 6.0.0:

../block/export/fuse.c: In function 'fuse_lseek':
../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this 
function)
  641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) {
  |   ^
../block/export/fuse.c:641:19: note: each undeclared identifier is reported 
only once for each function it appears in
../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this 
function); did you mean 'SEEK_SET'?
  641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) {
  |  ^
  |  SEEK_SET

Fixes:
 - 
http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137

Signed-off-by: Fabrice Fontaine 
---
 block/export/fuse.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index fc7b07d2b5..2e3bf8270b 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -31,6 +31,9 @@
 #include 
 #include 
 
+#ifdef __linux__
+#include 
+#endif
 
 /* Prevent overly long bounce buffer allocations */
 #define FUSE_MAX_BOUNCE_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 64 * 1024 * 1024))
-- 
2.32.0




Re: [PATCH v6 10/10] ACPI ERST: step 6 of bios-tables-test

2021-08-27 Thread Eric DeVolder

Igor,
I'm not sure if I should post v7 with the correction to the tables,
or await your guidance/feedback on v6.
Thanks,
eric


On 8/6/21 12:16 PM, Eric DeVolder wrote:

Well, I discovered today that running "make check" again resulted in
bios table mismatches. In looking into this further, I think I might
finally have an understanding as to how this is all to work. My
bios-tables-test-allowed-diff for step 1 now looks like:

"tests/data/acpi/pc/DSDT.acpierst",
"tests/data/acpi/pc/ERST",
"tests/data/acpi/q35/DSDT.acpierst",
"tests/data/acpi/q35/ERST",
"tests/data/acpi/microvm/ERST.pcie",

and with the corresponding empty files and by using the
  .variant = ".acpierst"
in bios-tables-test, I am able to run "make check" multiple times
now without failures.

So, that means patch 01/10 and 10/10 are wrong. I'm assuming there
will be other items to address, so I'll plan for these fixes in
v7.

My apologies,
eric


On 8/5/21 5:30 PM, Eric DeVolder wrote:

Following the guidelines in tests/qtest/bios-tables-test.c, this
is step 6, the re-generated ACPI tables binary blobs.

Signed-off-by: Eric DeVolder 
---
  tests/data/acpi/microvm/ERST.pcie   | Bin 0 -> 912 bytes
  tests/data/acpi/pc/DSDT | Bin 6002 -> 6009 bytes
  tests/data/acpi/pc/ERST | Bin 0 -> 912 bytes
  tests/data/acpi/q35/DSDT    | Bin 8289 -> 8306 bytes
  tests/data/acpi/q35/ERST    | Bin 0 -> 912 bytes
  tests/qtest/bios-tables-test-allowed-diff.h |   6 --
  6 files changed, 6 deletions(-)
  create mode 100644 tests/data/acpi/microvm/ERST.pcie

diff --git a/tests/data/acpi/microvm/ERST.pcie 
b/tests/data/acpi/microvm/ERST.pcie
new file mode 100644
index 
..d9a2b3211ab5893a50751ad52be3782579e367f2
GIT binary patch
literal 912
zcmaKpO%8%E5QPUQ|KVrvh9h_c12J)@5f?5!k_Ygv*jGA8UW7?#`}+D#XXyDpKHiZ?
z@anI_W$gOrZRl(SB7!yMqx}#E4EC=}m^g_!0^0`kEl)DOuIXM6D@@*xq*8vyqH
z)b0KTlmlgmH~xt7vG=QZp?+M@&@5ljF8

diff --git a/tests/data/acpi/pc/ERST b/tests/data/acpi/pc/ERST
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..f24fadd345c798ee5c17cdb66e0e1703bd1b2f26
 100644
GIT binary patch
literal 912
zcmaKpOAdlC6h#XZC=fn#CoF*_7>J28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2
z21ttC4TZ!l<{$N9cc#e2SmmnSn
O1||j(wg6|p5C#C(xDBxY

delta 42
xcmez5@X&$FCDJ28jW}>wF2KFG3zs9lTPTnl;U#=7r>E_sr(1u2
z21diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h

index b3aaf76..dfb8523 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,7 +1 @@
  /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/pc/ERST",
-"tests/data/acpi/q35/ERST",
-"tests/data/acpi/microvm/ERST",
-"tests/data/acpi/pc/DSDT",
-"tests/data/acpi/q35/DSDT",
-"tests/data/acpi/microvm/DSDT",





Re: [PATCH v2 38/43] bsd-user: Refactor load_elf_sections and is_target_elf_binary

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 3:11 PM, i...@bsdimp.com wrote:
> 
> From: Warner Losh 
> 
> Factor out load_elf_sections and is_target_elf_binary out of
> load_elf_interp.
> 
> Signed-off-by: Mikaël Urankar 
> Signed-off-by: Stacey Son 
> Signed-off-by: Warner Losh 
> ---
> bsd-user/elfload.c | 350 +
> 1 file changed, 164 insertions(+), 186 deletions(-)
> 
> diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
> index bdf18f3dce..aed28f2f73 100644
> --- a/bsd-user/elfload.c
> +++ b/bsd-user/elfload.c
> @@ -36,6 +36,8 @@ abi_ulong target_stksiz;
> abi_ulong target_stkbas;
> 
> static int elf_core_dump(int signr, CPUArchState *env);
> +static int load_elf_sections(const struct elfhdr *hdr, struct elf_phdr *phdr,
> +int fd, abi_ulong rbase, abi_ulong *baddrp);
> 
> static inline void memcpy_fromfs(void *to, const void *from, unsigned long n)
> {
> @@ -271,16 +273,10 @@ static abi_ulong load_elf_interp(struct elfhdr 
> *interp_elf_ex,
>  abi_ulong *interp_load_addr)
> {
> struct elf_phdr *elf_phdata  =  NULL;
> -struct elf_phdr *eppnt;
> -abi_ulong load_addr = 0;
> -int load_addr_set = 0;
> +abi_ulong rbase;
> int retval;
> -abi_ulong last_bss, elf_bss;
> -abi_ulong error;
> -int i;
> +abi_ulong baddr, error;
> 
> -elf_bss = 0;
> -last_bss = 0;
> error = 0;
> 
> bswap_ehdr(interp_elf_ex);
> @@ -325,6 +321,7 @@ static abi_ulong load_elf_interp(struct elfhdr 
> *interp_elf_ex,
> }
> bswap_phdr(elf_phdata, interp_elf_ex->e_phnum);
> 
> +rbase = 0;
> if (interp_elf_ex->e_type == ET_DYN) {
> /*
>  * In order to avoid hardcoding the interpreter load
> @@ -332,86 +329,25 @@ static abi_ulong load_elf_interp(struct elfhdr 
> *interp_elf_ex,
>  */
> error = target_mmap(0, INTERP_MAP_SIZE, PROT_NONE,
> MAP_PRIVATE | MAP_ANON, -1, 0);

s/error = /rbase = /

so that the test below worked properly.

> -if (error == -1) {
> +if (rbase == -1) {
> perror("mmap");
> exit(-1);
> }
> -load_addr = error;
> -load_addr_set = 1;
> -}
> -
> -eppnt = elf_phdata;
> -for (i = 0; i < interp_elf_ex->e_phnum; i++, eppnt++)
> -if (eppnt->p_type == PT_LOAD) {
> -int elf_type = MAP_PRIVATE | MAP_DENYWRITE;
> -int elf_prot = 0;
> -abi_ulong vaddr = 0;
> -abi_ulong k;
> -
> -if (eppnt->p_flags & PF_R) elf_prot =  PROT_READ;
> -if (eppnt->p_flags & PF_W) elf_prot |= PROT_WRITE;
> -if (eppnt->p_flags & PF_X) elf_prot |= PROT_EXEC;
> -if (interp_elf_ex->e_type == ET_EXEC || load_addr_set) {
> -elf_type |= MAP_FIXED;
> -vaddr = eppnt->p_vaddr;
> -}
> -error = target_mmap(load_addr + TARGET_ELF_PAGESTART(vaddr),
> -eppnt->p_filesz + 
> TARGET_ELF_PAGEOFFSET(eppnt->p_vaddr),
> -elf_prot,
> -elf_type,
> -interpreter_fd,
> -eppnt->p_offset - 
> TARGET_ELF_PAGEOFFSET(eppnt->p_vaddr));
> -
> -if (error == -1) {
> -/* Real error */
> -close(interpreter_fd);
> -free(elf_phdata);
> -return ~((abi_ulong)0UL);
> -}
> -
> -if (!load_addr_set && interp_elf_ex->e_type == ET_DYN) {
> -load_addr = error;
> -load_addr_set = 1;
> -}
> -
> -/*
> - * Find the end of the file  mapping for this phdr, and keep
> - * track of the largest address we see for this.
> - */
> -k = load_addr + eppnt->p_vaddr + eppnt->p_filesz;
> -if (k > elf_bss) elf_bss = k;
> +}
> 
> -/*
> - * Do the same thing for the memory mapping - between
> - * elf_bss and last_bss is the bss section.
> - */
> -k = load_addr + eppnt->p_memsz + eppnt->p_vaddr;
> -if (k > last_bss) last_bss = k;
> -}
> +error = load_elf_sections(interp_elf_ex, elf_phdata, interpreter_fd, 
> rbase,
> +);
> +if (error != 0) {
> +perror("load_elf_sections");
> +exit(-1);
> +}
> 
> /* Now use mmap to map the library into memory. */
> -
> close(interpreter_fd);
> -
> -/*
> - * Now fill out the bss section.  First pad the last page up
> - * to the page boundary, and then perform a mmap to make sure
> - * that there are zeromapped pages up to and including the last
> - * bss page.
> - */
> -padzero(elf_bss, last_bss);
> -elf_bss = TARGET_ELF_PAGESTART(elf_bss + qemu_host_page_size - 1); /* 
> What we have mapped so far */
> -
> -/* Map the last of the bss segment */
> -if 

Re: [PATCH] Report any problems with loading the VGA driver for PPC Macintosh targets

2021-08-27 Thread BALATON Zoltan

On Fri, 27 Aug 2021, John Arbuckle wrote:

I was having a problem with missing video resolutions in my Mac OS 9 VM. When I
ran QEMU it gave no indication as to why these resolutions were missing. I found
out that the OpenFirmware VGA driver was not being loaded. To prevent anyone 
from
going thru the same trouble I went thru I added messages that the user can see
when a problem takes place with loading this driver in the future.

Signed-off-by: John Arbuckle 
---
hw/ppc/mac_newworld.c | 6 ++
hw/ppc/mac_oldworld.c | 6 ++
2 files changed, 12 insertions(+)

diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 7bb7ac3997..c1960452b8 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -526,8 +526,14 @@ static void ppc_core99_init(MachineState *machine)

if (g_file_get_contents(filename, _file, _size, NULL)) {
fw_cfg_add_file(fw_cfg, "ndrv/qemu_vga.ndrv", ndrv_file, ndrv_size);
+} else {
+printf("Warning: failed to load driver %s. This may cause video"
+   " problems.\n");


I think you should use warn_report for these instead of printf and watch 
out if that needs \n or not (some functions add \n while some others may 
not and I always forget which is which but checkpatch should warn for it 
so you should get nofified if you leave \n there but it's not needed).


Regards,
BALATON Zoltan


}
g_free(filename);
+} else {
+printf("Warning: driver %s not found. This may cause video 
problems.\n",
+   NDRV_VGA_FILENAME);
}

qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index de2be960e6..96603fe9cf 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -367,8 +367,14 @@ static void ppc_heathrow_init(MachineState *machine)

if (g_file_get_contents(filename, _file, _size, NULL)) {
fw_cfg_add_file(fw_cfg, "ndrv/qemu_vga.ndrv", ndrv_file, ndrv_size);
+} else {
+printf("Warning: failed to load driver %s. This may cause video"
+   " problems.\n");
}
g_free(filename);
+} else {
+printf("Warning: driver %s not found. This may cause video 
problems.\n",
+   NDRV_VGA_FILENAME);
}

qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);





Re: [PATCH] qapi: Set boolean value correctly in examples

2021-08-27 Thread Eric Blake
On Fri, Aug 27, 2021 at 05:06:27PM +0800, Guoyi Tu wrote:
> 
> Signed-off-by: Guoyi Tu 
> ---
>  qapi/trace.json | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/qapi/trace.json b/qapi/trace.json
> index 47c68f04da..eedfded512 100644
> --- a/qapi/trace.json
> +++ b/qapi/trace.json
> @@ -99,7 +99,7 @@
>  # Example:
>  #
>  # -> { "execute": "trace-event-set-state",
> -#  "arguments": { "name": "qemu_memalign", "enable": "true" } }
> +#  "arguments": { "name": "qemu_memalign", "enable": true } }
>  # <- { "return": {} }
>  #
>  ##
> -- 
> 2.25.1
> 
> -- 
> Best Regards,
> Guoyi Tu
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 2/2] Prevent vhost-user-blk-test hang

2021-08-27 Thread ebl...@redhat.com
On Fri, Aug 27, 2021 at 04:50:47PM +, Raphael Norwitz wrote:
> In the vhost-user-blk-test, as of now there is nothing stoping

stopping

> vhost-user-blk in QEMU writing to the socket right after forking off the
> storage daemon before it has a chance to come up properly, leaving the
> test hanging forever. This intermittently hanging test has caused QEMU
> automation failures reported multiple times on the mailing list [1].
> 
> This change makes the storage-daemon notify the vhost-user-blk-test
> that it is fully initialized and ready to handle client connections via
> a pipefd before allowing the test to proceed. This ensures that the
> storage-daemon backend won't miss vhost-user messages and thereby
> resolves the hang.

As I said on patch 1, I think the proper fix here is to utilize the
--pidfile option.

> 
> [1] 
> https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/
> 
> Signed-off-by: Raphael Norwitz 
> ---
>  tests/qtest/vhost-user-blk-test.c | 33 ---
>  1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/qtest/vhost-user-blk-test.c 
> b/tests/qtest/vhost-user-blk-test.c
> index 6f108a1b62..b62af449df 100644
> --- a/tests/qtest/vhost-user-blk-test.c
> +++ b/tests/qtest/vhost-user-blk-test.c
> @@ -21,6 +21,8 @@
>  #include "libqos/vhost-user-blk.h"
>  #include "libqos/libqos-pc.h"
>  
> +const char *daemon_msg = "Block exports setup\n";
> +
>  #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
>  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
>  #define PCI_SLOT_HP 0x06
> @@ -885,7 +887,8 @@ static void start_vhost_user_blk(GString *cmd_line, int 
> vus_instances,
>   int num_queues)
>  {
>  const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
> -int i;
> +int i, err, pipe_fds[2];
> +char buf[32] = {0};
>  gchar *img_path;
>  GString *storage_daemon_command = g_string_new(NULL);
>  QemuStorageDaemonState *qsd;
> @@ -898,6 +901,12 @@ static void start_vhost_user_blk(GString *cmd_line, int 
> vus_instances,
>  " -object memory-backend-memfd,id=mem,size=256M,share=on "
>  " -M memory-backend=mem -m 256M ");
>  
> +err = pipe(pipe_fds);
> +if (err != 0) {
> +fprintf(stderr, "start_vhost_user_blk: pipe() failed %m\n");
> +abort();
> +}

Instead of setting up a pipe()...

> +
>  for (i = 0; i < vus_instances; i++) {
>  int fd;
>  char *sock_path = create_listen_socket();
> @@ -914,22 +923,40 @@ static void start_vhost_user_blk(GString *cmd_line, int 
> vus_instances,
> i + 1, sock_path);
>  }
>  
> +g_string_append_printf(storage_daemon_command, "--printset");

...change this to request the --pidfile option...

> +
>  g_test_message("starting vhost-user backend: %s",
> storage_daemon_command->str);
> +
>  pid_t pid = fork();
>  if (pid == 0) {
> +close(pipe_fds[0]);
> +
>  /*
>   * Close standard file descriptors so tap-driver.pl pipe detects when
>   * our parent terminates.
>   */
>  close(0);
> -close(1);
>  open("/dev/null", O_RDONLY);
> -open("/dev/null", O_WRONLY);
> +close(1);
> +dup2(pipe_fds[1], 1);
>  
>  execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
>  exit(1);
>  }
> +
> +close(pipe_fds[1]);
> +
> +err = read(pipe_fds[0], buf, 20);
> +if (err < 0) {
> +fprintf(stderr, "Failed to read from storage-daemon pipe %m\n");
> +abort();
> +} else if (strcmp(buf, daemon_msg) != 0) {
> +fprintf(stderr, "qemu-storage-daemon did not write expected messaage 
> "
> +"to the pipe. Total bytes read: %d. Got: %s\n", err, buf);
> +abort();
> +}

...and instead of trying to read() from a pipe, you instead wait until
the pid file exists.

> +
>  g_string_free(storage_daemon_command, true);
>  
>  qsd = g_new(QemuStorageDaemonState, 1);
> -- 
> 2.20.1
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH 1/2] storage-daemon: add opt to print when initialized

2021-08-27 Thread ebl...@redhat.com
On Fri, Aug 27, 2021 at 04:50:35PM +, Raphael Norwitz wrote:
> This change adds a command line option to print a line to standard out
> when the storage daemon has completed initialization and is ready to
> serve client connections.
> 
> This option will be used to resolve a hang in the vhost-user-blk-test.

Doesn't the existing --pidfile already serve the same job?  That is,
why not fix vhost-user-blk-test to take advantage of the pid-file
creation rather than output to stdout as evidence of when the storage
daemon is up and running?

Therefore, I don't think we need this patch.

> 
> Signed-off-by: Raphael Norwitz 
> ---
>  storage-daemon/qemu-storage-daemon.c | 21 +
>  1 file changed, 21 insertions(+)

Missing a corresponding change to the man page
(docs/tools/qemu-storage-daemon.rst), if we decide to go with this
after all.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [FSL P50x0] lscpu reports wrong values since the RC1 of kernel 5.13

2021-08-27 Thread Christian Zigotzky



> On 26. Aug 2021, at 05:43, Srikar Dronamraju  
> wrote:
> 
> * Christian Zigotzky  [2021-08-16 14:29:21]:
> 
> 
> Hi Christian,
> 
>> I tested the RC6 of kernel 5.14 today and unfortunately the issue still
>> exists. We have figured out that only P5040 SoCs are affected. [1]
>> P5020 SoCs display the correct values.
>> Please check the CPU changes in the PowerPC updates 5.13-1 and 5.13-2.
>> 
> 
> Thanks for reporting the issue.
> Would it be possible to try
> https://lore.kernel.org/linuxppc-dev/20210821092419.167454-3-sri...@linux.vnet.ibm.com/t/#u

Hi Srikar,

This patch works! Thanks a lot!

Cheers,
Christian

> 
> If the above patch is not helping, then can you please collect the output of
> 
> cat /sys/devices/system/cpu/cpu*/topology/core_siblings
> 
> Were all the CPUs online at the time of boot?
> Did we do any CPU online/offline operations post boot?
> 
> If we did CPU online/offline, can you capture the output just after the
> boot along with lscpu output..
> 
> Since this is being seen on few SOCs, can you summarize the difference
> between P5040 and P5020.
>> 
>> [1] https://forum.hyperion-entertainment.com/viewtopic.php?p=53775#p53775
>> 
>> 
>>> On 09 August 2021 um 02:37 pm, Christian Zigotzky wrote:
>>> Hi All,
>>> 
>>> Lscpu reports wrong values [1] since the RC1 of kernel 5.13 on my FSL
>>> P5040 Cyrus+ board (A-EON AmigaOne X5000). [2]
>>> The differences are:
>>> 
>>> Since the RC1 of kernel 5.13 (wrong values):
>>> 
>>> Core(s) per socket: 1
>>> Socket(s): 3
>>> 
> 
> I know that the socket count was off by 1, but I cant explain how its off by
> 2 here.
> 
>>> Before (correct values):
>>> 
>>> Core(s) per socket: 4
>>> Socket(s): 1
>>> 
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju




[PATCH v2 18/19] qapi: backup: add immutable-source parameter

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
We are on the way to implement internal-backup with fleecing scheme,
which includes backup job copying from fleecing block driver node
(which is target of copy-before-write filter) to final target of
backup. This job doesn't need own filter, as fleecing block driver node
is a kind of snapshot, it's immutable from reader point of view.

Let's add a parameter for backup to not insert filter but instead
unshare writes on source. This way backup job becomes a simple copying
process.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json  | 12 +++-
 include/block/block_int.h |  1 +
 block/backup.c| 61 +++
 block/replication.c   |  2 +-
 blockdev.c|  1 +
 5 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8a333136f5..995ca16a5e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1391,6 +1391,15 @@
 #above node specified by @drive. If this option is not 
given,
 #a node name is autogenerated. (Since: 4.2)
 #
+# @immutable-source: If true, assume source is immutable and don't insert 
filter
+#as no copy-before-write operations are needed. It will
+#fail if there are existing writers on source node, as 
well,
+#any attempt to add writer to source node during backup 
will
+#fail. @filter-node-name must not be set.
+#If false, insert copy-before-write filter above source 
node
+#(see also @filter-node-name parameter).
+#Default is false. (Since 6.2)
+#
 # @x-perf: Performance options. (Since 6.0)
 #
 # Note: @on-source-error and @on-target-error only affect background
@@ -1407,7 +1416,8 @@
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
-'*filter-node-name': 'str', '*x-perf': 'BackupPerf'  } }
+'*filter-node-name': 'str', '*immutable-source': 'bool',
+'*x-perf': 'BackupPerf'  } }
 
 ##
 # @DriveBackup:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f1a54db0f8..6571dad061 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1284,6 +1284,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 BitmapSyncMode bitmap_mode,
 bool compress,
 const char *filter_node_name,
+bool immutable_source,
 BackupPerf *perf,
 BlockdevOnError on_source_error,
 BlockdevOnError on_target_error,
diff --git a/block/backup.c b/block/backup.c
index 687d2882bc..a7f4d0d663 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -34,6 +34,14 @@ typedef struct BackupBlockJob {
 BlockDriverState *cbw;
 BlockDriverState *source_bs;
 BlockDriverState *target_bs;
+BlockBackend *source_blk;
+BlockBackend *target_blk;
+/*
+ * Note that if backup runs with filter (immutable-source parameter is
+ * false), @cbw is set but @source_blk and @target_blk are NULL.
+ * Otherwise if backup runs without filter (immutable-source paramter is
+ * true), @cbw is NULL but @source_blk and @target_blk are set.
+ */
 
 BdrvDirtyBitmap *sync_bitmap;
 
@@ -102,7 +110,17 @@ static void backup_clean(Job *job)
 {
 BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
 block_job_remove_all_bdrv(>common);
-bdrv_cbw_drop(s->cbw);
+if (s->cbw) {
+assert(!s->source_blk && !s->target_blk);
+bdrv_cbw_drop(s->cbw);
+} else {
+block_copy_state_free(s->bcs);
+s->bcs = NULL;
+blk_unref(s->source_blk);
+s->source_blk = NULL;
+blk_unref(s->target_blk);
+s->target_blk = NULL;
+}
 }
 
 void backup_do_checkpoint(BlockJob *job, Error **errp)
@@ -356,6 +374,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
   BitmapSyncMode bitmap_mode,
   bool compress,
   const char *filter_node_name,
+  bool immutable_source,
   BackupPerf *perf,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
@@ -368,6 +387,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 int64_t cluster_size;
 BlockDriverState *cbw = NULL;
 BlockCopyState *bcs = NULL;
+BlockBackend *source_blk = NULL, *target_blk = NULL;
 
 assert(bs);
 assert(target);
@@ -376,6 +396,12 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
 assert(sync_bitmap || 

Re: [PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-08-27 Thread Eric Blake
On Fri, Aug 27, 2021 at 07:58:10PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 27.08.2021 18:09, Eric Blake wrote:
> > According to the NBD spec, a server advertising
> > NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
> > not see any cache inconsistencies: when properly separated by a single
> > flush, actions performed by one client will be visible to another
> > client, regardless of which client did the flush.  We satisfy these
> > conditions in qemu because our block layer serializes any overlapping
> > operations (see bdrv_find_conflicting_request and friends)
> 
> Not any. We serialize only write operations not aligned to request_alignment 
> of bs (see bdrv_make_request_serialising() call in bdrv_co_pwritev_part). So, 
> actually most of overlapping operations remain overlapping. And that's 
> correct: it's not a Qemu work to resolve overlapping requests. We resolve 
> them only when we are responsible for appearing of intersection: when we 
> align requests up.

I welcome improvements on the wording.  Maybe what I should be
emphasizing is that even when there are overlapping requests, qemu
itself is multiplexing all of those requests through a single
interface into the backend, without any caching on qemu's part, and
relying on the consistency of the flush operation into that backend.

>From a parallelism perspective, in file-posix.c, we don't distiguish
between two pwrite() syscalls made (potentially out-of-order) by a
single BDS client in two coroutines, from two pwrite() syscalls made
by two separate BDS clients.  Either way, those two syscalls may both
be asynchronous, but both go through a single interface into the
kernel's view of the underlying filesystem or block device.  And we
implement flush via fdatasync(), which the kernel already has some
pretty strong guarantees on cross-thread consistency.

But I am less certain of whether we are guaranteed cross-consistency
like that for all protocol drivers.  Is there any block driver (most
likely a networked one) where we have situations such that even though
we are using the same API for all asynchronous access within the qemu
coroutines, under the hood those APIs can end up diverging on their
destinations such as due to network round-robin effects, and result in
us seeing cache-inconsistent views?  That is, can we ever encounter
this:

-> read()
  -> kicks off networked storage call that resolves to host X
-> host X caches the read
  <- reply
-> write()
  -> kicks off networked storage call that resolves to host Y
-> host Y updates the file system
  <- reply
-> flush()
  -> kicks off networked storage call that resolves to host Y
-> host Y starts flushing, but replies early
  <- reply
-> read()
  -> kicks off networked storage call that resolves to host X
-> host X does not see effects of Y's flush yet, returns stale data

If we can encounter that, then in those situations we must not
advertise MULTI_CONN.  But I'm confident that file-posix.c does not
have that problem, and even if another driver did have that problem
(where our single API access can result in cache-inconsistent views
over the protocol, rather than flush really being effective to all
further API access to that driver), you'd think we'd be aware of it.
However, if we DO know of a place where that is the case, then now is
the time to design our QAPI control over whether to advertise NBD's
MULTI_CONN bit based on whether the block layer can warn us about a
particular block layer NOT being safe.

But unless we come up with such a scenario, maybe all I need here is
better wording to put in the commit message to state why we think we
ARE safe in advertising MULTI_CONN.  Remember, the NBD flag only has
an impact in relation to how strong flush calls are (it is NOT
required that overlapping write requests have any particular behavior
- that's always been up to the client to be careful with that, and
qemu need not go out of its way to prevent client stupidity with
overlapping writes), but rather that actions with a reply completed
prior to FLUSH are then visible to actions started after the reply to
FLUSH.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH v2 17/19] block: blk_root(): return non-const pointer

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
In the following patch we'll want to pass blk children to block-copy.
Const pointers are not enough. So, return non const pointer from
blk_root().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/sysemu/block-backend.h | 2 +-
 block/block-backend.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 29d4fdbf63..5d4dd877b7 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -271,7 +271,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
int64_t off_in,
int bytes, BdrvRequestFlags read_flags,
BdrvRequestFlags write_flags);
 
-const BdrvChild *blk_root(BlockBackend *blk);
+BdrvChild *blk_root(BlockBackend *blk);
 
 int blk_make_empty(BlockBackend *blk, Error **errp);
 
diff --git a/block/block-backend.c b/block/block-backend.c
index 6140d133e2..b167c630d2 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2463,7 +2463,7 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, 
int64_t off_in,
   bytes, read_flags, write_flags);
 }
 
-const BdrvChild *blk_root(BlockBackend *blk)
+BdrvChild *blk_root(BlockBackend *blk)
 {
 return blk->root;
 }
-- 
2.29.2




Re: [PATCH v2 0/2] hw/arm/raspi: Remove deprecated raspi2/raspi3 aliases

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/27/21 8:01 PM, Willian Rampazzo wrote:
> Hi, Phil,
> 
> On Thu, Aug 26, 2021 at 1:49 PM Philippe Mathieu-Daudé  
> wrote:
>>
>> Hi Peter,
>>
>> On 7/9/21 6:00 PM, Peter Maydell wrote:
>>> On Fri, 9 Jul 2021 at 16:33, Peter Maydell  wrote:

 On Thu, 8 Jul 2021 at 15:55, Philippe Mathieu-Daudé  
 wrote:
>
> Since v1:
> - renamed tests (Peter)
>
> Philippe Mathieu-Daudé (2):
>   tests: Remove uses of deprecated raspi2/raspi3 machine names
>   hw/arm/raspi: Remove deprecated raspi2/raspi3 aliases



 Applied to target-arm.next, thanks.
>>>
>>> I found that this seems to break 'make check':
>>>
>>>  (09/52) 
>>> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
>>> ERROR: Could not perform graceful shutdown (40.38 s)
>>
>> I can not reproduce. Maybe something got changed in Python/Avocado
>> since... I'm going to simply respin (updating 6.1 -> 6.2).
> 
> I also could not reproduce. I checked and nothing changed on the
> Avocado side. The version is still the same on the QEMU side.

Thanks for double-checking!



[PATCH v2 15/19] iotests.py: add qemu_io_pipe_and_status()

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Add helper that returns both status and output, to be used in the
following commit

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/iotests.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 77efcb0927..ff96d8ef16 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -206,6 +206,10 @@ def qemu_io(*args):
 args = qemu_io_args + list(args)
 return qemu_tool_pipe_and_status('qemu-io', args)[0]
 
+def qemu_io_pipe_and_status(*args):
+args = qemu_io_args + list(args)
+return qemu_tool_pipe_and_status('qemu-io', args)
+
 def qemu_io_log(*args):
 result = qemu_io(*args)
 log(result, filters=[filter_testfiles, filter_qemu_io])
-- 
2.29.2




[PATCH v2 10/19] block: introduce fleecing block driver

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Introduce a new driver, that works in pair with copy-before-write to
improve fleecing.

Without fleecing driver, old fleecing scheme looks as follows:

[guest]
  |
  |root
  v
[copy-before-write] -> [temp.qcow2] <--- [nbd export]
  | target  |
  |file |backing
  v |
[active disk] <-+

With fleecing driver, new scheme is:

[guest]
  |
  |root
  v
[copy-before-write] -> [fleecing] <--- [nbd export]
  | target  ||
  |file ||file
  v |v
[active disk]<--source--+  [temp.img]

Benefits of new scheme:

1. Access control: if remote client try to read data that not covered
   by original dirty bitmap used on copy-before-write open, client gets
   -EACCES.

2. Discard support: if remote client do DISCARD, this additionally to
   discarding data in temp.img informs block-copy process to not copy
   these clusters. Next read from discarded area will return -EACCES.
   This is significant thing: when fleecing user reads data that was
   not yet copied to temp.img, we can avoid copying it on further guest
   write.

3. Synchronisation between client reads and block-copy write is more
   efficient: it doesn't block intersecting block-copy write during
   client read.

4. We don't rely on backing feature: active disk should not be backing
   of temp image, so we avoid some permission-related difficulties and
   temp image now is not required to support backing, it may be simple
   raw image.

Note that now nobody calls fleecing_drv_activate(), so new driver is
actually unusable. It's a work for the following patch: support
fleecing block driver in copy-before-write filter driver.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json |  17 ++-
 block/fleecing.h |  16 +++
 block/fleecing-drv.c | 260 +++
 MAINTAINERS  |   1 +
 block/meson.build|   1 +
 5 files changed, 294 insertions(+), 1 deletion(-)
 create mode 100644 block/fleecing-drv.c

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c42d23752d..8a333136f5 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2826,13 +2826,14 @@
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
 # @copy-before-write: Since 6.2
+# @fleecing: Since 6.2
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
 'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-'file', 'ftp', 'ftps', 'gluster',
+'file', 'fleecing', 'ftp', 'ftps', 'gluster',
 {'name': 'host_cdrom', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
 {'name': 'host_device', 'if': 'defined(HAVE_HOST_BLOCK_DEVICE)' },
 'http', 'https', 'iscsi',
@@ -4077,6 +4078,19 @@
   'base': 'BlockdevOptionsGenericFormat',
   'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
+##
+# @BlockdevOptionsFleecing:
+#
+# Driver that works in pair with copy-before-write to make fleecing scheme.
+#
+# @source: source node of fleecing
+#
+# Since: 6.2
+##
+{ 'struct': 'BlockdevOptionsFleecing',
+  'base': 'BlockdevOptionsGenericFormat',
+  'data': { 'source': 'str' } }
+
 ##
 # @BlockdevOptions:
 #
@@ -4133,6 +4147,7 @@
   'copy-on-read':'BlockdevOptionsCor',
   'dmg':'BlockdevOptionsGenericFormat',
   'file':   'BlockdevOptionsFile',
+  'fleecing':   'BlockdevOptionsFleecing',
   'ftp':'BlockdevOptionsCurlFtp',
   'ftps':   'BlockdevOptionsCurlFtps',
   'gluster':'BlockdevOptionsGluster',
diff --git a/block/fleecing.h b/block/fleecing.h
index fb7b2f86c4..75ad2f8b19 100644
--- a/block/fleecing.h
+++ b/block/fleecing.h
@@ -80,6 +80,9 @@
 #include "block/block-copy.h"
 #include "block/reqlist.h"
 
+
+/* fleecing.c */
+
 typedef struct FleecingState FleecingState;
 
 /*
@@ -132,4 +135,17 @@ void fleecing_discard(FleecingState *f, int64_t offset, 
int64_t bytes);
 void fleecing_mark_done_and_wait_readers(FleecingState *f, int64_t offset,
  int64_t bytes);
 
+
+/* fleecing-drv.c */
+
+/* Returns true if @bs->drv is fleecing block driver */
+bool is_fleecing_drv(BlockDriverState *bs);
+
+/*
+ * Normally FleecingState is created by copy-before-write filter. Then
+ * copy-before-write filter calls fleecing_drv_activate() to share 
FleecingState
+ * with fleecing block driver.
+ */
+void fleecing_drv_activate(BlockDriverState *bs, FleecingState *fleecing);
+
 #endif /* FLEECING_H */
diff --git a/block/fleecing-drv.c b/block/fleecing-drv.c
new file mode 100644
index 00..9161e13809
--- /dev/null
+++ b/block/fleecing-drv.c
@@ -0,0 +1,260 @@
+/*
+ * fleecing block driver
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or 

[PATCH v2 08/19] block/reqlist: add reqlist_wait_all()

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Add function to wait for all intersecting requests.
To be used in the further commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/reqlist.h | 8 
 block/reqlist.c | 8 
 2 files changed, 16 insertions(+)

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
index b904d80216..4695623bb3 100644
--- a/include/block/reqlist.h
+++ b/include/block/reqlist.h
@@ -53,6 +53,14 @@ BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t 
offset,
 bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
int64_t bytes, CoMutex *lock);
 
+/*
+ * Wait for all intersecting requests. It just calls reqlist_wait_one() in a
+ * loops, caller is responsible to stop producing new requests in this region
+ * in parallel, otherwise reqlist_wait_all() may never return.
+ */
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+   int64_t bytes, CoMutex *lock);
+
 /*
  * Shrink request and wake all waiting coroutines (may be some of them are not
  * intersecting with shrunk request).
diff --git a/block/reqlist.c b/block/reqlist.c
index 5e320ba649..52a362a1d8 100644
--- a/block/reqlist.c
+++ b/block/reqlist.c
@@ -57,6 +57,14 @@ bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, 
int64_t offset,
 return true;
 }
 
+void coroutine_fn reqlist_wait_all(BlockReqList *reqs, int64_t offset,
+   int64_t bytes, CoMutex *lock)
+{
+while (reqlist_wait_one(reqs, offset, bytes, lock)) {
+/* continue */
+}
+}
+
 void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes)
 {
 if (new_bytes == req->bytes) {
-- 
2.29.2




[PATCH v2 11/19] block/copy-before-write: support fleecing block driver

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
The last step to make new fleecing scheme work (see block/fleecing.h
for descritption) is to update copy-before-write filter:

If we detect that unfiltered target child is fleecing block driver, we
do:
 - initialize shared FleecingState
 - activate fleecing block driver with it
 - do guest write synchronization with help of
   fleecing_mark_done_and_wait_readers() function

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index e3456ad6aa..686a085861 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -33,10 +33,13 @@
 #include "block/block-copy.h"
 
 #include "block/copy-before-write.h"
+#include "block/fleecing.h"
 
 typedef struct BDRVCopyBeforeWriteState {
 BlockCopyState *bcs;
 BdrvChild *target;
+
+FleecingState *fleecing;
 } BDRVCopyBeforeWriteState;
 
 static coroutine_fn int cbw_co_preadv(
@@ -50,6 +53,7 @@ static coroutine_fn int 
cbw_do_copy_before_write(BlockDriverState *bs,
 uint64_t offset, uint64_t bytes, BdrvRequestFlags flags)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
+int ret;
 uint64_t off, end;
 int64_t cluster_size = block_copy_cluster_size(s->bcs);
 
@@ -60,7 +64,16 @@ static coroutine_fn int 
cbw_do_copy_before_write(BlockDriverState *bs,
 off = QEMU_ALIGN_DOWN(offset, cluster_size);
 end = QEMU_ALIGN_UP(offset + bytes, cluster_size);
 
-return block_copy(s->bcs, off, end - off, true);
+ret = block_copy(s->bcs, off, end - off, true);
+if (ret < 0) {
+return ret;
+}
+
+if (s->fleecing) {
+fleecing_mark_done_and_wait_readers(s->fleecing, off, end - off);
+}
+
+return 0;
 }
 
 static int coroutine_fn cbw_co_pdiscard(BlockDriverState *bs,
@@ -149,6 +162,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 BdrvDirtyBitmap *bitmap = NULL;
+BlockDriverState *unfiltered_target;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -162,6 +176,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 if (!s->target) {
 return -EINVAL;
 }
+unfiltered_target = bdrv_skip_filters(s->target->bs);
 
 if (qdict_haskey(options, "bitmap.node") ||
 qdict_haskey(options, "bitmap.name"))
@@ -203,6 +218,14 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
+if (is_fleecing_drv(unfiltered_target)) {
+s->fleecing = fleecing_new(s->bcs, unfiltered_target, errp);
+if (!s->fleecing) {
+return -EINVAL;
+}
+fleecing_drv_activate(unfiltered_target, s->fleecing);
+}
+
 return 0;
 }
 
@@ -210,6 +233,8 @@ static void cbw_close(BlockDriverState *bs)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
 
+fleecing_free(s->fleecing);
+s->fleecing = NULL;
 block_copy_state_free(s->bcs);
 s->bcs = NULL;
 }
-- 
2.29.2




[PATCH v2 09/19] block: introduce FleecingState class

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
FleecingState represents state shared between copy-before-write filter
and upcoming fleecing block driver.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/fleecing.h  | 135 ++
 block/fleecing.c  | 182 ++
 MAINTAINERS   |   2 +
 block/meson.build |   1 +
 4 files changed, 320 insertions(+)
 create mode 100644 block/fleecing.h
 create mode 100644 block/fleecing.c

diff --git a/block/fleecing.h b/block/fleecing.h
new file mode 100644
index 00..fb7b2f86c4
--- /dev/null
+++ b/block/fleecing.h
@@ -0,0 +1,135 @@
+/*
+ * FleecingState
+ *
+ * The common state of image fleecing, shared between copy-before-write filter
+ * and fleecing block driver.
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see .
+ *
+ *
+ * Fleecing scheme looks as follows:
+ *
+ * [guest blk]   [nbd export]
+ *|  |
+ *|root  |
+ *v  v
+ * [copy-before-write]--target-->[fleecing drv]
+ *|  /   |
+ *|file /|file
+ *v/ v
+ * [active disk]<--source-/  [temp disk]
+ *
+ * Note that "active disk" is also called just "source" and "temp disk" is also
+ * called "target".
+ *
+ * What happens here:
+ *
+ * copy-before-write filter performs copy-before-write operations: on guest
+ * write we should copy old data to target child before rewriting. Note that we
+ * write this data through fleecing driver: it saves a possibility to implement
+ * a kind of cache in fleecing driver in future.
+ *
+ * Fleecing user is nbd export: it can read from fleecing node, which 
guarantees
+ * a snapshot-view for fleecing user. Fleecing user may also do discard
+ * operations.
+ *
+ * FleecingState is responsible for most of the fleecing logic:
+ *
+ * 1. Fleecing read. Handle reads of fleecing user: we should decide where from
+ * to read, from source node or from copy-before-write target node. In former
+ * case we need to synchronize with guest writes. See fleecing_read_lock() and
+ * fleecing_read_unlock() functionality.
+ *
+ * 2. Guest write synchronization (part of [1] actually). See
+ * fleecing_mark_done_and_wait_readers()
+ *
+ * 3. Fleecing discard. Used by fleecing user when corresponding area is 
already
+ * copied. Fleecing user may discard the area which is not needed anymore, that
+ * should result in:
+ *   - discarding data to free disk space
+ *   - clear bits in copy-bitmap of block-copy, to avoid extra 
copy-before-write
+ * operations
+ *   - clear bits in access-bitmap of FleecingState, to avoid further wrong
+ * access
+ *
+ * Still, FleecingState doesn't own any block children, so all real io
+ * operations (reads, writes and discards) are done by copy-before-write filter
+ * and fleecing block driver.
+ */
+
+#ifndef FLEECING_H
+#define FLEECING_H
+
+#include "block/block_int.h"
+#include "block/block-copy.h"
+#include "block/reqlist.h"
+
+typedef struct FleecingState FleecingState;
+
+/*
+ * Create FleecingState.
+ *
+ * @bcs: link to block-copy owned by copy-before-write filter.
+ *
+ * @fleecing_node: should be fleecing block driver node. Used to create some
+ * bitmaps in it.
+ */
+FleecingState *fleecing_new(BlockCopyState *bcs,
+BlockDriverState *fleecing_node,
+Error **errp);
+
+/* Free the state. Doesn't free block-copy state (@bcs) */
+void fleecing_free(FleecingState *s);
+
+/*
+ * Convenient function for thous who want to do fleecing read.
+ *
+ * If requested region starts in "done" area, i.e. data is already copied to
+ * copy-before-write target node, req is set to NULL, pnum is set to available
+ * bytes to read from target. User is free to read @pnum bytes from target.
+ * Still, user is responsible for concurrent discards on target.
+ *
+ * If requests region starts in "not done" area, i.e. we have to read from
+ * source node directly, than @pnum bytes of source node are frozen and
+ * guaranteed not be rewritten until user calls cbw_snapshot_read_unlock().
+ *
+ * Returns 0 on success and -EACCES when try to read non-dirty area of
+ * access_bitmap.
+ */
+int 

[PATCH v2 19/19] iotests/image-fleecing: test push backup with fleecing

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Add test for push backup with fleecing:

 - start fleecing with copy-before-write filter
 - start a backup job from temporary fleecing node to actual backup
   target

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/tests/image-fleecing | 121 ++--
 tests/qemu-iotests/tests/image-fleecing.out |  63 ++
 2 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index ab95e4960e..ed57358e5d 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,9 +48,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-def do_test(use_cbw, use_fleecing_filter, base_img_path,
-fleece_img_path, nbd_sock_path, vm,
+def do_test(vm, use_cbw, use_fleecing_filter, base_img_path,
+fleece_img_path, nbd_sock_path=None,
+target_img_path=None,
 bitmap=False):
+push_backup = target_img_path is not None
+assert (nbd_sock_path is not None) != push_backup
+if push_backup:
+assert use_cbw
+
 log('--- Setting up images ---')
 log('')
 
@@ -64,6 +70,9 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 else:
 assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
+if push_backup:
+assert qemu_img('create', '-f', 'qcow2', target_img_path, '64M') == 0
+
 for p in patterns:
 qemu_io('-f', iotests.imgfmt,
 '-c', 'write -P%s %s %s' % p, base_img_path)
@@ -139,27 +148,45 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 
 export_node = 'fl-fleecing' if use_fleecing_filter else tmp_node
 
-log('')
-log('--- Setting up NBD Export ---')
-log('')
+if push_backup:
+log('')
+log('--- Starting actual backup ---')
+log('')
 
-nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
-log(vm.qmp('nbd-server-start',
-   {'addr': { 'type': 'unix',
-  'data': { 'path': nbd_sock_path } } }))
+log(vm.qmp('blockdev-add', **{
+'driver': iotests.imgfmt,
+'node-name': 'target',
+'file': {
+'driver': 'file',
+'filename': target_img_path
+}
+}))
+log(vm.qmp('blockdev-backup', device=export_node,
+   sync='full', target='target',
+   immutable_source=True,
+   job_id='push-backup', speed=1))
+else:
+log('')
+log('--- Setting up NBD Export ---')
+log('')
 
-log(vm.qmp('nbd-server-add', device=export_node))
+nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
+log(vm.qmp('nbd-server-start',
+   {'addr': { 'type': 'unix',
+  'data': { 'path': nbd_sock_path } } }))
 
-log('')
-log('--- Sanity Check ---')
-log('')
+log(vm.qmp('nbd-server-add', device=export_node))
 
-for p in patterns + zeroes:
-cmd = 'read -P%s %s %s' % p
-log(cmd)
-out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, 
nbd_uri)
-if ret != 0:
-print(out)
+log('')
+log('--- Sanity Check ---')
+log('')
+
+for p in patterns + zeroes:
+cmd = 'read -P%s %s %s' % p
+log(cmd)
+out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, 
nbd_uri)
+if ret != 0:
+print(out)
 
 log('')
 log('--- Testing COW ---')
@@ -170,6 +197,20 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 log(cmd)
 log(vm.hmp_qemu_io(qom_path, cmd, qdev=True))
 
+if push_backup:
+# Check that previous operations were done during backup, not after
+result = vm.qmp('query-block-jobs')
+if len(result['return']) != 1:
+log('Backup finished too fast, COW is not tested')
+
+result = vm.qmp('block-job-set-speed', device='push-backup', speed=0)
+assert result == {'return': {}}
+
+log(vm.event_wait(name='BLOCK_JOB_COMPLETED',
+  match={'data': {'device': 'push-backup'}}),
+  filters=[iotests.filter_qmp_event])
+log(vm.qmp('blockdev-del', node_name='target'))
+
 log('')
 log('--- Verifying Data ---')
 log('')
@@ -177,15 +218,19 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 for p in patterns + zeroes:
 cmd = 'read -P%s %s %s' % p
 log(cmd)
-out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, 
nbd_uri)
-if ret != 0:
-print(out)
+if push_backup:
+assert 

[PATCH v2 13/19] block/copy-before-write: use write-unchanged in fleecing mode

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
As announced in previous commit, we need use write-unchanged operations
for fleecing, so that fleecing client may unshare writes if needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/copy-before-write.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 7e4e4bf7d4..91250cc634 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -132,6 +132,8 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
+BDRVCopyBeforeWriteState *s = bs->opaque;
+
 if (!(role & BDRV_CHILD_FILTERED)) {
 /*
  * Target child
@@ -142,7 +144,7 @@ static void cbw_child_perm(BlockDriverState *bs, BdrvChild 
*c,
  * only upfront.
  */
 *nshared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
-*nperm = BLK_PERM_WRITE;
+*nperm = s->fleecing ? BLK_PERM_WRITE_UNCHANGED : BLK_PERM_WRITE;
 } else {
 /* Source child */
 bdrv_default_perms(bs, c, role, reopen_queue,
@@ -212,7 +214,14 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, bitmap, false, errp);
+/*
+ * For fleecing scheme set parameter write_unchanged=true, as our
+ * copy-before-write operations will actually be write-unchanged. As well 
we
+ * take write-unchanged permission instead of write, which is important for
+ * backup with immutable_source=true to work as fleecing client.
+ */
+s->bcs = block_copy_state_new(bs->file, s->target, bitmap,
+  is_fleecing_drv(unfiltered_target), errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
-- 
2.29.2




[PATCH v2 00/19] Make image fleecing more usable

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Hi all!

That continues "[PATCH RFC DRAFT 00/11] Make image fleecing more usable"
and supersedes "[PATCH v2 for-6.2 0/6] push backup with fleecing"

Supersedes: <20210804131750.127574-1-vsement...@virtuozzo.com>
Supersedes: <20210721140424.163701-1-vsement...@virtuozzo.com>

There several improvements to fleecing scheme:

1. support bitmap in copy-before-write filter

2. introduce fleecing block driver, which opens the door for a lot of
   image fleecing improvements.
   See "block: introduce fleecing block driver" commit message for
   details.

3. support "push backup with fleecing" scheme, when backup job is a
   client of common fleecing scheme. That helps when writes to final
   backup target are slow and we don't want guest writes hang waiting
   for copy-before-write operations to final target.

Vladimir Sementsov-Ogievskiy (19):
  block/block-copy: move copy_bitmap initialization to
block_copy_state_new()
  block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  block/block-copy: block_copy_state_new(): add bitmap parameter
  block/copy-before-write: add bitmap open parameter
  block/block-copy: add block_copy_reset()
  block: intoduce reqlist
  block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  block/reqlist: add reqlist_wait_all()
  block: introduce FleecingState class
  block: introduce fleecing block driver
  block/copy-before-write: support fleecing block driver
  block/block-copy: add write-unchanged mode
  block/copy-before-write: use write-unchanged in fleecing mode
  iotests/image-fleecing: add test-case for fleecing format node
  iotests.py: add qemu_io_pipe_and_status()
  iotests/image-fleecing: add test case with bitmap
  block: blk_root(): return non-const pointer
  qapi: backup: add immutable-source parameter
  iotests/image-fleecing: test push backup with fleecing

 qapi/block-core.json|  39 ++-
 block/fleecing.h| 151 
 include/block/block-copy.h  |   4 +-
 include/block/block_int.h   |   1 +
 include/block/dirty-bitmap.h|   4 +-
 include/block/reqlist.h |  75 ++
 include/qemu/hbitmap.h  |  11 +
 include/sysemu/block-backend.h  |   2 +-
 block/backup.c  |  61 -
 block/block-backend.c   |   2 +-
 block/block-copy.c  | 157 +---
 block/copy-before-write.c   |  70 +-
 block/dirty-bitmap.c|  15 +-
 block/fleecing-drv.c| 260 
 block/fleecing.c| 182 ++
 block/monitor/bitmap-qmp-cmds.c |   5 +-
 block/replication.c |   2 +-
 block/reqlist.c |  84 +++
 blockdev.c  |   1 +
 util/hbitmap.c  |  36 +++
 MAINTAINERS |   7 +-
 block/meson.build   |   3 +
 tests/qemu-iotests/iotests.py   |   4 +
 tests/qemu-iotests/tests/image-fleecing | 178 +++---
 tests/qemu-iotests/tests/image-fleecing.out | 221 -
 25 files changed, 1420 insertions(+), 155 deletions(-)
 create mode 100644 block/fleecing.h
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/fleecing-drv.c
 create mode 100644 block/fleecing.c
 create mode 100644 block/reqlist.c

-- 
2.29.2




[PATCH v2 01/19] block/block-copy: move copy_bitmap initialization to block_copy_state_new()

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
We are going to complicate bitmap initialization in the further
commit. And in future, backup job will be able to work without filter
(when source is immutable), so we'll need same bitmap initialization in
copy-before-write filter and in backup job. So, it's reasonable to do
it in block-copy.

Note that for now cbw_open() is the only caller of
block_copy_state_new().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/block-copy.c| 1 +
 block/copy-before-write.c | 4 
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 37d804ec42..c39cb5fda7 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -401,6 +401,7 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 return NULL;
 }
 bdrv_disable_dirty_bitmap(copy_bitmap);
+bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
 
 /*
  * If source is in backing chain of target assume that target is going to 
be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 2a5e57deca..f5551cd15b 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -148,7 +148,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
-BdrvDirtyBitmap *copy_bitmap;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -176,9 +175,6 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
-copy_bitmap = block_copy_dirty_bitmap(s->bcs);
-bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
-
 return 0;
 }
 
-- 
2.29.2




[PATCH v2 03/19] block/block-copy: block_copy_state_new(): add bitmap parameter

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
This will be used in the following commit to bring "incremental" mode
to copy-before-write filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  2 +-
 block/block-copy.c | 14 --
 block/copy-before-write.c  |  2 +-
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 99370fa38b..8da4cec1b6 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,7 +25,7 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- Error **errp);
+ BdrvDirtyBitmap *bitmap, Error **errp);
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
diff --git a/block/block-copy.c b/block/block-copy.c
index c39cb5fda7..65019d0d1d 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -383,8 +383,9 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- Error **errp)
+ BdrvDirtyBitmap *bitmap, Error **errp)
 {
+ERRP_GUARD();
 BlockCopyState *s;
 int64_t cluster_size;
 BdrvDirtyBitmap *copy_bitmap;
@@ -401,7 +402,16 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 return NULL;
 }
 bdrv_disable_dirty_bitmap(copy_bitmap);
-bdrv_set_dirty_bitmap(copy_bitmap, 0, bdrv_dirty_bitmap_size(copy_bitmap));
+if (bitmap) {
+if (!bdrv_merge_dirty_bitmap(copy_bitmap, bitmap, NULL, errp)) {
+error_prepend(errp, "Failed to merge bitmap '%s' to internal "
+  "copy-bitmap: ", bdrv_dirty_bitmap_name(bitmap));
+return NULL;
+}
+} else {
+bdrv_set_dirty_bitmap(copy_bitmap, 0,
+  bdrv_dirty_bitmap_size(copy_bitmap));
+}
 
 /*
  * If source is in backing chain of target assume that target is going to 
be
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index f5551cd15b..d31ca6 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -169,7 +169,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
-- 
2.29.2




[PATCH v2 04/19] block/copy-before-write: add bitmap open parameter

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
This brings "incremental" mode to copy-before-write filter: user can
specify bitmap so that filter will copy only "dirty" areas.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json  | 10 +-
 block/copy-before-write.c | 30 +-
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6764d8b84f..c42d23752d 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4063,11 +4063,19 @@
 #
 # @target: The target for copy-before-write operations.
 #
+# @bitmap: If specified, copy-before-write filter will do
+#  copy-before-write operations only for dirty regions of the
+#  bitmap. Bitmap size must be equal to length of file and
+#  target child of the filter. Note also, that bitmap is used
+#  only to initialize internal bitmap of the process, so further
+#  modifications (or removing) of specified bitmap doesn't
+#  influence the filter.
+#
 # Since: 6.2
 ##
 { 'struct': 'BlockdevOptionsCbw',
   'base': 'BlockdevOptionsGenericFormat',
-  'data': { 'target': 'BlockdevRef' } }
+  'data': { 'target': 'BlockdevRef', '*bitmap': 'BlockDirtyBitmap' } }
 
 ##
 # @BlockdevOptions:
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index d31ca6..e3456ad6aa 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -148,6 +148,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 Error **errp)
 {
 BDRVCopyBeforeWriteState *s = bs->opaque;
+BdrvDirtyBitmap *bitmap = NULL;
 
 bs->file = bdrv_open_child(NULL, options, "file", bs, _of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
@@ -162,6 +163,33 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return -EINVAL;
 }
 
+if (qdict_haskey(options, "bitmap.node") ||
+qdict_haskey(options, "bitmap.name"))
+{
+const char *bitmap_node, *bitmap_name;
+
+if (!qdict_haskey(options, "bitmap.node")) {
+error_setg(errp, "bitmap.node is not specified");
+return -EINVAL;
+}
+
+if (!qdict_haskey(options, "bitmap.name")) {
+error_setg(errp, "bitmap.name is not specified");
+return -EINVAL;
+}
+
+bitmap_node = qdict_get_str(options, "bitmap.node");
+bitmap_name = qdict_get_str(options, "bitmap.name");
+qdict_del(options, "bitmap.node");
+qdict_del(options, "bitmap.name");
+
+bitmap = block_dirty_bitmap_lookup(bitmap_node, bitmap_name, NULL,
+   errp);
+if (!bitmap) {
+return -EINVAL;
+}
+}
+
 bs->total_sectors = bs->file->bs->total_sectors;
 bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
 (BDRV_REQ_FUA & bs->file->bs->supported_write_flags);
@@ -169,7 +197,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, NULL, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
-- 
2.29.2




[PATCH v2 06/19] block: intoduce reqlist

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Split intersecting-requests functionality out of block-copy to be
reused in copy-before-write filter.

Note: while being here, fix tiny typo in MAINTAINERS.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/reqlist.h |  67 +++
 block/block-copy.c  | 116 +---
 block/reqlist.c |  76 ++
 MAINTAINERS |   4 +-
 block/meson.build   |   1 +
 5 files changed, 184 insertions(+), 80 deletions(-)
 create mode 100644 include/block/reqlist.h
 create mode 100644 block/reqlist.c

diff --git a/include/block/reqlist.h b/include/block/reqlist.h
new file mode 100644
index 00..b904d80216
--- /dev/null
+++ b/include/block/reqlist.h
@@ -0,0 +1,67 @@
+/*
+ * reqlist API
+ *
+ * Copyright (C) 2013 Proxmox Server Solutions
+ * Copyright (c) 2021 Virtuozzo International GmbH.
+ *
+ * Authors:
+ *  Dietmar Maurer (diet...@proxmox.com)
+ *  Vladimir Sementsov-Ogievskiy 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef REQLIST_H
+#define REQLIST_H
+
+#include "qemu/coroutine.h"
+
+/*
+ * The API is not thread-safe and shouldn't be. The struct is public to be part
+ * of other structures and protected by third-party locks, see
+ * block/block-copy.c for example.
+ */
+
+typedef struct BlockReq {
+int64_t offset;
+int64_t bytes;
+
+CoQueue wait_queue; /* coroutines blocked on this req */
+QLIST_ENTRY(BlockReq) list;
+} BlockReq;
+
+typedef QLIST_HEAD(, BlockReq) BlockReqList;
+
+/*
+ * Initialize new request and add it to the list. Caller should be sure that
+ * there are no conflicting requests in the list.
+ */
+void reqlist_init_req(BlockReqList *reqs, BlockReq *req, int64_t offset,
+  int64_t bytes);
+/* Search for request in the list intersecting with @offset/@bytes area. */
+BlockReq *reqlist_find_conflict(BlockReqList *reqs, int64_t offset,
+int64_t bytes);
+
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ *
+ * @lock is passed to qemu_co_queue_wait()
+ * False return value proves that lock was NOT released.
+ */
+bool coroutine_fn reqlist_wait_one(BlockReqList *reqs, int64_t offset,
+   int64_t bytes, CoMutex *lock);
+
+/*
+ * Shrink request and wake all waiting coroutines (may be some of them are not
+ * intersecting with shrunk request).
+ */
+void coroutine_fn reqlist_shrink_req(BlockReq *req, int64_t new_bytes);
+
+/*
+ * Remove request and wake all waiting coroutines. Do not release any memory.
+ */
+void coroutine_fn reqlist_remove_req(BlockReq *req);
+
+#endif /* REQLIST_H */
diff --git a/block/block-copy.c b/block/block-copy.c
index ca51eab149..46e6a6736d 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -17,6 +17,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "block/block-copy.h"
+#include "block/reqlist.h"
 #include "sysemu/block-backend.h"
 #include "qemu/units.h"
 #include "qemu/coroutine.h"
@@ -82,7 +83,6 @@ typedef struct BlockCopyTask {
  */
 BlockCopyState *s;
 BlockCopyCallState *call_state;
-int64_t offset;
 /*
  * @method can also be set again in the while loop of
  * block_copy_dirty_clusters(), but it is never accessed concurrently
@@ -93,21 +93,17 @@ typedef struct BlockCopyTask {
 BlockCopyMethod method;
 
 /*
- * Fields whose state changes throughout the execution
- * Protected by lock in BlockCopyState.
+ * Generally, req is protected by lock in BlockCopyState, Still req.offset
+ * is only set on task creation, so may be read concurrently after 
creation.
+ * req.bytes is changed at most once, and need only protecting the case of
+ * parallel read while updating @bytes value in block_copy_task_shrink().
  */
-CoQueue wait_queue; /* coroutines blocked on this task */
-/*
- * Only protect the case of parallel read while updating @bytes
- * value in block_copy_task_shrink().
- */
-int64_t bytes;
-QLIST_ENTRY(BlockCopyTask) list;
+BlockReq req;
 } BlockCopyTask;
 
 static int64_t task_end(BlockCopyTask *task)
 {
-return task->offset + task->bytes;
+return task->req.offset + task->req.bytes;
 }
 
 typedef struct BlockCopyState {
@@ -135,7 +131,7 @@ typedef struct BlockCopyState {
 CoMutex lock;
 int64_t in_flight_bytes;
 BlockCopyMethod method;
-QLIST_HEAD(, BlockCopyTask) tasks; /* All tasks from all block-copy calls 
*/
+BlockReqList reqs;
 QLIST_HEAD(, BlockCopyCallState) calls;
 /*
  * skip_unallocated:
@@ -159,42 +155,6 @@ typedef struct BlockCopyState {
 RateLimit rate_limit;
 } BlockCopyState;
 
-/* Called with lock held */
-static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
-

[PATCH v2 16/19] iotests/image-fleecing: add test case with bitmap

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Note that reads zero areas (not dirty in the bitmap) fails, that's
correct.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/tests/image-fleecing | 32 ++--
 tests/qemu-iotests/tests/image-fleecing.out | 84 +
 2 files changed, 108 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index f18881fa71..ab95e4960e 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -23,7 +23,7 @@
 # Creator/Owner: John Snow 
 
 import iotests
-from iotests import log, qemu_img, qemu_io, qemu_io_silent
+from iotests import log, qemu_img, qemu_io, qemu_io_silent, 
qemu_io_pipe_and_status
 
 iotests.script_initialize(
 supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk', 'vhdx', 'raw'],
@@ -49,11 +49,15 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
 def do_test(use_cbw, use_fleecing_filter, base_img_path,
-fleece_img_path, nbd_sock_path, vm):
+fleece_img_path, nbd_sock_path, vm,
+bitmap=False):
 log('--- Setting up images ---')
 log('')
 
 assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
+if bitmap:
+assert qemu_img('bitmap', '--add', base_img_path, 'bitmap0') == 0
+
 if use_fleecing_filter:
 assert use_cbw
 assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
@@ -113,12 +117,17 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 'source': src_node,
 }))
 
-log(vm.qmp('blockdev-add', {
+fl_cbw = {
 'driver': 'copy-before-write',
 'node-name': 'fl-cbw',
 'file': src_node,
 'target': 'fl-fleecing' if use_fleecing_filter else tmp_node
-}))
+}
+
+if bitmap:
+fl_cbw['bitmap'] = {'node': src_node, 'name': 'bitmap0'}
+
+log(vm.qmp('blockdev-add', fl_cbw))
 
 log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
 else:
@@ -148,7 +157,9 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 for p in patterns + zeroes:
 cmd = 'read -P%s %s %s' % p
 log(cmd)
-assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, 
nbd_uri)
+if ret != 0:
+print(out)
 
 log('')
 log('--- Testing COW ---')
@@ -166,7 +177,9 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 for p in patterns + zeroes:
 cmd = 'read -P%s %s %s' % p
 log(cmd)
-assert qemu_io_silent('-r', '-f', 'raw', '-c', cmd, nbd_uri) == 0
+out, ret = qemu_io_pipe_and_status('-r', '-f', 'raw', '-c', cmd, 
nbd_uri)
+if ret != 0:
+print(out)
 
 log('')
 log('--- Cleanup ---')
@@ -201,14 +214,14 @@ def do_test(use_cbw, use_fleecing_filter, base_img_path,
 log('Done')
 
 
-def test(use_cbw, use_fleecing_filter):
+def test(use_cbw, use_fleecing_filter, bitmap=False):
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
  iotests.FilePath('nbd.sock',
   base_dir=iotests.sock_dir) as nbd_sock_path, \
  iotests.VM() as vm:
 do_test(use_cbw, use_fleecing_filter, base_img_path,
-fleece_img_path, nbd_sock_path, vm)
+fleece_img_path, nbd_sock_path, vm, bitmap=bitmap)
 
 
 log('=== Test backup(sync=none) based fleecing ===\n')
@@ -219,3 +232,6 @@ test(True, False)
 
 log('=== Test fleecing-format based fleecing ===\n')
 test(True, True)
+
+log('=== Test fleecing-format based fleecing with bitmap ===\n')
+test(True, True, bitmap=True)
diff --git a/tests/qemu-iotests/tests/image-fleecing.out 
b/tests/qemu-iotests/tests/image-fleecing.out
index da0af93388..62e1c1fe42 100644
--- a/tests/qemu-iotests/tests/image-fleecing.out
+++ b/tests/qemu-iotests/tests/image-fleecing.out
@@ -190,6 +190,90 @@ read -P0 0x00f8000 32k
 read -P0 0x201 32k
 read -P0 0x3fe 64k
 
+--- Cleanup ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Confirming writes ---
+
+read -P0xab 0 64k
+read -P0xad 0x00f8000 64k
+read -P0x1d 0x2008000 64k
+read -P0xea 0x3fe 64k
+read -P0xd5 0x108000 32k
+read -P0xdc 32M 32k
+read -P0xcd 0x3ff 64k
+
+Done
+=== Test fleecing-format based fleecing with bitmap ===
+
+--- Setting up images ---
+
+Done
+
+--- Launching VM ---
+
+Done
+
+--- Setting up Fleecing Graph ---
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"return": {}}
+
+--- Setting up NBD Export ---
+
+{"return": {}}
+{"return": {}}
+
+--- Sanity Check ---
+
+read -P0x5d 0 64k
+read -P0xd5 1M 64k
+read -P0xdc 32M 64k
+read -P0xcd 0x3ff 64k
+read -P0 0x00f8000 

[PATCH v2 14/19] iotests/image-fleecing: add test-case for fleecing format node

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/tests/image-fleecing | 67 +--
 tests/qemu-iotests/tests/image-fleecing.out | 74 -
 2 files changed, 121 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/tests/image-fleecing 
b/tests/qemu-iotests/tests/image-fleecing
index f6318492c6..f18881fa71 100755
--- a/tests/qemu-iotests/tests/image-fleecing
+++ b/tests/qemu-iotests/tests/image-fleecing
@@ -48,12 +48,17 @@ remainder = [('0xd5', '0x108000',  '32k'), # Right-end of 
partial-left [1]
  ('0xdc', '32M',   '32k'), # Left-end of partial-right [2]
  ('0xcd', '0x3ff', '64k')] # patterns[3]
 
-def do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm):
+def do_test(use_cbw, use_fleecing_filter, base_img_path,
+fleece_img_path, nbd_sock_path, vm):
 log('--- Setting up images ---')
 log('')
 
 assert qemu_img('create', '-f', iotests.imgfmt, base_img_path, '64M') == 0
-assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
+if use_fleecing_filter:
+assert use_cbw
+assert qemu_img('create', '-f', 'raw', fleece_img_path, '64M') == 0
+else:
+assert qemu_img('create', '-f', 'qcow2', fleece_img_path, '64M') == 0
 
 for p in patterns:
 qemu_io('-f', iotests.imgfmt,
@@ -80,24 +85,39 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('')
 
 
-# create tmp_node backed by src_node
-log(vm.qmp('blockdev-add', {
-'driver': 'qcow2',
-'node-name': tmp_node,
-'file': {
+if use_fleecing_filter:
+log(vm.qmp('blockdev-add', {
+'node-name': tmp_node,
 'driver': 'file',
 'filename': fleece_img_path,
-},
-'backing': src_node,
-}))
+}))
+else:
+# create tmp_node backed by src_node
+log(vm.qmp('blockdev-add', {
+'driver': 'qcow2',
+'node-name': tmp_node,
+'file': {
+'driver': 'file',
+'filename': fleece_img_path,
+},
+'backing': src_node,
+}))
 
 # Establish CBW from source to fleecing node
 if use_cbw:
+if use_fleecing_filter:
+log(vm.qmp('blockdev-add', {
+'driver': 'fleecing',
+'node-name': 'fl-fleecing',
+'file': tmp_node,
+'source': src_node,
+}))
+
 log(vm.qmp('blockdev-add', {
 'driver': 'copy-before-write',
 'node-name': 'fl-cbw',
 'file': src_node,
-'target': tmp_node
+'target': 'fl-fleecing' if use_fleecing_filter else tmp_node
 }))
 
 log(vm.qmp('qom-set', path=qom_path, property='drive', value='fl-cbw'))
@@ -108,16 +128,18 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
target=tmp_node,
sync='none'))
 
+export_node = 'fl-fleecing' if use_fleecing_filter else tmp_node
+
 log('')
 log('--- Setting up NBD Export ---')
 log('')
 
-nbd_uri = 'nbd+unix:///%s?socket=%s' % (tmp_node, nbd_sock_path)
+nbd_uri = 'nbd+unix:///%s?socket=%s' % (export_node, nbd_sock_path)
 log(vm.qmp('nbd-server-start',
{'addr': { 'type': 'unix',
   'data': { 'path': nbd_sock_path } } }))
 
-log(vm.qmp('nbd-server-add', device=tmp_node))
+log(vm.qmp('nbd-server-add', device=export_node))
 
 log('')
 log('--- Sanity Check ---')
@@ -150,16 +172,19 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('--- Cleanup ---')
 log('')
 
+log(vm.qmp('nbd-server-stop'))
+
 if use_cbw:
 log(vm.qmp('qom-set', path=qom_path, property='drive', value=src_node))
 log(vm.qmp('blockdev-del', node_name='fl-cbw'))
+if use_fleecing_filter:
+log(vm.qmp('blockdev-del', node_name='fl-fleecing'))
 else:
 log(vm.qmp('block-job-cancel', device='fleecing'))
 e = vm.event_wait('BLOCK_JOB_CANCELLED')
 assert e is not None
 log(e, filters=[iotests.filter_qmp_event])
 
-log(vm.qmp('nbd-server-stop'))
 log(vm.qmp('blockdev-del', node_name=tmp_node))
 vm.shutdown()
 
@@ -176,17 +201,21 @@ def do_test(use_cbw, base_img_path, fleece_img_path, 
nbd_sock_path, vm):
 log('Done')
 
 
-def test(use_cbw):
+def test(use_cbw, use_fleecing_filter):
 with iotests.FilePath('base.img') as base_img_path, \
  iotests.FilePath('fleece.img') as fleece_img_path, \
  iotests.FilePath('nbd.sock',
   base_dir=iotests.sock_dir) as nbd_sock_path, \
  iotests.VM() as vm:
-do_test(use_cbw, base_img_path, fleece_img_path, nbd_sock_path, vm)
+do_test(use_cbw, use_fleecing_filter, base_img_path,
+

[PATCH v2 02/19] block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
That simplifies handling failure in existing code and in further new
usage of bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h| 2 +-
 block/dirty-bitmap.c| 9 +++--
 block/monitor/bitmap-qmp-cmds.c | 5 +
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 40950ae3d5..f95d350b70 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -77,7 +77,7 @@ void bdrv_dirty_bitmap_set_persistence(BdrvDirtyBitmap 
*bitmap,
bool persistent);
 void bdrv_dirty_bitmap_set_inconsistent(BdrvDirtyBitmap *bitmap);
 void bdrv_dirty_bitmap_set_busy(BdrvDirtyBitmap *bitmap, bool busy);
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  HBitmap **backup, Error **errp);
 void bdrv_dirty_bitmap_skip_store(BdrvDirtyBitmap *bitmap, bool skip);
 bool bdrv_dirty_bitmap_get(BdrvDirtyBitmap *bitmap, int64_t offset);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 0ef46163e3..94a0276833 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -880,11 +880,14 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap 
*bitmap,
  * Ensures permissions on bitmaps are reasonable; use for public API.
  *
  * @backup: If provided, make a copy of dest here prior to merge.
+ *
+ * Returns true on success, false on failure. In case of failure bitmaps are
+ * untouched.
  */
-void bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
+bool bdrv_merge_dirty_bitmap(BdrvDirtyBitmap *dest, const BdrvDirtyBitmap *src,
  HBitmap **backup, Error **errp)
 {
-bool ret;
+bool ret = false;
 
 bdrv_dirty_bitmaps_lock(dest->bs);
 if (src->bs != dest->bs) {
@@ -912,6 +915,8 @@ out:
 if (src->bs != dest->bs) {
 bdrv_dirty_bitmaps_unlock(src->bs);
 }
+
+return ret;
 }
 
 /**
diff --git a/block/monitor/bitmap-qmp-cmds.c b/block/monitor/bitmap-qmp-cmds.c
index 9f11deec64..83970b22fa 100644
--- a/block/monitor/bitmap-qmp-cmds.c
+++ b/block/monitor/bitmap-qmp-cmds.c
@@ -259,7 +259,6 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, 
const char *target,
 BlockDriverState *bs;
 BdrvDirtyBitmap *dst, *src, *anon;
 BlockDirtyBitmapMergeSourceList *lst;
-Error *local_err = NULL;
 
 dst = block_dirty_bitmap_lookup(node, target, , errp);
 if (!dst) {
@@ -297,9 +296,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char *node, 
const char *target,
 abort();
 }
 
-bdrv_merge_dirty_bitmap(anon, src, NULL, _err);
-if (local_err) {
-error_propagate(errp, local_err);
+if (!bdrv_merge_dirty_bitmap(anon, src, NULL, errp)) {
 dst = NULL;
 goto out;
 }
-- 
2.29.2




[PATCH v2 12/19] block/block-copy: add write-unchanged mode

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
We are going to implement push backup with fleecing scheme. This means
that backup job will be a fleecing user and therefore will not need
separate copy-before-write filter. Instead it will consider source as
constant unchanged drive. Of course backup will want to unshare writes
on source for this case. But we want to do copy-before-write
operations. Still these operations may be considered as
write-unchanged. Add corresponding option to block-copy now, to use in
the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h | 3 ++-
 block/block-copy.c | 9 ++---
 block/copy-before-write.c  | 2 +-
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index a11e1620f6..a66f81d314 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -25,7 +25,8 @@ typedef struct BlockCopyState BlockCopyState;
 typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- BdrvDirtyBitmap *bitmap, Error **errp);
+ BdrvDirtyBitmap *bitmap,
+ bool write_unchanged, Error **errp);
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
diff --git a/block/block-copy.c b/block/block-copy.c
index 46e6a6736d..6d5d517ac6 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -279,7 +279,8 @@ void block_copy_set_copy_opts(BlockCopyState *s, bool 
use_copy_range,
   bool compress)
 {
 /* Keep BDRV_REQ_SERIALISING set (or not set) in block_copy_state_new() */
-s->write_flags = (s->write_flags & BDRV_REQ_SERIALISING) |
+s->write_flags = (s->write_flags &
+  (BDRV_REQ_SERIALISING | BDRV_REQ_WRITE_UNCHANGED)) |
 (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
 
 if (s->max_transfer < s->cluster_size) {
@@ -340,7 +341,8 @@ static int64_t 
block_copy_calculate_cluster_size(BlockDriverState *target,
 }
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- BdrvDirtyBitmap *bitmap, Error **errp)
+ BdrvDirtyBitmap *bitmap,
+ bool write_unchanged, Error **errp)
 {
 ERRP_GUARD();
 BlockCopyState *s;
@@ -393,7 +395,8 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
 .copy_bitmap = copy_bitmap,
 .cluster_size = cluster_size,
 .len = bdrv_dirty_bitmap_size(copy_bitmap),
-.write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0),
+.write_flags = (is_fleecing ? BDRV_REQ_SERIALISING : 0) |
+(write_unchanged ? BDRV_REQ_WRITE_UNCHANGED : 0),
 .mem = shres_create(BLOCK_COPY_MAX_MEM),
 .max_transfer = QEMU_ALIGN_DOWN(
 block_copy_max_transfer(source, target),
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 686a085861..7e4e4bf7d4 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -212,7 +212,7 @@ static int cbw_open(BlockDriverState *bs, QDict *options, 
int flags,
 ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
  bs->file->bs->supported_zero_flags);
 
-s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp);
+s->bcs = block_copy_state_new(bs->file, s->target, bitmap, false, errp);
 if (!s->bcs) {
 error_prepend(errp, "Cannot create block-copy-state: ");
 return -EINVAL;
-- 
2.29.2




[PATCH v2 07/19] block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Add a convenient function similar with bdrv_block_status() to get
status of dirty bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/dirty-bitmap.h |  2 ++
 include/qemu/hbitmap.h   | 11 +++
 block/dirty-bitmap.c |  6 ++
 util/hbitmap.c   | 36 
 4 files changed, 55 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index f95d350b70..2ae7dc3d1d 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -115,6 +115,8 @@ int64_t bdrv_dirty_bitmap_next_zero(BdrvDirtyBitmap 
*bitmap, int64_t offset,
 bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap *bitmap,
 int64_t start, int64_t end, int64_t max_dirty_count,
 int64_t *dirty_start, int64_t *dirty_count);
+void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+  int64_t bytes, bool *is_dirty, int64_t *count);
 BdrvDirtyBitmap *bdrv_reclaim_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
   Error **errp);
 
diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index 5e71b6d6f7..845fda12db 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -340,6 +340,17 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t 
start, int64_t end,
  int64_t max_dirty_count,
  int64_t *dirty_start, int64_t *dirty_count);
 
+/*
+ * bdrv_dirty_bitmap_status:
+ * @hb: The HBitmap to operate on
+ * @start: the offset to start from
+ * @end: end of requested area
+ * @is_dirty: is bitmap dirty at @offset
+ * @pnum: how many bits has same value starting from @offset
+ */
+void hbitmap_status(const HBitmap *hb, int64_t offset, int64_t bytes,
+bool *is_dirty, int64_t *pnum);
+
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 94a0276833..e4a836749a 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -875,6 +875,12 @@ bool bdrv_dirty_bitmap_next_dirty_area(BdrvDirtyBitmap 
*bitmap,
dirty_start, dirty_count);
 }
 
+void bdrv_dirty_bitmap_status(BdrvDirtyBitmap *bitmap, int64_t offset,
+  int64_t bytes, bool *is_dirty, int64_t *count)
+{
+hbitmap_status(bitmap->bitmap, offset, bytes, is_dirty, count);
+}
+
 /**
  * bdrv_merge_dirty_bitmap: merge src into dest.
  * Ensures permissions on bitmaps are reasonable; use for public API.
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 305b894a63..ae8d0eb4d2 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -301,6 +301,42 @@ bool hbitmap_next_dirty_area(const HBitmap *hb, int64_t 
start, int64_t end,
 return true;
 }
 
+void hbitmap_status(const HBitmap *hb, int64_t start, int64_t count,
+bool *is_dirty, int64_t *pnum)
+{
+int64_t next_dirty, next_zero;
+
+assert(start >= 0);
+assert(count > 0);
+assert(start + count <= hb->orig_size);
+
+next_dirty = hbitmap_next_dirty(hb, start, count);
+if (next_dirty == -1) {
+*pnum = count;
+*is_dirty = false;
+return;
+}
+
+if (next_dirty > start) {
+*pnum = next_dirty - start;
+*is_dirty = false;
+return;
+}
+
+assert(next_dirty == start);
+
+next_zero = hbitmap_next_zero(hb, start, count);
+if (next_zero == -1) {
+*pnum = count;
+*is_dirty = true;
+return;
+}
+
+assert(next_zero > start);
+*pnum = next_zero - start;
+*is_dirty = false;
+}
+
 bool hbitmap_empty(const HBitmap *hb)
 {
 return hb->count == 0;
-- 
2.29.2




[PATCH v2 05/19] block/block-copy: add block_copy_reset()

2021-08-27 Thread Vladimir Sementsov-Ogievskiy
Split block_copy_reset() out of block_copy_reset_unallocated() to be
used separately later.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  1 +
 block/block-copy.c | 21 +
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8da4cec1b6..a11e1620f6 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -34,6 +34,7 @@ void block_copy_set_progress_meter(BlockCopyState *s, 
ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
  int64_t offset, int64_t *count);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 65019d0d1d..ca51eab149 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -689,6 +689,18 @@ static int block_copy_is_cluster_allocated(BlockCopyState 
*s, int64_t offset,
 }
 }
 
+void block_copy_reset(BlockCopyState *s, int64_t offset, int64_t bytes)
+{
+QEMU_LOCK_GUARD(>lock);
+
+bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+if (s->progress) {
+progress_set_remaining(s->progress,
+   bdrv_get_dirty_count(s->copy_bitmap) +
+   s->in_flight_bytes);
+}
+}
+
 /*
  * Reset bits in copy_bitmap starting at offset if they represent unallocated
  * data in the image. May reset subsequent contiguous bits.
@@ -709,14 +721,7 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 bytes = clusters * s->cluster_size;
 
 if (!ret) {
-qemu_co_mutex_lock(>lock);
-bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-if (s->progress) {
-progress_set_remaining(s->progress,
-   bdrv_get_dirty_count(s->copy_bitmap) +
-   s->in_flight_bytes);
-}
-qemu_co_mutex_unlock(>lock);
+block_copy_reset(s, offset, bytes);
 }
 
 *count = bytes;
-- 
2.29.2




Re: [PATCH RFC server v2 01/11] vfio-user: build library

2021-08-27 Thread Jag Raman


> On Aug 27, 2021, at 1:53 PM, Jag Raman  wrote:
> 
> add the libvfio-user library as a submodule. build it as a cmake
> subproject.
> 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
> configure| 11 +++
> meson.build  | 28 
> .gitmodules  |  3 +++
> MAINTAINERS  |  7 +++
> hw/remote/meson.build|  2 ++
> subprojects/libvfio-user |  1 +
> 6 files changed, 52 insertions(+)
> create mode 16 subprojects/libvfio-user
> 
> diff --git a/configure b/configure
> index 9a79a00..794e900 100755
> --- a/configure
> +++ b/configure
> @@ -4291,6 +4291,17 @@ but not implemented on your system"
> fi
> 
> ##
> +# check for multiprocess
> +
> +case "$multiprocess" in
> +  auto | enabled )
> +if test "$git_submodules_action" != "ignore"; then
> +  git_submodules="${git_submodules} libvfio-user"
> +fi
> +;;
> +esac
> +
> +##
> # End of CC checks
> # After here, no more $cc or $ld runs
> 
> diff --git a/meson.build b/meson.build
> index bf63784..2b2d5c2 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1898,6 +1898,34 @@ if get_option('cfi') and slirp_opt == 'system'
>  + ' Please configure with --enable-slirp=git')
> endif
> 
> +vfiouser = not_found
> +if have_system and multiprocess_allowed
> +  have_internal = fs.exists(meson.current_source_dir() / 
> 'subprojects/libvfio-user/Makefile')
> +
> +  if not have_internal
> +error('libvfio-user source not found - please pull git submodule')
> +  endif
> +
> +  json_c = dependency('json-c', required: false)
> +if not json_c.found()
> +  json_c = dependency('libjson-c')
> +  endif

One of the things we’re wondering is about this json-c package that we need to 
build
libvfio-user library.

The gitlab runners typically don’t have this package installed, as such the 
gitlab builds
fail. Wondering if there's a way to install this package for all QEMU builds?

We checked out the various jobs defined in “.gitlab-ci.d/buildtest.yml” - there 
is a
“before_script” keyword which we could use to install this package. The 
“before_script”
keyword appears to be run every time before a job’s script is executed. But 
this option
appears to be per job/build. Wondering if there's a distro-independent global 
way to
install a required package for all builds.

Thank you!
--
Jag

> +
> +  cmake = import('cmake')
> +
> +  vfiouser_subproj = cmake.subproject('libvfio-user')
> +
> +  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
> +
> +  # Although cmake links the json-c library with vfio-user-static
> +  # target, that info is not available to meson via cmake.subproject.
> +  # As such, we have to separately declare the json-c dependency here.
> +  # This appears to be a current limitation of using cmake inside meson.
> +  # libvfio-user is planning a switch to meson in the future, which
> +  # would address this item automatically.
> +  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
> +endif
> +
> fdt = not_found
> fdt_opt = get_option('fdt')
> if have_system
> diff --git a/.gitmodules b/.gitmodules
> index 08b1b48..cfeea7c 100644
> --- a/.gitmodules
> +++ b/.gitmodules
> @@ -64,3 +64,6 @@
> [submodule "roms/vbootrom"]
>   path = roms/vbootrom
>   url = https://gitlab.com/qemu-project/vbootrom.git
> +[submodule "subprojects/libvfio-user"]
> + path = subprojects/libvfio-user
> + url = https://github.com/nutanix/libvfio-user.git
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4039d3c..0c5a18e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3361,6 +3361,13 @@ F: semihosting/
> F: include/semihosting/
> F: tests/tcg/multiarch/arm-compat-semi/
> 
> +libvfio-user Library
> +M: Thanos Makatos 
> +M: John Levon 
> +T: https://github.com/nutanix/libvfio-user.git
> +S: Maintained
> +F: subprojects/libvfio-user/*
> +
> Multi-process QEMU
> M: Elena Ufimtseva 
> M: Jagannathan Raman 
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index e6a5574..fb35fb8 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: 
> files('remote-obj.c'))
> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
> 
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
> +
> specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
> specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: 
> files('proxy-memory-listener.c'))
> 
> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
> new file mode 16
> index 000..647c934
> --- /dev/null
> +++ b/subprojects/libvfio-user
> @@ -0,0 +1 @@
> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
> -- 
> 1.8.3.1
> 



[PATCH RFC server v2 09/11] vfio-user: handle device interrupts

2021-08-27 Thread Jagannathan Raman
Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 include/hw/remote/iohub.h |  2 ++
 hw/remote/iohub.c |  5 +
 hw/remote/vfio-user-obj.c | 30 ++
 hw/remote/trace-events|  1 +
 4 files changed, 38 insertions(+)

diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
index 0bf98e0..d5bd0b0 100644
--- a/include/hw/remote/iohub.h
+++ b/include/hw/remote/iohub.h
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/thread-posix.h"
 #include "hw/remote/mpqemu-link.h"
+#include "libvfio-user.h"
 
 #define REMOTE_IOHUB_NB_PIRQSPCI_DEVFN_MAX
 
@@ -30,6 +31,7 @@ typedef struct RemoteIOHubState {
 unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
 ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
 QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+vfu_ctx_t *vfu_ctx[REMOTE_IOHUB_NB_PIRQS];
 } RemoteIOHubState;
 
 int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
index 547d597..9410233 100644
--- a/hw/remote/iohub.c
+++ b/hw/remote/iohub.c
@@ -18,6 +18,7 @@
 #include "hw/remote/machine.h"
 #include "hw/remote/iohub.h"
 #include "qemu/main-loop.h"
+#include "trace.h"
 
 void remote_iohub_init(RemoteIOHubState *iohub)
 {
@@ -62,6 +63,10 @@ void remote_iohub_set_irq(void *opaque, int pirq, int level)
 QEMU_LOCK_GUARD(>irq_level_lock[pirq]);
 
 if (level) {
+if (iohub->vfu_ctx[pirq]) {
+trace_vfu_interrupt(pirq);
+vfu_irq_trigger(iohub->vfu_ctx[pirq], 0);
+}
 if (++iohub->irq_level[pirq] == 1) {
 event_notifier_set(>irqfds[pirq]);
 }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 299c938..92605ed 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -42,6 +42,9 @@
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "hw/boards.h"
+#include "hw/remote/iohub.h"
+#include "hw/remote/machine.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -313,6 +316,26 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, 
PCIDevice *pdev)
 }
 }
 
+static int vfu_object_setup_irqs(vfu_ctx_t *vfu_ctx, PCIDevice *pci_dev)
+{
+RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+RemoteIOHubState *iohub = >iohub;
+int pirq, intx, ret;
+
+ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+if (ret < 0) {
+return ret;
+}
+
+intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+pirq = remote_iohub_map_irq(pci_dev, intx);
+
+iohub->vfu_ctx[pirq] = vfu_ctx;
+
+return 0;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
 VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -371,6 +394,13 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
 
 vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+ret = vfu_object_setup_irqs(o->vfu_ctx, o->pci_dev);
+if (ret < 0) {
+error_setg(_abort, "vfu: Failed to setup interrupts for %s",
+   o->devid);
+return;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f3f65e2..b419d6f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -11,3 +11,4 @@ vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering 
GPA 0x%"PRIx64", %z
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR 
address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR 
address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
-- 
1.8.3.1




[PATCH] Report any problems with loading the VGA driver for PPC Macintosh targets

2021-08-27 Thread John Arbuckle
I was having a problem with missing video resolutions in my Mac OS 9 VM. When I
ran QEMU it gave no indication as to why these resolutions were missing. I found
out that the OpenFirmware VGA driver was not being loaded. To prevent anyone 
from
going thru the same trouble I went thru I added messages that the user can see
when a problem takes place with loading this driver in the future.

Signed-off-by: John Arbuckle 
---
 hw/ppc/mac_newworld.c | 6 ++
 hw/ppc/mac_oldworld.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 7bb7ac3997..c1960452b8 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -526,8 +526,14 @@ static void ppc_core99_init(MachineState *machine)
 
 if (g_file_get_contents(filename, _file, _size, NULL)) {
 fw_cfg_add_file(fw_cfg, "ndrv/qemu_vga.ndrv", ndrv_file, 
ndrv_size);
+} else {
+printf("Warning: failed to load driver %s. This may cause video"
+   " problems.\n");
 }
 g_free(filename);
+} else {
+printf("Warning: driver %s not found. This may cause video 
problems.\n",
+   NDRV_VGA_FILENAME);
 }
 
 qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index de2be960e6..96603fe9cf 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -367,8 +367,14 @@ static void ppc_heathrow_init(MachineState *machine)
 
 if (g_file_get_contents(filename, _file, _size, NULL)) {
 fw_cfg_add_file(fw_cfg, "ndrv/qemu_vga.ndrv", ndrv_file, 
ndrv_size);
+} else {
+printf("Warning: failed to load driver %s. This may cause video"
+   " problems.\n");
 }
 g_free(filename);
+} else {
+printf("Warning: driver %s not found. This may cause video 
problems.\n",
+   NDRV_VGA_FILENAME);
 }
 
 qemu_register_boot_set(fw_cfg_boot_set, fw_cfg);
-- 
2.24.3 (Apple Git-128)




Re: [PATCH v2 0/2] hw/arm/raspi: Remove deprecated raspi2/raspi3 aliases

2021-08-27 Thread Willian Rampazzo
Hi, Phil,

On Thu, Aug 26, 2021 at 1:49 PM Philippe Mathieu-Daudé  wrote:
>
> Hi Peter,
>
> On 7/9/21 6:00 PM, Peter Maydell wrote:
> > On Fri, 9 Jul 2021 at 16:33, Peter Maydell  wrote:
> >>
> >> On Thu, 8 Jul 2021 at 15:55, Philippe Mathieu-Daudé  
> >> wrote:
> >>>
> >>> Since v1:
> >>> - renamed tests (Peter)
> >>>
> >>> Philippe Mathieu-Daudé (2):
> >>>   tests: Remove uses of deprecated raspi2/raspi3 machine names
> >>>   hw/arm/raspi: Remove deprecated raspi2/raspi3 aliases
> >>
> >>
> >>
> >> Applied to target-arm.next, thanks.
> >
> > I found that this seems to break 'make check':
> >
> >  (09/52) 
> > tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_raspi2_initrd:
> > ERROR: Could not perform graceful shutdown (40.38 s)
>
> I can not reproduce. Maybe something got changed in Python/Avocado
> since... I'm going to simply respin (updating 6.1 -> 6.2).

I also could not reproduce. I checked and nothing changed on the
Avocado side. The version is still the same on the QEMU side.

>
> >
> > Dropped from target-arm.next.
> >
> > thanks
> > -- PMM
> >
>




[PATCH RFC server v2 08/11] vfio-user: handle PCI BAR accesses

2021-08-27 Thread Jagannathan Raman
Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 95 +++
 hw/remote/trace-events|  2 +
 2 files changed, 97 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 76fb2d4..299c938 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -220,6 +220,99 @@ static int dma_unregister(vfu_ctx_t *vfu_ctx, 
vfu_dma_info_t *info)
 return 0;
 }
 
+static ssize_t vfu_object_bar_rw(PCIDevice *pci_dev, hwaddr addr, size_t count,
+ char * const buf, const bool is_write,
+ uint8_t type)
+{
+AddressSpace *as = NULL;
+MemTxResult res;
+
+if (type == PCI_BASE_ADDRESS_SPACE_MEMORY) {
+as = pci_device_iommu_address_space(pci_dev);
+} else {
+as = _space_io;
+}
+
+trace_vfu_bar_rw_enter(is_write ? "Write" : "Read", (uint64_t)addr);
+
+res = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, (void *)buf,
+   (hwaddr)count, is_write);
+if (res != MEMTX_OK) {
+warn_report("vfu: failed to %s 0x%"PRIx64"",
+is_write ? "write to" : "read from",
+addr);
+return -1;
+}
+
+trace_vfu_bar_rw_exit(is_write ? "Write" : "Read", (uint64_t)addr);
+
+return count;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO) 
\
+static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,
\
+char * const buf, size_t count,
\
+loff_t offset, const bool is_write)
\
+{  
\
+VfuObject *o = vfu_get_private(vfu_ctx);   
\
+hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config +   
\
+PCI_BASE_ADDRESS_0 +   
\
+(4 * BAR_NO)) + offset);   
\
+   
\
+return vfu_object_bar_rw(o->pci_dev, addr, count, buf, is_write,   
\
+ o->pci_dev->io_regions[BAR_NO].type); 
\
+}  
\
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+_object_bar0_handler,
+_object_bar1_handler,
+_object_bar2_handler,
+_object_bar3_handler,
+_object_bar4_handler,
+_object_bar5_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+uint32_t orig_val, new_val;
+int i, size;
+
+for (i = 0; i < PCI_NUM_REGIONS; i++) {
+orig_val = pci_default_read_config(pdev,
+   PCI_BASE_ADDRESS_0 + (4 * i), 4);
+new_val = 0x;
+pci_default_write_config(pdev,
+ PCI_BASE_ADDRESS_0 + (4 * i), new_val, 4);
+new_val = pci_default_read_config(pdev,
+  PCI_BASE_ADDRESS_0 + (4 * i), 4);
+size = (~(new_val & 0xFFF0)) + 1;
+pci_default_write_config(pdev, PCI_BASE_ADDRESS_0 + (4 * i),
+ orig_val, 4);
+if (size) {
+vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i, size,
+ vfu_object_bar_handlers[i], VFU_REGION_FLAG_RW,
+ NULL, 0, -1, 0);
+}
+}
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
 VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -276,6 +369,8 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
 return;
 }
 
+vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e..f3f65e2 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,5 @@ vfu_cfg_read(uint32_t 

[PATCH RFC server v2 06/11] vfio-user: handle PCI config space accesses

2021-08-27 Thread Jagannathan Raman
Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 44 
 hw/remote/trace-events|  2 ++
 2 files changed, 46 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 0726eb9..13011ce 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -36,6 +36,7 @@
 #include "sysemu/runstate.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
@@ -144,6 +145,38 @@ retry_attach:
 return NULL;
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+ size_t count, loff_t offset,
+ const bool is_write)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+uint32_t pci_access_width = sizeof(uint32_t);
+size_t bytes = count;
+uint32_t val = 0;
+char *ptr = buf;
+int len;
+
+while (bytes > 0) {
+len = (bytes > pci_access_width) ? pci_access_width : bytes;
+if (is_write) {
+memcpy(, ptr, len);
+pci_default_write_config(PCI_DEVICE(o->pci_dev),
+ offset, val, len);
+trace_vfu_cfg_write(offset, val);
+} else {
+val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
+  offset, len);
+memcpy(ptr, , len);
+trace_vfu_cfg_read(offset, val);
+}
+offset += len;
+ptr += len;
+bytes -= len;
+}
+
+return count;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
 VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -182,6 +215,17 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
 return;
 }
 
+ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+   pci_config_size(o->pci_dev), _object_cfg_access,
+   VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+   NULL, 0, -1, 0);
+if (ret < 0) {
+error_setg(_abort,
+   "vfu: Failed to setup config space handlers for %s- %s",
+   o->devid, strerror(errno));
+return;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0..2ef7884 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to 
receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
1.8.3.1




Re: [PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-08-27 Thread Eric Blake
On Fri, Aug 27, 2021 at 05:48:10PM +0100, Richard W.M. Jones wrote:
> On Fri, Aug 27, 2021 at 10:09:16AM -0500, Eric Blake wrote:
> > +# Parallel client connections are easier with libnbd than qemu-io:
> > +if ! (nbdsh --version) >/dev/null 2>&1; then
> 
> I'm curious why the parentheses are needed here?

Habit - some earlier Bourne shells would leak an error message if
nbdsh was not found unless you guarantee that stderr is redirected
first by using the subshell (or maybe that's what happen when you do
feature tests of a shell builtin, rather than an external app).  But
since this test uses bash, you're right that it's not necessary in
this instance, so I'll simplify.

> 
> > +export nbd_unix_socket
> > +nbdsh -c '
> > +import os
> > +sock = os.getenv("nbd_unix_socket")
> > +h = []
> > +
> > +for i in range(3):
> > +  h.append(nbd.NBD())
> > +  h[i].connect_unix(sock)
> > +  assert h[i].can_multi_conn()
> > +
> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 != b"\x01" * 1024 * 1024:
> > +  print("Unexpected initial read")
> > +buf2 = b"\x03" * 1024 * 1024
> > +h[1].pwrite(buf2, 0)   #1
> > +h[2].flush()
> > +buf1 = h[0].pread(1024 * 1024, 0)
> > +if buf1 == buf2:
> > +  print("Flush appears to be consistent across connections")
> 
> The test is ... simple.
> 
> Would it be a better test if the statement at line #1 above was
> h.aio_pwrite instead, so the operation is only started?  This would
> depend on some assumptions inside libnbd: That the h[1] socket is
> immediately writable and that libnbd's statement will write before
> returning, which are likely to be true, but perhaps you could do
> something with parsing libnbd debug statements to check that the state
> machine has started the write.

The rules on consistent operations are tricky - it is not enough that
the client started the request in order, but that the server processed
things in order.  Even though you can have two clients in one process
and enough happens-between code in that you are sure that client A
sent NBD_CMD_WRITE prior to client B sending NBD_CMD_FLUSH, you do NOT
have enough power over TCP to prove that the server did not receive
client B's request first, unless you ALSO ensure that client A waited
for the server's response to the NBD_CMD_WRITE.  Similarly for client
C sending NBD_CMD_READ prior to client B getting the server's response
to NBD_CMD_FLUSH.  As soon as out-of-order request processing can
happen (which is more likely when you have parallel sockets), you can
only guarantee the cross-client consistency if you can also guarantee
which replies were received prior to sending a given request.  Using
asynch I/O does not provide those guarantees; the only reliable way to
test cross-client consistency is with synchronous commands.

Hmm - I should probably tweak the qemu-nbd command to set an explicit
cache mode (if it accidentally settles on cache=writethrough, that
defeats the point of an independent client's flush impacting all other
clients, since the client doing the write will also be flushing).

> 
> Another idea would be to make the write at line #1 be very large, so
> that perhaps the qemu block layer would split it up.  After the flush
> you would read the beginning and end bytes of the written part to
> approximate a check that the whole write has been flushed so parts of
> it are not hanging around in the cache of the first connection.
> 
> Anyway the patch as written seems fine to me so:
> 
> Reviewed-by: Richard W.M. Jones 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




[PATCH RFC server v2 10/11] vfio-user: register handlers to facilitate migration

2021-08-27 Thread Jagannathan Raman
Store and load the device's state during migration. use libvfio-user's
handlers for this purpose

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 migration/savevm.h|   2 +
 hw/remote/vfio-user-obj.c | 313 ++
 migration/savevm.c|  73 +++
 3 files changed, 388 insertions(+)

diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342..8007064 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, 
MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 92605ed..16cf515 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -45,6 +45,10 @@
 #include "hw/boards.h"
 #include "hw/remote/iohub.h"
 #include "hw/remote/machine.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/global_state.h"
+#include "block/block.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -72,6 +76,33 @@ struct VfuObject {
 PCIDevice *pci_dev;
 
 int vfu_poll_fd;
+
+/*
+ * vfu_mig_buf holds the migration data. In the remote server, this
+ * buffer replaces the role of an IO channel which links the source
+ * and the destination.
+ *
+ * Whenever the client QEMU process initiates migration, the remote
+ * server gets notified via libvfio-user callbacks. The remote server
+ * sets up a QEMUFile object using this buffer as backend. The remote
+ * server passes this object to its migration subsystem, which slurps
+ * the VMSD of the device ('devid' above) referenced by this object
+ * and stores the VMSD in this buffer.
+ *
+ * The client subsequetly asks the remote server for any data that
+ * needs to be moved over to the destination via libvfio-user
+ * library's vfu_migration_callbacks_t callbacks. The remote hands
+ * over this buffer as data at this time.
+ *
+ * A reverse of this process happens at the destination.
+ */
+uint8_t *vfu_mig_buf;
+
+uint64_t vfu_mig_buf_size;
+
+uint64_t vfu_mig_buf_pending;
+
+QEMUFile *vfu_mig_file;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -96,6 +127,250 @@ static void vfu_object_set_devid(Object *obj, const char 
*str, Error **errp)
 trace_vfu_prop("devid", str);
 }
 
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
+ * call these functions via QEMUFileOps to load/save the VMSD of a
+ * device into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+size_t size, Error **errp)
+{
+VfuObject *o = opaque;
+
+if (pos > o->vfu_mig_buf_size) {
+size = 0;
+} else if ((pos + size) > o->vfu_mig_buf_size) {
+size = o->vfu_mig_buf_size;
+}
+
+memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+o->vfu_mig_buf_size -= size;
+
+return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+ int64_t pos, Error **errp)
+{
+VfuObject *o = opaque;
+uint64_t end = pos + iov_size(iov, iovcnt);
+int i;
+
+if (end > o->vfu_mig_buf_size) {
+o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+}
+
+for (i = 0; i < iovcnt; i++) {
+memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+   iov[i].iov_len);
+o->vfu_mig_buf_size += iov[i].iov_len;
+o->vfu_mig_buf_pending += iov[i].iov_len;
+}
+
+return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+VfuObject *o = opaque;
+
+o->vfu_mig_buf_size = 0;
+
+g_free(o->vfu_mig_buf);
+
+return 0;
+}
+
+static const QEMUFileOps vfu_mig_fops_save = {
+.writev_buffer  = vfu_mig_buf_write,
+.shut_down  = vfu_mig_buf_shutdown,
+};
+
+static const QEMUFileOps vfu_mig_fops_load = {
+.get_buffer = vfu_mig_buf_read,
+.shut_down  = vfu_mig_buf_shutdown,
+};
+
+/**
+ * handlers for vfu_migration_callbacks_t
+ *
+ * The libvfio-user library accesses these handlers to drive the migration
+ * at the remote end, and also to transport the data stored in vfu_mig_buf
+ *
+ */
+static void vfu_mig_state_precopy(vfu_ctx_t *vfu_ctx)
+{
+VfuObject *o = vfu_get_private(vfu_ctx);
+int ret;
+
+if (!o->vfu_mig_file) {
+o->vfu_mig_file = qemu_fopen_ops(o, 

[PATCH RFC server v2 05/11] vfio-user: run vfio-user context

2021-08-27 Thread Jagannathan Raman
Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 71 ++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 5ae0991..0726eb9 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -35,6 +35,7 @@
 #include "trace.h"
 #include "sysemu/runstate.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
@@ -65,6 +66,8 @@ struct VfuObject {
 vfu_ctx_t *vfu_ctx;
 
 PCIDevice *pci_dev;
+
+int vfu_poll_fd;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -89,13 +92,67 @@ static void vfu_object_set_devid(Object *obj, const char 
*str, Error **errp)
 trace_vfu_prop("devid", str);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+VfuObject *o = opaque;
+int ret = -1;
+
+while (ret != 0) {
+ret = vfu_run_ctx(o->vfu_ctx);
+if (ret < 0) {
+if (errno == EINTR) {
+continue;
+} else if (errno == ENOTCONN) {
+qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+o->vfu_poll_fd = -1;
+object_unparent(OBJECT(o));
+break;
+} else {
+error_setg(_abort, "vfu: Failed to run device %s - %s",
+   o->devid, strerror(errno));
+ break;
+}
+}
+}
+}
+
+static void *vfu_object_attach_ctx(void *opaque)
+{
+VfuObject *o = opaque;
+int ret;
+
+retry_attach:
+ret = vfu_attach_ctx(o->vfu_ctx);
+if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+goto retry_attach;
+} else if (ret < 0) {
+error_setg(_abort,
+   "vfu: Failed to attach device %s to context - %s",
+   o->devid, strerror(errno));
+return NULL;
+}
+
+o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+if (o->vfu_poll_fd < 0) {
+error_setg(_abort, "vfu: Failed to get poll fd %s", o->devid);
+return NULL;
+}
+
+qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run,
+NULL, o);
+
+return NULL;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
 VfuObject *o = container_of(notifier, VfuObject, machine_done);
 DeviceState *dev = NULL;
+QemuThread thread;
 int ret;
 
-o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
+o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket,
+LIBVFIO_USER_FLAG_ATTACH_NB,
 o, VFU_DEV_TYPE_PCI);
 if (o->vfu_ctx == NULL) {
 error_setg(_abort, "vfu: Failed to create context - %s",
@@ -124,6 +181,16 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
o->devid, strerror(errno));
 return;
 }
+
+ret = vfu_realize_ctx(o->vfu_ctx);
+if (ret < 0) {
+error_setg(_abort, "vfu: Failed to realize device %s- %s",
+   o->devid, strerror(errno));
+return;
+}
+
+qemu_thread_create(, o->socket, vfu_object_attach_ctx, o,
+   QEMU_THREAD_DETACHED);
 }
 
 static void vfu_object_init(Object *obj)
@@ -147,6 +214,8 @@ static void vfu_object_init(Object *obj)
 
 o->machine_done.notify = vfu_object_machine_done;
 qemu_add_machine_init_done_notifier(>machine_done);
+
+o->vfu_poll_fd = -1;
 }
 
 static void vfu_object_finalize(Object *obj)
-- 
1.8.3.1




[PATCH RFC server v2 07/11] vfio-user: handle DMA mappings

2021-08-27 Thread Jagannathan Raman
Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 50 +++
 hw/remote/trace-events|  2 ++
 2 files changed, 52 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 13011ce..76fb2d4 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -177,6 +177,49 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, 
char * const buf,
 return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+MemoryRegion *subregion = NULL;
+g_autofree char *name = NULL;
+static unsigned int suffix;
+struct iovec *iov = >iova;
+
+if (!info->vaddr) {
+return;
+}
+
+name = g_strdup_printf("remote-mem-%u", suffix++);
+
+subregion = g_new0(MemoryRegion, 1);
+
+memory_region_init_ram_ptr(subregion, NULL, name,
+   iov->iov_len, info->vaddr);
+
+memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
+subregion);
+
+trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+MemoryRegion *mr = NULL;
+ram_addr_t offset;
+
+mr = memory_region_from_host(info->vaddr, );
+if (!mr) {
+return 0;
+}
+
+memory_region_del_subregion(get_system_memory(), mr);
+
+object_unparent((OBJECT(mr)));
+
+trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+
+return 0;
+}
+
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
 VfuObject *o = container_of(notifier, VfuObject, machine_done);
@@ -226,6 +269,13 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
 return;
 }
 
+ret = vfu_setup_device_dma(o->vfu_ctx, _register, _unregister);
+if (ret < 0) {
+error_setg(_abort, "vfu: Failed to setup DMA handlers for %s",
+   o->devid);
+return;
+}
+
 ret = vfu_realize_ctx(o->vfu_ctx);
 if (ret < 0) {
 error_setg(_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884..f945c7e 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to 
receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", 
%zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
1.8.3.1




[PATCH RFC server v2 04/11] vfio-user: find and init PCI device

2021-08-27 Thread Jagannathan Raman
Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 99d3dd1..5ae0991 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -38,6 +38,8 @@
 #include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -61,6 +63,8 @@ struct VfuObject {
 Notifier machine_done;
 
 vfu_ctx_t *vfu_ctx;
+
+PCIDevice *pci_dev;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -88,6 +92,8 @@ static void vfu_object_set_devid(Object *obj, const char 
*str, Error **errp)
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
 VfuObject *o = container_of(notifier, VfuObject, machine_done);
+DeviceState *dev = NULL;
+int ret;
 
 o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
 o, VFU_DEV_TYPE_PCI);
@@ -96,6 +102,28 @@ static void vfu_object_machine_done(Notifier *notifier, 
void *data)
strerror(errno));
 return;
 }
+
+dev = qdev_find_recursive(sysbus_get_default(), o->devid);
+if (dev == NULL) {
+error_setg(_abort, "vfu: Device %s not found", o->devid);
+return;
+}
+
+if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+error_setg(_abort, "vfu: %s not a PCI devices", o->devid);
+return;
+}
+
+o->pci_dev = PCI_DEVICE(dev);
+
+ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
+   PCI_HEADER_TYPE_NORMAL, 0);
+if (ret < 0) {
+error_setg(_abort,
+   "vfu: Failed to attach PCI device %s to context - %s",
+   o->devid, strerror(errno));
+return;
+}
 }
 
 static void vfu_object_init(Object *obj)
-- 
1.8.3.1




[PATCH RFC server v2 03/11] vfio-user: instantiate vfio-user context

2021-08-27 Thread Jagannathan Raman
create a context with the vfio-user library to run a PCI device

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 hw/remote/vfio-user-obj.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 4a1e297..99d3dd1 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,11 +27,17 @@
 #include "qemu/osdep.h"
 #include "qemu-common.h"
 
+#include 
+
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 #include "sysemu/runstate.h"
+#include "qemu/notify.h"
+#include "qapi/error.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "vfio-user"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -51,6 +57,10 @@ struct VfuObject {
 
 char *socket;
 char *devid;
+
+Notifier machine_done;
+
+vfu_ctx_t *vfu_ctx;
 };
 
 static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
@@ -75,9 +85,23 @@ static void vfu_object_set_devid(Object *obj, const char 
*str, Error **errp)
 trace_vfu_prop("devid", str);
 }
 
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+VfuObject *o = container_of(notifier, VfuObject, machine_done);
+
+o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
+o, VFU_DEV_TYPE_PCI);
+if (o->vfu_ctx == NULL) {
+error_setg(_abort, "vfu: Failed to create context - %s",
+   strerror(errno));
+return;
+}
+}
+
 static void vfu_object_init(Object *obj)
 {
 VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+VfuObject *o = VFU_OBJECT(obj);
 
 if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
 error_report("vfu: %s only compatible with %s machine",
@@ -92,6 +116,9 @@ static void vfu_object_init(Object *obj)
 }
 
 k->nr_devs++;
+
+o->machine_done.notify = vfu_object_machine_done;
+qemu_add_machine_init_done_notifier(>machine_done);
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -101,6 +128,8 @@ static void vfu_object_finalize(Object *obj)
 
 k->nr_devs--;
 
+vfu_destroy_ctx(o->vfu_ctx);
+
 g_free(o->socket);
 g_free(o->devid);
 
-- 
1.8.3.1




[PATCH RFC server v2 11/11] vfio-user: acceptance test

2021-08-27 Thread Jagannathan Raman
Acceptance test for libvfio-user in QEMU

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 MAINTAINERS   |  1 +
 tests/acceptance/vfio-user.py | 94 +++
 2 files changed, 95 insertions(+)
 create mode 100644 tests/acceptance/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index f9d8092..2c7332b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3392,6 +3392,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: hw/remote/vfio-user-obj.c
+F: tests/acceptance/vfio-user.py
 
 EBPF:
 M: Jason Wang 
diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
new file mode 100644
index 000..ef318d9
--- /dev/null
+++ b/tests/acceptance/vfio-user.py
@@ -0,0 +1,94 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import Test
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+class VfioUser(Test):
+"""
+:avocado: tags=vfiouser
+"""
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+
+def do_test(self, kernel_url, initrd_url, kernel_command_line,
+machine_type):
+"""Main test method"""
+self.require_accelerator('kvm')
+
+kernel_path = self.fetch_asset(kernel_url)
+initrd_path = self.fetch_asset(initrd_url)
+
+socket = os.path.join('/tmp', str(uuid.uuid4()))
+if os.path.exists(socket):
+os.remove(socket)
+
+# Create remote process
+remote_vm = self.get_vm()
+remote_vm.add_args('-machine', 'x-remote')
+remote_vm.add_args('-nodefaults')
+remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
+   'devid=lsi1,socket='+socket)
+remote_vm.launch()
+
+# Create proxy process
+self.vm.set_console()
+self.vm.add_args('-machine', machine_type)
+self.vm.add_args('-accel', 'kvm')
+self.vm.add_args('-cpu', 'host')
+self.vm.add_args('-object',
+ 'memory-backend-memfd,id=sysmem-file,size=2G')
+self.vm.add_args('--numa', 'node,memdev=sysmem-file')
+self.vm.add_args('-m', '2048')
+self.vm.add_args('-kernel', kernel_path,
+ '-initrd', initrd_path,
+ '-append', kernel_command_line)
+self.vm.add_args('-device',
+ 'vfio-user-pci,'
+ 'socket='+socket)
+self.vm.launch()
+wait_for_console_pattern(self, 'as init process',
+ 'Kernel panic - not syncing')
+exec_command(self, 'mount -t sysfs sysfs /sys')
+exec_command_and_wait_for_pattern(self,
+  'cat /sys/bus/pci/devices/*/uevent',
+  'PCI_ID=1000:0012')
+
+def test_multiprocess_x86_64(self):
+"""
+:avocado: tags=arch:x86_64
+"""
+kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+  '/linux/releases/31/Everything/x86_64/os/images'
+  '/pxeboot/vmlinuz')
+initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+  '/linux/releases/31/Everything/x86_64/os/images'
+  '/pxeboot/initrd.img')
+kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+   'console=ttyS0 rdinit=/bin/bash')
+machine_type = 'pc'
+self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+
+def test_multiprocess_aarch64(self):
+"""
+:avocado: tags=arch:aarch64
+"""
+kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+  '/linux/releases/31/Everything/aarch64/os/images'
+  '/pxeboot/vmlinuz')
+initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+  '/linux/releases/31/Everything/aarch64/os/images'
+  '/pxeboot/initrd.img')
+kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+   'rdinit=/bin/bash console=ttyAMA0')
+machine_type = 'virt,gic-version=3'
+self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
-- 
1.8.3.1




[PATCH RFC server v2 00/11] vfio-user server in QEMU

2021-08-27 Thread Jagannathan Raman
Hi,

This series depends on the following series from
Elena Ufimtseva :
[PATCH RFC v2 00/16] vfio-user implementation

Thank you for your feedback for the v1 patches!
https://www.mail-archive.com/qemu-devel@nongnu.org/msg825021.html

We have incorporated the following feedback from v1 of the
review cycle:

[PATCH RFC server v2 01/11] vfio-user: build library
  - Using cmake subproject to build libvfio-user

[PATCH RFC server v2 02/11] vfio-user: define vfio-user object
  - Added check to confirm that TYPE_REMOTE_MACHINE is used
with TYPE_VFU_OBJECT

[PATCH RFC server v2 04/11] vfio-user: find and init PCI device
  - Removed call to vfu_pci_set_id()
  - Added check to confirm that TYPE_PCI_DEVICE is used with
TYPE_VFU_OBJECT

[PATCH RFC server v2 05/11] vfio-user: run vfio-user context
  - Using QEMU main-loop to drive the vfu_ctx (using
vfu_get_poll_fd() & qemu_set_fd_handler())
  - Set vfu_ctx to non-blocking mode (LIBVFIO_USER_FLAG_ATTACH_NB)
  - Modified how QEMU attaches to the vfu_ctx

[PATCH RFC server v2 06/11] handle PCI config space accesses
  - Broke-up PCI config space access to 4-byte accesses

[PATCH RFC server v2 07/11] vfio-user: handle DMA mappings
  - Received feedback to assert that vfu_dma_info_t->vaddr is not
NULL - unable to do it as it appears to be a valid case.

[PATCH RFC server v2 10/11] register handlers to facilitate migration
  - Migrate only one device's data per contect

Would appreciate if you could kindly review this v2 series. Looking
forward to your comments.

Thank you!

Jagannathan Raman (11):
  vfio-user: build library
  vfio-user: define vfio-user object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: acceptance test

 configure |  11 +
 meson.build   |  28 ++
 qapi/qom.json |  20 +-
 include/hw/remote/iohub.h |   2 +
 migration/savevm.h|   2 +
 hw/remote/iohub.c |   5 +
 hw/remote/vfio-user-obj.c | 803 ++
 migration/savevm.c|  73 
 .gitmodules   |   3 +
 MAINTAINERS   |   9 +
 hw/remote/meson.build |   3 +
 hw/remote/trace-events|  10 +
 subprojects/libvfio-user  |   1 +
 tests/acceptance/vfio-user.py |  94 +
 14 files changed, 1062 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 16 subprojects/libvfio-user
 create mode 100644 tests/acceptance/vfio-user.py

-- 
1.8.3.1




[PATCH RFC server v2 01/11] vfio-user: build library

2021-08-27 Thread Jagannathan Raman
add the libvfio-user library as a submodule. build it as a cmake
subproject.

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 configure| 11 +++
 meson.build  | 28 
 .gitmodules  |  3 +++
 MAINTAINERS  |  7 +++
 hw/remote/meson.build|  2 ++
 subprojects/libvfio-user |  1 +
 6 files changed, 52 insertions(+)
 create mode 16 subprojects/libvfio-user

diff --git a/configure b/configure
index 9a79a00..794e900 100755
--- a/configure
+++ b/configure
@@ -4291,6 +4291,17 @@ but not implemented on your system"
 fi
 
 ##
+# check for multiprocess
+
+case "$multiprocess" in
+  auto | enabled )
+if test "$git_submodules_action" != "ignore"; then
+  git_submodules="${git_submodules} libvfio-user"
+fi
+;;
+esac
+
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
diff --git a/meson.build b/meson.build
index bf63784..2b2d5c2 100644
--- a/meson.build
+++ b/meson.build
@@ -1898,6 +1898,34 @@ if get_option('cfi') and slirp_opt == 'system'
  + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and multiprocess_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 
'subprojects/libvfio-user/Makefile')
+
+  if not have_internal
+error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  json_c = dependency('json-c', required: false)
+if not json_c.found()
+  json_c = dependency('libjson-c')
+  endif
+
+  cmake = import('cmake')
+
+  vfiouser_subproj = cmake.subproject('libvfio-user')
+
+  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
+
+  # Although cmake links the json-c library with vfio-user-static
+  # target, that info is not available to meson via cmake.subproject.
+  # As such, we have to separately declare the json-c dependency here.
+  # This appears to be a current limitation of using cmake inside meson.
+  # libvfio-user is planning a switch to meson in the future, which
+  # would address this item automatically.
+  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
+endif
+
 fdt = not_found
 fdt_opt = get_option('fdt')
 if have_system
diff --git a/.gitmodules b/.gitmodules
index 08b1b48..cfeea7c 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -64,3 +64,6 @@
 [submodule "roms/vbootrom"]
path = roms/vbootrom
url = https://gitlab.com/qemu-project/vbootrom.git
+[submodule "subprojects/libvfio-user"]
+   path = subprojects/libvfio-user
+   url = https://github.com/nutanix/libvfio-user.git
diff --git a/MAINTAINERS b/MAINTAINERS
index 4039d3c..0c5a18e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3361,6 +3361,13 @@ F: semihosting/
 F: include/semihosting/
 F: tests/tcg/multiarch/arm-compat-semi/
 
+libvfio-user Library
+M: Thanos Makatos 
+M: John Levon 
+T: https://github.com/nutanix/libvfio-user.git
+S: Maintained
+F: subprojects/libvfio-user/*
+
 Multi-process QEMU
 M: Elena Ufimtseva 
 M: Jagannathan Raman 
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574..fb35fb8 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: 
files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: vfiouser)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: 
files('proxy-memory-listener.c'))
 
diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
new file mode 16
index 000..647c934
--- /dev/null
+++ b/subprojects/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
-- 
1.8.3.1




[PATCH RFC server v2 02/11] vfio-user: define vfio-user object

2021-08-27 Thread Jagannathan Raman
Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
 qapi/qom.json |  20 ++-
 hw/remote/vfio-user-obj.c | 145 ++
 MAINTAINERS   |   1 +
 hw/remote/meson.build |   1 +
 hw/remote/trace-events|   3 +
 5 files changed, 168 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index a25616b..3e941ee 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -689,6 +689,20 @@
   'data': { 'fd': 'str', 'devid': 'str' } }
 
 ##
+# @VfioUserProperties:
+#
+# Properties for vfio-user objects.
+#
+# @socket: path to be used as socket by the libvfiouser library
+#
+# @devid: the id of the device to be associated with the file descriptor
+#
+# Since: 6.0
+##
+{ 'struct': 'VfioUserProperties',
+  'data': { 'socket': 'str', 'devid': 'str' } }
+
+##
 # @RngProperties:
 #
 # Properties for objects of classes derived from rng.
@@ -812,7 +826,8 @@
 'tls-creds-psk',
 'tls-creds-x509',
 'tls-cipher-suites',
-'x-remote-object'
+'x-remote-object',
+'vfio-user'
   ] }
 
 ##
@@ -868,7 +883,8 @@
   'tls-creds-psk':  'TlsCredsPskProperties',
   'tls-creds-x509': 'TlsCredsX509Properties',
   'tls-cipher-suites':  'TlsCredsProperties',
-  'x-remote-object':'RemoteObjectProperties'
+  'x-remote-object':'RemoteObjectProperties',
+  'vfio-user':  'VfioUserProperties'
   } }
 
 ##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 000..4a1e297
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,145 @@
+/**
+ * QEMU vfio-user server object
+ *
+ * Copyright © 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ * -machine x-remote
+ * -device ,id=
+ * -object vfio-user,id=,socket=,devid=
+ *
+ * Note that vfio-user object must be used with x-remote machine only. This
+ * server could only support PCI devices for now.
+ *
+ * socket is path to a file. This file will be created by the server. It is
+ * a required option
+ *
+ * devid is the id of a PCI device on the server. It is also a required option.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+
+#define TYPE_VFU_OBJECT "vfio-user"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+struct VfuObjectClass {
+ObjectClass parent_class;
+
+unsigned int nr_devs;
+
+/* Maximum number of devices the server could support */
+unsigned int max_devs;
+};
+
+struct VfuObject {
+/* private */
+Object parent;
+
+char *socket;
+char *devid;
+};
+
+static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
+{
+VfuObject *o = VFU_OBJECT(obj);
+
+g_free(o->socket);
+
+o->socket = g_strdup(str);
+
+trace_vfu_prop("socket", str);
+}
+
+static void vfu_object_set_devid(Object *obj, const char *str, Error **errp)
+{
+VfuObject *o = VFU_OBJECT(obj);
+
+g_free(o->devid);
+
+o->devid = g_strdup(str);
+
+trace_vfu_prop("devid", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+
+if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
+error_report("vfu: %s only compatible with %s machine",
+ TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
+return;
+}
+
+if (k->nr_devs >= k->max_devs) {
+error_report("Reached maximum number of vfio-user devices: %u",
+ k->max_devs);
+return;
+}
+
+k->nr_devs++;
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+VfuObject *o = VFU_OBJECT(obj);
+
+k->nr_devs--;
+
+g_free(o->socket);
+g_free(o->devid);
+
+if (k->nr_devs == 0) {
+qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+}
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+/* Limiting maximum number of devices to 1 until IOMMU support is added */
+k->max_devs = 1;
+k->nr_devs = 0;
+
+object_class_property_add_str(klass, "socket", NULL,
+  vfu_object_set_socket);
+object_class_property_add_str(klass, "devid", NULL,
+  vfu_object_set_devid);
+}
+
+static const TypeInfo vfu_object_info = {

[Bug 1819289] Re: Windows 95 and Windows 98 will not install or run

2021-08-27 Thread Philippe Mathieu-Daudé
Per Brad on IRC, this issue persists. Re-opening to move to GitLab.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1819289

Title:
  Windows 95 and Windows 98 will not install or run

Status in QEMU:
  Fix Released

Bug description:
  The last version of QEMU I have been able to run Windows 95 or Windows
  98 on was 2.7 or 2.8. Recent versions since then even up to 3.1 will
  either not install or will not run 95 or 98 at all. I have tried every
  combination of options like isapc or no isapc, cpu pentium  or cpu as
  486. Tried different memory configurations, but they just don't work
  anymore.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1819289/+subscriptions




RE: [EXTERNAL] [PULL 08/15] whpx nvmm: Drop useless migrate_del_blocker()

2021-08-27 Thread Sunil Muthuswamy
> -Original Message-
> From: Markus Armbruster 
> Sent: Thursday, August 26, 2021 9:51 PM
> To: qemu-devel@nongnu.org
> Cc: peter.mayd...@linaro.org; Sunil Muthuswamy ; 
> Kamil Rytarowski ; Reinoud Zandijk
> ; Reinoud Zandijk ; Michael S . 
> Tsirkin 
> Subject: [EXTERNAL] [PULL 08/15] whpx nvmm: Drop useless migrate_del_blocker()
> 
> There is nothing to delete after migrate_add_blocker() failed.  Trying
> anyway is safe, but useless.  Don't.
> 
> Cc: Sunil Muthuswamy 
> Cc: Kamil Rytarowski 
> Cc: Reinoud Zandijk 
> Signed-off-by: Markus Armbruster 
> Message-Id: <20210720125408.387910-9-arm...@redhat.com>
> Reviewed-by: Reinoud Zandijk 
> Acked-by: Michael S. Tsirkin 
> ---
>  target/i386/nvmm/nvmm-all.c | 1 -
>  target/i386/whpx/whpx-all.c | 1 -
>  2 files changed, 2 deletions(-)
> 
> diff --git a/target/i386/nvmm/nvmm-all.c b/target/i386/nvmm/nvmm-all.c
> index dfa690d65d..7bb0d9e30e 100644
> --- a/target/i386/nvmm/nvmm-all.c
> +++ b/target/i386/nvmm/nvmm-all.c
> @@ -929,7 +929,6 @@ nvmm_init_vcpu(CPUState *cpu)
>  (void)migrate_add_blocker(nvmm_migration_blocker, _error);
>  if (local_error) {
>  error_report_err(local_error);
> -migrate_del_blocker(nvmm_migration_blocker);
>  error_free(nvmm_migration_blocker);
>  return -EINVAL;
>  }
> diff --git a/target/i386/whpx/whpx-all.c b/target/i386/whpx/whpx-all.c
> index f832f286ac..cc8c0b984b 100644
> --- a/target/i386/whpx/whpx-all.c
> +++ b/target/i386/whpx/whpx-all.c
> @@ -1349,7 +1349,6 @@ int whpx_init_vcpu(CPUState *cpu)
>  (void)migrate_add_blocker(whpx_migration_blocker, _error);
>  if (local_error) {
>  error_report_err(local_error);
> -migrate_del_blocker(whpx_migration_blocker);
>  error_free(whpx_migration_blocker);
>  ret = -EINVAL;
>  goto error;
> --
> 2.31.1

Signed-off-by: Sunil Muthuswamy 




Re: [PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-08-27 Thread Vladimir Sementsov-Ogievskiy

27.08.2021 18:09, Eric Blake wrote:

According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu because our block layer serializes any overlapping
operations (see bdrv_find_conflicting_request and friends)


Not any. We serialize only write operations not aligned to request_alignment of 
bs (see bdrv_make_request_serialising() call in bdrv_co_pwritev_part). So, 
actually most of overlapping operations remain overlapping. And that's correct: 
it's not a Qemu work to resolve overlapping requests. We resolve them only when 
we are responsible for appearing of intersection: when we align requests up.

--
Best regards,
Vladimir



Re: intermittent failure, s390 host, x86-64 guest, vhost-user-blk test

2021-08-27 Thread Raphael Norwitz
On Wed, Aug 04, 2021 at 09:43:25AM -0400, Michael S. Tsirkin wrote:
> On Wed, Aug 04, 2021 at 01:40:37PM +0100, Peter Maydell wrote:
> > Saw this intermittent as part of my ad-hoc CI test setup.
> > The backtrace says that the QEMU process has somehow hung
> > during 'realize' of the vhost-user device...
> 
> Hmm seems to have to do with the daemon not responding ...
> Cc a bunch more people ...

On a recent master with the meson dependency fix (commit ID:
cc1838c25d55e7f478cd493431679337e24e1b72) I’ve been able to reproduce
this quite easily with the following command from the make check-qtest
V=1 output.

#!/bin/sh

 I="0"
 while [ true ] 
 do 
 echo "=== iteration $I ===" 

 ALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} \
 QTEST_QEMU_IMG=./qemu-img \
 
G_TEST_DBUS_DAEMON=/home/raphael/pub_src/qemu_checkouts/qemu/tests/dbus-vmstate-daemon.sh
 \
 QTEST_QEMU_BINARY=./qemu-system-x86_64 \
 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
 tests/qtest/qos-test --tap -k 

I=$((I+1)) 
done 

I typically hit it within 10 iterations.

Looking at the trace provided, we see that QEMU has sent a
VHOST_USER_GET_PROTOCOL_FEATURES message and is stuck waiting for a
reply. Meanwhile the storage-daemon is stuck waiting for a message. In
the vhost-user-blk-test, as of now there is nothing stoping
vhost-user-blk in QEMU writing to the socket right after forking off the
storage daemon before it has a chance to come up properly, leaving the
test hanging forever. I’ve seen this happen before with other
backends/device types. I added a one second sleep to the test case
after the test forks off the storage daemon, and it resolved the hang
completely for me. I ran the test for 1000+ iterations and couldn’t
reproduce the failure with the sleep.

I just mailed some patches which pipe the stdout of the storage-daemon
back to he vhost-user-blk-test. The storage-daemon prints a log when
it is fully initialized which wakes up vhost-user-blk-test. This
should reliably resolve the hang.

I’m thinking we could also change the vhost-user code to timeout (or
possibly retry?) if the first VHOST_USER_GET_PROTOCOL_FEATURES
message doesn’t receive a response, as it’s better to fail power-on then
hang forever. If others agree I can pull together an RFC.

> 
> > Process tree:
> > 
> > ubuntu   59552  0.0  0.0   5460  3120 ?S06:51   0:00
> >\_ bash -o pipefail -c echo
> > 'MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_IMG=./qemu-img
> > G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_BINARY=./qemu-system-x86_64
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > tests/qtest/qos-test --tap -k' &&
> > MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
> > QTEST_QEMU_IMG=./qemu-img
> > G_TEST_DBUS_DAEMON=/home/ubuntu/qemu/tests/dbus-vmstate-daemon.sh
> > QTEST_QEMU_BINARY=./qemu-system-x86_64
> > QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
> > tests/qtest/qos-test --tap -k -m quick < /dev/null |
> > ./scripts/tap-driver.pl --test-name="qtest-x86_64/qos-test"
> > ubuntu   59553  0.0  0.2  88008  8332 ?Sl   06:51   0:00
> >\_ tests/qtest/qos-test --tap -k -m quick
> > ubuntu   60122  0.0  0.5 254432 21516 ?Sl   06:51   0:00
> >|   \_ ./storage-daemon/qemu-storage-daemon --blockdev
> > driver=file,node-name=disk0,filename=qtest.trJnuR --export
> > type=vhost-user-blk,id=disk0,addr.type=unix,addr.path=/tmp/qtest-59553-sock.iRt6WL,node-name=disk0,writable=on,num-queues=1
> > ubuntu   60123  0.0  1.6 747952 62732 ?Sl   06:51   0:00
> >|   \_ ./qemu-system-x86_64 -qtest
> > unix:/tmp/qtest-59553.sock -qtest-log /dev/null -chardev
> > socket,path=/tmp/qtest-59553.qmp,id=char0 -mon
> > chardev=char0,mode=control -display none -M pc -device
> > vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 -object
> > memory-backend-memfd,id=mem,size=256M,share=on -M memory-backend=mem
> > -m 256M -chardev socket,id=char1,path=/tmp/qtest-59553-sock.iRt6WL
> > -accel qtest
> > ubuntu   59554  0.0  0.2  14116 11204 ?S06:51   0:00
> >\_ perl ./scripts/tap-driver.pl
> > --test-name=qtest-x86_64/qos-test
> > 
> > Backtrace, qemu process:
> > 
> > Thread 4 (Thread 0x3ff5be1d910 (LWP 60131)):
> > #0  0x03ff7e2bed6a in __GI___sigtimedwait (set=,
> > set@entry=0x3ff5be1a098, info=info@entry=0x3ff5be19f68,
> > timeout=timeout@entry=0x0)
> > at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
> > #1  0x03ff7e492ca8 in __sigwait (set=set@entry=0x3ff5be1a098,
> > sig=sig@entry=0x3ff5be1a094)
> > at ../sysdeps/unix/sysv/linux/sigwait.c:28
> > #2  0x02aa16891ab2 in dummy_cpu_thread_fn
> > (arg=arg@entry=0x2aa182103e0) at ../../accel/dummy-cpus.c:46
> > #3  0x02aa16b22f5a in qemu_thread_start (args=) at
> > ../../util/qemu-thread-posix.c:541
> > #4  

[PATCH 2/2] Prevent vhost-user-blk-test hang

2021-08-27 Thread Raphael Norwitz
In the vhost-user-blk-test, as of now there is nothing stoping
vhost-user-blk in QEMU writing to the socket right after forking off the
storage daemon before it has a chance to come up properly, leaving the
test hanging forever. This intermittently hanging test has caused QEMU
automation failures reported multiple times on the mailing list [1].

This change makes the storage-daemon notify the vhost-user-blk-test
that it is fully initialized and ready to handle client connections via
a pipefd before allowing the test to proceed. This ensures that the
storage-daemon backend won't miss vhost-user messages and thereby
resolves the hang.

[1] 
https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bedyfynabemeohqb...@mail.gmail.com/

Signed-off-by: Raphael Norwitz 
---
 tests/qtest/vhost-user-blk-test.c | 33 ---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 6f108a1b62..b62af449df 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -21,6 +21,8 @@
 #include "libqos/vhost-user-blk.h"
 #include "libqos/libqos-pc.h"
 
+const char *daemon_msg = "Block exports setup\n";
+
 #define TEST_IMAGE_SIZE (64 * 1024 * 1024)
 #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
 #define PCI_SLOT_HP 0x06
@@ -885,7 +887,8 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
  int num_queues)
 {
 const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
-int i;
+int i, err, pipe_fds[2];
+char buf[32] = {0};
 gchar *img_path;
 GString *storage_daemon_command = g_string_new(NULL);
 QemuStorageDaemonState *qsd;
@@ -898,6 +901,12 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
 " -object memory-backend-memfd,id=mem,size=256M,share=on "
 " -M memory-backend=mem -m 256M ");
 
+err = pipe(pipe_fds);
+if (err != 0) {
+fprintf(stderr, "start_vhost_user_blk: pipe() failed %m\n");
+abort();
+}
+
 for (i = 0; i < vus_instances; i++) {
 int fd;
 char *sock_path = create_listen_socket();
@@ -914,22 +923,40 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
i + 1, sock_path);
 }
 
+g_string_append_printf(storage_daemon_command, "--printset");
+
 g_test_message("starting vhost-user backend: %s",
storage_daemon_command->str);
+
 pid_t pid = fork();
 if (pid == 0) {
+close(pipe_fds[0]);
+
 /*
  * Close standard file descriptors so tap-driver.pl pipe detects when
  * our parent terminates.
  */
 close(0);
-close(1);
 open("/dev/null", O_RDONLY);
-open("/dev/null", O_WRONLY);
+close(1);
+dup2(pipe_fds[1], 1);
 
 execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
 exit(1);
 }
+
+close(pipe_fds[1]);
+
+err = read(pipe_fds[0], buf, 20);
+if (err < 0) {
+fprintf(stderr, "Failed to read from storage-daemon pipe %m\n");
+abort();
+} else if (strcmp(buf, daemon_msg) != 0) {
+fprintf(stderr, "qemu-storage-daemon did not write expected messaage "
+"to the pipe. Total bytes read: %d. Got: %s\n", err, buf);
+abort();
+}
+
 g_string_free(storage_daemon_command, true);
 
 qsd = g_new(QemuStorageDaemonState, 1);
-- 
2.20.1



[PATCH 1/2] storage-daemon: add opt to print when initialized

2021-08-27 Thread Raphael Norwitz
This change adds a command line option to print a line to standard out
when the storage daemon has completed initialization and is ready to
serve client connections.

This option will be used to resolve a hang in the vhost-user-blk-test.

Signed-off-by: Raphael Norwitz 
---
 storage-daemon/qemu-storage-daemon.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/storage-daemon/qemu-storage-daemon.c 
b/storage-daemon/qemu-storage-daemon.c
index fc8b150629..c4f76d1564 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -61,6 +61,9 @@
 
 static const char *pid_file;
 static volatile bool exit_requested = false;
+static bool print_setup;
+
+const char *init_msg = "Block exports setup\n";
 
 void qemu_system_killed(int signal, pid_t pid)
 {
@@ -82,6 +85,7 @@ static void help(void)
 "  -T, --trace [[enable=]][,events=][,file=]\n"
 " specify tracing options\n"
 "  -V, --version  output version information and exit\n"
+"  -P, --printset print to stdout once server is fully initialized\n"
 "\n"
 "  --blockdev [driver=][,node-name=][,discard=ignore|unmap]\n"
 " [,cache.direct=on|off][,cache.no-flush=on|off]\n"
@@ -174,6 +178,7 @@ static void process_options(int argc, char *argv[])
 {"nbd-server", required_argument, NULL, OPTION_NBD_SERVER},
 {"object", required_argument, NULL, OPTION_OBJECT},
 {"pidfile", required_argument, NULL, OPTION_PIDFILE},
+{"printset", no_argument, NULL, 'P'},
 {"trace", required_argument, NULL, 'T'},
 {"version", no_argument, NULL, 'V'},
 {0, 0, 0, 0}
@@ -195,6 +200,9 @@ static void process_options(int argc, char *argv[])
 trace_opt_parse(optarg);
 trace_init_file();
 break;
+case 'P':
+print_setup = true;
+break;
 case 'V':
 printf("qemu-storage-daemon version "
QEMU_FULL_VERSION "\n" QEMU_COPYRIGHT "\n");
@@ -310,6 +318,7 @@ static void pid_file_init(void)
 
 int main(int argc, char *argv[])
 {
+int err;
 #ifdef CONFIG_POSIX
 signal(SIGPIPE, SIG_IGN);
 #endif
@@ -341,6 +350,18 @@ int main(int argc, char *argv[])
  */
 pid_file_init();
 
+/*
+ * If requested to pipe output once exports are initialized, print to
+ * stdout.
+ */
+if (print_setup) {
+err = write(1, init_msg, strlen(init_msg));
+if (err == -1) {
+fprintf(stderr, "Write to pipe failed: %m\n");
+return -1;
+}
+}
+
 while (!exit_requested) {
 main_loop_wait(false);
 }
-- 
2.20.1



Re: [PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-08-27 Thread Richard W.M. Jones
On Fri, Aug 27, 2021 at 10:09:16AM -0500, Eric Blake wrote:
> +# Parallel client connections are easier with libnbd than qemu-io:
> +if ! (nbdsh --version) >/dev/null 2>&1; then

I'm curious why the parentheses are needed here?

> +export nbd_unix_socket
> +nbdsh -c '
> +import os
> +sock = os.getenv("nbd_unix_socket")
> +h = []
> +
> +for i in range(3):
> +  h.append(nbd.NBD())
> +  h[i].connect_unix(sock)
> +  assert h[i].can_multi_conn()
> +
> +buf1 = h[0].pread(1024 * 1024, 0)
> +if buf1 != b"\x01" * 1024 * 1024:
> +  print("Unexpected initial read")
> +buf2 = b"\x03" * 1024 * 1024
> +h[1].pwrite(buf2, 0)   #1
> +h[2].flush()
> +buf1 = h[0].pread(1024 * 1024, 0)
> +if buf1 == buf2:
> +  print("Flush appears to be consistent across connections")

The test is ... simple.

Would it be a better test if the statement at line #1 above was
h.aio_pwrite instead, so the operation is only started?  This would
depend on some assumptions inside libnbd: That the h[1] socket is
immediately writable and that libnbd's statement will write before
returning, which are likely to be true, but perhaps you could do
something with parsing libnbd debug statements to check that the state
machine has started the write.

Another idea would be to make the write at line #1 be very large, so
that perhaps the qemu block layer would split it up.  After the flush
you would read the beginning and end bytes of the written part to
approximate a check that the whole write has been flushed so parts of
it are not hanging around in the cache of the first connection.

Anyway the patch as written seems fine to me so:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html




Re: [PATCH v2 41/43] bsd-user: Implement cpu_copy() helper routine

2021-08-27 Thread Warner Losh


> On Aug 27, 2021, at 10:00 AM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/27/21 4:56 PM, Warner Losh wrote:
>>> On Aug 26, 2021, at 10:47 PM, Philippe Mathieu-Daudé  
>>> wrote:
>>> 
>>> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
 From: Warner Losh 
 
 cpu_copy shouldbe called when processes are creating new threads. It
>>> 
>>> Typo "should be"
>>> 
 copies the current state of the CPU to a new cpu state needed for the
 new thread.
 
 Signed-off-by: Stacey Son 
 Signed-off-by: Warner Losh 
 Signed-off-by: Justin Hibbits 
 Reviewed-by: Richard Henderson 
 ---
 bsd-user/main.c | 30 ++
 1 file changed, 30 insertions(+)
 
 diff --git a/bsd-user/main.c b/bsd-user/main.c
 index e2ed9e32ba..b35bcf4d1e 100644
 --- a/bsd-user/main.c
 +++ b/bsd-user/main.c
 @@ -180,6 +180,36 @@ void init_task_state(TaskState *ts)
ts->sigqueue_table[i].next = NULL;
 }
 
 +CPUArchState *cpu_copy(CPUArchState *env)
 +{
 +CPUState *cpu = env_cpu(env);
 +CPUState *new_cpu = cpu_create(cpu_type);
 +CPUArchState *new_env = new_cpu->env_ptr;
 +CPUBreakpoint *bp;
 +CPUWatchpoint *wp;
 +
 +/* Reset non arch specific state */
 +cpu_reset(new_cpu);
 +
 +memcpy(new_env, env, sizeof(CPUArchState));
 +
 +/*
 + * Clone all break/watchpoints.
 + * Note: Once we support ptrace with hw-debug register access, make 
 sure
 + * BP_CPU break/watchpoints are handled correctly on clone.
 + */
 +QTAILQ_INIT(>breakpoints);
 +QTAILQ_INIT(>watchpoints);
 +QTAILQ_FOREACH(bp, >breakpoints, entry) {
 +cpu_breakpoint_insert(new_cpu, bp->pc, bp->flags, NULL);
 +}
 +QTAILQ_FOREACH(wp, >watchpoints, entry) {
 +cpu_watchpoint_insert(new_cpu, wp->vaddr, wp->len, wp->flags, 
 NULL);
 +}
 +
 +return new_env;
 +}
>>> 
>>> But where is it called?
>> 
>> It’s in the bsd-user fork’d proc code:
>> 
>> https://github.com/qemu-bsd-user/qemu-bsd-user/blob/079d45942db8d1038806cb459992b4f016b52b51/bsd-user/freebsd/os-thread.c#L1566
>> 
>> Is where it’s called from. I wanted to get it out of the way in this review 
>> since I was trying to get all the changes to main.c done, but if you’d like, 
>> I can drop it and submit in the next round.
> 
> Better keep it for next round :)

OK. I’ll drop and queue up next time.

Warner



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH v2 30/43] bsd-user: Remove dead #ifdefs from elfload.c

2021-08-27 Thread Warner Losh


> On Aug 27, 2021, at 9:58 AM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/27/21 5:02 PM, Warner Losh wrote:
>>> On Aug 26, 2021, at 10:42 PM, Philippe Mathieu-Daudé  
>>> wrote:
>>> 
>>> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
 From: Warner Losh 
 
 LOW_ELF_STACK doesn't exist on FreeBSD and likely never will. Remove it.
 Likewise, remove an #if 0 block that's not useful
 
 Signed-off-by: Warner Losh 
 Reviewed-by: Richard Henderson 
 ---
 bsd-user/elfload.c | 20 
 1 file changed, 20 deletions(-)
>>> 
>>> Move as patch #14?
>> 
>> Are you suggesting I move this to be right after patch #14 or that I squash 
>> / fold it into patch #14?
> 
> Move, if possible. If too much trouble (rebase conflict) then
> don't worry and let it here.

Yea, there’s rebase failures that aren’t trivial to resolve :(  Too many moving 
parts in this patch set.

>> Warner
>> 
>>> Reviewed-by: Philippe Mathieu-Daudé 



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH v2 41/43] bsd-user: Implement cpu_copy() helper routine

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/27/21 4:56 PM, Warner Losh wrote:
>> On Aug 26, 2021, at 10:47 PM, Philippe Mathieu-Daudé  wrote:
>>
>> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>>> From: Warner Losh 
>>>
>>> cpu_copy shouldbe called when processes are creating new threads. It
>>
>> Typo "should be"
>>
>>> copies the current state of the CPU to a new cpu state needed for the
>>> new thread.
>>>
>>> Signed-off-by: Stacey Son 
>>> Signed-off-by: Warner Losh 
>>> Signed-off-by: Justin Hibbits 
>>> Reviewed-by: Richard Henderson 
>>> ---
>>> bsd-user/main.c | 30 ++
>>> 1 file changed, 30 insertions(+)
>>>
>>> diff --git a/bsd-user/main.c b/bsd-user/main.c
>>> index e2ed9e32ba..b35bcf4d1e 100644
>>> --- a/bsd-user/main.c
>>> +++ b/bsd-user/main.c
>>> @@ -180,6 +180,36 @@ void init_task_state(TaskState *ts)
>>> ts->sigqueue_table[i].next = NULL;
>>> }
>>>
>>> +CPUArchState *cpu_copy(CPUArchState *env)
>>> +{
>>> +CPUState *cpu = env_cpu(env);
>>> +CPUState *new_cpu = cpu_create(cpu_type);
>>> +CPUArchState *new_env = new_cpu->env_ptr;
>>> +CPUBreakpoint *bp;
>>> +CPUWatchpoint *wp;
>>> +
>>> +/* Reset non arch specific state */
>>> +cpu_reset(new_cpu);
>>> +
>>> +memcpy(new_env, env, sizeof(CPUArchState));
>>> +
>>> +/*
>>> + * Clone all break/watchpoints.
>>> + * Note: Once we support ptrace with hw-debug register access, make 
>>> sure
>>> + * BP_CPU break/watchpoints are handled correctly on clone.
>>> + */
>>> +QTAILQ_INIT(>breakpoints);
>>> +QTAILQ_INIT(>watchpoints);
>>> +QTAILQ_FOREACH(bp, >breakpoints, entry) {
>>> +cpu_breakpoint_insert(new_cpu, bp->pc, bp->flags, NULL);
>>> +}
>>> +QTAILQ_FOREACH(wp, >watchpoints, entry) {
>>> +cpu_watchpoint_insert(new_cpu, wp->vaddr, wp->len, wp->flags, 
>>> NULL);
>>> +}
>>> +
>>> +return new_env;
>>> +}
>>
>> But where is it called?
> 
> It’s in the bsd-user fork’d proc code:
> 
> https://github.com/qemu-bsd-user/qemu-bsd-user/blob/079d45942db8d1038806cb459992b4f016b52b51/bsd-user/freebsd/os-thread.c#L1566
> 
> Is where it’s called from. I wanted to get it out of the way in this review 
> since I was trying to get all the changes to main.c done, but if you’d like, 
> I can drop it and submit in the next round.

Better keep it for next round :)



Re: [PATCH v2 30/43] bsd-user: Remove dead #ifdefs from elfload.c

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/27/21 5:02 PM, Warner Losh wrote:
>> On Aug 26, 2021, at 10:42 PM, Philippe Mathieu-Daudé  wrote:
>>
>> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>>> From: Warner Losh 
>>>
>>> LOW_ELF_STACK doesn't exist on FreeBSD and likely never will. Remove it.
>>> Likewise, remove an #if 0 block that's not useful
>>>
>>> Signed-off-by: Warner Losh 
>>> Reviewed-by: Richard Henderson 
>>> ---
>>> bsd-user/elfload.c | 20 
>>> 1 file changed, 20 deletions(-)
>>
>> Move as patch #14?
> 
> Are you suggesting I move this to be right after patch #14 or that I squash / 
> fold it into patch #14?

Move, if possible. If too much trouble (rebase conflict) then
don't worry and let it here.

> Warner
> 
>> Reviewed-by: Philippe Mathieu-Daudé 
> 



Re: [PATCH v2 07/43] bsd-user: move arch specific defines out of elfload.c

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 10:19 PM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>> From: Warner Losh 
>> 
>> Move the arcitecture specific defines to target_arch_elf.h and delete
> 
> Typo "architecture"

Thanks

>> them from elfload.c. unifdef as appropriate for i386 vs x86_64
> 
> "un-ifdef" or untangle?

I’ve reworded to make it clearer what I’ve done: only retain the
ifdefs relevant for i386 and x86_64.

Warner

>> versions. Add the copyright/license comments, and guard ifdefs.
>> 
>> Signed-off-by: Warner Losh 
>> Reviewed-by: Richard Henderson 
>> ---
>> bsd-user/elfload.c| 81 +--
>> bsd-user/i386/target_arch_elf.h   | 76 +
>> bsd-user/x86_64/target_arch_elf.h | 64 
>> 3 files changed, 142 insertions(+), 79 deletions(-)
>> create mode 100644 bsd-user/i386/target_arch_elf.h
>> create mode 100644 bsd-user/x86_64/target_arch_elf.h



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH 4/3] gitlab-ci: Don't try to use the system libfdt in the debian job

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/27/21 5:17 PM, Thomas Huth wrote:
> libfdt in Debian is too old to be usable for QEMU. So far we were
> silently falling back to the internal dtc submodule, but since
> this is wrong, let's remove the --enable-fdt=system switch here now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  Sorry, I just noticed this after sending out the first three patches already
> 
>  .gitlab-ci.d/buildtest.yml | 1 -
>  1 file changed, 1 deletion(-)

Acked-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 08/43] bsd-user: pass the bsd_param into loader_exec

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 10:22 PM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>> From: Warner Losh 
>> 
>> Pass the bsd_param into loader_exec, and adjust.
> 
> Missing the "why" justification.

So I am. I’ve added it and it will be in v3 of the patches. It’s used to share 
state between
bsdload and elf load, especially open files and the stack used to construct the 
initial args
to start/main.

Warner

> Anyway,
> Reviewed-by: Philippe Mathieu-Daudé 
> 
>> Signed-off-by: Warner Losh 
>> Reviewed-by: Richard Henderson 
>> ---
>> bsd-user/bsdload.c | 37 +++--
>> bsd-user/main.c|  7 ++-
>> bsd-user/qemu.h|  3 ++-
>> 3 files changed, 27 insertions(+), 20 deletions(-)



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH v2 19/43] bsd-user: start to move target CPU functions to target_arch*

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 10:39 PM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>> From: Warner Losh 
>> 
>> Move the CPU functons into target_arch_cpu.c that are unique to each
> 
> Typo "functions"

doh! Thanks!

>> CPU. These are defined in target_arch.h.
>> 
>> Signed-off-by: Stacey Son 
>> Signed-off-by: Warner Losh 
>> Reviewed-by: Richard Henderson 
>> ---
>> bsd-user/i386/target_arch.h   | 31 +
>> bsd-user/i386/target_arch_cpu.c   | 75 +++
>> bsd-user/main.c   | 12 -
>> bsd-user/x86_64/target_arch.h | 31 +
>> bsd-user/x86_64/target_arch_cpu.c | 75 +++
>> configure |  7 +--
>> meson.build   |  8 +++-
>> 7 files changed, 219 insertions(+), 20 deletions(-)
>> create mode 100644 bsd-user/i386/target_arch.h
>> create mode 100644 bsd-user/i386/target_arch_cpu.c
>> create mode 100644 bsd-user/x86_64/target_arch.h
>> create mode 100644 bsd-user/x86_64/target_arch_cpu.c
> 
>> diff --git a/bsd-user/main.c b/bsd-user/main.c
>> index f7e1df5da5..7b3550898d 100644
>> --- a/bsd-user/main.c
>> +++ b/bsd-user/main.c
>> @@ -72,13 +72,6 @@ void gemu_log(const char *fmt, ...)
>> va_end(ap);
>> }
>> 
>> -#if defined(TARGET_I386)
>> -int cpu_get_pic_interrupt(CPUX86State *env)
>> -{
>> -return -1;
>> -}
>> -#endif
> 
> Let's avoid that using a stub.

Not sure I understand this comment. I’ve removed this code. And I have i386 and 
x86_64
as separate functions because that’s how Stacey broke it up. I have a todo item 
to merge
them back together once I’m caught up.

>> diff --git a/meson.build b/meson.build
>> index f2e148eaf9..5fe6b4aae6 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -2560,9 +2560,13 @@ foreach target : target_dirs
>> if 'CONFIG_LINUX_USER' in config_target
>>   base_dir = 'linux-user'
>>   target_inc += include_directories('linux-user/host/' / 
>> config_host['ARCH'])
>> -else
>> +endif
>> +if 'CONFIG_BSD_USER' in config_target
>>   base_dir = 'bsd-user'
>> -  target_inc += include_directories('bsd-user/freebsd')
>> +  target_inc += include_directories('bsd-user/' / targetos)
>> +# target_inc += include_directories('bsd-user/host/' / 
>> config_host['ARCH'])
> 
> Left-over?

Yea. Future changes will need this line. I’ll delete for now.

>> +  dir = base_dir / abi
>> +  arch_srcs += files(dir / 'target_arch_cpu.c')
>> endif
>> target_inc += include_directories(
>>   base_dir,



signature.asc
Description: Message signed with OpenPGP


[PATCH v4] hw/intc/sifive_clint: Fix muldiv64 overflow in sifive_clint_write_timecmp()

2021-08-27 Thread David Hoppenbrouwers
`muldiv64` would overflow in cases where the final 96-bit value does not
fit in a `uint64_t`. This would result in small values that cause an
interrupt to be triggered much sooner than intended.

The overflow can be detected in most cases by checking if the new value is
smaller than the previous value. If the final result is larger than
`diff` it is either correct or it doesn't matter as it is effectively
infinite anyways.

`next` is an `uint64_t` value, but `timer_mod` takes an `int64_t`. This
resulted in high values such as `UINT64_MAX` being converted to `-1`,
which caused an immediate timer interrupt.

By limiting `next` to `INT64_MAX` no overflow will happen while the
timer will still be effectively set to "infinitely" far in the future.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/493
Signed-off-by: David Hoppenbrouwers 
---
I addressed the potential signed integer overflow if `ns_diff` is
`INT64_MAX`. An unsigned integer overflow still should not be able to
happen practically.

David

 hw/intc/sifive_clint.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/intc/sifive_clint.c b/hw/intc/sifive_clint.c
index 0f41e5ea1c..99c870ced2 100644
--- a/hw/intc/sifive_clint.c
+++ b/hw/intc/sifive_clint.c
@@ -59,8 +59,29 @@ static void sifive_clint_write_timecmp(RISCVCPU *cpu, 
uint64_t value,
 riscv_cpu_update_mip(cpu, MIP_MTIP, BOOL_TO_MASK(0));
 diff = cpu->env.timecmp - rtc_r;
 /* back to ns (note args switched in muldiv64) */
-next = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
-muldiv64(diff, NANOSECONDS_PER_SECOND, timebase_freq);
+uint64_t ns_diff = muldiv64(diff, NANOSECONDS_PER_SECOND, timebase_freq);
+
+/*
+ * check if ns_diff overflowed and check if the addition would potentially
+ * overflow
+ */
+if ((NANOSECONDS_PER_SECOND > timebase_freq && ns_diff < diff) ||
+ns_diff > INT64_MAX) {
+next = INT64_MAX;
+} else {
+/*
+ * as it is very unlikely qemu_clock_get_ns will return a value
+ * greater than INT64_MAX, no additional check is needed for an
+ * unsigned integer overflow.
+ */
+next = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + ns_diff;
+/*
+ * if ns_diff is INT64_MAX next may still be outside the range
+ * of a signed integer.
+ */
+next = MIN(next, INT64_MAX);
+}
+
 timer_mod(cpu->env.timer, next);
 }
 
-- 
2.20.1




Re: [PATCH 4/3] gitlab-ci: Don't try to use the system libfdt in the debian job

2021-08-27 Thread Daniel P . Berrangé
On Fri, Aug 27, 2021 at 05:17:18PM +0200, Thomas Huth wrote:
> libfdt in Debian is too old to be usable for QEMU. So far we were
> silently falling back to the internal dtc submodule, but since
> this is wrong, let's remove the --enable-fdt=system switch here now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  Sorry, I just noticed this after sending out the first three patches already
> 
>  .gitlab-ci.d/buildtest.yml | 1 -
>  1 file changed, 1 deletion(-)

Reviewed-by: Daniel P. Berrangé 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH 4/3] gitlab-ci: Don't try to use the system libfdt in the debian job

2021-08-27 Thread Thomas Huth
libfdt in Debian is too old to be usable for QEMU. So far we were
silently falling back to the internal dtc submodule, but since
this is wrong, let's remove the --enable-fdt=system switch here now.

Signed-off-by: Thomas Huth 
---
 Sorry, I just noticed this after sending out the first three patches already

 .gitlab-ci.d/buildtest.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 903ee65f32..f0eb5f6286 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -74,7 +74,6 @@ build-system-debian:
 job: amd64-debian-container
   variables:
 IMAGE: debian-amd64
-CONFIGURE_ARGS: --enable-fdt=system
 TARGETS: arm-softmmu avr-softmmu i386-softmmu mipsel-softmmu
   riscv64-softmmu sh4eb-softmmu sparc-softmmu xtensaeb-softmmu
 MAKE_CHECK_ARGS: check-build
-- 
2.27.0




[PATCH 4/3] gitlab-ci: Don't try to use the system libfdt in the debian job

2021-08-27 Thread Thomas Huth
libfdt in Debian is too old to be usable for QEMU. So far we were
silently falling back to the internal dtc submodule, but since
this is wrong, let's remove the --enable-fdt=system switch here now.

Signed-off-by: Thomas Huth 
---
 Sorry, I just noticed this after sending out the first three patches already

 .gitlab-ci.d/buildtest.yml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 903ee65f32..f0eb5f6286 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -74,7 +74,6 @@ build-system-debian:
 job: amd64-debian-container
   variables:
 IMAGE: debian-amd64
-CONFIGURE_ARGS: --enable-fdt=system
 TARGETS: arm-softmmu avr-softmmu i386-softmmu mipsel-softmmu
   riscv64-softmmu sh4eb-softmmu sparc-softmmu xtensaeb-softmmu
 MAKE_CHECK_ARGS: check-build
-- 
2.27.0




[PATCH] nbd/server: Advertise MULTI_CONN for shared writable exports

2021-08-27 Thread Eric Blake
According to the NBD spec, a server advertising
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.  We satisfy these
conditions in qemu because our block layer serializes any overlapping
operations (see bdrv_find_conflicting_request and friends): no matter
which client performs a flush, parallel requests coming from distinct
NBD clients will still be well-ordered by the time they are passed on
to the underlying device, with no caching in qemu proper to allow
stale results to leak after a flush.

We don't want to advertise MULTI_CONN when we know that a second
client can connect (which is the default for qemu-nbd, but not for QMP
nbd-server-add), so it does require a QAPI addition.  But other than
that, the actual change to advertise the bit for writable servers is
fairly small.  The harder part of this patch is setting up an iotest
to demonstrate behavior of multiple NBD clients to a single server.
It might be possible with parallel qemu-io processes, but concisely
managing that in shell is painful.  I found it easier to do by relying
on the libnbd project's nbdsh, which means this test will be skipped
on platforms where that is not available.

Signed-off-by: Eric Blake 
---
 docs/interop/nbd.txt   |  1 +
 docs/tools/qemu-nbd.rst|  3 +-
 qapi/block-export.json |  6 +-
 blockdev-nbd.c |  4 +
 nbd/server.c   | 10 +--
 qemu-nbd.c |  2 +
 MAINTAINERS|  1 +
 tests/qemu-iotests/tests/nbd-multiconn | 91 ++
 tests/qemu-iotests/tests/nbd-multiconn.out | 14 
 9 files changed, 124 insertions(+), 8 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 10ce098a29bf..d03910f1e9eb 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -68,3 +68,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 6.2: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index 5643da26e982..81be32164a55 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -138,8 +138,7 @@ driver options if ``--image-opts`` is specified.
 .. option:: -e, --shared=NUM

   Allow up to *NUM* clients to share the device (default
-  ``1``), 0 for unlimited. Safe for readers, but for now,
-  consistency is not guaranteed between multiple writers.
+  ``1``), 0 for unlimited.

 .. option:: -t, --persistent

diff --git a/qapi/block-export.json b/qapi/block-export.json
index 0ed63442a819..b2085a9fdd4c 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -95,11 +95,15 @@
 #the metadata context name "qemu:allocation-depth" to
 #inspect allocation details. (since 5.2)
 #
+# @shared: True if the server should advertise that multiple clients may
+#  connect, default false. (since 6.2)
+#
 # Since: 5.2
 ##
 { 'struct': 'BlockExportOptionsNbd',
   'base': 'BlockExportOptionsNbdBase',
-  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool' } }
+  'data': { '*bitmaps': ['str'], '*allocation-depth': 'bool',
+ '*shared': 'bool' } }

 ##
 # @BlockExportOptionsVhostUserBlk:
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index bdfa7ed3a5a9..258feaa76e02 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -221,6 +221,10 @@ void qmp_nbd_server_add(NbdServerAddOptions *arg, Error 
**errp)
 QAPI_LIST_PREPEND(export_opts->u.nbd.bitmaps, g_strdup(arg->bitmap));
 }

+/* nbd-server-add always permits parallel clients */
+export_opts->u.nbd.has_shared = true;
+export_opts->u.nbd.shared = true;
+
 /*
  * nbd-server-add doesn't complain when a read-only device should be
  * exported as writable, but simply downgrades it. This is an error with
diff --git a/nbd/server.c b/nbd/server.c
index 3927f7789dcf..1646796a4798 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2020 Red Hat, Inc.
+ *  Copyright (C) 2016-2021 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori 
  *
  *  Network Block Device Server Side
@@ -1634,7 +1634,7 @@ static int nbd_export_create(BlockExport *blk_exp, 
BlockExportOptions *exp_args,
 int64_t size;
 uint64_t perm, shared_perm;
 bool readonly = !exp_args->writable;
-bool shared = !exp_args->writable;
+bool shared = arg->shared;
 strList *bitmaps;
 size_t i;
 int ret;
@@ -1685,11 

Re: [PATCH v3 01/19] target/loongarch: Add README

2021-08-27 Thread Peter Xu
On Fri, Aug 27, 2021 at 03:14:36PM +0800, Song Gao wrote:
> +The following versions of the LoongArch core are supported
> +core: 3A5000
> +
> https://github.com/loongson/LoongArch-Documentation/releases/download/LoongArch-Vol1-v3/LoongArch-Vol1-v1.00-EN.pdf

This link seems invalid now.

-- 
Peter Xu




Re: [PATCH v2 30/43] bsd-user: Remove dead #ifdefs from elfload.c

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 10:42 PM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>> From: Warner Losh 
>> 
>> LOW_ELF_STACK doesn't exist on FreeBSD and likely never will. Remove it.
>> Likewise, remove an #if 0 block that's not useful
>> 
>> Signed-off-by: Warner Losh 
>> Reviewed-by: Richard Henderson 
>> ---
>> bsd-user/elfload.c | 20 
>> 1 file changed, 20 deletions(-)
> 
> Move as patch #14?

Are you suggesting I move this to be right after patch #14 or that I squash / 
fold it into patch #14?

Warner

> Reviewed-by: Philippe Mathieu-Daudé 



signature.asc
Description: Message signed with OpenPGP


Re: [PATCH v2 41/43] bsd-user: Implement cpu_copy() helper routine

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 10:47 PM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/26/21 11:11 PM, i...@bsdimp.com wrote:
>> From: Warner Losh 
>> 
>> cpu_copy shouldbe called when processes are creating new threads. It
> 
> Typo "should be"
> 
>> copies the current state of the CPU to a new cpu state needed for the
>> new thread.
>> 
>> Signed-off-by: Stacey Son 
>> Signed-off-by: Warner Losh 
>> Signed-off-by: Justin Hibbits 
>> Reviewed-by: Richard Henderson 
>> ---
>> bsd-user/main.c | 30 ++
>> 1 file changed, 30 insertions(+)
>> 
>> diff --git a/bsd-user/main.c b/bsd-user/main.c
>> index e2ed9e32ba..b35bcf4d1e 100644
>> --- a/bsd-user/main.c
>> +++ b/bsd-user/main.c
>> @@ -180,6 +180,36 @@ void init_task_state(TaskState *ts)
>> ts->sigqueue_table[i].next = NULL;
>> }
>> 
>> +CPUArchState *cpu_copy(CPUArchState *env)
>> +{
>> +CPUState *cpu = env_cpu(env);
>> +CPUState *new_cpu = cpu_create(cpu_type);
>> +CPUArchState *new_env = new_cpu->env_ptr;
>> +CPUBreakpoint *bp;
>> +CPUWatchpoint *wp;
>> +
>> +/* Reset non arch specific state */
>> +cpu_reset(new_cpu);
>> +
>> +memcpy(new_env, env, sizeof(CPUArchState));
>> +
>> +/*
>> + * Clone all break/watchpoints.
>> + * Note: Once we support ptrace with hw-debug register access, make sure
>> + * BP_CPU break/watchpoints are handled correctly on clone.
>> + */
>> +QTAILQ_INIT(>breakpoints);
>> +QTAILQ_INIT(>watchpoints);
>> +QTAILQ_FOREACH(bp, >breakpoints, entry) {
>> +cpu_breakpoint_insert(new_cpu, bp->pc, bp->flags, NULL);
>> +}
>> +QTAILQ_FOREACH(wp, >watchpoints, entry) {
>> +cpu_watchpoint_insert(new_cpu, wp->vaddr, wp->len, wp->flags, NULL);
>> +}
>> +
>> +return new_env;
>> +}
> 
> But where is it called?

It’s in the bsd-user fork’d proc code:

https://github.com/qemu-bsd-user/qemu-bsd-user/blob/079d45942db8d1038806cb459992b4f016b52b51/bsd-user/freebsd/os-thread.c#L1566

Is where it’s called from. I wanted to get it out of the way in this review 
since I was trying to get all the changes to main.c done, but if you’d like, I 
can drop it and submit in the next round.

Warner


signature.asc
Description: Message signed with OpenPGP


Re: [PATCH v2 42/43] bsd-user: Add '-0 argv0' option to bsd-user/main.c

2021-08-27 Thread Warner Losh


> On Aug 26, 2021, at 10:48 PM, Philippe Mathieu-Daudé  wrote:
> 
> On 8/26/21 11:12 PM, i...@bsdimp.com wrote:
>> From: Colin Percival 
>> 
>> Previously it was impossible to emulate a program with a file name
>> different from its argv[0].  With this change, you can run
>>qemu -0 fakename realname args
>> which runs the program "realname" with an argv of "fakename args".
>> 
>> Signed-off-by: Colin Percival 
>> Signed-off-by: Warner Losh 
>> Reviewed-by: Richard Henderson 
>> ---
>> bsd-user/main.c | 5 +
>> 1 file changed, 5 insertions(+)
>> 
>> diff --git a/bsd-user/main.c b/bsd-user/main.c
>> index b35bcf4d1e..ae25f4c773 100644
>> --- a/bsd-user/main.c
>> +++ b/bsd-user/main.c
>> @@ -268,6 +268,7 @@ int main(int argc, char **argv)
>> char **target_environ, **wrk;
>> envlist_t *envlist = NULL;
>> bsd_type = HOST_DEFAULT_BSD_TYPE;
>> +char * argv0 = NULL;
>> 
>> adjust_ssize();
>> 
>> @@ -390,6 +391,8 @@ int main(int argc, char **argv)
>> do_strace = 1;
>> } else if (!strcmp(r, "trace")) {
>> trace_opt_parse(optarg);
>> +} else if (!strcmp(r, "0")) {
>> +argv0 = argv[optind++];
>> } else {
>> usage();
>> }
>> @@ -413,6 +416,8 @@ int main(int argc, char **argv)
>> usage();
>> }
>> filename = argv[optind];
>> +if (argv0)
> 
> Style:
> 
> {
> 
>> +argv[optind] = argv0;
> 
> }

Done in v3.

Warner

> 
>> 
>> if (!trace_init_backends()) {
>> exit(1);



signature.asc
Description: Message signed with OpenPGP


[PATCH] qapi: Set boolean value correctly in examples

2021-08-27 Thread Guoyi Tu



Signed-off-by: Guoyi Tu 
---
 qapi/trace.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qapi/trace.json b/qapi/trace.json
index 47c68f04da..eedfded512 100644
--- a/qapi/trace.json
+++ b/qapi/trace.json
@@ -99,7 +99,7 @@
 # Example:
 #
 # -> { "execute": "trace-event-set-state",
-#  "arguments": { "name": "qemu_memalign", "enable": "true" } }
+#  "arguments": { "name": "qemu_memalign", "enable": true } }
 # <- { "return": {} }
 #
 ##
--
2.25.1

--
Best Regards,
Guoyi Tu



Re: [PATCH v2 3/3] dump-guest-memory: Block live migration

2021-08-27 Thread Peter Xu
On Fri, Aug 27, 2021 at 11:51:04AM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Aug 27, 2021 at 11:44 AM Marc-André Lureau <
> marcandre.lur...@redhat.com> wrote:
> 
> > Hi
> >
> > On Thu, Aug 26, 2021 at 10:58 PM Peter Xu  wrote:
> >
> >> Both dump-guest-memory and live migration caches vm state at the
> >> beginning.
> >> Either of them entering the other one will cause race on the vm state,
> >> and even
> >> more severe on that (please refer to the crash report in the bug link).
> >>
> >> Let's block live migration in dump-guest-memory, and that'll also block
> >> dump-guest-memory if it detected that we're during a live migration.
> >>
> >> Side note: migrate_del_blocker() can be called even if the blocker is not
> >> inserted yet, so it's safe to unconditionally delete that blocker in
> >> dump_cleanup (g_slist_remove allows no-entry-found case).
> >>
> >> Suggested-by: Dr. David Alan Gilbert 
> >> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1996609
> >> Signed-off-by: Peter Xu 
> >> ---
> >>  dump/dump.c | 24 +++-
> >>  1 file changed, 19 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/dump/dump.c b/dump/dump.c
> >> index ab625909f3..9c1c1fb738 100644
> >> --- a/dump/dump.c
> >> +++ b/dump/dump.c
> >> @@ -29,6 +29,7 @@
> >>  #include "qemu/error-report.h"
> >>  #include "qemu/main-loop.h"
> >>  #include "hw/misc/vmcoreinfo.h"
> >> +#include "migration/blocker.h"
> >>
> >>  #ifdef TARGET_X86_64
> >>  #include "win_dump.h"
> >> @@ -47,6 +48,8 @@
> >>
> >>  #define MAX_GUEST_NOTE_SIZE (1 << 20) /* 1MB should be enough */
> >>
> >> +static Error *dump_migration_blocker;
> >> +
> >>  #define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
> >>  ((DIV_ROUND_UP((hdr_size), 4) + \
> >>DIV_ROUND_UP((name_size), 4) +\
> >> @@ -101,6 +104,7 @@ static int dump_cleanup(DumpState *s)
> >>  qemu_mutex_unlock_iothread();
> >>  }
> >>  }
> >> +migrate_del_blocker(dump_migration_blocker);
> >>
> >>  return 0;
> >>  }
> >> @@ -1927,11 +1931,6 @@ void qmp_dump_guest_memory(bool paging, const char
> >> *file,
> >>  Error *local_err = NULL;
> >>  bool detach_p = false;
> >>
> >> -if (runstate_check(RUN_STATE_INMIGRATE)) {
> >> -error_setg(errp, "Dump not allowed during incoming migration.");
> >> -return;
> >> -}
> >> -
> >>  /* if there is a dump in background, we should wait until the dump
> >>   * finished */
> >>  if (dump_in_progress()) {
> >> @@ -2005,6 +2004,21 @@ void qmp_dump_guest_memory(bool paging, const char
> >> *file,
> >>  return;
> >>  }
> >>
> >> +if (!dump_migration_blocker) {
> >> +error_setg(_migration_blocker,
> >> +   "Live migration disabled: dump-guest-memory in
> >> progress");
> >> +}
> >> +
> >> +/*
> >> + * Allows even for -only-migratable, but forbid migration during the
> >> + * process of dump guest memory.
> >> + */
> >> +if (migrate_add_blocker_internal(dump_migration_blocker, errp)) {
> >> +/* Remember to release the fd before passing it over to dump
> >> state */
> >> +close(fd);
> >> +return;
> >> +}
> >>
> >
> > I would move it earlier.  Why not leave it at the beginning of the
> > function as it was?
> >
> >
> Ah I think it's because dump_cleanup() isn't called when returning earlier.
> But relying on a cleanup done outside of this function is not obvious.
> Either dump_cleanup() should be called from here, or the blocker code
> should be moved to dump_init() imho.

Note that dump_cleanup() cleans s->fd, however fd has not delievered to s->fd
now.

I can move it into dump_init() but I need to put it after setting s->fd to make
dump_cleanup work.  That could look a bit weird (as checking migration blocker
shouldn't really be bound to initialize of dump states).

There's another way to move add blocker to the entry of dump_in_progress()
check, however then all the checks between dump_in_progress() and before
opening the fd will not be able to "return" directly but we'll need to del the
blocker if any of those check fails.

That's how I come up with the current patch, which looks the easiest change and
still looks clean.

Besides, I think dump_in_progress() can be removed too with patch 1 checks
SAVEVM state too, however that's a side effect of migration_is_idle() and
dump_in_progres is doing the check in different ways by looking up dump state,
so I kept it.

Thanks,

-- 
Peter Xu




Re: [PATCH 0/3] gdbstub: add support for switchable endianness

2021-08-27 Thread Changbin Du
On Tue, Aug 24, 2021 at 10:11:14AM +0100, Peter Maydell wrote:
> On Tue, 24 Aug 2021 at 00:05, Changbin Du  wrote:
> >
> > On Mon, Aug 23, 2021 at 04:30:05PM +0100, Peter Maydell wrote:
> > > changes to be more capable of handling dynamic target changes
> > > (this would also help with eg debugging across 32<->64 bit switches);
> > > as I understand it that gdb work would be pretty significant,
> > > and at least for aarch64 pretty much nobody cares about
> > > big-endian, so nobody's got round to doing it yet.
> > >
> > Mostly we do not care dynamic target changes because nearly all OS will 
> > setup
> > endianness mode by its first instruction. And dynamic changes for gdb is 
> > hard
> > since the byte order of debugging info in elf is fixed. And currently the 
> > GDB
> > remote protocol does not support querying endianness info from remote.
> >
> > So usually we needn't change byte order during a debug session, but we just 
> > want
> > the qemu gdbstub can send data in and handle data it received in right byte 
> > order.
> > This patch does this work with the help of users via the option 
> > 'endianness='.
> 
> I'm not a huge fan of putting in workarounds that deal with the
> problem for specific cases and require users to tweak options settings,
> rather than solving the problem in a more general way that would
> let it all Just Work for all cases.
>
Probably we can add a new callback 'gdb_get_endianness' for CPUClass, and use
this callback to determine if bswap is needed every time we read/write cpu
registers. What's your thought?

> -- PMM

-- 
Cheers,
Changbin Du



RE: [PATCH 02/19] host-utils: move abs64() to host-utils

2021-08-27 Thread Luis Fernando Fujita Pires
> >> Oh, I wasn't referring to any specific users. What I meant is that,
> >> if we make abs64() generically available from host-utils, callers
> >> could expect it to behave the same way as abs() in stdlib, for
> >> example.
> >
> > That would be surprising, but do you think there are cases where that
> > would be a bad surprise?
> >
> > I don't think anybody who is aware of the abs(INT_MIN),
> > labs(LONG_MIN), and llabs(LLONG_MIN) edge cases actually _like_ that
> > behaviour.
> >
> > If you really want to avoid surprises, providing a saner function with
> > a different name seems better than trying to emulate the edge cases of
> > abs()/labs()/llabs().
> 
> Agreed. See do_strtosz() for example.

I'll make this change when I submit the next version of this patch series.

Thanks!

--
Luis Pires
Instituto de Pesquisas ELDORADO
Aviso Legal - Disclaimer 


Re: [PATCH v3 09/18] ui/vdagent: disconnect handlers and reset state on finalize

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/5/21 3:57 PM, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Avoid handlers being called with dangling pointers when the object is
> freed.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  ui/vdagent.c | 25 +++--
>  1 file changed, 15 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 1/2] tests: Remove uses of deprecated raspi2/raspi3 machine names

2021-08-27 Thread Willian Rampazzo
On Fri, Aug 27, 2021 at 3:14 AM Philippe Mathieu-Daudé  wrote:
>
> Commit 155e1c82ed0 deprecated the raspi2/raspi3 machine names.
> Use the recommended new names: raspi2b and raspi3b.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  docs/devel/qgraph.rst   | 38 -
>  tests/qtest/libqos/qgraph.h |  6 ++--
>  tests/qtest/libqos/qgraph_internal.h|  2 +-
>  tests/qtest/boot-serial-test.c  |  2 +-
>  tests/qtest/libqos/arm-raspi2-machine.c |  8 +++---
>  tests/unit/test-qgraph.c|  2 +-
>  tests/acceptance/boot_linux_console.py  |  6 ++--
>  7 files changed, 32 insertions(+), 32 deletions(-)
>

Reviewed-by: Willian Rampazzo 




Re: [PATCH v3 01/18] ui/vdagent: fix leak on error path

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/5/21 3:56 PM, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> "info" was leaked when more than 10 entries.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/ui/clipboard.h | 2 ++
>  ui/vdagent.c   | 4 +---
>  2 files changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 0/6] virtio-iommu: Add ACPI support

2021-08-27 Thread Jean-Philippe Brucker
Hi Eric,

On Tue, Aug 17, 2021 at 04:58:01PM +0200, Eric Auger wrote:
> Hi Jean,
> 
> On 8/10/21 10:45 AM, Jean-Philippe Brucker wrote:
> > Allow instantiating a virtio-iommu device on ACPI systems by adding a
> > Virtual I/O Translation table (VIOT). Enable x86 support for VIOT.
> 
> Don't you need your other patch
> "virtio-iommu: Default to bypass during boot"?
> 
> Without this latter, for me the guest fails to boot.

Good point, I think I've been lucky during my testing. My bootloader and
kernel are on virtio-blk-pci devices and as I wasn't explicitly enabling
the "iommu_platform" parameter, they would bypass the IOMMU. When enabling
the parameter, boot hangs since the IOMMU isn't enabled when the
bootloader needs to fetch the kernel, and DMA faults. That parameter is
specific to virtio devices. Using another storage for bootloader and
kernel will result in failure to boot.

I've been postponing the boot-bypass patch since it requires a
specification change to be done right, but it's next on my list.

Thanks,
Jean



Re: [PATCH 2/6] hw/acpi: Add VIOT table

2021-08-27 Thread Jean-Philippe Brucker
On Tue, Aug 10, 2021 at 11:22:27AM +0200, Igor Mammedov wrote:
> On Tue, 10 Aug 2021 10:45:02 +0200
> Jean-Philippe Brucker  wrote:
> 
> > Add a function that generates a Virtual I/O Translation table (VIOT),
> > describing the topology of paravirtual IOMMUs. The table is created when
> > instantiating a virtio-iommu device. It contains a virtio-iommu node and
> > PCI Range nodes for endpoints managed by the IOMMU. By default, a single
> > node describes all PCI devices. When passing the "default_bus_bypass_iommu"
> > machine option and "bypass_iommu" PXB option, only buses that do not
> > bypass the IOMMU are described by PCI Range nodes.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> 
> 
> using packed structures for composing ACPI tables is discouraged,
> pls use build_append_int_noprefix() API instead. You can look at
> build_amd_iommu() as an example.
> 
> PS:
> Also note field comments format.
> /it should be verbatim copy of entry name from respective table in spec/

Got it, I'll switch to build_append_int_noprefix()

Thanks,
Jean




Re: [PATCH 4/6] hw/arm/virt: Remove device tree restriction for virtio-iommu

2021-08-27 Thread Jean-Philippe Brucker
On Tue, Aug 17, 2021 at 03:42:22PM +0200, Eric Auger wrote:
> > diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> > index 770c286be7..f30eb16cbf 100644
> > --- a/hw/virtio/virtio-iommu-pci.c
> > +++ b/hw/virtio/virtio-iommu-pci.c
> > @@ -48,16 +48,9 @@ static void virtio_iommu_pci_realize(VirtIOPCIProxy 
> > *vpci_dev, Error **errp)
> >  VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
> >  
> >  if (!qdev_get_machine_hotplug_handler(DEVICE(vpci_dev))) {
> > -MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
> > -
> > -error_setg(errp,
> > -   "%s machine fails to create iommu-map device tree 
> > bindings",
> > -   mc->name);
> >  error_append_hint(errp,
> >"Check your machine implements a hotplug handler 
> > "
> >"for the virtio-iommu-pci device\n");
> > -error_append_hint(errp, "Check the guest is booted without FW or 
> > with "
> > -  "-no-acpi\n");
> We may check the vms->iommu is not already set to something else (to
> VIRT_IOMMU_SMMUV3 for instance).

Since that check is machine specific, virt_machine_device_plug_cb() in
hw/arm/virt.c may be a good place for it. The change feels unrelated to
this series but it's simple enough that I'm tempted to just append the
patch at the end. It also deals with trying to instantiate multiple
virtio-iommu devices, which isn't supported either.

Thanks,
Jean



Re: [PATCH 6/6] pc: Allow instantiating a virtio-iommu device

2021-08-27 Thread Jean-Philippe Brucker
On Tue, Aug 17, 2021 at 04:11:49PM +0200, Eric Auger wrote:
> Hi Jean,
> 
> On 8/10/21 10:45 AM, Jean-Philippe Brucker wrote:
> > From: Eric Auger 
> >
> > Add a hotplug handler for virtio-iommu on x86 and set the necessary
> > reserved region property. On x86, the [0xfee0, 0xfeef] DMA
> > region is reserved for MSIs. DMA transactions to this range either
> > trigger IRQ remapping in the IOMMU or bypasses IOMMU translation.
> >
> > Although virtio-iommu does not support IRQ remapping it must be informed
> > of the reserved region so that it can forward DMA transactions targeting
> > this region.
> >
> > Signed-off-by: Eric Auger 
> > Signed-off-by: Jean-Philippe Brucker 
> 
> I think we need to handle the case where the end-user gets lost with
> iommu options and use an invalid combination such as
> 
> -M q35,iommu=on,int_remap=on,kernel_irqchip=off -device -device 
> virtio-iommu-pci

I guess that would be
"-M q35,kernel_irqchip=off -device intel-iommu,intremap=on -device 
virtio-iommu-pci"

I'll add the checks, similar to the one in x86_iommu_set_default().

> We may also document somewhere that the virtio-iommu-pci
> does not support irq remapping as this may be an important limitation on x86.

I'll mention it in the commit message, unless you had another place in
mind?

Thanks,
Jean



Re: [PATCH v3 00/18] Clipboard fixes (for 6.1?)

2021-08-27 Thread Marc-André Lureau
Hi

On Thu, Aug 5, 2021 at 5:59 PM  wrote:

> From: Marc-André Lureau 
>
> Hi,
>
> Here is a few fixes I have collected while working on clipboard-related
> code.
>
> There are some obvious code improvements/fixes, and better handling of
> release &
> unregister to avoid dangling pointers and improve user experience.
>
> v3:
>  - add a migration blocker
>  - improve the code by using a few helpers
>
> v2:
>  - replaced "ui/vdagent: unregister clipboard peer on finalize" with
> "ui/vdagent: disconnect handlers and reset state on finalize" patch.
>  - added "ui/vdagent: reset outbuf on disconnect"
>  - commit message tweaks
>
> Marc-André Lureau (18):
>   ui/vdagent: fix leak on error path
>   ui/vdagent: remove copy-pasta comment
>   ui/gtk-clipboard: use existing macros
>   ui/gtk-clipboard: fix clipboard enum typo
>   ui/clipboard: add helper to retrieve current clipboard
>   ui/clipboard: add qemu_clipboard_peer_owns() helper
>   ui/clipboard: add qemu_clipboard_peer_release() helper
>   ui/clipboard: release owned grabs on unregister
>   ui/vdagent: disconnect handlers and reset state on finalize
>   ui/vdagent: reset outbuf on disconnect
>   ui/vdagent: split clipboard recv message handling
>   ui/vdagent: use qemu_clipboard_peer_release helper
>   ui/vdagent: use qemu_clipboard_info helper
>   ui/vdagent: send empty clipboard when unhandled
>   ui/gtk-clipboard: use qemu_clipboard_info helper
>   ui/vdagent: send release when no clipboard owner
>   ui/gtk-clipboard: emit release clipboard events
>   ui/vdagent: add a migration blocker
>
>  include/ui/clipboard.h |  33 ++
>  include/ui/gtk.h   |   1 -
>  ui/clipboard.c |  41 +++
>  ui/gtk-clipboard.c |  24 ++---
>  ui/vdagent.c   | 239 ++---
>  ui/trace-events|   1 +
>  6 files changed, 240 insertions(+), 99 deletions(-)
>
> --
> 2.32.0.264.g75ae10bc75
>
>
>
>

Ping

Since Gerd is lowering his maintainer responsibilities, is anyone else
interested?

-- 
Marc-André Lureau


Re: [PATCH 2/3] meson.build: Don't use internal libfdt if the user requested the system libfdt

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/27/21 2:09 PM, Thomas Huth wrote:
> If the users ran configure with --enable-libfdt=system, they likely did
> that on purpose. We should not silently fall back to the internal libfdt
> if the system libfdt is not usable, but report the problem with a proper
> message instead.
> 
> Signed-off-by: Thomas Huth 
> ---
>  meson.build | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 0/3] dtc: Fixes for the fdt check and update submodule to 1.6.1

2021-08-27 Thread Marc-André Lureau
On Fri, Aug 27, 2021 at 4:13 PM Thomas Huth  wrote:

> There are some issues in the checks for libfdt in meson.build which
> get fixed with the first two patches.
>
> And while we're at it, also update the dtc submodule to a proper release
> version (in the third patch).
>
> Thomas Huth (3):
>   meson.build: Fix the check for a usable libfdt
>   meson.build: Don't use internal libfdt if the user requested the
> system libfdt
>   dtc: Update to version 1.6.1
>
>  dtc | 2 +-
>  meson.build | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> --
> 2.27.0
>
>
>
Reviewed-by: Marc-André Lureau 

-- 
Marc-André Lureau


Ping: [PATCH 2/2] ui/cocoa.m: Add ability to swap command/meta and options keys

2021-08-27 Thread Programmingkid
Ping:

From: John Arbuckle 
Date: Fri, 30 Jul 2021 10:18:56 -0400
Subject: [PATCH 2/2] ui/cocoa.m: Add ability to swap command/meta and options 
keys

For users who are use to using a Macintosh keyboard having to use a PC keyboard
can be a pain because the Command/meta and Option/Alt keys are switched. To
make life easier this option is added to allow the user to switch how the guest
reacts to each of these keys being pushed. It can make a Macintosh keyboard user
feel almost at home with a PC keyboard. The same could be said for PC keyboard
users who have to use a Macinosh keyboard.
Based on patch by Gustavo Noronha Silva .

Signed-off-by: John Arbuckle 
---
 ui/cocoa.m | 66 +++---
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index fdef9e9901..98596d5f38 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -116,6 +116,7 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 
 static CFMachPortRef eventsTap = NULL;
 static CFRunLoopSourceRef eventsTapSource = NULL;
+static bool swap_command_option_keys = false;
 
 static void with_iothread_lock(CodeBlock block)
 {
@@ -810,12 +811,22 @@ - (bool) handleEventLocked:(NSEvent *)event
 qkbd_state_key_event(kbd, Q_KEY_CODE_CTRL_R, false);
 }
 if (!(modifiers & NSEventModifierFlagOption)) {
-qkbd_state_key_event(kbd, Q_KEY_CODE_ALT, false);
-qkbd_state_key_event(kbd, Q_KEY_CODE_ALT_R, false);
+if (swap_command_option_keys) {
+qkbd_state_key_event(kbd, Q_KEY_CODE_META_L, false);
+qkbd_state_key_event(kbd, Q_KEY_CODE_META_R, false);
+} else {
+qkbd_state_key_event(kbd, Q_KEY_CODE_ALT, false);
+qkbd_state_key_event(kbd, Q_KEY_CODE_ALT_R, false);
+}
 }
 if (!(modifiers & NSEventModifierFlagCommand)) {
-qkbd_state_key_event(kbd, Q_KEY_CODE_META_L, false);
-qkbd_state_key_event(kbd, Q_KEY_CODE_META_R, false);
+if (swap_command_option_keys) {
+qkbd_state_key_event(kbd, Q_KEY_CODE_ALT, false);
+qkbd_state_key_event(kbd, Q_KEY_CODE_ALT_R, false);
+} else {
+qkbd_state_key_event(kbd, Q_KEY_CODE_META_L, false);
+qkbd_state_key_event(kbd, Q_KEY_CODE_META_R, false);
+}
 }
 
 switch ([event type]) {
@@ -847,13 +858,22 @@ - (bool) handleEventLocked:(NSEvent *)event
 
 case kVK_Option:
 if (!!(modifiers & NSEventModifierFlagOption)) {
-[self toggleKey:Q_KEY_CODE_ALT];
+if (swap_command_option_keys) {
+[self toggleKey:Q_KEY_CODE_META_L];
+} else {
+[self toggleKey:Q_KEY_CODE_ALT];
+}
+
 }
 break;
 
 case kVK_RightOption:
 if (!!(modifiers & NSEventModifierFlagOption)) {
-[self toggleKey:Q_KEY_CODE_ALT_R];
+if (swap_command_option_keys) {
+[self toggleKey:Q_KEY_CODE_META_R];
+} else {
+[self toggleKey:Q_KEY_CODE_ALT_R];
+}
 }
 break;
 
@@ -861,14 +881,22 @@ - (bool) handleEventLocked:(NSEvent *)event
 case kVK_Command:
 if (isMouseGrabbed &&
 !!(modifiers & NSEventModifierFlagCommand)) {
-[self toggleKey:Q_KEY_CODE_META_L];
+if (swap_command_option_keys) {
+[self toggleKey:Q_KEY_CODE_ALT];
+} else {
+[self toggleKey:Q_KEY_CODE_META_L];
+}
 }
 break;
 
 case kVK_RightCommand:
 if (isMouseGrabbed &&
 !!(modifiers & NSEventModifierFlagCommand)) {
-[self toggleKey:Q_KEY_CODE_META_R];
+if (swap_command_option_keys) {
+[self toggleKey:Q_KEY_CODE_ALT_R];
+} else {
+[self toggleKey:Q_KEY_CODE_META_R];
+}
 }
 break;
 }
@@ -1188,6 +1216,7 @@ - (IBAction) do_about_menu_item: (id) sender;
 - (void)make_about_window;
 - (void)adjustSpeed:(id)sender;
 - (IBAction)doFullGrab:(id)sender;
+- (IBAction)doSwapCommandOptionKeys:(id)sender;
 @end
 
 @implementation QemuCocoaAppController
@@ -1669,6 +1698,22 @@ - (IBAction)doFullGrab:(id) sender
 }
 }
 
+// The action method to the "Options->Swap Command and Option Keys" menu item
+- (IBAction)doSwapCommandOptionKeys:(id)sender
+{
+// If the menu item is not checked
+if ([sender state] == 

Ping: [PATCH 1/2] ui/cocoa.m: Add full keyboard grab support

2021-08-27 Thread Programmingkid
ping

From: John Arbuckle 
Date: Thu, 29 Jul 2021 14:41:57 -0400
Subject: [PATCH 1/2] ui/cocoa.m: Add full keyboard grab support

There are keyboard shortcuts that are vital for use in a guest that runs Mac OS.
These shortcuts are reserved for Mac OS use only which makes having the guest
see them impossible on a Mac OS host - until now. This patch will enable the
user to decide if the guest should see all keyboard shortcuts using a menu item.
This patch adds a new menu called Options and a new menu item called
"Full Keyboard Grab". Simply selecting this menu item will turn the feature on
or off at any time. Mac OS requires the user to enable access to assistive
devices to use this feature. How to do this varies with each Mac OS version.
Based on patch by Gustavo Noronha Silva . 

Signed-off-by: John Arbuckle 
---
 ui/cocoa.m | 112 +
 1 file changed, 112 insertions(+)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 9f72844b07..fdef9e9901 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -114,6 +114,9 @@ static void cocoa_switch(DisplayChangeListener *dcl,
 typedef void (^CodeBlock)(void);
 typedef bool (^BoolCodeBlock)(void);
 
+static CFMachPortRef eventsTap = NULL;
+static CFRunLoopSourceRef eventsTapSource = NULL;
+
 static void with_iothread_lock(CodeBlock block)
 {
 bool locked = qemu_mutex_iothread_locked();
@@ -332,10 +335,27 @@ - (float) cdx;
 - (float) cdy;
 - (QEMUScreen) gscreen;
 - (void) raiseAllKeys;
+- (void) setFullGrab;
 @end
 
 QemuCocoaView *cocoaView;
 
+// Part of the full keyboard grab system
+static CGEventRef handleTapEvent(CGEventTapProxy proxy, CGEventType type,
+CGEventRef cgEvent, void *userInfo)
+{
+QemuCocoaView *cocoaView = (QemuCocoaView*) userInfo;
+NSEvent* event = [NSEvent eventWithCGEvent:cgEvent];
+if ([cocoaView isMouseGrabbed] && [cocoaView handleEvent:event]) {
+COCOA_DEBUG("Global events tap: qemu handled the event, capturing!\n");
+return NULL;
+}
+COCOA_DEBUG("Global events tap: qemu did not handle the event, letting it"
+" through...\n");
+
+return cgEvent;
+}
+
 @implementation QemuCocoaView
 - (id)initWithFrame:(NSRect)frameRect
 {
@@ -361,6 +381,12 @@ - (void) dealloc
 }
 
 qkbd_state_free(kbd);
+if (eventsTap) {
+CFRelease(eventsTap);
+}
+if (eventsTapSource) {
+CFRelease(eventsTapSource);
+}
 [super dealloc];
 }
 
@@ -1086,6 +1112,50 @@ - (void) raiseAllKeys
 qkbd_state_lift_all_keys(kbd);
 });
 }
+
+// Inserts the event tap.
+// This enables us to receive keyboard events that Mac OS would
+// otherwise not let us see - like Command-Option-Esc.
+- (void) setFullGrab
+{
+COCOA_DEBUG("QemuCocoaView: setFullGrab\n");
+NSString *advice = @"Try enabling access to assistive devices";
+CGEventMask mask = CGEventMaskBit(kCGEventKeyDown) |
+CGEventMaskBit(kCGEventKeyUp) | CGEventMaskBit(kCGEventFlagsChanged);
+eventsTap = CGEventTapCreate(kCGHIDEventTap, kCGHeadInsertEventTap,
+ kCGEventTapOptionDefault, mask, 
handleTapEvent,
+ self);
+if (!eventsTap) {
+@throw [NSException
+ exceptionWithName:@"Tap failure"
+reason:[NSString stringWithFormat:@"%@\n%@", @"Could not "
+"create event tap.", advice]
+userInfo:nil];
+} else {
+COCOA_DEBUG("Global events tap created! Will capture system key"
+" combos.\n");
+}
+
+eventsTapSource =
+CFMachPortCreateRunLoopSource(kCFAllocatorDefault, eventsTap, 0);
+if (!eventsTapSource ) {
+@throw [NSException
+ exceptionWithName:@"Tap failure"
+ reason:@"Could not obtain current CFRunLoop source."
+userInfo:nil];
+}
+CFRunLoopRef runLoop = CFRunLoopGetCurrent();
+if (!runLoop) {
+   @throw [NSException
+ exceptionWithName:@"Tap failure"
+ reason:@"Could not obtain current CFRunLoop."
+userInfo:nil];
+}
+
+CFRunLoopAddSource(runLoop, eventsTapSource, kCFRunLoopDefaultMode);
+CFRelease(eventsTapSource);
+}
+
 @end
 
 
@@ -1117,6 +1187,7 @@ - (void)openDocumentation:(NSString *)filename;
 - (IBAction) do_about_menu_item: (id) sender;
 - (void)make_about_window;
 - (void)adjustSpeed:(id)sender;
+- (IBAction)doFullGrab:(id)sender;
 @end
 
 @implementation QemuCocoaAppController
@@ -1569,6 +1640,35 @@ - (void)adjustSpeed:(id)sender
 COCOA_DEBUG("cpu throttling at %d%c\n", cpu_throttle_get_percentage(), 
'%');
 }
 
+// The action method to the 'Options->Full Keyboard Grab' menu item
+- (IBAction)doFullGrab:(id) sender
+{
+@try
+{
+// Set the state of the menu item
+// if already checked
+if ([sender state] == NSControlStateValueOn) {
+// remove runloop source
+

Re: [PATCH v3 3/3] hw/usb/xhci: Always expect 'dma' link property to be set

2021-08-27 Thread Philippe Mathieu-Daudé
On 8/27/21 1:03 PM, Mark Cave-Ayland wrote:
> On 27/08/2021 11:14, Peter Maydell wrote:
> 
>> On Fri, 27 Aug 2021 at 10:14, Mark Cave-Ayland
>>  wrote:
>>> Ah so the plan moving forward is to always have an explicit MR passed
>>> in for DMA use.
>>> Sorry if I missed that in earlier versions of the patchset, I'm still
>>> getting back up
>>> to speed on QEMU hacking.
>>>
>>> Was there a decision as to what the property name should be? I see
>>> "dma_mr" used
>>> quite a bit, and if there will be more patches to remove the
>>> system_memory default
>>> from other devices then it would be nice to try and use the same name
>>> everywhere.
>>
>> No, I don't think we have a convention; it might be nice to add one.
>> Currently a quick git grep for DEFINE_PROP_LINK and eyeballing of
>> the results shows that we have:
>>
>>   "memory" x 7
>>   "dram" x 4
>>   "downstream" x 3
>>   "dma-memory" x 3
>>   "dma" x 2
>>   "source-memory"
>>   "dram-mr"
>>   "ddr"
>>   "ddr-ram"
>>   "gic"
>>   "cpc"
>>   "port[N]"
>>   "dma_mr"
>>   "ahb-bus"
>>   "system-memory"
>>   "main-bus"
>>
>> This list includes all our TYPE_MEMORY_REGION link properties; a few
>> of these
>> are special-purpose, and reasonably have specialized names. 2 out of 3
>> users
>> of "downstream" are devices which pass on (or filter out) memory
>> transactions
>> from the CPU (tz-msc, tz-mpc), and I think that name makes sense there.
>> (The 3rd is pl080.c, which is a plain old DMA controller, and the naming
>> there is not so well-suited.)
>>
>> "memory" is mostly SoC and CPU objects taking a link to whatever they
>> should
>> have as the CPU's view of memory.
>>
>> I don't have a strong view on what we ought to try to standardize on,
>> except that I don't like the "_mr" or "-mr" suffix -- I don't think we
>> need to try to encode the type of the link property in the property name.
>>
>> It is probably reasonable to have different naming conventions for:
>>   * SoC and CPU objects, which take a link to the MR which represents
>>     the CPU/SoC's view of the outside world
>>   * Endpoint devices which can be DMA masters and take a link giving
>>     them their view of what they can DMA to
>>   * filtering/control devices which take incoming transactions from
>>     an upstream port, filter some and pass the rest through to a
>>     downstream port

Which category fits IOMMUs?

>>
>> In pretty much all cases, these link properties are used only internally
>> to QEMU, so if we decide on a naming convention we can fairly easily
>> rename existing properties to match.
> 
> I quite like "cpu-memory" for SoC/CPU objects and "dma-memory" for
> endpoint devices that can be DMA masters. Perhaps the last case is
> specialised enough that a convention there doesn't make sense?
So "iommu-memory"?




[PATCH 3/3] dtc: Update to version 1.6.1

2021-08-27 Thread Thomas Huth
The dtc submodule is currently pointing to non-release commit. It's nicer
if submodules point to release versions instead and since dtc 1.6.1 is
available now, let's update to that version.

Signed-off-by: Thomas Huth 
---
 dtc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dtc b/dtc
index 85e5d83984..b6910bec11 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit 85e5d839847af54efab170f2b1331b2a6421e647
+Subproject commit b6910bec11614980a21e46fbccc35934b671bd81
-- 
2.27.0




  1   2   >