Re: [PATCH RFC] vfio: Move the saving of the config space to the right place in VFIO migration

2020-12-03 Thread Shenming Lu
On 2020/12/2 6:21, Alex Williamson wrote:
> On Tue, 1 Dec 2020 14:37:52 +0800
> Shenming Lu  wrote:
> 
>> On 2020/12/1 1:03, Alex Williamson wrote:
>>> On Thu, 26 Nov 2020 14:56:17 +0800
>>> Shenming Lu  wrote:
>>>   
 Hi,

 After reading everyone's opinions, we have a rough idea for this issue.

 One key point is whether it is necessary to setup the config space before
 the device can accept further migration data. I think it is decided by
 the vendor driver, so we can simply ask the vendor driver about it in
 .save_setup, which could avoid a lot of unnecessary copies and settings.
 Once we have known the need, we can iterate the config space (before)
 along with the device migration data in .save_live_iterate and
 .save_live_complete_precopy, and if not needed, we can only migrate the
 config space in .save_state.

 Another key point is that the interrupt enabling should be after the
 restoring of the interrupt controller (might not only interrupts).
 My solution is to add a subflag at the beginning of the config data
 (right after VFIO_MIG_FLAG_DEV_CONFIG_STATE) to indicate the triggered
 actions on the dst (such as whether to enable interrupts).

 Below is its workflow.

 On the save path:
In vfio_save_setup():
Ask the vendor driver if it needs the config space setup before it
can accept further migration data.  
>>>
>>> How does "ask the vendor driver" actually work?  
>>
>> Maybe via an ioctl?
>> Oh, it seems that we have to ask the dst vendor driver (in vfio_load_setup)
>> and send the config data (before) along with the device migration data
>> every time?...
> 
> Migration data streams are unidirectional, otherwise we break offline
> migration.  There are various ways other than an ioctl for a vendor
> driver to advertise requirements, ex. a capability on the migration
> region info, but it's not clear to me that we really need this info yet.

Ok, this is not clear yet.

> 
|
In vfio_save_iterate() (pre-copy):
If *needed*, save the config space which would be setup on the dst
before the migration data, but send with a subflag to instruct not
to (such as) enable interrupts.  
>>>
>>> If not for triggering things like MSI/X configuration, isn't config
>>> space almost entirely virtual?  What visibility does the vendor driver
>>> have to the VM machine dependencies regarding device interrupt versus
>>> interrupt controller migration?  
>>
>> My thought is that the vendor driver only decides the order of the config
>> space setup and the receiving of the migration data, but leaves the VM
>> machine dependencies to QEMU.
> 
> But again, config space is largely virtual, the vendor driver doesn't
> have access to it, it's only the effects of config space, like
> representing the interrupt mode and configuring it on the device or
> specific registers that QEMU writes through that the vendor driver
> sees.  So how is it that the vendor driver decides the order?

As you said above, a vendor driver could advertise its requirements via
the migration region...
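
(Purely as an illustration of what such an advertisement could look like - the
structure name and flag below are made up for the sketch, they are not existing
UAPI:)

struct vfio_region_info_cap_mig_config {
    struct vfio_info_cap_header header; /* chained via VFIO_DEVICE_GET_REGION_INFO caps */
    __u32 flags;
/* hypothetical: config space must be restored before further device data */
#define VFIO_MIG_CFG_NEEDS_EARLY_CONFIG (1 << 0)
};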

> The
> vendor driver doesn't have visibility to the VM machine dependencies,
> like when the interrupt controller is sufficiently configured to enable
> device interrupts.  It seems that a vendor driver that depends on QEMU
> enabling interrupts at a specific point is inherently dependent on
> assumptions in the machine configuration.

Yeah, there might be more than one dependency.

> 
|
In vfio_save_complete_precopy() (stop-and-copy, iterable process):
The same as that in vfio_save_iterate().
|
In .save_state (stop-and-copy, non-iterable process):
If *needed*, only send a subflag to instruct to enable interrupts.
If *not needed*, save the config space and setup everything on the dst. 
  
>>>
>>> Again, how does the vendor driver have visibility to know when the VM
>>> machine can enable interrupts?  
>>
>> It seems troubling if the vendor driver needs the interrupts to be enabled
>> first...
> 
> Yes.
> 
 Besides the above idea, we might be able to choose to let the vendor driver do
 more: qemu just sends and writes the config data (before) along with the device
 migration data every time, and it's up to the vendor driver to filter out/buffer
 the received data or reorder the settings...
>>>
>>> There is no vendor driver in QEMU though, so are you suggesting that
>>> QEMU follows a standard protocol and the vendor driver chooses when to
>>> enable specific features?  For instance, QEMU would call SET_IRQS and
>>> the driver would return success, but defer that setup if necessary?
>>> That seems quite troubling as we then have ioctls that behave
>>> differently depending on the device state and we have no error path to
>>> userspace should that setup fail later.  The vendor driver does

Re: [DISCUSSION] How to set properties of non-pluggable devices?

2020-12-03 Thread Gerd Hoffmann
  Hi,

> Being non-pluggable means I can't use "-device foo,bar=baz" on the command
> line.
> [But I can use "-device foo,help" to list its properties :-)  (if I also
> specify -M bar) ]
> 
> How do people do this?

There is -global.  This sets properties for all instances of a given
device.  If there is only one instance of your built-in device you can
use that to set the properties.  A bit hackish, but works and has been
the only way to do it for a long time.

These days we typically add an alias to the machine type; examples are
pflash{0,1} and pcspk-audiodev for the pc machine type, which configure
the built-in flash and PC speaker devices.
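
For example (illustrative only - the usable device and property names depend
on the machine and device in question):

  qemu-system-x86_64 -M pc -global cirrus-vga.vgamem_mb=16 ...
  qemu-system-x86_64 -M pc,pcspk-audiodev=snd0 -audiodev none,id=snd0 ...

The first form sets the property on all instances of the given driver, the
second one goes through a machine property alias.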

take care,
  Gerd




Re: [PATCH] hw/block: m25p80: Fix fast read for SST flashes

2020-12-03 Thread Francisco Iglesias
Hi Bin and Alistair,

On [2020 Dec 02] Wed 11:40:11, Alistair Francis wrote:
> On Sun, Nov 29, 2020 at 6:55 PM Bin Meng  wrote:
> >
> > From: Bin Meng 
> >
> > SST flashes require a dummy byte after the address bits.
> >
> > Signed-off-by: Bin Meng 
> 
> I couldn't find a datasheet that says this... But the actual code
> change looks fine, so:
> 
> Acked-by: Alistair Francis 
> 
> Alistair
> 
> > ---
> >
> >  hw/block/m25p80.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
> > index 483925f..9b36762 100644
> > --- a/hw/block/m25p80.c
> > +++ b/hw/block/m25p80.c
> > @@ -825,6 +825,9 @@ static void decode_fast_read_cmd(Flash *s)
> >  s->needed_bytes = get_addr_length(s);
> >  switch (get_man(s)) {
> >  /* Dummy cycles - modeled with bytes writes instead of bits */
> > +case MAN_SST:
> > +s->needed_bytes += 1;

1 dummy clk cycle is modelled as 1 byte write (see the comment above), so 1
dummy byte (8 dummy clk cycles) will need +8 above. An option could be to fall
through to the Winbond case below instead (since it seems to operate
likewise).
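
Something along these lines is what I mean (untested sketch):

case MAN_SST:
case MAN_WINBOND:
    s->needed_bytes += 8;
    break;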

Best regards,
Francisco Iglesias


> > +break;
> >  case MAN_WINBOND:
> >  s->needed_bytes += 8;
> >  break;
> > --
> > 2.7.4
> >
> >
> 



Re: [PATCH-for 5.2?] hw/core/ressetable: fix reset count decrement

2020-12-03 Thread Damien Hedde



On 12/2/20 6:10 PM, Philippe Mathieu-Daudé wrote:
> On 12/2/20 5:40 PM, Damien Hedde wrote:
>> The reset count was only decremented if the device was in a single
>> reset.
>>
>> Also move the decrement before calling the exit phase method to fix
>> problem of reset state evaluation during that call. Update the doc
>> accordingly.
>>
>> Signed-off-by: Damien Hedde 
>> Fixes: 1905297 ("Zynq7000 UART clock reset initialization", 2020-11-23)
> 

> $ git show 1905297
> fatal: ambiguous argument '1905297': unknown revision or path not in the
> working tree.

I put Michael's bug number there. Should I put the offending commit
instead?

> 
> Beside, typo ressetable -> resettable in subject.

Thanks,
Damien

Cc'ing Michael's new address too

> 
>> Reported-by: Michael Peter 
>> --
>>
>> Hi all,
>>
>> While looking at the bug reported by Michael and his patch, I found another
>> bug. Apparently I forgot to decrement the reset count if there were several
>> resets at the same time.
>>
>> This patch fixes that.
>>
>> I also moved the place of the decrement: before calling the exit phase method.
>> It globally fixes Michael's reported bug, as I think it will avoid some
>> boilerplate code in every exit phase method we do.
>>
>> The only other place where the reset state is checked is
>> hw/char/cadence-uart.c, so it does not have a high impact.
>>
>> I'm not sure if this meets the criteria for 5.2 as it changes a documented
>> feature. In that case we can just accept Michael's solution and I'll fix the
>> rest later.
>>
>> Here's the pointer for the bug and michael's patch.
>> https://lists.nongnu.org/archive/html/qemu-devel/2020-11/msg05786.html
>> https://lists.nongnu.org/archive/html/qemu-devel/2020-11/msg06105.html
>>
>> Damien



Re: [PATCH for-6.0 v2 1/3] spapr: Improve naming of some vCPU id related items

2020-12-03 Thread Greg Kurz
On Wed, 2 Dec 2020 15:56:16 +1100
David Gibson  wrote:

> On Mon, Nov 30, 2020 at 05:52:56PM +0100, Greg Kurz wrote:
> > The machine tells the IC backend the number of vCPU ids it will be
> > exposed to, in order to:
> > - fill the "ibm,interrupt-server-ranges" property in the DT (XICS)
> > - size the VP block used by the in-kernel chip (XICS-on-XIVE, XIVE)
> > 
> > The current "nr_servers" and "spapr_max_server_number" naming can
> > mislead people into thinking it is about a quantity of CPUs. Make
> > it clear this is all about vCPU ids.
> > 
> > Signed-off-by: Greg Kurz 
> 
> I know it seems very finicky, but can you please
> s/max_vcpu_ids/max_vcpu_id/g
> 
> At least to me, "max_vcpu_ids" has some of the same confusion as the
> existing code - it reads like the maximum *number* of IDs, rather than
> the maximum *value* of IDs.
> 

Well... this is precisely the point: finding a suitable name for
something that is essentially the "size of a range of consecutive
IDs starting from 0" and not the "maximum value of IDs". I'm not sure
of a more appropriate wording to describe this; it is actually a
_least upper bound_.

> > ---
> >  include/hw/ppc/spapr.h  |  2 +-
> >  include/hw/ppc/spapr_irq.h  |  8 
> >  include/hw/ppc/spapr_xive.h |  2 +-
> >  include/hw/ppc/xics_spapr.h |  2 +-
> >  hw/intc/spapr_xive.c|  8 
> >  hw/intc/spapr_xive_kvm.c|  4 ++--
> >  hw/intc/xics_kvm.c  |  4 ++--
> >  hw/intc/xics_spapr.c|  8 
> >  hw/ppc/spapr.c  |  8 
> >  hw/ppc/spapr_irq.c  | 18 +-
> >  10 files changed, 32 insertions(+), 32 deletions(-)
> > 
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index b7ced9faebf5..dc99d45e2852 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -849,7 +849,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
> >  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error 
> > **errp);
> >  void spapr_clear_pending_events(SpaprMachineState *spapr);
> >  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> > -int spapr_max_server_number(SpaprMachineState *spapr);
> > +int spapr_max_vcpu_ids(SpaprMachineState *spapr);
> >  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> >uint64_t pte0, uint64_t pte1);
> >  void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered);
> > diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> > index c22a72c9e270..2e53fc9e6cbb 100644
> > --- a/include/hw/ppc/spapr_irq.h
> > +++ b/include/hw/ppc/spapr_irq.h
> > @@ -43,7 +43,7 @@ DECLARE_CLASS_CHECKERS(SpaprInterruptControllerClass, 
> > SPAPR_INTC,
> >  struct SpaprInterruptControllerClass {
> >  InterfaceClass parent;
> >  
> > -int (*activate)(SpaprInterruptController *intc, uint32_t nr_servers,
> > +int (*activate)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
> >  Error **errp);
> >  void (*deactivate)(SpaprInterruptController *intc);
> >  
> > @@ -62,7 +62,7 @@ struct SpaprInterruptControllerClass {
> >  /* These methods should only be called on the active intc */
> >  void (*set_irq)(SpaprInterruptController *intc, int irq, int val);
> >  void (*print_info)(SpaprInterruptController *intc, Monitor *mon);
> > -void (*dt)(SpaprInterruptController *intc, uint32_t nr_servers,
> > +void (*dt)(SpaprInterruptController *intc, uint32_t max_vcpu_ids,
> > void *fdt, uint32_t phandle);
> >  int (*post_load)(SpaprInterruptController *intc, int version_id);
> >  };
> > @@ -74,7 +74,7 @@ int spapr_irq_cpu_intc_create(struct SpaprMachineState 
> > *spapr,
> >  void spapr_irq_cpu_intc_reset(struct SpaprMachineState *spapr, PowerPCCPU 
> > *cpu);
> >  void spapr_irq_cpu_intc_destroy(struct SpaprMachineState *spapr, 
> > PowerPCCPU *cpu);
> >  void spapr_irq_print_info(struct SpaprMachineState *spapr, Monitor *mon);
> > -void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t nr_servers,
> > +void spapr_irq_dt(struct SpaprMachineState *spapr, uint32_t max_vcpu_ids,
> >void *fdt, uint32_t phandle);
> >  
> >  uint32_t spapr_irq_nr_msis(struct SpaprMachineState *spapr);
> > @@ -105,7 +105,7 @@ typedef int 
> > (*SpaprInterruptControllerInitKvm)(SpaprInterruptController *,
> >  
> >  int spapr_irq_init_kvm(SpaprInterruptControllerInitKvm fn,
> > SpaprInterruptController *intc,
> > -   uint32_t nr_servers,
> > +   uint32_t max_vcpu_ids,
> > Error **errp);
> >  
> >  /*
> > diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> > index 26c8d90d7196..643129b13536 100644
> > --- a/include/hw/ppc/spapr_xive.h
> > +++ b/include/hw/ppc/spapr_xive.h
> > @@ -79,7 +79,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t 
> > end_idx,
> >  /*
> >   * KVM XIVE device helpers
> >   */
> > -int kvmppc_xive_conne

RE: [PATCH] block: Don't inactivate bs if it is already inactive

2020-12-03 Thread Tuguoyi
> -Original Message-
> From: Vladimir Sementsov-Ogievskiy [mailto:vsement...@virtuozzo.com]
> Sent: Saturday, November 28, 2020 4:48 PM
> To: tuguoyi (Cloud) ; Kevin Wolf ;
> Max Reitz ; qemu-bl...@nongnu.org
> Cc: qemu-devel@nongnu.org; wangyong (Cloud) 
> Subject: Re: [PATCH] block: Don't inactivate bs if it is already inactive
> 
> 24.11.2020 13:04, Tuguoyi wrote:
> > The following steps will cause qemu assertion failure:
> > - pause vm
> > - save memory snapshot into local file through fd migration
> > - do the above operation again will cause qemu assertion failure
> 
> Hi!
> 
> Why do you need such a scenario? Isn't it more correct and safer to just
> error-out on savevm if disks are already inactive? Inactive disks are a sign
> that the vm may have been migrated to another host and may already be running,
> so creating any kind of snapshot of this old state may be a bad idea. I mean,
> you try to allow a doubtful feature to avoid an assertion. If you don't have
> strong reasons for the feature, it's better to turn a crash into a clean
> error-out.

This scenario is triggered by an auto test tool which has a snapshot policy that
creates external snapshots of disk and memory periodically via virsh. But during
the test, the virtual machine gets paused due to a storage shortage, so the
second external snapshot after that point causes the qemu crash.

I agree that it's better to error-out when that case happens.

> As far as I remember, bdrv_inactivate_all() is the only source of inactivation,
> so actually, it's more like an "inactive" state of the vm, not just some disks
> being inactive. And you change the code in a way that makes it look like some
> disks may be inactive and some not, which would actually be unexpected behavior.

Thanks for your comments, and I'll send a new patch
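
Rough sketch of the error-out variant, inside the bdrv_inactivate_all() loop
(untested; the exact errno and cleanup are still to be decided):

    if (bs->open_flags & BDRV_O_INACTIVE) {
        /* Disks were already inactivated, e.g. by a previous migration:
         * fail cleanly instead of tripping the assertion later. */
        ret = -EBUSY;   /* or whatever errno fits best */
        bdrv_next_cleanup(&it);
        goto out;
    }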

> >
> > The backtrace looks like:
> > #0  0x7fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> > #1  0x7fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> > #2  0x7fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #3  0x7fbf958beca2 in __assert_fail () from
> /lib/x86_64-linux-gnu/libc.so.6
> > #4  0x55ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400)
> at /build/qemu-5.0/block.c:5724
> > #5  0x55ca8dece967 in bdrv_inactivate_all () at
> /build//qemu-5.0/block.c:5792
> > #6  0x55ca8de5539d in
> qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true,
> in_postcopy=false, f=0x55ca907044b0)
> >  at /build/qemu-5.0/migration/savevm.c:1401
> > #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0,
> iterable_only=iterable_only@entry=false,
> inactivate_disks=inactivate_disks@entry=true)
> >  at /build/qemu-5.0/migration/savevm.c:1453
> > #8  0x55ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at
> /build/qemu-5.0/migration/migration.c:2941
> > #9  migration_iteration_run (s=0x55ca8f64d9f0) at
> /build/qemu-5.0/migration/migration.c:3295
> > #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at
> /build/qemu-5.0/migration/migration.c:3459
> > #11 0x55ca8dfc6716 in qemu_thread_start (args=) at
> /build/qemu-5.0/util/qemu-thread-posix.c:519
> > #12 0x7fbf95c5f184 in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> > #13 0x7fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
> >
> > When the first migration completes, the BDRV_O_INACTIVE flag is set in
> > bs->open_flags by bdrv_inactivate_recurse(), and during the second migration
> > bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set,
> > which fails and causes the crash.
> >
> > This patch just makes bdrv_inactivate_all() skip inactivating the bs if it
> > is already inactive.
> >
> > Signed-off-by: Tuguoyi 
> > ---
> >   block.c | 7 ++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/block.c b/block.c
> > index f1cedac..02361e1 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -5938,6 +5938,11 @@ static int
> bdrv_inactivate_recurse(BlockDriverState *bs)
> >   return 0;
> >   }
> >
> > +static bool bdrv_is_inactive(BlockDriverState *bs)
> > +{
> > +return bs->open_flags & BDRV_O_INACTIVE;
> > +}
> > +
> >   int bdrv_inactivate_all(void)
> >   {
> >   BlockDriverState *bs = NULL;
> > @@ -5958,7 +5963,7 @@ int bdrv_inactivate_all(void)
> >   /* Nodes with BDS parents are covered by recursion from the
> last
> >* parent that gets inactivated. Don't inactivate them a second
> >* time if that has already happened. */
> > -if (bdrv_has_bds_parent(bs, false)) {
> > +if (bdrv_has_bds_parent(bs, false) || bdrv_is_inactive(bs)) {
> >   continue;
> >   }
> >   ret = bdrv_inactivate_recurse(bs);
> >
> 
> 
> 
> --
> Best regards,
> Vladimir


Re: [PATCH-for-5.2? 1/1] Acceptance tests: bump Fedora to 32

2020-12-03 Thread Philippe Mathieu-Daudé
On 12/2/20 10:57 PM, Cleber Rosa wrote:
> The currently used Fedora 31 has been moved out of the standard download
> locations supported by the functionality provided by avocado.utils.vmimage.
> So right now, the boot_linux.py tests will get canceled because they are
> not able to find those specific images.
> 
> Ideally, this would be bumped to version 33.  But, I've found issues
> with the aarch64 images, with various systemd services failing to
> start.  So to keep all archs consistent, I've settled on 32.
> 
> Signed-off-by: Cleber Rosa 
> ---
>  tests/acceptance/boot_linux.py | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index 1da4a53d6a..0824de008e 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -42,13 +42,13 @@ class BootLinuxBase(Test):
>  vmimage.QEMU_IMG = qemu_img
>  
>  self.log.info('Downloading/preparing boot image')
> -# Fedora 31 only provides ppc64le images
> +# Fedora 32 only provides ppc64le images
>  image_arch = self.arch
>  if image_arch == 'ppc64':
>  image_arch = 'ppc64le'
>  try:
>  boot = vmimage.get(
> -'fedora', arch=image_arch, version='31',
> +'fedora', arch=image_arch, version='32',

I already expressed my view on this (latest QEMU should be
able to use at least f31 - which was tested - and eventually
f33 - which is a coverage extension). I'm not going to vouch
for this change. If other maintainers are happy with it, I don't
mind if this gets merged.

BTW I don't see why this is urgent for 5.2.

Phil.




Re: [PATCH 2/8] hvf: Move common code out

2020-12-03 Thread Roman Bolshakov
On Mon, Nov 30, 2020 at 04:00:11PM -0800, Peter Collingbourne wrote:
> On Mon, Nov 30, 2020 at 3:18 PM Alexander Graf  wrote:
> >
> >
> > On 01.12.20 00:01, Peter Collingbourne wrote:
> > > On Mon, Nov 30, 2020 at 1:40 PM Alexander Graf  wrote:
> > >> Hi Peter,
> > >>
> > >> On 30.11.20 22:08, Peter Collingbourne wrote:
> > >>> On Mon, Nov 30, 2020 at 12:56 PM Frank Yang  wrote:
> > 
> >  On Mon, Nov 30, 2020 at 12:34 PM Alexander Graf  
> >  wrote:
> > > Hi Frank,
> > >
> > > Thanks for the update :). Your previous email nudged me into the 
> > > right direction. I previously had implemented WFI through the 
> > > internal timer framework which performed way worse.
> >  Cool, glad it's helping. Also, Peter found out that the main thing 
> >  keeping us from just using cntpct_el0 on the host directly and compare 
> >  with cval is that if we sleep, cval is going to be much < cntpct_el0 
> >  by the sleep time. If we can get either the architecture or macos to 
> >  read out the sleep time then we might be able to not have to use a 
> >  poll interval either!
> > > Along the way, I stumbled over a few issues though. For starters, the 
> > > signal mask for SIG_IPI was not set correctly, so while pselect() 
> > > would exit, the signal would never get delivered to the thread! For a 
> > > fix, check out
> > >
> > > 
> > > https://patchew.org/QEMU/20201130030723.78326-1-ag...@csgraf.de/20201130030723.78326-4-ag...@csgraf.de/
> > >
> >  Thanks, we'll take a look :)
> > 
> > > Please also have a look at my latest stab at WFI emulation. It 
> > > doesn't handle WFE (that's only relevant in overcommitted scenarios). 
> > > But it does handle WFI and even does something similar to hlt 
> > > polling, albeit not with an adaptive threshold.
> > >>> Sorry I'm not subscribed to qemu-devel (I'll subscribe in a bit) so
> > >>> I'll reply to your patch here. You have:
> > >>>
> > >>> +/* Set cpu->hvf->sleeping so that we get a
> > >>> SIG_IPI signal. */
> > >>> +cpu->hvf->sleeping = true;
> > >>> +smp_mb();
> > >>> +
> > >>> +/* Bail out if we received an IRQ meanwhile */
> > >>> +if (cpu->thread_kicked || (cpu->interrupt_request &
> > >>> +(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> > >>> +cpu->hvf->sleeping = false;
> > >>> +break;
> > >>> +}
> > >>> +
> > >>> +/* nanosleep returns on signal, so we wake up on 
> > >>> kick. */
> > >>> +nanosleep(ts, NULL);
> > >>>
> > >>> and then send the signal conditional on whether sleeping is true, but
> > >>> I think this is racy. If the signal is sent after sleeping is set to
> > >>> true but before entering nanosleep then I think it will be ignored and
> > >>> we will miss the wakeup. That's why in my implementation I block IPI
> > >>> on the CPU thread at startup and then use pselect to atomically
> > >>> unblock and begin sleeping. The signal is sent unconditionally so
> > >>> there's no need to worry about races between actually sleeping and the
> > >>> "we think we're sleeping" state. It may lead to an extra wakeup but
> > >>> that's better than missing it entirely.
> > >>
> > >> Thanks a bunch for the comment! So the trick I was using here is to
> > >> modify the timespec from the kick function before sending the IPI
> > >> signal. That way, we know that either we are inside the sleep (where the
> > >> signal wakes it up) or we are outside the sleep (where timespec={} will
> > >> make it return immediately).
> > >>
> > >> The only race I can think of is if nanosleep does calculations based on
> > >> the timespec and we happen to send the signal right there and then.
> > > Yes that's the race I was thinking of. Admittedly it's a small window
> > > but it's theoretically possible and part of the reason why pselect was
> > > created.
> > >
> > >> The problem with blocking IPIs is basically what Frank was describing
> > >> earlier: How do you unset the IPI signal pending status? If the signal
> > >> is never delivered, how can pselect differentiate "signal from last time
> > >> is still pending" from "new signal because I got an IPI"?
> > > In this case we would take the additional wakeup which should be
> > > harmless since we will take the WFx exit again and put us in the
> > > correct state. But that's a lot better than busy looping.
> >
> >
> > I'm not sure I follow. I'm thinking of the following scenario:
> >
> >- trap into WFI handler
> >- go to sleep with blocked SIG_IPI
> >- SIG_IPI arrives, pselect() exits
> >- signal is still pending because it's blocked
> >- enter guest
> >- trap into WFI handler
> >- run pselect(), but it immediate exits because SIG_IPI is still pending
> >
> > This wa

Re: [PATCH v1 1/1] security-process: update process information

2020-12-03 Thread Daniel P . Berrangé
On Thu, Dec 03, 2020 at 11:32:44AM +0530, P J P wrote:
> +-- On Wed, 2 Dec 2020, Daniel P. Berrangé wrote --+
> | > +- If issue is found to be less severe, an upstream public bug (or an
> | > +  issue) will be created immediately.
> | 
> | No need to repeat "or an issue". I think it would read more clearly as
> | 
> |- If the severity of the issue is sufficiently low, an upstream public 
> bug
> |  may be created immediately.
> 
> * Let's settle on public GitLab issues, shall we? 
> 
> * Tomorrow, after an issue triage, if someone asks where they should create a
>   public tracker, it's better to have one definite answer, instead of choosing
>   either Launchpad or GitLab issues.
> 
> * OR is it better to have both? i.e. file a public tracker anywhere as per
>   one's convenience?
> 
> * GitLab alone is good, I think.

Just link to the page where QEMU documents its bug tracker. That is
currently Launchpad, but may change to GitLab. The security docs don't
need to mention the specific host if we just link to the bugs page.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] fuzz: Add more i386 configurations for fuzzing

2020-12-03 Thread Philippe Mathieu-Daudé
On 12/2/20 5:40 PM, Alexander Bulekov wrote:
> On 201123 1343, Alexander Bulekov wrote:
>> This adds configurations for fuzzing the following devices on oss-fuzz:
>>
...
>> I have little context for how useful these configurations are for
>> fuzzing. I appreciate if you can Ack/Nack them or provide feedback if
>> the devices should be configured differently.  Of course, if you think
>> we should be fuzzing some additional device configurations, you can also
>> submit a patch adding the necessary lines to this generic_fuzz_configs.h
>> file. 
>> Thanks
>> -Alex
>>
> 
> Ping. We could just add all of these configurations and, later, remove
> any that produce too many useless reports.

Not a Nack, but I'd rather enhance qtest coverage by adding
these configs via , and then consume this
with tests/qtest/fuzz/qos_fuzz.c.

>>  tests/qtest/fuzz/generic_fuzz_configs.h | 80 +
>>  1 file changed, 80 insertions(+)
>>
>> diff --git a/tests/qtest/fuzz/generic_fuzz_configs.h 
>> b/tests/qtest/fuzz/generic_fuzz_configs.h
>> index c4d925f9e6..0b1fe0f836 100644
>> --- a/tests/qtest/fuzz/generic_fuzz_configs.h
>> +++ b/tests/qtest/fuzz/generic_fuzz_configs.h
>> @@ -115,6 +115,86 @@ const generic_fuzz_config predefined_configs[] = {
>>  .name = "pc-q35",
>>  .args = "-machine q35",
>>  .objects = "*",
>> +},{
>> +.name = "vmxnet3",
>> +.args = "-machine q35 -nodefaults "
>> +"-device vmxnet3,netdev=net0 -netdev user,id=net0",
>> +.objects = "vmxnet3"
>> +},{
>> +.name = "ne2k_pci",
>> +.args = "-machine q35 -nodefaults "
>> +"-device ne2k_pci,netdev=net0 -netdev user,id=net0",
>> +.objects = "ne2k*"
>> +},{
>> +.name = "pcnet",
>> +.args = "-machine q35 -nodefaults "
>> +"-device pcnet,netdev=net0 -netdev user,id=net0",
>> +.objects = "pcnet"
>> +},{
>> +.name = "rtl8139",
>> +.args = "-machine q35 -nodefaults "
>> +"-device rtl8139,netdev=net0 -netdev user,id=net0",
>> +.objects = "rtl8139"
>> +},{
>> +.name = "i82550",
>> +.args = "-machine q35 -nodefaults "
>> +"-device i82550,netdev=net0 -netdev user,id=net0",
>> +.objects = "eepro*"
>> +},{
>> +.name = "sdhci-v3",
>> +.args = "-nodefaults -device sdhci-pci,sd-spec-version=3 "
>> +"-device sd-card,drive=mydrive "
>> +"-drive if=sd,index=0,file=null-co://,format=raw,id=mydrive 
>> -nographic",
>> +.objects = "sd*"
>> +},{
>> +.name = "ehci",
>> +.args = "-machine q35 -nodefaults "
>> +"-device ich9-usb-ehci1,bus=pcie.0,addr=1d.7,"
>> +"multifunction=on,id=ich9-ehci-1 "
>> +"-device ich9-usb-uhci1,bus=pcie.0,addr=1d.0,"
>> +"multifunction=on,masterbus=ich9-ehci-1.0,firstport=0 "
>> +"-device ich9-usb-uhci2,bus=pcie.0,addr=1d.1,"
>> +"multifunction=on,masterbus=ich9-ehci-1.0,firstport=2 "
>> +"-device ich9-usb-uhci3,bus=pcie.0,addr=1d.2,"
>> +"multifunction=on,masterbus=ich9-ehci-1.0,firstport=4 "
>> +"-drive if=none,id=usbcdrom,media=cdrom "
>> +"-device usb-tablet,bus=ich9-ehci-1.0,port=1,usb_version=1 "
>> +"-device usb-storage,bus=ich9-ehci-1.0,port=2,drive=usbcdrom",
>> +.objects = "*usb* *hci*",
>> +},{
>> +.name = "ohci",
>> +.args = "-machine q35 -nodefaults  -device pci-ohci -device 
>> usb-kbd",
>> +.objects = "*usb* *ohci*",
>> +},{
>> +.name = "megaraid",
>> +.args = "-machine q35 -nodefaults -device megasas -device 
>> scsi-cd,drive=null0 "
>> +"-blockdev driver=null-co,read-zeroes=on,node-name=null0",
>> +.objects = "megasas*",
>> +},{
>> +.name = "ac97",
>> +.args = "-machine q35 -nodefaults "
>> +"-device ac97,audiodev=snd0 -audiodev none,id=snd0 -nodefaults",
>> +.objects = "ac97*",
>> +},{
>> +.name = "cs4231a",
>> +.args = "-machine q35 -nodefaults "
>> +"-device cs4231a,audiodev=snd0 -audiodev none,id=snd0 -nodefaults",
>> +.objects = "cs4231a* i8257*",
>> +},{
>> +.name = "es1370",
>> +.args = "-machine q35 -nodefaults "
>> +"-device es1370,audiodev=snd0 -audiodev none,id=snd0 -nodefaults",
>> +.objects = "es1370*",
>> +},{
>> +.name = "sb16",
>> +.args = "-machine q35 -nodefaults "
>> +"-device sb16,audiodev=snd0 -audiodev none,id=snd0 -nodefaults",
>> +.objects = "sb16* i8257*",
>> +},{
>> +.name = "parallel",
>> +.args = "-machine q35 -nodefaults "
>> +"-parallel file:/dev/null",
>> +.objects = "parallel*",
>>  }
>>  };
>>  
>> -- 
>> 2.28.0
>>
> 




Re: [PATCH] ide:atapi: check io_buffer_index in ide_atapi_cmd_reply_end

2020-12-03 Thread P J P
+-- On Wed, 2 Dec 2020, Philippe Mathieu-Daudé wrote --+
| a fair part is to ask the reporter to attach its reproducer to the private 
| BZ,

Yes, reporters sharing/releasing it is best.

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
8685 545E B54C 486B C6EB 271E E285 8B5A F050 DE8D

Re: [PATCH for-6.0 v2 2/3] spapr/xive: Fix size of END table and number of claimed IPIs

2020-12-03 Thread Cédric Le Goater
On 11/30/20 7:07 PM, Cédric Le Goater wrote:
> On 11/30/20 5:52 PM, Greg Kurz wrote:
>> The sPAPR XIVE device has an internal ENDT table the size of
>> which is configurable by the machine. This table is supposed
>> to contain END structures for all possible vCPUs that may
>> enter the guest. The machine must also claim IPIs for all
>> possible vCPUs since this is expected by the guest.
>>
>> spapr_irq_init() takes care of that under the assumption that
>> spapr_max_vcpu_ids() returns the number of possible vCPUs.
>> This happens to be the case when the VSMT mode is set to match
>> the number of threads per core in the guest (default behavior).
>> With non-default VSMT settings, this limit is > to the number
>> of vCPUs. In the worst case, we can end up allocating an 8 times
>> bigger ENDT and claiming 8 times more IPIs than needed. But more
>> importantly, this creates a confusion between number of vCPUs and
>> vCPU ids, which can lead to subtle bugs like [1].
>>
>> Use smp.max_cpus instead of spapr_max_vcpu_ids() in
>> spapr_irq_init() for the latest machine type. Older machine
>> types continue to use spapr_max_vcpu_ids() since the size of
>> the ENDT is migration visible.
>>
>> [1] https://bugs.launchpad.net/qemu/+bug/1900241
>>
>> Signed-off-by: Greg Kurz 
> 
> 
> Reviewed-by: Cédric Le Goater 


I gave patches 2 and 3 a little more thought.

I don't think we need much more than patch 1, which clarifies the
nature of the values being manipulated: quantities vs. numbering.

The last 2 patches add complexity to try to optimize the
XIVE VP space in a scenario which is not very common (vSMT).
Maybe it's not worth it.

Today, we can start 4K (-2) KVM guests with 16 vCPUs each on
a Witherspoon (2-socket P9) and we are far from reaching the
limits of the VP space. Available RAM is more of a problem.

VP space is even bigger on P10. The width was increased to 24 bits
per chip.

C.



Re: [PATCH v12 00/19] Initial support for multi-process Qemu

2020-12-03 Thread Stefan Hajnoczi
On Tue, Dec 01, 2020 at 03:22:35PM -0500, Jagannathan Raman wrote:
> This is the v12 of the patchset. Thank you very much for the
> review of the v11 of the series.

I'm in favor of merging this for QEMU 6.0. The command-line interface
has the x- prefix so QEMU is not committing to a stable interface.
Changes needed to support additional device types or to switch to the
vfio-user protocol can be made later.

Jag, Elena, JJ: I suggest getting your GPG key to Peter Maydell so you
can send multi-process QEMU pull requests.

Reviewed-by: Stefan Hajnoczi 




Re: [PATCH v2 3/4] contrib/vhost-user-input: avoid g_return_val_if() input validation

2020-12-03 Thread Stefan Hajnoczi
On Wed, Dec 02, 2020 at 07:51:49PM +0400, Marc-André Lureau wrote:
> On Wed, Dec 2, 2020 at 7:27 PM Stefan Hajnoczi  wrote:
> 
> > Do not validate input with g_return_val_if(). This API is intended for
> > checking programming errors and is compiled out with -DG_DISABLE_CHECKS.
> >
> > Use an explicit if statement for input validation so it cannot
> > accidentally be compiled out.
> >
> > Suggested-by: Markus Armbruster 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  contrib/vhost-user-input/main.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/contrib/vhost-user-input/main.c
> > b/contrib/vhost-user-input/main.c
> > index 6020c6f33a..1d79c61200 100644
> > --- a/contrib/vhost-user-input/main.c
> > +++ b/contrib/vhost-user-input/main.c
> > @@ -212,7 +212,11 @@ static int vi_get_config(VuDev *dev, uint8_t *config,
> > uint32_t len)
> >  {
> >  VuInput *vi = container_of(dev, VuInput, dev.parent);
> >
> > -g_return_val_if_fail(len <= sizeof(*vi->sel_config), -1);
> > +if (len > sizeof(*vi->sel_config)) {
> > +g_critical("%s: len %u is larger than %zu",
> > +   __func__, len, sizeof(*vi->sel_config));
> > +return -1;
> >
> 
> g_critical() already has __FILE__ __LINE__ and G_STRFUNC.

Will fix.
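
i.e. something like this (sketch):

    if (len > sizeof(*vi->sel_config)) {
        g_critical("len %u is larger than %zu",
                   len, sizeof(*vi->sel_config));
        return -1;
    }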

Stefan




Re: [PATCH v2 2/4] contrib/vhost-user-gpu: avoid g_return_val_if() input validation

2020-12-03 Thread Stefan Hajnoczi
On Wed, Dec 02, 2020 at 07:50:51PM +0400, Marc-André Lureau wrote:
> On Wed, Dec 2, 2020 at 7:27 PM Stefan Hajnoczi  wrote:
> 
> > Do not validate input with g_return_val_if(). This API is intended for
> > checking programming errors and is compiled out with -DG_DISABLE_CHECKS.
> >
> > Use an explicit if statement for input validation so it cannot
> > accidentally be compiled out.
> >
> > Suggested-by: Markus Armbruster 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  contrib/vhost-user-gpu/vhost-user-gpu.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/contrib/vhost-user-gpu/vhost-user-gpu.c
> > b/contrib/vhost-user-gpu/vhost-user-gpu.c
> > index a019d0a9ac..534bad24d1 100644
> > --- a/contrib/vhost-user-gpu/vhost-user-gpu.c
> > +++ b/contrib/vhost-user-gpu/vhost-user-gpu.c
> > @@ -1044,7 +1044,11 @@ vg_get_config(VuDev *dev, uint8_t *config, uint32_t
> > len)
> >  {
> >  VuGpu *g = container_of(dev, VuGpu, dev.parent);
> >
> > -g_return_val_if_fail(len <= sizeof(struct virtio_gpu_config), -1);
> > +if (len > sizeof(struct virtio_gpu_config)) {
> > +g_critical("%s: len %u is larger than %zu",
> > +   __func__, len, sizeof(struct virtio_gpu_config));
> >
> 
> g_critical() already has __FILE__ __LINE__ and G_STRFUNC.

I did this for consistency with the logging in this source file. The
other g_critical() calls in the file also print __func__.

Stefan




Re: [PATCH] scsi: allow user to set werror as report

2020-12-03 Thread Philippe Mathieu-Daudé
On 12/3/20 3:55 AM, Zihao Chang wrote:
> Ping? This is a fix patch which has been reviewed, whose tree should it go 
> via?

The change itself is in-between 'Block layer' and 'SCSI'
subsystems, so either Paolo or Kevin (Cc'ing qemu-block@).

> 
> Thanks
> Zihao
> 
> On 2020/11/3 22:03, Zihao Chang wrote:
>>
>>
>> On 2020/11/3 18:52, Fam Zheng wrote:
>>> On Tue, 2020-11-03 at 14:12 +0800, Zihao Chang wrote:
 'enospc' is the default for -drive, but qemu allows user to set
 drive option werror. If werror of scsi-generic is set to 'report'
 by user, qemu will not allow vm to start.

 This patch allow user to set werror as 'report' for scsi-generic.

 Signed-off-by: Zihao Chang 
 ---
  hw/scsi/scsi-generic.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
 index 2cb23ca891..2730e37d63 100644
 --- a/hw/scsi/scsi-generic.c
 +++ b/hw/scsi/scsi-generic.c
 @@ -664,7 +664,8 @@ static void scsi_generic_realize(SCSIDevice *s,
 Error **errp)
  return;
  }
  
 -if (blk_get_on_error(s->conf.blk, 0) !=
 BLOCKDEV_ON_ERROR_ENOSPC) {
 +if (blk_get_on_error(s->conf.blk, 0) != BLOCKDEV_ON_ERROR_ENOSPC
 &&
 +blk_get_on_error(s->conf.blk, 0) !=
 BLOCKDEV_ON_ERROR_REPORT) {
  error_setg(errp, "Device doesn't support drive option
 werror");
  return;
  }
>>>
>>> Accepting the report sounds sane to me, it matches what we actually
>>> (always) do. Is the idea to allow users to spell it out explicitly in
>>> the command line?
>>>
>> Actually, qemu supports users spelling it out explicitly on the command
>> line, like "enospc", "report" & "action". This patch just allows users to
>> set werror to "report" for scsi-generic, which is a common scenario.
>>
>>> Reviewed-by: Fam Zheng 
>>>
>>> .
>>>
> 




Re: [PATCH-for 5.2?] hw/core/ressetable: fix reset count decrement

2020-12-03 Thread Philippe Mathieu-Daudé
On 12/3/20 9:58 AM, Damien Hedde wrote:
> On 12/2/20 6:10 PM, Philippe Mathieu-Daudé wrote:
>> On 12/2/20 5:40 PM, Damien Hedde wrote:
>>> The reset count was only decremented if the device was in a single
>>> reset.
>>>
>>> Also move the decrement before calling the exit phase method to fix
>>> problem of reset state evaluation during that call. Update the doc
>>> accordingly.
>>>
>>> Signed-off-by: Damien Hedde 
>>> Fixes: 1905297 ("Zynq7000 UART clock reset initialization", 2020-11-23)
>>
> 
>> $ git show 1905297
>> fatal: ambiguous argument '1905297': unknown revision or path not in the
>> working tree.
> 
> I put Michael's bug number there. Should I put the offending commit
> instead?

In that case you want:
Buglink: https://bugs.launchpad.net/qemu/+bug/1905297

> 
>>
>> Beside, typo ressetable -> resettable in subject.
> 
> Thanks,
> Damien
> 
> Cc'ing Michael's new address too
> 
>>
>>> Reported-by: Michael Peter 
>>> --
>>>
>>> Hi all,
>>>
>>> While looking at the bug reported by Michael and his patch, I found another
>>> bug. Apparently I forgot to decrement the reset count if there were several
>>> resets at the same time.
>>>
>>> This patch fixes that.
>>>
>>> I also moved the place of the decrement: before calling the exit phase method.
>>> It globally fixes Michael's reported bug, as I think it will avoid some
>>> boilerplate code in every exit phase method we do.
>>>
>>> The only other place where the reset state is checked is
>>> hw/char/cadence-uart.c, so it does not have a high impact.
>>>
>>> I'm not sure if this meets the criteria for 5.2 as it changes a documented
>>> feature. In that case we can just accept Michael's solution and I'll fix the
>>> rest later.
>>>
>>> Here's the pointer for the bug and michael's patch.
>>> https://lists.nongnu.org/archive/html/qemu-devel/2020-11/msg05786.html
>>> https://lists.nongnu.org/archive/html/qemu-devel/2020-11/msg06105.html
>>>
>>> Damien
> 



Re: [PATCH v1 1/9] memory: Introduce RamDiscardMgr for RAM memory regions

2020-12-03 Thread David Hildenbrand
On 03.12.20 00:26, Alex Williamson wrote:
> On Thu, 19 Nov 2020 16:39:10 +0100
> David Hildenbrand  wrote:
> 
>> We have some special RAM memory regions (managed by virtio-mem), whereby
>> the guest agreed to only use selected memory ranges. "unused" parts are
>> discarded so they won't consume memory - to logically unplug these memory
>> ranges. Before the VM is allowed to use such logically unplugged memory
>> again, coordination with the hypervisor is required.
>>
>> This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only
>> coordinated parts are valid to be used/accessed by the VM.
>>
>> In most cases, we don't care about that - e.g., in KVM, we simply have a
>> single KVM memory slot. However, in case of vfio, registering the
>> whole region with the kernel results in all pages getting pinned, and
>> therefore an unexpected high memory consumption - discarding of RAM in
>> that context is broken.
>>
>> Let's introduce a way to coordinate discarding/populating memory within a
>> RAM memory region with such special consumers of RAM memory regions: they
>> can register as listeners and get updates on memory getting discarded and
>> populated. Using this machinery, vfio will be able to map only the
>> currently populated parts, resulting in discarded parts not getting pinned
>> and not consuming memory.
>>
>> A RamDiscardMgr has to be set for a memory region before it is getting
>> mapped, and cannot change while the memory region is mapped.
>>
>> Note: At some point, we might want to let RAMBlock users (esp. vfio used
>> for nvme://) consume this interface as well. We'll need RAMBlock notifier
>> calls when a RAMBlock is getting mapped/unmapped (via the corresponding
>> memory region), so we can properly register a listener there as well.
>>
>> Cc: Paolo Bonzini 
>> Cc: "Michael S. Tsirkin" 
>> Cc: Alex Williamson 
>> Cc: Dr. David Alan Gilbert 
>> Cc: Igor Mammedov 
>> Cc: Pankaj Gupta 
>> Cc: Peter Xu 
>> Cc: Auger Eric 
>> Cc: Wei Yang 
>> Cc: teawater 
>> Cc: Marek Kedzierski 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  include/exec/memory.h | 225 ++
>>  softmmu/memory.c  |  22 +
>>  2 files changed, 247 insertions(+)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index 0f3e6bcd5e..468cbb53a4 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
> ...
>> @@ -425,6 +501,120 @@ struct IOMMUMemoryRegionClass {
>>   Error **errp);
>>  };
>>  
>> +/*
>> + * RamDiscardMgrClass:
>> + *
>> + * A #RamDiscardMgr coordinates which parts of specific RAM #MemoryRegion
>> + * regions are currently populated to be used/accessed by the VM, notifying
>> + * after parts were discarded (freeing up memory) and before parts will be
>> + * populated (consuming memory), to be used/acessed by the VM.
>> + *
>> + * A #RamDiscardMgr can only be set for a RAM #MemoryRegion while the
>> + * #MemoryRegion isn't mapped yet; it cannot change while the #MemoryRegion 
>> is
>> + * mapped.
>> + *
>> + * The #RamDiscardMgr is intended to be used by technologies that are
>> + * incompatible with discarding of RAM (e.g., VFIO, which may pin all
>> + * memory inside a #MemoryRegion), and require proper coordination to only
>> + * map the currently populated parts, to hinder parts that are expected to
>> + * remain discarded from silently getting populated and consuming memory.
>> + * Technologies that support discarding of RAM don't have to bother and can
>> + * simply map the whole #MemoryRegion.
>> + *
>> + * An example #RamDiscardMgr is virtio-mem, which logically (un)plugs
>> + * memory within an assigned RAM #MemoryRegion, coordinated with the VM.
>> + * Logically unplugging memory consists of discarding RAM. The VM agreed to 
>> not
>> + * access unplugged (discarded) memory - especially via DMA. virtio-mem will
>> + * properly coordinate with listeners before memory is plugged (populated),
>> + * and after memory is unplugged (discarded).
>> + *
>> + * Listeners are called in multiples of the minimum granularity and changes 
>> are
>> + * aligned to the minimum granularity within the #MemoryRegion. Listeners 
>> have
>> + * to prepare for memory becomming discarded in a different granularity 
>> than it
>> + * was populated and the other way around.
>> + */
>> +struct RamDiscardMgrClass {
>> +/* private */
>> +InterfaceClass parent_class;
>> +
>> +/* public */
>> +
>> +/**
>> + * @get_min_granularity:
>> + *
>> + * Get the minimum granularity in which listeners will get notified
>> + * about changes within the #MemoryRegion via the #RamDiscardMgr.
>> + *
>> + * @rdm: the #RamDiscardMgr
>> + * @mr: the #MemoryRegion
>> + *
>> + * Returns the minimum granularity.
>> + */
>> +uint64_t (*get_min_granularity)(const RamDiscardMgr *rdm,
>> +const MemoryRegion *mr);
>> +
>> +/**
>> + * @is_populated:

Re: [PATCH v1 4/9] vfio: Support for RamDiscardMgr in the !vIOMMU case

2020-12-03 Thread David Hildenbrand
On 03.12.20 00:26, Alex Williamson wrote:
> On Thu, 19 Nov 2020 16:39:13 +0100
> David Hildenbrand  wrote:
> 
>> Implement support for RamDiscardMgr, to prepare for virtio-mem
>> support. Instead of mapping the whole memory section, we only map
>> "populated" parts and update the mapping when notified about
>> discarding/population of memory via the RamDiscardListener. Similarly, when
>> syncing the dirty bitmaps, sync only the actually mapped (populated) parts
>> by replaying via the notifier.
>>
>> Small mapping granularity is problematic for vfio, because we might run out
>> of mappings. Warn to at least make users aware that there is such a
>> limitation and that we are dealing with a setup issue e.g., of
>> virtio-mem devices.
>>
>> Using virtio-mem with vfio is still blocked via
>> ram_block_discard_disable()/ram_block_discard_require() after this patch.
>>
>> Cc: Paolo Bonzini 
>> Cc: "Michael S. Tsirkin" 
>> Cc: Alex Williamson 
>> Cc: Dr. David Alan Gilbert 
>> Cc: Igor Mammedov 
>> Cc: Pankaj Gupta 
>> Cc: Peter Xu 
>> Cc: Auger Eric 
>> Cc: Wei Yang 
>> Cc: teawater 
>> Cc: Marek Kedzierski 
>> Signed-off-by: David Hildenbrand 
>> ---
>>  hw/vfio/common.c  | 233 ++
>>  include/hw/vfio/vfio-common.h |  12 ++
>>  2 files changed, 245 insertions(+)
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index c1fdbf17f2..d52e7356cb 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
> ...
>> +static void vfio_register_ram_discard_notifier(VFIOContainer *container,
>> +   MemoryRegionSection *section)
>> +{
>> +RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(section->mr);
>> +RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
>> +MachineState *ms = MACHINE(qdev_get_machine());
>> +uint64_t suggested_granularity;
>> +VFIORamDiscardListener *vrdl;
>> +int ret;
>> +
>> +vrdl = g_new0(VFIORamDiscardListener, 1);
>> +vrdl->container = container;
>> +vrdl->mr = section->mr;
>> +vrdl->offset_within_region = section->offset_within_region;
>> +vrdl->offset_within_address_space = 
>> section->offset_within_address_space;
>> +vrdl->size = int128_get64(section->size);
>> +vrdl->granularity = rdmc->get_min_granularity(rdm, section->mr);
>> +
>> +/* Ignore some corner cases not relevant in practice. */
>> +g_assert(QEMU_IS_ALIGNED(vrdl->offset_within_region, TARGET_PAGE_SIZE));
>> +g_assert(QEMU_IS_ALIGNED(vrdl->offset_within_address_space,
>> + TARGET_PAGE_SIZE));
>> +g_assert(QEMU_IS_ALIGNED(vrdl->size, TARGET_PAGE_SIZE));
>> +
>> +/*
>> + * We assume initial RAM never has a RamDiscardMgr and that all memory
>> + * to eventually get hotplugged later could be coordinated via a
>> + * RamDiscardMgr ("worst case").
>> + *
>> + * We assume the Linux kernel is configured ("dma_entry_limit") for the
>> + * maximum of 65535 mappings and that we can consume roughly half of 
>> that
> 
> 
> s/maximum/default/
> 
> Deciding we should only use half of it seems arbitrary.

Yeah, it's sub-optimal - bad heuristic :) . What would be your
suggestion for a better heuristic? My gut feeling would be that we
rarely use more than 512 mappings in the system address space (e.g.,
maximum number of DIMMs is 256).

> 
> 
>> + * for this purpose.
>> + *
>> + * In reality, we might also have RAM without a RamDiscardMgr in our 
>> device
>> + * memory region and might be able to consume more mappings.
>> + */
>> +suggested_granularity = pow2ceil((ms->maxram_size - ms->ram_size) / 
>> 32768);
>> +suggested_granularity = MAX(suggested_granularity, 1 * MiB);
>> +if (vrdl->granularity < suggested_granularity) {
>> +warn_report("%s: eventually problematic mapping granularity (%" 
>> PRId64
>> +" MiB) with coordinated discards (e.g., 'block-size' in"
>> +" virtio-mem). Suggested minimum granularity: %" PRId64
>> +" MiB", __func__, vrdl->granularity / MiB,
>> +suggested_granularity / MiB);
>> +}
> 
> 
> Starting w/ kernel 5.10 we have a way to get the instantaneous count of
> available DMA mappings, so we could avoid assuming 64k when that's
> available (see ex. s390_pci_update_dma_avail()). 

Interesting, I missed that interface. Will have a look. Thanks!
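
For reference, roughly what I would expect the query to look like (rough
sketch, assuming the VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL capability from Linux
5.10; "info" is the buffer returned by VFIO_IOMMU_GET_INFO):

    struct vfio_info_cap_header *hdr;
    uint32_t dma_avail = 65535; /* fall back to the old assumption */

    if ((info->flags & VFIO_IOMMU_INFO_CAPS) && info->cap_offset) {
        for (hdr = (void *)info + info->cap_offset; ;
             hdr = (void *)info + hdr->next) {
            if (hdr->id == VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL) {
                struct vfio_iommu_type1_info_dma_avail *cap = (void *)hdr;
                dma_avail = cap->avail;
                break;
            }
            if (!hdr->next) {
                break;
            }
        }
    }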


-- 
Thanks,

David / dhildenb




Re: [PATCH-for-5.2? 0/1] Acceptance tests: bump Fedora to 32

2020-12-03 Thread Peter Maydell
On Wed, 2 Dec 2020 at 21:57, Cleber Rosa  wrote:
>
> I believe this may be a candidate for "right now" because the code
> changes here simply sync with external infrastructure changes, that
> is, the retirement of Fedora 31 from the official repository
> locations).

Strong "no" at this point in the release cycle. Only
"this is absolutely critical because QEMU sets users'
machines on fire" bug fixes at this point.

thanks
-- PMM



Re: [PATCH] arm/hvf: Optimize and simplify WFI handling

2020-12-03 Thread Roman Bolshakov
On Tue, Dec 01, 2020 at 10:59:50AM -0800, Peter Collingbourne wrote:
> On Tue, Dec 1, 2020 at 3:16 AM Alexander Graf  wrote:
> >
> > Hi Peter,
> >
> > On 01.12.20 09:21, Peter Collingbourne wrote:
> > > Sleep on WFx until the VTIMER is due but allow ourselves to be woken
> > > up on IPI.
> > >
> > > Signed-off-by: Peter Collingbourne 
> >
> >
> > Thanks a bunch!
> >
> >
> > > ---
> > > Alexander Graf wrote:
> > >> I would love to take a patch from you here :). I'll still be stuck for a
> > >> while with the sysreg sync rework that Peter asked for before I can look
> > >> at WFI again.
> > > Okay, here's a patch :) It's a relatively straightforward adaptation
> > > of what we have in our fork, which can now boot Android to GUI while
> > > remaining at around 4% CPU when idle.
> > >
> > > I'm not set up to boot a full Linux distribution at the moment so I
> > > tested it on upstream QEMU by running a recent mainline Linux kernel
> > > with a rootfs containing an init program that just does sleep(5)
> > > and verified that the qemu process remains at low CPU usage during
> > > the sleep. This was on top of your v2 plus the last patch of your v1
> > > since it doesn't look like you have a replacement for that logic yet.
> > >
> > >   accel/hvf/hvf-cpus.c |  5 +--
> > >   include/sysemu/hvf_int.h |  3 +-
> > >   target/arm/hvf/hvf.c | 94 +++-
> > >   3 files changed, 28 insertions(+), 74 deletions(-)
> > >
> > > diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> > > index 4360f64671..b2c8fb57f6 100644
> > > --- a/accel/hvf/hvf-cpus.c
> > > +++ b/accel/hvf/hvf-cpus.c
> > > @@ -344,9 +344,8 @@ static int hvf_init_vcpu(CPUState *cpu)
> > >   sigact.sa_handler = dummy_signal;
> > >   sigaction(SIG_IPI, &sigact, NULL);
> > >
> > > -pthread_sigmask(SIG_BLOCK, NULL, &set);
> > > -sigdelset(&set, SIG_IPI);
> > > -pthread_sigmask(SIG_SETMASK, &set, NULL);
> > > +pthread_sigmask(SIG_BLOCK, NULL, &cpu->hvf->unblock_ipi_mask);
> > > +sigdelset(&cpu->hvf->unblock_ipi_mask, SIG_IPI);
> >
> >
> > What will this do to the x86 hvf implementation? We're now not
> > unblocking SIG_IPI again for that, right?
> 
> Yes and that was the case before your patch series.
> 
> > >
> > >   #ifdef __aarch64__
> > >   r = hv_vcpu_create(&cpu->hvf->fd, (hv_vcpu_exit_t 
> > > **)&cpu->hvf->exit, NULL);
> > > diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
> > > index c56baa3ae8..13adf6ea77 100644
> > > --- a/include/sysemu/hvf_int.h
> > > +++ b/include/sysemu/hvf_int.h
> > > @@ -62,8 +62,7 @@ extern HVFState *hvf_state;
> > >   struct hvf_vcpu_state {
> > >   uint64_t fd;
> > >   void *exit;
> > > -struct timespec ts;
> > > -bool sleeping;
> > > +sigset_t unblock_ipi_mask;
> > >   };
> > >
> > >   void assert_hvf_ok(hv_return_t ret);
> > > diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
> > > index 8fe10966d2..60a361ff38 100644
> > > --- a/target/arm/hvf/hvf.c
> > > +++ b/target/arm/hvf/hvf.c
> > > @@ -2,6 +2,7 @@
> > >* QEMU Hypervisor.framework support for Apple Silicon
> > >
> > >* Copyright 2020 Alexander Graf 
> > > + * Copyright 2020 Google LLC
> > >*
> > >* This work is licensed under the terms of the GNU GPL, version 2 or 
> > > later.
> > >* See the COPYING file in the top-level directory.
> > > @@ -18,6 +19,7 @@
> > >   #include "sysemu/hw_accel.h"
> > >
> > >   #include 
> > > +#include 
> > >
> > >   #include "exec/address-spaces.h"
> > >   #include "hw/irq.h"
> > > @@ -320,18 +322,8 @@ int hvf_arch_init_vcpu(CPUState *cpu)
> > >
> > >   void hvf_kick_vcpu_thread(CPUState *cpu)
> > >   {
> > > -if (cpu->hvf->sleeping) {
> > > -/*
> > > - * When sleeping, make sure we always send signals. Also, clear 
> > > the
> > > - * timespec, so that an IPI that arrives between setting 
> > > hvf->sleeping
> > > - * and the nanosleep syscall still aborts the sleep.
> > > - */
> > > -cpu->thread_kicked = false;
> > > -cpu->hvf->ts = (struct timespec){ };
> > > -cpus_kick_thread(cpu);
> > > -} else {
> > > -hv_vcpus_exit(&cpu->hvf->fd, 1);
> > > -}
> > > +cpus_kick_thread(cpu);
> > > +hv_vcpus_exit(&cpu->hvf->fd, 1);
> >
> >
> > This means your first WFI will almost always return immediately due to a
> > pending signal, because there probably was an IRQ pending before on the
> > same CPU, no?
> 
> That's right. Any approach involving the "sleeping" field would need
> to be implemented carefully to avoid races that may result in missed
> wakeups so for simplicity I just decided to send both kinds of
> wakeups. In particular the approach in the updated patch you sent is
> racy and I'll elaborate more in the reply to that patch.
> 
> > >   }
> > >
> > >   static int hvf_inject_interrupts(CPUState *cpu)
> > > @@ -385,18 +377,19 @@ int hvf_vcpu_exec(CPUState *cpu)
> > >   uint64_t syndrome = hvf_exit->exce

Re: [PULL 0/6] pc,vhost: fixes

2020-12-03 Thread Peter Maydell
On Wed, 2 Dec 2020 at 11:03, Michael S. Tsirkin  wrote:
>
> Patch 5 gave me pause but we do need patch 6 as
> it's a guest triggerable assert, and it seemed
> cleaner to just take the whole patchset than cherry-pick.

Is this only "fixes a guest triggerable assert"? If so, that's
not sufficiently important to require an rc5 at this point.

thanks
-- PMM



qemu:handle_cpu_signal received signal outside vCPU context

2020-12-03 Thread Tj (Elloe Linux)
We're seeing this error on Ubuntu 20.04 amd64 host and aarch64
qemu-aarch64-static chroot when executing 'aptitude':

qemu:handle_cpu_signal received signal outside vCPU context
qemu:handle_cpu_signal received signal outside vCPU context

Research led us to an identical issue report against qemu-m68k [0] where
it is reported this issue is resolved in 5.0.0+.

Ubuntu 20.04 LTS currently ships v4.2 so I'm trying to identify the
commit(s) that resolved this issue so we can test a cherry-pick of the fix.

Do you know which commit(s) addressed this?

Some suggestions as to the responsible patch have been made, but I've not
been able to find the patchwork patch [1] content in current upstream,
which suggests it's the wrong patch, it was never merged, or the code has
since changed so much that it is unrecognisable.


[0] https://github.com/vivier/qemu-m68k/issues/38

[1]
http://patchwork.ozlabs.org/project/qemu-devel/patch/1456850240-21096-1-git-send-email-peter.mayd...@linaro.org/



Re: [PATCH v1 1/1] hvf: arm: Properly sync guest time on migration

2020-12-03 Thread Alexander Graf


On 03.12.20 00:25, Frank Yang wrote:



On Wed, Dec 2, 2020 at 2:57 PM Alexander Graf  wrote:



On 02.12.20 23:46, Frank Yang wrote:



On Wed, Dec 2, 2020 at 2:28 PM Alexander Graf  wrote:


On 02.12.20 23:19, Frank Yang wrote:


From downstream:

https://android-review.googlesource.com/c/platform/external/qemu/+/1515002



Based on v3 of Alexander Graf's patches

https://patchew.org/QEMU/20201202190408.2041-1-ag...@csgraf.de


We need to adjust CNTVOFF_EL2 so that time doesn't warp. Even though we
can set separate CNTVOFF_EL2 values per vCPU, it just is not worth the
required effort to do that accurately---with individual values, even if
they are a tiny bit off it can result in a lockup due to inconsistent
time differences between vCPUs. So just use a global approximate value
for now.

Not tested in upstream yet, but Android emulator snapshots work without
time warp now.

Signed-off-by: Lingfeng Yang <l...@google.com>



If we just always make CNTV start at the same 0 as
QEMU_CLOCK_VIRTUAL, we should be able to just recover the
offset after migration by looking at QEMU_CLOCK_VIRTUAL to
set CNTVOFF, right?

That would end up much easier than this patch I hope.



The virtual clock interfaces/implementations in QEMU seem complex
to me relative to the fix needed here and they don't seem to
compute ticks with mach_absolute_time() (which in this case we
want since we want to compute in timer ticks instead of having to
mess with ns / cycle conversions). I do agree this patch does
seem more complicated on the surface though versus "just" setting
cntvoff directly to some value. Maybe we should simplify the
QEMU_CLOCK_VIRTUAL implementation first to maintain
CNTVOFF_EL2/CNTV using mach_absolute_time() first?



So QEMU_CLOCK_VIRTUAL calls cpu_get_clock() which just adds an
offset to gettimeofday(). This offset is already part of the live
migration stream[1]. So if you just configure CNTVOFF_EL2 based on
QEMU_CLOCK_VIRTUAL adjusted by the clock frequency on vcpu init,
you should have everything you need. You can do that on every CPU
init even, as the virtual clock will just be 0 on start.

The only thing we need to change then is to move the WFI from a
direct call to mach_absolute_time() to also check the virtual
clock instead. I would hope that gettimeofday() calls
mach_absolute_time() in the background too to speed it up.

Sounds plausible, but I noticed that we also have cpu_ticks_offset as 
part of the migration stream, and I prefer mach_absolute_time() 
(ticks) instead of seconds in WFI as well as it makes the calculation 
more accurate (ticks against ticks instead of conversion between ns 
and ticks).


Should we look at integrating this with cpu_ticks_offset instead?



The big benefit of reusing the virtual clock is that you create 
compatible migration streams between TCG and HVF, no? You'd lose out on 
that with a hack like this.



Alex
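
A minimal sketch of the CNTVOFF_EL2 computation discussed above, assuming
(as this thread does) that mach_absolute_time() reads the same counter the
guest sees as CNTVCT_EL0; hvf_set_vtimer_offset() is a hypothetical helper,
not an existing Hypervisor.framework or QEMU call:

    #include "qemu/osdep.h"
    #include "qemu/timer.h"       /* qemu_clock_get_ns(), NANOSECONDS_PER_SECOND */
    #include "qemu/host-utils.h"  /* muldiv64() */
    #include <mach/mach_time.h>   /* mach_absolute_time() */

    static void hvf_sync_vtimer_offset(CPUState *cpu, uint32_t cntfrq_hz)
    {
        /* 0 at machine start, restored from the migration stream afterwards */
        int64_t virt_ns = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
        uint64_t virt_ticks = muldiv64(virt_ns, cntfrq_hz, NANOSECONDS_PER_SECOND);

        /*
         * The guest sees CNTVCT = host counter - CNTVOFF, so pick the offset
         * that makes the guest see exactly the (migrated) virtual clock value.
         */
        uint64_t cntvoff = mach_absolute_time() - virt_ticks;

        hvf_set_vtimer_offset(cpu, cntvoff);   /* hypothetical helper */
    }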




Re: [PATCH v2 2/4] contrib/vhost-user-gpu: avoid g_return_val_if() input validation

2020-12-03 Thread Marc-André Lureau
On Thu, Dec 3, 2020 at 1:52 PM Stefan Hajnoczi  wrote:

> On Wed, Dec 02, 2020 at 07:50:51PM +0400, Marc-André Lureau wrote:
> > On Wed, Dec 2, 2020 at 7:27 PM Stefan Hajnoczi 
> wrote:
> >
> > > Do not validate input with g_return_val_if(). This API is intended for
> > > checking programming errors and is compiled out with
> -DG_DISABLE_CHECKS.
> > >
> > > Use an explicit if statement for input validation so it cannot
> > > accidentally be compiled out.
> > >
> > > Suggested-by: Markus Armbruster 
> > > Signed-off-by: Stefan Hajnoczi 
> > > ---
> > >  contrib/vhost-user-gpu/vhost-user-gpu.c | 6 +-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > b/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > index a019d0a9ac..534bad24d1 100644
> > > --- a/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > +++ b/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > @@ -1044,7 +1044,11 @@ vg_get_config(VuDev *dev, uint8_t *config,
> uint32_t
> > > len)
> > >  {
> > >  VuGpu *g = container_of(dev, VuGpu, dev.parent);
> > >
> > > -g_return_val_if_fail(len <= sizeof(struct virtio_gpu_config), -1);
> > > +if (len > sizeof(struct virtio_gpu_config)) {
> > > +g_critical("%s: len %u is larger than %zu",
> > > +   __func__, len, sizeof(struct virtio_gpu_config));
> > >
> >
> > g_critical() already has __FILE__ __LINE__ and G_STRFUNC.
>
> I did this for consistency with the logging in this source file. The
> other g_critical() calls in the file also print __func__.
>
>
>
I see, never mind then. I gave my R-b anyway


-- 
Marc-André Lureau


Re: [PATCH] docs: set CONFDIR when running sphinx

2020-12-03 Thread Marc-André Lureau
Hi

On Wed, Dec 2, 2020 at 11:55 PM Eduardo Habkost  wrote:

> On Wed, Dec 02, 2020 at 10:05:50AM +0100, Paolo Bonzini wrote:
> > On 01/12/20 19:37, marcandre.lur...@redhat.com wrote:
> > > From: Marc-André Lureau 
> > >
> > > The default configuration path /etc/qemu can be overridden with
> configure
> > > options, and the generated documentation used to reflect it.
> > >
> > > Fixes regression introduced in commit
> > > f8aa24ea9a82da38370470c6bc0eaa393999edfe ("meson: sphinx-build").
> > >
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1902537
> > > Signed-off-by: Marc-André Lureau 
> > > ---
> > >   docs/meson.build | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/docs/meson.build b/docs/meson.build
> > > index ebd85d59f9..bb8fe4c9e4 100644
> > > --- a/docs/meson.build
> > > +++ b/docs/meson.build
> > > @@ -9,7 +9,7 @@ endif
> > >   # Check if tools are available to build documentation.
> > >   build_docs = false
> > >   if sphinx_build.found()
> > > -  SPHINX_ARGS = [sphinx_build]
> > > +  SPHINX_ARGS = ['env', 'CONFDIR=' + qemu_confdir, sphinx_build]
> > > # If we're making warnings fatal, apply this to Sphinx runs as well
> > > if get_option('werror')
> > >   SPHINX_ARGS += [ '-W' ]
> > >
> >
> > I can queue the patch, but I also wouldn't mind removing support for
> > /etc/qemu completely.  I'm not sure why one would use it.  Eduardo?
>
> I agree, and I had a series for this 3 years ago.
>
> I guess I need to keep my word and finally submit v5 of the series:
>
> https://lore.kernel.org/qemu-devel/20171005123414.GE4015@localhost.localdomain/
>

Note that the original bug that prompted this fix is about qemu-ga
configuration though.

Paolo, please queue the patch. thanks!

-- 
Marc-André Lureau


Re: [PATCH v1 1/1] hvf: arm: Properly sync guest time on migration

2020-12-03 Thread Alexander Graf



On 03.12.20 00:28, Peter Collingbourne wrote:

On Wed, Dec 2, 2020 at 2:57 PM Alexander Graf  wrote:


On 02.12.20 23:46, Frank Yang wrote:



On Wed, Dec 2, 2020 at 2:28 PM Alexander Graf  wrote:


On 02.12.20 23:19, Frank Yang wrote:


 From downstream: 
https://android-review.googlesource.com/c/platform/external/qemu/+/1515002

Based on v3 of Alexander Graf's patches

https://patchew.org/QEMU/20201202190408.2041-1-ag...@csgraf.de

We need to adjust CNTVOFF_EL2 so that time doesn't warp.  Even though we
can set separate CNTVOFF_EL2 values per vCPU, it just is not worth the
required effort to do that accurately---with individual values, even if
they are a tiny bit off it can result in a lockup due to inconsistent
time differences between vCPUs. So just use a global approximate value
for now.

Not tested in upstream yet, but Android emulator snapshots work without
time warp now.

Signed-off-by: Lingfeng Yang 


If we just always make CNTV start at the same 0 as QEMU_CLOCK_VIRTUAL, we 
should be able to just recover the offset after migration by looking at 
QEMU_CLOCK_VIRTUAL to set CNTVOFF, right?

That would end up much easier than this patch I hope.



The virtual clock interfaces/implementations in QEMU seem complex to me relative to the 
fix needed here and they don't seem to compute ticks with mach_absolute_time() (which in 
this case we want since we want to compute in timer ticks instead of having to mess with 
ns / cycle conversions). I do agree this patch does seem more complicated on the surface 
though versus "just" setting cntvoff directly to some value. Maybe we should 
simplify the QEMU_CLOCK_VIRTUAL implementation first to maintain CNTVOFF_EL2/CNTV using 
mach_absolute_time() first?


So QEMU_CLOCK_VIRTUAL calls cpu_get_clock() which just adds an offset to 
gettimeofday(). This offset is already part of the live migration stream[1]. So 
if you just configure CNTVOFF_EL2 based on QEMU_CLOCK_VIRTUAL adjusted by the 
clock frequency on vcpu init, you should have everything you need. You can do 
that on every CPU init even, as the virtual clock will just be 0 on start.

The only thing we need to change then is to move the WFI from a direct call to 
mach_absolute_time() to also check the virtual clock instead. I would hope that 
gettimeofday() calls mach_absolute_time() in the background too to speed it up.

I'm not sure that something based on gettimeofday() (or
clock_gettime(CLOCK_MONOTONIC) which it looks like cpu_get_clock() can
also call) will work. It will include time spent asleep so it won't
correspond to mach_absolute_time() aka guest CNTVCT_EL0.



It will correspond to it on launch, because we'd set the offset. 
Afterwards, the only problem I think you're unraveling here is that we 
would need to adjust CNTVOFF after a suspend/resume cycle to propagate 
the time jump into the virtual machine. Which probably is not a terrible 
idea anyway, because how else would the guest know that time has passed?



Alex





[PATCH v4 0/2] hw/virtio-pci: AER capability

2020-12-03 Thread andrew
From: Andrew Melnychenko 

Main motivation:
According to Microsoft driver/device certification requirements
for the next version of Windows Server, PCIe devices must support AER.
"Windows Server PCI Express devices are required to support
Advanced Error Reporting [AER] as defined in
PCI Express Base Specification version 2.1."

AER capability for virtio-pci is disabled by default.
The AER capability is only added for PCI devices with a PCIe interface on a PCIe bus.
During migration, the device's "realize" should initialize AER
if requested by device properties.
Fixed commit message.

Andrew (2):
  hw/virtio-pci Added counter for pcie capabilities offsets.
  hw/virtio-pci Added AER capability.

 hw/virtio/virtio-pci.c | 20 +++-
 hw/virtio/virtio-pci.h |  4 
 2 files changed, 23 insertions(+), 1 deletion(-)

-- 
2.29.2
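
Taken together, the two patches below make virtio_pci_realize() lay out its
PCIe extended capabilities back to back instead of using a hardcoded offset.
Condensed into one place (a sketch assembled from the hunks below, not a
verbatim excerpt), the chaining looks like this:

    uint16_t last_pcie_cap_offset = PCI_CONFIG_SPACE_SIZE;  /* 0x100 */

    if (proxy->flags & VIRTIO_PCI_FLAG_AER) {
        pcie_aer_init(pci_dev, PCI_ERR_VER, last_pcie_cap_offset,
                      PCI_ERR_SIZEOF, NULL);
        last_pcie_cap_offset += PCI_ERR_SIZEOF;     /* next cap goes after AER */
    }

    if (proxy->flags & VIRTIO_PCI_FLAG_ATS) {
        pcie_ats_init(pci_dev, last_pcie_cap_offset);
        last_pcie_cap_offset += PCI_EXT_CAP_ATS_SIZEOF;
    }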




[PATCH v4 1/2] hw/virtio-pci Added counter for pcie capabilities offsets.

2020-12-03 Thread andrew
From: Andrew 

Removed hardcoded offset for ats. Added cap offset counter
for future capabilities like AER.

Signed-off-by: Andrew Melnychenko 
---
 hw/virtio/virtio-pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 36524a5728..ceaa233129 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1798,6 +1798,7 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 if (pcie_port && pci_is_express(pci_dev)) {
 int pos;
+uint16_t last_pcie_cap_offset = PCI_CONFIG_SPACE_SIZE;
 
 pos = pcie_endpoint_cap_init(pci_dev, 0);
 assert(pos > 0);
@@ -1833,7 +1834,8 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 }
 
 if (proxy->flags & VIRTIO_PCI_FLAG_ATS) {
-pcie_ats_init(pci_dev, 256);
+pcie_ats_init(pci_dev, last_pcie_cap_offset);
+last_pcie_cap_offset += PCI_EXT_CAP_ATS_SIZEOF;
 }
 
 if (proxy->flags & VIRTIO_PCI_FLAG_INIT_FLR) {
-- 
2.29.2




[PATCH v4 2/2] hw/virtio-pci Added AER capability.

2020-12-03 Thread andrew
From: Andrew 

Added AER capability for virtio-pci devices.
Also added property for devices, by default AER is disabled.

Signed-off-by: Andrew Melnychenko 
---
 hw/virtio/virtio-pci.c | 16 
 hw/virtio/virtio-pci.h |  4 
 2 files changed, 20 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ceaa233129..f863f69ede 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1817,6 +1817,12 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
  */
 pci_set_word(pci_dev->config + pos + PCI_PM_PMC, 0x3);
 
+if (proxy->flags & VIRTIO_PCI_FLAG_AER) {
+pcie_aer_init(pci_dev, PCI_ERR_VER, last_pcie_cap_offset,
+  PCI_ERR_SIZEOF, NULL);
+last_pcie_cap_offset += PCI_ERR_SIZEOF;
+}
+
 if (proxy->flags & VIRTIO_PCI_FLAG_INIT_DEVERR) {
 /* Init error enabling flags */
 pcie_cap_deverr_init(pci_dev);
@@ -1858,7 +1864,15 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 
 static void virtio_pci_exit(PCIDevice *pci_dev)
 {
+VirtIOPCIProxy *proxy = VIRTIO_PCI(pci_dev);
+bool pcie_port = pci_bus_is_express(pci_get_bus(pci_dev)) &&
+ !pci_bus_is_root(pci_get_bus(pci_dev));
+
 msix_uninit_exclusive_bar(pci_dev);
+if (proxy->flags & VIRTIO_PCI_FLAG_AER && pcie_port &&
+pci_is_express(pci_dev)) {
+pcie_aer_exit(pci_dev);
+}
 }
 
 static void virtio_pci_reset(DeviceState *qdev)
@@ -1911,6 +1925,8 @@ static Property virtio_pci_properties[] = {
 VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
 DEFINE_PROP_BIT("x-pcie-flr-init", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_INIT_FLR_BIT, true),
+DEFINE_PROP_BIT("aer", VirtIOPCIProxy, flags,
+VIRTIO_PCI_FLAG_AER_BIT, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 06e2af12de..d7d5d403a9 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -41,6 +41,7 @@ enum {
 VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT,
 VIRTIO_PCI_FLAG_INIT_PM_BIT,
 VIRTIO_PCI_FLAG_INIT_FLR_BIT,
+VIRTIO_PCI_FLAG_AER_BIT,
 };
 
 /* Need to activate work-arounds for buggy guests at vmstate load. */
@@ -80,6 +81,9 @@ enum {
 /* Init Function Level Reset capability */
 #define VIRTIO_PCI_FLAG_INIT_FLR (1 << VIRTIO_PCI_FLAG_INIT_FLR_BIT)
 
+/* Advanced Error Reporting capability */
+#define VIRTIO_PCI_FLAG_AER (1 << VIRTIO_PCI_FLAG_AER_BIT)
+
 typedef struct {
 MSIMessage msg;
 int virq;
-- 
2.29.2




Re: [PATCH v3 08/10] arm/hvf: Add a WFI handler

2020-12-03 Thread Roman Bolshakov
On Wed, Dec 02, 2020 at 08:04:06PM +0100, Alexander Graf wrote:
> From: Peter Collingbourne 
> 
> Sleep on WFI until the VTIMER is due but allow ourselves to be woken
> up on IPI.
> 
> Signed-off-by: Peter Collingbourne 
> [agraf: Remove unused 'set' variable, always advance PC on WFX trap]
> Signed-off-by: Alexander Graf 
> ---
>  accel/hvf/hvf-cpus.c |  5 ++--
>  include/sysemu/hvf_int.h |  1 +
>  target/arm/hvf/hvf.c | 55 
>  3 files changed, 58 insertions(+), 3 deletions(-)
> 
> diff --git a/accel/hvf/hvf-cpus.c b/accel/hvf/hvf-cpus.c
> index e613c22ad0..a981ccde70 100644
> --- a/accel/hvf/hvf-cpus.c
> +++ b/accel/hvf/hvf-cpus.c
> @@ -337,15 +337,14 @@ static int hvf_init_vcpu(CPUState *cpu)
>  cpu->hvf = g_malloc0(sizeof(*cpu->hvf));
>  
>  /* init cpu signals */
> -sigset_t set;
>  struct sigaction sigact;
>  
>  memset(&sigact, 0, sizeof(sigact));
>  sigact.sa_handler = dummy_signal;
>  sigaction(SIG_IPI, &sigact, NULL);
>  
> -pthread_sigmask(SIG_BLOCK, NULL, &set);
> -sigdelset(&set, SIG_IPI);
> +pthread_sigmask(SIG_BLOCK, NULL, &cpu->hvf->unblock_ipi_mask);
> +sigdelset(&cpu->hvf->unblock_ipi_mask, SIG_IPI);
>  
>  #ifdef __aarch64__
>  r = hv_vcpu_create(&cpu->hvf->fd, (hv_vcpu_exit_t **)&cpu->hvf->exit, 
> NULL);
> diff --git a/include/sysemu/hvf_int.h b/include/sysemu/hvf_int.h
> index 5f15119184..13adf6ea77 100644
> --- a/include/sysemu/hvf_int.h
> +++ b/include/sysemu/hvf_int.h
> @@ -62,6 +62,7 @@ extern HVFState *hvf_state;
>  struct hvf_vcpu_state {
>  uint64_t fd;
>  void *exit;
> +sigset_t unblock_ipi_mask;
>  };
>  
>  void assert_hvf_ok(hv_return_t ret);
> diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
> index 5ecce36d4a..79aeeb237b 100644
> --- a/target/arm/hvf/hvf.c
> +++ b/target/arm/hvf/hvf.c
> @@ -2,6 +2,7 @@
>   * QEMU Hypervisor.framework support for Apple Silicon
>  
>   * Copyright 2020 Alexander Graf 
> + * Copyright 2020 Google LLC
>   *
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
> @@ -18,6 +19,7 @@
>  #include "sysemu/hw_accel.h"
>  
>  #include 
> +#include 
>  
>  #include "exec/address-spaces.h"
>  #include "hw/irq.h"
> @@ -413,6 +415,7 @@ int hvf_arch_init_vcpu(CPUState *cpu)
>  
>  void hvf_kick_vcpu_thread(CPUState *cpu)
>  {
> +cpus_kick_thread(cpu);
>  hv_vcpus_exit(&cpu->hvf->fd, 1);
>  }
>  
> @@ -468,6 +471,18 @@ static int hvf_inject_interrupts(CPUState *cpu)
>  return 0;
>  }
>  
> +static void hvf_wait_for_ipi(CPUState *cpu, struct timespec *ts)
> +{
> +/*
> + * Use pselect to sleep so that other threads can IPI us while we're
> + * sleeping.
> + */
> +qatomic_mb_set(&cpu->thread_kicked, false);
> +qemu_mutex_unlock_iothread();

I raised a concern earlier, but I don't know for sure whether a kick could be
lost right here. On x86 it could be lost.
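
For reference, the textbook shape of the pselect() idiom this code leans on
(a minimal standalone sketch, not the patch, and it assumes SIG_IPI really
is blocked in the vCPU thread's normal signal mask):

    #include <signal.h>
    #include <string.h>
    #include <pthread.h>
    #include <sys/select.h>

    static sigset_t unblock_ipi_mask;

    static void ipi_handler(int sig) { /* wakeup only, no work here */ }

    static void sleeper_setup(int sig_ipi)
    {
        struct sigaction sa;
        sigset_t block;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = ipi_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(sig_ipi, &sa, NULL);

        sigemptyset(&block);
        sigaddset(&block, sig_ipi);
        pthread_sigmask(SIG_BLOCK, &block, NULL);         /* blocked while running */
        pthread_sigmask(SIG_BLOCK, NULL, &unblock_ipi_mask);
        sigdelset(&unblock_ipi_mask, sig_ipi);            /* ...but not inside pselect */
    }

    static void sleep_until_kick(const struct timespec *ts)
    {
        /*
         * A SIG_IPI that is already pending here, or that arrives during the
         * sleep, makes pselect() return with EINTR right away: the mask swap
         * and the wait are atomic, which is what closes the window between
         * clearing thread_kicked and going to sleep.
         */
        pselect(0, NULL, NULL, NULL, ts, &unblock_ipi_mask);
    }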

> +pselect(0, 0, 0, 0, ts, &cpu->hvf->unblock_ipi_mask);
> +qemu_mutex_lock_iothread();
> +}
> +
>  int hvf_vcpu_exec(CPUState *cpu)
>  {
>  ARMCPU *arm_cpu = ARM_CPU(cpu);
> @@ -579,6 +594,46 @@ int hvf_vcpu_exec(CPUState *cpu)
>  }
>  case EC_WFX_TRAP:
>  advance_pc = true;
> +if (!(syndrome & WFX_IS_WFE) && !(cpu->interrupt_request &
> +(CPU_INTERRUPT_HARD | CPU_INTERRUPT_FIQ))) {
> +
> +uint64_t ctl;
> +r = hv_vcpu_get_sys_reg(cpu->hvf->fd, 
> HV_SYS_REG_CNTV_CTL_EL0,
> +&ctl);
> +assert_hvf_ok(r);
> +
> +if (!(ctl & 1) || (ctl & 2)) {
> +/* Timer disabled or masked, just wait for an IPI. */
> +hvf_wait_for_ipi(cpu, NULL);
> +break;
> +}
> +
> +uint64_t cval;
> +r = hv_vcpu_get_sys_reg(cpu->hvf->fd, 
> HV_SYS_REG_CNTV_CVAL_EL0,
> +&cval);
> +assert_hvf_ok(r);
> +
> +int64_t ticks_to_sleep = cval - mach_absolute_time();


Apple's reference recommends using [1]:

  clock_gettime_nsec_np(CLOCK_UPTIME_RAW)

It, internally in Libc, invokes mach_absolute_time() [2].

1. https://developer.apple.com/documentation/kernel/1462446-mach_absolute_time
2. 
https://opensource.apple.com/source/Libc/Libc-1158.1.2/gen/clock_gettime.c.auto.html

Thanks,
Roman
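
If the code were switched over, note that clock_gettime_nsec_np() returns
nanoseconds while the WFI path above works in counter ticks, so a conversion
through the host timebase would still be needed; a sketch, assuming the usual
mach_timebase_info() scaling applies:

    #include <time.h>
    #include <mach/mach_time.h>

    static uint64_t uptime_raw_ticks(void)
    {
        static mach_timebase_info_data_t tb;

        if (tb.denom == 0) {
            mach_timebase_info(&tb);   /* ns = ticks * numer / denom */
        }
        return clock_gettime_nsec_np(CLOCK_UPTIME_RAW) * tb.denom / tb.numer;
    }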

> +if (ticks_to_sleep < 0) {
> +break;
> +}
> +
> +uint64_t seconds = ticks_to_sleep / arm_cpu->gt_cntfrq_hz;
> +uint64_t nanos =
> +(ticks_to_sleep - arm_cpu->gt_cntfrq_hz * seconds) *
> +1000000000 / arm_cpu->gt_cntfrq_hz;
> +
> +/*
> + * Don't sleep for less than 2ms. This is believed to improve
> + * latency of message passing 

Re: qemu:handle_cpu_signal received signal outside vCPU context

2020-12-03 Thread Peter Maydell
On Thu, 3 Dec 2020 at 10:27, Tj (Elloe Linux)  wrote:
>
> We're seeing this error on Ubuntu 20.04 amd64 host and aarch64
> qemu-aarch64-static chroot when executing 'aptitude':
>
> qemu:handle_cpu_signal received signal outside vCPU context
> qemu:handle_cpu_signal received signal outside vCPU context
>
> Research led us to an identical issue report against qemu-m68k [0] where
> it is reported this issue is resolved in 5.0.0+.
>
> Ubuntu 20.04 LTS currently ships v4.2 so I'm trying to identify the
> commit(s) that resolved this issue so we can test a cherry-pick of the fix.
>
> Do you know which commit(s) addressed this?

You might try commits 365510fb860a91 / 9fcff3a67f2be53de2d / 6bc024e713fd35e.
Otherwise, try a git bisect to see where your test case started working
correctly.

thanks
-- PMM



Re: [PATCH v2 0/4] [RfC] fix tracing for modules

2020-12-03 Thread Gerd Hoffmann
  Hi,

> I noticed an issue with simpletrace: the trace file does not contain
> qxl_* TRACE_RECORD_TYPE_MAPPING records when ./configure
> --enable-modules is used. This happens because st_write_event_mapping()
> is called before the qxl module calls trace_event_register_group().
> 
> (The mapping records describe the integer ID to string name mapping used
> in a simpletrace file.)

It's not the only initialization issue.  "qemu-system-x86_64 -vga qxl
-trace qxl*" doesn't work either for similar reasons.

> One way to solve this is by modifying trace_event_register_group() to
> call into trace/simple.c (maybe with a TraceEventIter initialized to
> iterate over the newly registered trace events group?).
> 
> Alternatively, simpletrace.c might be able to emit
> TRACE_RECORD_TYPE_MAPPING on demand instead of ahead of time.

I guess I'll have a look at the first option, or more generally at
per-group initialization; that looks like the better approach at
a quick glance.

take care,
  Gerd
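
The first option could look something like the sketch below; both
st_write_event_mapping_record() and the exact call site inside
trace_event_register_group() are assumptions here, not existing interfaces:

    /* called from trace_event_register_group() once the new group has ids */
    static void st_write_group_mappings(TraceEvent **events)
    {
        size_t i;

        for (i = 0; events[i] != NULL; i++) {
            /* same payload as the mapping records written at startup */
            st_write_event_mapping_record(events[i]->id, events[i]->name);
        }
    }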




Re: [PATCH v2 1/4] tests/qtest/fuzz-test: Quit test_lp1878642 once done

2020-12-03 Thread Li Qiang
Philippe Mathieu-Daudé  于2020年12月2日周三 上午3:11写道:
>
> Missed in fd250172842 ("qtest: add a reproducer for LP#1878642").
>
> Reviewed-by: Thomas Huth 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Li Qiang 

> ---
>  tests/qtest/fuzz-test.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tests/qtest/fuzz-test.c b/tests/qtest/fuzz-test.c
> index 9cb4c42bdea..87b72307a5b 100644
> --- a/tests/qtest/fuzz-test.c
> +++ b/tests/qtest/fuzz-test.c
> @@ -45,6 +45,7 @@ static void 
> test_lp1878642_pci_bus_get_irq_level_assert(void)
>  qtest_outl(s, 0xcf8, 0x8400f841);
>  qtest_outl(s, 0xcfc, 0xebed205d);
>  qtest_outl(s, 0x5d02, 0xebed205d);
> +qtest_quit(s);
>  }
>
>  int main(int argc, char **argv)
> --
> 2.26.2
>
>



Re: [PATCH for-6.0 v2 2/3] spapr/xive: Fix size of END table and number of claimed IPIs

2020-12-03 Thread Greg Kurz
On Thu, 3 Dec 2020 10:51:10 +0100
Cédric Le Goater  wrote:

> On 11/30/20 7:07 PM, Cédric Le Goater wrote:
> > On 11/30/20 5:52 PM, Greg Kurz wrote:
> >> The sPAPR XIVE device has an internal ENDT table the size of
> >> which is configurable by the machine. This table is supposed
> >> to contain END structures for all possible vCPUs that may
> >> enter the guest. The machine must also claim IPIs for all
> >> possible vCPUs since this is expected by the guest.
> >>
> >> spapr_irq_init() takes care of that under the assumption that
> >> spapr_max_vcpu_ids() returns the number of possible vCPUs.
> >> This happens to be the case when the VSMT mode is set to match
> >> the number of threads per core in the guest (default behavior).
> >> With non-default VSMT settings, this limit is > to the number
> >> of vCPUs. In the worst case, we can end up allocating an 8 times
> >> bigger ENDT and claiming 8 times more IPIs than needed. But more
> >> importantly, this creates a confusion between number of vCPUs and
> >> vCPU ids, which can lead to subtle bugs like [1].
> >>
> >> Use smp.max_cpus instead of spapr_max_vcpu_ids() in
> >> spapr_irq_init() for the latest machine type. Older machine
> >> types continue to use spapr_max_vcpu_ids() since the size of
> >> the ENDT is migration visible.
> >>
> >> [1] https://bugs.launchpad.net/qemu/+bug/1900241
> >>
> >> Signed-off-by: Greg Kurz 
> > 
> > 
> > Reviewed-by: Cédric Le Goater 
> 
> 
> I gave patches 2 and 3 a little more thought. 
> 
> I don't think we need much more than patch 1 which clarifies the 
> nature of the values being manipulated, quantities vs. numbering.
> 
> The last 2 patches are adding complexity to try to optimize the 
> XIVE VP space in a scenario which is not very common (vSMT). 
> Maybe it's not worth it. 
> 

Well, the motivation isn't about optimization really since
a non-default vSMT setting already wastes VP space because
of the vCPU spacing. This is more about not using values
with wrong semantics in the code to avoid confusion in
future changes.

I agree though that the extra complexity, especially the
compat crust, might be excessive. So maybe I should just
add comments in the code to clarify when we're using the
wrong semantics ?

> Today, we can start 4K (-2) KVM guests with 16 vCPUs each on 
> a witherspoon (2 socket P9) and we are far from reaching the 
> limits of the VP space. Available RAM is more of a problem. 
> 
> VP space is even bigger on P10. The width was increased to 24bit 
> per chip.
> 
> C.




[PATCH v3 00/12] microvm: add second ioapic

2020-12-03 Thread Gerd Hoffmann
Add a second ioapic to microvm.  Gives us more IRQ lines we can
use for virtio-mmio devices.  Bump number of possible virtio-mmio
devices from 8 to 24.

v3:
 - pick up some review tags.
 - replace magic numbers with #defines.
 - add asl changes to commit messages.
v2:
 - reorganize code a bit.
 - add ioapic2= option to microvm.

Gerd Hoffmann (12):
  [testing] disable xhci msix
  x86: rewrite gsi_handler()
  x86: add support for second ioapic
  microvm: make number of virtio transports runtime changeable
  microvm: make pcie irq base runtime changeable
  microvm: drop microvm_gsi_handler()
  microvm: add second ioapic
  tests/acpi: allow updates for expected data files
  tests/acpi: add data files for ioapic2 test variant
  tests/acpi: add ioapic2=on test for microvm
  tests/acpi: update expected data files
  tests/acpi: disallow updates for expected data files

 include/hw/i386/ioapic.h |   2 +
 include/hw/i386/ioapic_internal.h|   2 +-
 include/hw/i386/microvm.h|   6 +-
 include/hw/i386/x86.h|   3 +
 hw/i386/acpi-common.c|  10 
 hw/i386/microvm.c|  82 ---
 hw/i386/x86.c|  35 ++--
 hw/usb/hcd-xhci-pci.c|   2 +-
 tests/qtest/bios-tables-test.c   |  20 +--
 tests/data/acpi/microvm/APIC.ioapic2 | Bin 0 -> 82 bytes
 tests/data/acpi/microvm/DSDT.ioapic2 | Bin 0 -> 365 bytes
 11 files changed, 129 insertions(+), 33 deletions(-)
 create mode 100644 tests/data/acpi/microvm/APIC.ioapic2
 create mode 100644 tests/data/acpi/microvm/DSDT.ioapic2

-- 
2.27.0
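
With the whole series applied and ioapic2=on, the resulting GSI layout is
roughly the following; the two helpers are for illustration only and are not
part of the series:

    /*
     * GSI  0..15   first IO-APIC   ISA-style devices (serial, rtc, ...)
     * GSI 16..19   first IO-APIC   PCIe INTA..INTD  (pcie_irq_base = 16)
     * GSI 24..47   second IO-APIC  virtio-mmio transports 0..23
     *                              (virtio_irq_base = IO_APIC_SECONDARY_IRQBASE)
     */
    static int virtio_mmio_gsi(int i)        /* i = 0 .. 23 */
    {
        return IO_APIC_SECONDARY_IRQBASE + i;
    }

    static hwaddr virtio_mmio_addr(int i)
    {
        return VIRTIO_MMIO_BASE + i * 512;   /* 512 bytes per transport */
    }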





[PATCH v3 09/12] tests/acpi: add data files for ioapic2 test variant

2020-12-03 Thread Gerd Hoffmann
Copy microvm/APIC -> microvm/APIC.ioapic2
Copy microvm/DSDT -> microvm/DSDT.ioapic2

Signed-off-by: Gerd Hoffmann 
---
 tests/data/acpi/microvm/APIC.ioapic2 | Bin 0 -> 70 bytes
 tests/data/acpi/microvm/DSDT.ioapic2 | Bin 0 -> 365 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/data/acpi/microvm/APIC.ioapic2
 create mode 100644 tests/data/acpi/microvm/DSDT.ioapic2

diff --git a/tests/data/acpi/microvm/APIC.ioapic2 
b/tests/data/acpi/microvm/APIC.ioapic2
new file mode 100644
index 
..7472c7e830b6c7139720e93dd544d4441556661d
GIT binary patch
literal 70
zcmZ<^@N{-#U|?Xp?&R<65v<@85#a0y6k`O6f!H9Lf#JbFFwFr}2jnsGfW!{`1CcCj
H|A7JkCE=5t
zE;#2y@U)soH1M>j)Okp0N|?)B()oaTI~2uqaQfeg{#@eeMV8=e3}=mj(J&BYn|!sEgD;a

[PATCH v3 04/12] microvm: make number of virtio transports runtime changeable

2020-12-03 Thread Gerd Hoffmann
This will allow increasing the number of transports in
case we have enough irq lines available for them all.

Signed-off-by: Gerd Hoffmann 
---
 include/hw/i386/microvm.h | 2 +-
 hw/i386/microvm.c | 9 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index 0fc21600..c5d60bacb5e8 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -52,7 +52,6 @@
 
 /* Platform virtio definitions */
 #define VIRTIO_MMIO_BASE  0xfeb0
-#define VIRTIO_NUM_TRANSPORTS 8
 #define VIRTIO_CMDLINE_MAXLEN 64
 
 #define GED_MMIO_BASE 0xfea0
@@ -98,6 +97,7 @@ struct MicrovmMachineState {
 
 /* Machine state */
 uint32_t virtio_irq_base;
+uint32_t virtio_num_transports;
 bool kernel_cmdline_fixed;
 Notifier machine_done;
 Notifier powerdown_req;
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 5428448b7059..e92f236bf442 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -178,8 +178,13 @@ static void microvm_devices_init(MicrovmMachineState *mms)
 
 kvmclock_create(true);
 
-mms->virtio_irq_base = x86_machine_is_acpi_enabled(x86ms) ? 16 : 5;
-for (i = 0; i < VIRTIO_NUM_TRANSPORTS; i++) {
+mms->virtio_irq_base = 5;
+mms->virtio_num_transports = 8;
+if (x86_machine_is_acpi_enabled(x86ms)) {
+mms->virtio_irq_base = 16;
+}
+
+for (i = 0; i < mms->virtio_num_transports; i++) {
 sysbus_create_simple("virtio-mmio",
  VIRTIO_MMIO_BASE + i * 512,
  x86ms->gsi[mms->virtio_irq_base + i]);
-- 
2.27.0




[PATCH v3 01/12] [testing] disable xhci msix

2020-12-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci-pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
index bba628d3d230..5def3ea55ff3 100644
--- a/hw/usb/hcd-xhci-pci.c
+++ b/hw/usb/hcd-xhci-pci.c
@@ -239,7 +239,7 @@ static void qemu_xhci_instance_init(Object *obj)
 XHCIState *xhci = &s->xhci;
 
 s->msi  = ON_OFF_AUTO_OFF;
-s->msix = ON_OFF_AUTO_AUTO;
+s->msix = ON_OFF_AUTO_OFF;
 xhci->numintrs = XHCI_MAXINTRS;
 xhci->numslots = XHCI_MAXSLOTS;
 xhci_set_flag(xhci, XHCI_FLAG_SS_FIRST);
-- 
2.27.0




[PATCH v3 05/12] microvm: make pcie irq base runtime changeable

2020-12-03 Thread Gerd Hoffmann
Allows moving them in case we have enough
irq lines available.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Igor Mammedov 
---
 include/hw/i386/microvm.h |  2 +-
 hw/i386/microvm.c | 11 ++-
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index c5d60bacb5e8..f1e9db059b85 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -66,7 +66,6 @@
 #define PCIE_MMIO_SIZE0x2000
 #define PCIE_ECAM_BASE0xe000
 #define PCIE_ECAM_SIZE0x1000
-#define PCIE_IRQ_BASE 12
 
 /* Machine type options */
 #define MICROVM_MACHINE_PIT "pit"
@@ -96,6 +95,7 @@ struct MicrovmMachineState {
 bool auto_kernel_cmdline;
 
 /* Machine state */
+uint32_t pcie_irq_base;
 uint32_t virtio_irq_base;
 uint32_t virtio_num_transports;
 bool kernel_cmdline_fixed;
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index e92f236bf442..5e4182b47464 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -181,6 +181,7 @@ static void microvm_devices_init(MicrovmMachineState *mms)
 mms->virtio_irq_base = 5;
 mms->virtio_num_transports = 8;
 if (x86_machine_is_acpi_enabled(x86ms)) {
+mms->pcie_irq_base = 12;
 mms->virtio_irq_base = 16;
 }
 
@@ -226,12 +227,12 @@ static void microvm_devices_init(MicrovmMachineState *mms)
 mms->gpex.mmio32.size = PCIE_MMIO_SIZE;
 mms->gpex.ecam.base   = PCIE_ECAM_BASE;
 mms->gpex.ecam.size   = PCIE_ECAM_SIZE;
-mms->gpex.irq = PCIE_IRQ_BASE;
+mms->gpex.irq = mms->pcie_irq_base;
 create_gpex(mms);
-x86ms->pci_irq_mask = ((1 << (PCIE_IRQ_BASE + 0)) |
-   (1 << (PCIE_IRQ_BASE + 1)) |
-   (1 << (PCIE_IRQ_BASE + 2)) |
-   (1 << (PCIE_IRQ_BASE + 3)));
+x86ms->pci_irq_mask = ((1 << (mms->pcie_irq_base + 0)) |
+   (1 << (mms->pcie_irq_base + 1)) |
+   (1 << (mms->pcie_irq_base + 2)) |
+   (1 << (mms->pcie_irq_base + 3)));
 } else {
 x86ms->pci_irq_mask = 0;
 }
-- 
2.27.0




[PATCH v3 03/12] x86: add support for second ioapic

2020-12-03 Thread Gerd Hoffmann
Add ioapic_init_secondary to initialize it, wire it up
in gsi handling and acpi apic table creation.

Signed-off-by: Gerd Hoffmann 
---
 include/hw/i386/ioapic.h  |  2 ++
 include/hw/i386/ioapic_internal.h |  2 +-
 include/hw/i386/x86.h |  3 +++
 hw/i386/acpi-common.c | 10 ++
 hw/i386/x86.c | 21 +
 5 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/hw/i386/ioapic.h b/include/hw/i386/ioapic.h
index 06bfaaeac6b8..ef37b8a9fda1 100644
--- a/include/hw/i386/ioapic.h
+++ b/include/hw/i386/ioapic.h
@@ -22,6 +22,8 @@
 
 #define IOAPIC_NUM_PINS 24
 #define IO_APIC_DEFAULT_ADDRESS 0xfec0
+#define IO_APIC_SECONDARY_ADDRESS (IO_APIC_DEFAULT_ADDRESS + 0x1)
+#define IO_APIC_SECONDARY_IRQBASE 24 /* primary 0 -> 23, secondary 24 -> 47 */
 
 #define TYPE_KVM_IOAPIC "kvm-ioapic"
 #define TYPE_IOAPIC "ioapic"
diff --git a/include/hw/i386/ioapic_internal.h 
b/include/hw/i386/ioapic_internal.h
index 0f9002a2c23e..021e715f1131 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -27,7 +27,7 @@
 #include "qemu/notify.h"
 #include "qom/object.h"
 
-#define MAX_IOAPICS 1
+#define MAX_IOAPICS 2
 
 #define IOAPIC_LVT_DEST_SHIFT   56
 #define IOAPIC_LVT_DEST_IDX_SHIFT   48
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 739fac50871b..3f9b052cfc34 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -50,6 +50,7 @@ struct X86MachineState {
 ISADevice *rtc;
 FWCfgState *fw_cfg;
 qemu_irq *gsi;
+DeviceState *ioapic2;
 GMappedFile *initrd_mapped_file;
 HotplugHandler *acpi_dev;
 
@@ -120,10 +121,12 @@ bool x86_machine_is_acpi_enabled(const X86MachineState 
*x86ms);
 typedef struct GSIState {
 qemu_irq i8259_irq[ISA_NUM_IRQS];
 qemu_irq ioapic_irq[IOAPIC_NUM_PINS];
+qemu_irq ioapic2_irq[IOAPIC_NUM_PINS];
 } GSIState;
 
 qemu_irq x86_allocate_cpu_irq(void);
 void gsi_handler(void *opaque, int n, int level);
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name);
+DeviceState *ioapic_init_secondary(GSIState *gsi_state);
 
 #endif
diff --git a/hw/i386/acpi-common.c b/hw/i386/acpi-common.c
index 8a769654060e..a6a30e836339 100644
--- a/hw/i386/acpi-common.c
+++ b/hw/i386/acpi-common.c
@@ -103,6 +103,16 @@ void acpi_build_madt(GArray *table_data, BIOSLinker 
*linker,
 io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
 io_apic->interrupt = cpu_to_le32(0);
 
+if (x86ms->ioapic2) {
+AcpiMadtIoApic *io_apic2;
+io_apic2 = acpi_data_push(table_data, sizeof *io_apic);
+io_apic2->type = ACPI_APIC_IO;
+io_apic2->length = sizeof(*io_apic);
+io_apic2->io_apic_id = ACPI_BUILD_IOAPIC_ID + 1;
+io_apic2->address = cpu_to_le32(IO_APIC_SECONDARY_ADDRESS);
+io_apic2->interrupt = cpu_to_le32(IO_APIC_SECONDARY_IRQBASE);
+}
+
 if (x86ms->apic_xrupt_override) {
 intsrcovr = acpi_data_push(table_data, sizeof *intsrcovr);
 intsrcovr->type   = ACPI_APIC_XRUPT_OVERRIDE;
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b67e7b789f89..d68a9eaefc2c 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -598,6 +598,10 @@ void gsi_handler(void *opaque, int n, int level)
 case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
 qemu_set_irq(s->ioapic_irq[n], level);
 break;
+case IO_APIC_SECONDARY_IRQBASE
+... IO_APIC_SECONDARY_IRQBASE + IOAPIC_NUM_PINS - 1:
+qemu_set_irq(s->ioapic2_irq[n - IO_APIC_SECONDARY_IRQBASE], level);
+break;
 }
 }
 
@@ -624,6 +628,23 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
*parent_name)
 }
 }
 
+DeviceState *ioapic_init_secondary(GSIState *gsi_state)
+{
+DeviceState *dev;
+SysBusDevice *d;
+unsigned int i;
+
+dev = qdev_new(TYPE_IOAPIC);
+d = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(d, &error_fatal);
+sysbus_mmio_map(d, 0, IO_APIC_SECONDARY_ADDRESS);
+
+for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+gsi_state->ioapic2_irq[i] = qdev_get_gpio_in(dev, i);
+}
+return dev;
+}
+
 struct setup_data {
 uint64_t next;
 uint32_t type;
-- 
2.27.0




[PATCH v3 10/12] tests/acpi: add ioapic2=on test for microvm

2020-12-03 Thread Gerd Hoffmann
APIC table changes:

 [034h 0052   1]Subtable Type : 01 [I/O APIC]
 [035h 0053   1]   Length : 0C
 [036h 0054   1]  I/O Apic ID : 00
 [037h 0055   1] Reserved : 00
 [038h 0056   4]  Address : FEC0
 [03Ch 0060   4]Interrupt : 

+[040h 0064   1]Subtable Type : 01 [I/O APIC]
+[041h 0065   1]   Length : 0C
+[042h 0066   1]  I/O Apic ID : 01
+[043h 0067   1] Reserved : 00
+[044h 0068   4]  Address : FEC1
+[048h 0072   4]Interrupt : 0018

DSDT table changes:

-Device (VR07)
+Device (VR23)
 {
 Name (_HID, "LNRO0005")  // _HID: Hardware ID
-Name (_UID, 0x07)  // _UID: Unique ID
+Name (_UID, 0x17)  // _UID: Unique ID
 Name (_CCA, One)  // _CCA: Cache Coherency Attribute
 Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
 {
 Memory32Fixed (ReadWrite,
-0xFEB00E00, // Address Base
+0xFEB02E00, // Address Base
 0x0200, // Address Length
 )
 Interrupt (ResourceConsumer, Level, ActiveHigh, Exclusive, ,, )
 {
-0x0017,
+0x002F,
 }
 })
 }
 }

Signed-off-by: Gerd Hoffmann 
---
 tests/qtest/bios-tables-test.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
index 0a0ce76ffcf9..4c4e6dd1e9cc 100644
--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -1154,6 +1154,17 @@ static void test_acpi_microvm_pcie_tcg(void)
 free_test_data(&data);
 }
 
+static void test_acpi_microvm_ioapic2_tcg(void)
+{
+test_data data;
+
+test_acpi_microvm_prepare(&data);
+data.variant = ".ioapic2";
+test_acpi_one(" -machine microvm,acpi=on,ioapic2=on,rtc=off",
+  &data);
+free_test_data(&data);
+}
+
 static void test_acpi_virt_tcg_numamem(void)
 {
 test_data data = {
@@ -1280,6 +1291,7 @@ int main(int argc, char *argv[])
 qtest_add_func("acpi/microvm", test_acpi_microvm_tcg);
 qtest_add_func("acpi/microvm/usb", test_acpi_microvm_usb_tcg);
 qtest_add_func("acpi/microvm/rtc", test_acpi_microvm_rtc_tcg);
+qtest_add_func("acpi/microvm/ioapic2", test_acpi_microvm_ioapic2_tcg);
 if (strcmp(arch, "x86_64") == 0) {
 qtest_add_func("acpi/microvm/pcie", test_acpi_microvm_pcie_tcg);
 }
-- 
2.27.0




[PATCH v3 11/12] tests/acpi: update expected data files

2020-12-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 tests/data/acpi/microvm/APIC.ioapic2 | Bin 70 -> 82 bytes
 tests/data/acpi/microvm/DSDT.ioapic2 | Bin 365 -> 365 bytes
 2 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/tests/data/acpi/microvm/APIC.ioapic2 
b/tests/data/acpi/microvm/APIC.ioapic2
index 
7472c7e830b6c7139720e93dd544d4441556661d..a305f89d99eca881109ba54090da0f90262a402c
 100644
GIT binary patch
delta 35
ncmZ<@;&Ke|bPi%*U|@VUk;_bok%tk;KKM@pNV2f~2MPlKf^P=P

delta 23
dcmWG?<8ln}barE4U|_sHk;{yYh3!9(2>?8v1it_P

diff --git a/tests/data/acpi/microvm/DSDT.ioapic2 
b/tests/data/acpi/microvm/DSDT.ioapic2
index 
b43f427a222a933d3f34aceab6224a2c6115c365..aee44dd3de1bb16585bf571ff0ca8e44d467d009
 100644
GIT binary patch
delta 83
zcmaFM^p=UsCD

[PATCH v3 08/12] tests/acpi: allow updates for expected data files

2020-12-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8bf4..191ac230b013 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,3 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/microvm/APIC.ioapic2",
+"tests/data/acpi/microvm/DSDT.ioapic2",
-- 
2.27.0




[PATCH v3 02/12] x86: rewrite gsi_handler()

2020-12-03 Thread Gerd Hoffmann
Rewrite function to use switch() for IRQ number mapping.
Check i8259_irq exists before raising it so the function
also works in case no i8259 (aka pic) is present.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Igor Mammedov 
---
 hw/i386/x86.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 5944fc44edca..b67e7b789f89 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -588,11 +588,17 @@ void gsi_handler(void *opaque, int n, int level)
 GSIState *s = opaque;
 
 trace_x86_gsi_interrupt(n, level);
-if (n < ISA_NUM_IRQS) {
-/* Under KVM, Kernel will forward to both PIC and IOAPIC */
-qemu_set_irq(s->i8259_irq[n], level);
+switch (n) {
+case 0 ... ISA_NUM_IRQS - 1:
+if (s->i8259_irq[n]) {
+/* Under KVM, Kernel will forward to both PIC and IOAPIC */
+qemu_set_irq(s->i8259_irq[n], level);
+}
+/* fall through */
+case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
+qemu_set_irq(s->ioapic_irq[n], level);
+break;
 }
-qemu_set_irq(s->ioapic_irq[n], level);
 }
 
 void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
-- 
2.27.0




Re: [PATCH v3 01/12] [testing] disable xhci msix

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 11:54:12AM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 

A bit more context on why you are doing this?

> ---
>  hw/usb/hcd-xhci-pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
> index bba628d3d230..5def3ea55ff3 100644
> --- a/hw/usb/hcd-xhci-pci.c
> +++ b/hw/usb/hcd-xhci-pci.c
> @@ -239,7 +239,7 @@ static void qemu_xhci_instance_init(Object *obj)
>  XHCIState *xhci = &s->xhci;
>  
>  s->msi  = ON_OFF_AUTO_OFF;
> -s->msix = ON_OFF_AUTO_AUTO;
> +s->msix = ON_OFF_AUTO_OFF;
>  xhci->numintrs = XHCI_MAXINTRS;
>  xhci->numslots = XHCI_MAXSLOTS;
>  xhci_set_flag(xhci, XHCI_FLAG_SS_FIRST);
> -- 
> 2.27.0




[PATCH v3 06/12] microvm: drop microvm_gsi_handler()

2020-12-03 Thread Gerd Hoffmann
With the improved gsi_handler() we don't need
our private version any more.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Igor Mammedov 
---
 hw/i386/microvm.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 5e4182b47464..829b376a1278 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -96,13 +96,6 @@ static void microvm_set_rtc(MicrovmMachineState *mms, 
ISADevice *s)
 rtc_set_memory(s, 0x5d, val >> 16);
 }
 
-static void microvm_gsi_handler(void *opaque, int n, int level)
-{
-GSIState *s = opaque;
-
-qemu_set_irq(s->ioapic_irq[n], level);
-}
-
 static void create_gpex(MicrovmMachineState *mms)
 {
 X86MachineState *x86ms = X86_MACHINE(mms);
@@ -163,12 +156,7 @@ static void microvm_devices_init(MicrovmMachineState *mms)
 /* Core components */
 
 gsi_state = g_malloc0(sizeof(*gsi_state));
-if (mms->pic == ON_OFF_AUTO_ON || mms->pic == ON_OFF_AUTO_AUTO) {
-x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
-} else {
-x86ms->gsi = qemu_allocate_irqs(microvm_gsi_handler,
-gsi_state, GSI_NUM_PINS);
-}
+x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
 
 isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
   &error_abort);
-- 
2.27.0




Re: [PULL 0/6] pc,vhost: fixes

2020-12-03 Thread Peter Maydell
On Thu, 3 Dec 2020 at 10:59, Michael S. Tsirkin  wrote:
>
> On Thu, Dec 03, 2020 at 10:20:03AM +, Peter Maydell wrote:
> > On Wed, 2 Dec 2020 at 11:03, Michael S. Tsirkin  wrote:
> > >
> > > Patch 5 gave me pause but we do need patch 6 as
> > > it's a guest triggerable assert, and it seemed
> > > cleaner to just take the whole patchset than cherry-pick.
> >
> > Is this only "fixes a guest triggerable assert"?
>
> My understanding is that without the patches a rhel7 guest triggers
> the assert if started with vtd enabled and virtio-net with
> iommu_platform=on.
>
> https://bugs.launchpad.net/qemu/+bug/1885175

Bug reported in June? Is this a regression since 5.1, or
was it this way in 5.1 as well?

thanks
-- PMM



Re: [PATCH v3 00/12] microvm: add second ioapic

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 11:54:11AM +0100, Gerd Hoffmann wrote:
> Add a second ioapic to microvm.  Gives us more IRQ lines we can
> use for virtio-mmio devices.  Bump number of possible virtio-mmio
> devices from 8 to 24.


acpi things all look ok:

Reviewed-by: Michael S. Tsirkin 


> v3:
>  - pick up some review tags.
>  - replace magic numbers with #defines.
>  - add asl changes to commit messages.
> v2:
>  - reorganize code a bit.
>  - add ioapic2= option to microvm.
> 
> Gerd Hoffmann (12):
>   [testing] disable xhci msix
>   x86: rewrite gsi_handler()
>   x86: add support for second ioapic
>   microvm: make number of virtio transports runtime changeable
>   microvm: make pcie irq base runtime changeable
>   microvm: drop microvm_gsi_handler()
>   microvm: add second ioapic
>   tests/acpi: allow updates for expected data files
>   tests/acpi: add data files for ioapic2 test variant
>   tests/acpi: add ioapic2=on test for microvm
>   tests/acpi: update expected data files
>   tests/acpi: disallow updates for expected data files
> 
>  include/hw/i386/ioapic.h |   2 +
>  include/hw/i386/ioapic_internal.h|   2 +-
>  include/hw/i386/microvm.h|   6 +-
>  include/hw/i386/x86.h|   3 +
>  hw/i386/acpi-common.c|  10 
>  hw/i386/microvm.c|  82 ---
>  hw/i386/x86.c|  35 ++--
>  hw/usb/hcd-xhci-pci.c|   2 +-
>  tests/qtest/bios-tables-test.c   |  20 +--
>  tests/data/acpi/microvm/APIC.ioapic2 | Bin 0 -> 82 bytes
>  tests/data/acpi/microvm/DSDT.ioapic2 | Bin 0 -> 365 bytes
>  11 files changed, 129 insertions(+), 33 deletions(-)
>  create mode 100644 tests/data/acpi/microvm/APIC.ioapic2
>  create mode 100644 tests/data/acpi/microvm/DSDT.ioapic2
> 
> -- 
> 2.27.0
> 




[PATCH v3 07/12] microvm: add second ioapic

2020-12-03 Thread Gerd Hoffmann
Create second ioapic, route virtio-mmio IRQs to it,
allow more virtio-mmio devices (24 instead of 8).

Needs ACPI, enabled by default, can be turned off
using -machine ioapic2=off

Signed-off-by: Gerd Hoffmann 
---
 include/hw/i386/microvm.h  |  2 ++
 hw/i386/microvm.c  | 56 +++---
 tests/qtest/bios-tables-test.c |  8 ++---
 3 files changed, 57 insertions(+), 9 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index f1e9db059b85..f25f8374413f 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -72,6 +72,7 @@
 #define MICROVM_MACHINE_PIC "pic"
 #define MICROVM_MACHINE_RTC "rtc"
 #define MICROVM_MACHINE_PCIE"pcie"
+#define MICROVM_MACHINE_IOAPIC2 "ioapic2"
 #define MICROVM_MACHINE_ISA_SERIAL  "isa-serial"
 #define MICROVM_MACHINE_OPTION_ROMS "x-option-roms"
 #define MICROVM_MACHINE_AUTO_KERNEL_CMDLINE "auto-kernel-cmdline"
@@ -90,6 +91,7 @@ struct MicrovmMachineState {
 OnOffAuto pit;
 OnOffAuto rtc;
 OnOffAuto pcie;
+OnOffAuto ioapic2;
 bool isa_serial;
 bool option_roms;
 bool auto_kernel_cmdline;
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 829b376a1278..56886086133c 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -145,32 +145,53 @@ static void create_gpex(MicrovmMachineState *mms)
 }
 }
 
+static int microvm_ioapics(MicrovmMachineState *mms)
+{
+if (!x86_machine_is_acpi_enabled(X86_MACHINE(mms))) {
+return 1;
+}
+if (mms->ioapic2 == ON_OFF_AUTO_OFF) {
+return 1;
+}
+return 2;
+}
+
 static void microvm_devices_init(MicrovmMachineState *mms)
 {
 X86MachineState *x86ms = X86_MACHINE(mms);
 ISABus *isa_bus;
 ISADevice *rtc_state;
 GSIState *gsi_state;
+int ioapics;
 int i;
 
 /* Core components */
-
+ioapics = microvm_ioapics(mms);
 gsi_state = g_malloc0(sizeof(*gsi_state));
-x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
+x86ms->gsi = qemu_allocate_irqs(gsi_handler, gsi_state,
+IOAPIC_NUM_PINS * ioapics);
 
 isa_bus = isa_bus_new(NULL, get_system_memory(), get_system_io(),
   &error_abort);
 isa_bus_irqs(isa_bus, x86ms->gsi);
 
 ioapic_init_gsi(gsi_state, "machine");
+if (ioapics > 1) {
+x86ms->ioapic2 = ioapic_init_secondary(gsi_state);
+}
 
 kvmclock_create(true);
 
 mms->virtio_irq_base = 5;
 mms->virtio_num_transports = 8;
-if (x86_machine_is_acpi_enabled(x86ms)) {
-mms->pcie_irq_base = 12;
-mms->virtio_irq_base = 16;
+if (x86ms->ioapic2) {
+mms->pcie_irq_base = 16;/* 16 -> 19 */
+/* use second ioapic (24 -> 47) for virtio-mmio irq lines */
+mms->virtio_irq_base = IO_APIC_SECONDARY_IRQBASE;
+mms->virtio_num_transports = IOAPIC_NUM_PINS;
+} else if (x86_machine_is_acpi_enabled(x86ms)) {
+mms->pcie_irq_base = 12;/* 12 -> 15 */
+mms->virtio_irq_base = 16;  /* 16 -> 23 */
 }
 
 for (i = 0; i < mms->virtio_num_transports; i++) {
@@ -544,6 +565,23 @@ static void microvm_machine_set_pcie(Object *obj, Visitor 
*v, const char *name,
 visit_type_OnOffAuto(v, name, &mms->pcie, errp);
 }
 
+static void microvm_machine_get_ioapic2(Object *obj, Visitor *v, const char 
*name,
+void *opaque, Error **errp)
+{
+MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+OnOffAuto ioapic2 = mms->ioapic2;
+
+visit_type_OnOffAuto(v, name, &ioapic2, errp);
+}
+
+static void microvm_machine_set_ioapic2(Object *obj, Visitor *v, const char 
*name,
+void *opaque, Error **errp)
+{
+MicrovmMachineState *mms = MICROVM_MACHINE(obj);
+
+visit_type_OnOffAuto(v, name, &mms->ioapic2, errp);
+}
+
 static bool microvm_machine_get_isa_serial(Object *obj, Error **errp)
 {
 MicrovmMachineState *mms = MICROVM_MACHINE(obj);
@@ -620,6 +658,7 @@ static void microvm_machine_initfn(Object *obj)
 mms->pit = ON_OFF_AUTO_AUTO;
 mms->rtc = ON_OFF_AUTO_AUTO;
 mms->pcie = ON_OFF_AUTO_AUTO;
+mms->ioapic2 = ON_OFF_AUTO_AUTO;
 mms->isa_serial = true;
 mms->option_roms = true;
 mms->auto_kernel_cmdline = true;
@@ -693,6 +732,13 @@ static void microvm_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, MICROVM_MACHINE_PCIE,
 "Enable PCIe");
 
+object_class_property_add(oc, MICROVM_MACHINE_IOAPIC2, "OnOffAuto",
+  microvm_machine_get_ioapic2,
+  microvm_machine_set_ioapic2,
+  NULL, NULL);
+object_class_property_set_description(oc, MICROVM_MACHINE_IOAPIC2,
+"Enable second IO-APIC");
+
 object_class_property_add_bool(oc, MICROVM_MACHINE_ISA_SERIAL,
   

[PATCH v3 12/12] tests/acpi: disallow updates for expected data files

2020-12-03 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
---
 tests/qtest/bios-tables-test-allowed-diff.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index 191ac230b013..dfb8523c8bf4 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1,3 +1 @@
 /* List of comma-separated changed AML files to ignore */
-"tests/data/acpi/microvm/APIC.ioapic2",
-"tests/data/acpi/microvm/DSDT.ioapic2",
-- 
2.27.0




[PATCH 2/9] console: add check for ui_info pointer

2020-12-03 Thread Gerd Hoffmann
Verify the hw_ops->ui_info function pointer is non-zero before
calling it.  Can be triggered by qxl which changes hw_ops when
switching between qxl-native and vga-compat modes.

Signed-off-by: Gerd Hoffmann 
---
 ui/console.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ui/console.c b/ui/console.c
index 16b326854080..8930808d0b6d 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1539,7 +1539,9 @@ static void dpy_set_ui_info_timer(void *opaque)
 {
 QemuConsole *con = opaque;
 
-con->hw_ops->ui_info(con->hw, con->head, &con->ui_info);
+if (con->hw_ops->ui_info) {
+con->hw_ops->ui_info(con->hw, con->head, &con->ui_info);
+}
 }
 
 bool dpy_ui_info_supported(QemuConsole *con)
-- 
2.27.0




[PATCH 3/9] vnc: use enum for features

2020-12-03 Thread Gerd Hoffmann
Use an enum for the vnc feature bits.  That way they are enumerated
automatically and we don't have to do that manually when adding or
removing features.

Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 4e2637ce6c5c..262fcf179b44 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -438,18 +438,20 @@ enum {
  *
  */
 
-#define VNC_FEATURE_RESIZE   0
-#define VNC_FEATURE_HEXTILE  1
-#define VNC_FEATURE_POINTER_TYPE_CHANGE  2
-#define VNC_FEATURE_WMVI 3
-#define VNC_FEATURE_TIGHT4
-#define VNC_FEATURE_ZLIB 5
-#define VNC_FEATURE_COPYRECT 6
-#define VNC_FEATURE_RICH_CURSOR  7
-#define VNC_FEATURE_TIGHT_PNG8
-#define VNC_FEATURE_ZRLE 9
-#define VNC_FEATURE_ZYWRLE  10
-#define VNC_FEATURE_LED_STATE   11
+enum VncFeatures {
+VNC_FEATURE_RESIZE,
+VNC_FEATURE_HEXTILE,
+VNC_FEATURE_POINTER_TYPE_CHANGE,
+VNC_FEATURE_WMVI,
+VNC_FEATURE_TIGHT,
+VNC_FEATURE_ZLIB,
+VNC_FEATURE_COPYRECT,
+VNC_FEATURE_RICH_CURSOR,
+VNC_FEATURE_TIGHT_PNG,
+VNC_FEATURE_ZRLE,
+VNC_FEATURE_ZYWRLE,
+VNC_FEATURE_LED_STATE,
+};
 
 #define VNC_FEATURE_RESIZE_MASK  (1 << VNC_FEATURE_RESIZE)
 #define VNC_FEATURE_HEXTILE_MASK (1 << VNC_FEATURE_HEXTILE)
-- 
2.27.0




[PATCH 0/9] vnc: support some new extensions.

2020-12-03 Thread Gerd Hoffmann
The rfb standard keeps evolving.  While the official spec has been more or
less frozen for a decade or so, the community maintains an updated version
of the spec at:
https://github.com/rfbproto/rfbproto/

This series adds support for two new extensions from that spec: alpha
cursor and extended desktop resize.

alpha cursor allows a full alpha channel for the cursor image (unlike
the rich cursor extension which has only a bitmask for transparency).

extended desktop resize makes the desktop-resize work both ways, i.e. we
can not only signal guest display resolution changes to the vnc client
but also vnc client window size changes to the guest.

Tested with tigervnc.

gtk-vnc (and anything building on top of it like virt-viewer and
virt-manager) has no support for these extensions.

Gerd Hoffmann (9):
  console: allow con==NULL in dpy_set_ui_info
  console: add check for ui_info pointer
  vnc: use enum for features
  vnc: drop unused copyrect feature
  vnc: add pseudo encodings
  vnc: add alpha cursor support
  vnc: force initial resize message
  vnc: add support for extended desktop resize
  qxl: add ui_info callback

 ui/vnc.h | 32 +---
 hw/display/qxl.c | 27 ++
 ui/console.c |  8 +++-
 ui/vnc.c | 97 ++--
 4 files changed, 138 insertions(+), 26 deletions(-)

-- 
2.27.0





Re: [PATCH v3 05/10] hvf: arm: Mark CPU as dirty on reset

2020-12-03 Thread Alexander Graf



On 03.12.20 02:52, Roman Bolshakov wrote:

On Wed, Dec 02, 2020 at 08:04:03PM +0100, Alexander Graf wrote:

When clearing internal state of a CPU, we should also make sure that HVF
knows about it and can push the new values down to vcpu state.


I'm sorry if I'm asking something dumb. But isn't
cpu_synchronize_all_post_reset() supposed to push QEMU state into HVF
(or any other accel) after reset?

For x86 it used to work:

    static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
                                                  run_on_cpu_data arg)
    {
        hvf_put_registers(cpu);
        cpu->vcpu_dirty = false;
    }



Yes, it works because after the reset is done, there is no other 
register modification happening. With the PSCI emulation code in QEMU, 
we still do modify CPU state after reset though.


Different question though: Why do we need the post_reset() call at all 
here to push state? We would just push it on the next run anyway, 
right? So if we don't set vcpu_dirty to false then, we wouldn't need 
this patch here I think.



Alex
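
The alternative hinted at above would be to leave the state marked dirty after
reset and let the next vcpu entry push it; a sketch of the idea only, not a
tested change:

    static void do_hvf_cpu_synchronize_post_reset(CPUState *cpu,
                                                  run_on_cpu_data arg)
    {
        /*
         * Don't push the registers here; as long as vcpu_dirty stays set,
         * hvf_put_registers() runs before the next vcpu entry anyway, after
         * any further post-reset modifications (e.g. PSCI) have been made.
         */
        cpu->vcpu_dirty = true;
    }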




[PATCH 6/9] vnc: add alpha cursor support

2020-12-03 Thread Gerd Hoffmann
There is a new vnc extension for cursors with an alpha channel.  Use
it if supported by the vnc client, prefer it over the "rich cursor"
extension which supports only a bitmask for transparency.

This is a visible improvement especially on modern desktops which
actually use the alpha channel when defining cursors.

https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst#cursor-with-alpha-pseudo-encoding

Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.h |  2 ++
 ui/vnc.c | 21 ++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 6f5006da3593..c8d3ad9ec496 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -448,6 +448,7 @@ enum VncFeatures {
 VNC_FEATURE_TIGHT,
 VNC_FEATURE_ZLIB,
 VNC_FEATURE_RICH_CURSOR,
+VNC_FEATURE_ALPHA_CURSOR,
 VNC_FEATURE_TIGHT_PNG,
 VNC_FEATURE_ZRLE,
 VNC_FEATURE_ZYWRLE,
@@ -461,6 +462,7 @@ enum VncFeatures {
 #define VNC_FEATURE_TIGHT_MASK   (1 << VNC_FEATURE_TIGHT)
 #define VNC_FEATURE_ZLIB_MASK(1 << VNC_FEATURE_ZLIB)
 #define VNC_FEATURE_RICH_CURSOR_MASK (1 << VNC_FEATURE_RICH_CURSOR)
+#define VNC_FEATURE_ALPHA_CURSOR_MASK(1 << VNC_FEATURE_ALPHA_CURSOR)
 #define VNC_FEATURE_TIGHT_PNG_MASK   (1 << VNC_FEATURE_TIGHT_PNG)
 #define VNC_FEATURE_ZRLE_MASK(1 << VNC_FEATURE_ZRLE)
 #define VNC_FEATURE_ZYWRLE_MASK  (1 << VNC_FEATURE_ZYWRLE)
diff --git a/ui/vnc.c b/ui/vnc.c
index 8c2771c1ce3b..247e80d8f5c8 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -937,6 +937,18 @@ static int vnc_cursor_define(VncState *vs)
 QEMUCursor *c = vs->vd->cursor;
 int isize;
 
+if (vnc_has_feature(vs, VNC_FEATURE_ALPHA_CURSOR)) {
+vnc_lock_output(vs);
+vnc_write_u8(vs,  VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
+vnc_write_u8(vs,  0);  /*  padding */
+vnc_write_u16(vs, 1);  /*  # of rects  */
+vnc_framebuffer_update(vs, c->hot_x, c->hot_y, c->width, c->height,
+   VNC_ENCODING_ALPHA_CURSOR);
+vnc_write_s32(vs, VNC_ENCODING_RAW);
+vnc_write(vs, c->data, c->width * c->height * 4);
+vnc_unlock_output(vs);
+return 0;
+}
 if (vnc_has_feature(vs, VNC_FEATURE_RICH_CURSOR)) {
 vnc_lock_output(vs);
 vnc_write_u8(vs,  VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
@@ -2102,9 +2114,9 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 break;
 case VNC_ENCODING_RICH_CURSOR:
 vs->features |= VNC_FEATURE_RICH_CURSOR_MASK;
-if (vs->vd->cursor) {
-vnc_cursor_define(vs);
-}
+break;
+case VNC_ENCODING_ALPHA_CURSOR:
+vs->features |= VNC_FEATURE_ALPHA_CURSOR_MASK;
 break;
 case VNC_ENCODING_EXT_KEY_EVENT:
 send_ext_key_event_ack(vs);
@@ -2134,6 +2146,9 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 vnc_desktop_resize(vs);
 check_pointer_type_change(&vs->mouse_mode_notifier, NULL);
 vnc_led_state_change(vs);
+if (vs->vd->cursor) {
+vnc_cursor_define(vs);
+}
 }
 
 static void set_pixel_conversion(VncState *vs)
-- 
2.27.0




[PATCH 7/9] vnc: force initial resize message

2020-12-03 Thread Gerd Hoffmann
The vnc server should send desktop resize notifications unconditionally
on a new client connect, for feature negotiation reasons.  Add a bool
flag to vnc_desktop_resize() to force sending the message even in case
there is no size change.

Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 247e80d8f5c8..bdaf384f71a4 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -664,13 +664,14 @@ void vnc_framebuffer_update(VncState *vs, int x, int y, 
int w, int h,
 }
 
 
-static void vnc_desktop_resize(VncState *vs)
+static void vnc_desktop_resize(VncState *vs, bool force)
 {
 if (vs->ioc == NULL || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
 return;
 }
 if (vs->client_width == pixman_image_get_width(vs->vd->server) &&
-vs->client_height == pixman_image_get_height(vs->vd->server)) {
+vs->client_height == pixman_image_get_height(vs->vd->server) &&
+!force) {
 return;
 }
 
@@ -800,7 +801,7 @@ static void vnc_dpy_switch(DisplayChangeListener *dcl,
 
 QTAILQ_FOREACH(vs, &vd->clients, next) {
 vnc_colordepth(vs);
-vnc_desktop_resize(vs);
+vnc_desktop_resize(vs, false);
 if (vs->vd->cursor) {
 vnc_cursor_define(vs);
 }
@@ -2143,7 +2144,7 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 break;
 }
 }
-vnc_desktop_resize(vs);
+vnc_desktop_resize(vs, true);
 check_pointer_type_change(&vs->mouse_mode_notifier, NULL);
 vnc_led_state_change(vs);
 if (vs->vd->cursor) {
-- 
2.27.0




Re: [PULL 0/6] pc,vhost: fixes

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 10:20:03AM +, Peter Maydell wrote:
> On Wed, 2 Dec 2020 at 11:03, Michael S. Tsirkin  wrote:
> >
> > Patch 5 gave me pause but we do need patch 6 as
> > it's a guest triggerable assert, and it seemed
> > cleaner to just take the whole patchset than cherry-pick.
> 
> Is this only "fixes a guest triggerable assert"?

My understanding is that without the patches a rhel7 guest triggers
the assert if started with vtd enabled and virtio-net with
iommu_platform=on.

https://bugs.launchpad.net/qemu/+bug/1885175


> If so, that's
> not sufficiently important to require an rc5 at this point.
> 
> thanks
> -- PMM

For sure most people don't use vtd ...

-- 
MST




[PATCH 8/9] vnc: add support for extended desktop resize

2020-12-03 Thread Gerd Hoffmann
The extended desktop resize encoding adds support for (a) clients
sending resize requests to the server, and (b) multihead support.

This patch implements (a).  All resize requests are rejected by qemu.
Qemu can't resize the framebuffer on its own; that is in the hands of
the guest, so all qemu can do is forward the request to the guest.
Should the guest actually resize the framebuffer we can notify the vnc
client later with a separate message.

This requires support in the display device.  Works with virtio-gpu.

https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst#extendeddesktopsize-pseudo-encoding

Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.h |  2 ++
 ui/vnc.c | 64 +++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index c8d3ad9ec496..77a310947bd6 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -442,6 +442,7 @@ enum {
 
 enum VncFeatures {
 VNC_FEATURE_RESIZE,
+VNC_FEATURE_RESIZE_EXT,
 VNC_FEATURE_HEXTILE,
 VNC_FEATURE_POINTER_TYPE_CHANGE,
 VNC_FEATURE_WMVI,
@@ -456,6 +457,7 @@ enum VncFeatures {
 };
 
 #define VNC_FEATURE_RESIZE_MASK      (1 << VNC_FEATURE_RESIZE)
+#define VNC_FEATURE_RESIZE_EXT_MASK  (1 << VNC_FEATURE_RESIZE_EXT)
 #define VNC_FEATURE_HEXTILE_MASK     (1 << VNC_FEATURE_HEXTILE)
 #define VNC_FEATURE_POINTER_TYPE_CHANGE_MASK (1 << VNC_FEATURE_POINTER_TYPE_CHANGE)
 #define VNC_FEATURE_WMVI_MASK        (1 << VNC_FEATURE_WMVI)
diff --git a/ui/vnc.c b/ui/vnc.c
index bdaf384f71a4..a15132faa96f 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -663,10 +663,35 @@ void vnc_framebuffer_update(VncState *vs, int x, int y, 
int w, int h,
 vnc_write_s32(vs, encoding);
 }
 
+static void vnc_desktop_resize_ext(VncState *vs, bool reject)
+{
+vnc_lock_output(vs);
+vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
+vnc_write_u8(vs, 0);
+vnc_write_u16(vs, 1); /* number of rects */
+vnc_framebuffer_update(vs,
+   reject ? 1 : 0,
+   reject ? 3 : 0,
+   vs->client_width, vs->client_height,
+   VNC_ENCODING_DESKTOP_RESIZE_EXT);
+vnc_write_u8(vs, 1);  /* number of screens */
+vnc_write_u8(vs, 0);  /* padding */
+vnc_write_u8(vs, 0);  /* padding */
+vnc_write_u8(vs, 0);  /* padding */
+vnc_write_u32(vs, 0); /* screen id */
+vnc_write_u16(vs, 0); /* screen x-pos */
+vnc_write_u16(vs, 0); /* screen y-pos */
+vnc_write_u16(vs, vs->client_width);
+vnc_write_u16(vs, vs->client_height);
+vnc_write_u32(vs, 0); /* screen flags */
+vnc_unlock_output(vs);
+vnc_flush(vs);
+}
 
 static void vnc_desktop_resize(VncState *vs, bool force)
 {
-if (vs->ioc == NULL || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
+if (vs->ioc == NULL || (!vnc_has_feature(vs, VNC_FEATURE_RESIZE) &&
+!vnc_has_feature(vs, VNC_FEATURE_RESIZE_EXT))) {
 return;
 }
 if (vs->client_width == pixman_image_get_width(vs->vd->server) &&
@@ -681,6 +706,12 @@ static void vnc_desktop_resize(VncState *vs, bool force)
pixman_image_get_height(vs->vd->server) >= 0);
 vs->client_width = pixman_image_get_width(vs->vd->server);
 vs->client_height = pixman_image_get_height(vs->vd->server);
+
+if (vnc_has_feature(vs, VNC_FEATURE_RESIZE_EXT)) {
+vnc_desktop_resize_ext(vs, false);
+return;
+}
+
 vnc_lock_output(vs);
 vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
 vnc_write_u8(vs, 0);
@@ -2110,6 +2141,9 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 case VNC_ENCODING_DESKTOPRESIZE:
 vs->features |= VNC_FEATURE_RESIZE_MASK;
 break;
+case VNC_ENCODING_DESKTOP_RESIZE_EXT:
+vs->features |= VNC_FEATURE_RESIZE_EXT_MASK;
+break;
 case VNC_ENCODING_POINTER_TYPE_CHANGE:
 vs->features |= VNC_FEATURE_POINTER_TYPE_CHANGE_MASK;
 break;
@@ -2431,6 +2465,34 @@ static int protocol_client_msg(VncState *vs, uint8_t 
*data, size_t len)
 break;
 }
 break;
+case VNC_MSG_CLIENT_SET_DESKTOP_SIZE:
+{
+size_t size;
+uint8_t screens;
+uint16_t width;
+uint16_t height;
+QemuUIInfo info;
+
+if (len < 8) {
+return 8;
+}
+
+screens = read_u8(data, 6);
+size= 8 + screens * 16;
+if (len < size) {
+return size;
+}
+
+width   = read_u16(data, 2);
+height  = read_u16(data, 4);
+vnc_desktop_resize_ext(vs, true);
+
+memset(&info, 0, sizeof(info));
+info.width = width;
+info.height = height;
+dpy_set_ui_info(vs->vd->dcl.con, &info);
+break;
+}
 default:
 VNC_DEBUG("Msg: %d\n", data[0]);
 vnc_client_error(vs);
-- 
2.27.0




Re: [PATCH 00/18] qapi/qom: QAPIfy object-add

2020-12-03 Thread Paolo Bonzini

On 02/12/20 18:35, Kevin Wolf wrote:

Could we have an intermediate state that doesn't require any
duplication and thus have no separate code paths that could
diverge?


The one requirement we have for an intermediate state is that it
supports both interfaces: the well-known create/set properties/realize
dance, and a new DeviceClass method, say .create(), that takes the
configuration as parameters instead of relying on previously set
properties.

I assumed two separate implementations of transferring the configuration
into the internal state. On second thought, this assumption is maybe
wrong.

You can implement the new method as a wrapper around the old way: it could
just set all the properties and call realize. Of course, you don't win
much in terms of improving the class implementation this way, you just
support the new interface, but I guess it can be a reasonable
intermediate step to resolve complicated dependencies etc.
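
Purely as an illustration of that wrapper idea, here is a minimal sketch
with made-up device, struct and property names; the .create() method and
FooDeviceConfig are hypothetical, not an existing QEMU interface:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"
#include "hw/qdev-properties.h"

/* Hypothetical config struct, as it might come out of a QAPI schema. */
typedef struct FooDeviceConfig {
    char *chardev;
    uint32_t queue_size;
} FooDeviceConfig;

/*
 * Hypothetical .create() implemented as a thin wrapper around the
 * existing set-properties-then-realize dance: forward the typed
 * configuration to the legacy properties, then realize as usual.
 */
static void foo_device_create(DeviceState *dev, FooDeviceConfig *cfg,
                              Error **errp)
{
    qdev_prop_set_string(dev, "chardev", cfg->chardev);
    qdev_prop_set_uint32(dev, "queue-size", cfg->queue_size);
    qdev_realize(dev, NULL, errp);
}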


I sketched something at https://wiki.qemu.org/Features/QOM-QAPI_integration.

The main difference with what we discussed so far is that it doesn't 
subsume the complete/realize step, only the setting of properties.  The 
idea is that user_creatable_add_type does the following:


- calls an oc->configure method on every superclass of the object

- takes what's left of the input visitor and uses it to set properties

- then calls ucc->complete

So in the end the only new step is the first.  If the first two steps 
are bundled in a new function object_configure, they can be reused for 
devices, machines and accelerators.
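
A rough sketch of that ordering, assuming a hypothetical ObjectClass
'configure' hook; the hook and the helper below are illustration only,
not current QEMU code:

#include "qemu/osdep.h"
#include "qom/object.h"
#include "qom/object_interfaces.h"
#include "qapi/visitor.h"

/* Steps 1 + 2: run the (hypothetical) configure hook of each class in
 * the hierarchy, then apply whatever keys remain in the input visitor
 * as ordinary QOM properties, as object-add already does today. */
static void object_configure(Object *obj, Visitor *v, Error **errp)
{
    ObjectClass *oc;

    for (oc = object_get_class(obj); oc; oc = object_class_get_parent(oc)) {
        /* oc->configure(obj, v, errp);   hypothetical hook; a real
         * implementation would probably run parents before children */
    }
    /* ... set the remaining properties from the visitor ... */
}

/* Step 3: the usual user-creatable completion. */
static Object *user_creatable_add_sketch(const char *type, Visitor *v,
                                         Error **errp)
{
    Object *obj = object_new(type);

    object_configure(obj, v, errp);
    user_creatable_complete(USER_CREATABLE(obj), errp);
    return obj;
}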


The QAPI code generator can also easily wrap them into a C API for QOM 
object creation.


I glossed completely over the generation of properties within the QAPI 
code generator.  Making properties read-only (or, in the 
field-properties world, having a default allow_set of "return false") is 
already a nice improvement over



It would be much nicer to do the wrapper the other way round, i.e.
setting properties before the device is realized would update a
configuration struct and realize would then call .create() with that
struct. To me, this sounds much harder, though also a more useful state.


This sounds much harder.  However, based on the RngEgd sample, going 
from a basic QAPI struct to a full conversion is not too hard and makes 
code shorter.  So it's win-win even though it's human work.
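
For reference, the kind of plain config struct such a conversion starts
from is tiny; a sketch assuming rng-egd's single chardev property, with
an illustrative name:

/* Sketch: generated-style config struct for rng-egd (chardev only). */
typedef struct RngEgdProperties {
    char *chardev;   /* ID of the character device to read entropy from */
} RngEgdProperties;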


Paolo




[Bug 1906536] Re: Unable to set SVE VL to 1024 bits or above since 7b6a2198

2020-12-03 Thread Alex Bennée
Are there any other examples of where linux-user tries to preserve
execution environment details over an execve?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906536

Title:
  Unable to set SVE VL to 1024 bits or above since 7b6a2198

Status in QEMU:
  New

Bug description:
  Prior to 7b6a2198e71794c851f39ac7a92d39692c786820, the QEMU option
  sve-max-vq could be used to set the vector length of the
  implementation. This is useful (among other reasons) for testing
  software compiled with a fixed SVE vector length. Since this commit,
  the vector length is capped at 512 bits.

  To reproduce the issue:

  $ cat rdvl.s
  .global _start
  _start:
rdvl x0, #1
asr x0, x0, #4
mov x8, #93 // exit
svc #0
  $ aarch64-linux-gnu-as -march=armv8.2-a+sve rdvl.s -o rdvl.o
  $ aarch64-linux-gnu-ld rdvl.o
  $ for vl in 1 2 4 8 16; do ../build-qemu/aarch64-linux-user/qemu-aarch64 -cpu 
max,sve-max-vq=$vl a.out; echo $?; done
  1
  2
  4
  4
  4

  For a QEMU built prior to the above revision, we get the output:
  1
  2
  4
  8
  16

  as expected. It seems that either the old behavior should be restored,
  or there should be an option to force a higher vector length?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906536/+subscriptions



[PATCH 5/9] vnc: add pseudo encodings

2020-12-03 Thread Gerd Hoffmann
Add #defines for two new pseudo encodings:
 * cursor with alpha channel.
 * extended desktop resize.

Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ui/vnc.h b/ui/vnc.h
index a7fd38a82075..6f5006da3593 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -411,6 +411,8 @@ enum {
 #define VNC_ENCODING_AUDIO            0XFEFD /* -259 */
 #define VNC_ENCODING_TIGHT_PNG        0xFEFC /* -260 */
 #define VNC_ENCODING_LED_STATE        0XFEFB /* -261 */
+#define VNC_ENCODING_DESKTOP_RESIZE_EXT   0XFECC /* -308 */
+#define VNC_ENCODING_ALPHA_CURSOR 0XFEC6 /* -314 */
 #define VNC_ENCODING_WMVi 0x574D5669
 
 /*
-- 
2.27.0




[PATCH 4/9] vnc: drop unused copyrect feature

2020-12-03 Thread Gerd Hoffmann
vnc stopped using the copyrect pseudo encoding in 2017, in commit
50628d3479e4 ("cirrus/vnc: zap bitblit support from console code.")
So we can drop the now unused copyrect feature bit.

Signed-off-by: Gerd Hoffmann 
---
 ui/vnc.h | 2 --
 ui/vnc.c | 3 ---
 2 files changed, 5 deletions(-)

diff --git a/ui/vnc.h b/ui/vnc.h
index 262fcf179b44..a7fd38a82075 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -445,7 +445,6 @@ enum VncFeatures {
 VNC_FEATURE_WMVI,
 VNC_FEATURE_TIGHT,
 VNC_FEATURE_ZLIB,
-VNC_FEATURE_COPYRECT,
 VNC_FEATURE_RICH_CURSOR,
 VNC_FEATURE_TIGHT_PNG,
 VNC_FEATURE_ZRLE,
@@ -459,7 +458,6 @@ enum VncFeatures {
 #define VNC_FEATURE_WMVI_MASK        (1 << VNC_FEATURE_WMVI)
 #define VNC_FEATURE_TIGHT_MASK       (1 << VNC_FEATURE_TIGHT)
 #define VNC_FEATURE_ZLIB_MASK        (1 << VNC_FEATURE_ZLIB)
-#define VNC_FEATURE_COPYRECT_MASK    (1 << VNC_FEATURE_COPYRECT)
 #define VNC_FEATURE_RICH_CURSOR_MASK (1 << VNC_FEATURE_RICH_CURSOR)
 #define VNC_FEATURE_TIGHT_PNG_MASK   (1 << VNC_FEATURE_TIGHT_PNG)
 #define VNC_FEATURE_ZRLE_MASK        (1 << VNC_FEATURE_ZRLE)
diff --git a/ui/vnc.c b/ui/vnc.c
index 49235056f7a8..8c2771c1ce3b 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -2061,9 +2061,6 @@ static void set_encodings(VncState *vs, int32_t 
*encodings, size_t n_encodings)
 case VNC_ENCODING_RAW:
 vs->vnc_encoding = enc;
 break;
-case VNC_ENCODING_COPYRECT:
-vs->features |= VNC_FEATURE_COPYRECT_MASK;
-break;
 case VNC_ENCODING_HEXTILE:
 vs->features |= VNC_FEATURE_HEXTILE_MASK;
 vs->vnc_encoding = enc;
-- 
2.27.0




[PATCH 9/9] qxl: add ui_info callback

2020-12-03 Thread Gerd Hoffmann
This makes qxl respond to user interface window resizes
when not using spice, so it works with gtk and vnc too.

Signed-off-by: Gerd Hoffmann 
---
 hw/display/qxl.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 431c1070967a..e1df95c3e8a9 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -1177,8 +1177,35 @@ static const QXLInterface qxl_interface = {
 .client_monitors_config = interface_client_monitors_config,
 };
 
+static int qxl_ui_info(void *opaque, uint32_t idx, QemuUIInfo *info)
+{
+PCIQXLDevice *qxl = opaque;
+VDAgentMonitorsConfig *cfg;
+size_t size;
+
+if (using_spice) {
+/* spice agent will handle display resize */
+return -1;
+}
+if (idx > 0) {
+/* supporting only single head for now */
+return -1;
+}
+
+/* go fake a spice agent message */
+size = sizeof(VDAgentMonitorsConfig) + sizeof(VDAgentMonConfig);
+cfg = g_malloc0(size);
+cfg->num_of_monitors = 1;
+cfg->monitors[0].width = info->width;
+cfg->monitors[0].height = info->height;
+interface_client_monitors_config(&qxl->ssd.qxl, cfg);
+g_free(cfg);
+return 0;
+}
+
 static const GraphicHwOps qxl_ops = {
 .gfx_update  = qxl_hw_update,
+.ui_info = qxl_ui_info,
 .gfx_update_async = true,
 };
 
-- 
2.27.0




[PATCH 1/9] console: allow con==NULL in dpy_set_ui_info

2020-12-03 Thread Gerd Hoffmann
Use active_console in that case like we do in many other places.

Signed-off-by: Gerd Hoffmann 
---
 ui/console.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/ui/console.c b/ui/console.c
index 53dee8e26b17..16b326854080 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1556,7 +1556,9 @@ const QemuUIInfo *dpy_get_ui_info(const QemuConsole *con)
 
 int dpy_set_ui_info(QemuConsole *con, QemuUIInfo *info)
 {
-assert(con != NULL);
+if (con == NULL) {
+con = active_console;
+}
 
 if (!dpy_ui_info_supported(con)) {
 return -1;
-- 
2.27.0




Re: [PATCH v3 01/12] [testing] disable xhci msix

2020-12-03 Thread Gerd Hoffmann
On Thu, Dec 03, 2020 at 06:00:54AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 11:54:12AM +0100, Gerd Hoffmann wrote:
> > Signed-off-by: Gerd Hoffmann 
> 
> A bit more context on why you are doing this?

Scratch the patch, I just forgot to pass in --base so git publish skips
this one when mailing the series.

I'm using it to have a pcie device without msi support so I can check
whether pci link interrupts are working properly.

take care,
  Gerd




Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Michael S. Tsirkin
On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé wrote:
> > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. Berrangé 
> > > > > > >> wrote:
> > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. Tsirkin 
> > > > > > >> > wrote:
> > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan Quintela 
> > > > > > >> > > wrote:
> > > > > > >> > > > If we have a paused guest, it can't unplug the network VF 
> > > > > > >> > > > device, so
> > > > > > >> > > > we wait there forever.  Just change the code to give one 
> > > > > > >> > > > error on that
> > > > > > >> > > > case.
> > > > > > >> > > > 
> > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > >> > > 
> > > > > > >> > > It's certainly possible but it's management that created
> > > > > > >> > > this situation after all - why do we bother to enforce
> > > > > > >> > > a policy? It is possible that management will unpause 
> > > > > > >> > > immediately
> > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > >> > > 
> > > > > > >> > > Yes migration will not happen until guest is
> > > > > > >> > > unpaused but the same it true of e.g. a guest that is stuck
> > > > > > >> > > because of a bug.
> > > > > > >> > 
> > > > > > >> > That's pretty different behaviour from how migration normally 
> > > > > > >> > handles
> > > > > > >> > a paused guest, which is that it is guaranteed to complete the 
> > > > > > >> > migration
> > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > >> > 
> > > > > > >> > Just ignoring the situation I think will lead to surprise apps 
> > > > > > >> > / admins,
> > > > > > >> > because the person/entity invoking the migration is not likely 
> > > > > > >> > to have
> > > > > > >> > checked wether this particular guest uses net failover or not 
> > > > > > >> > before
> > > > > > >> > invoking - they'll just be expecting a paused migration to run 
> > > > > > >> > fast and
> > > > > > >> > be guaranteed to complete.
> > > > > > >> > 
> > > > > > >> > Regards,
> > > > > > >> > Daniel
> > > > > > >> 
> > > > > > >> Okay I guess. But then shouldn't we handle the reverse situation 
> > > > > > >> too:
> > > > > > >> pausing guest after migration started but before device was
> > > > > > >> unplugged?
> > > > > > >> 
> > > > > > >
> > > > > > > Thinking of which, I have no idea how we'd handle it - fail
> > > > > > > pausing guest until migration is cancelled?
> > > > > > >
> > > > > > > All this seems heavy handed to me ...
> > > > > > 
> > > > > > This is the minimal fix that I can think of.
> > > > > > 
> > > > > > Further solution would be:
> > > > > > - Add a new migration parameter: migrate-paused
> > > > > > - change libvirt to use the new parameter if it exist
> > > > > > - in qemu, when we do start migration (but after we wait for the 
> > > > > > unplug
> > > > > >   device) paused the guest before starting migration and resume it 
> > > > > > after
> > > > > >   migration finish.
> > > > > 
> > > > > It would also have to handle issuing of paused after migration has
> > > > > been started - delay the pause request until the nuplug is complete
> > > > > is one answer.
> > > > 
> > > > Hmm my worry would be that pausing is one way to give cpu
> > > > resources back to host. It's problematic if guest can delay
> > > > that indefinitely.
> > > 
> > > hmm, yes, that is awkward.  Perhaps we should just report an explicit
> > > error then.
> > 
> > Report an error in response to which command? Do you mean
> > fail migration?
> 
> If mgt attempt to pause an existing migration that hasn't finished
> the PCI unplug stage, then fail the pause request.

Pause guest not migration ...
Might be tricky ...

Let me ask this: why not just produce a warning
that migration won't finish until the guest actually runs?
The user will then know and unpause the guest when he wants
migration to succeed ...


For example, the user can restrict the amount of cpu
using cgroups to a level where almost no progress
is made. QEMU can't detect this.
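
Something along these lines, purely as a sketch; the
migration_needs_unplug_wait() helper is made up here and stands in for
"a failover primary is still waiting to be unplugged":

/* Hypothetical placement in qmp_migrate()/migrate_prepare(): warn
 * instead of failing when the guest is paused and the failover
 * primary has not been unplugged yet. */
if (!runstate_is_running() && migration_needs_unplug_wait()) {
    warn_report("guest is paused; migration will not complete until "
                "the guest runs and unplugs the failover device");
}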



> > 
> > > In normal cases this won't happen, as unplug will have
> > > easily completed before the mgmt app pauses the running migration.
> > > In broken/malicious guest cases, this at least ives mgmt a heads up
> > > that something is wrong and they might then decide to cancel the
> > > migration.
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvi

Re: [PATCH v2 2/4] hw/scsi/megasas: Assert cdb_len is valid in megasas_handle_scsi()

2020-12-03 Thread Li Qiang
Philippe Mathieu-Daudé  wrote on Wednesday, December 2, 2020 at 3:13 AM:
>
> cdb_len can not be zero... (or less than 6) here, else we have a
> out-of-bound read first in scsi_cdb_length():
>
>  71 int scsi_cdb_length(uint8_t *buf)
>  72 {
>  73 int cdb_len;
>  74
>  75 switch (buf[0] >> 5) {

Hi Philippe,

I haven't read the spec here. Just guessing from your patch, this 'buf[0]>>5'
indicates/is related to the cdb length, right?

So here (in this patch) you just want to ensure that 'buf[0]>>5' and
'cdb_len' are consistent.

But I don't think there is an 'out-of-bound' read issue here.


The 'buf' is from here 'cdb'.
static int megasas_handle_scsi(MegasasState *s, MegasasCmd *cmd,
   int frame_cmd)
{

cdb = cmd->frame->pass.cdb;

'cmd->frame->pass.cdb' is an array on the heap and its data is mmapped
from the guest.

The guest can put any data in 'pass.cdb', so 'buf[0]>>5' can be 0 or
less than 6.

So every read of 'pass.cdb'[0~15] is valid. Not an OOB.
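
For reference, a sketch of the conventional SCSI group-code to CDB-length
mapping that scsi_cdb_length() follows; this shows the general SCSI
convention and is not a verbatim copy of the QEMU function:

/* The group code is the top three bits of the opcode (buf[0] >> 5). */
static int cdb_group_length(uint8_t opcode)
{
    switch (opcode >> 5) {
    case 0:            /* 6-byte commands  */
        return 6;
    case 1:
    case 2:            /* 10-byte commands */
        return 10;
    case 4:            /* 16-byte commands */
        return 16;
    case 5:            /* 12-byte commands */
        return 12;
    default:           /* reserved / vendor specific */
        return -1;
    }
}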




>  76 case 0:
>  77 cdb_len = 6;
>  78 break;
>
> Then another out-of-bound read when the size returned by
> scsi_cdb_length() is used.

Where is this?

So I think your intention is to ensure  'cdb_len' is consistent with
'cdb[0]>>5'.

Please correct me if I'm wrong.

Thanks,
Li Qiang

>
> Figured out after reviewing:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg757937.html
>
> And reproduced fuzzing:
>
>   qemu-fuzz-i386: hw/scsi/megasas.c:1679: int 
> megasas_handle_scsi(MegasasState *, MegasasCmd *, int):
>   Assertion `len > 0 && cdb_len >= len' failed.
>   ==1689590== ERROR: libFuzzer: deadly signal
>   #8 0x7f7a5d918e75 in __assert_fail (/lib64/libc.so.6+0x34e75)
>   #9 0x55a1b95cf6d4 in megasas_handle_scsi hw/scsi/megasas.c:1679:5
>   #10 0x55a1b95cf6d4 in megasas_handle_frame hw/scsi/megasas.c:1975:24
>   #11 0x55a1b95cf6d4 in megasas_mmio_write hw/scsi/megasas.c:2132:9
>   #12 0x55a1b981972e in memory_region_write_accessor 
> softmmu/memory.c:491:5
>   #13 0x55a1b981972e in access_with_adjusted_size softmmu/memory.c:552:18
>   #14 0x55a1b981972e in memory_region_dispatch_write 
> softmmu/memory.c:1501:16
>   #15 0x55a1b97f0ab0 in flatview_write_continue softmmu/physmem.c:2759:23
>   #16 0x55a1b97ec3f2 in flatview_write softmmu/physmem.c:2799:14
>   #17 0x55a1b97ec3f2 in address_space_write softmmu/physmem.c:2891:18
>   #18 0x55a1b985c7cd in cpu_outw softmmu/ioport.c:70:5
>   #19 0x55a1b99577ac in qtest_process_command softmmu/qtest.c:481:13
>
> Inspired-by: Daniele Buono 
> Inspired-by: Alexander Bulekov 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/scsi/megasas.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
> index 1a5fc5857db..f5ad4425b5b 100644
> --- a/hw/scsi/megasas.c
> +++ b/hw/scsi/megasas.c
> @@ -1667,6 +1667,7 @@ static int megasas_handle_scsi(MegasasState *s, 
> MegasasCmd *cmd,
>  {
>  uint8_t *cdb;
>  int target_id, lun_id, cdb_len;
> +int len = -1;
>  bool is_write;
>  struct SCSIDevice *sdev = NULL;
>  bool is_logical = (frame_cmd == MFI_CMD_LD_SCSI_IO);
> @@ -1676,6 +1677,10 @@ static int megasas_handle_scsi(MegasasState *s, 
> MegasasCmd *cmd,
>  lun_id = cmd->frame->header.lun_id;
>  cdb_len = cmd->frame->header.cdb_len;
>
> +if (cdb_len > 0) {
> +len = scsi_cdb_length(cdb);
> +}
> +assert(len > 0 && cdb_len >= len);
>  if (is_logical) {
>  if (target_id >= MFI_MAX_LD || lun_id != 0) {
>  trace_megasas_scsi_target_not_present(
> --
> 2.26.2
>
>



Re: [PATCH 1/2] nvme: updated shared header for copy command

2020-12-03 Thread Stefan Hajnoczi
On Tue, Nov 24, 2020 at 08:14:17AM +0100, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Add new data structures and types for the Simple Copy command.
> 
> Signed-off-by: Klaus Jensen 
> Cc: Stefan Hajnoczi 
> Cc: Fam Zheng 
> ---
>  include/block/nvme.h | 45 ++--
>  1 file changed, 43 insertions(+), 2 deletions(-)

Acked-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 8/9] vnc: add support for extended desktop resize

2020-12-03 Thread Daniel P . Berrangé
On Thu, Dec 03, 2020 at 12:08:04PM +0100, Gerd Hoffmann wrote:
> The extended desktop resize encoding adds support for (a) clients
> sending resize requests to the server, and (b) multihead support.
> 
> This patch implements (a).  All resize requests are rejected by qemu.
> Qemu can't resize the framebuffer on its own, this is in the hands of
> the guest, so all qemu can do is forward the request to the guest.
> Should the guest actually resize the framebuffer we can notify the vnc
> client later with a separate message.
> 
> This requires support in the display device.  Works with virtio-gpu.
> 
> https://github.com/rfbproto/rfbproto/blob/master/rfbproto.rst#extendeddesktopsize-pseudo-encoding
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  ui/vnc.h |  2 ++
>  ui/vnc.c | 64 +++-
>  2 files changed, 65 insertions(+), 1 deletion(-)
> 
> diff --git a/ui/vnc.h b/ui/vnc.h
> index c8d3ad9ec496..77a310947bd6 100644
> --- a/ui/vnc.h
> +++ b/ui/vnc.h
> @@ -442,6 +442,7 @@ enum {
>  
>  enum VncFeatures {
>  VNC_FEATURE_RESIZE,
> +VNC_FEATURE_RESIZE_EXT,
>  VNC_FEATURE_HEXTILE,
>  VNC_FEATURE_POINTER_TYPE_CHANGE,
>  VNC_FEATURE_WMVI,
> @@ -456,6 +457,7 @@ enum VncFeatures {
>  };
>  
>  #define VNC_FEATURE_RESIZE_MASK  (1 << VNC_FEATURE_RESIZE)
> +#define VNC_FEATURE_RESIZE_EXT_MASK  (1 << VNC_FEATURE_RESIZE_EXT)
>  #define VNC_FEATURE_HEXTILE_MASK (1 << VNC_FEATURE_HEXTILE)
>  #define VNC_FEATURE_POINTER_TYPE_CHANGE_MASK (1 << 
> VNC_FEATURE_POINTER_TYPE_CHANGE)
>  #define VNC_FEATURE_WMVI_MASK(1 << VNC_FEATURE_WMVI)
> diff --git a/ui/vnc.c b/ui/vnc.c
> index bdaf384f71a4..a15132faa96f 100644
> --- a/ui/vnc.c
> +++ b/ui/vnc.c
> @@ -663,10 +663,35 @@ void vnc_framebuffer_update(VncState *vs, int x, int y, 
> int w, int h,
>  vnc_write_s32(vs, encoding);
>  }
>  
> +static void vnc_desktop_resize_ext(VncState *vs, bool reject)
> +{
> +vnc_lock_output(vs);
> +vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
> +vnc_write_u8(vs, 0);
> +vnc_write_u16(vs, 1); /* number of rects */
> +vnc_framebuffer_update(vs,
> +   reject ? 1 : 0,
> +   reject ? 3 : 0,
> +   vs->client_width, vs->client_height,
> +   VNC_ENCODING_DESKTOP_RESIZE_EXT);
> +vnc_write_u8(vs, 1);  /* number of screens */
> +vnc_write_u8(vs, 0);  /* padding */
> +vnc_write_u8(vs, 0);  /* padding */
> +vnc_write_u8(vs, 0);  /* padding */
> +vnc_write_u32(vs, 0); /* screen id */
> +vnc_write_u16(vs, 0); /* screen x-pos */
> +vnc_write_u16(vs, 0); /* screen y-pos */
> +vnc_write_u16(vs, vs->client_width);
> +vnc_write_u16(vs, vs->client_height);
> +vnc_write_u32(vs, 0); /* screen flags */
> +vnc_unlock_output(vs);
> +vnc_flush(vs);
> +}
>  
>  static void vnc_desktop_resize(VncState *vs, bool force)
>  {
> -if (vs->ioc == NULL || !vnc_has_feature(vs, VNC_FEATURE_RESIZE)) {
> +if (vs->ioc == NULL || (!vnc_has_feature(vs, VNC_FEATURE_RESIZE) &&
> +!vnc_has_feature(vs, VNC_FEATURE_RESIZE_EXT))) {
>  return;
>  }
>  if (vs->client_width == pixman_image_get_width(vs->vd->server) &&
> @@ -681,6 +706,12 @@ static void vnc_desktop_resize(VncState *vs, bool force)
> pixman_image_get_height(vs->vd->server) >= 0);
>  vs->client_width = pixman_image_get_width(vs->vd->server);
>  vs->client_height = pixman_image_get_height(vs->vd->server);
> +
> +if (vnc_has_feature(vs, VNC_FEATURE_RESIZE_EXT)) {
> +vnc_desktop_resize_ext(vs, false);
> +return;
> +}
> +
>  vnc_lock_output(vs);
>  vnc_write_u8(vs, VNC_MSG_SERVER_FRAMEBUFFER_UPDATE);
>  vnc_write_u8(vs, 0);
> @@ -2110,6 +2141,9 @@ static void set_encodings(VncState *vs, int32_t 
> *encodings, size_t n_encodings)
>  case VNC_ENCODING_DESKTOPRESIZE:
>  vs->features |= VNC_FEATURE_RESIZE_MASK;
>  break;
> +case VNC_ENCODING_DESKTOP_RESIZE_EXT:
> +vs->features |= VNC_FEATURE_RESIZE_EXT_MASK;

IIUC, we shouldn't set this flag unless all current display adapters
associated with the VNC server support the "ui_info" callbacks;
otherwise the client will think it can send resize requests
but they'll never be honoured.
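
A minimal sketch of the kind of guard being suggested, assuming
dpy_ui_info_supported() is the right predicate to consult; placement is
illustrative only:

case VNC_ENCODING_DESKTOP_RESIZE_EXT:
    /* only advertise the feature if the display device can actually
     * act on resize requests */
    if (dpy_ui_info_supported(vs->vd->dcl.con)) {
        vs->features |= VNC_FEATURE_RESIZE_EXT_MASK;
    }
    break;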

> +break;
>  case VNC_ENCODING_POINTER_TYPE_CHANGE:
>  vs->features |= VNC_FEATURE_POINTER_TYPE_CHANGE_MASK;
>  break;
> @@ -2431,6 +2465,34 @@ static int protocol_client_msg(VncState *vs, uint8_t 
> *data, size_t len)
>  break;
>  }
>  break;
> +case VNC_MSG_CLIENT_SET_DESKTOP_SIZE:
> +{
> +size_t size;
> +uint8_t screens;
> +uint16_t width;
> +uint16_t height;
> +QemuUIInfo info;
> +
> +if (len < 8) {
> +return 8;
> +}
> +
> +

[Bug 1906536] Re: Unable to set SVE VL to 1024 bits or above since 7b6a2198

2020-12-03 Thread Peter Maydell
I don't think we currently do have anything else we try to preserve over
execve-to-maybe-another-QEMU-emulated-process, no. The other approach of
"provide a config knob equivalent to
/proc/sys/abi/sve_default_vector_length" is probably simpler.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906536

Title:
  Unable to set SVE VL to 1024 bits or above since 7b6a2198

Status in QEMU:
  New

Bug description:
  Prior to 7b6a2198e71794c851f39ac7a92d39692c786820, the QEMU option
  sve-max-vq could be used to set the vector length of the
  implementation. This is useful (among other reasons) for testing
  software compiled with a fixed SVE vector length. Since this commit,
  the vector length is capped at 512 bits.

  To reproduce the issue:

  $ cat rdvl.s
  .global _start
  _start:
rdvl x0, #1
asr x0, x0, #4
mov x8, #93 // exit
svc #0
  $ aarch64-linux-gnu-as -march=armv8.2-a+sve rdvl.s -o rdvl.o
  $ aarch64-linux-gnu-ld rdvl.o
  $ for vl in 1 2 4 8 16; do ../build-qemu/aarch64-linux-user/qemu-aarch64 -cpu 
max,sve-max-vq=$vl a.out; echo $?; done
  1
  2
  4
  4
  4

  For a QEMU built prior to the above revision, we get the output:
  1
  2
  4
  8
  16

  as expected. It seems that either the old behavior should be restored,
  or there should be an option to force a higher vector length?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906536/+subscriptions



Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Daniel P . Berrangé
On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé wrote:
> > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > >> Berrangé wrote:
> > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. Tsirkin 
> > > > > > > >> > wrote:
> > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan Quintela 
> > > > > > > >> > > wrote:
> > > > > > > >> > > > If we have a paused guest, it can't unplug the network 
> > > > > > > >> > > > VF device, so
> > > > > > > >> > > > we wait there forever.  Just change the code to give one 
> > > > > > > >> > > > error on that
> > > > > > > >> > > > case.
> > > > > > > >> > > > 
> > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > >> > > 
> > > > > > > >> > > It's certainly possible but it's management that created
> > > > > > > >> > > this situation after all - why do we bother to enforce
> > > > > > > >> > > a policy? It is possible that management will unpause 
> > > > > > > >> > > immediately
> > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > >> > > 
> > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > >> > > unpaused but the same it true of e.g. a guest that is stuck
> > > > > > > >> > > because of a bug.
> > > > > > > >> > 
> > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > >> > normally handles
> > > > > > > >> > a paused guest, which is that it is guaranteed to complete 
> > > > > > > >> > the migration
> > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > >> > 
> > > > > > > >> > Just ignoring the situation I think will lead to surprise 
> > > > > > > >> > apps / admins,
> > > > > > > >> > because the person/entity invoking the migration is not 
> > > > > > > >> > likely to have
> > > > > > > >> > checked wether this particular guest uses net failover or 
> > > > > > > >> > not before
> > > > > > > >> > invoking - they'll just be expecting a paused migration to 
> > > > > > > >> > run fast and
> > > > > > > >> > be guaranteed to complete.
> > > > > > > >> > 
> > > > > > > >> > Regards,
> > > > > > > >> > Daniel
> > > > > > > >> 
> > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > >> situation too:
> > > > > > > >> pausing guest after migration started but before device was
> > > > > > > >> unplugged?
> > > > > > > >> 
> > > > > > > >
> > > > > > > > Thinking of which, I have no idea how we'd handle it - fail
> > > > > > > > pausing guest until migration is cancelled?
> > > > > > > >
> > > > > > > > All this seems heavy handed to me ...
> > > > > > > 
> > > > > > > This is the minimal fix that I can think of.
> > > > > > > 
> > > > > > > Further solution would be:
> > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > - in qemu, when we do start migration (but after we wait for the 
> > > > > > > unplug
> > > > > > >   device) paused the guest before starting migration and resume 
> > > > > > > it after
> > > > > > >   migration finish.
> > > > > > 
> > > > > > It would also have to handle issuing of paused after migration has
> > > > > > been started - delay the pause request until the nuplug is complete
> > > > > > is one answer.
> > > > > 
> > > > > Hmm my worry would be that pausing is one way to give cpu
> > > > > resources back to host. It's problematic if guest can delay
> > > > > that indefinitely.
> > > > 
> > > > hmm, yes, that is awkward.  Perhaps we should just report an explicit
> > > > error then.
> > > 
> > > Report an error in response to which command? Do you mean
> > > fail migration?
> > 
> > If mgt attempt to pause an existing migration that hasn't finished
> > the PCI unplug stage, then fail the pause request.
> 
> Pause guest not migration ...
> Might be tricky ...
> 
> Let me ask this, why not just produce a warning
> that migration wan't finish until guest actually runs?
> User will then know and unpause the guest when he wants
> migration to succeed ...

A warning is going to be essentially invisible if the pause command
succeeds.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https:/

Re: [PATCH v2 2/4] hw/scsi/megasas: Assert cdb_len is valid in megasas_handle_scsi()

2020-12-03 Thread Philippe Mathieu-Daudé
Hi Li,

On 12/3/20 12:21 PM, Li Qiang wrote:
> Philippe Mathieu-Daudé  wrote on Wednesday, December 2, 2020 at 3:13 AM:
>>
>> cdb_len can not be zero... (or less than 6) here, else we have a
>> out-of-bound read first in scsi_cdb_length():
>>
>>  71 int scsi_cdb_length(uint8_t *buf)
>>  72 {
>>  73 int cdb_len;
>>  74
>>  75 switch (buf[0] >> 5) {
> 
> Hi Philippe,
> 
> Here I not read the spec.

Neither did I...

> Just guest from your patch, this 'buf[0]>>5'
> indicates/related with the cdb length, right?

This is my understanding too.

> So here(this patch) you  just want to ensure the 'buf[0]>>5' and the
> 'cdb_len' is consistent.
> 
> But I don't  think here is a 'out-of-bound' read issue.
> 
> 
> The 'buf' is from here 'cdb'.
> static int megasas_handle_scsi(MegasasState *s, MegasasCmd *cmd,
>int frame_cmd)
> {
> 
> cdb = cmd->frame->pass.cdb;
> 
> 'cmd->frame->pass.cdb' is an array in heap and  its data is mmaped
> from the guest.
> 
> The guest can put any data in 'pass.cdb' which 'buf[0]>>5' can be 0 or
> less than 6.
> 
> So every read of this 'pass.cdb'[0~15] is valid. Not an oob.

OK maybe not OOB but infoleak?

>>  76 case 0:
>>  77 cdb_len = 6;
>>  78 break;
>>
>> Then another out-of-bound read when the size returned by
>> scsi_cdb_length() is used.
> 
> Where is this?

IIRC scsi_req_parse_cdb().

> 
> So I think your intention is to ensure  'cdb_len' is consistent with
> 'cdb[0]>>5'.
> 
> Please correct me if I'm wrong.

I'll recheck and get back to you during January.

Regards,

Phil.

>>
>> Figured out after reviewing:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg757937.html
>>
>> And reproduced fuzzing:
>>
>>   qemu-fuzz-i386: hw/scsi/megasas.c:1679: int 
>> megasas_handle_scsi(MegasasState *, MegasasCmd *, int):
>>   Assertion `len > 0 && cdb_len >= len' failed.
>>   ==1689590== ERROR: libFuzzer: deadly signal
>>   #8 0x7f7a5d918e75 in __assert_fail (/lib64/libc.so.6+0x34e75)
>>   #9 0x55a1b95cf6d4 in megasas_handle_scsi hw/scsi/megasas.c:1679:5
>>   #10 0x55a1b95cf6d4 in megasas_handle_frame hw/scsi/megasas.c:1975:24
>>   #11 0x55a1b95cf6d4 in megasas_mmio_write hw/scsi/megasas.c:2132:9
>>   #12 0x55a1b981972e in memory_region_write_accessor 
>> softmmu/memory.c:491:5
>>   #13 0x55a1b981972e in access_with_adjusted_size softmmu/memory.c:552:18
>>   #14 0x55a1b981972e in memory_region_dispatch_write 
>> softmmu/memory.c:1501:16
>>   #15 0x55a1b97f0ab0 in flatview_write_continue softmmu/physmem.c:2759:23
>>   #16 0x55a1b97ec3f2 in flatview_write softmmu/physmem.c:2799:14
>>   #17 0x55a1b97ec3f2 in address_space_write softmmu/physmem.c:2891:18
>>   #18 0x55a1b985c7cd in cpu_outw softmmu/ioport.c:70:5
>>   #19 0x55a1b99577ac in qtest_process_command softmmu/qtest.c:481:13
>>
>> Inspired-by: Daniele Buono 
>> Inspired-by: Alexander Bulekov 
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/scsi/megasas.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
>> index 1a5fc5857db..f5ad4425b5b 100644
>> --- a/hw/scsi/megasas.c
>> +++ b/hw/scsi/megasas.c
>> @@ -1667,6 +1667,7 @@ static int megasas_handle_scsi(MegasasState *s, 
>> MegasasCmd *cmd,
>>  {
>>  uint8_t *cdb;
>>  int target_id, lun_id, cdb_len;
>> +int len = -1;
>>  bool is_write;
>>  struct SCSIDevice *sdev = NULL;
>>  bool is_logical = (frame_cmd == MFI_CMD_LD_SCSI_IO);
>> @@ -1676,6 +1677,10 @@ static int megasas_handle_scsi(MegasasState *s, 
>> MegasasCmd *cmd,
>>  lun_id = cmd->frame->header.lun_id;
>>  cdb_len = cmd->frame->header.cdb_len;
>>
>> +if (cdb_len > 0) {
>> +len = scsi_cdb_length(cdb);
>> +}
>> +assert(len > 0 && cdb_len >= len);
>>  if (is_logical) {
>>  if (target_id >= MFI_MAX_LD || lun_id != 0) {
>>  trace_megasas_scsi_target_not_present(
>> --
>> 2.26.2
>>
>>
> 




Re: [PATCH v2 2/4] contrib/vhost-user-gpu: avoid g_return_val_if() input validation

2020-12-03 Thread Stefan Hajnoczi
On Thu, Dec 03, 2020 at 02:26:08PM +0400, Marc-André Lureau wrote:
> On Thu, Dec 3, 2020 at 1:52 PM Stefan Hajnoczi  wrote:
> 
> > On Wed, Dec 02, 2020 at 07:50:51PM +0400, Marc-André Lureau wrote:
> > > On Wed, Dec 2, 2020 at 7:27 PM Stefan Hajnoczi 
> > wrote:
> > >
> > > > Do not validate input with g_return_val_if(). This API is intended for
> > > > checking programming errors and is compiled out with
> > -DG_DISABLE_CHECKS.
> > > >
> > > > Use an explicit if statement for input validation so it cannot
> > > > accidentally be compiled out.
> > > >
> > > > Suggested-by: Markus Armbruster 
> > > > Signed-off-by: Stefan Hajnoczi 
> > > > ---
> > > >  contrib/vhost-user-gpu/vhost-user-gpu.c | 6 +-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > > b/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > > index a019d0a9ac..534bad24d1 100644
> > > > --- a/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > > +++ b/contrib/vhost-user-gpu/vhost-user-gpu.c
> > > > @@ -1044,7 +1044,11 @@ vg_get_config(VuDev *dev, uint8_t *config,
> > uint32_t
> > > > len)
> > > >  {
> > > >  VuGpu *g = container_of(dev, VuGpu, dev.parent);
> > > >
> > > > -g_return_val_if_fail(len <= sizeof(struct virtio_gpu_config), -1);
> > > > +if (len > sizeof(struct virtio_gpu_config)) {
> > > > +g_critical("%s: len %u is larger than %zu",
> > > > +   __func__, len, sizeof(struct virtio_gpu_config));
> > > >
> > >
> > > g_critical() already has __FILE__ __LINE__ and G_STRFUNC.
> >
> > I did this for consistency with the logging in this source file. The
> > other g_critical() calls in the file also print __func__.
> >
> >
> >
> I see, nevermind then. I gave rb anyway

Thanks! I checked now and don't see the function name printed by
g_critical() even though G_STRFUNC is captured in the header file:

** (process:693258): CRITICAL **: 11:30:18.210: test

Maybe only custom log handlers can display the function name?

So it seems the explicit __func__ approach is okay-ish :).
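
For what it's worth, a small sketch of how the captured function name can
be surfaced with GLib's structured logging and a custom writer; this
assumes GLib >= 2.50 and G_LOG_USE_STRUCTURED, and is illustration only:

#define G_LOG_USE_STRUCTURED
#include <glib.h>

/* Custom writer: print the CODE_FUNC field that g_critical() records. */
static GLogWriterOutput
func_writer(GLogLevelFlags level, const GLogField *fields,
            gsize n_fields, gpointer user_data)
{
    const char *msg = NULL, *func = NULL;
    gsize i;

    for (i = 0; i < n_fields; i++) {
        if (g_strcmp0(fields[i].key, "MESSAGE") == 0) {
            msg = fields[i].value;
        } else if (g_strcmp0(fields[i].key, "CODE_FUNC") == 0) {
            func = fields[i].value;
        }
    }
    g_printerr("%s: %s\n", func ? func : "?", msg ? msg : "");
    return G_LOG_WRITER_HANDLED;
}

int main(void)
{
    g_log_set_writer_func(func_writer, NULL, NULL);
    g_critical("test");   /* prints "main: test" */
    return 0;
}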

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 05/11] target/arm: Enforce alignment for SRS

2020-12-03 Thread Peter Maydell
On Wed, 25 Nov 2020 at 04:09, Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 
> ---
>  target/arm/translate.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 4406f6a67c..b1f43bfb8f 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -5124,11 +5124,13 @@ static void gen_srs(DisasContext *s,
>  }
>  tcg_gen_addi_i32(addr, addr, offset);
>  tmp = load_reg(s, 14);
> -gen_aa32_st32(s, tmp, addr, get_mem_index(s));
> +gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
> +MO_UL | MO_ALIGN | s->be_data);
>  tcg_temp_free_i32(tmp);
>  tmp = load_cpu_field(spsr);
>  tcg_gen_addi_i32(addr, addr, 4);
> -gen_aa32_st32(s, tmp, addr, get_mem_index(s));
> +gen_aa32_st_i32(s, tmp, addr, get_mem_index(s),
> +MO_UL | MO_ALIGN | s->be_data);

Having just come back to look at this as a result of reading
a review comment from you on the v8.1M series, it's a bit
unfortunate that we now have to remember to factor in s->be_data
in every memory access. Previously gen_aa32_st32() got this
right for us automatically, as well as being able to provide
the right sized MO_UL or whatever part... Can we make the
new API a bit less awkward ? (I suspect we're eventually
going to want to be able to pass an enum for "always OK
unaligned", "never OK unaligned", or "OK unaligned only
if SCTLR.A is 0", for that matter.)
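
Something like the following, as a sketch of the enum idea; the names are
made up and not existing translate.c API:

/* Hypothetical alignment-policy argument for the gen_aa32_* helpers. */
typedef enum MemAlignCheck {
    MEM_ALIGN_NONE,     /* unaligned access always OK        */
    MEM_ALIGN_ALWAYS,   /* unaligned access never OK         */
    MEM_ALIGN_SCTLR_A,  /* unaligned OK only if SCTLR.A == 0 */
} MemAlignCheck;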

thanks
-- PMM



Re: [PATCH] docs: set CONFDIR when running sphinx

2020-12-03 Thread Paolo Bonzini

On 03/12/20 11:28, Marc-André Lureau wrote:
Note that the original bug that prompted this fix is about qemu-ga 
configuration though.


Paolo, please queue the patch. thanks!


Done.

Paolo




Re: [PATCH v2 09/28] target/arm: Implement VLDR/VSTR system register

2020-12-03 Thread Peter Maydell
On Tue, 1 Dec 2020 at 13:11, Richard Henderson
 wrote:
>
> On 11/19/20 3:55 PM, Peter Maydell wrote:
> > +gen_aa32_st32(s, value, addr, get_mem_index(s));
>
> This is MemA, so should use
>
>   gen_aa32_st_i32(s, value, addr, get_mem_index(s),
>   MO_UL | MO_ALIGN);
>
> a-la my patch set from last week fixing other instances.

Also " | s->be_data", right ?

thanks
-- PMM



Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé wrote:
> > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. Tsirkin 
> > > > > > > > > wrote:
> > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > >> Berrangé wrote:
> > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > >> > Tsirkin wrote:
> > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan Quintela 
> > > > > > > > >> > > wrote:
> > > > > > > > >> > > > If we have a paused guest, it can't unplug the network 
> > > > > > > > >> > > > VF device, so
> > > > > > > > >> > > > we wait there forever.  Just change the code to give 
> > > > > > > > >> > > > one error on that
> > > > > > > > >> > > > case.
> > > > > > > > >> > > > 
> > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > >> > > 
> > > > > > > > >> > > It's certainly possible but it's management that created
> > > > > > > > >> > > this situation after all - why do we bother to enforce
> > > > > > > > >> > > a policy? It is possible that management will unpause 
> > > > > > > > >> > > immediately
> > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > >> > > 
> > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > >> > > unpaused but the same it true of e.g. a guest that is 
> > > > > > > > >> > > stuck
> > > > > > > > >> > > because of a bug.
> > > > > > > > >> > 
> > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > >> > normally handles
> > > > > > > > >> > a paused guest, which is that it is guaranteed to complete 
> > > > > > > > >> > the migration
> > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > >> > 
> > > > > > > > >> > Just ignoring the situation I think will lead to surprise 
> > > > > > > > >> > apps / admins,
> > > > > > > > >> > because the person/entity invoking the migration is not 
> > > > > > > > >> > likely to have
> > > > > > > > >> > checked wether this particular guest uses net failover or 
> > > > > > > > >> > not before
> > > > > > > > >> > invoking - they'll just be expecting a paused migration to 
> > > > > > > > >> > run fast and
> > > > > > > > >> > be guaranteed to complete.
> > > > > > > > >> > 
> > > > > > > > >> > Regards,
> > > > > > > > >> > Daniel
> > > > > > > > >> 
> > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > >> situation too:
> > > > > > > > >> pausing guest after migration started but before device was
> > > > > > > > >> unplugged?
> > > > > > > > >> 
> > > > > > > > >
> > > > > > > > > Thinking of which, I have no idea how we'd handle it - fail
> > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > >
> > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > 
> > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > 
> > > > > > > > Further solution would be:
> > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > - in qemu, when we do start migration (but after we wait for 
> > > > > > > > the unplug
> > > > > > > >   device) paused the guest before starting migration and resume 
> > > > > > > > it after
> > > > > > > >   migration finish.
> > > > > > > 
> > > > > > > It would also have to handle issuing of paused after migration has
> > > > > > > been started - delay the pause request until the nuplug is 
> > > > > > > complete
> > > > > > > is one answer.
> > > > > > 
> > > > > > Hmm my worry would be that pausing is one way to give cpu
> > > > > > resources back to host. It's problematic if guest can delay
> > > > > > that indefinitely.
> > > > > 
> > > > > hmm, yes, that is awkward.  Perhaps we should just report an explicit
> > > > > error then.
> > > > 
> > > > Report an error in response to which command? Do you mean
> > > > fail migration?
> > > 
> > > If mgt attempt to pause an existing migration that hasn't finished
> > > the PCI unplug stage, then fail the pause request.
> > 
> > Pause guest not migration ...
> > Might be tricky ...
> > 
> > Let me ask this, why not just produce a warning
> > that migration wan't finish until guest actually runs?
> > User will then know and unpause the guest when he wants
> > migration to succeed ...
>

Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> > On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé 
> > > > > > > wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. 
> > > > > > > > > > Tsirkin wrote:
> > > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > > >> Berrangé wrote:
> > > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > > >> > Tsirkin wrote:
> > > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan 
> > > > > > > > > >> > > Quintela wrote:
> > > > > > > > > >> > > > If we have a paused guest, it can't unplug the 
> > > > > > > > > >> > > > network VF device, so
> > > > > > > > > >> > > > we wait there forever.  Just change the code to give 
> > > > > > > > > >> > > > one error on that
> > > > > > > > > >> > > > case.
> > > > > > > > > >> > > > 
> > > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > > >> > > 
> > > > > > > > > >> > > It's certainly possible but it's management that 
> > > > > > > > > >> > > created
> > > > > > > > > >> > > this situation after all - why do we bother to enforce
> > > > > > > > > >> > > a policy? It is possible that management will unpause 
> > > > > > > > > >> > > immediately
> > > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > > >> > > 
> > > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > > >> > > unpaused but the same it true of e.g. a guest that is 
> > > > > > > > > >> > > stuck
> > > > > > > > > >> > > because of a bug.
> > > > > > > > > >> > 
> > > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > > >> > normally handles
> > > > > > > > > >> > a paused guest, which is that it is guaranteed to 
> > > > > > > > > >> > complete the migration
> > > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > > >> > 
> > > > > > > > > >> > Just ignoring the situation I think will lead to 
> > > > > > > > > >> > surprise apps / admins,
> > > > > > > > > >> > because the person/entity invoking the migration is not 
> > > > > > > > > >> > likely to have
> > > > > > > > > >> > checked wether this particular guest uses net failover 
> > > > > > > > > >> > or not before
> > > > > > > > > >> > invoking - they'll just be expecting a paused migration 
> > > > > > > > > >> > to run fast and
> > > > > > > > > >> > be guaranteed to complete.
> > > > > > > > > >> > 
> > > > > > > > > >> > Regards,
> > > > > > > > > >> > Daniel
> > > > > > > > > >> 
> > > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > > >> situation too:
> > > > > > > > > >> pausing guest after migration started but before device was
> > > > > > > > > >> unplugged?
> > > > > > > > > >> 
> > > > > > > > > >
> > > > > > > > > > Thinking of which, I have no idea how we'd handle it - fail
> > > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > > >
> > > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > > 
> > > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > > 
> > > > > > > > > Further solution would be:
> > > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > > - in qemu, when we do start migration (but after we wait for 
> > > > > > > > > the unplug
> > > > > > > > >   device) paused the guest before starting migration and 
> > > > > > > > > resume it after
> > > > > > > > >   migration finish.
> > > > > > > > 
> > > > > > > > It would also have to handle issuing of paused after migration 
> > > > > > > > has
> > > > > > > > been started - delay the pause request until the nuplug is 
> > > > > > > > complete
> > > > > > > > is one answer.
> > > > > > > 
> > > > > > > Hmm my worry would be that pausing is one way to give cpu
> > > > > > > resources back to host. It's problematic if guest can delay
> > > > > > > that indefinitely.
> > > > > > 
> > > > > > hmm, yes, that is awkward.  Perhaps we should just report an 
> > > > > > explicit
> > > > > > error then.
> > > > > 
> > > > > Report an error in response to which command? Do you mean
> > > > > fail migration?
> > > > 
> > > > If mgt attempt to pause an existing migration that hasn't finished
> > > > the PCI unplug 

Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Daniel P . Berrangé
On Thu, Dec 03, 2020 at 06:40:11AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> > On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé 
> > > > > > > wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. 
> > > > > > > > > > Tsirkin wrote:
> > > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > > >> Berrangé wrote:
> > > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > > >> > Tsirkin wrote:
> > > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan 
> > > > > > > > > >> > > Quintela wrote:
> > > > > > > > > >> > > > If we have a paused guest, it can't unplug the 
> > > > > > > > > >> > > > network VF device, so
> > > > > > > > > >> > > > we wait there forever.  Just change the code to give 
> > > > > > > > > >> > > > one error on that
> > > > > > > > > >> > > > case.
> > > > > > > > > >> > > > 
> > > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > > >> > > 
> > > > > > > > > >> > > It's certainly possible but it's management that 
> > > > > > > > > >> > > created
> > > > > > > > > >> > > this situation after all - why do we bother to enforce
> > > > > > > > > >> > > a policy? It is possible that management will unpause 
> > > > > > > > > >> > > immediately
> > > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > > >> > > 
> > > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > > >> > > unpaused but the same it true of e.g. a guest that is 
> > > > > > > > > >> > > stuck
> > > > > > > > > >> > > because of a bug.
> > > > > > > > > >> > 
> > > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > > >> > normally handles
> > > > > > > > > >> > a paused guest, which is that it is guaranteed to 
> > > > > > > > > >> > complete the migration
> > > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > > >> > 
> > > > > > > > > >> > Just ignoring the situation I think will lead to 
> > > > > > > > > >> > surprise apps / admins,
> > > > > > > > > >> > because the person/entity invoking the migration is not 
> > > > > > > > > >> > likely to have
> > > > > > > > > >> > checked wether this particular guest uses net failover 
> > > > > > > > > >> > or not before
> > > > > > > > > >> > invoking - they'll just be expecting a paused migration 
> > > > > > > > > >> > to run fast and
> > > > > > > > > >> > be guaranteed to complete.
> > > > > > > > > >> > 
> > > > > > > > > >> > Regards,
> > > > > > > > > >> > Daniel
> > > > > > > > > >> 
> > > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > > >> situation too:
> > > > > > > > > >> pausing guest after migration started but before device was
> > > > > > > > > >> unplugged?
> > > > > > > > > >> 
> > > > > > > > > >
> > > > > > > > > > Thinking of which, I have no idea how we'd handle it - fail
> > > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > > >
> > > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > > 
> > > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > > 
> > > > > > > > > Further solution would be:
> > > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > > - in qemu, when we do start migration (but after we wait for 
> > > > > > > > > the unplug
> > > > > > > > >   device) paused the guest before starting migration and 
> > > > > > > > > resume it after
> > > > > > > > >   migration finish.
> > > > > > > > 
> > > > > > > > It would also have to handle issuing of paused after migration 
> > > > > > > > has
> > > > > > > > been started - delay the pause request until the nuplug is 
> > > > > > > > complete
> > > > > > > > is one answer.
> > > > > > > 
> > > > > > > Hmm my worry would be that pausing is one way to give cpu
> > > > > > > resources back to host. It's problematic if guest can delay
> > > > > > > that indefinitely.
> > > > > > 
> > > > > > hmm, yes, that is awkward.  Perhaps we should just report an 
> > > > > > explicit
> > > > > > error then.
> > > > > 
> > > > > Report an error in response to which command? Do you mean
> > > > > fail migration?
> > > > 
> > > > If mgt attempt to pause an existing migration that hasn't finished
>

Re: [PULL 0/6] pc,vhost: fixes

2020-12-03 Thread Peter Maydell
On Thu, 3 Dec 2020 at 11:02, Peter Maydell  wrote:
>
> On Thu, 3 Dec 2020 at 10:59, Michael S. Tsirkin  wrote:
> >
> > On Thu, Dec 03, 2020 at 10:20:03AM +, Peter Maydell wrote:
> > > On Wed, 2 Dec 2020 at 11:03, Michael S. Tsirkin  wrote:
> > > >
> > > > Patch 5 gave me pause but we do need patch 6 as
> > > > it's a guest triggerable assert, and it seemed
> > > > cleaner to just take the whole patchset than cherry-pick.
> > >
> > > Is this only "fixes a guest triggerable assert"?
> >
> > My understanding is that without the patches a rhel7 guest triggers
> > the assert if started with vtd enabled and virtio-net with
> > iommu_platform=on.
> >
> > https://bugs.launchpad.net/qemu/+bug/1885175
>
> Bug reported in June ? Is this a regression since 5.1, or
> was it this way in 5.1 as well?

MST confirmed on IRC that this isn't a regression since 5.1,
so we'll fix it for 6.0.

thanks
-- PMM



Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 11:45:12AM +, Daniel P. Berrangé wrote:
> On Thu, Dec 03, 2020 at 06:40:11AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> > > On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé 
> > > > > > > > wrote:
> > > > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. 
> > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > > > >> Berrangé wrote:
> > > > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > > > >> > Tsirkin wrote:
> > > > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan 
> > > > > > > > > > >> > > Quintela wrote:
> > > > > > > > > > >> > > > If we have a paused guest, it can't unplug the 
> > > > > > > > > > >> > > > network VF device, so
> > > > > > > > > > >> > > > we wait there forever.  Just change the code to 
> > > > > > > > > > >> > > > give one error on that
> > > > > > > > > > >> > > > case.
> > > > > > > > > > >> > > > 
> > > > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > > > >> > > 
> > > > > > > > > > >> > > It's certainly possible but it's management that 
> > > > > > > > > > >> > > created
> > > > > > > > > > >> > > this situation after all - why do we bother to 
> > > > > > > > > > >> > > enforce
> > > > > > > > > > >> > > a policy? It is possible that management will 
> > > > > > > > > > >> > > unpause immediately
> > > > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > > > >> > > 
> > > > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > > > >> > > unpaused but the same it true of e.g. a guest that 
> > > > > > > > > > >> > > is stuck
> > > > > > > > > > >> > > because of a bug.
> > > > > > > > > > >> > 
> > > > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > > > >> > normally handles
> > > > > > > > > > >> > a paused guest, which is that it is guaranteed to 
> > > > > > > > > > >> > complete the migration
> > > > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > > > >> > 
> > > > > > > > > > >> > Just ignoring the situation I think will lead to 
> > > > > > > > > > >> > surprise apps / admins,
> > > > > > > > > > >> > because the person/entity invoking the migration is 
> > > > > > > > > > >> > not likely to have
> > > > > > > > > > >> > checked wether this particular guest uses net failover 
> > > > > > > > > > >> > or not before
> > > > > > > > > > >> > invoking - they'll just be expecting a paused 
> > > > > > > > > > >> > migration to run fast and
> > > > > > > > > > >> > be guaranteed to complete.
> > > > > > > > > > >> > 
> > > > > > > > > > >> > Regards,
> > > > > > > > > > >> > Daniel
> > > > > > > > > > >> 
> > > > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > > > >> situation too:
> > > > > > > > > > >> pausing guest after migration started but before device 
> > > > > > > > > > >> was
> > > > > > > > > > >> unplugged?
> > > > > > > > > > >> 
> > > > > > > > > > >
> > > > > > > > > > > Thinking of which, I have no idea how we'd handle it - 
> > > > > > > > > > > fail
> > > > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > > > >
> > > > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > > > 
> > > > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > > > 
> > > > > > > > > > Further solution would be:
> > > > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > > > - in qemu, when we do start migration (but after we wait 
> > > > > > > > > > for the unplug
> > > > > > > > > >   device) paused the guest before starting migration and 
> > > > > > > > > > resume it after
> > > > > > > > > >   migration finish.
> > > > > > > > > 
> > > > > > > > > It would also have to handle issuing of paused after 
> > > > > > > > > migration has
> > > > > > > > > been started - delay the pause request until the nuplug is 
> > > > > > > > > complete
> > > > > > > > > is one answer.
> > > > > > > > 
> > > > > > > > Hmm my worry would be that pausing is one way to give cpu
> > > > > > > > resources back to host. It's problematic if guest can delay
> > > > > > > > that 

Re: [PATCH v2 2/4] hw/scsi/megasas: Assert cdb_len is valid in megasas_handle_scsi()

2020-12-03 Thread Li Qiang
Philippe Mathieu-Daudé  于2020年12月3日周四 下午7:37写道:
>
> Hi Li,
>
> On 12/3/20 12:21 PM, Li Qiang wrote:
> > Philippe Mathieu-Daudé  于2020年12月2日周三 上午3:13写道:
> >>
> >> cdb_len can not be zero... (or less than 6) here, else we have a
> >> out-of-bound read first in scsi_cdb_length():
> >>
> >>  71 int scsi_cdb_length(uint8_t *buf)
> >>  72 {
> >>  73 int cdb_len;
> >>  74
> >>  75 switch (buf[0] >> 5) {
> >
> > Hi Philippe,
> >
> > Here I not read the spec.
>
> Neither did I...
>
> > Just guest from your patch, this 'buf[0]>>5'
> > indicates/related with the cdb length, right?
>
> This is my understanding too.
>
> > So here(this patch) you  just want to ensure the 'buf[0]>>5' and the
> > 'cdb_len' is consistent.
> >
> > But I don't  think here is a 'out-of-bound' read issue.
> >
> >
> > The 'buf' is from here 'cdb'.
> > static int megasas_handle_scsi(MegasasState *s, MegasasCmd *cmd,
> >int frame_cmd)
> > {
> >
> > cdb = cmd->frame->pass.cdb;
> >
> > 'cmd->frame->pass.cdb' is an array in heap and  its data is mmaped
> > from the guest.
> >
> > The guest can put any data in 'pass.cdb' which 'buf[0]>>5' can be 0 or
> > less than 6.
> >
> > So every read of this 'pass.cdb'[0~15] is valid. Not an oob.
>
> OK maybe not OOB but infoleak?

No. In the QEMU context we call it an 'infoleak' when the guest can read
memory that belongs to the QEMU process itself (not the guest's own
memory).

However, here the 'cdb' is actually mmapped from the guest, so it can
contain anything. I think it is just unused data here.


Thanks,
Li Qiang

>
> >>  76 case 0:
> >>  77 cdb_len = 6;
> >>  78 break;
> >>
> >> Then another out-of-bound read when the size returned by
> >> scsi_cdb_length() is used.
> >
> > Where is this?
>
> IIRC scsi_req_parse_cdb().
>
> >
> > So I think your intention is to ensure  'cdb_len' is consistent with
> > 'cdb[0]>>5'.
> >
> > Please correct me if I'm wrong.
>
> I'll recheck and go back to you during January.
>
> Regards,
>
> Phil.
>
> >>
> >> Figured out after reviewing:
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg757937.html
> >>
> >> And reproduced fuzzing:
> >>
> >>   qemu-fuzz-i386: hw/scsi/megasas.c:1679: int 
> >> megasas_handle_scsi(MegasasState *, MegasasCmd *, int):
> >>   Assertion `len > 0 && cdb_len >= len' failed.
> >>   ==1689590== ERROR: libFuzzer: deadly signal
> >>   #8 0x7f7a5d918e75 in __assert_fail (/lib64/libc.so.6+0x34e75)
> >>   #9 0x55a1b95cf6d4 in megasas_handle_scsi hw/scsi/megasas.c:1679:5
> >>   #10 0x55a1b95cf6d4 in megasas_handle_frame hw/scsi/megasas.c:1975:24
> >>   #11 0x55a1b95cf6d4 in megasas_mmio_write hw/scsi/megasas.c:2132:9
> >>   #12 0x55a1b981972e in memory_region_write_accessor 
> >> softmmu/memory.c:491:5
> >>   #13 0x55a1b981972e in access_with_adjusted_size 
> >> softmmu/memory.c:552:18
> >>   #14 0x55a1b981972e in memory_region_dispatch_write 
> >> softmmu/memory.c:1501:16
> >>   #15 0x55a1b97f0ab0 in flatview_write_continue 
> >> softmmu/physmem.c:2759:23
> >>   #16 0x55a1b97ec3f2 in flatview_write softmmu/physmem.c:2799:14
> >>   #17 0x55a1b97ec3f2 in address_space_write softmmu/physmem.c:2891:18
> >>   #18 0x55a1b985c7cd in cpu_outw softmmu/ioport.c:70:5
> >>   #19 0x55a1b99577ac in qtest_process_command softmmu/qtest.c:481:13
> >>
> >> Inspired-by: Daniele Buono 
> >> Inspired-by: Alexander Bulekov 
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >> ---
> >>  hw/scsi/megasas.c | 5 +
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
> >> index 1a5fc5857db..f5ad4425b5b 100644
> >> --- a/hw/scsi/megasas.c
> >> +++ b/hw/scsi/megasas.c
> >> @@ -1667,6 +1667,7 @@ static int megasas_handle_scsi(MegasasState *s, 
> >> MegasasCmd *cmd,
> >>  {
> >>  uint8_t *cdb;
> >>  int target_id, lun_id, cdb_len;
> >> +int len = -1;
> >>  bool is_write;
> >>  struct SCSIDevice *sdev = NULL;
> >>  bool is_logical = (frame_cmd == MFI_CMD_LD_SCSI_IO);
> >> @@ -1676,6 +1677,10 @@ static int megasas_handle_scsi(MegasasState *s, 
> >> MegasasCmd *cmd,
> >>  lun_id = cmd->frame->header.lun_id;
> >>  cdb_len = cmd->frame->header.cdb_len;
> >>
> >> +if (cdb_len > 0) {
> >> +len = scsi_cdb_length(cdb);
> >> +}
> >> +assert(len > 0 && cdb_len >= len);
> >>  if (is_logical) {
> >>  if (target_id >= MFI_MAX_LD || lun_id != 0) {
> >>  trace_megasas_scsi_target_not_present(
> >> --
> >> 2.26.2
> >>
> >>
> >
>



Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 11:43:41AM +, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (m...@redhat.com) wrote:
> > On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> > > On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > > > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé wrote:
> > > > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin 
> > > > > > > wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé 
> > > > > > > > wrote:
> > > > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela wrote:
> > > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. 
> > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > > > >> Berrangé wrote:
> > > > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > > > >> > Tsirkin wrote:
> > > > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan 
> > > > > > > > > > >> > > Quintela wrote:
> > > > > > > > > > >> > > > If we have a paused guest, it can't unplug the 
> > > > > > > > > > >> > > > network VF device, so
> > > > > > > > > > >> > > > we wait there forever.  Just change the code to 
> > > > > > > > > > >> > > > give one error on that
> > > > > > > > > > >> > > > case.
> > > > > > > > > > >> > > > 
> > > > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > > > >> > > 
> > > > > > > > > > >> > > It's certainly possible but it's management that 
> > > > > > > > > > >> > > created
> > > > > > > > > > >> > > this situation after all - why do we bother to 
> > > > > > > > > > >> > > enforce
> > > > > > > > > > >> > > a policy? It is possible that management will 
> > > > > > > > > > >> > > unpause immediately
> > > > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > > > >> > > 
> > > > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > > > >> > > unpaused but the same it true of e.g. a guest that 
> > > > > > > > > > >> > > is stuck
> > > > > > > > > > >> > > because of a bug.
> > > > > > > > > > >> > 
> > > > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > > > >> > normally handles
> > > > > > > > > > >> > a paused guest, which is that it is guaranteed to 
> > > > > > > > > > >> > complete the migration
> > > > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > > > >> > 
> > > > > > > > > > >> > Just ignoring the situation I think will lead to 
> > > > > > > > > > >> > surprise apps / admins,
> > > > > > > > > > >> > because the person/entity invoking the migration is 
> > > > > > > > > > >> > not likely to have
> > > > > > > > > > >> > checked wether this particular guest uses net failover 
> > > > > > > > > > >> > or not before
> > > > > > > > > > >> > invoking - they'll just be expecting a paused 
> > > > > > > > > > >> > migration to run fast and
> > > > > > > > > > >> > be guaranteed to complete.
> > > > > > > > > > >> > 
> > > > > > > > > > >> > Regards,
> > > > > > > > > > >> > Daniel
> > > > > > > > > > >> 
> > > > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > > > >> situation too:
> > > > > > > > > > >> pausing guest after migration started but before device 
> > > > > > > > > > >> was
> > > > > > > > > > >> unplugged?
> > > > > > > > > > >> 
> > > > > > > > > > >
> > > > > > > > > > > Thinking of which, I have no idea how we'd handle it - 
> > > > > > > > > > > fail
> > > > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > > > >
> > > > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > > > 
> > > > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > > > 
> > > > > > > > > > Further solution would be:
> > > > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > > > - in qemu, when we do start migration (but after we wait 
> > > > > > > > > > for the unplug
> > > > > > > > > >   device) paused the guest before starting migration and 
> > > > > > > > > > resume it after
> > > > > > > > > >   migration finish.
> > > > > > > > > 
> > > > > > > > > It would also have to handle issuing of paused after 
> > > > > > > > > migration has
> > > > > > > > > been started - delay the pause request until the nuplug is 
> > > > > > > > > complete
> > > > > > > > > is one answer.
> > > > > > > > 
> > > > > > > > Hmm my worry would be that pausing is one way to give cpu
> > > > > > > > resources back to host. It's problematic if guest can delay
> > > > > > > > that indefinitely.
> > 

Re: [PATCH v2 00/28] target/arm: Implement v8.1M and Cortex-M55

2020-12-03 Thread Peter Maydell
On Thu, 19 Nov 2020 at 21:56, Peter Maydell  wrote:
>
> This is a v2 because it's a respin of "target/arm: More v8.1M
> features".  The bad news is it's nearly doubled in length.  The good
> news is that this is because the new patches on the end are enough to
> implement all the remaining missing v8.1M specifics to the point
> where we can provide a Cortex-M55 CPU.  (There is as yet no board
> model that uses the Cortex-M55, though; that's next on my todo list.)

I'm going to take all of this series into target-arm.next except:

>   target/arm: Implement FPCXT_NS fp system register
>   hw/intc/armv7m_nvic: Correct handling of CCR.BFHFNMIGN
>   target/arm: Implement Cortex-M55 model

(since the first two need a v2 and we don't want to enable the
CPU until all the pieces are there).

thanks
-- PMM



Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Thu, Dec 03, 2020 at 11:43:41AM +, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (m...@redhat.com) wrote:
> > > On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> > > > On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > > > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé 
> > > > > > > wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé 
> > > > > > > > > wrote:
> > > > > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela 
> > > > > > > > > > wrote:
> > > > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. 
> > > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > > > > >> Berrangé wrote:
> > > > > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > > > > >> > Tsirkin wrote:
> > > > > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan 
> > > > > > > > > > > >> > > Quintela wrote:
> > > > > > > > > > > >> > > > If we have a paused guest, it can't unplug the 
> > > > > > > > > > > >> > > > network VF device, so
> > > > > > > > > > > >> > > > we wait there forever.  Just change the code to 
> > > > > > > > > > > >> > > > give one error on that
> > > > > > > > > > > >> > > > case.
> > > > > > > > > > > >> > > > 
> > > > > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > > > > >> > > > 
> > > > > > > > > > > >> > > 
> > > > > > > > > > > >> > > It's certainly possible but it's management that 
> > > > > > > > > > > >> > > created
> > > > > > > > > > > >> > > this situation after all - why do we bother to 
> > > > > > > > > > > >> > > enforce
> > > > > > > > > > > >> > > a policy? It is possible that management will 
> > > > > > > > > > > >> > > unpause immediately
> > > > > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > > > > >> > > 
> > > > > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > > > > >> > > unpaused but the same it true of e.g. a guest that 
> > > > > > > > > > > >> > > is stuck
> > > > > > > > > > > >> > > because of a bug.
> > > > > > > > > > > >> > 
> > > > > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > > > > >> > normally handles
> > > > > > > > > > > >> > a paused guest, which is that it is guaranteed to 
> > > > > > > > > > > >> > complete the migration
> > > > > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > > > > >> > 
> > > > > > > > > > > >> > Just ignoring the situation I think will lead to 
> > > > > > > > > > > >> > surprise apps / admins,
> > > > > > > > > > > >> > because the person/entity invoking the migration is 
> > > > > > > > > > > >> > not likely to have
> > > > > > > > > > > >> > checked wether this particular guest uses net 
> > > > > > > > > > > >> > failover or not before
> > > > > > > > > > > >> > invoking - they'll just be expecting a paused 
> > > > > > > > > > > >> > migration to run fast and
> > > > > > > > > > > >> > be guaranteed to complete.
> > > > > > > > > > > >> > 
> > > > > > > > > > > >> > Regards,
> > > > > > > > > > > >> > Daniel
> > > > > > > > > > > >> 
> > > > > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > > > > >> situation too:
> > > > > > > > > > > >> pausing guest after migration started but before 
> > > > > > > > > > > >> device was
> > > > > > > > > > > >> unplugged?
> > > > > > > > > > > >> 
> > > > > > > > > > > >
> > > > > > > > > > > > Thinking of which, I have no idea how we'd handle it - 
> > > > > > > > > > > > fail
> > > > > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > > > > >
> > > > > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > > > > 
> > > > > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > > > > 
> > > > > > > > > > > Further solution would be:
> > > > > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > > > > - in qemu, when we do start migration (but after we wait 
> > > > > > > > > > > for the unplug
> > > > > > > > > > >   device) paused the guest before starting migration and 
> > > > > > > > > > > resume it after
> > > > > > > > > > >   migration finish.
> > > > > > > > > > 
> > > > > > > > > > It would also have to handle issuing of paused after 
> > > > > > > > > > migration has
> > > > > > > > > > been started - delay the pause reque

Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Daniel P . Berrangé
On Thu, Dec 03, 2020 at 07:01:11AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 11:45:12AM +, Daniel P. Berrangé wrote:
> > On Thu, Dec 03, 2020 at 06:40:11AM -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 03, 2020 at 11:32:53AM +, Daniel P. Berrangé wrote:
> > > > On Thu, Dec 03, 2020 at 06:21:47AM -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 02, 2020 at 12:01:21PM +, Daniel P. Berrangé wrote:
> > > > > > On Wed, Dec 02, 2020 at 06:37:46AM -0500, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Dec 02, 2020 at 11:26:39AM +, Daniel P. Berrangé 
> > > > > > > wrote:
> > > > > > > > On Wed, Dec 02, 2020 at 06:19:29AM -0500, Michael S. Tsirkin 
> > > > > > > > wrote:
> > > > > > > > > On Wed, Dec 02, 2020 at 10:55:15AM +, Daniel P. Berrangé 
> > > > > > > > > wrote:
> > > > > > > > > > On Wed, Dec 02, 2020 at 11:51:05AM +0100, Juan Quintela 
> > > > > > > > > > wrote:
> > > > > > > > > > > "Michael S. Tsirkin"  wrote:
> > > > > > > > > > > > On Wed, Dec 02, 2020 at 05:31:53AM -0500, Michael S. 
> > > > > > > > > > > > Tsirkin wrote:
> > > > > > > > > > > >> On Wed, Dec 02, 2020 at 10:27:18AM +, Daniel P. 
> > > > > > > > > > > >> Berrangé wrote:
> > > > > > > > > > > >> > On Wed, Dec 02, 2020 at 05:13:18AM -0500, Michael S. 
> > > > > > > > > > > >> > Tsirkin wrote:
> > > > > > > > > > > >> > > On Wed, Nov 18, 2020 at 09:37:22AM +0100, Juan 
> > > > > > > > > > > >> > > Quintela wrote:
> > > > > > > > > > > >> > > > If we have a paused guest, it can't unplug the 
> > > > > > > > > > > >> > > > network VF device, so
> > > > > > > > > > > >> > > > we wait there forever.  Just change the code to 
> > > > > > > > > > > >> > > > give one error on that
> > > > > > > > > > > >> > > > case.
> > > > > > > > > > > >> > > > 
> > > > > > > > > > > >> > > > Signed-off-by: Juan Quintela 
> > > > > > > > > > > >> > > > 
> > > > > > > > > > > >> > > 
> > > > > > > > > > > >> > > It's certainly possible but it's management that 
> > > > > > > > > > > >> > > created
> > > > > > > > > > > >> > > this situation after all - why do we bother to 
> > > > > > > > > > > >> > > enforce
> > > > > > > > > > > >> > > a policy? It is possible that management will 
> > > > > > > > > > > >> > > unpause immediately
> > > > > > > > > > > >> > > afterwards and everything will proceed smoothly.
> > > > > > > > > > > >> > > 
> > > > > > > > > > > >> > > Yes migration will not happen until guest is
> > > > > > > > > > > >> > > unpaused but the same it true of e.g. a guest that 
> > > > > > > > > > > >> > > is stuck
> > > > > > > > > > > >> > > because of a bug.
> > > > > > > > > > > >> > 
> > > > > > > > > > > >> > That's pretty different behaviour from how migration 
> > > > > > > > > > > >> > normally handles
> > > > > > > > > > > >> > a paused guest, which is that it is guaranteed to 
> > > > > > > > > > > >> > complete the migration
> > > > > > > > > > > >> > in as short a time as network bandwidth allows.
> > > > > > > > > > > >> > 
> > > > > > > > > > > >> > Just ignoring the situation I think will lead to 
> > > > > > > > > > > >> > surprise apps / admins,
> > > > > > > > > > > >> > because the person/entity invoking the migration is 
> > > > > > > > > > > >> > not likely to have
> > > > > > > > > > > >> > checked wether this particular guest uses net 
> > > > > > > > > > > >> > failover or not before
> > > > > > > > > > > >> > invoking - they'll just be expecting a paused 
> > > > > > > > > > > >> > migration to run fast and
> > > > > > > > > > > >> > be guaranteed to complete.
> > > > > > > > > > > >> > 
> > > > > > > > > > > >> > Regards,
> > > > > > > > > > > >> > Daniel
> > > > > > > > > > > >> 
> > > > > > > > > > > >> Okay I guess. But then shouldn't we handle the reverse 
> > > > > > > > > > > >> situation too:
> > > > > > > > > > > >> pausing guest after migration started but before 
> > > > > > > > > > > >> device was
> > > > > > > > > > > >> unplugged?
> > > > > > > > > > > >> 
> > > > > > > > > > > >
> > > > > > > > > > > > Thinking of which, I have no idea how we'd handle it - 
> > > > > > > > > > > > fail
> > > > > > > > > > > > pausing guest until migration is cancelled?
> > > > > > > > > > > >
> > > > > > > > > > > > All this seems heavy handed to me ...
> > > > > > > > > > > 
> > > > > > > > > > > This is the minimal fix that I can think of.
> > > > > > > > > > > 
> > > > > > > > > > > Further solution would be:
> > > > > > > > > > > - Add a new migration parameter: migrate-paused
> > > > > > > > > > > - change libvirt to use the new parameter if it exist
> > > > > > > > > > > - in qemu, when we do start migration (but after we wait 
> > > > > > > > > > > for the unplug
> > > > > > > > > > >   device) paused the guest before starting migration and 
> > > > > > > > > > > resume it after
> > > > > > > > > > >   migration finish.
> > > > > > > > > > 
> > > > > > > > > > It would also have to handle issuing of paused after 
> > > > > > > > > > migration has
> > > > > > > > 

Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 11:43:41AM +, Dr. David Alan Gilbert wrote:
> Another way to solve this would be to remove the unplugging from the
> migration layer and leave it as a problem for the management layer to do
> the unplug.

Daniel described the problem with modular management tools which expect
pausing or slowing down the guest to cause migration to converge.

The point is, it actually *will* make it converge, but only if you
pause it after the unplug.

As it is, these tools fundamentally cannot handle failover that
requires guest cooperation. Moving code between layers won't help.
Introducing failure modes, as this patch does, won't help either,
especially since Daniel wrote that there are countless tools like this.
We would just break them all while having no resources to fix them,
which does not help at all.

We can just leave the situation as is.

Or, if we do want to be nice to these tools, how about we
unpause the guest until the unplug, then pause it again?
That actually addresses the problem instead of
shifting the blame, does it not?

-- 
MST




Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Michael S. Tsirkin
On Thu, Dec 03, 2020 at 12:06:14PM +, Daniel P. Berrangé wrote:
> > > It isn't really about the admin.  It is about countless existing mgmt apps
> > > that expect migration will always succeed if the VM is paused. The mgmt
> > > apps triggering the migraiton is not neccessarily the same as the app
> > > which introduced the use of NIC failover in the config.
> > > 
> > > eg in OpenStack Nova provides the VM config, but there are completely
> > > separate apps that are built todo automation on top of Nova which 
> > > this is liable to break. There's no human admin there to diagnose
> > > this and re-try with unpause, as all the logic is in the apps.
> > > 
> > > 
> > > Regards,
> > > Daniel
> > 
> > So let's say pause fails. Won't this break these theoretical apps?
> 
> Yes, they are broken in a way that can now actually be seen, as opposed
> to just hanging forever in migration.

Right, but there are countless apps out there, and they are all broken;
maybe not too hard to debug if one knows where to look, but still ...

And again, this is only true for pause-then-migrate; migrate-then-pause is
still broken, and if we fix it by failing the pause, then no one knows whether
apps are prepared to handle a failure in the pause command; previously, if
used correctly, it would never fail, I think ...

-- 
MST




Re: [PATCH v2 01/27] migration: Network Failover can't work with a paused guest

2020-12-03 Thread Daniel P . Berrangé
On Thu, Dec 03, 2020 at 07:11:17AM -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 03, 2020 at 11:43:41AM +, Dr. David Alan Gilbert wrote:
> > Another way to solve this would be to remove the unplugging from the
> > migration layer and leave it as a problem for the management layer to do
> > the unplug.
> 
> Daniel described the problem with modular management tools which expect
> pausing or slowing down the guest to cause migration to converge.
> 
> The point is, it actually *will* make it converge, but only if you
> pause it after the unplug.
> 
> As it is, these tools fundamentally cannot handle failover that
> requires guest cooperation. Moving code between layers won't help.
> Introducing failure modes, as this patch does, won't help either,
> especially since Daniel wrote that there are countless tools like this.
> We would just break them all while having no resources to fix them,
> which does not help at all.
> 
> We can just leave the situation as is.
> 
> Or, if we do want to be nice to these tools, how about we
> unpause the guest until the unplug, then pause it again?
> That actually addresses the problem instead of
> shifting the blame, does it not?

This is a very bad idea because it changes the execution status of the
guest behind the apps'/admins' back, and that cannot be assumed to be a
safe thing to do.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH for-6.0 00/11] target/arm: enforce alignment

2020-12-03 Thread Philippe Mathieu-Daudé
Cc'ing Pavel

On 12/1/20 4:55 PM, Peter Maydell wrote:
> On Wed, 25 Nov 2020 at 04:06, Richard Henderson
>  wrote:
>>
>> As reported in https://bugs.launchpad.net/bugs/1905356
>>
>> Not implementing SCTLR.A, but all of the other required
>> alignment for SCTLR.A=0 in Table A3-1.
> 
> Something in this series breaks the 'make check-acceptance'
> record-and-replay test:
> 
>  (30/40) 
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_aarch64_virt:
> PASS (9.14 s)
>  (31/40) tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '31-tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_arm_virt',
> 'logdir': 
> '/home/petmay01/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/result...
> (90.19 s)
> 
> The log shows the "recording execution" apparently hanging,
> with the last output from the guest
> [3.183662] Registering SWP/SWPB emulation handler
> 
> thanks
> -- PMM
> 




Re: [PATCH v2 2/4] hw/scsi/megasas: Assert cdb_len is valid in megasas_handle_scsi()

2020-12-03 Thread Philippe Mathieu-Daudé
On 12/3/20 1:02 PM, Li Qiang wrote:
> Philippe Mathieu-Daudé  于2020年12月3日周四 下午7:37写道:
>>
>> Hi Li,
>>
>> On 12/3/20 12:21 PM, Li Qiang wrote:
>>> Philippe Mathieu-Daudé  于2020年12月2日周三 上午3:13写道:

 cdb_len can not be zero... (or less than 6) here, else we have a
 out-of-bound read first in scsi_cdb_length():

  71 int scsi_cdb_length(uint8_t *buf)
  72 {
  73 int cdb_len;
  74
  75 switch (buf[0] >> 5) {
>>>
>>> Hi Philippe,
>>>
>>> Here I not read the spec.
>>
>> Neither did I...
>>
>>> Just guest from your patch, this 'buf[0]>>5'
>>> indicates/related with the cdb length, right?
>>
>> This is my understanding too.
>>
>>> So here(this patch) you  just want to ensure the 'buf[0]>>5' and the
>>> 'cdb_len' is consistent.
>>>
>>> But I don't  think here is a 'out-of-bound' read issue.
>>>
>>>
>>> The 'buf' is from here 'cdb'.
>>> static int megasas_handle_scsi(MegasasState *s, MegasasCmd *cmd,
>>>int frame_cmd)
>>> {
>>>
>>> cdb = cmd->frame->pass.cdb;
>>>
>>> 'cmd->frame->pass.cdb' is an array in heap and  its data is mmaped
>>> from the guest.
>>>
>>> The guest can put any data in 'pass.cdb' which 'buf[0]>>5' can be 0 or
>>> less than 6.
>>>
>>> So every read of this 'pass.cdb'[0~15] is valid. Not an oob.
>>
>> OK maybe not OOB but infoleak?
> 
> No. In the QEMU context we call it an 'infoleak' when the guest can read
> memory that belongs to the QEMU process itself (not the guest's own
> memory).
> 
> However, here the 'cdb' is actually mmapped from the guest, so it can
> contain anything. I think it is just unused data here.

'pass.cdb'[0~15] is allocated. If the guest fills in only
1 byte, say 0x04, then scsi_cdb_length() returns buflen = 16
even though only 1 byte was filled, so callers will read 1
initialized byte and 15 uninitialized bytes.

You are saying this is not an "infoleak" because the
leaked memory is allocated on the heap, so nothing critical /
useful gets stored there?

While this might not be a security problem, it still produces
unpredictable code behavior, so it deserves to be fixed.
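
To make the failure mode concrete, below is a minimal, self-contained
sketch (not QEMU code) of the mismatch being discussed: the CDB group
code in byte 0 implies a command length, and a guest can supply fewer
bytes than that, so trusting the implied length reads bytes the guest
never set. The group-code table here is a simplified assumption modelled
on scsi_cdb_length(); the final check mirrors what the proposed assert()
enforces.

#include <stdio.h>
#include <stdint.h>

/* Length implied by the CDB group code (buf[0] >> 5), simplified. */
static int cdb_group_length(const uint8_t *buf)
{
    switch (buf[0] >> 5) {
    case 0:
        return 6;           /* 6-byte commands */
    case 1:
    case 2:
        return 10;          /* 10-byte commands */
    case 4:
        return 16;          /* 16-byte commands */
    case 5:
        return 12;          /* 12-byte commands */
    default:
        return -1;          /* reserved / vendor specific */
    }
}

int main(void)
{
    uint8_t cdb[16] = { 0 };    /* frame buffer as the guest left it */
    int guest_cdb_len = 1;      /* guest claims a 1-byte CDB ... */
    cdb[0] = 0x88;              /* ... but byte 0 is a group-4 opcode (READ(16)) */

    int implied = cdb_group_length(cdb);
    printf("implied length %d, guest cdb_len %d\n", implied, guest_cdb_len);

    if (implied < 1 || guest_cdb_len < implied) {
        /* This is the inconsistency the proposed assert() rejects:
         * using 'implied' here would read bytes the guest never set. */
        printf("inconsistent CDB, reject the command\n");
        return 1;
    }
    return 0;
}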

>>
  76 case 0:
  77 cdb_len = 6;
  78 break;

 Then another out-of-bound read when the size returned by
 scsi_cdb_length() is used.
>>>
>>> Where is this?
>>
>> IIRC scsi_req_parse_cdb().
>>
>>>
>>> So I think your intention is to ensure  'cdb_len' is consistent with
>>> 'cdb[0]>>5'.
>>>
>>> Please correct me if I'm wrong.
>>
>> I'll recheck and go back to you during January.
>>
>> Regards,
>>
>> Phil.
>>

 Figured out after reviewing:
 https://www.mail-archive.com/qemu-devel@nongnu.org/msg757937.html

 And reproduced fuzzing:

   qemu-fuzz-i386: hw/scsi/megasas.c:1679: int 
 megasas_handle_scsi(MegasasState *, MegasasCmd *, int):
   Assertion `len > 0 && cdb_len >= len' failed.
   ==1689590== ERROR: libFuzzer: deadly signal
   #8 0x7f7a5d918e75 in __assert_fail (/lib64/libc.so.6+0x34e75)
   #9 0x55a1b95cf6d4 in megasas_handle_scsi hw/scsi/megasas.c:1679:5
   #10 0x55a1b95cf6d4 in megasas_handle_frame hw/scsi/megasas.c:1975:24
   #11 0x55a1b95cf6d4 in megasas_mmio_write hw/scsi/megasas.c:2132:9
   #12 0x55a1b981972e in memory_region_write_accessor 
 softmmu/memory.c:491:5
   #13 0x55a1b981972e in access_with_adjusted_size 
 softmmu/memory.c:552:18
   #14 0x55a1b981972e in memory_region_dispatch_write 
 softmmu/memory.c:1501:16
   #15 0x55a1b97f0ab0 in flatview_write_continue 
 softmmu/physmem.c:2759:23
   #16 0x55a1b97ec3f2 in flatview_write softmmu/physmem.c:2799:14
   #17 0x55a1b97ec3f2 in address_space_write softmmu/physmem.c:2891:18
   #18 0x55a1b985c7cd in cpu_outw softmmu/ioport.c:70:5
   #19 0x55a1b99577ac in qtest_process_command softmmu/qtest.c:481:13

 Inspired-by: Daniele Buono 
 Inspired-by: Alexander Bulekov 
 Signed-off-by: Philippe Mathieu-Daudé 
 ---
  hw/scsi/megasas.c | 5 +
  1 file changed, 5 insertions(+)

 diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
 index 1a5fc5857db..f5ad4425b5b 100644
 --- a/hw/scsi/megasas.c
 +++ b/hw/scsi/megasas.c
 @@ -1667,6 +1667,7 @@ static int megasas_handle_scsi(MegasasState *s, 
 MegasasCmd *cmd,
  {
  uint8_t *cdb;
  int target_id, lun_id, cdb_len;
 +int len = -1;
  bool is_write;
  struct SCSIDevice *sdev = NULL;
  bool is_logical = (frame_cmd == MFI_CMD_LD_SCSI_IO);
 @@ -1676,6 +1677,10 @@ static int megasas_handle_scsi(MegasasState *s, 
 MegasasCmd *cmd,
  lun_id = cmd->frame->header.lun_id;
  cdb_len = cmd->frame->header.cdb_len;

 +if (cdb_len > 0) {
 +len = scsi_cdb_length(cdb);
 +}
 +assert(len > 0 && cdb_len >= len);
  if (is_logical) {
  if (target_id >= MFI_MAX_LD || lun_id != 0) {
  trace_megasas_scsi_target_

Re: help with a build-user and build-user-plugin failure

2020-12-03 Thread Claudio Fontana
Hi all,

and thanks for the help: after a lot of fiddling and applying your suggestions
(and a reboot!?), things now work.

The only thing I am left seeing (also on master) is with check-tcg:


Remote 'g' packet reply is too long (expected 312 bytes, got 560 bytes): 
004000800431001006300010063482009207
Traceback (most recent call last):
  File "/home/claudio/git/qemu/tests/tcg/multiarch/gdbstub/sha1.py", line 68, 
in 
if gdb.parse_and_eval('$pc') == 0:
gdb.error: No registers.


a number of times during the test.

It doesn't seem to break anything, but I wonder whether it is expected or
whether it needs suppressing?

Thanks again,

Claudio


On 11/25/20 6:02 PM, Alex Bennée wrote:
> 
> Claudio Fontana  writes:
> 
>> Hi Alex,
>>
>> On 11/25/20 10:42 AM, Alex Bennée wrote:
>>>
>>> Philippe Mathieu-Daudé  writes:
>>>
 On 11/24/20 12:04 PM, Claudio Fontana wrote:
> Hi Alex,
>
> I am seeing build failures with build-user and build-user-plugin:
>
> https://gitlab.com/hw-claudio/qemu/-/pipelines/220245998
>
> and I am trying to start investigating.
>
> How do I reproduce this locally?
>
> I am trying to run locally the check-tcg rule, but I cannot get it to 
> work.
> I managed to work around the problem of static libraries (disabled them),
>
> but then I get:
>
>   BUILD   TCG tests for x86_64-linux-user
>   BUILD   x86_64-linux-user guest-tests with cc
> /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: 
> /tmp/ccgqtAM9.o: in function `test_fops':
> /dev/shm/cfontana/qemu/tests/tcg/i386/test-i386.c:759: undefined 
> reference to `fmod'
> /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: 
> /dev/shm/cfontana/qemu/tests/tcg/i386/test-i386.c:760: undefined 
> reference to `sqrt'
> /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: 
> /dev/shm/cfontana/qemu/tests/tcg/i386/test-i386.c:761: undefined 
> reference to `sin'
> /usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: 
> /dev/shm/cfontana/qemu/tests/tcg/i386/test-i386.c:762: undefined 
> reference to `cos'
>
> Have you seen it before?
> Any suggestions? I'm on OpenSUSE Leap 15 SP2.

 Related to 3fc1aad3864 ("configure: remove unnecessary libm test")
 + tcg tests still not ported to Meson?
>>>
>>> Hmm so we certainly need libm for the testcase but I guess this is> failing 
>>> with a local cross compiler rather than docker? I'm not sure the
>>> global feature test should be relevant for testcases.
>>>
>>
>> Probably it's my attempt to make it work with non-static libm that failed 
>> then,
>>
>> is it supposed to work?
>>
>> I see mention of BUILD_STATIC there, but it does not seem to actually work 
>> for me.
>>
>> If I use static libm, then it works.
>> If I uninstall static libm, any attempt to build fails, regardless of
>> whether I pass BUILD_STATIC='n' or so.
> 
> All the test cases themselves should be built as static although I see
> we fall back for the case of using a local cross compiler. That normally
> only covers the case where the host compiler can also build for 32 bit
> for testcases.
> 
>>
>> Ciao and thanks,
>>
>> CLaudio
> 
> 



[PATCH RFC v4 07/15] hw/riscv: PLIC update external interrupt by KVM when kvm enabled

2020-12-03 Thread Yifei Jiang
Only the supervisor external interrupt is supported currently.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 hw/intc/sifive_plic.c| 31 ++-
 target/riscv/kvm.c   | 19 +++
 target/riscv/kvm_riscv.h |  1 +
 3 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/hw/intc/sifive_plic.c b/hw/intc/sifive_plic.c
index 97a1a27a9a..a419ca3a3c 100644
--- a/hw/intc/sifive_plic.c
+++ b/hw/intc/sifive_plic.c
@@ -31,6 +31,8 @@
 #include "target/riscv/cpu.h"
 #include "sysemu/sysemu.h"
 #include "migration/vmstate.h"
+#include "sysemu/kvm.h"
+#include "kvm_riscv.h"
 
 #define RISCV_DEBUG_PLIC 0
 
@@ -147,15 +149,26 @@ static void sifive_plic_update(SiFivePLICState *plic)
 continue;
 }
 int level = sifive_plic_irqs_pending(plic, addrid);
-switch (mode) {
-case PLICMode_M:
-riscv_cpu_update_mip(RISCV_CPU(cpu), MIP_MEIP, 
BOOL_TO_MASK(level));
-break;
-case PLICMode_S:
-riscv_cpu_update_mip(RISCV_CPU(cpu), MIP_SEIP, 
BOOL_TO_MASK(level));
-break;
-default:
-break;
+if (kvm_enabled()) {
+if (mode == PLICMode_M) {
+continue;
+}
+#ifdef CONFIG_KVM
+kvm_riscv_set_irq(RISCV_CPU(cpu), IRQ_S_EXT, level);
+#endif
+} else {
+switch (mode) {
+case PLICMode_M:
+riscv_cpu_update_mip(RISCV_CPU(cpu),
+ MIP_MEIP, BOOL_TO_MASK(level));
+break;
+case PLICMode_S:
+riscv_cpu_update_mip(RISCV_CPU(cpu),
+ MIP_SEIP, BOOL_TO_MASK(level));
+break;
+default:
+break;
+}
 }
 }
 
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 6250ca0c7d..b01ff0754c 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -454,3 +454,22 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
 env->satp = 0;
 }
 
+void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
+{
+int ret;
+unsigned virq = level ? KVM_INTERRUPT_SET : KVM_INTERRUPT_UNSET;
+
+if (irq != IRQ_S_EXT) {
+return;
+}
+
+if (!kvm_enabled()) {
+return;
+}
+
+ret = kvm_vcpu_ioctl(CPU(cpu), KVM_INTERRUPT, &virq);
+if (ret < 0) {
+perror("Set irq failed");
+abort();
+}
+}
diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
index f38c82bf59..ed281bdce0 100644
--- a/target/riscv/kvm_riscv.h
+++ b/target/riscv/kvm_riscv.h
@@ -20,5 +20,6 @@
 #define QEMU_KVM_RISCV_H
 
 void kvm_riscv_reset_vcpu(RISCVCPU *cpu);
+void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level);
 
 #endif
-- 
2.19.1




[PATCH RFC v4 14/15] target/riscv: Synchronize vcpu's frequency with KVM

2020-12-03 Thread Yifei Jiang
If the vcpu's frequency is specified by the cpu option 'frequency', it
is set in KVM via the KVM_SET_ONE_REG ioctl. Otherwise, the vcpu's
frequency follows KVM, read back via the KVM_GET_ONE_REG ioctl.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 1e16d24544..3499efd4eb 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -409,6 +409,8 @@ int kvm_arch_get_registers(CPUState *cs)
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
 int ret = 0;
+RISCVCPU *cpu = RISCV_CPU(cs);
+CPURISCVState *env = &cpu->env;
 
 ret = kvm_riscv_put_regs_core(cs);
 if (ret) {
@@ -425,6 +427,10 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+if (env->frequency) {
+ret = kvm_set_one_reg(cs, RISCV_TIMER_REG(frequency), &env->frequency);
+}
+
 return ret;
 }
 
@@ -481,6 +487,17 @@ int kvm_arch_init_vcpu(CPUState *cs)
 }
 env->misa = isa;
 
+/*
+ * Synchronize vcpu's frequency with KVM. If vcpu's frequency is specified
+ * by cpu option 'frequency', this will be set to KVM. Otherwise, vcpu's
+ * frequency will follow KVM.
+ */
+if (env->user_frequency) {
+ret = kvm_set_one_reg(cs, RISCV_TIMER_REG(frequency), &env->frequency);
+} else {
+ret = kvm_get_one_reg(cs, RISCV_TIMER_REG(frequency), &env->frequency);
+}
+
 return ret;
 }
 
-- 
2.19.1




[PATCH RFC v4 01/15] linux-header: Update linux/kvm.h

2020-12-03 Thread Yifei Jiang
Update linux-headers/linux/kvm.h from
https://github.com/avpatel/linux/tree/riscv_kvm_v15.
Only this header file is used, so do not update all Linux headers with
update-linux-headers.sh until the above KVM series is accepted.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 linux-headers/linux/kvm.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index 56ce14ad20..ddeedcf3df 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -250,6 +250,7 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_ARM_NISV 28
 #define KVM_EXIT_X86_RDMSR29
 #define KVM_EXIT_X86_WRMSR30
+#define KVM_EXIT_RISCV_SBI31
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -426,6 +427,13 @@ struct kvm_run {
__u32 index; /* kernel -> user */
__u64 data; /* kernel <-> user */
} msr;
+   /* KVM_EXIT_RISCV_SBI */
+   struct {
+   unsigned long extension_id;
+   unsigned long function_id;
+   unsigned long args[6];
+   unsigned long ret[2];
+   } riscv_sbi;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.19.1




[PATCH RFC v4 05/15] target/riscv: Implement kvm_arch_put_registers

2020-12-03 Thread Yifei Jiang
Put the GPR, CSR and FP registers to KVM via the KVM_SET_ONE_REG ioctl.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 142 -
 1 file changed, 141 insertions(+), 1 deletion(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index e679619e79..8b206ce99c 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -85,6 +85,31 @@ static int kvm_riscv_get_regs_core(CPUState *cs)
 return ret;
 }
 
+static int kvm_riscv_put_regs_core(CPUState *cs)
+{
+int ret = 0;
+int i;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+reg = env->pc;
+ret = kvm_set_one_reg(cs, RISCV_CORE_REG(regs.pc), &reg);
+if (ret) {
+return ret;
+}
+
+for (i = 1; i < 32; i++) {
+__u64 id = kvm_riscv_reg_id(KVM_REG_RISCV_CORE, i);
+reg = env->gpr[i];
+ret = kvm_set_one_reg(cs, id, &reg);
+if (ret) {
+return ret;
+}
+}
+
+return ret;
+}
+
 static int kvm_riscv_get_regs_csr(CPUState *cs)
 {
 int ret = 0;
@@ -148,6 +173,70 @@ static int kvm_riscv_get_regs_csr(CPUState *cs)
 return ret;
 }
 
+static int kvm_riscv_put_regs_csr(CPUState *cs)
+{
+int ret = 0;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+reg = env->mstatus;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(sstatus), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->mie;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(sie), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->stvec;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(stvec), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->sscratch;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(sscratch), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->sepc;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(sepc), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->scause;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(scause), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->sbadaddr;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(stval), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->mip;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(sip), &reg);
+if (ret) {
+return ret;
+}
+
+reg = env->satp;
+ret = kvm_set_one_reg(cs, RISCV_CSR_REG(satp), &reg);
+if (ret) {
+return ret;
+}
+
+return ret;
+}
+
+
 static int kvm_riscv_get_regs_fp(CPUState *cs)
 {
 int ret = 0;
@@ -181,6 +270,40 @@ static int kvm_riscv_get_regs_fp(CPUState *cs)
 return ret;
 }
 
+static int kvm_riscv_put_regs_fp(CPUState *cs)
+{
+int ret = 0;
+int i;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (riscv_has_ext(env, RVD)) {
+uint64_t reg;
+for (i = 0; i < 32; i++) {
+reg = env->fpr[i];
+ret = kvm_set_one_reg(cs, RISCV_FP_D_REG(i), &reg);
+if (ret) {
+return ret;
+}
+}
+return ret;
+}
+
+if (riscv_has_ext(env, RVF)) {
+uint32_t reg;
+for (i = 0; i < 32; i++) {
+reg = env->fpr[i];
+ret = kvm_set_one_reg(cs, RISCV_FP_F_REG(i), &reg);
+if (ret) {
+return ret;
+}
+}
+return ret;
+}
+
+return ret;
+}
+
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
@@ -209,7 +332,24 @@ int kvm_arch_get_registers(CPUState *cs)
 
 int kvm_arch_put_registers(CPUState *cs, int level)
 {
-return 0;
+int ret = 0;
+
+ret = kvm_riscv_put_regs_core(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_put_regs_csr(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_put_regs_fp(cs);
+if (ret) {
+return ret;
+}
+
+return ret;
 }
 
 int kvm_arch_release_virq_post(int virq)
-- 
2.19.1




[PATCH RFC v4 00/15] Add riscv kvm accel support

2020-12-03 Thread Yifei Jiang
This series adds both riscv32 and riscv64 KVM support, and implements
migration for RISC-V. It is based on the not-yet-accepted KVM series:
https://github.com/kvm-riscv/linux (latest version v15).

This series depends on the above pending changes, which haven't yet been
accepted, so this QEMU patch series is treated as RFC patches until
that dependency has been dealt with.

Compared to RFC v3, this series adds time scaling support. The
new feature also requires the following patches:
[1] Bugfix in kvm v15
https://lkml.org/lkml/2020/11/30/245
[2] kvm patches:
[PATCH RFC 0/3] Implement guest time scaling in RISC-V KVM

Several steps to use this:
1. Build emulation
$ ./configure --target-list=riscv64-softmmu
$ make -j$(nproc)

2. Build kernel
https://github.com/kvm-riscv/linux

3. Build QEMU VM
Cross built in riscv toolchain.
$ PKG_CONFIG_LIBDIR=
$ export PKG_CONFIG_SYSROOT_DIR=
$ ./configure --target-list=riscv64-softmmu --enable-kvm \
--cross-prefix=riscv64-linux-gnu- --disable-libiscsi --disable-glusterfs \
--disable-libusb --disable-usb-redir --audio-drv-list= --disable-opengl \
--disable-libxml2
$ make -j$(nproc)

4. Start emulation
$ ./qemu-system-riscv64 -M virt -m 4096M -cpu rv64,x-h=true -nographic \
-name guest=riscv-hyp,debug-threads=on \
-smp 4 \
-bios ./fw_jump.bin \
-kernel ./Image \
-drive file=./hyp.img,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 \
-append "root=/dev/vda rw console=ttyS0 earlycon=sbi"

5. Start kvm-acceled QEMU VM in emulation
$ ./qemu-system-riscv64 -M virt,accel=kvm -m 1024M -cpu host -nographic \
-name guest=riscv-guset \
-smp 2 \
-bios none \
-kernel ./Image \
-drive file=./guest.img,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 \
-append "root=/dev/vda rw console=ttyS0 earlycon=sbi"

Changes since RFC v3
- Rebase on QEMU v5.2.0-rc2 and kvm-riscv linux v15.
- Add time scaling support(New patches 13, 14 and 15).
- Fix the bug that guest vm can't reboot.

Changes since RFC v2
- Fix checkpatch error at target/riscv/sbi_ecall_interface.h.
- Add riscv migration support.

Changes since RFC v1
- Add separate SBI ecall interface header.
- Add riscv32 kvm accel support.

Yifei Jiang (15):
  linux-header: Update linux/kvm.h
  target/riscv: Add target/riscv/kvm.c to place the public kvm interface
  target/riscv: Implement function kvm_arch_init_vcpu
  target/riscv: Implement kvm_arch_get_registers
  target/riscv: Implement kvm_arch_put_registers
  target/riscv: Support start kernel directly by KVM
  hw/riscv: PLIC update external interrupt by KVM when kvm enabled
  target/riscv: Handle KVM_EXIT_RISCV_SBI exit
  target/riscv: Add host cpu type
  target/riscv: Add kvm_riscv_get/put_regs_timer
  target/riscv: Implement virtual time adjusting with vm state changing
  target/riscv: Support virtual time context synchronization
  target/riscv: Introduce dynamic time frequency for virt machine
  target/riscv: Synchronize vcpu's frequency with KVM
  target/riscv: Add time frequency migration support

 hw/intc/sifive_plic.c  |  31 +-
 hw/riscv/virt.c|  26 +-
 linux-headers/linux/kvm.h  |   8 +
 meson.build|   2 +
 target/riscv/cpu.c |  13 +
 target/riscv/cpu.h |  12 +
 target/riscv/kvm.c | 617 +
 target/riscv/kvm_riscv.h   |  25 ++
 target/riscv/machine.c |  23 ++
 target/riscv/meson.build   |   1 +
 target/riscv/sbi_ecall_interface.h |  72 
 11 files changed, 817 insertions(+), 13 deletions(-)
 create mode 100644 target/riscv/kvm.c
 create mode 100644 target/riscv/kvm_riscv.h
 create mode 100644 target/riscv/sbi_ecall_interface.h

-- 
2.19.1




[PATCH RFC v4 04/15] target/riscv: Implement kvm_arch_get_registers

2020-12-03 Thread Yifei Jiang
Get the GPR, CSR and FP registers from KVM via the KVM_GET_ONE_REG ioctl.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 150 -
 1 file changed, 149 insertions(+), 1 deletion(-)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 86660ba81b..e679619e79 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -50,13 +50,161 @@ static __u64 kvm_riscv_reg_id(__u64 type, __u64 idx)
 return id;
 }
 
+#define RISCV_CORE_REG(name)  kvm_riscv_reg_id(KVM_REG_RISCV_CORE, \
+ KVM_REG_RISCV_CORE_REG(name))
+
+#define RISCV_CSR_REG(name)  kvm_riscv_reg_id(KVM_REG_RISCV_CSR, \
+ KVM_REG_RISCV_CSR_REG(name))
+
+#define RISCV_FP_F_REG(idx)  kvm_riscv_reg_id(KVM_REG_RISCV_FP_F, idx)
+
+#define RISCV_FP_D_REG(idx)  kvm_riscv_reg_id(KVM_REG_RISCV_FP_D, idx)
+
+static int kvm_riscv_get_regs_core(CPUState *cs)
+{
+int ret = 0;
+int i;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+ret = kvm_get_one_reg(cs, RISCV_CORE_REG(regs.pc), &reg);
+if (ret) {
+return ret;
+}
+env->pc = reg;
+
+for (i = 1; i < 32; i++) {
+__u64 id = kvm_riscv_reg_id(KVM_REG_RISCV_CORE, i);
+ret = kvm_get_one_reg(cs, id, &reg);
+if (ret) {
+return ret;
+}
+env->gpr[i] = reg;
+}
+
+return ret;
+}
+
+static int kvm_riscv_get_regs_csr(CPUState *cs)
+{
+int ret = 0;
+target_ulong reg;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(sstatus), &reg);
+if (ret) {
+return ret;
+}
+env->mstatus = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(sie), &reg);
+if (ret) {
+return ret;
+}
+env->mie = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(stvec), &reg);
+if (ret) {
+return ret;
+}
+env->stvec = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(sscratch), &reg);
+if (ret) {
+return ret;
+}
+env->sscratch = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(sepc), &reg);
+if (ret) {
+return ret;
+}
+env->sepc = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(scause), &reg);
+if (ret) {
+return ret;
+}
+env->scause = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(stval), &reg);
+if (ret) {
+return ret;
+}
+env->sbadaddr = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(sip), &reg);
+if (ret) {
+return ret;
+}
+env->mip = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_CSR_REG(satp), &reg);
+if (ret) {
+return ret;
+}
+env->satp = reg;
+
+return ret;
+}
+
+static int kvm_riscv_get_regs_fp(CPUState *cs)
+{
+int ret = 0;
+int i;
+CPURISCVState *env = &RISCV_CPU(cs)->env;
+
+if (riscv_has_ext(env, RVD)) {
+uint64_t reg;
+for (i = 0; i < 32; i++) {
+ret = kvm_get_one_reg(cs, RISCV_FP_D_REG(i), &reg);
+if (ret) {
+return ret;
+}
+env->fpr[i] = reg;
+}
+return ret;
+}
+
+if (riscv_has_ext(env, RVF)) {
+uint32_t reg;
+for (i = 0; i < 32; i++) {
+ret = kvm_get_one_reg(cs, RISCV_FP_F_REG(i), &reg);
+if (ret) {
+return ret;
+}
+env->fpr[i] = reg;
+}
+return ret;
+}
+
+return ret;
+}
+
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
 };
 
 int kvm_arch_get_registers(CPUState *cs)
 {
-return 0;
+int ret = 0;
+
+ret = kvm_riscv_get_regs_core(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_get_regs_csr(cs);
+if (ret) {
+return ret;
+}
+
+ret = kvm_riscv_get_regs_fp(cs);
+if (ret) {
+return ret;
+}
+
+return ret;
 }
 
 int kvm_arch_put_registers(CPUState *cs, int level)
-- 
2.19.1




[PATCH RFC v4 11/15] target/riscv: Implement virtual time adjusting with vm state changing

2020-12-03 Thread Yifei Jiang
We want virtual time to follow VM state changes. When a VM
is stopped, guest virtual time should stop counting and the kvm_timer
should be stopped. When the VM is resumed, guest virtual time should
continue counting and the kvm_timer should be restored.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/kvm.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 79228eb726..1e16d24544 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -40,6 +40,7 @@
 #include "kvm_riscv.h"
 #include "sbi_ecall_interface.h"
 #include "chardev/char-fe.h"
+#include "sysemu/runstate.h"
 
 static __u64 kvm_riscv_reg_id(__u64 type, __u64 idx)
 {
@@ -448,6 +449,17 @@ unsigned long kvm_arch_vcpu_id(CPUState *cpu)
 return cpu->cpu_index;
 }
 
+static void kvm_riscv_vm_state_change(void *opaque, int running, RunState 
state)
+{
+CPUState *cs = opaque;
+
+if (running) {
+kvm_riscv_put_regs_timer(cs);
+} else {
+kvm_riscv_get_regs_timer(cs);
+}
+}
+
 void kvm_arch_init_irq_routing(KVMState *s)
 {
 }
@@ -460,6 +472,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
 CPURISCVState *env = &cpu->env;
 __u64 id;
 
+qemu_add_vm_change_state_handler(kvm_riscv_vm_state_change, cs);
+
 id = kvm_riscv_reg_id(KVM_REG_RISCV_CONFIG, KVM_REG_RISCV_CONFIG_REG(isa));
 ret = kvm_get_one_reg(cs, id, &isa);
 if (ret) {
-- 
2.19.1
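
The change-state handler above relies on kvm_riscv_get_regs_timer() and
kvm_riscv_put_regs_timer() from patch 10, which is not included in this
excerpt. Below is a minimal sketch of what such helpers could look like;
it is an assumption, not the code from patch 10: the RISCV_TIMER_REG()
macro is mirrored from its use in patch 14, the 'time' and 'compare'
timer registers are assumed to exist in the proposed KVM RISC-V API, and
the env->kvm_timer_* fields are hypothetical names for the saved state.

/* Hedged sketch only; the names called out above are assumptions. */
static int kvm_riscv_get_regs_timer(CPUState *cs)
{
    CPURISCVState *env = &RISCV_CPU(cs)->env;
    int ret;

    /* Snapshot the guest timer state when the VM stops. */
    ret = kvm_get_one_reg(cs, RISCV_TIMER_REG(time), &env->kvm_timer_time);
    if (ret) {
        return ret;
    }
    return kvm_get_one_reg(cs, RISCV_TIMER_REG(compare),
                           &env->kvm_timer_compare);
}

static int kvm_riscv_put_regs_timer(CPUState *cs)
{
    CPURISCVState *env = &RISCV_CPU(cs)->env;
    int ret;

    /* Restore the snapshot when the VM resumes, so guest virtual time
     * does not jump by the wall-clock time spent stopped. */
    ret = kvm_set_one_reg(cs, RISCV_TIMER_REG(time), &env->kvm_timer_time);
    if (ret) {
        return ret;
    }
    return kvm_set_one_reg(cs, RISCV_TIMER_REG(compare),
                           &env->kvm_timer_compare);
}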



