Re: [Qemu-devel] [PATCH 0/4] RFC: make chardev context switching safer

2019-02-20 Thread Peter Xu
On Wed, Feb 20, 2019 at 05:06:24PM +0100, Marc-André Lureau wrote:
> Hi,

Hi,

> 
> The chardev context switching code is a bit fragile, as it works as if
> the current context is properly synchronized with the new context.  It
> isn't so obvious to me that concurrent usage of chardev can't happen,
> as there might be various main loop sources being dispatched during
> the switch.
> 
> Worried about the situation, I wrote those patches a while ago, and I
> think they are still worth considering. I used to have a basic
> test, but it now conflicts a lot with recent changes. I would like to
> get some feedback about the series before I rewrite it.
> 
> The important patch is "chardev: make qemu_chr_fe_set_handlers()
> context switching safer". It works by "freezing" the given contexts
> while the chardev will "move" (recreate to the new context) the
> various sources it owns. This looks quite ugly to me overall, but still
> safer than today.
> 
> This should allow to simplify the scary code from "monitor: set the
> chardev context from the main context/thread".

Indeed it's at least not that friendly to readers... so out of
curiosity - is this the only reason for this series?

> 
> Finally, "char-socket: restart the reconnect timer to switch context"
> shows that we have chardev backends that do not switch fully yet.

This seems irrelevant to the series, or am I wrong?

Thanks,

-- 
Peter Xu



Re: [Qemu-devel] [PATCH v2 2/3] vfio/display: add xres + yres properties

2019-02-20 Thread Gerd Hoffmann
  Hi,

> > +DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
> > +DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
> 
> This is actually quite fun, I started my VM with arbitrary numbers and
> the Windows GUI honored it every time.  Probably very useful for
> playing with odd screen sizes.  I also tried to break it using
> 100x100, but the display came up as 1920x1200, the maximum
> resolution GVT-g supports for this type.  I don't see that QEMU is
> bounding this though, do we depend on the mdev device to ignore it if
> we pass values it cannot support?

There is a check in vfio_display_edid_update().

cheers,
  Gerd




Re: [Qemu-devel] m68k gdb has stopped single stepping correctly

2019-02-20 Thread Lucien Anti-Spam via Qemu-devel
 Hi Thomas,
> did you ever send the patch? I can't find it on the mailing list, and I think
> this bug is still pending?

Yes, I have patches but was waiting for my first one to be successful in order
to make sure I got the process down.  I will finish that shortly.

Cheers
Luc
   
 On 24/01/2019 14.38, Alex Bennée wrote:
> 
> Lucien Anti-Spam via Qemu-devel  writes:
> 
>> On Thursday, January 24, 2019, 3:08:07 AM GMT+9, Emilio G. Cota wrote:
>>> On Wed, Jan 23, 2019 at 15:58:27 +, Lucien Anti-Spam via Qemu-devel wrote:
>>>> Hi folks,
>>>> I noticed that with the 3.x release the GDB options (-S -s) for certain
>>>> CPUs result in very weird stepping. It usually stops after a few steps;
>>>> whilst the stub continues responding, the PC doesn't update. However, I
>>>> have only looked deeply at the m68k.
>>>> In the case of the m68k the SR gets the trace bit set (T=10b), and the PC
>>>> doesn't update. The m68k gdbstub and the main gdbstub seem mostly
>>>> unchanged, but it seems the INSN handling has changed greatly for the m68k.
>>>> Does anyone have any ideas what happened?
>>>
>>> Can you please bisect to find at which point things start misbehaving?
>>>
>>> Thanks,
>>>  Emilio
>> Understood, I was hoping my original post might jog someone's memory about 
>> the issue.
>> Apparently not, so after some digging I found that it was introduced with 
>> the refactor to TranslatorOps, specifically two lines got dropped that 
>> update the PC if single-stepping is being performed ( commit 
>> 11ab74b01e0a8ea4973eed89c6b90fa6e4fb9fb6 )
>> Since it's not valid to revert, shall I go ahead and submit a patch for
>> these two lines?
> 
> Yes please!

 Hi Lucien,

did you ever send the patch? I can't find it on the mailing list, and I
think this bug is still pending?

 Thomas
  


Re: [Qemu-devel] [PATCH v2 1/3] vfio/display: add edid support.

2019-02-20 Thread Gerd Hoffmann
On Wed, Feb 20, 2019 at 02:54:35PM -0700, Alex Williamson wrote:
> On Wed, 20 Feb 2019 09:47:51 +0100
> Gerd Hoffmann  wrote:
> 
> > This patch adds EDID support to the vfio display (aka vgpu) code.
> > When supported by the mdev driver qemu will generate an EDID blob
> > and pass it on using the new vfio edid region.  The EDID blob will
> > be updated on UI changes (i.e. window resize), so the guest can
> > adapt.
> 
> What are the requirements to enable this resizing feature?  I grabbed
> the gvt-next-2019-02-01 branch and my ever expanding qemu:commandline
> now looks like this:
> 
>   
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>   
> 
> Other relevant sections:
> 
> 
>   
>   
> 

When using spice you also need the spicevmc channel and the spice agent
being installed and active in the guest.

> > +dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_UP;
> > +pwrite_field(fd, dpy->edid_info, dpy->edid_regs, link_state);
> > +trace_vfio_display_edid_link_up();
> > +return;
> > +
> > +err:
> > +trace_vfio_display_edid_write_error();
> > +return;
> 
> nit, no unwind and only one call point, could probably do without the
> goto.

Not that easily due to the goto being hidden in the pwrite_field()
macro.

> > +trace_vfio_display_edid_available();
> > +dpy->edid_regs = g_new0(struct vfio_region_gfx_edid, 1);
> > +pread_field(fd, dpy->edid_info, dpy->edid_regs, edid_offset);
> > +pread_field(fd, dpy->edid_info, dpy->edid_regs, edid_max_size);
> > +pread_field(fd, dpy->edid_info, dpy->edid_regs, max_xres);
> > +pread_field(fd, dpy->edid_info, dpy->edid_regs, max_yres);
> > +dpy->edid_blob = g_malloc0(dpy->edid_regs->edid_max_size);
> > +
> > +vfio_display_edid_update(vdev, true, 0, 0);
> > +return;
> > +
> > +err:
> > +fprintf(stderr, "%s: Oops, pread error\n", __func__);
> > +g_free(dpy->edid_regs);
> > +dpy->edid_regs = NULL;
> > +return;
> 
> This code is unreachable.

It's not.  Again, the goto is in pread_field.

But I just noticed I missed one fprintf which should be a
trace_vfio_display_edid_write_error() ...

cheers,
  Gerd




Re: [Qemu-devel] [PATCH v3 0/5] Add migration support for VFIO device

2019-02-20 Thread Neo Jia
On Thu, Feb 21, 2019 at 05:52:53AM +, Tian, Kevin wrote:
> > From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> > Sent: Thursday, February 21, 2019 1:25 PM
> > 
> > On 2/20/2019 3:52 PM, Dr. David Alan Gilbert wrote:
> > > * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> > >> Add migration support for VFIO device
> > >
> > > Hi Kirti,
> > >   Can you explain how this compares and works with Yan Zhao's
> > > set?
> > 
> > This patch set is incremental version of my previous patch set:
> > https://patchwork.ozlabs.org/cover/1000719/
> > This takes care of the feedbacks received on previous version.
> > 
> > This patch set is different than Yan Zhao's set.
> > 
> 
> I can help give some background about Yan's work:
> 
> There was a big gap between Kirti's last version and the overall review
> comments, especially this one:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg576652.html

Hi Kevin,

> 
> Then there was no reply from Kirti whether she agreed with the comments
> and was working on a new version.

Sorry, we should have acked those comments when we received them last time.

> 
> Then we think we should jump in to keep the ball moving, based on
> a fresh implementation according to recommended direction, i.e. focusing
> on device state management instead of sticking to migration flow in kernel
> API design.
> 
> and also more importantly we provided kernel side implementation based
> on Intel GVT-g to give the whole picture of both user/kernel side changes.
> That should give people a better understanding of how those new APIs
> are expected to be used by Qemu, and to be implemented by vendor driver.
> 
> That is why Yan just shared her work.

Really glad to see the v2 version works for you guys, and we appreciate the
driver side changes.

> 
> Now it's great to see that Kirti is still actively working on this effort and is
> also moving toward the right direction. Let's have a close look at two
> implementations and then choose a cleaner one as base for future
> enhancements. :-)

Yes, the v3 has addressed all the comments / concerns raised in the v2, I think
we should take a look and keep moving.

Just a quick thought - would it be possible / better to have Kirti focus on the
QEMU patches and Yan take care of the GVT-g kernel driver side changes? This will
give us the best testing coverage. Hope I don't step on anybody's toes here. ;-)

Thanks,
Neo

> 
> Thanks
> Kevin



[Qemu-devel] [PATCH v2] ui/gtk: Fix the license information

2019-02-20 Thread Thomas Huth
The license information in this file is very messy. A short note at
the beginning says GPL first, but the long boilerplate code then
talks about "GNU Lesser General Public License version 2.0". First,
there is no such version of the "GNU Lesser GPL", it only started with
version 2.1. In version 2.0, it was still called "GNU Library GPL"
instead. Second, you can easily get the license of this file wrong
if you only quickly glance at the long boilerplate code.

Anyway, looking at the text of the LGPL (see COPYING.LIB in the top
directory), the license clearly states in section "3." that one should
rather replace the license information with the GPL information in
such a case of a mixture instead. Thus let's clean up the confusing
statements and use the proper GPL text only.

Signed-off-by: Thomas Huth 
---
 v2: Move the boilerplate code to the top

 ui/gtk.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 949b143..ff505ae 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -6,29 +6,25 @@
  * Authors:
  *  Anthony Liguori   
  *
- * This work is licensed under the terms of the GNU GPL, version 2 or later.
- * See the COPYING file in the top-level directory.
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
  *
- * Portions from gtk-vnc:
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ *
+ * Portions from gtk-vnc (originally licensed under the LGPL v2):
  *
  * GTK VNC Widget
  *
  * Copyright (C) 2006  Anthony Liguori 
  * Copyright (C) 2009-2010 Daniel P. Berrange 
- *
- * This library is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.0 of the License, or (at your option) any later version.
- *
- * This library is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with this library; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301 USA
  */
 
 #define GETTEXT_PACKAGE "qemu"
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] virtio-net: do not start queues that are not enabled by the guest

2019-02-20 Thread Jason Wang



On 2019/2/21 2:00 PM, Yuri Benditovich wrote:
> On Tue, Feb 19, 2019 at 8:27 AM Jason Wang  wrote:
>>
>> On 2019/2/19 7:34 AM, Michael S. Tsirkin wrote:
>>> On Mon, Feb 18, 2019 at 10:49:08PM +0200, Yuri Benditovich wrote:
>>>> On Mon, Feb 18, 2019 at 6:39 PM Michael S. Tsirkin  wrote:
>>>>> On Mon, Feb 18, 2019 at 11:58:51AM +0200, Yuri Benditovich wrote:
>>>>>> On Mon, Feb 18, 2019 at 5:49 AM Jason Wang  wrote:
>>>>>>> On 2019/2/13 10:51 PM, Yuri Benditovich wrote:
>>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1608226
>>>>>>>> On startup/link-up in multiqueue configuration the virtio-net
>>>>>>>> tries to start all the queues, including those that the guest
>>>>>>>> will not enable by VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET.
>>>>>>>> If the guest driver does not allocate queues that it will not
>>>>>>>> use (for example, Windows driver does not) and number of actually
>>>>>>>> used queues is less than maximal number supported by the device,
>>>>>>>
>>>>>>> Is this a requirement of e.g NDIS? If not, could we simply allocate all
>>>>>>> queues in this case. This is usually what normal Linux driver did.
>>>>>>>
>>>>>>>> this causes vhost_net_start to fail and actually disables vhost
>>>>>>>> for all the queues, reducing the performance.
>>>>>>>> Current commit fixes this: initially only first queue is started,
>>>>>>>> upon VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET started all the queues
>>>>>>>> requested by the guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Yuri Benditovich 
>>>>>>>> ---
>>>>>>>>  hw/net/virtio-net.c | 7 +--
>>>>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>>>>>>> index 3f319ef723..d3b1ac6d3a 100644
>>>>>>>> --- a/hw/net/virtio-net.c
>>>>>>>> +++ b/hw/net/virtio-net.c
>>>>>>>> @@ -174,7 +174,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>>>>>>>>  {
>>>>>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
>>>>>>>>      NetClientState *nc = qemu_get_queue(n->nic);
>>>>>>>> -    int queues = n->multiqueue ? n->max_queues : 1;
>>>>>>>> +    int queues = n->multiqueue ? n->curr_queues : 1;
>>>>>>>>
>>>>>>>>      if (!get_vhost_net(nc->peer)) {
>>>>>>>>          return;
>>>>>>>> @@ -1016,9 +1016,12 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
>>>>>>>>          return VIRTIO_NET_ERR;
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -    n->curr_queues = queues;
>>>>>>>>      /* stop the backend before changing the number of queues to avoid handling a
>>>>>>>>       * disabled queue */
>>>>>>>> +    virtio_net_set_status(vdev, 0);
>>>>>>>
>>>>>>> Any reason for doing this?
>>>>>>
>>>>>> I think there are 2 reasons:
>>>>>> 1. The spec does not require guest SW to allocate unused queues.
>>>>>> 2. We spend guest's physical memory to just make vhost happy when it
>>>>>> touches queues that it should not use.
>>>>>>
>>>>>> Thanks,
>>>>>> Yuri Benditovich
>>>>>
>>>>> The spec also says:
>>>>>   queue_enable The driver uses this to selectively prevent the device
>>>>>   from executing requests from this virtqueue. 1 - enabled; 0 - disabled.
>>>>>
>>>>> While this is not a conformance clause this strongly implies that
>>>>> queues which are not enabled are never accessed by device.
>>>>>
>>>>> Yuri I am guessing you are not enabling these unused queues right?
>>>>
>>>> Of course, we (Windows driver) do not.
>>>> The code of virtio-net passes max_queues to vhost and this causes
>>>> vhost to try accessing all the queues, fail on unused ones and finally
>>>> leave vhost disabled at all.
>>>
>>> Jason, at least for 1.0 accessing disabled queues looks like a spec
>>> violation. What do you think?
>>
>> Yes, but there's some issues:
>>
>> - How to detect a disabled queue for 0.9x device? Looks like there's no
>> way according to the spec, so device must assume all queues was enabled.
>
> Can you please add several words - what is 0.9 device (probably this
> is more about driver) and
> what is the problem with it?

It's not a net specific issue. A 0.9x device is the legacy device defined in
the spec. We don't have a method to disable and enable a specific queue
at that time. Michael said we can assume queue address 0 as disabled,
but there's still a question of how to enable it. Spec is unclear and it
was too late to add things for legacy devices. For 1.0 devices we have
queue_enable, but its implementation is incomplete: since it can't work
with vhost correctly, we probably need to add something to make it work.

>> - For 1.0, if we depends on queue_enable, we should implement the
>> callback for vhost I think. Otherwise it's still buggy.
>>
>> So it looks tricky to enable and disable queues through set status
>
> If I manage to modify the patch in such a way that it acts only in the
> 'target' case, i.e. only if some of the queues are not initialized (at the
> time of driver_ok), will it be safer?

For 1.0 device, we can fix the queue_enable, but for 0.9x device how do
you enable one specific queue in this case? (setting status?)

A fundamental question is what prevents you from just initializing all
queues during driver start? It looks to me like this saves a lot of effort
compared to allocating queues dynamically.

Thanks

>> Thanks
>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>> +
>>>>>>>> +    n->curr_queues = queues;
>>>>>>>> +
>>>>>>>>      virtio_net_set_status(vdev, vdev->status);
>>>>>>>>      virtio_net_set_queues(n);





Re: [Qemu-devel] [PATCH v6 01/18] update-linux-headers.sh: Copy new headers

2019-02-20 Thread Alexey Kardashevskiy



On 15/02/2019 03:36, Peter Maydell wrote:
> On Tue, 5 Feb 2019 at 17:33, Eric Auger  wrote:
>>
>> From: Alexey Kardashevskiy 
>>
>> Since Linux'es ab66dcc76d "powerpc: generate uapi header and system call
>> table files" there are 2 new files: unistd_32.h and unistd_64.h. These
>> files content is moved from unistd.h so now we have to copy new files
>> as well, just like we already do for other architectures; this does it
>> for MIPS as well.
>>
>> Also, v5.0-rc2 moved vhost bits around in 4b86713236e4bd
>> "vhost: split structs into a separate header file", add those too.
>>
>> Signed-off-by: Alexey Kardashevskiy 
> 
> I think this fix is handled by commit a0a6ef91a4a4edde27
> (now in master), yes ?

uff, just noticed this mail. yes, it is done by a0a6ef91a4a4edde27. Thanks,


-- 
Alexey



Re: [Qemu-devel] [PATCH v2 0/3] PCDIMM cleanup

2019-02-20 Thread Wei Yang
On Thu, Feb 21, 2019 at 02:03:19PM +0800, Xiao Guangrong wrote:
>
>
>On 2/20/19 8:51 AM, Wei Yang wrote:
>> Three trivial cleanup for pc-dimm.
>> 
>> Patch [1] remove the check on class->hotpluggable since pc-dimm is always
>> hotpluggable.
>> Patch [2] remove nvdimm_realize
>> Patch [2] remove pcdimm realize-callback
>> 
>> v2:
>>* fix warning in Patch 1
>>* split Patch 2 into two
>> 
>> Wei Yang (3):
>>pc-dimm: remove check on pc-dimm hotpluggable
>>mem/nvdimm: remove nvdimm_realize
>
>>pc-dimm: revert "introduce realize callback"
>
>I think the word 'revert' is not so precise, as it hints that
>the commit is buggy; instead, it was factored in the later
>comments and has become useless now.
>

You are right. It is always difficult for me to pick up the proper word.

>Anyway, this patchset looks good to me.
>
>Reviewed-by: Xiao Guangrong 

Thanks, Xiao.

-- 
Wei Yang
Help you, Help me



Re: [Qemu-devel] [PATCH v2 0/3] PCDIMM cleanup

2019-02-20 Thread Xiao Guangrong




On 2/20/19 8:51 AM, Wei Yang wrote:
> Three trivial cleanup for pc-dimm.
>
> Patch [1] remove the check on class->hotpluggable since pc-dimm is always
> hotpluggable.
> Patch [2] remove nvdimm_realize
> Patch [3] remove pcdimm realize-callback
>
> v2:
>    * fix warning in Patch 1
>    * split Patch 2 into two
>
> Wei Yang (3):
>    pc-dimm: remove check on pc-dimm hotpluggable
>    mem/nvdimm: remove nvdimm_realize
>    pc-dimm: revert "introduce realize callback"

I think the word 'revert' is not so precise, as it hints that
the commit is buggy; instead, it was factored in the later
comments and has become useless now.

Anyway, this patchset looks good to me.

Reviewed-by: Xiao Guangrong 



Re: [Qemu-devel] [PATCH] virtio-net: do not start queues that are not enabled by the guest

2019-02-20 Thread Yuri Benditovich
On Tue, Feb 19, 2019 at 8:27 AM Jason Wang  wrote:
>
>
> On 2019/2/19 7:34 AM, Michael S. Tsirkin wrote:
> > On Mon, Feb 18, 2019 at 10:49:08PM +0200, Yuri Benditovich wrote:
> >> On Mon, Feb 18, 2019 at 6:39 PM Michael S. Tsirkin  wrote:
> >>> On Mon, Feb 18, 2019 at 11:58:51AM +0200, Yuri Benditovich wrote:
> >>>> On Mon, Feb 18, 2019 at 5:49 AM Jason Wang  wrote:
> >>>>>
> >>>>> On 2019/2/13 10:51 PM, Yuri Benditovich wrote:
> >>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1608226
> >>>>>> On startup/link-up in multiqueue configuration the virtio-net
> >>>>>> tries to start all the queues, including those that the guest
> >>>>>> will not enable by VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET.
> >>>>>> If the guest driver does not allocate queues that it will not
> >>>>>> use (for example, Windows driver does not) and number of actually
> >>>>>> used queues is less than maximal number supported by the device,
> >>>>>
> >>>>> Is this a requirement of e.g NDIS? If not, could we simply allocate all
> >>>>> queues in this case. This is usually what normal Linux driver did.
> >>>>>
> >>>>>
> >>>>>> this causes vhost_net_start to fail and actually disables vhost
> >>>>>> for all the queues, reducing the performance.
> >>>>>> Current commit fixes this: initially only first queue is started,
> >>>>>> upon VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET started all the queues
> >>>>>> requested by the guest.
> >>>>>>
> >>>>>> Signed-off-by: Yuri Benditovich 
> >>>>>> ---
> >>>>>>  hw/net/virtio-net.c | 7 +--
> >>>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>>>> index 3f319ef723..d3b1ac6d3a 100644
> >>>>>> --- a/hw/net/virtio-net.c
> >>>>>> +++ b/hw/net/virtio-net.c
> >>>>>> @@ -174,7 +174,7 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
> >>>>>>  {
> >>>>>>      VirtIODevice *vdev = VIRTIO_DEVICE(n);
> >>>>>>      NetClientState *nc = qemu_get_queue(n->nic);
> >>>>>> -    int queues = n->multiqueue ? n->max_queues : 1;
> >>>>>> +    int queues = n->multiqueue ? n->curr_queues : 1;
> >>>>>>
> >>>>>>      if (!get_vhost_net(nc->peer)) {
> >>>>>>          return;
> >>>>>> @@ -1016,9 +1016,12 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
> >>>>>>          return VIRTIO_NET_ERR;
> >>>>>>      }
> >>>>>>
> >>>>>> -    n->curr_queues = queues;
> >>>>>>      /* stop the backend before changing the number of queues to avoid handling a
> >>>>>>       * disabled queue */
> >>>>>> +    virtio_net_set_status(vdev, 0);
> >>>>>
> >>>>> Any reason for doing this?
> >>>> I think there are 2 reasons:
> >>>> 1. The spec does not require guest SW to allocate unused queues.
> >>>> 2. We spend guest's physical memory to just make vhost happy when it
> >>>> touches queues that it should not use.
> >>>>
> >>>> Thanks,
> >>>> Yuri Benditovich
> >>> The spec also says:
> >>>  queue_enable The driver uses this to selectively prevent the 
> >>> device from executing requests from this
> >>>  virtqueue. 1 - enabled; 0 - disabled.
> >>>
> >>> While this is not a conformance clause this strongly implies that
> >>> queues which are not enabled are never accessed by device.
> >>>
> >>> Yuri I am guessing you are not enabling these unused queues right?
> >> Of course, we (Windows driver) do not.
> >> The code of virtio-net passes max_queues to vhost and this causes
> >> vhost to try accessing all the queues, fail on unused ones and finally
> >> leave vhost disabled at all.
> >
> > Jason, at least for 1.0 accessing disabled queues looks like a spec
> > violation. What do you think?
>
>
> Yes, but there's some issues:
>
> - How to detect a disabled queue for 0.9x device? Looks like there's no
> way according to the spec, so device must assume all queues was enabled.

Can you please add several words - what is 0.9 device (probably this
is more about driver) and
what is the problem with it?

>
> - For 1.0, if we depends on queue_enable, we should implement the
> callback for vhost I think. Otherwise it's still buggy.
>
> So it looks tricky to enable and disable queues through set status

If I manage to modify the patch in such a way that it acts only in the
'target' case, i.e. only if some of the queues are not initialized (at the
time of driver_ok), will it be safer?

>
> Thanks
>
>
> >
> >>>
> >>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >>>>>> +
> >>>>>> +    n->curr_queues = queues;
> >>>>>> +
> >>>>>>      virtio_net_set_status(vdev, vdev->status);
> >>>>>>      virtio_net_set_queues(n);
> >>>>>>
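
To make the ordering in the quoted patch easier to follow, here is a toy model of it. This is only a sketch: struct demo_net and the demo_* functions are hypothetical stand-ins, not QEMU code, and it assumes curr_queues starts at 1, in line with the commit message's "initially only first queue is started". It models two things from the patch: the backend is started with curr_queues rather than max_queues, and on a queue-pairs request the backend is stopped before curr_queues changes and restarted afterwards.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for VirtIONet; not the QEMU struct. */
struct demo_net {
    bool multiqueue;
    int max_queues;       /* queue pairs the device offers */
    int curr_queues;      /* queue pairs the guest has asked for */
    int backend_started;  /* queue pairs currently handed to the backend */
};

/* Mirrors the patched expression: use curr_queues, not max_queues. */
int demo_queues_to_start(const struct demo_net *n)
{
    return n->multiqueue ? n->curr_queues : 1;
}

/* Start (running == true) or stop (running == false) the backend. */
void demo_set_status(struct demo_net *n, bool running)
{
    n->backend_started = running ? demo_queues_to_start(n) : 0;
}

/*
 * Model of a queue-pairs request from the guest.
 * Returns 0 on success, -1 if the request is rejected.
 */
int demo_handle_mq(struct demo_net *n, int queues)
{
    if (!n->multiqueue || queues < 1 || queues > n->max_queues) {
        return -1;
    }
    demo_set_status(n, false);   /* stop while the old count is in effect */
    n->curr_queues = queues;     /* only now change the count */
    demo_set_status(n, true);    /* restart with the new count */
    return 0;
}
```

With max_queues = 8 and curr_queues = 1, the first start hands one queue pair to the backend; a request for 4 pairs stops the backend, updates the count and restarts with 4. The model says nothing about the open questions in the thread (legacy 0.9x devices, or a queue_enable callback for vhost).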



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)
> >
> > > -Original Message-
> > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > Sent: Thursday, February 21, 2019 10:05 AM
> > > To: Gonglei (Arei) 
> > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> dgilb...@redhat.com;
> > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> ken@amd.com;
> > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > k...@vger.kernel.org
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > > >
> > > > > > > 5) About log sync, why not register log_global_start/stop in
> > > > > > > vfio_memory_listener?
> > > > > > >
> > > > > > >
> > > > > > seems log_global_start/stop cannot be iteratively called in pre-copy
> > > > > > phase?
> > > > > > for dirty pages in system memory, it's better to transfer dirty data
> > > > > > iteratively to reduce down time, right?
> > > > > >
> > > > >
> > > > > We just need invoking only once for start and stop logging. Why we need
> > > > > to call them iteratively? See memory_listener of vhost.
> > > > >
> > > > the dirty pages in system memory produced by the device are incremental.
> > > > if they can be got iteratively, the dirty pages in stop-and-copy phase can
> > > > be minimal.
> > > > :)
> > > >
> > > I mean starting or stopping the capability of logging, not log sync.
> > >
> > > We register the below callbacks:
> > >
> > > .log_sync = vfio_log_sync,
> > > .log_global_start = vfio_log_global_start,
> > > .log_global_stop = vfio_log_global_stop,
> > >
> .log_global_start is also a good point to notify logging state.
> But if notifying in .save_setup handler, we can do fine-grained
> control of when to notify of logging starting together with get_buffer
> operation.
> Is there any special benefit by registering to .log_global_start/stop?
>
> 

Performance benefit when one VM has multiple same vfio devices.


Regards,
-Gonglei
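
The distinction drawn above, between switching logging on and off once and syncing the dirty log repeatedly, can be sketched with a toy listener. The struct and function names below are hypothetical and only borrow the callback names from the thread; this is not QEMU's MemoryListener API, and the counters exist purely to make the call pattern visible.

```c
#include <stddef.h>

/* Hypothetical listener; only the callback names come from the thread. */
struct demo_listener {
    void (*log_global_start)(struct demo_listener *l);
    void (*log_sync)(struct demo_listener *l);
    void (*log_global_stop)(struct demo_listener *l);
    int logging;      /* 1 while dirty logging is enabled */
    int start_calls;
    int sync_calls;
    int stop_calls;
};

static void demo_start(struct demo_listener *l)
{
    l->logging = 1;
    l->start_calls++;
}

static void demo_sync(struct demo_listener *l)
{
    if (l->logging) {
        l->sync_calls++;   /* a real device would report dirty pages here */
    }
}

static void demo_stop(struct demo_listener *l)
{
    l->logging = 0;
    l->stop_calls++;
}

void demo_listener_init(struct demo_listener *l)
{
    l->log_global_start = demo_start;
    l->log_sync = demo_sync;
    l->log_global_stop = demo_stop;
    l->logging = 0;
    l->start_calls = 0;
    l->sync_calls = 0;
    l->stop_calls = 0;
}

/* One migration: enable logging once, sync per round, disable once. */
void demo_migrate(struct demo_listener *l, int precopy_rounds)
{
    l->log_global_start(l);
    for (int i = 0; i < precopy_rounds; i++) {
        l->log_sync(l);            /* iterative pre-copy syncs */
    }
    l->log_sync(l);                /* final sync in stop-and-copy */
    l->log_global_stop(l);
}
```

With three pre-copy rounds the start and stop callbacks each run once while the sync callback runs four times, which is the sense in which start/stop control the capability and log_sync does the iterative work.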



Re: [Qemu-devel] [PATCH v3 0/5] Add migration support for VFIO device

2019-02-20 Thread Tian, Kevin
> From: Kirti Wankhede [mailto:kwankh...@nvidia.com]
> Sent: Thursday, February 21, 2019 1:25 PM
> 
> On 2/20/2019 3:52 PM, Dr. David Alan Gilbert wrote:
> > * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> >> Add migration support for VFIO device
> >
> > Hi Kirti,
> >   Can you explain how this compares and works with Yan Zhao's
> > set?
> 
> This patch set is incremental version of my previous patch set:
> https://patchwork.ozlabs.org/cover/1000719/
> This takes care of the feedbacks received on previous version.
> 
> This patch set is different than Yan Zhao's set.
> 

I can help give some background about Yan's work:

There was a big gap between Kirti's last version and the overall review
comments, especially this one:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg576652.html

Then there was no reply from Kirti whether she agreed with the comments
and was working on a new version.

Then we think we should jump in to keep the ball moving, based on
a fresh implementation according to recommended direction, i.e. focusing
on device state management instead of sticking to migration flow in kernel
API design.

and also more importantly we provided kernel side implementation based
on Intel GVT-g to give the whole picture of both user/kernel side changes.
That should give people a better understanding of how those new APIs
are expected to be used by Qemu, and to be implemented by vendor driver.

That is why Yan just shared her work.

Now it's great to see that Kirti is still actively working on this effort and is
also moving toward the right direction. Let's have a close look at two
implementations and then choose a cleaner one as base for future
enhancements. :-)

Thanks
Kevin


Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)







> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 12:08 PM
> To: Gonglei (Arei) 
> Cc: c...@nvidia.com; k...@vger.kernel.org; a...@ozlabs.ru;
> zhengxiao...@alibaba-inc.com; shuangtai@alibaba-inc.com;
> qemu-devel@nongnu.org; kwankh...@nvidia.com; eau...@redhat.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> mlevi...@redhat.com; pa...@linux.ibm.com; fel...@nutanix.com;
> ken@amd.com; kevin.t...@intel.com; dgilb...@redhat.com;
> alex.william...@redhat.com; intel-gvt-...@lists.freedesktop.org;
> changpeng@intel.com; coh...@redhat.com; zhi.a.w...@intel.com;
> jonathan.dav...@nutanix.com
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Thu, Feb 21, 2019 at 03:33:24AM +, Gonglei (Arei) wrote:
> >
> > > -Original Message-
> > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > Sent: Thursday, February 21, 2019 9:59 AM
> > > To: Gonglei (Arei) 
> > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> dgilb...@redhat.com;
> > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> ken@amd.com;
> > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > k...@vger.kernel.org
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > On Thu, Feb 21, 2019 at 01:35:43AM +, Gonglei (Arei) wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > > > Sent: Thursday, February 21, 2019 8:25 AM
> > > > > To: Gonglei (Arei) 
> > > > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> > > dgilb...@redhat.com;
> > > > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> > > ken@amd.com;
> > > > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > > > k...@vger.kernel.org
> > > > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > > > >
> > > > > On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > > > > > Hi yan,
> > > > > >
> > > > > > Thanks for your work.
> > > > > >
> > > > > > I have some suggestions or questions:
> > > > > >
> > > > > > > 1) Would you add msix mode support? if not, pls add a check in
> > > > > > > vfio_pci_save_config(), likes Nvidia's solution.
> > > > > > ok.
> > > > > >
> > > > > > > 2) We should start vfio devices before vcpu resumes, so we can't
> > > > > > > rely on vm start change handler completely.
> > > > > > vfio devices is by default set to running state.
> > > > > > In the target machine, its state transition flow is
> > > > > > running->stop->running.
> > > > >
> > > > > That's confusing. We should start vfio devices after vfio_load_state,
> > > > > otherwise how can you keep the devices' information are the same between
> > > > > source side and destination side?
> > > > >
> > > > so, your meaning is to set device state to running in the first call to
> > > > vfio_load_state?
> > >
> > No, it should start devices after vfio_load_state and before vcpu resuming.
> >
> 
> What about set device state to running in load_cleanup handler ?
> 

The timing is fine, but you should also think about whether the device state
should be set to running in failure branches when the load_cleanup handler is
called.

Regards,
-Gonglei
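
The ordering under discussion (device stopped while state is loaded, set to running in a cleanup step, and only then vCPUs resumed) can be written down as a small state model. Everything here is hypothetical: the demo_* names are invented, and leaving the device stopped when loading failed is just one possible answer to the question raised above, chosen here as an assumption and not taken from either patch set.

```c
/* Hypothetical device states for the destination side of a migration. */
enum demo_dev_state { DEMO_DEV_STOPPED, DEMO_DEV_RUNNING };

struct demo_dev {
    enum demo_dev_state state;
    int state_loaded;   /* 1 once device state has been restored */
};

/* Restore device state; "ok" simulates success (1) or failure (0). */
int demo_load_state(struct demo_dev *d, int ok)
{
    d->state = DEMO_DEV_STOPPED;   /* running -> stop before loading */
    if (!ok) {
        return -1;
    }
    d->state_loaded = 1;
    return 0;
}

/*
 * Cleanup step after loading. Assumption of this sketch: the device is
 * switched to running only if loading succeeded, so a failure branch
 * leaves it stopped.
 */
void demo_load_cleanup(struct demo_dev *d, int load_ret)
{
    if (load_ret == 0 && d->state_loaded) {
        d->state = DEMO_DEV_RUNNING;   /* stop -> running */
    }
}

/* Returns 1 if vCPUs may resume, i.e. the device is already running. */
int demo_incoming(struct demo_dev *d, int ok)
{
    int ret = demo_load_state(d, ok);

    demo_load_cleanup(d, ret);
    return ret == 0 && d->state == DEMO_DEV_RUNNING;
}
```

On the success path the device goes running, stopped, running and the vCPUs may resume afterwards; on the failure path it stays stopped and the function reports that resuming is not allowed.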



Re: [Qemu-devel] [PATCH v3 0/5] Add migration support for VFIO device

2019-02-20 Thread Kirti Wankhede



On 2/20/2019 3:52 PM, Dr. David Alan Gilbert wrote:
> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
>> Add migration support for VFIO device
> 
> Hi Kirti,
>   Can you explain how this compares and works with Yan Zhao's
> set?

This patch set is an incremental version of my previous patch set:
https://patchwork.ozlabs.org/cover/1000719/
It addresses the feedback received on the previous version.

This patch set is different from Yan Zhao's set.

Thanks,
Kirti

>   These look like two incompatible solutions to me - if that's
> the case we need to take a step back and figure out how to combine
> them into one.
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
> 



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Zhao Yan
On Thu, Feb 21, 2019 at 03:16:45AM +, Gonglei (Arei) wrote:
> 
> 
> 
> > -Original Message-
> > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > Sent: Thursday, February 21, 2019 10:05 AM
> > To: Gonglei (Arei) 
> > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > k...@vger.kernel.org
> > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > 
> > > >
> > > > > 5) About log sync, why not register log_global_start/stop in
> > > > vfio_memory_listener?
> > > > >
> > > > >
> > > > seems log_global_start/stop cannot be iterately called in pre-copy 
> > > > phase?
> > > > for dirty pages in system memory, it's better to transfer dirty data
> > > > iteratively to reduce down time, right?
> > > >
> > >
> > > We just need invoking only once for start and stop logging. Why we need to
> > call
> > > them literately? See memory_listener of vhost.
> > >
> > the dirty pages in system memory produces by device is incremental.
> > if it can be got iteratively, the dirty pages in stop-and-copy phase can be
> > minimal.
> > :)
> > 
> I mean starting or stopping the capability of logging, not log sync. 
> 
> We register the below callbacks:
> 
> .log_sync = vfio_log_sync,
> .log_global_start = vfio_log_global_start,
> .log_global_stop = vfio_log_global_stop,
>
.log_global_start is also a good point at which to notify the logging state.
But if we notify in the .save_setup handler, we can control, at a finer
grain, when to notify the start of logging together with the get_buffer
operation.
Is there any particular benefit to registering .log_global_start/stop?


> Regards,
> -Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Zhao Yan
On Thu, Feb 21, 2019 at 03:33:24AM +, Gonglei (Arei) wrote:
> 
> > -Original Message-
> > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > Sent: Thursday, February 21, 2019 9:59 AM
> > To: Gonglei (Arei) 
> > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > k...@vger.kernel.org
> > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > 
> > On Thu, Feb 21, 2019 at 01:35:43AM +, Gonglei (Arei) wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > > Sent: Thursday, February 21, 2019 8:25 AM
> > > > To: Gonglei (Arei) 
> > > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> > dgilb...@redhat.com;
> > > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> > ken@amd.com;
> > > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > > k...@vger.kernel.org
> > > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > > >
> > > > On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > > > > Hi yan,
> > > > >
> > > > > Thanks for your work.
> > > > >
> > > > > I have some suggestions or questions:
> > > > >
> > > > > 1) Would you add msix mode support,? if not, pls add a check in
> > > > vfio_pci_save_config(), likes Nvidia's solution.
> > > > ok.
> > > >
> > > > > 2) We should start vfio devices before vcpu resumes, so we can't rely 
> > > > > on
> > vm
> > > > start change handler completely.
> > > > vfio devices is by default set to running state.
> > > > In the target machine, its state transition flow is 
> > > > running->stop->running.
> > >
> > > That's confusing. We should start vfio devices after vfio_load_state,
> > otherwise
> > > how can you keep the devices' information are the same between source side
> > > and destination side?
> > >
> > so, your meaning is to set device state to running in the first call to
> > vfio_load_state?
> > 
> No, it should start devices after vfio_load_state and before vcpu resuming.
>

What about setting the device state to running in the load_cleanup handler?

> > > > so, maybe you can ignore the stop notification in kernel?
> > > > > 3) We'd better support live migration rollback since have many failure
> > > > scenarios,
> > > > >  register a migration notifier is a good choice.
> > > > I think this patchset can also handle the failure case well.
> > > > if migration failure or cancelling happens,
> > > > in cleanup handler, LOGGING state is cleared. device state(running or
> > > > stopped) keeps as it is).
> > >
> > > IIRC there're many failure paths don't calling cleanup handler.
> > >
> > could you take an example?
> 
> Never mind, that's another bug I think. 
> 
> > > > then,
> > > > if vm switches back to running, device state will be set to running;
> > > > if vm stayes at stopped state, device state is also stopped (it has no
> > > > meaning to let it in running state).
> > > > Do you think so ?
> > > >
> > > IF the underlying state machine is complicated,
> > > We should tell the canceling state to vendor driver proactively.
> > >
> > That makes sense.
> > 
> > > > > 4) Four memory region for live migration is too complicated IMHO.
> > > > one big region requires the sub-regions well padded.
> > > > like for the first control fields, they have to be padded to 4K.
> > > > the same for other data fields.
> > > > Otherwise, mmap simply fails, because the start-offset and size for mmap
> > > > both need to be PAGE aligned.
> > > >
> > > But if we don't need use mmap for control filed and device state, they are
> > small basically.
> > > The performance is enough using pread/pwrite.
> > >
> > we don't mmap control fields. but if data fields going immedately after
> > control fields (e.g. just 64 bytes), we can't mmap data fields
> > successfully because its start offset is 64. Therefore control fields have
> > to be padded to 4k to let data fields start from 4k.
> > That's the drawback of one big region holding both control and data fields.
> > 
> > > > Also, 4 regions is clearer in my view :)
> > > >
> > > > > 5) About log sync, why not register log_global_start/stop in
> > > > vfio_memory_listener?
> > > > >
> > > > >
> > > > seems 

Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)


> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 9:59 AM
> To: Gonglei (Arei) 
> Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> k...@vger.kernel.org
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Thu, Feb 21, 2019 at 01:35:43AM +, Gonglei (Arei) wrote:
> >
> >
> > > -Original Message-
> > > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > > Sent: Thursday, February 21, 2019 8:25 AM
> > > To: Gonglei (Arei) 
> > > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > > coh...@redhat.com; shuangtai@alibaba-inc.com;
> dgilb...@redhat.com;
> > > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > > jonathan.dav...@nutanix.com; changpeng@intel.com;
> ken@amd.com;
> > > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > > k...@vger.kernel.org
> > > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > >
> > > On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > > > Hi yan,
> > > >
> > > > Thanks for your work.
> > > >
> > > > I have some suggestions or questions:
> > > >
> > > > 1) Would you add msix mode support,? if not, pls add a check in
> > > vfio_pci_save_config(), likes Nvidia's solution.
> > > ok.
> > >
> > > > 2) We should start vfio devices before vcpu resumes, so we can't rely on
> vm
> > > start change handler completely.
> > > vfio devices is by default set to running state.
> > > In the target machine, its state transition flow is 
> > > running->stop->running.
> >
> > That's confusing. We should start vfio devices after vfio_load_state,
> otherwise
> > how can you keep the devices' information are the same between source side
> > and destination side?
> >
> so, your meaning is to set device state to running in the first call to
> vfio_load_state?
> 
No, the devices should be started after vfio_load_state and before the vcpus resume.

> > > so, maybe you can ignore the stop notification in kernel?
> > > > 3) We'd better support live migration rollback since have many failure
> > > scenarios,
> > > >  register a migration notifier is a good choice.
> > > I think this patchset can also handle the failure case well.
> > > if migration failure or cancelling happens,
> > > in cleanup handler, LOGGING state is cleared. device state(running or
> > > stopped) keeps as it is).
> >
> > IIRC there're many failure paths don't calling cleanup handler.
> >
> could you take an example?

Never mind, that's another bug I think. 

> > > then,
> > > if vm switches back to running, device state will be set to running;
> > > if vm stayes at stopped state, device state is also stopped (it has no
> > > meaning to let it in running state).
> > > Do you think so ?
> > >
> > IF the underlying state machine is complicated,
> > We should tell the canceling state to vendor driver proactively.
> >
> That makes sense.
> 
> > > > 4) Four memory region for live migration is too complicated IMHO.
> > > one big region requires the sub-regions well padded.
> > > like for the first control fields, they have to be padded to 4K.
> > > the same for other data fields.
> > > Otherwise, mmap simply fails, because the start-offset and size for mmap
> > > both need to be PAGE aligned.
> > >
> > But if we don't need use mmap for control filed and device state, they are
> small basically.
> > The performance is enough using pread/pwrite.
> >
> we don't mmap control fields. but if data fields going immedately after
> control fields (e.g. just 64 bytes), we can't mmap data fields
> successfully because its start offset is 64. Therefore control fields have
> to be padded to 4k to let data fields start from 4k.
> That's the drawback of one big region holding both control and data fields.
> 
> > > Also, 4 regions is clearer in my view :)
> > >
> > > > 5) About log sync, why not register log_global_start/stop in
> > > vfio_memory_listener?
> > > >
> > > >
> > > seems log_global_start/stop cannot be iterately called in pre-copy phase?
> > > for dirty pages in system memory, it's better to transfer dirty data
> > > iteratively to reduce down time, right?
> > >
> >
> > We just need invoking only once for start and stop logging. Why we need to
> call
> > them literately? See memory_listener of vhost.
> >
> 
> 
> 
> > Regards,
> > -Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)




> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 10:05 AM
> To: Gonglei (Arei) 
> Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> k...@vger.kernel.org
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> > >
> > > > 5) About log sync, why not register log_global_start/stop in
> > > vfio_memory_listener?
> > > >
> > > >
> > > seems log_global_start/stop cannot be iterately called in pre-copy phase?
> > > for dirty pages in system memory, it's better to transfer dirty data
> > > iteratively to reduce down time, right?
> > >
> >
> > We just need invoking only once for start and stop logging. Why we need to
> call
> > them literately? See memory_listener of vhost.
> >
> the dirty pages in system memory produces by device is incremental.
> if it can be got iteratively, the dirty pages in stop-and-copy phase can be
> minimal.
> :)
> 
I mean starting or stopping the capability of logging, not log sync. 

We register the below callbacks:

.log_sync = vfio_log_sync,
.log_global_start = vfio_log_global_start,
.log_global_stop = vfio_log_global_stop,
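
As a rough illustration of the distinction (start/stop toggle the logging capability once, while sync is what runs on every iteration), here is a small self-contained sketch. DemoListener and the demo_* functions are hypothetical stand-ins, not QEMU's MemoryListener API.

```c
#include <assert.h>

/* Hypothetical stand-in for a memory listener; not QEMU's MemoryListener. */
typedef struct DemoListener {
    void (*log_global_start)(struct DemoListener *l);
    void (*log_sync)(struct DemoListener *l);
    void (*log_global_stop)(struct DemoListener *l);
    int logging_enabled;   /* toggled by start/stop */
    int sync_calls;        /* number of dirty-log syncs performed */
} DemoListener;

static void demo_log_global_start(DemoListener *l) { l->logging_enabled = 1; }
static void demo_log_global_stop(DemoListener *l)  { l->logging_enabled = 0; }

static void demo_log_sync(DemoListener *l)
{
    /* Syncing only makes sense while logging is switched on. */
    assert(l->logging_enabled);
    l->sync_calls++;
}

void demo_listener_init(DemoListener *l)
{
    l->log_global_start = demo_log_global_start;
    l->log_sync = demo_log_sync;
    l->log_global_stop = demo_log_global_stop;
    l->logging_enabled = 0;
    l->sync_calls = 0;
}

/* Simulated migration: one start, `iterations` syncs, one stop. */
void demo_migrate(DemoListener *l, int iterations)
{
    l->log_global_start(l);
    for (int i = 0; i < iterations; i++) {
        l->log_sync(l);
    }
    l->log_global_stop(l);
}
```

With three pre-copy iterations, the start and stop callbacks each run once and the sync callback runs three times.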

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH 05/19] ppc/pnv: add XIVE support

2019-02-20 Thread David Gibson
On Tue, Feb 19, 2019 at 08:31:25AM +0100, Cédric Le Goater wrote:
> On 2/12/19 6:40 AM, David Gibson wrote:
> > On Mon, Jan 28, 2019 at 10:46:11AM +0100, Cédric Le Goater wrote:
[snip]
> >>  #endif /* _PPC_PNV_H */
> >> diff --git a/include/hw/ppc/pnv_core.h b/include/hw/ppc/pnv_core.h
> >> index 9961ea3a92cd..8e57c064e661 100644
> >> --- a/include/hw/ppc/pnv_core.h
> >> +++ b/include/hw/ppc/pnv_core.h
> >> @@ -49,6 +49,7 @@ typedef struct PnvCoreClass {
> >>  
> >>  typedef struct PnvCPUState {
> >>  struct ICPState *icp;
> >> +struct XiveTCTX *tctx;
> > 
> > Unlike sPAPR, we really do always know in advance the interrupt
> > controller for a particular machine.  I think it makes sense to
> > further split the POWER8 and POWER9 cases here, so we only track one
> > for any given setup.
> 
> So, you would define a : 
> 
>   typedef struct Pnv9CPUState {
>   struct XiveTCTX *tctx;
>   } Pnv9CPUState;
> 
> to be allocated when the core is realized ? and later the routine 
> pnv_chip_power9_intc_create() would assign the ->tctx pointer.

Sounds about right.

[snip]
> >> +uint32_t  nr_ends;
> >> +XiveENDSource end_source;
> >> +
> >> +/* Interrupt controller registers */
> >> +uint64_t  regs[0x300];
> >> +
> >> +/* Can be configured by FW */
> >> +uint32_t  tctx_chipid;
> >> +uint32_t  chip_id;
> > 
> > Can't you derive that since you have a pointer to the owning chip?
> 
> Not always, there are register fields to purposely override this value.
> I can improve the current code a little I think.

Ok.

[snip]
> >> +/*
> >> + * Virtual structures table (VST)
> >> + */
> >> +typedef struct XiveVstInfo {
> >> +uint32_t type;
> >> +const char *name;
> >> +uint32_t size;
> >> +uint32_t max_blocks;
> >> +} XiveVstInfo;
> >> +
> >> +static const XiveVstInfo vst_infos[] = {
> >> +[VST_TSEL_IVT]  = { VST_TSEL_IVT,  "EAT",  sizeof(XiveEAS), 16 },
> > 
> > I don't love explicitly storing the type/index in each record, as well
> > as it being implicit in the table slot.
> 
> The 'vst_infos' table describes the different table types and the 'type'
> field is used to index the runtime table of VSDs. See
> pnv_xive_vst_addr()

Yes, I know what it's for but it's still redundant information.  You
could avoid it, for example, by passing around an index instead of a
pointer to a vst_infos[] slot - then you can look up both vst_infos
and the other table using that index.
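
Roughly what I have in mind, as a standalone sketch. All the Demo* names are made up for illustration, and the sizes in the table are placeholders rather than the real structure sizes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table types; the names only mirror the style of the patch. */
typedef enum {
    DEMO_VST_EAT,
    DEMO_VST_ENDT,
    DEMO_VST_MAX,
} DemoVstType;

/* Static description: note there is no 'type' member any more. */
typedef struct {
    const char *name;
    uint32_t size;        /* placeholder entry size */
    uint32_t max_blocks;
} DemoVstInfo;

static const DemoVstInfo demo_vst_infos[DEMO_VST_MAX] = {
    [DEMO_VST_EAT]  = { "EAT",  8,  16 },
    [DEMO_VST_ENDT] = { "ENDT", 32, 16 },
};

/* Runtime table indexed by the same enum (stands in for the VSDs). */
static uint64_t demo_vsds[DEMO_VST_MAX];

void demo_vst_set(DemoVstType type, uint64_t vsd)
{
    assert(type < DEMO_VST_MAX);
    demo_vsds[type] = vsd;
}

uint64_t demo_vst_get(DemoVstType type)
{
    assert(type < DEMO_VST_MAX);
    return demo_vsds[type];
}

const char *demo_vst_name(DemoVstType type)
{
    assert(type < DEMO_VST_MAX);
    return demo_vst_infos[type].name;
}
```

Callers pass the enum value around, so the slot position is the only place the type is recorded.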

[snip]
> >> +case CQ_TM1_BAR: /* TM BAR. 4 pages. Map only once */
> >> +case CQ_TM2_BAR: /* second TM BAR. for hotplug. Not modeled */
> >> +xive->tm_shift = val & CQ_TM_BAR_64K ? 16 : 12;
> >> +if (!(val & CQ_TM_BAR_VALID)) {
> >> +xive->tm_base = 0;
> >> +if (xive->regs[reg] & CQ_TM_BAR_VALID && xive->chip_id == 0) {
> >> +memory_region_del_subregion(sysmem, &xive->tm_mmio);
> >> +}
> >> +} else {
> >> +xive->tm_base = val & ~(CQ_TM_BAR_VALID | CQ_TM_BAR_64K);
> >> +if (!(xive->regs[reg] & CQ_TM_BAR_VALID) && xive->chip_id == 0) {
> >> +memory_region_add_subregion(sysmem, xive->tm_base,
> >> +&xive->tm_mmio);
> >> +}
> >> +}
> >> +break;
> >> +
> >> +case CQ_PC_BARM:
> >> +xive->regs[reg] = val;
> > 
> > As discussed elsewhere, this seems to be a big mix of writing things
> > directly into regs[reg] and doing other things instead, and you really
> > want to go one way or the other.  I'd suggest dropping xive->regs[]
> > and instead putting the state you need persistent into its own
> > variables.
> 
> I made a big effort to introduce helper routines to avoid storing values 
> that can be calculated under the PnvXive model, as you asked for it. 
> The assignment above is only necessary for the pnv_xive_pc_size() below
> and I don't know how handle this current case without duplicating the 
> switch statement, which I think is ugly.

I'm not sure quite what you mean about duplicating the case.

The point here is that since you're only storing in a couple of the
switch cases, you can just have explicit data backing just those
values and write to those in the switch cases instead of having a
great big regs[] array of which only a few bits are used.
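
Something along these lines, as a self-contained sketch. The register offsets, bit definitions and the DemoXive type are hypothetical and do not match the real XIVE layout; the sketch only shows named fields replacing a generic register array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets and bits; not the real XIVE layout. */
enum {
    DEMO_REG_TM_BAR  = 0x10,
    DEMO_REG_PC_BARM = 0x20,
    DEMO_REG_OTHER   = 0x30,
};
#define DEMO_BAR_VALID (1ull << 63)
#define DEMO_BAR_64K   (1ull << 62)

/* Only the state that must persist gets a field of its own. */
typedef struct {
    uint64_t tm_base;
    uint32_t tm_shift;
    uint64_t pc_barm;
} DemoXive;

void demo_reg_write(DemoXive *x, uint32_t reg, uint64_t val)
{
    assert(x != NULL);

    switch (reg) {
    case DEMO_REG_TM_BAR:
        x->tm_shift = (val & DEMO_BAR_64K) ? 16 : 12;
        x->tm_base = (val & DEMO_BAR_VALID)
                     ? (val & ~(DEMO_BAR_VALID | DEMO_BAR_64K)) : 0;
        break;
    case DEMO_REG_PC_BARM:
        /* The one value a later size computation needs. */
        x->pc_barm = val;
        break;
    default:
        /* Registers without persistent state store nothing. */
        break;
    }
}
```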

> So, I will keep the xive->regs[] and make the couple of fixes still needed.

[snip]
> >> +/*
> >> + * Virtualization Controller MMIO region containing the IPI and END ESB pages
> >> + */
> >> +static uint64_t pnv_xive_vc_read(void *opaque, hwaddr offset,
> >> + unsigned size)
> >> +{
> >> +PnvXive *xive = PNV_XIVE(opaque);
> >> +uint64_t edt_index = offset >> pnv_xive_edt_shift(xive);
> >> +uint64_t edt_type = 0;
> >> +uint64_t edt_offset;
> >> +MemTxResult result;
> >> +AddressSpace *edt_as = NULL;
> >> +uint64_t ret = -1;
> >> +
> >> +if (edt_index < 

Re: [Qemu-devel] [qemu-s390x] [PATCH 15/15] s390-bios: Support booting from real dasd device

2019-02-20 Thread Eric Farman




On 01/29/2019 08:29 AM, Jason J. Herne wrote:

Allows guest to boot from a vfio configured real dasd device.

Signed-off-by: Jason J. Herne 
---
  docs/devel/s390-dasd-ipl.txt | 132 +++
  pc-bios/s390-ccw/Makefile|   2 +-
  pc-bios/s390-ccw/dasd-ipl.c  | 249 +++
  pc-bios/s390-ccw/dasd-ipl.h  |  16 +++
  pc-bios/s390-ccw/main.c  |   4 +
  pc-bios/s390-ccw/s390-arch.h |  13 +++
  6 files changed, 415 insertions(+), 1 deletion(-)
  create mode 100644 docs/devel/s390-dasd-ipl.txt
  create mode 100644 pc-bios/s390-ccw/dasd-ipl.c
  create mode 100644 pc-bios/s390-ccw/dasd-ipl.h


...snip...


diff --git a/pc-bios/s390-ccw/dasd-ipl.c b/pc-bios/s390-ccw/dasd-ipl.c
new file mode 100644
index 000..b7ce6d9
--- /dev/null
+++ b/pc-bios/s390-ccw/dasd-ipl.c
@@ -0,0 +1,249 @@


...snip...


+static void ipl1_fixup(void)
+{
+Ccw0 *ccwSeek = (Ccw0 *) 0x08;
+Ccw0 *ccwSearchID = (Ccw0 *) 0x10;
+Ccw0 *ccwSearchTic = (Ccw0 *) 0x18;
+Ccw0 *ccwRead = (Ccw0 *) 0x20;
+CcwSeekData *seekData = (CcwSeekData *) 0x30;
+CcwSearchIdData *searchData = (CcwSearchIdData *) 0x38;
+
+/* move IPL1 CCWs to make room for CCWs needed to locate record 2 */
+memcpy(ccwRead, (void *)0x08, 16);
+
+/* Disable chaining so we don't TIC to IPL2 channel program */
+ccwRead->chain = 0x00;
+
+ccwSeek->cmd_code = CCW_CMD_DASD_SEEK;
+ccwSeek->cda = ptr2u32(seekData);
+ccwSeek->chain = 1;
+ccwSeek->count = sizeof(seekData);


This needs to be sizeof(*seekData)


+seekData->reserved = 0x00;
+seekData->cyl = 0x00;
+seekData->head = 0x00;
+
+ccwSearchID->cmd_code = CCW_CMD_DASD_SEARCH_ID_EQ;
+ccwSearchID->cda = ptr2u32(searchData);
+ccwSearchID->chain = 1;
+ccwSearchID->count = sizeof(searchData);


sizeof(*searchData)

I notice that vfio sees the count for each of these as 8 bytes despite 
them being packed structs of 6 or 5 bytes.
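
For illustration, a standalone sketch of the difference. The Demo* struct layouts are assumptions chosen to match the field names in the patch and the 6 and 5 byte sizes mentioned above, and the packed attribute is the GCC/Clang extension:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layouts (6 and 5 bytes when packed), modelled on the patch. */
typedef struct {
    uint16_t reserved;
    uint16_t cyl;
    uint16_t head;
} __attribute__((packed)) DemoSeekData;

typedef struct {
    uint16_t cyl;
    uint16_t head;
    uint8_t record;
} __attribute__((packed)) DemoSearchIdData;

static_assert(sizeof(DemoSeekData) == 6, "packed seek data is 6 bytes");
static_assert(sizeof(DemoSearchIdData) == 5, "packed search data is 5 bytes");

/* Wrong: measures the pointer itself, e.g. 8 bytes on a 64-bit target. */
size_t demo_seek_count_buggy(const DemoSeekData *seek)
{
    return sizeof(seek);
}

/* Right: measures the structure the pointer refers to. */
size_t demo_seek_count_fixed(const DemoSeekData *seek)
{
    return sizeof(*seek);
}

size_t demo_search_count_fixed(const DemoSearchIdData *search)
{
    return sizeof(*search);
}
```

On a 64-bit build the buggy variant yields the pointer size for both structures, which would explain a count of 8 in each CCW.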



+searchData->cyl = 0;
+searchData->head = 0;
+searchData->record = 2;
+
+/* Go back to Search CCW if correct record not yet found */
+ccwSearchTic->cmd_code = CCW_CMD_TIC;
+ccwSearchTic->cda = ptr2u32(ccwSearchID);
+}
+
+static void run_ipl1(SubChannelId schid)
+ {
+uint32_t startAddr = 0x08;
+
+if (do_cio(schid, startAddr, CCW_FMT0)) {
+panic("dasd-ipl: Failed to run IPL1 channel program");
+}
+}
+
+static void run_ipl2(SubChannelId schid, uint32_t addr)
+{
+
+if (run_dynamic_ccw_program(schid, addr)) {
+panic("dasd-ipl: Failed to run IPL2 channel program");
+}
+}
+
+static void lpsw(void *psw_addr)
+{
+PSWLegacy *pswl = (PSWLegacy *) psw_addr;
+
+pswl->mask |= PSW_MASK_EAMODE;   /* Force z-mode */
+pswl->addr |= PSW_MASK_BAMODE;
+asm volatile("  llgtr 0,0\n llgtr 1,1\n" /* Some OS's expect to be */
+ "  llgtr 2,2\n llgtr 3,3\n" /* in 32-bit mode. Clear  */
+ "  llgtr 4,4\n llgtr 5,5\n" /* high part of regs to   */
+ "  llgtr 6,6\n llgtr 7,7\n" /* avoid messing up   */
+ "  llgtr 8,8\n llgtr 9,9\n" /* instructions that work */
+ "  llgtr 10,10\n llgtr 11,11\n" /* in both addressing */
+ "  llgtr 12,12\n llgtr 13,13\n" /* modes, like servc. */
+ "  llgtr 14,14\n llgtr 15,15\n"
+ "  lpsw %0\n"
+ : : "Q" (*pswl) : "cc");
+}
+
+/*
+ * Limitations in QEMU's CCW support complicate the IPL process. Details can
+ * be found in docs/devel/s390-dasd-ipl.txt
+ */
+void dasd_ipl(SubChannelId schid)
+{
+uint32_t ipl2_addr;
+
+/* Construct Read IPL CCW and run it to read IPL1 from boot disk */
+make_readipl();
+run_readipl(schid);
+ipl2_addr = read_ipl2_addr();
+check_ipl1();
+
+/*
+ * Fixup IPL1 channel program to account for QEMU limitations, then run it
+ * to read IPL2 channel program from boot disk.
+ */
+ipl1_fixup();
+run_ipl1(schid);
+check_ipl2(ipl2_addr);
+
+/*
+ * Run IPL2 channel program to read operating system code from boot disk
+ * then transfer control to the guest operating system
+ */
+run_ipl2(schid, ipl2_addr);
+lpsw(0);
+}
diff --git a/pc-bios/s390-ccw/dasd-ipl.h b/pc-bios/s390-ccw/dasd-ipl.h
new file mode 100644
index 000..56bba82
--- /dev/null
+++ b/pc-bios/s390-ccw/dasd-ipl.h
@@ -0,0 +1,16 @@
+/*
+ * S390 IPL (boot) from a real DASD device via vfio framework.
+ *
+ * Copyright (c) 2018 Jason J. Herne 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#ifndef DASD_IPL_H
+#define DASD_IPL_H
+
+void dasd_ipl(SubChannelId schid);
+
+#endif /* DASD_IPL_H */
diff --git a/pc-bios/s390-ccw/main.c b/pc-bios/s390-ccw/main.c
index 5ee02c3..0a46339 100644
--- a/pc-bios/s390-ccw/main.c
+++ b/pc-bios/s390-ccw/main.c
@@ -13,6 

Re: [Qemu-devel] [Qemu-ppc] [PULL 00/43] ppc-for-4.0 queue 20190219

2019-02-20 Thread David Gibson
On Wed, Feb 20, 2019 at 05:54:25PM +0100, Greg Kurz wrote:
> On Wed, 20 Feb 2019 15:49:41 +
> Peter Maydell  wrote:
> 
> > On Wed, 20 Feb 2019 at 15:43, Greg Kurz  wrote:
> > > I have an account. I'll start updating the wiki according to what was
> > > actually merged up to this pull request.  
> > 
> > Thanks. Personally I've found for the arm parts of the changelog
> > that it's much easier to update the changelog with every
> > pull request, and it's going to result in a more detailed log
> > as a result.
> > 
> 
> Yeah. Cedric and I will take care of that for future pull requests.

Thanks so much for doing that - I'm quite aware I've been slack about
making these updates.

> Here are the changes I've come up with so far:
> 
> https://wiki.qemu.org/index.php?title=ChangeLog%2F4.0=8226=8221
> 
> PPC developers,
> 
> Please ping me or Cedric if there are other user-visible changes we
> should mention, or even better, update the wiki if you can :)
> 
> > -- PMM
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [Qemu-devel] [PATCH] target-i386: Enhance the stub for kvm_arch_get_supported_cpuid()

2019-02-20 Thread Kamil Rytarowski
On 20.02.2019 18:29, Paolo Bonzini wrote:
> On 20/02/19 12:59, Kamil Rytarowski wrote:
>> Ping, still valid.
> 
> Sorry, I missed your email.
> 
>> On 15.02.2019 00:38, Kamil Rytarowski wrote:
>>> I consider it as fragile hack and certainly not something to depend on.
>>> Also in some circumstances of such code, especially "if (zero0)" we want
>>> to enable disabled code under a debugger.
> 
> That's a good objection, but certainly does not apply to KVM on NetBSD.
>

There is a KVM for Darwin (an experimental, rather toy project) and it
might be ported to NetBSD (I actually forked it on GitHub recently),
but I doubt that anyone would enable KVM on any platform under a
debugger this way and expect it to work.

>>> There were also kernel backdoors due to this optimization.
> 
> Citation please?
> 

I saw an exploit for such a case, with a .txt writeup, on the grsecurity
ftp server, but that service seems to be gone (probably a long time ago),
so please defer the discussion on it. If someone is interested in finding
it, there are enough pointers to dig it up (assuming that is still possible).

>>> Requested cpu.i (hopefully correctly generated)
>>>
>>> http://netbsd.org/~kamil/qemu/cpu.i.bz2
> 
> So, first thing first I can reproduce clang's behavior with this .i file
> and also with this reduced test case.
> 
> extern void f(void);
> int i, j;
> int main()
> {
> if (0  && i) f();
> if (j  && 0) f();
>}
> 
> The first is eliminated but the second is not, just like in QEMU where
> this works:
> 
> if (kvm_enabled() && cpu->enable_pmu) {
> KVMState *s = cs->kvm_state;
> 
> *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
> *ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
> *ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
> *edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
> } else if (hvf_enabled() && cpu->enable_pmu) {
> *eax = hvf_get_supported_cpuid(0xA, count, R_EAX);
> *ebx = hvf_get_supported_cpuid(0xA, count, R_EBX);
> *ecx = hvf_get_supported_cpuid(0xA, count, R_ECX);
> *edx = hvf_get_supported_cpuid(0xA, count, R_EDX);
> 
> while this doesn't:
> 
> if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
> kvm_enabled()) {
> KVMState *s = CPU(cpu)->kvm_state;
> uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
> uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
> uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
> uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
> uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
> 
> But, that's okay, it's -O0 so we give clang a pass for that.  Note that
> clang does do the optimization even in more complex cases like
> 
> extern _Bool f(void);
> int main()
> {
> if (!0) return 0;
> if (!f()) return 0;
> }
> 
> The problem is that there is a kvm-stub.c entry for that, and in fact
> my compilation passes and the symbol is resolved correctly:
> 
> $ nm target/i386/cpu.o |grep kvm_.*get_sup
>  U kvm_arch_get_supported_cpuid
> $ nm target/i386/kvm-stub.o|grep kvm_.*get_sup
> 0030 T kvm_arch_get_supported_cpuid
> $ nm qemu-system-x86_64 |grep kvm_.*get_sup
> 0046eab0 T kvm_arch_get_supported_cpuid
> 
> As expected, something much less obvious is going on for you, in
> particular __OPTIMIZE__ seems not to be working properly.  However,
> that would also be very surprising.
> 
> Please:
> 
> 1) run the last two "nm" commands on your build (without grep).
> 

I cannot run nm(1) on qemu-system-x86_64 as it's not linkable.

I'm getting the same result for target/i386/cpu.o and
target/i386/kvm-stub.o.

$ nm ./i386-softmmu/target/i386/kvm-stub.o
 U abort
 T kvm_allows_irq0_override
0030 T kvm_arch_get_supported_cpuid
0020 T kvm_enable_x2apic
0010 T kvm_has_smm
0050 T kvm_hv_vpindex_settable

grep(1) used, but otherwise I would need to upload results somewhere else.

$ nm ./i386-bsd-user/target/i386/cpu.o |grep kvm
 U kvm_arch_get_supported_cpuid
1290 d kvm_default_props
 U kvm_state
0240 T x86_cpu_change_kvm_default

Please note that there are four x86 builds: i386, x86_64 and two
bsd-user ones (32-bit and 64-bit).

Judging by repeated attempts, both bsd-user builds are affected.

Another difference is that there are two kvm-stub.o objects in each
!bsd-user directory but only one in each bsd-user directory.

$ find . -name kvm-stub.o|grep x86_64
./x86_64-softmmu/accel/stubs/kvm-stub.o
./x86_64-softmmu/target/i386/kvm-stub.o
./x86_64-bsd-user/accel/stubs/kvm-stub.o

> 2) do the same exercise to get a .i for target/i386/kvm-stub.c
> 
> 3) try removing 

Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Zhao Yan
> > 
> > > 5) About log sync, why not register log_global_start/stop in
> > vfio_memory_listener?
> > >
> > >
> > seems log_global_start/stop cannot be iterately called in pre-copy phase?
> > for dirty pages in system memory, it's better to transfer dirty data
> > iteratively to reduce down time, right?
> > 
> 
> We just need invoking only once for start and stop logging. Why we need to 
> call
> them literately? See memory_listener of vhost.
>
The dirty pages in system memory produced by the device are incremental.
If they can be fetched iteratively, the dirty pages left for the
stop-and-copy phase can be minimal.
:)

> Regards,
> -Gonglei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Zhao Yan
On Thu, Feb 21, 2019 at 01:35:43AM +, Gonglei (Arei) wrote:
> 
> 
> > -Original Message-
> > From: Zhao Yan [mailto:yan.y.z...@intel.com]
> > Sent: Thursday, February 21, 2019 8:25 AM
> > To: Gonglei (Arei) 
> > Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> > intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> > k...@vger.kernel.org
> > Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> > 
> > On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > > Hi yan,
> > >
> > > Thanks for your work.
> > >
> > > I have some suggestions or questions:
> > >
> > > 1) Would you add msix mode support,? if not, pls add a check in
> > vfio_pci_save_config(), likes Nvidia's solution.
> > ok.
> > 
> > > 2) We should start vfio devices before vcpu resumes, so we can't rely on 
> > > vm
> > start change handler completely.
> > vfio devices is by default set to running state.
> > In the target machine, its state transition flow is running->stop->running.
> 
> That's confusing. We should start vfio devices after vfio_load_state, 
> otherwise
> how can you keep the devices' information are the same between source side
> and destination side?
>
So, you mean setting the device state to running in the first call to
vfio_load_state?

> > so, maybe you can ignore the stop notification in kernel?
> > > 3) We'd better support live migration rollback, since there are many
> > > failure scenarios; registering a migration notifier is a good choice.
> > I think this patchset can also handle the failure case well.
> > If migration fails or is cancelled, the LOGGING state is cleared in the
> > cleanup handler. The device state (running or stopped) is kept as it is.
> 
> IIRC there are many failure paths that don't call the cleanup handler.
>
Could you give an example?
> > then,
> > if vm switches back to running, device state will be set to running;
> > if vm stays in the stopped state, device state is also stopped (there is
> > no point in leaving it in the running state).
> > Do you think so?
> > 
> If the underlying state machine is complicated,
> we should proactively tell the vendor driver about the cancelling state.
> 
That makes sense.

> > > 4) Four memory regions for live migration is too complicated IMHO.
> > One big region requires the sub-regions to be well padded.
> > For example, the control fields at the start have to be padded to 4K,
> > and the same goes for the other data fields.
> > Otherwise, mmap simply fails, because the start offset and size for mmap
> > both need to be PAGE aligned.
> > 
> But if we don't need to use mmap for the control fields and device state,
> they are basically small.
> The performance of pread/pwrite is enough.
> 
We don't mmap the control fields. But if the data fields come immediately
after the control fields (e.g. after just 64 bytes), we can't mmap the data
fields successfully, because their start offset is 64. Therefore the control
fields have to be padded to 4K so that the data fields start at 4K.
That's the drawback of one big region holding both control and data fields.
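
As a rough illustration of the alignment constraint (a sketch only, not code
from this patchset; the 64-byte control size is the example figure from above,
and the helper name align_up and the 4096-byte page size are assumptions made
for the illustration):

```python
import mmap

# Page size of the running host (4096 on most x86 systems).
PAGE_SIZE = mmap.PAGESIZE


def align_up(offset, alignment=PAGE_SIZE):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# Hypothetical layout of one big region: 64 bytes of control fields
# followed directly by the data fields.
CONTROL_SIZE = 64

# Offset 64 is not page aligned, so the data fields cannot be mmap'ed there.
# The control fields have to be padded up to the next page boundary.
# A 4K page is assumed here so the numbers match the discussion.
data_offset = align_up(CONTROL_SIZE, 4096)
padding = data_offset - CONTROL_SIZE

print("data fields start at", data_offset, "after", padding, "bytes of padding")
```

With separate regions each region starts at its own aligned offset, so no such
padding is needed inside the control fields.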

> > Also, 4 regions is clearer in my view :)
> > 
> > > 5) About log sync, why not register log_global_start/stop in
> > vfio_memory_listener?
> > >
> > >
> > It seems log_global_start/stop cannot be called iteratively in the
> > pre-copy phase?
> > For dirty pages in system memory, it's better to transfer dirty data
> > iteratively to reduce downtime, right?
> > 
> 
> We only need to invoke them once each to start and stop logging. Why do we
> need to call them iteratively? See memory_listener of vhost.
> 



> Regards,
> -Gonglei



Re: [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces

2019-02-20 Thread Zhao Yan
On Wed, Feb 20, 2019 at 06:08:13PM +0100, Cornelia Huck wrote:
> On Wed, 20 Feb 2019 02:36:36 -0500
> Zhao Yan  wrote:
> 
> > On Tue, Feb 19, 2019 at 02:09:18PM +0100, Cornelia Huck wrote:
> > > On Tue, 19 Feb 2019 16:52:14 +0800
> > > Yan Zhao  wrote:
> (...)
> > > > + *  Size of device config data is smaller than or equal to that of
> > > > + *  device config region.
> > > 
> > > Not sure if I understand that sentence correctly... but what if a
> > > device has more config state than fits into this region? Is that
> > > supposed to be covered by the device memory region? Or is this assumed
> > > to be something so exotic that we don't need to plan for it?
> > >   
> > Device config data and device config region are all provided by vendor
> > driver, so vendor driver is always able to create a large enough device
> > config region to hold device config data.
> > So, if a device has data that are better to be saved after device stop and
> > saved/loaded in strict order, the data needs to be in device config region.
> > This kind of data is supposed to be small.
> > If the device data can be saved/loaded several times, it can also be put
> > into device memory region.
> 
> So, it is the vendor driver's decision which device information should
> go via which region? With the device config data supposed to be
> saved/loaded in one go?
Right, exactly.


> (...)
> > > > +/* version number of the device state interface */
> > > > +#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1  
> > > 
> > > Hm. Is this supposed to be backwards-compatible, should we need to bump
> > > this?
> > >  
> > Currently it is not backwards-compatible. We can discuss that.
> 
> It might be useful if we discover that we need some extensions. But I'm
> not sure how much work it would be.
> 
> (...)
> > > > +/*
> > > > + * DEVICE STATES
> > > > + *
> > > > + * Four states are defined for a VFIO device:
> > > > + * RUNNING, RUNNING & LOGGING, STOP & LOGGING, STOP.
> > > > + * They can be set by writing to device_state field of
> > > > + * vfio_device_state_ctl region.  
> > > 
> > > Who controls this? Userspace?  
> > 
> > Yes. Userspace notifies vendor driver to do the state switching.
> 
> Might be good to mention this (just to make it obvious).
>
Got it. thanks

> > > > + * LOGGING state is a special state that it CANNOT exist
> > > > + * independently.  
> > > 
> > > So it's not a state, but rather a modifier?
> > >   
> > Yes. Or think of LOGGING/not LOGGING as bit 1 of the device state,
> > whereas RUNNING/STOPPED is bit 0 of the device state.
> > They have to be read as a whole.
> 
> So it is (on a bit level):
> RUNNING -> 00
> STOPPED -> 01
> LOGGING/RUNNING -> 10
> LOGGING/STOPPED -> 11
> 

Yes.
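
A minimal sketch of that bit layout, for illustration only (the constant and
function names below are made up for this example and are not the names used
in the patch; only the bit values come from the table above):

```python
# Bit 0 selects RUNNING (0) or STOPPED (1); bit 1 is the LOGGING modifier.
STATE_RUNNING = 0b00
STATE_STOPPED = 0b01
STATE_LOGGING_MASK = 0b10


def describe(state):
    """Return a readable name for a 2-bit device state value."""
    base = "STOPPED" if state & STATE_STOPPED else "RUNNING"
    if state & STATE_LOGGING_MASK:
        return "LOGGING/" + base
    return base


def set_logging(state, enabled):
    """Set or clear the LOGGING bit without touching RUNNING/STOPPED."""
    if enabled:
        return state | STATE_LOGGING_MASK
    return state & ~STATE_LOGGING_MASK


for value in range(4):
    print(format(value, "02b"), describe(value))
```

Clearing the LOGGING bit leaves bit 0 untouched, which matches the idea that
LOGGING is a modifier of RUNNING/STOPPED rather than a state of its own.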

> > > > + * It must be set alongside with state RUNNING or STOP, i.e,
> > > > + * RUNNING & LOGGING, STOP & LOGGING.
> > > > + * It is used for dirty data logging both for device memory
> > > > + * and system memory.
> > > > + *
> > > > + * LOGGING only impacts device/system memory. In LOGGING state, get buffer
> > > > + * of device memory returns dirty pages since last call; outside LOGGING
> > > > + * state, get buffer of device memory returns whole snapshot of device
> > > > + * memory. system memory's dirty page is only available in LOGGING state.
> > > > + *
> > > > + * Device config should be always accessible and return whole config snapshot
> > > > + * regardless of LOGGING state.
> > > > + * */
> > > > +#define VFIO_DEVICE_STATE_RUNNING 0
> > > > +#define VFIO_DEVICE_STATE_STOP 1
> > > > +#define VFIO_DEVICE_STATE_LOGGING 2
> 
> This makes it look a bit like LOGGING were an individual state, while 2
> is in reality LOGGING/RUNNING... not sure how to make that more
> obvious. Maybe (as we are dealing with a u32):
> 
> #define VFIO_DEVICE_STATE_RUNNING 0x
> #define VFIO_DEVICE_STATE_STOPPED 0x0001
> #define VFIO_DEVICE_STATE_LOGGING_RUNNING 0x0002
> #define VFIO_DEVICE_STATE_LOGGING_STOPPED 0x0003
> #define VFIO_DEVICE_STATE_LOGGING_MASK 0x0002
>
Yes, yours are better, thanks:)

> > > > +
> > > > +/* action to get data from device memory or device config
> > > > + * the action is write to device state's control region, and data is read
> > > > + * from device memory region or device config region.
> > > > + * Each time before read device memory region or device config region,
> > > > + * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER is required to write to action
> > > > + * field in control region. That is because device memory and device config
> > > > + * region is mmaped into user space. vendor driver has to be notified of
> > > > + * the GET_BUFFER action in advance.
> > > > + */
> > > > +#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
> > > > +
> > > > +/* action to set data to device memory or device config
> > > > + * the action is write to device state's control region, and data is
> > > > + * written to device 

Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2019-02-20 Thread Wei Wang

On 02/20/2019 09:12 PM, Dr. David Alan Gilbert wrote:

* Wang, Wei W (wei.w.w...@intel.com) wrote:

On Friday, December 14, 2018 7:17 PM, Dr. David Alan Gilbert wrote:

On 12/14/2018 05:56 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

On 12/13/2018 11:45 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive
hints of guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier
chain. The notifier calls free_page_start after the migration
thread syncs the dirty bitmap, so that the free page
optimization starts to clear bits of free pages from the
bitmap. It calls the free_page_stop before the migration
thread syncs the bitmap, which is the end of the current round
of ram save. The free_page_stop is also called to stop the
optimization in the case when there is an error occurred in the
process of ram saving.

Note: balloon will report pages which were free at the time of this
call.
As the reporting happens asynchronously, dirty bit logging
must be enabled before this free_page_start call is made.
Guest reporting must be disabled before the migration dirty bitmap
is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 

I think I'm OK for this from the migration side, I'd appreciate
someone checking the virtio and aio bits.

I'm not too sure how it gets switched on and off - i.e. if we
get a nice new qemu on a new kernel, what happens when I try and
migrate to the same qemu on an older kernel without these hints?


This feature doesn't rely on the host kernel. Those hints are
reported from the guest kernel.
So migration across different hosts wouldn't affect the use of this
feature.

Please correct me if I didn't get your point.

Ah OK, yes;  now what about migrating from new->old qemu with a new
guest but old machine type?


I think normally, the source QEMU and destination QEMU should have the
same QEMU booting parameter. If the destination QEMU doesn't support
"--device virtio-balloon,free-page-hint=true", which the source QEMU
has, the destination side QEMU will fail to boot, and migration will
not happen then.

Ah that's OK; as long as free-page-hint is false by default that will work fine.

Dave


Hi Dave,

Could we have this feature in QEMU 4.0 (freeze on Mar 12)?

I think so; can you remind me where we're up to:
   a) It looks like you've already got the kernel changes merged -
correct?


Yes, they were already merged half a year ago.


   b) What about the virtio spec changes - where are they up to?


The spec changes are in progress. v1 was posted, and a v2 is in
preparation.



   c) Where are the other reviews up to - I think most are reviewed - is
it just 7/7 that is missing the review-by?

7/7 is about the virtio changes, and Michael has given the reviewed-by:
http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg03732.html


Best,
Wei



Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Gonglei (Arei)



> -Original Message-
> From: Zhao Yan [mailto:yan.y.z...@intel.com]
> Sent: Thursday, February 21, 2019 8:25 AM
> To: Gonglei (Arei) 
> Cc: alex.william...@redhat.com; qemu-devel@nongnu.org;
> intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com;
> k...@vger.kernel.org
> Subject: Re: [PATCH 0/5] QEMU VFIO live migration
> 
> On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> > Hi yan,
> >
> > Thanks for your work.
> >
> > I have some suggestions or questions:
> >
> > 1) Would you add MSI-X mode support? If not, please add a check in
> > vfio_pci_save_config(), like Nvidia's solution.
> ok.
> 
> > 2) We should start vfio devices before vcpu resumes, so we can't rely on
> > vm start change handler completely.
> vfio devices are set to the running state by default.
> On the target machine, the state transition flow is running->stop->running.

That's confusing. We should start vfio devices after vfio_load_state;
otherwise, how can you keep the devices' information the same between the
source side and the destination side?

> so, maybe you can ignore the stop notification in kernel?
> > 3) We'd better support live migration rollback, since there are many
> > failure scenarios; registering a migration notifier is a good choice.
> I think this patchset can also handle the failure case well.
> If migration fails or is cancelled, the LOGGING state is cleared in the
> cleanup handler. The device state (running or stopped) is kept as it is.

IIRC there are many failure paths that don't call the cleanup handler.

> then,
> if vm switches back to running, device state will be set to running;
> if vm stays in the stopped state, device state is also stopped (there is
> no point in leaving it in the running state).
> Do you think so?
> 
If the underlying state machine is complicated,
we should proactively tell the vendor driver about the cancelling state.

> > 4) Four memory regions for live migration is too complicated IMHO.
> One big region requires the sub-regions to be well padded.
> For example, the control fields at the start have to be padded to 4K,
> and the same goes for the other data fields.
> Otherwise, mmap simply fails, because the start offset and size for mmap
> both need to be PAGE aligned.
> 
But if we don't need to use mmap for the control fields and device state,
they are basically small.
The performance of pread/pwrite is enough.

> Also, 4 regions is clearer in my view :)
> 
> > 5) About log sync, why not register log_global_start/stop in
> vfio_memory_listener?
> >
> >
> It seems log_global_start/stop cannot be called iteratively in the
> pre-copy phase?
> For dirty pages in system memory, it's better to transfer dirty data
> iteratively to reduce downtime, right?
> 

We only need to invoke them once each to start and stop logging. Why do we
need to call them iteratively? See memory_listener of vhost.

Regards,
-Gonglei



Re: [Qemu-devel] [PATCH v6 0/7] vhost-user-blk: Add support for backend reconnecting

2019-02-20 Thread Yongji Xie
On Thu, 21 Feb 2019 at 04:00, Michael S. Tsirkin  wrote:
>
> On Mon, Feb 18, 2019 at 06:27:41PM +0800, elohi...@gmail.com wrote:
> > From: Xie Yongji 
> >
> > This patchset is aimed at supporting qemu to reconnect
> > vhost-user-blk backend after vhost-user-blk backend crash or
> > restart.
> >
> > The patch 1 introduces two new messages VHOST_USER_GET_INFLIGHT_FD
> > and VHOST_USER_SET_INFLIGHT_FD to support transferring shared
> > buffer between qemu and backend.
> >
> > The patch 2 deletes some redundant check in contrib/libvhost-user.c.
> >
> > The patch 3,4 are the corresponding libvhost-user patches of
> > patch 1. Make libvhost-user support VHOST_USER_GET_INFLIGHT_FD
> > and VHOST_USER_SET_INFLIGHT_FD.
> >
> > The patch 5 allows vhost-user-blk to use the two new messages
> > to get/set inflight buffer from/to backend.
> >
> > The patch 6 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 7 introduces VHOST_USER_PROTOCOL_F_SLAVE_SHMFD
> > to vhost-user-blk backend which is used to tell qemu that
> > we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1 \
> > -device vhost-user-blk-pci,chardev=char0
> >
> > and start vhost-user-blk backend with:
> >
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> >
> > Then we can restart vhost-user-blk at any time during VM running.
>
> Sorry is elohi...@gmail.com also an address that belongs to
> Xie Yongji?
>

Yes, that's also my email address.

Thanks,
Yongji



[Qemu-devel] [PATCH v3 20/20] Boot Linux Console Test: add a test for alpha + clipper

2019-02-20 Thread Cleber Rosa
Similar to the x86_64 + pc test, it boots a Linux kernel on a Clipper
board and verifies that the serial console is working.  One extra option
added to the QEMU command line is '-vga std', because the kernel used is
known to crash without it.

If alpha is a target being built, "make check-acceptance" will
automatically include this test by the use of the "arch:alpha" tags.

Alternatively, this test can be run using:

$ avocado run -t arch:alpha tests/acceptance
$ avocado run -t machine:clipper tests/acceptance

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 22 ++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 82d680c437..ea63cac9e9 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu,arm-softmmu,s390x-softmmu"
+- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu,arm-softmmu,s390x-softmmu,alpha-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index aa581aa7de..d866886067 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -217,3 +217,25 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_alpha_clipper(self):
+"""
+:avocado: tags=arch:alpha
+:avocado: tags=machine:clipper
+"""
+kernel_url = ('http://archive.debian.org/debian/dists/lenny/main/'
+  'installer-alpha/current/images/cdrom/vmlinuz')
+kernel_hash = '3a943149335529e2ed3e74d0d787b85fb5671ba3'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+uncompressed_kernel = archive.uncompress(kernel_path, self.workdir)
+
+self.vm.set_machine('clipper')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+self.vm.add_args('-vga', 'std',
+ '-kernel', uncompressed_kernel,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 10/20] Boot Linux Console Test: add common kernel command line options

2019-02-20 Thread Cleber Rosa
The 'printk.time=0' option makes it easier to parse the console
output.  Let's set it as a default, reusable kernel command line
option for this and future similar tests.
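
As a standalone sketch of how the common options compose with per-test ones
(the helper build_command_line is hypothetical and only exists for this
illustration; the tests themselves simply concatenate the strings):

```python
# Options shared by all tests; note the trailing space used as a separator.
KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '


def build_command_line(*options):
    """Append per-test kernel options to the common ones."""
    return KERNEL_COMMON_COMMAND_LINE + ' '.join(options)


kernel_command_line = build_command_line('console=ttyS0')
# The tests wait for the kernel to echo its command line on the console.
console_pattern = 'Kernel command line: %s' % kernel_command_line
print(console_pattern)
```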

Signed-off-by: Cleber Rosa 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tests/acceptance/boot_linux_console.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index 35b31162d4..cc5dcd7373 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -23,6 +23,8 @@ class BootLinuxConsole(Test):
 
 timeout = 60
 
+KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+
 def test_x86_64_pc(self):
 """
 :avocado: tags=arch:x86_64
@@ -35,7 +37,7 @@ class BootLinuxConsole(Test):
 
 self.vm.set_machine('pc')
 self.vm.set_console()
-kernel_command_line = 'console=ttyS0'
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
 self.vm.add_args('-kernel', kernel_path,
  '-append', kernel_command_line)
 self.vm.launch()
-- 
2.20.1




[Qemu-devel] [PATCH v3 18/20] Boot Linux Console Test: add a test for arm + virt

2019-02-20 Thread Cleber Rosa
Just like the previous tests, boots a Linux kernel on an arm target
using the virt machine.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 20 
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 60a4bc00b8..f34bd8dc2b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu"
+- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu,arm-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index 1f2dfa3654..311f6fbb96 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -177,3 +177,23 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_arm_virt(self):
+"""
+:avocado: tags=arch:arm
+:avocado: tags=machine:virt
+"""
+kernel_url = ('https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/'
+  'releases/29/Server/armhfp/os/images/pxeboot/vmlinuz')
+kernel_hash = 'e9826d741b4fb04cadba8d4824d1ed3b7fb8b4d4'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_machine('virt')
+self.vm.set_console()
+kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+   'console=ttyAMA0')
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 19/20] Boot Linux Console Test: add a test for s390x + s390-ccw-virtio

2019-02-20 Thread Cleber Rosa
Just like the previous tests, boots a Linux kernel on a s390x target
using the s390-ccw-virtio machine.

Because it's not possible to have multiple VT220 consoles,
'-nodefaults' is used, so that the one set with set_console() works
correctly.

Signed-off-by: Cleber Rosa 
Reviewed-by: Cornelia Huck 
Reviewed-by: Caio Carrara 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 20 
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index f34bd8dc2b..82d680c437 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu,arm-softmmu"
+- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu,arm-softmmu,s390x-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index 311f6fbb96..aa581aa7de 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -197,3 +197,23 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_s390x_s390_ccw_virtio(self):
+"""
+:avocado: tags=arch:s390x
+:avocado: tags=machine:s390_ccw_virtio
+"""
+kernel_url = ('http://mirrors.rit.edu/fedora/fedora-secondary/releases'
+  '/29/Server/s390x/os/images/kernel.img')
+kernel_hash = 'e8e8439103ef8053418ef062644ffd46a7919313'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_machine('s390-ccw-virtio')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=sclp0'
+self.vm.add_args('-nodefaults',
+ '-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 17/20] Boot Linux Console Test: add a test for aarch64 + virt

2019-02-20 Thread Cleber Rosa
Just like the previous tests, boots a Linux kernel on an aarch64 target
using the virt machine.

One special option added is the CPU type, given that the kernel
selected fails to boot on the virt machine's default CPU (cortex-a15).

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 21 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index b5abe130f1..60a4bc00b8 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu"
+- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu,aarch64-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index 6bc9a6b303..1f2dfa3654 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -156,3 +156,24 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_aarch64_virt(self):
+"""
+:avocado: tags=arch:aarch64
+:avocado: tags=machine:virt
+"""
+kernel_url = ('https://sjc.edge.kernel.org/fedora-buffet/fedora/linux/'
+  'releases/29/Server/aarch64/os/images/pxeboot/vmlinuz')
+kernel_hash = '8c73e469fc6ea06a58dc83a628fc695b693b8493'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_machine('virt')
+self.vm.set_console()
+kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+   'console=ttyAMA0')
+self.vm.add_args('-cpu', 'cortex-a53',
+ '-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 16/20] Boot Linux Console Test: add a test for ppc64 + pseries

2019-02-20 Thread Cleber Rosa
Just like the previous tests, boots a Linux kernel on a ppc64 target
using the pseries machine.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 19 +++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 0260263bb8..b5abe130f1 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu"
+- CONFIG="--python=/usr/bin/python3 
--target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu,ppc64-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py 
b/tests/acceptance/boot_linux_console.py
index 899c27a9ec..6bc9a6b303 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -137,3 +137,22 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_ppc64_pseries(self):
+"""
+:avocado: tags=arch:ppc64
+:avocado: tags=machine:pseries
+"""
+kernel_url = ('http://mirrors.rit.edu/fedora/fedora-secondary/'
+  'releases/29/Everything/ppc64le/os/ppc/ppc64/vmlinuz')
+kernel_hash = '3fe04abfc852b66653b8c3c897a59a689270bc77'
+kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+
+self.vm.set_machine('pseries')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=hvc0'
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 13/20] scripts/qemu.py: support adding a console with the default serial device

2019-02-20 Thread Cleber Rosa
The set_console() utility function traditionally adds a device either
based on the explicitly given device type, or based on the machine type,
a known good type of device.

But, for a number of machine types, it may be impossible or
inconvenient to add the devices by means of "-device" command line
options, and then it may be better to just use the "-serial" option and
let QEMU itself, based on the machine type, set the device
accordingly.

To achieve that, the behavior of set_console() now flags the intention
to add a console device on launch(), and if no explicit device type is
given, and there's no definition on CONSOLE_DEV_TYPES, the "-serial"
is going to be added to the QEMU command line, instead of raising
exceptions.

Based on testing with different machine types, the CONSOLE_DEV_TYPES
is now being set to the bare essential entries (one entry to be
honest), for machine types that can not easily give us a working
console with "-serial".
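
The resulting selection logic can be sketched in isolation as follows.  This
is a simplified, standalone approximation of what the patch does inside
QEMUMachine, not the actual code: the free function console_args and the
default socket path are assumptions made for the example.

```python
import re

# Only machine types that need an explicit console device are listed.
CONSOLE_DEV_TYPES = {
    r'^s390-ccw-virtio.*': 'sclpconsole',
}


def console_args(machine, device_type=None, sock_path='/tmp/console.sock'):
    """Return the QEMU arguments that attach a console chardev."""
    # Look up a preferred device only when none was given explicitly.
    if device_type is None and machine is not None:
        for regex, device in CONSOLE_DEV_TYPES.items():
            if re.match(regex, machine):
                device_type = device
                break
    chardev = 'socket,id=console,path=%s,server,nowait' % sock_path
    args = ['-chardev', chardev]
    if device_type is None:
        # No known device: let QEMU pick the machine's default serial device.
        args.extend(['-serial', 'chardev:console'])
    else:
        args.extend(['-device', '%s,chardev=console' % device_type])
    return args


print(console_args('pc'))
print(console_args('s390-ccw-virtio'))
```

With this approach a machine type without an entry no longer raises an
exception; it simply falls back to "-serial chardev:console".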

Signed-off-by: Cleber Rosa 
---
 scripts/qemu.py | 39 +++
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/scripts/qemu.py b/scripts/qemu.py
index ee85309923..bd1d2e2b9a 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -42,11 +42,6 @@ def kvm_available(target_arch=None):
 
 #: Maps machine types to the preferred console device types
 CONSOLE_DEV_TYPES = {
-r'^clipper$': 'isa-serial',
-r'^malta': 'isa-serial',
-r'^(pc.*|q35.*|isapc)$': 'isa-serial',
-r'^(40p|powernv|prep)$': 'isa-serial',
-r'^pseries.*': 'spapr-vty',
 r'^s390-ccw-virtio.*': 'sclpconsole',
 }
 
@@ -129,6 +124,7 @@ class QEMUMachine(object):
 self._temp_dir = None
 self._launched = False
 self._machine = None
+self._console_set = False
 self._console_device_type = None
 self._console_address = None
 self._console_socket = None
@@ -248,13 +244,17 @@ class QEMUMachine(object):
 '-display', 'none', '-vga', 'none']
 if self._machine is not None:
 args.extend(['-machine', self._machine])
-if self._console_device_type is not None:
+if self._console_set:
 self._console_address = os.path.join(self._temp_dir,
  self._name + "-console.sock")
 chardev = ('socket,id=console,path=%s,server,nowait' %
self._console_address)
-device = '%s,chardev=console' % self._console_device_type
-args.extend(['-chardev', chardev, '-device', device])
+args.extend(['-chardev', chardev])
+if self._console_device_type is None:
+args.extend(['-serial', 'chardev:console'])
+else:
+device = '%s,chardev=console' % self._console_device_type
+args.extend(['-device', device])
 return args
 
 def _pre_launch(self):
@@ -480,30 +480,29 @@ class QEMUMachine(object):
 line.
 
 This is a convenience method that will either use the provided
-device type, of if not given, it will used the device type set
-on CONSOLE_DEV_TYPES.
+device type, of if not given, it will use the device type set
+on CONSOLE_DEV_TYPES if a machine type is set, and a matching
+entry exists on CONSOLE_DEV_TYPES.
 
 The actual setting of command line arguments will be be done at
 machine launch time, as it depends on the temporary directory
 to be created.
 
-@param device_type: the device type, such as "isa-serial"
+@param device_type: the device type, such as "isa-serial".  If
+None is given (the default value) a "-serial
+chardev:console" command line argument will
+be used instead, resorting to the machine's
+default device type, if a machine type is set,
+and a matching entry exists on CONSOLE_DEV_TYPES.
 @raises: QEMUMachineAddDeviceError if the device type is not given
  and can not be determined.
 """
-if device_type is None:
-if self._machine is None:
-raise QEMUMachineAddDeviceError("Can not add a console device:"
-" QEMU instance without a "
-"defined machine type")
+self._console_set = True
+if device_type is None and self._machine is not None:
 for regex, device in CONSOLE_DEV_TYPES.items():
 if re.match(regex, self._machine):
 device_type = device
 break
-if device_type is None:
-raise QEMUMachineAddDeviceError("Can not add a console device:"
-" no matching console device "
-  

[Qemu-devel] [PATCH v3 11/20] Boot Linux Console Test: increase timeout

2019-02-20 Thread Cleber Rosa
When running on very low powered environments, some tests may time out,
causing false negatives.  As a conservative change, and considering
that human time (investigating false negatives) is worth more than some
extra machine cycles (and time), let's increase the overall timeout.

CC: Alex Bennée 
Signed-off-by: Cleber Rosa 
---
 tests/acceptance/boot_linux_console.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index cc5dcd7373..fa721a7355 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -21,7 +21,7 @@ class BootLinuxConsole(Test):
 :avocado: enable
 """
 
-timeout = 60
+timeout = 90
 
 KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
 
-- 
2.20.1




[Qemu-devel] [PATCH v3 09/20] Boot Linux Console Test: update the x86_64 kernel

2019-02-20 Thread Cleber Rosa
To the stock Fedora 29 kernel, from the Fedora 28.  New tests will be
added using the 29 kernel, so for consistency, let's also update it
here.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
---
 tests/acceptance/boot_linux_console.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index 89df7f6e4f..35b31162d4 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -28,9 +28,9 @@ class BootLinuxConsole(Test):
 :avocado: tags=arch:x86_64
 :avocado: tags=machine:pc
 """
-kernel_url = ('https://mirrors.kernel.org/fedora/releases/28/'
+kernel_url = ('https://mirrors.kernel.org/fedora/releases/29/'
   'Everything/x86_64/os/images/pxeboot/vmlinuz')
-kernel_hash = '238e083e114c48200f80d889f7e32eeb2793e02a'
+kernel_hash = '23bebd2680757891cf7adedb033532163a792495'
 kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
 
 self.vm.set_machine('pc')
-- 
2.20.1




[Qemu-devel] [PATCH v3 12/20] Boot Linux Console Test: refactor the console watcher into utility method

2019-02-20 Thread Cleber Rosa
This introduces a utility method that monitors the console device and
looks for a message that signals either the success or the failure of
the test.
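
As a rough standalone illustration of that loop (a sketch only, not the
code in this patch): the function below takes any file-like object
instead of the VM console socket, returns True/False instead of calling
self.fail(), and, as an assumption that goes beyond the patch, raises
EOFError if the console closes before either message shows up.

```python
import io
import logging


def wait_for_console_pattern(console, success_message,
                             failure_message="Kernel panic - not syncing"):
    """Read console lines until the success or failure message appears.

    `console` is any object with a readline() method.  Returns True on
    success and False on failure; raises EOFError on end of stream
    (that last part is an addition for this sketch).
    """
    console_logger = logging.getLogger("console")
    while True:
        msg = console.readline()
        if not msg:
            # readline() returns '' at end of stream; stop instead of spinning
            raise EOFError("console closed before any pattern matched")
        console_logger.debug(msg.strip())
        if success_message in msg:
            return True
        if failure_message in msg:
            return False
```

With an in-memory console such as io.StringIO("Booting\nlogin:\n"),
calling wait_for_console_pattern(console, "login:") returns True.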

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tests/acceptance/boot_linux_console.py | 30 ++
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index fa721a7355..e2ef43e7ce 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -25,6 +25,25 @@ class BootLinuxConsole(Test):
 
 KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
 
+def wait_for_console_pattern(self, success_message,
+ failure_message='Kernel panic - not syncing'):
+"""
+Waits for messages to appear on the console, while logging the content
+
+:param success_message: if this message appears, test succeeds
+:param failure_message: if this message appears, test fails
+"""
+console = self.vm.console_socket.makefile()
+console_logger = logging.getLogger('console')
+while True:
+msg = console.readline()
+console_logger.debug(msg.strip())
+if success_message in msg:
+break
+if failure_message in msg:
+fail = 'Failure message found in console: %s' % failure_message
+self.fail(fail)
+
 def test_x86_64_pc(self):
 """
 :avocado: tags=arch:x86_64
@@ -41,12 +60,5 @@ class BootLinuxConsole(Test):
 self.vm.add_args('-kernel', kernel_path,
  '-append', kernel_command_line)
 self.vm.launch()
-console = self.vm.console_socket.makefile()
-console_logger = logging.getLogger('console')
-while True:
-msg = console.readline()
-console_logger.debug(msg.strip())
-if 'Kernel command line: %s' % kernel_command_line in msg:
-break
-if 'Kernel panic - not syncing' in msg:
-self.fail("Kernel panic reached")
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 08/20] Boot Linux Console Test: rename the x86_64 after the arch and machine

2019-02-20 Thread Cleber Rosa
Given that the test is specific to x86_64 and pc, and new tests are
going to be added to the same class, let's rename it accordingly.
Also, let's make the class documentation not architecture specific.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tests/acceptance/boot_linux_console.py | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index 46b20bdfe2..89df7f6e4f 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -15,16 +15,19 @@ from avocado_qemu import Test
 
 class BootLinuxConsole(Test):
 """
-Boots a x86_64 Linux kernel and checks that the console is operational
-and the kernel command line is properly passed from QEMU to the kernel
+Boots a Linux kernel and checks that the console is operational and the
+kernel command line is properly passed from QEMU to the kernel
 
 :avocado: enable
-:avocado: tags=arch:x86_64
 """
 
 timeout = 60
 
-def test(self):
+def test_x86_64_pc(self):
+"""
+:avocado: tags=arch:x86_64
+:avocado: tags=machine:pc
+"""
 kernel_url = ('https://mirrors.kernel.org/fedora/releases/28/'
   'Everything/x86_64/os/images/pxeboot/vmlinuz')
 kernel_hash = '238e083e114c48200f80d889f7e32eeb2793e02a'
-- 
2.20.1




[Qemu-devel] [PATCH v3 06/20] Acceptance tests: use "arch:" tag to filter target specific tests

2019-02-20 Thread Cleber Rosa
Currently, the only test that contains some target architecture
information is "boot_linux_console.py", which contains an "x86_64"
tag.  But that tag is not respected in the default execution, that is,
"make check-acceptance" doesn't do anything with it.

That said, even the target architecture handling currently present in
the "avocado_qemu.Test" class is pretty limited.  For instance,
by default, it chooses a target based on the host architecture.

Because the original implementation of the tags feature in Avocado did
not include any type of namespace or "key:val" mechanism, no tag has
a relation to another tag.  The new implementation of the tags feature,
from version 67.0 onwards, allows "key:val" tags, and because of that,
a test can be classified with a tag in a given key.  For instance, the
new proposed version of the "boot_linux_console.py" test, which
downloads and attempts to run an x86_64 kernel, is now tagged as:

  :avocado: tags=arch:x86_64

This means that it can be filtered (out) when no x86_64 target is
available.  At the same time, tests that don't have an "arch:" tag
will not be filtered out.
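
The intended selection can be modelled roughly as below.  This is only
a simplified sketch of the behavior described above, not Avocado's
actual filtering code; the function names and the dict-of-sets tag
representation are assumptions made for the illustration.

```python
def built_arches(target_dirs):
    """Return the arch names of the *-softmmu targets being built.

    Mirrors the idea of turning "x86_64-softmmu" into "arch:x86_64".
    """
    suffix = "-softmmu"
    return [t[:-len(suffix)] for t in target_dirs if t.endswith(suffix)]


def is_selected(test_tags, arches):
    """Decide whether a test would be kept, given the built arches.

    `test_tags` maps a tag key to a set of values,
    for instance {"arch": {"x86_64"}}.
    """
    if not test_tags:
        # tests without any tag are kept
        return True
    if "arch" not in test_tags:
        # tests without an "arch:" tag are kept as well
        return True
    # tests with an "arch:" tag need at least one matching built target
    return bool(test_tags["arch"] & set(arches))
```

With only x86_64-softmmu and mips-softmmu built, a test tagged
"arch:aarch64" is left out, while an untagged test, or one tagged only
with "machine:pc", is still run.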

Signed-off-by: Cleber Rosa 
---
 tests/Makefile.include | 3 +++
 tests/acceptance/boot_linux_console.py | 2 +-
 tests/acceptance/linux_initrd.py   | 2 +-
 tests/acceptance/virtio_version.py | 2 +-
 tests/requirements.txt | 2 +-
 5 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 93ea42553e..633992603d 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -1090,6 +1090,7 @@ TESTS_RESULTS_DIR=$(BUILD_DIR)/tests/results
 # Any number of command separated loggers are accepted.  For more
 # information please refer to "avocado --help".
 AVOCADO_SHOW=app
+AVOCADO_TAGS=$(patsubst %-softmmu,-t arch:%, $(filter %-softmmu,$(TARGET_DIRS)))
 
 ifneq ($(findstring v2,"v$(PYTHON_VERSION)"),v2)
 $(TESTS_VENV_DIR): $(TESTS_VENV_REQ)
@@ -1115,6 +1116,8 @@ check-acceptance: check-venv $(TESTS_RESULTS_DIR)
$(call quiet-command, \
 $(TESTS_VENV_DIR)/bin/python -m avocado \
 --show=$(AVOCADO_SHOW) run --job-results-dir=$(TESTS_RESULTS_DIR) \
+--filter-by-tags-include-empty --filter-by-tags-include-empty-key \
+$(AVOCADO_TAGS) \
 --failfast=on $(SRC_PATH)/tests/acceptance, \
 "AVOCADO", "tests/acceptance")
 
diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index 98324f7591..46b20bdfe2 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -19,7 +19,7 @@ class BootLinuxConsole(Test):
 and the kernel command line is properly passed from QEMU to the kernel
 
 :avocado: enable
-:avocado: tags=x86_64
+:avocado: tags=arch:x86_64
 """
 
 timeout = 60
diff --git a/tests/acceptance/linux_initrd.py b/tests/acceptance/linux_initrd.py
index 737355c2ef..c75e29be70 100644
--- a/tests/acceptance/linux_initrd.py
+++ b/tests/acceptance/linux_initrd.py
@@ -19,7 +19,7 @@ class LinuxInitrd(Test):
 Checks QEMU evaluates correctly the initrd file passed as -initrd option.
 
 :avocado: enable
-:avocado: tags=x86_64
+:avocado: tags=arch:x86_64
 """
 
 timeout = 60
diff --git a/tests/acceptance/virtio_version.py b/tests/acceptance/virtio_version.py
index ce990250d8..3b280e7fc3 100644
--- a/tests/acceptance/virtio_version.py
+++ b/tests/acceptance/virtio_version.py
@@ -62,7 +62,7 @@ class VirtioVersionCheck(Test):
 `disable-legacy`.
 
 :avocado: enable
-:avocado: tags=x86_64
+:avocado: tags=arch:x86_64
 """
 
 # just in case there are failures, show larger diff:
diff --git a/tests/requirements.txt b/tests/requirements.txt
index 64c6e27a94..002ded6a22 100644
--- a/tests/requirements.txt
+++ b/tests/requirements.txt
@@ -1,4 +1,4 @@
 # Add Python module requirements, one per line, to be installed
 # in the tests/venv Python virtual environment. For more info,
 # refer to: https://pip.pypa.io/en/stable/user_guide/#id1
-avocado-framework==65.0
+avocado-framework==68.0
-- 
2.20.1




[Qemu-devel] [PATCH v3 07/20] Acceptance tests: look for target architecture in test tags first

2019-02-20 Thread Cleber Rosa
A test can, optionally, be tagged for one or many architectures.  If a
test has been tagged for a single architecture, there's a high chance
that the test won't run on other architectures.  This changes how the
default target architecture is chosen: the 'arch' tag value is now
looked at before falling back to the host architecture.

The precedence order for choosing a QEMU binary to use for a test
is now:

 * qemu_bin parameter
 * arch parameter
 * arch tag value (for example, x86_64 if ":avocado: tags=arch:x86_64"
   is used)

This means that if one runs:

 $ avocado run -p qemu_bin=/usr/bin/qemu-system-x86_64 test.py

No arch parameter or tag will influence the selection of the QEMU
target binary.  If one runs:

 $ avocado run -p arch=ppc64 test.py

The target binary selection mechanism will attempt to find a binary
such as "ppc64-softmmu/qemu-system-ppc64".  And finally, if one runs
a test that is tagged (in its docstring) with "arch:aarch64":

 $ avocado run aarch64.py

The target binary selection mechanism will attempt to find a binary
such as "aarch64-softmmu/qemu-system-aarch64".

At this time, no provision is made to cancel the execution of tests if
the arch parameter given (manually) does not match the test "arch"
tag, but it may be a useful default behavior to be added in the
future.
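
To make the precedence concrete, here is a minimal standalone sketch.
It uses plain dicts in place of the Avocado params and tags objects,
the function names and the host_arch argument are made up for the
illustration, and it only builds the candidate path without checking
that the binary exists.

```python
import os


def resolve_arch(params, tags):
    """Return the arch parameter, else the single arch tag value, else None."""
    arches = tags.get("arch", set())
    # the tag is only used when exactly one arch value is present
    tag_arch = next(iter(arches)) if len(arches) == 1 else None
    return params.get("arch", tag_arch)


def resolve_qemu_bin(params, tags, host_arch=None):
    """Return the QEMU binary path following the precedence listed above."""
    if "qemu_bin" in params:
        # an explicit qemu_bin wins over any arch parameter or tag
        return params["qemu_bin"]
    arch = resolve_arch(params, tags)
    if arch is None:
        # fall back to the host architecture, as before this patch
        arch = host_arch if host_arch is not None else os.uname()[4]
    return "%s-softmmu/qemu-system-%s" % (arch, arch)
```

For example, resolve_qemu_bin({"arch": "ppc64"}, {"arch": {"aarch64"}})
gives "ppc64-softmmu/qemu-system-ppc64", because the parameter takes
precedence over the tag.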

Signed-off-by: Cleber Rosa 
---
 docs/devel/testing.rst| 4 +++-
 tests/acceptance/avocado_qemu/__init__.py | 7 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 6035db1b44..87bcf8ef43 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -702,7 +702,9 @@ A test may, for instance, use the same value when selecting the
 architecture of a kernel or disk image to boot a VM with.
 
 The ``arch`` attribute will be set to the test parameter of the same
-name, and if one is not given explicitly, it will be set to ``None``.
+name.  If one is not given explicitly, it will either be set to
+``None``, or, if the test is tagged with one (and only one)
+``:avocado: tags=arch:VALUE`` tag, it will be set to ``VALUE``.
 
 qemu_bin
 
diff --git a/tests/acceptance/avocado_qemu/__init__.py b/tests/acceptance/avocado_qemu/__init__.py
index f580582602..9e98d113cb 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -53,7 +53,12 @@ def pick_default_qemu_bin(arch=None):
 class Test(avocado.Test):
 def setUp(self):
 self.vm = None
-self.arch = self.params.get('arch')
+arches = self.tags.get('arch', [])
+if len(arches) == 1:
+arch = arches.pop()
+else:
+arch = None
+self.arch = self.params.get('arch', default=arch)
 default_qemu_bin = pick_default_qemu_bin(arch=self.arch)
 self.qemu_bin = self.params.get('qemu_bin',
 default=default_qemu_bin)
-- 
2.20.1




[Qemu-devel] [PATCH v3 14/20] Boot Linux Console Test: add a test for mips + malta

2019-02-20 Thread Cleber Rosa
From: Philippe Mathieu-Daudé 

Similar to the x86_64 + pc test, it boots a Linux kernel on a Malta
board and verifies that the serial console is working.  Also, it relies
on the serial device set by the machine itself.

If mips is a target being built, "make check-acceptance" will
automatically include this test by the use of the "arch:mips" tag.

Alternatively, this test can be run using:

$ avocado run -t arch:mips tests/acceptance
$ avocado run -t machine:malta tests/acceptance
$ avocado run -t endian:big tests/acceptance

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Cleber Rosa 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 41 ++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 42971484ab..0a5e0613be 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu"
+- CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index e2ef43e7ce..05e43360b8 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -8,9 +8,12 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or
 # later.  See the COPYING file in the top-level directory.
 
+import os
 import logging
 
 from avocado_qemu import Test
+from avocado.utils import process
+from avocado.utils import archive
 
 
 class BootLinuxConsole(Test):
@@ -44,6 +47,21 @@ class BootLinuxConsole(Test):
 fail = 'Failure message found in console: %s' % failure_message
 self.fail(fail)
 
+def extract_from_deb(self, deb, path):
+"""
+Extracts a file from a deb package into the test workdir
+
+:param deb: path to the deb archive
+:param path: path within the deb archive of the file to be extracted
+:returns: path of the extracted file
+"""
+cwd = os.getcwd()
+os.chdir(self.workdir)
+process.run("ar x %s data.tar.gz" % deb)
+archive.extract("data.tar.gz", self.workdir)
+os.chdir(cwd)
+return self.workdir + path
+
 def test_x86_64_pc(self):
 """
 :avocado: tags=arch:x86_64
@@ -62,3 +80,26 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_mips_malta(self):
+"""
+:avocado: tags=arch:mips
+:avocado: tags=machine:malta
+:avocado: tags=endian:big
+"""
+deb_url = ('http://snapshot.debian.org/archive/debian/'
+   '20130217T032700Z/pool/main/l/linux-2.6/'
+   'linux-image-2.6.32-5-4kc-malta_2.6.32-48_mips.deb')
+deb_hash = 'a8cfc28ad8f45f54811fc6cf74fc43ffcfe0ba04'
+deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+kernel_path = self.extract_from_deb(deb_path,
+'/boot/vmlinux-2.6.32-5-4kc-malta')
+
+self.vm.set_machine('malta')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 03/20] Acceptance tests: improve docstring on pick_default_qemu_bin()

2019-02-20 Thread Cleber Rosa
Making it clear what is returned by this utility function.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
Reviewed-by: Philippe Mathieu-Daudé 
---
 tests/acceptance/avocado_qemu/__init__.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/acceptance/avocado_qemu/__init__.py b/tests/acceptance/avocado_qemu/__init__.py
index 1e54fd5932..d8d5b48dac 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -27,6 +27,10 @@ def pick_default_qemu_bin():
 """
 Picks the path of a QEMU binary, starting either in the current working
 directory or in the source tree root directory.
+
+:returns: the path to the default QEMU binary or None if one could not
+  be found
+:rtype: str or None
 """
 arch = os.uname()[4]
 qemu_bin_relative_path = os.path.join("%s-softmmu" % arch,
-- 
2.20.1




[Qemu-devel] [PATCH v3 04/20] Acceptance tests: fix doc reference to avocado_qemu directory

2019-02-20 Thread Cleber Rosa
The "this directory" reference is misleading and confusing, it's a
leftover from when this text was proposed in a README file inside
the "tests/acceptance/avocado_qemu" directory.

When that text was moved to the top level docs directory, the
reference was not updated.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/devel/testing.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 135743a2bf..ceaaafc69f 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -590,8 +590,9 @@ Alternatively, follow the instructions on this link:
 Overview
 
 
-This directory provides the ``avocado_qemu`` Python module, containing
-the ``avocado_qemu.Test`` class.  Here's a simple usage example:
+The ``tests/acceptance/avocado_qemu`` directory provides the
+``avocado_qemu`` Python module, containing the ``avocado_qemu.Test``
+class.  Here's a simple usage example:
 
 .. code::
 
-- 
2.20.1




[Qemu-devel] [PATCH v3 05/20] Acceptance tests: introduce arch parameter and attribute

2019-02-20 Thread Cleber Rosa
It's useful to define the architecture that should be used in
situations such as:
 * the intended target of the QEMU binary to be used on tests
 * the architecture of code to be run within the QEMU binary, such
   as a kernel image or a full blown guest OS image

This commit introduces both a test parameter and a test instance
attribute, that will contain such a value.

Now, when the "arch" test parameter is given, it will influence the
selection of the default QEMU binary, if one is not given explicitly
by means of the "qemu_bin" parameter.
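
As a rough illustration of the lookup (not the exact code in this
patch): the sketch below takes an explicit list of directories to
search, where the search_dirs argument and the pick_qemu_bin name are
assumptions made for the example, and returns None when no suitable
binary is found.

```python
import os


def is_readable_executable_file(path):
    """True if path is a regular file that can be read and executed."""
    return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)


def pick_qemu_bin(search_dirs, arch=None):
    """Return the first matching QEMU binary under search_dirs, or None.

    If arch is None, the host system arch (from os.uname()) is used.
    """
    if arch is None:
        arch = os.uname()[4]
    relative_path = os.path.join("%s-softmmu" % arch,
                                 "qemu-system-%s" % arch)
    for base in search_dirs:
        candidate = os.path.join(base, relative_path)
        if is_readable_executable_file(candidate):
            return candidate
    return None
```

Calling pick_qemu_bin(["."], arch="mips") would look for
"./mips-softmmu/qemu-system-mips", no matter what the host architecture
is.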

Signed-off-by: Cleber Rosa 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/devel/testing.rst| 28 +++
 tests/acceptance/avocado_qemu/__init__.py | 14 +---
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index ceaaafc69f..6035db1b44 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -689,6 +689,21 @@ vm
 A QEMUMachine instance, initially configured according to the given
 ``qemu_bin`` parameter.
 
+arch
+
+
+The architecture can be used on different levels of the stack, e.g. by
+the framework or by the test itself.  At the framework level, it will
+currently influence the selection of a QEMU binary (when one is
+not explicitly given).
+
+Tests are also free to use this attribute value, for their own needs.
+A test may, for instance, use the same value when selecting the
+architecture of a kernel or disk image to boot a VM with.
+
+The ``arch`` attribute will be set to the test parameter of the same
+name, and if one is not given explicitly, it will be set to ``None``.
+
 qemu_bin
 
 
@@ -711,6 +726,19 @@ like the following:
 
   PARAMS (key=qemu_bin, path=*, default=x86_64-softmmu/qemu-system-x86_64) => 'x86_64-softmmu/qemu-system-x86_64
 
+arch
+
+
+The architecture that will influence the selection of a QEMU binary
+(when one is not explicitly given).
+
+Tests are also free to use this parameter value, for their own needs.
+A test may, for instance, use the same value when selecting the
+architecture of a kernel or disk image to boot a VM with.
+
+This parameter has a direct relation with the ``arch`` attribute.  If
+not given, it will default to None.
+
 qemu_bin
 
 
diff --git a/tests/acceptance/avocado_qemu/__init__.py b/tests/acceptance/avocado_qemu/__init__.py
index d8d5b48dac..f580582602 100644
--- a/tests/acceptance/avocado_qemu/__init__.py
+++ b/tests/acceptance/avocado_qemu/__init__.py
@@ -23,16 +23,22 @@ def is_readable_executable_file(path):
 return os.path.isfile(path) and os.access(path, os.R_OK | os.X_OK)
 
 
-def pick_default_qemu_bin():
+def pick_default_qemu_bin(arch=None):
 """
 Picks the path of a QEMU binary, starting either in the current working
 directory or in the source tree root directory.
 
+:param arch: the arch to use when looking for a QEMU binary (the target
+ will match the arch given).  If None (the default) arch
+ will be the current host system arch (as given by
+ :func:`os.uname`).
+:type arch: str
 :returns: the path to the default QEMU binary or None if one could not
   be found
 :rtype: str or None
 """
-arch = os.uname()[4]
+if arch is None:
+arch = os.uname()[4]
 qemu_bin_relative_path = os.path.join("%s-softmmu" % arch,
   "qemu-system-%s" % arch)
 if is_readable_executable_file(qemu_bin_relative_path):
@@ -47,8 +53,10 @@ def pick_default_qemu_bin():
 class Test(avocado.Test):
 def setUp(self):
 self.vm = None
+self.arch = self.params.get('arch')
+default_qemu_bin = pick_default_qemu_bin(arch=self.arch)
 self.qemu_bin = self.params.get('qemu_bin',
-default=pick_default_qemu_bin())
+default=default_qemu_bin)
 if self.qemu_bin is None:
 self.cancel("No QEMU binary defined or found in the source tree")
 self.vm = QEMUMachine(self.qemu_bin)
-- 
2.20.1




[Qemu-devel] [PATCH v3 15/20] Boot Linux Console Test: add a test for mips64el + malta

2019-02-20 Thread Cleber Rosa
Similar to the x86_64 + pc test, it boots a Linux kernel on a Malta
board and verifies that the serial console is working.

If mips64el is a target being built, "make check-acceptance" will
automatically include this test by the use of the "arch:mips64el"
tag.

Alternatively, this test can be run using:

$ avocado run -t arch:mips64el tests/acceptance
$ avocado run -t machine:malta tests/acceptance

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Cleber Rosa 
---
 .travis.yml|  2 +-
 tests/acceptance/boot_linux_console.py | 34 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index 0a5e0613be..0260263bb8 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -198,7 +198,7 @@ matrix:
 
 # Acceptance (Functional) tests
 - env:
-- CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu"
+- CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu,mips-softmmu,mips64el-softmmu"
 - TEST_CMD="make check-acceptance"
   addons:
 apt:
diff --git a/tests/acceptance/boot_linux_console.py b/tests/acceptance/boot_linux_console.py
index 05e43360b8..899c27a9ec 100644
--- a/tests/acceptance/boot_linux_console.py
+++ b/tests/acceptance/boot_linux_console.py
@@ -10,6 +10,7 @@
 
 import os
 import logging
+import os
 
 from avocado_qemu import Test
 from avocado.utils import process
@@ -103,3 +104,36 @@ class BootLinuxConsole(Test):
 self.vm.launch()
 console_pattern = 'Kernel command line: %s' % kernel_command_line
 self.wait_for_console_pattern(console_pattern)
+
+def test_mips64el_malta(self):
+"""
+This test requires the ar tool to extract "data.tar.gz" from
+the Debian package.
+
+The kernel can be rebuilt using this Debian kernel source [1] and
+following the instructions on [2].
+
+[1] http://snapshot.debian.org/package/linux-2.6/2.6.32-48/
+#linux-source-2.6.32_2.6.32-48
+[2] https://kernel-team.pages.debian.net/kernel-handbook/
+ch-common-tasks.html#s-common-official
+
+:avocado: tags=arch:mips64el
+:avocado: tags=machine:malta
+"""
+deb_url = ('http://snapshot.debian.org/archive/debian/'
+   '20130217T032700Z/pool/main/l/linux-2.6/'
+   'linux-image-2.6.32-5-5kc-malta_2.6.32-48_mipsel.deb')
+deb_hash = '1aaec92083bf22fda31e0d27fa8d9a388e5fc3d5'
+deb_path = self.fetch_asset(deb_url, asset_hash=deb_hash)
+kernel_path = self.extract_from_deb(deb_path,
+'/boot/vmlinux-2.6.32-5-5kc-malta')
+
+self.vm.set_machine('malta')
+self.vm.set_console()
+kernel_command_line = self.KERNEL_COMMON_COMMAND_LINE + 'console=ttyS0'
+self.vm.add_args('-kernel', kernel_path,
+ '-append', kernel_command_line)
+self.vm.launch()
+console_pattern = 'Kernel command line: %s' % kernel_command_line
+self.wait_for_console_pattern(console_pattern)
-- 
2.20.1




[Qemu-devel] [PATCH v3 02/20] Acceptance tests: show avocado test execution by default

2019-02-20 Thread Cleber Rosa
The current version of the "check-acceptance" target will only show
one line for execution of all tests.  That's probably OK if the tests
to be run are quick enough and they're always the same.

But there's already one test alone that takes on average ~5 seconds
to run, and we intend to adapt the list of tests to match the user's
build environment (among other choices).

Because of that, let's present the default Avocado UI by default.
Users can always choose a different output by setting the AVOCADO_SHOW
variable.

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
---
 .travis.yml| 2 +-
 tests/Makefile.include | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index baa06b976a..42971484ab 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -199,7 +199,7 @@ matrix:
 # Acceptance (Functional) tests
 - env:
 - CONFIG="--python=/usr/bin/python3 --target-list=x86_64-softmmu"
-- TEST_CMD="make AVOCADO_SHOW=app check-acceptance"
+- TEST_CMD="make check-acceptance"
   addons:
 apt:
   packages:
diff --git a/tests/Makefile.include b/tests/Makefile.include
index b39e989f72..93ea42553e 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -1089,7 +1089,7 @@ TESTS_RESULTS_DIR=$(BUILD_DIR)/tests/results
 # Controls the output generated by Avocado when running tests.
 # Any number of command separated loggers are accepted.  For more
 # information please refer to "avocado --help".
-AVOCADO_SHOW=none
+AVOCADO_SHOW=app
 
 ifneq ($(findstring v2,"v$(PYTHON_VERSION)"),v2)
 $(TESTS_VENV_DIR): $(TESTS_VENV_REQ)
-- 
2.20.1




[Qemu-devel] [PATCH v3 01/20] scripts/qemu.py: log QEMU launch command line

2019-02-20 Thread Cleber Rosa
Even when the launch of QEMU succeeds, it's useful to have the command
line recorded.
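
A minimal standalone sketch of the idea follows.  The logger name and
the helper function are made up for the illustration; only the debug
message format is taken from the patch.

```python
import logging

LOG = logging.getLogger("qemu_launch_sketch")  # hypothetical logger name


def build_and_log_args(wrapper, binary, base_args, extra_args):
    """Assemble the full command line and record it at debug level."""
    full_args = wrapper + [binary] + base_args + extra_args
    # log before launching, so the command is recorded even on success
    LOG.debug("VM launch command: %r", " ".join(full_args))
    return full_args
```

The returned list is what would then be handed to subprocess.Popen().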

Signed-off-by: Cleber Rosa 
Reviewed-by: Caio Carrara 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
---
 scripts/qemu.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/scripts/qemu.py b/scripts/qemu.py
index 32b00af5cc..ee85309923 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -320,6 +320,7 @@ class QEMUMachine(object):
 self._pre_launch()
 self._qemu_full_args = (self._wrapper + [self._binary] +
 self._base_args() + self._args)
+LOG.debug('VM launch command: %r', ' '.join(self._qemu_full_args))
 self._popen = subprocess.Popen(self._qemu_full_args,
stdin=devnull,
stdout=self._qemu_log_file,
-- 
2.20.1




[Qemu-devel] [PATCH v3 00/20] Acceptance Tests: target architecture support

2019-02-20 Thread Cleber Rosa
The current acceptance tests don't provide any type of architecture
information that can be used to influence the selection of the QEMU
binary used on them[1].  If one is running tests on an x86_64 host, the
default QEMU binary will be "x86_64-softmmu/qemu-system-x86_64".

Given the nature of QEMU, some tests will be architecture agnostic,
while others will be architecture dependent.  The "check-qtest" and
"check-qtest-TARGET" make targets exemplify that pattern.

For the acceptance tests, the same requirement exists.  Tests should
be allowed to influence the binary used, and when they don't, a
default selection mechanism should kick in[2].  The proposed solution
here requires only that an Avocado tag is set, such as:

   class My(Test):
       def test_nx_cpu_flag(self):
           """
           :avocado: tags=arch:x86_64
           """
           test_code()

The value of the "arch" key, in this case "x86_64", will be used when
selecting the QEMU binary to use in the test.  At the same time, if
"x86_64-softmmu" is not a built target, the test will be filtered out
by "make check-acceptance"[3].
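
For readers not familiar with the tag syntax, here is a small sketch
of how such docstring directives map to "key:val" pairs.  This is a
simplified illustration written for this cover letter, not Avocado's
own parser; the parse_tags name and the dict-of-sets result are
assumptions.

```python
import re

# matches a docstring directive such as ":avocado: tags=arch:x86_64"
TAG_DIRECTIVE = re.compile(r":avocado:\s+tags=(\S+)")


def parse_tags(docstring):
    """Return a dict mapping each tag key to the set of its values.

    Plain tags without a "key:val" form map to an empty set.
    """
    tags = {}
    for match in TAG_DIRECTIVE.finditer(docstring or ""):
        for item in match.group(1).split(","):
            if not item:
                continue
            key, sep, value = item.partition(":")
            values = tags.setdefault(key, set())
            if sep:
                values.add(value)
    return tags
```

Under this model, the docstring above yields {"arch": {"x86_64"}},
while an old-style "tags=x86_64" directive yields a key with no value,
which is why it could not be related to a target architecture.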

Besides the convention explained above, where the binary will be
selected from the "arch" tag, it's also possible to set an "arch"
*parameter* that will also influence the QEMU binary selection:

  $ avocado run -p arch=aarch64 works-on-many-arches.py

Finally, it's also possible to set the "qemu_bin" parameter, which will
define (instead of just influencing) the QEMU binary to be used:

 $ avocado run -p qemu_bin=qemu-bin-aarch64 test.py

As examples for the idea proposed here, a number of "boot linux
console" tests have been added, for a number of different target
architectures.  When the build environment includes them (as it has
been added to Travis CI jobs) the architecture specific tests will be
automatically executed.

As mentioned previously, this patch series includes ideas present in
other patch series, and from different authors.  I tried my best
to include the information about authorship, but if I missed any,
please accept my apologies and let me know.

---

[1] - The "boot_linux_console.py" contains a "x86_64" test tag, but
  that is informational only, in the sense that it's not consumed
  by the test itself, or used by "make check-acceptance" to filter
  out tests.

[2] - This patch series doesn't attempt to change the default selection
  mechanism.  Possible changes in this area may include looking for
  any one built binary first, no matter the host architecture.

[3] - On a previous proposed version, the test class would look at the
  "arch" parameter given, and would cancel the test if there wasn't
  a match.

---

Changes from v2:
================

 * On "Acceptance tests: introduce arch parameter and attribute":
   - Made the documentation on the "arch" attribute and parameter
 clearer as to what the framework does with them, and what tests
 may use them for (Cornelia).

 * On "Acceptance tests: use "arch:" tag to filter target specific tests":
   - Bumped avocado-framework to the latest release (68.0, instead of
 67.0 on v2).

 * On "Boot Linux Console Test: add a test for mips64el + malta":
   - Broke down kernel source/doc URLs lines on docstring to make
 patchew happy about number of columns in line

 * On "Boot Linux Console Test: add a test for aarch64 + virt"
   - Broke down "kernel_command_line" assignment line to make patchew
 happy about number of columns in line

 * On "Boot Linux Console Test: add a test for arm + virt"
   - Broke down "kernel_command_line" assignment line to make patchew
 happy about number of columns in line

Open issues during v2:
======================

 * The timeout change to 90s may not be necessary, or the best
   idea, given that a possible tcg+ppc64 performance regression
   has been identified.
   - https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg00338.html

 * A possible race condition has been identified in the aarch64
   target, when running on an environment with more than one CPU
   (initially reported by Wainer).
   - https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg00192.html

Changes from v1:
================

 * On "Acceptance tests: introduce arch parameter and attribute":
   - Added explicit *host system architecture* to the "arch" parameter
 behavior documentation (Caio / Philippe)
   - Added explicit arch parameter name in call to pick_default_qemu_bin()
 (Caio)
   - Fixed the documentation about the value of the "arch" attribute when
 a parameter is not given (Wainer).

 * On "Acceptance tests: use "arch:" tag to filter target specific tests":
   - Updated "arch" tag on tests "linux_initrd.py" and
 "virtio_version.py" (Cornelia)
   - Fixed the documentation about the value of the "arch" attribute (Wainer)

 * On "Acceptance tests: look for target architecture in test tags first"
   - Fixed the documentation given that starting with this patch, the
 

[Qemu-devel] [PATCH v2 11/11] riscv: sifive_u: Allow up to 4 CPUs to be created

2019-02-20 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_u.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 7bc25820fe..3199238ba0 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -398,7 +398,10 @@ static void riscv_sifive_u_machine_init(MachineClass *mc)
 {
 mc->desc = "RISC-V Board compatible with SiFive U SDK";
 mc->init = riscv_sifive_u_init;
-mc->max_cpus = 1;
+/* The real hardware has 5 CPUs, but one of them is a small embedded power
+ * management CPU.
+ */
+mc->max_cpus = 4;
 }
 
 DEFINE_MACHINE("sifive_u", riscv_sifive_u_machine_init)
-- 
2.20.1




Re: [Qemu-devel] [PATCH] hw/display: Add basic ATI VGA emulation

2019-02-20 Thread BALATON Zoltan

On Thu, 21 Feb 2019, BALATON Zoltan wrote:

On Tue, 19 Feb 2019, Peter Maydell wrote:

On Tue, 12 Feb 2019 at 23:59, BALATON Zoltan  wrote:

On Tue, 12 Feb 2019, Philippe Mathieu-Daudé wrote:

I'd have used a pair of extract32/deposit32 but this is probably easier


By the way, should these lines in include/qemu/bitops.h have 1ULL instead of
1UL?

#define BIT(nr) (1UL << (nr))
#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
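
To illustrate why the suffix could matter, here is a minimal sketch, assuming a host where unsigned long is only 32 bits wide (64-bit Windows, for example). BIT32, BIT64, bit64() and bit_fits_32() are hypothetical names used only for this illustration; they are not part of bitops.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: BIT32 models 1UL on a host with a 32-bit long,
 * BIT64 models the 1ULL variant asked about above. */
#define BIT32(nr) ((uint32_t)1 << (nr))
#define BIT64(nr) (1ULL << (nr))

/* Returns a 64-bit mask with only bit nr set; nr must be below 64. */
static inline uint64_t bit64(unsigned nr)
{
    assert(nr < 64);
    return BIT64(nr);
}

/* Returns nonzero when bit nr can be produced by the 32-bit variant
 * without shifting past the width of the type (undefined behaviour). */
static inline int bit_fits_32(unsigned nr)
{
    return nr < 32;
}
```

With these definitions bit64(40) is representable, while the 32-bit variant cannot express any bit at index 32 or above.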

Regards,
BALATON Zoltan


[Qemu-devel] [PATCH v2 09/11] RISC-V: Convert trap debugging to trace events

2019-02-20 Thread Alistair Francis
From: Michael Clark 

Cc: Palmer Dabbelt 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 Makefile.objs |  1 +
 target/riscv/cpu_helper.c | 12 +++-
 target/riscv/trace-events |  2 ++
 3 files changed, 6 insertions(+), 9 deletions(-)
 create mode 100644 target/riscv/trace-events

diff --git a/Makefile.objs b/Makefile.objs
index 5fb022d7ad..581bd97042 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -181,6 +181,7 @@ trace-events-subdirs += target/arm
 trace-events-subdirs += target/i386
 trace-events-subdirs += target/mips
 trace-events-subdirs += target/ppc
+trace-events-subdirs += target/riscv
 trace-events-subdirs += target/s390x
 trace-events-subdirs += target/sparc
 trace-events-subdirs += ui
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index a02f4dad8c..6d3fbc3401 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -22,8 +22,7 @@
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "tcg-op.h"
-
-#define RISCV_DEBUG_INTERRUPT 0
+#include "trace.h"
 
 int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 {
@@ -493,13 +492,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 }
 }
 
-if (RISCV_DEBUG_INTERRUPT) {
-qemu_log_mask(LOG_TRACE, "core " TARGET_FMT_ld ": %s %s, "
-"epc 0x" TARGET_FMT_lx ": tval 0x" TARGET_FMT_lx "\n",
-env->mhartid, async ? "intr" : "trap",
-(async ? riscv_intr_names : riscv_excp_names)[cause],
-env->pc, tval);
-}
+trace_riscv_trap(env->mhartid, async, cause, env->pc, tval, cause < 16 ?
+(async ? riscv_intr_names : riscv_excp_names)[cause] : "(unknown)");
 
 if (env->priv <= PRV_S &&
 cause < TARGET_LONG_BITS && ((deleg >> cause) & 1)) {
diff --git a/target/riscv/trace-events b/target/riscv/trace-events
new file mode 100644
index 0000000000..48af0373df
--- /dev/null
+++ b/target/riscv/trace-events
@@ -0,0 +1,2 @@
+# target/riscv/cpu_helper.c
+riscv_trap(uint64_t hartid, bool async, uint64_t cause, uint64_t epc, uint64_t tval, const char *desc) "hart:%"PRId64", async:%d, cause:%"PRId64", epc:0x%"PRIx64", tval:0x%"PRIx64", desc=%s"
-- 
2.20.1




[Qemu-devel] [PATCH v2 05/11] elf: Add RISC-V PSABI ELF header defines

2019-02-20 Thread Alistair Francis
From: Michael Clark 

Refer to the RISC-V PSABI specification for details:

- https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md

Cc: Michael Tokarev 
Cc: Richard Henderson 
Cc: Alistair Francis 
Reviewed-by: Laurent Vivier 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 include/elf.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/elf.h b/include/elf.h
index b35347eee7..ea7708a4ea 100644
--- a/include/elf.h
+++ b/include/elf.h
@@ -1393,6 +1393,16 @@ typedef struct {
 #define R_RISCV_SET16 55
 #define R_RISCV_SET32 56
 
+/* RISC-V ELF Flags.  */
+#define EF_RISCV_RVC              0x0001
+#define EF_RISCV_FLOAT_ABI        0x0006
+#define EF_RISCV_FLOAT_ABI_SOFT   0x0000
+#define EF_RISCV_FLOAT_ABI_SINGLE 0x0002
+#define EF_RISCV_FLOAT_ABI_DOUBLE 0x0004
+#define EF_RISCV_FLOAT_ABI_QUAD   0x0006
+#define EF_RISCV_RVE              0x0008
+#define EF_RISCV_TSO              0x0010
+
 typedef struct elf32_rel {
   Elf32_Addr   r_offset;
   Elf32_Word   r_info;
-- 
2.20.1




[Qemu-devel] [PATCH v2 08/11] RISC-V: Add support for vectored interrupts

2019-02-20 Thread Alistair Francis
From: Michael Clark 

If vectored interrupts are enabled (bits[1:0]
of mtvec/stvec == 1) then use the following
logic for trap entry address calculation:

 pc = mtvec + cause * 4

In addition to adding support for vectored interrupts
this patch simplifies the interrupt delivery logic
by making sync/async cause decoding and encoding
steps distinct.

The cause code and the sign bit indicating sync/async
are split at the beginning of the function, and
fixed_cause is renamed to cause. The MSB setting for async
traps is delayed until setting mcause/scause to allow
redundant variables to be eliminated. Some variables
are renamed for conciseness and moved so that decls
are at the start of the block.
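
The trap entry calculation described above can be sketched roughly as follows. This is an illustration only, assuming that bits[1:0] of the vector register select the mode, that the remaining bits hold the base, and that the vectored offset applies to asynchronous traps only; trap_entry_pc() is a hypothetical name and the cause < 64 check is an arbitrary sanity bound for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* tvec is the raw mtvec/stvec value: bits[1:0] are the mode, the rest
 * is the 4-byte-aligned base address. */
static uint64_t trap_entry_pc(uint64_t tvec, bool async, uint64_t cause)
{
    uint64_t base = tvec & ~(uint64_t)3;   /* strip the mode bits */
    bool vectored = (tvec & 3) == 1;       /* mode 1 selects vectoring */

    assert(cause < 64);                    /* sanity bound for the sketch */

    /* Synchronous traps, and all traps in direct mode, enter at base. */
    return base + ((async && vectored) ? cause * 4 : 0);
}
```

For example, with tvec = 0x80000001 an interrupt with cause 7 would enter at 0x8000001c, while an exception with the same cause number would still enter at 0x80000000.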

Cc: Palmer Dabbelt 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 145 ++
 target/riscv/csr.c|  12 ++--
 2 files changed, 60 insertions(+), 97 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 073bdcfe74..a02f4dad8c 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -454,118 +454,81 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 RISCVCPU *cpu = RISCV_CPU(cs);
 CPURISCVState *env = >env;
 
-if (RISCV_DEBUG_INTERRUPT) {
-int log_cause = cs->exception_index & RISCV_EXCP_INT_MASK;
-if (cs->exception_index & RISCV_EXCP_INT_FLAG) {
-qemu_log_mask(LOG_TRACE, "core "
-TARGET_FMT_ld ": trap %s, epc 0x" TARGET_FMT_lx "\n",
-env->mhartid, riscv_intr_names[log_cause], env->pc);
-} else {
-qemu_log_mask(LOG_TRACE, "core "
-TARGET_FMT_ld ": intr %s, epc 0x" TARGET_FMT_lx "\n",
-env->mhartid, riscv_excp_names[log_cause], env->pc);
+/* cs->exception is 32-bits wide unlike mcause which is XLEN-bits wide
+ * so we mask off the MSB and separate into trap type and cause.
+ */
+bool async = !!(cs->exception_index & RISCV_EXCP_INT_FLAG);
+target_ulong cause = cs->exception_index & RISCV_EXCP_INT_MASK;
+target_ulong deleg = async ? env->mideleg : env->medeleg;
+target_ulong tval = 0;
+
+static const int ecall_cause_map[] = {
+[PRV_U] = RISCV_EXCP_U_ECALL,
+[PRV_S] = RISCV_EXCP_S_ECALL,
+[PRV_H] = RISCV_EXCP_H_ECALL,
+[PRV_M] = RISCV_EXCP_M_ECALL
+};
+
+if (!async) {
+/* set tval to badaddr for traps with address information */
+switch (cause) {
+case RISCV_EXCP_INST_ADDR_MIS:
+case RISCV_EXCP_INST_ACCESS_FAULT:
+case RISCV_EXCP_LOAD_ADDR_MIS:
+case RISCV_EXCP_STORE_AMO_ADDR_MIS:
+case RISCV_EXCP_LOAD_ACCESS_FAULT:
+case RISCV_EXCP_STORE_AMO_ACCESS_FAULT:
+case RISCV_EXCP_INST_PAGE_FAULT:
+case RISCV_EXCP_LOAD_PAGE_FAULT:
+case RISCV_EXCP_STORE_PAGE_FAULT:
+tval = env->badaddr;
+break;
+default:
+break;
 }
-}
-
-target_ulong fixed_cause = 0;
-if (cs->exception_index & (RISCV_EXCP_INT_FLAG)) {
-/* hacky for now. the MSB (bit 63) indicates interrupt but cs->exception
-   index is only 32 bits wide */
-fixed_cause = cs->exception_index & RISCV_EXCP_INT_MASK;
-fixed_cause |= ((target_ulong)1) << (TARGET_LONG_BITS - 1);
-} else {
-/* fixup User ECALL -> correct priv ECALL */
-if (cs->exception_index == RISCV_EXCP_U_ECALL) {
-switch (env->priv) {
-case PRV_U:
-fixed_cause = RISCV_EXCP_U_ECALL;
-break;
-case PRV_S:
-fixed_cause = RISCV_EXCP_S_ECALL;
-break;
-case PRV_H:
-fixed_cause = RISCV_EXCP_H_ECALL;
-break;
-case PRV_M:
-fixed_cause = RISCV_EXCP_M_ECALL;
-break;
-}
-} else {
-fixed_cause = cs->exception_index;
+/* ecall is dispatched as one cause so translate based on mode */
+if (cause == RISCV_EXCP_U_ECALL) {
+assert(env->priv <= 3);
+cause = ecall_cause_map[env->priv];
 }
 }
 
-target_ulong backup_epc = env->pc;
-
-target_ulong bit = fixed_cause;
-target_ulong deleg = env->medeleg;
-
-int hasbadaddr =
-(fixed_cause == RISCV_EXCP_INST_ADDR_MIS) ||
-(fixed_cause == RISCV_EXCP_INST_ACCESS_FAULT) ||
-(fixed_cause == RISCV_EXCP_LOAD_ADDR_MIS) ||
-(fixed_cause == RISCV_EXCP_STORE_AMO_ADDR_MIS) ||
-(fixed_cause == RISCV_EXCP_LOAD_ACCESS_FAULT) ||
-(fixed_cause == RISCV_EXCP_STORE_AMO_ACCESS_FAULT) ||
-(fixed_cause == RISCV_EXCP_INST_PAGE_FAULT) ||
-(fixed_cause == RISCV_EXCP_LOAD_PAGE_FAULT) ||
-(fixed_cause == RISCV_EXCP_STORE_PAGE_FAULT);
-
-if (bit & ((target_ulong)1 << (TARGET_LONG_BITS - 1))) {
-deleg = 

[Qemu-devel] [PATCH v2 07/11] RISC-V: Change local interrupts from edge to level

2019-02-20 Thread Alistair Francis
From: Michael Clark 

This effectively changes riscv_cpu_update_mip
from edge to level. i.e. cpu_interrupt or
cpu_reset_interrupt are called regardless of
the current interrupt level.

Fixes an issue where WFI doesn't return when an IPI is issued:

- https://github.com/riscv/riscv-qemu/issues/132

To test:

1) Apply RISC-V Linux CPU hotplug patch:

- http://lists.infradead.org/pipermail/linux-riscv/2018-May/000603.html

2) Enable CONFIG_CPU_HOTPLUG in linux .config

3) Try to offline and online cpus:

  echo 1 > /sys/devices/system/cpu/cpu2/online
  echo 0 > /sys/devices/system/cpu/cpu2/online
  echo 1 > /sys/devices/system/cpu/cpu2/online

Reported-by: Atish Patra 
Cc: Atish Patra 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 555756d40c..073bdcfe74 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -95,9 +95,9 @@ uint32_t riscv_cpu_update_mip(RISCVCPU *cpu, uint32_t mask, uint32_t value)
 cmp = atomic_cmpxchg(&env->mip, old, new);
 } while (old != cmp);
 
-if (new && !old) {
+if (new) {
 cpu_interrupt(CPU(cpu), CPU_INTERRUPT_HARD);
-} else if (!new && old) {
+} else {
 cpu_reset_interrupt(CPU(cpu), CPU_INTERRUPT_HARD);
 }
 
-- 
2.20.1
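
The behavioural difference in this patch can be modelled in isolation. The sketch below is an illustration under simplified assumptions: the IRQ_* enum, update_edge() and update_level() are hypothetical names that only model which action the old and new conditions would pick, not the QEMU functions themselves.

```c
#include <assert.h>
#include <stdint.h>

enum irq_action { IRQ_NONE, IRQ_RAISE, IRQ_LOWER };

/* Old behaviour: act only when the pending set changes between empty
 * and non-empty (edge). */
static enum irq_action update_edge(uint32_t old_mip, uint32_t new_mip)
{
    if (new_mip && !old_mip) {
        return IRQ_RAISE;
    } else if (!new_mip && old_mip) {
        return IRQ_LOWER;
    }
    return IRQ_NONE;
}

/* New behaviour: always raise or lower according to the current value
 * (level), regardless of what was pending before. */
static enum irq_action update_level(uint32_t new_mip)
{
    return new_mip ? IRQ_RAISE : IRQ_LOWER;
}
```

The case that differs is a second interrupt arriving while another is already pending: the edge variant takes no action, whereas the level variant raises the interrupt line again.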




[Qemu-devel] [PATCH v2 03/11] RISC-V: Allow interrupt controllers to claim interrupts

2019-02-20 Thread Alistair Francis
From: Michael Clark 

We can't allow the supervisor to control SEIP, as this would allow the
supervisor to clear a pending external interrupt, which would result in
a lost interrupt when a PLIC is attached. The SEIP bit must be
hardware controlled when a PLIC is attached.

This logic was previously hard-coded so SEIP was always masked even
if no PLIC was attached. This patch adds riscv_cpu_claim_interrupts
so that the PLIC can register control of SEIP. In the case of models
without a PLIC (spike), the SEIP bit remains software controlled.

This interface allows for hardware control of supervisor timer and
software interrupts by other interrupt controller models.
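
A minimal standalone model of the claim mechanism, for illustration only: struct hart, claim_interrupts() and sw_writable() are hypothetical names, and the bit positions assume the standard mip layout (SSIP at bit 1, SEIP at bit 9).

```c
#include <assert.h>
#include <stdint.h>

#define MIP_SSIP (1u << 1)   /* supervisor software interrupt pending */
#define MIP_SEIP (1u << 9)   /* supervisor external interrupt pending */

/* Hypothetical per-hart state: bits claimed by hardware models. */
struct hart {
    uint32_t miclaim;
};

/* Returns 0 on success, -1 if any requested bit is already claimed. */
static int claim_interrupts(struct hart *h, uint32_t interrupts)
{
    if (h->miclaim & interrupts) {
        return -1;
    }
    h->miclaim |= interrupts;
    return 0;
}

/* Bits that software may still write: delegable and not claimed. */
static uint32_t sw_writable(const struct hart *h, uint32_t write_mask,
                            uint32_t delegable)
{
    return write_mask & delegable & ~h->miclaim;
}
```

Once an interrupt controller has claimed SEIP, a CSR write covering both SEIP and SSIP would only be allowed to change SSIP; on a machine where nothing claims SEIP, both bits stay software controlled.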

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_plic.c| 15 +++
 target/riscv/cpu.h|  2 ++
 target/riscv/cpu_helper.c | 11 +++
 target/riscv/csr.c| 10 ++
 4 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c
index b859f919a7..1c703e1a37 100644
--- a/hw/riscv/sifive_plic.c
+++ b/hw/riscv/sifive_plic.c
@@ -23,6 +23,7 @@
 #include "qemu/error-report.h"
 #include "hw/sysbus.h"
 #include "target/riscv/cpu.h"
+#include "sysemu/sysemu.h"
 #include "hw/riscv/sifive_plic.h"
 
 #define RISCV_DEBUG_PLIC 0
@@ -431,6 +432,7 @@ static void sifive_plic_irq_request(void *opaque, int irq, int level)
 static void sifive_plic_realize(DeviceState *dev, Error **errp)
 {
 SiFivePLICState *plic = SIFIVE_PLIC(dev);
+int i;
 
 memory_region_init_io(&plic->mmio, OBJECT(dev), &sifive_plic_ops, plic,
   TYPE_SIFIVE_PLIC, plic->aperture_size);
@@ -443,6 +445,19 @@ static void sifive_plic_realize(DeviceState *dev, Error **errp)
 plic->enable = g_new0(uint32_t, plic->bitfield_words * plic->num_addrs);
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), &plic->mmio);
 qdev_init_gpio_in(dev, sifive_plic_irq_request, plic->num_sources);
+
+/* We can't allow the supervisor to control SEIP as this would allow the
+ * supervisor to clear a pending external interrupt which will result in
+ * lost a interrupt in the case a PLIC is attached. The SEIP bit must be
+ * hardware controlled when a PLIC is attached.
+ */
+for (i = 0; i < smp_cpus; i++) {
+RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(i));
+if (riscv_cpu_claim_interrupts(cpu, MIP_SEIP) < 0) {
+error_report("SEIP already claimed");
+exit(1);
+}
+}
 }
 
 static void sifive_plic_class_init(ObjectClass *klass, void *data)
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 5c2aebf132..a0b3c22dec 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -140,6 +140,7 @@ struct CPURISCVState {
  * mip is 32-bits to allow atomic_read on 32-bit hosts.
  */
 uint32_t mip;
+uint32_t miclaim;
 
 target_ulong mie;
 target_ulong mideleg;
@@ -263,6 +264,7 @@ void riscv_cpu_list(FILE *f, fprintf_function cpu_fprintf);
 #define cpu_mmu_index riscv_cpu_mmu_index
 
 #ifndef CONFIG_USER_ONLY
+int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t interrupts);
 uint32_t riscv_cpu_update_mip(RISCVCPU *cpu, uint32_t mask, uint32_t value);
 #define BOOL_TO_MASK(x) (-!!(x)) /* helper for riscv_cpu_update_mip value */
 #endif
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f49e98ed59..555756d40c 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -72,6 +72,17 @@ bool riscv_cpu_exec_interrupt(CPUState *cs, int interrupt_request)
 
 #if !defined(CONFIG_USER_ONLY)
 
+int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint32_t interrupts)
+{
+CPURISCVState *env = &cpu->env;
+if (env->miclaim & interrupts) {
+return -1;
+} else {
+env->miclaim |= interrupts;
+return 0;
+}
+}
+
 /* iothread_mutex must be held */
 uint32_t riscv_cpu_update_mip(RISCVCPU *cpu, uint32_t mask, uint32_t value)
 {
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 960d2b0aa9..938c10897c 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -550,16 +550,10 @@ static int rmw_mip(CPURISCVState *env, int csrno, target_ulong *ret_value,
target_ulong new_value, target_ulong write_mask)
 {
 RISCVCPU *cpu = riscv_env_get_cpu(env);
-target_ulong mask = write_mask & delegable_ints;
+/* Allow software control of delegable interrupts not claimed by hardware */
+target_ulong mask = write_mask & delegable_ints & ~env->miclaim;
 uint32_t old_mip;
 
-/* We can't allow the supervisor to control SEIP as this would allow the
- * supervisor to clear a pending external interrupt which will result in
- * lost a interrupt in the case a PLIC is attached. The SEIP bit must be
- * hardware controlled when a PLIC is attached. This should be an option
- * for CPUs with software-delegated Supervisor External 

[Qemu-devel] [PATCH v2 10/11] RISC-V: Update load reservation comment in do_interrupt

2019-02-20 Thread Alistair Francis
From: Michael Clark 

Cc: Palmer Dabbelt 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 6d3fbc3401..b17f169681 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -525,7 +525,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 ((async && (env->mtvec & 3) == 1) ? cause * 4 : 0);
 riscv_cpu_set_mode(env, PRV_M);
 }
-/* TODO yield load reservation  */
+
+/* NOTE: it is not necessary to yield load reservations here. It is only
+ * necessary for an SC from "another hart" to cause a load reservation
+ * to be yielded. Refer to the memory consistency model section of the
+ * RISC-V ISA Specification.
+ */
+
 #endif
 cs->exception_index = EXCP_NONE; /* mark handled to qemu */
 }
-- 
2.20.1




[Qemu-devel] [PATCH v2 01/11] riscv: pmp: Log pmp access errors as guest errors

2019-02-20 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 target/riscv/pmp.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
index 15a5366616..b11c4ae22f 100644
--- a/target/riscv/pmp.c
+++ b/target/riscv/pmp.c
@@ -113,10 +113,11 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t pmp_index, uint8_t val)
 env->pmp_state.pmp[pmp_index].cfg_reg = val;
 pmp_update_rule(env, pmp_index);
 } else {
-PMP_DEBUG("ignoring write - locked");
+qemu_log_mask(LOG_GUEST_ERROR, "ignoring pmpcfg write - locked\n");
 }
 } else {
-PMP_DEBUG("ignoring write - out of bounds");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ignoring pmpcfg write - out of bounds\n");
 }
 }
 
@@ -249,7 +250,8 @@ bool pmp_hart_has_privs(CPURISCVState *env, target_ulong addr,
 
 /* partially inside */
 if ((s + e) == 1) {
-PMP_DEBUG("pmp violation - access is partially inside");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "pmp violation - access is partially inside\n");
 ret = 0;
 break;
 }
@@ -306,7 +308,8 @@ void pmpcfg_csr_write(CPURISCVState *env, uint32_t reg_index,
 env->mhartid, reg_index, val);
 
 if ((reg_index & 1) && (sizeof(target_ulong) == 8)) {
-PMP_DEBUG("ignoring write - incorrect address");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ignoring pmpcfg write - incorrect address\n");
 return;
 }
 
@@ -353,10 +356,12 @@ void pmpaddr_csr_write(CPURISCVState *env, uint32_t addr_index,
 env->pmp_state.pmp[addr_index].addr_reg = val;
 pmp_update_rule(env, addr_index);
 } else {
-PMP_DEBUG("ignoring write - locked");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ignoring pmpaddr write - locked\n");
 }
 } else {
-PMP_DEBUG("ignoring write - out of bounds");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ignoring pmpaddr write - out of bounds\n");
 }
 }
 
@@ -372,7 +377,8 @@ target_ulong pmpaddr_csr_read(CPURISCVState *env, uint32_t addr_index)
 if (addr_index < MAX_RISCV_PMPS) {
 return env->pmp_state.pmp[addr_index].addr_reg;
 } else {
-PMP_DEBUG("ignoring read - out of bounds");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ignoring pmpaddr read - out of bounds\n");
 return 0;
 }
 }
-- 
2.20.1




[Qemu-devel] [PATCH v2 06/11] RISC-V: linux-user support for RVE ABI

2019-02-20 Thread Alistair Francis
From: Kito Cheng 

This change checks elf_flags for EF_RISCV_RVE and if
present uses the RVE linux syscall ABI which uses t0
for the syscall number instead of a7.

Warn and exit if a non-RVE ABI binary is run on a
cpu with the RVE extension as it is incompatible.
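
The two checks described above can be sketched on their own as follows. This is an illustration only: syscall_num_reg() and abi_compatible() are hypothetical helpers, and cpu_has_rve stands in for the (misa & RVE) test in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EF_RISCV_RVE 0x0008
#define xT0 5    /* syscall number register for the RVE ABI */
#define xA7 17   /* syscall number register for the RVI ABI */

/* Pick the register that carries the syscall number for this binary. */
static int syscall_num_reg(uint32_t elf_flags)
{
    return (elf_flags & EF_RISCV_RVE) ? xT0 : xA7;
}

/* A CPU with the RVE extension can only run RVE ABI binaries;
 * any other combination is accepted by this check. */
static bool abi_compatible(bool cpu_has_rve, uint32_t elf_flags)
{
    return !(cpu_has_rve && !(elf_flags & EF_RISCV_RVE));
}
```

RVE only provides registers x0 to x15, so a7 (x17) does not exist there, which is why the RVE ABI passes the syscall number in t0 (x5) instead.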

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Co-authored-by: Kito Cheng 
Co-authored-by: Michael Clark 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 linux-user/riscv/cpu_loop.c | 15 ++-
 target/riscv/cpu.h  |  4 
 target/riscv/cpu_user.h |  3 ++-
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/linux-user/riscv/cpu_loop.c b/linux-user/riscv/cpu_loop.c
index 4cf3e94632..a9bac4ca79 100644
--- a/linux-user/riscv/cpu_loop.c
+++ b/linux-user/riscv/cpu_loop.c
@@ -18,8 +18,10 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "qemu.h"
 #include "cpu_loop-common.h"
+#include "elf.h"
 
 void cpu_loop(CPURISCVState *env)
 {
@@ -53,7 +55,8 @@ void cpu_loop(CPURISCVState *env)
 ret = 0;
 } else {
 ret = do_syscall(env,
- env->gpr[xA7],
+ env->gpr[(env->elf_flags & EF_RISCV_RVE)
+? xT0 : xA7],
  env->gpr[xA0],
  env->gpr[xA1],
  env->gpr[xA2],
@@ -113,6 +116,16 @@ void cpu_loop(CPURISCVState *env)
 
 void target_cpu_copy_regs(CPUArchState *env, struct target_pt_regs *regs)
 {
+CPUState *cpu = ENV_GET_CPU(env);
+TaskState *ts = cpu->opaque;
+struct image_info *info = ts->info;
+
 env->pc = regs->sepc;
 env->gpr[xSP] = regs->sp;
+env->elf_flags = info->elf_flags;
+
+if ((env->misa & RVE) && !(env->elf_flags & EF_RISCV_RVE)) {
+error_report("Incompatible ELF: RVE cpu requires RVE ABI binary");
+exit(EXIT_FAILURE);
+}
 }
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index a0b3c22dec..8e4b5cfe26 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -123,6 +123,10 @@ struct CPURISCVState {
 
 uint32_t features;
 
+#ifdef CONFIG_USER_ONLY
+uint32_t elf_flags;
+#endif
+
 #ifndef CONFIG_USER_ONLY
 target_ulong priv;
 target_ulong resetvec;
diff --git a/target/riscv/cpu_user.h b/target/riscv/cpu_user.h
index c2199610ab..52d380aa98 100644
--- a/target/riscv/cpu_user.h
+++ b/target/riscv/cpu_user.h
@@ -10,4 +10,5 @@
 #define xA4 14
 #define xA5 15
 #define xA6 16
-#define xA7 17  /* syscall number goes here */
+#define xA7 17  /* syscall number for RVI ABI */
+#define xT0 5   /* syscall number for RVE ABI */
-- 
2.20.1




[Qemu-devel] [PATCH v2 04/11] RISC-V: Remove unnecessary disassembler constraints

2019-02-20 Thread Alistair Francis
From: Michael Clark 

Remove machine generated constraints that are not
referenced by the pseudo-instruction constraints.

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 disas/riscv.c | 138 --
 1 file changed, 138 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index 7fd1019623..27546dd790 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -87,33 +87,10 @@ typedef enum {
 
 typedef enum {
 rvc_end,
-rvc_simm_6,
-rvc_imm_6,
-rvc_imm_7,
-rvc_imm_8,
-rvc_imm_9,
-rvc_imm_10,
-rvc_imm_12,
-rvc_imm_18,
-rvc_imm_nz,
-rvc_imm_x2,
-rvc_imm_x4,
-rvc_imm_x8,
-rvc_imm_x16,
-rvc_rd_b3,
-rvc_rs1_b3,
-rvc_rs2_b3,
-rvc_rd_eq_rs1,
 rvc_rd_eq_ra,
-rvc_rd_eq_sp,
 rvc_rd_eq_x0,
-rvc_rs1_eq_sp,
 rvc_rs1_eq_x0,
 rvc_rs2_eq_x0,
-rvc_rd_ne_x0_x2,
-rvc_rd_ne_x0,
-rvc_rs1_ne_x0,
-rvc_rs2_ne_x0,
 rvc_rs2_eq_rs1,
 rvc_rs1_eq_ra,
 rvc_imm_eq_zero,
@@ -2522,111 +2499,16 @@ static bool check_constraints(rv_decode *dec, const rvc_constraint *c)
 uint8_t rd = dec->rd, rs1 = dec->rs1, rs2 = dec->rs2;
 while (*c != rvc_end) {
 switch (*c) {
-case rvc_simm_6:
-if (!(imm >= -32 && imm < 32)) {
-return false;
-}
-break;
-case rvc_imm_6:
-if (!(imm <= 63)) {
-return false;
-}
-break;
-case rvc_imm_7:
-if (!(imm <= 127)) {
-return false;
-}
-break;
-case rvc_imm_8:
-if (!(imm <= 255)) {
-return false;
-}
-break;
-case rvc_imm_9:
-if (!(imm <= 511)) {
-return false;
-}
-break;
-case rvc_imm_10:
-if (!(imm <= 1023)) {
-return false;
-}
-break;
-case rvc_imm_12:
-if (!(imm <= 4095)) {
-return false;
-}
-break;
-case rvc_imm_18:
-if (!(imm <= 262143)) {
-return false;
-}
-break;
-case rvc_imm_nz:
-if (!(imm != 0)) {
-return false;
-}
-break;
-case rvc_imm_x2:
-if (!((imm & 0b1) == 0)) {
-return false;
-}
-break;
-case rvc_imm_x4:
-if (!((imm & 0b11) == 0)) {
-return false;
-}
-break;
-case rvc_imm_x8:
-if (!((imm & 0b111) == 0)) {
-return false;
-}
-break;
-case rvc_imm_x16:
-if (!((imm & 0b1111) == 0)) {
-return false;
-}
-break;
-case rvc_rd_b3:
-if (!(rd  >= 8 && rd  <= 15)) {
-return false;
-}
-break;
-case rvc_rs1_b3:
-if (!(rs1 >= 8 && rs1 <= 15)) {
-return false;
-}
-break;
-case rvc_rs2_b3:
-if (!(rs2 >= 8 && rs2 <= 15)) {
-return false;
-}
-break;
-case rvc_rd_eq_rs1:
-if (!(rd == rs1)) {
-return false;
-}
-break;
 case rvc_rd_eq_ra:
 if (!(rd == 1)) {
 return false;
 }
 break;
-case rvc_rd_eq_sp:
-if (!(rd == 2)) {
-return false;
-}
-break;
 case rvc_rd_eq_x0:
 if (!(rd == 0)) {
 return false;
 }
 break;
-case rvc_rs1_eq_sp:
-if (!(rs1 == 2)) {
-return false;
-}
-break;
 case rvc_rs1_eq_x0:
 if (!(rs1 == 0)) {
 return false;
@@ -2637,26 +2519,6 @@ static bool check_constraints(rv_decode *dec, const rvc_constraint *c)
 return false;
 }
 break;
-case rvc_rd_ne_x0_x2:
-if (!(rd != 0 && rd != 2)) {
-return false;
-}
-break;
-case rvc_rd_ne_x0:
-if (!(rd != 0)) {
-return false;
-}
-break;
-case rvc_rs1_ne_x0:
-if (!(rs1 != 0)) {
-return false;
-}
-break;
-case rvc_rs2_ne_x0:
-if (!(rs2 != 0)) {
-return false;
-}
-break;
 case rvc_rs2_eq_rs1:
 if (!(rs2 == rs1)) {
 return false;
-- 
2.20.1




[Qemu-devel] [PATCH v2 02/11] RISC-V: Replace __builtin_popcount with ctpop8 in PLIC

2019-02-20 Thread Alistair Francis
From: Michael Clark 

The mode variable only uses the lower 4-bits (M,H,S,U) so
replace the GCC specific __builtin_popcount with ctpop8.
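
For readers unfamiliar with the helper, here is a portable sketch of what an 8-bit population count does. popcount8() is a hypothetical stand-in written for this illustration; it is not the ctpop8() implementation from QEMU's host-utils.

```c
#include <assert.h>
#include <stdint.h>

/* Count the set bits in an 8-bit value without compiler builtins. */
static int popcount8(uint8_t v)
{
    int n = 0;

    while (v) {
        v &= (uint8_t)(v - 1);   /* clear the lowest set bit */
        n++;
    }
    assert(n <= 8);
    return n;
}
```

In the PLIC hart config parser each hart contributes one address ID per enabled mode, so a hart with two mode bits set would advance addrid by two.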

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_plic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c
index d12ec3fc9a..b859f919a7 100644
--- a/hw/riscv/sifive_plic.c
+++ b/hw/riscv/sifive_plic.c
@@ -383,7 +383,7 @@ static void parse_hart_config(SiFivePLICState *plic)
 p = plic->hart_config;
 while ((c = *p++)) {
 if (c == ',') {
-addrid += __builtin_popcount(modes);
+addrid += ctpop8(modes);
 modes = 0;
 hartid++;
 } else {
@@ -397,7 +397,7 @@ static void parse_hart_config(SiFivePLICState *plic)
 }
 }
 if (modes) {
-addrid += __builtin_popcount(modes);
+addrid += ctpop8(modes);
 }
 hartid++;
 
-- 
2.20.1




[Qemu-devel] [PATCH v2 00/11] Upstream RISC-V fork patches, part 4

2019-02-20 Thread Alistair Francis
v2:
 - Add a patch for SiFive U SMP support
 - Rebase on master

Alistair Francis (2):
  riscv: pmp: Log pmp access errors as guest errors
  riscv: sifive_u: Allow up to 4 CPUs to be created

Kito Cheng (1):
  RISC-V: linux-user support for RVE ABI

Michael Clark (8):
  RISC-V: Replace __builtin_popcount with ctpop8 in PLIC
  RISC-V: Allow interrupt controllers to claim interrupts
  RISC-V: Remove unnecessary disassembler constraints
  elf: Add RISC-V PSABI ELF header defines
  RISC-V: Change local interrupts from edge to level
  RISC-V: Add support for vectored interrupts
  RISC-V: Convert trap debugging to trace events
  RISC-V: Update load reservation comment in do_interrupt

 Makefile.objs   |   1 +
 disas/riscv.c   | 138 -
 hw/riscv/sifive_plic.c  |  19 +++-
 hw/riscv/sifive_u.c |   5 +-
 include/elf.h   |  10 +++
 linux-user/riscv/cpu_loop.c |  15 +++-
 target/riscv/cpu.h  |   6 ++
 target/riscv/cpu_helper.c   | 168 +++-
 target/riscv/cpu_user.h |   3 +-
 target/riscv/csr.c  |  22 ++---
 target/riscv/pmp.c  |  20 +++--
 target/riscv/trace-events   |   2 +
 12 files changed, 148 insertions(+), 261 deletions(-)
 create mode 100644 target/riscv/trace-events

-- 
2.20.1




Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Zhao Yan
On Wed, Feb 20, 2019 at 11:01:43AM +, Dr. David Alan Gilbert wrote:
> * Zhao Yan (yan.y.z...@intel.com) wrote:
> > On Tue, Feb 19, 2019 at 11:32:13AM +, Dr. David Alan Gilbert wrote:
> > > * Yan Zhao (yan.y.z...@intel.com) wrote:
> > > > This patchset enables VFIO devices to have live migration capability.
> > > > Currently it does not support post-copy phase.
> > > > 
> > > > It follows Alex's comments on last version of VFIO live migration
> > > > patches, including device states, VFIO device state region layout,
> > > > and dirty bitmap's query.
> > > 
> > > Hi,
> > >   I've sent minor comments to later patches; but some minor general
> > > comments:
> > > 
> > >   a) Never trust the incoming migrations stream - it might be corrupt,
> > > so check when you can.
> > hi Dave
> > Thanks for this suggestion. I'll add more checks for migration streams.
> > 
> > 
> > >   b) How do we detect if we're migrating from/to the wrong device or
> > > version of device?  Or say to a device with older firmware or perhaps
> > > a device that has less device memory ?
> > Actually it's still an open question for VFIO migration. We need to think
> > about whether it's better to check that in libvirt or qemu (like a device
> > magic along with a version?).
> > This patchset is intended to settle the main device state interfaces
> > for VFIO migration, so that we can work on that and improve it.
> > 
> > 
> > >   c) Consider using the trace_ mechanism - it's really useful to
> > > add to loops writing/reading data so that you can see when it fails.
> > > 
> > > Dave
> > >
> > Got it. many thanks~~
> > 
> > 
> > > (P.S. You have a few typo's grep your code for 'devcie', 'devie' and
> > > 'migrtion'
> > 
> > sorry :)
> 
> No problem.
> 
> Given the mails, I'm guessing you've mostly tested this on graphics
> devices?  Have you also checked with VFIO network cards?
> 
Yes, I tested it on Intel's graphics devices, which do not have device
memory, so the device-memory cap is off.
I believe this patchset can work well on VFIO network cards as well,
because Gonglei once said their NIC can work well on our previous code
(i.e. device-memory cap off).


> Also see the mail I sent in reply to Kirti's series; we need to boil
> these down to one solution.
>
Maybe Kirti can merge their implementation into the code for device-memory
cap (like in my patch 5 for device-memory).

> Dave
> 
> > > 
> > > > Device Data
> > > > ---
> > > > Device data is divided into three types: device memory, device config,
> > > > and system memory dirty pages produced by device.
> > > > 
> > > > Device config: data like MMIOs, page tables...
> > > > Every device is supposed to possess device config data.
> > > > Usually device config's size is small (no bigger than 10M), and it
> > > > needs to be loaded in certain strict order.
> > > > Therefore, device config only needs to be saved/loaded in
> > > > stop-and-copy phase.
> > > > The data of device config is held in device config region.
> > > > Size of device config data is smaller than or equal to that of
> > > > device config region.
> > > > 
> > > > Device Memory: device's internal memory, standalone and outside system
> > > > memory. It is usually very big.
> > > > This kind of data needs to be saved / loaded in pre-copy and
> > > > stop-and-copy phase.
> > > > The data of device memory is held in device memory region.
> > > > Size of device memory is usually larger than that of device
> > > > memory region. qemu needs to save/load it in chunks of size of
> > > > device memory region.
> > > > Not all devices have device memory. For example, IGD only uses
> > > > system memory.
> > > > 
> > > > System memory dirty pages: If a device produces dirty pages in system
> > > > memory, it is able to get dirty bitmap for certain range of system
> > > > memory. This dirty bitmap is queried in pre-copy and stop-and-copy
> > > > phase in .log_sync callback. By setting dirty bitmap in .log_sync
> > > > callback, dirty pages in system memory will be saved/loaded by ram's
> > > > live migration code.
> > > > The dirty bitmap of system memory is held in dirty bitmap region.
> > > > If system memory range is larger than that dirty bitmap region can
> > > > hold, qemu will cut it into several chunks and get dirty bitmap in
> > > > succession.
> > > > 
> > > > 
> > > > Device State Regions
> > > > 
> > > > Vendor driver is required to expose two mandatory regions and another
> > > > two optional regions if it plans to support device state management.
> > > > 
> > > > So, there are up to four regions in total.
> > > > One control region: mandatory.
> > > > Get access via read/write system call.
> > > > Its layout is 

Re: [Qemu-devel] [PATCH 0/5] QEMU VFIO live migration

2019-02-20 Thread Zhao Yan
On Wed, Feb 20, 2019 at 11:56:01AM +, Gonglei (Arei) wrote:
> Hi yan,
> 
> Thanks for your work.
> 
> I have some suggestions or questions:
> 
> 1) Would you add msix mode support? If not, please add a check in
> vfio_pci_save_config(), like Nvidia's solution.
ok.

> 2) We should start vfio devices before vcpu resumes, so we can't rely on vm
> start change handler completely.
VFIO devices are set to the running state by default.
On the target machine, the state transition flow is running->stop->running,
so maybe you can ignore the stop notification in the kernel?
> 3) We'd better support live migration rollback, since there are many failure
> scenarios; registering a migration notifier is a good choice.
I think this patchset can also handle the failure case well.
If migration fails or is cancelled, the LOGGING state is cleared in the
cleanup handler, and the device state (running or stopped) is kept as it is.
Then, if the vm switches back to running, the device state will be set to
running; if the vm stays in the stopped state, the device state also stays
stopped (there is no point in leaving it in the running state).
Do you agree?

> 4) Four memory regions for live migration is too complicated IMHO.
One big region would require the sub-regions to be well padded.
For example, the leading control fields would have to be padded to 4K,
and the same goes for the other data fields.
Otherwise mmap simply fails, because the start offset and size for mmap
both need to be page aligned.

Also, 4 regions is clearer in my view :)
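
As a rough illustration of the padding point above (a minimal sketch only;
the helper names pad_to_page and subregion_mmapable are hypothetical and are
not part of this patchset), packing several sub-regions into one big region
would mean rounding every sub-region's offset and size up to the page size
before it could be mmapped:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they do not exist in QEMU
 * or in this patchset. page_size is assumed to be a power of two.
 */

/* Round len up to the next multiple of page_size. */
static inline uint64_t pad_to_page(uint64_t len, uint64_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}

/*
 * Check the constraint described above: a sub-region can only be
 * mmapped on its own if both its start offset and its size are
 * multiples of the page size.
 */
static inline bool subregion_mmapable(uint64_t offset, uint64_t size,
                                      uint64_t page_size)
{
    return (offset & (page_size - 1)) == 0 &&
           (size & (page_size - 1)) == 0;
}
```

With a 4K page, a control area of only a few bytes would still occupy a full
4096-byte slot so that the following data area starts on a page boundary.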

> 5) About log sync, why not register log_global_start/stop in 
> vfio_memory_listener?
> 
> 
It seems log_global_start/stop cannot be called iteratively in the pre-copy
phase? For dirty pages in system memory, it's better to transfer dirty data
iteratively to reduce downtime, right?


> Regards,
> -Gonglei
> 
> 
> > -Original Message-
> > From: Yan Zhao [mailto:yan.y.z...@intel.com]
> > Sent: Tuesday, February 19, 2019 4:51 PM
> > To: alex.william...@redhat.com; qemu-devel@nongnu.org
> > Cc: intel-gvt-...@lists.freedesktop.org; zhengxiao...@alibaba-inc.com;
> > yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com;
> > coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com;
> > zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com;
> > a...@ozlabs.ru; eau...@redhat.com; fel...@nutanix.com;
> > jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com;
> > kwankh...@nvidia.com; kevin.t...@intel.com; c...@nvidia.com; Gonglei (Arei)
> > ; k...@vger.kernel.org; Yan Zhao
> > 
> > Subject: [PATCH 0/5] QEMU VFIO live migration
> > 
> > This patchset enables VFIO devices to have live migration capability.
> > Currently it does not support post-copy phase.
> > 
> > It follows Alex's comments on last version of VFIO live migration patches,
> > including device states, VFIO device state region layout, dirty bitmap's
> > query.
> > 
> > Device Data
> > ---
> > Device data is divided into three types: device memory, device config,
> > and system memory dirty pages produced by device.
> > 
> > Device config: data like MMIOs, page tables...
> > Every device is supposed to possess device config data.
> > Usually device config's size is small (no bigger than 10M), and it
> > needs to be loaded in certain strict order.
> > Therefore, device config only needs to be saved/loaded in
> > stop-and-copy phase.
> > The data of device config is held in device config region.
> > Size of device config data is smaller than or equal to that of
> > device config region.
> > 
> > Device Memory: device's internal memory, standalone and outside system
> > memory. It is usually very big.
> > This kind of data needs to be saved / loaded in pre-copy and
> > stop-and-copy phase.
> > The data of device memory is held in device memory region.
> > Size of device memory is usually larger than that of device
> > memory region. qemu needs to save/load it in chunks of size of
> > device memory region.
> > Not all devices have device memory. Like IGD only uses system memory.
> > 
> > System memory dirty pages: If a device produces dirty pages in system
> > memory, it is able to get dirty bitmap for certain range of system
> > memory. This dirty bitmap is queried in pre-copy and stop-and-copy
> > phase in .log_sync callback. By setting dirty bitmap in .log_sync
> > callback, dirty pages in system memory will be save/loaded by ram's
> > live migration code.
> > The dirty bitmap of system memory is held in dirty bitmap region.
> > If system memory range is larger than that dirty bitmap region can
> > hold, qemu will cut it into several chunks and get dirty bitmap in
> > succession.
> > 
> > 
> > Device State Regions
> > 
> > Vendor driver is required to expose two mandatory regions and another two
> > optional regions if it plans to support 

Re: [Qemu-devel] [PATCH 5/5] vfio/migration: support device memory capability

2019-02-20 Thread Zhao Yan
On Wed, Feb 20, 2019 at 11:14:24AM +0100, Christophe de Dinechin wrote:
> 
> 
> > On 20 Feb 2019, at 08:58, Zhao Yan  wrote:
> > 
> > On Tue, Feb 19, 2019 at 03:42:36PM +0100, Christophe de Dinechin wrote:
> >> 
> >> 
> >>> On 19 Feb 2019, at 09:53, Yan Zhao  wrote:
> >>> 
> >>> If a device has device memory capability, save/load data from device memory
> >>> in pre-copy and stop-and-copy phases.
> >>> 
> >>> LOGGING state is set for device memory for dirty page logging:
> >>> in LOGGING state, get device memory returns whole device memory snapshot;
> >>> outside LOGGING state, get device memory returns dirty data since last get
> >>> operation.
> >>> 
> >>> Usually, device memory is very big, qemu needs to chunk it into several
> >>> pieces each with size of device memory region.
> >>> 
> >>> Signed-off-by: Yan Zhao 
> >>> Signed-off-by: Kirti Wankhede 
> >>> ---
> >>> hw/vfio/migration.c | 235 ++--
> >>> hw/vfio/pci.h   |   1 +
> >>> 2 files changed, 231 insertions(+), 5 deletions(-)
> >>> 
> >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> >>> index 16d6395..f1e9309 100644
> >>> --- a/hw/vfio/migration.c
> >>> +++ b/hw/vfio/migration.c
> >>> @@ -203,6 +203,201 @@ static int vfio_load_data_device_config(VFIOPCIDevice *vdev,
> >>>return 0;
> >>> }
> >>> 
> >>> +static int vfio_get_device_memory_size(VFIOPCIDevice *vdev)
> >>> +{
> >>> +VFIODevice *vbasedev = &vdev->vbasedev;
> >>> +VFIORegion *region_ctl =
> >>> +&vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> >>> +uint64_t len;
> >>> +int sz;
> >>> +
> >>> +sz = sizeof(len);
> >>> +if (pread(vbasedev->fd, &len, sz,
> >>> +region_ctl->fd_offset +
> >>> +offsetof(struct vfio_device_state_ctl, device_memory.size))
> >>> +!= sz) {
> >>> +error_report("vfio: Failed to get length of device memory");
> >> 
> >> s/length/size/ ? (to be consistent with function name)
> > 
> > ok. thanks
> >>> +return -1;
> >>> +}
> >>> +vdev->migration->devmem_size = len;
> >>> +return 0;
> >>> +}
> >>> +
> >>> +static int vfio_set_device_memory_size(VFIOPCIDevice *vdev, uint64_t size)
> >>> +{
> >>> +VFIODevice *vbasedev = &vdev->vbasedev;
> >>> +VFIORegion *region_ctl =
> >>> +&vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> >>> +int sz;
> >>> +
> >>> +sz = sizeof(size);
> >>> +if (pwrite(vbasedev->fd, &size, sz,
> >>> +region_ctl->fd_offset +
> >>> +offsetof(struct vfio_device_state_ctl, device_memory.size))
> >>> +!= sz) {
> >>> +error_report("vfio: Failed to set length of device comemory");
> >> 
> >> What is comemory? Typo?
> > 
> > Right, typo. should be "memory" :)
> >> 
> >> Same comment about length vs size
> >> 
> > got it. thanks
> > 
> >>> +return -1;
> >>> +}
> >>> +vdev->migration->devmem_size = size;
> >>> +return 0;
> >>> +}
> >>> +
> >>> +static
> >>> +int vfio_save_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f,
> >>> +uint64_t pos, uint64_t len)
> >>> +{
> >>> +VFIODevice *vbasedev = &vdev->vbasedev;
> >>> +VFIORegion *region_ctl =
> >>> +&vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> >>> +VFIORegion *region_devmem =
> >>> +&vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
> >>> +void *dest;
> >>> +uint32_t sz;
> >>> +uint8_t *buf = NULL;
> >>> +uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
> >>> +
> >>> +if (len > region_devmem->size) {
> >> 
> >> Is it intentional that there is no error_report here?
> >> 
> > an error_report here may be better.
> >>> +return -1;
> >>> +}
> >>> +
> >>> +sz = sizeof(pos);
> >>> +if (pwrite(vbasedev->fd, &pos, sz,
> >>> +region_ctl->fd_offset +
> >>> +offsetof(struct vfio_device_state_ctl, device_memory.pos))
> >>> +!= sz) {
> >>> +error_report("vfio: Failed to set save buffer pos");
> >>> +return -1;
> >>> +}
> >>> +sz = sizeof(action);
> >>> +if (pwrite(vbasedev->fd, &action, sz,
> >>> +region_ctl->fd_offset +
> >>> +offsetof(struct vfio_device_state_ctl, device_memory.action))
> >>> +!= sz) {
> >>> +error_report("vfio: Failed to set save buffer action");
> >>> +return -1;
> >>> +}
> >>> +
> >>> +if (!vfio_device_state_region_mmaped(region_devmem)) {
> >>> +buf = g_malloc(len);
> >>> +if (buf == NULL) {
> >>> +error_report("vfio: Failed to allocate memory for migrate");
> >> s/migrate/migration/ ?
> > 
> > yes, thanks
> >>> +return -1;
> >>> +}
> >>> +if (pread(vbasedev->fd, buf, len, region_devmem->fd_offset) != len) {
> >>> +error_report("vfio: error load device memory buffer");
> >> 

[Qemu-devel] [PATCH 1/2] target/arm: Implement ARMv8.0-SB

2019-02-20 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h   | 10 ++
 linux-user/elfload.c   |  1 +
 target/arm/cpu.c   |  1 +
 target/arm/cpu64.c |  2 ++
 target/arm/translate-a64.c | 14 ++
 target/arm/translate.c | 22 ++
 6 files changed, 50 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 0480f9baba..76d6a73c0e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3302,6 +3302,11 @@ static inline bool isar_feature_aa32_dp(const ARMISARegisters *id)
 return FIELD_EX32(id->id_isar6, ID_ISAR6, DP) != 0;
 }
 
+static inline bool isar_feature_aa32_sb(const ARMISARegisters *id)
+{
+return FIELD_EX32(id->id_isar6, ID_ISAR6, SB) != 0;
+}
+
 static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
 {
 /*
@@ -3405,6 +3410,11 @@ static inline bool isar_feature_aa64_pauth(const ARMISARegisters *id)
  FIELD_DP64(0, ID_AA64ISAR1, GPI, 0xf))) != 0;
 }
 
+static inline bool isar_feature_aa64_sb(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, SB) != 0;
+}
+
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ef7138839d..02ba705e73 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -603,6 +603,7 @@ static uint32_t get_elf_hwcap(void)
 GET_FEATURE_ID(aa64_sve, ARM_HWCAP_A64_SVE);
 GET_FEATURE_ID(aa64_pauth, ARM_HWCAP_A64_PACA | ARM_HWCAP_A64_PACG);
 GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
+GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
 
 #undef GET_FEATURE_ID
 
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index a5599ae19f..5cd27f2f64 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2027,6 +2027,7 @@ static void arm_max_initfn(Object *obj)
 
 t = cpu->isar.id_isar6;
 t = FIELD_DP32(t, ID_ISAR6, DP, 1);
+t = FIELD_DP32(t, ID_ISAR6, SB, 1);
 cpu->isar.id_isar6 = t;
 
 t = cpu->id_mmfr4;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index fc54734256..95c6ee4cda 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -343,6 +343,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64ISAR1, API, 0);
 t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0);
+t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
 cpu->isar.id_aa64isar1 = t;
 
 t = cpu->isar.id_aa64pfr0;
@@ -373,6 +374,7 @@ static void aarch64_max_initfn(Object *obj)
 
 u = cpu->isar.id_isar6;
 u = FIELD_DP32(u, ID_ISAR6, DP, 1);
+u = FIELD_DP32(u, ID_ISAR6, SB, 1);
 cpu->isar.id_isar6 = u;
 
 /*
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 1d9bf81c0e..40c4f2fe54 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1638,7 +1638,21 @@ static void handle_sync(DisasContext *s, uint32_t insn,
 reset_btype(s);
 gen_goto_tb(s, 0, s->pc);
 return;
+
+case 7: /* SB */
+if (crm != 0 || !dc_isar_feature(aa64_sb, s)) {
+goto do_unallocated;
+}
+/*
+ * TODO: There is no speculation barrier opcode for TCG;
+ * MB and end the TB instead.
+ */
+tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
+s->base.is_jmp = DISAS_TOO_MANY;
+return;
+
 default:
+do_unallocated:
 unallocated_encoding(s);
 return;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 92f0c8d557..796ba2df43 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -9192,6 +9192,17 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
  */
 gen_goto_tb(s, 0, s->pc & ~1);
 return;
+case 7: /* sb */
+if (!dc_isar_feature(aa32_sb, s)) {
+goto illegal_op;
+}
+/*
+ * TODO: There is no speculation barrier opcode
+ * for TCG; MB and end the TB instead.
+ */
+tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
+s->base.is_jmp = DISAS_TOO_MANY;
+return;
 default:
 goto illegal_op;
 }
@@ -11810,6 +11821,17 @@ static void disas_thumb2_insn(DisasContext *s, uint32_t insn)
  */
 gen_goto_tb(s, 0, s->pc & ~1);
 break;
+case 7: /* sb */
+if (!dc_isar_feature(aa32_sb, s)) {
+goto illegal_op;
+}
+/*
+ * TODO: There is no speculation barrier opcode
+

[Qemu-devel] [PATCH 0/2] target/arm: SB and PredRes extensions

2019-02-20 Thread Richard Henderson
Both of these are defined by the ARMv8.5 spec, but back-defined
as v8.0 extensions.

All of the relevant instructions are nops within QEMU.  Tested by
locally setting SCTLR_EL1.EnRCTX for aarch64-linux-user and then
executing each of the insns to see that they decode properly.

The SB extension is already upstream in linux 5.0-rc1, with the
HWCAP entry.  The PredRes extension has no upstream support yet,
so we need to wait to see what they do for userland ABI.


r~


Richard Henderson (2):
  target/arm: Implement ARMv8.0-SB
  target/arm: Implement ARMv8.0-PredRes

 target/arm/cpu.h   | 21 
 linux-user/elfload.c   |  1 +
 target/arm/cpu.c   |  2 ++
 target/arm/cpu64.c |  4 
 target/arm/helper.c| 49 ++
 target/arm/translate-a64.c | 14 +++
 target/arm/translate.c | 22 +
 7 files changed, 113 insertions(+)

-- 
2.17.2




[Qemu-devel] [PATCH 2/2] target/arm: Implement ARMv8.0-PredRes

2019-02-20 Thread Richard Henderson
This is named "Execution and Data prediction restriction instructions"
within the ARMv8.5 manual, and given the name "PredRes" by binutils.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 11 ++
 target/arm/cpu.c|  1 +
 target/arm/cpu64.c  |  2 ++
 target/arm/helper.c | 49 +
 4 files changed, 63 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 76d6a73c0e..202ff1f1ea 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1074,6 +1074,7 @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_UMA (1U << 9) /* v8 onward, AArch64 only */
 #define SCTLR_F   (1U << 10) /* up to v6 */
 #define SCTLR_SW  (1U << 10) /* v7, RES0 in v8 */
+#define SCTLR_EnRCTX  (1U << 10) /* in v8.0-specres */
 #define SCTLR_Z   (1U << 11) /* in v7, RES1 in v8 */
 #define SCTLR_EOS (1U << 11) /* v8.5-ExS */
 #define SCTLR_I   (1U << 12)
@@ -3307,6 +3308,11 @@ static inline bool isar_feature_aa32_sb(const ARMISARegisters *id)
 return FIELD_EX32(id->id_isar6, ID_ISAR6, SB) != 0;
 }
 
+static inline bool isar_feature_aa32_specres(const ARMISARegisters *id)
+{
+return FIELD_EX32(id->id_isar6, ID_ISAR6, SPECRES) != 0;
+}
+
 static inline bool isar_feature_aa32_fp16_arith(const ARMISARegisters *id)
 {
 /*
@@ -3415,6 +3421,11 @@ static inline bool isar_feature_aa64_sb(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, SB) != 0;
 }
 
+static inline bool isar_feature_aa64_specres(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, SPECRES) != 0;
+}
+
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 5cd27f2f64..c1d2848baa 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2028,6 +2028,7 @@ static void arm_max_initfn(Object *obj)
 t = cpu->isar.id_isar6;
 t = FIELD_DP32(t, ID_ISAR6, DP, 1);
 t = FIELD_DP32(t, ID_ISAR6, SB, 1);
+t = FIELD_DP32(t, ID_ISAR6, SPECRES, 1);
 cpu->isar.id_isar6 = t;
 
 t = cpu->id_mmfr4;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 95c6ee4cda..5f273399db 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -344,6 +344,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 1);
 t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0);
 t = FIELD_DP64(t, ID_AA64ISAR1, SB, 1);
+t = FIELD_DP64(t, ID_AA64ISAR1, SPECRES, 1);
 cpu->isar.id_aa64isar1 = t;
 
 t = cpu->isar.id_aa64pfr0;
@@ -375,6 +376,7 @@ static void aarch64_max_initfn(Object *obj)
 u = cpu->isar.id_isar6;
 u = FIELD_DP32(u, ID_ISAR6, DP, 1);
 u = FIELD_DP32(u, ID_ISAR6, SB, 1);
+u = FIELD_DP32(u, ID_ISAR6, SPECRES, 1);
 cpu->isar.id_isar6 = u;
 
 /*
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a2ab300051..c34b1401bd 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5884,6 +5884,50 @@ static const ARMCPRegInfo mte_reginfo[] = {
 };
 #endif
 
+static CPAccessResult access_specres(CPUARMState *env, const ARMCPRegInfo *ri,
+ bool isread)
+{
+int el = arm_current_el(env);
+
+if (el == 0) {
+uint64_t sctlr = arm_sctlr(env, el);
+if (!(sctlr & SCTLR_EnRCTX)) {
+return CP_ACCESS_TRAP;
+}
+} else if (el == 1) {
+uint64_t hcr = arm_hcr_el2_eff(env);
+if (hcr & HCR_NV) {
+return CP_ACCESS_TRAP_EL2;
+}
+}
+return CP_ACCESS_OK;
+}
+
+static const ARMCPRegInfo specres_reginfo[] = {
+{ .name = "CFP_RCTX", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 4,
+  .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_specres },
+{ .name = "DVP_RCTX", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 5,
+  .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_specres },
+{ .name = "CPP_RCTX", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 3, .crn = 7, .crm = 3, .opc2 = 7,
+  .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_specres },
+/*
+ * Note the AArch32 opcodes have a different OPC1.
+ */
+{ .name = "CFPRCTX", .state = ARM_CP_STATE_AA32,
+  .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 4,
+  .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_specres },
+{ .name = "DVPRCTX", .state = ARM_CP_STATE_AA32,
+  .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 5,
+  .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_specres },
+{ .name = "CPPRCTX", .state = ARM_CP_STATE_AA32,
+  .cp = 15, .opc1 = 0, .crn = 7, .crm = 3, .opc2 = 7,
+  .type = ARM_CP_NOP, .access = PL0_W, .accessfn = access_specres },
+REGINFO_SENTINEL

Re: [Qemu-devel] [PATCH] hw/display: Add basic ATI VGA emulation

2019-02-20 Thread BALATON Zoltan

On Tue, 19 Feb 2019, Peter Maydell wrote:

On Tue, 12 Feb 2019 at 23:59, BALATON Zoltan  wrote:

On Tue, 12 Feb 2019, Philippe Mathieu-Daudé wrote:

On 2/11/19 4:19 AM, BALATON Zoltan wrote:


This is where my question about valid/impl on mem ops started, but I asked
again separately after not getting an answer here. Then Peter answered
here, so I'm merging these threads again. For me this is settled: I can't
use mem ops for this even if they worked, so I'm only recording here what
I've found and will likely stay with implementing this in the device model.



[...]

+
+static void ati_reg_write_offs(uint32_t *reg, int offs,
+   uint64_t data, unsigned int size)
+{
+int shift, i;
+uint32_t mask;
+
+for (i = 0; i < size; i++) {
+shift = (offs + i) * 8;
+mask = 0xffUL << shift;
+*reg &= ~mask;
+*reg |= (data & 0xff) << shift;
+data >>= 8;


I'd have used a pair of extract32/deposit32 but this is probably easier
to singlestep.


You've told me that before but I have concerns about the asserts in those
functions which to me seem like unnecessary overhead in such low level
functions so unless these are removed or *_noassert versions introduced
I'll stay away from them.


The code above is IMHO pretty hard to read -- you have to
think through all the shifts and masks to figure out exactly
what is being done. I would definitely recommend extract32/deposit32,
as they convey the intent much better. You're already inside a
register accessor for a device model, there is much more overhead
on this path than a few assert condition checks. (And they do
catch bugs -- they found one in the arm code last month.)

(Alternatively, if you believe the overhead of the asserts matters,
then provide benchmarking demonstrating it, and we could look
at restricting this assert to the case where start and length are
compile-time constant, or to only the --enable-debug build.)


But I'm also not too happy about these *_offs functions but some registers
support 8/16/32 bit access and guest code seems to actually do this to
update bits in the middle of the register at an odd address. Best would be
if I could just set .impl.min = 4, .impl.max = 4 and .valid.min = 1
.valid.max = 4 for the mem region ops but I'm not sure that would work or
would it? If that's working maybe I should just go with that instead.


This will work, but only if all the registers in the memory region
are happy with "read 32 bits, write back 32 bits", ie they have
no "write-1-to-clear", "special behaviour on read", etc. (The
memory system will implement byte reads as "read 32 bits, modify,
write back".) If it does work then that's a nice way to do it.


Based on a quick try, this does not seem to work. With
impl.{min,max}=4 and valid.min=1 valid.max=4 I get:


ati_mm_write 4 0x9c0  <- 0x0
ati_mm_write 4 0x55  <- 0x4
ati_mm_write 4 0x50  <- 0x3000200

where this was before:

ati_mm_write 4 0x9c0  <- 0x0
ati_mm_write 1 0x55  <- 0x4
ati_mm_write 4 0x50  <- 0x3000200

and should probably be something like:

ati_mm_read 4 0x54  -> 0x0
ati_mm_write 4 0x54  <- 0x400

if access was adjusted as expected. So only the size was adjusted but not 
the address or value. Should I also add unaligned to either valid or impl? 
But probably that would not help either, I see a comment saying:

/* FIXME: support unaligned access? */ in memory.c:access_with_adjusted_size().

But now I think that it would be a better idea not to use valid/impl for
this but to keep a local function (maybe rewritten to use deposit/extract)
for now and use that explicitly. This is better for two reasons: there is
no added read before write, which might have side effects, and it can model
the device better, since the device only allows unaligned access to some
registers but not all. It's a bit more code but I think this cannot be
handled correctly by the memory subsystem anyway, even if the above is fixed.
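
A minimal sketch of what such a local function might look like when written
with a deposit-style helper. This is an illustration under assumptions, not
the code from the patch: deposit32_sketch is a local stand-in that mirrors
the (value, start, length, fieldval) semantics of QEMU's deposit32() but
without its asserts, and reg_write_offs_sketch is a hypothetical name. The
caller is assumed to guarantee size >= 1 and offs + size <= 4.

```c
#include <stdint.h>

/*
 * Local stand-in for illustration: replace 'length' bits of 'value',
 * starting at bit 'start', with the low bits of 'fieldval'.
 * Requires 1 <= length <= 32 and start + length <= 32 (not checked here).
 */
static inline uint32_t deposit32_sketch(uint32_t value, int start,
                                        int length, uint32_t fieldval)
{
    uint32_t mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/*
 * Hypothetical byte-offset register write: store the low 'size' bytes
 * of 'data' into *reg at byte offset 'offs', leaving other bytes intact.
 * Assumes size >= 1 and offs + size <= 4.
 */
static inline void reg_write_offs_sketch(uint32_t *reg, int offs,
                                         uint64_t data, unsigned int size)
{
    *reg = deposit32_sketch(*reg, offs * 8, size * 8, (uint32_t)data);
}
```

Starting from a register value of 0, writing the byte 0x4 at offset 1 with
this sketch gives 0x400, which is the adjusted value expected in the trace
above.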



Anything that
distracts from actual values and makes it harder to read (such as
timestamps and pids added by trace)


-d trace:your_trace_event doesn't add timestamps or PIDs, FWIW.


This doesn't seem to work either: I still get pid@timestamp: added to log
lines with this (same as with -trace enable=pattern). Should I use something
other than the "log" trace backend? (But I can live with this; it's only
slightly distracting, as it comes before the log line, so I can just start
reading from the colon. The interesting info is at the end of the line
anyway.)


Regards,
BALATON Zoltan


Re: [Qemu-devel] [PATCH 2/5] vfio/migration: support device of device config capability

2019-02-20 Thread Zhao Yan
On Tue, Feb 19, 2019 at 03:37:24PM +0100, Cornelia Huck wrote:
> On Tue, 19 Feb 2019 16:52:27 +0800
> Yan Zhao  wrote:
> 
> > Device config is the default data that every device should have. so
> > device config capability is by default on, no need to set.
> > 
> > - Currently two types of resources are saved/loaded for device of device
> >   config capability:
> >   General PCI config data, and Device config data.
> >   They are copied as a whole when precopy is stopped.
> > 
> > Migration setup flow:
> > - Setup device state regions, check its device state version and capabilities.
> >   Mmap Device Config Region and Dirty Bitmap Region, if available.
> > - If the device state regions fail to get set up, a migration blocker is
> >   registered instead.
> > - Added SaveVMHandlers to register device state save/load handlers.
> > - Register VM state change handler to set device's running/stop states.
> > - On migration startup on source machine, set device's state to
> >   VFIO_DEVICE_STATE_LOGGING
> > 
> > Signed-off-by: Yan Zhao 
> > Signed-off-by: Yulei Zhang 
> > ---
> >  hw/vfio/Makefile.objs |   2 +-
> >  hw/vfio/migration.c   | 633 ++
> >  hw/vfio/pci.c |   1 -
> >  hw/vfio/pci.h |  25 +-
> >  include/hw/vfio/vfio-common.h |   1 +
> >  5 files changed, 659 insertions(+), 3 deletions(-)
> >  create mode 100644 hw/vfio/migration.c
> > 
> > diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > index 8b3f664..f32ff19 100644
> > --- a/hw/vfio/Makefile.objs
> > +++ b/hw/vfio/Makefile.objs
> > @@ -1,6 +1,6 @@
> >  ifeq ($(CONFIG_LINUX), y)
> >  obj-$(CONFIG_SOFTMMU) += common.o
> > -obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o
> > +obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o migration.o
> 
> I think you want to split the migration code: The type-independent
> code, and the pci-specific code.
>
ok. Actually, for now only the saving/loading of generic pci config data is
pci-specific. Getting/setting data through the device state interfaces is
type-independent.

> >  obj-$(CONFIG_VFIO_CCW) += ccw.o
> >  obj-$(CONFIG_SOFTMMU) += platform.o
> >  obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
> > diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> > new file mode 100644
> > index 000..16d6395
> > --- /dev/null
> > +++ b/hw/vfio/migration.c
> > @@ -0,0 +1,633 @@
> > +#include "qemu/osdep.h"
> > +
> > +#include "hw/vfio/vfio-common.h"
> > +#include "migration/blocker.h"
> > +#include "migration/register.h"
> > +#include "qapi/error.h"
> > +#include "pci.h"
> > +#include "sysemu/kvm.h"
> > +#include "exec/ram_addr.h"
> > +
> > +#define VFIO_SAVE_FLAG_SETUP 0
> > +#define VFIO_SAVE_FLAG_PCI 1
> > +#define VFIO_SAVE_FLAG_DEVCONFIG 2
> > +#define VFIO_SAVE_FLAG_DEVMEMORY 4
> > +#define VFIO_SAVE_FLAG_CONTINUE 8
> > +
> > +static int vfio_device_state_region_setup(VFIOPCIDevice *vdev,
> > +VFIORegion *region, uint32_t subtype, const char *name)
> 
> This function looks like it should be more generic and e.g. take a
> VFIODevice instead of a VFIOPCIDevice as argument.
> 
> > +{
> > +VFIODevice *vbasedev = &vdev->vbasedev;
> > +struct vfio_region_info *info;
> > +int ret;
> > +
> > +ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_DEVICE_STATE,
> > +subtype, &info);
> > +if (ret) {
> > +error_report("Failed to get info of region %s", name);
> > +return ret;
> > +}
> > +
> > +if (vfio_region_setup(OBJECT(vdev), vbasedev,
> > +region, info->index, name)) {
> > +error_report("Failed to setup migration region %s", name);
> > +return ret;
> > +}
> > +
> > +if (vfio_region_mmap(region)) {
> > +error_report("Failed to mmap migration region %s", name);
> > +}
> > +
> > +return 0;
> > +}
> > +
> > +bool vfio_device_data_cap_system_memory(VFIOPCIDevice *vdev)
> > +{
> > +   return !!(vdev->migration->data_caps & VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY);
> > +}
> > +
> > +bool vfio_device_data_cap_device_memory(VFIOPCIDevice *vdev)
> > +{
> > +   return !!(vdev->migration->data_caps & VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY);
> > +}
> 
> These two as well. The migration structure should probably hang off the
> VFIODevice instead.
>
ok.

> > +
> > +static bool vfio_device_state_region_mmaped(VFIORegion *region)
> > +{
> > +bool mmaped = true;
> > +if (region->nr_mmaps != 1 || region->mmaps[0].offset ||
> > +(region->size != region->mmaps[0].size) ||
> > +(region->mmaps[0].mmap == NULL)) {
> > +mmaped = false;
> > +}
> > +
> > +return mmaped;
> > +}
> 
> s/mmaped/mmapped/ ?

yes :)
> 
> > +
> > +static int vfio_get_device_config_size(VFIOPCIDevice *vdev)
> > +{
> > +VFIODevice *vbasedev = &vdev->vbasedev;
> > +VFIORegion *region_ctl =
> > +&vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
> > +VFIORegion *region_config =
> > +

[Qemu-devel] [PATCH v7 14/17] nvdimm: use configurable ACPI IO base and size

2019-02-20 Thread Eric Auger
From: Kwangwoo Lee 

This patch uses configurable IO base and size to create NPIO AML for
ACPI NFIT. Since a different architecture like AArch64 does not use
port-mapped IO, a configurable IO base is required to create correct
mapping of ACPI IO address and size.

Signed-off-by: Kwangwoo Lee 
Signed-off-by: Eric Auger 

---
v6 -> v7:
- Use NvdimmDsmIO constant
- use AcpiGenericAddress instead of AcpiNVDIMMIOEntry

v2 -> v3:
- s/size/len in pc_piix.c and pc_q35.c
---
 hw/acpi/nvdimm.c| 31 ++-
 hw/i386/pc_piix.c   |  6 +-
 hw/i386/pc_q35.c|  6 +-
 include/hw/mem/nvdimm.h |  4 
 4 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index e53b2cb681..fddc790945 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -33,6 +33,9 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
+const struct AcpiGenericAddress NvdimmDsmIO = { .space_id = AML_AS_SYSTEM_IO,
+.bit_width = NVDIMM_ACPI_IO_LEN << 3, .address = NVDIMM_ACPI_IO_BASE};
+
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
 GSList **list = opaque;
@@ -929,8 +932,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 FWCfgState *fw_cfg, Object *owner)
 {
 memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
-  "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
-memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
+  "nvdimm-acpi-io", state->dsm_io.bit_width >> 3);
+memory_region_add_subregion(io, state->dsm_io.address, &state->io_mr);
 
 state->dsm_mem = g_array_new(false, true /* clear */, 1);
 acpi_data_push(state->dsm_mem, sizeof(NvdimmDsmIn));
@@ -959,12 +962,14 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
 
 #define NVDIMM_QEMU_RSVD_UUID   "648B9CF2-CDA1-4312-8AD9-49C4AF32BD62"
 
-static void nvdimm_build_common_dsm(Aml *dev)
+static void nvdimm_build_common_dsm(Aml *dev,
+AcpiNVDIMMState *acpi_nvdimm_state)
 {
 Aml *method, *ifctx, *function, *handle, *uuid, *dsm_mem, *elsectx2;
 Aml *elsectx, *unsupport, *unpatched, *expected_uuid, *uuid_invalid;
 Aml *pckg, *pckg_index, *pckg_buf, *field, *dsm_out_buf, *dsm_out_buf_size;
 uint8_t byte_list[1];
+AmlRegionSpace rs;
 
 method = aml_method(NVDIMM_COMMON_DSM, 5, AML_SERIALIZED);
 uuid = aml_arg(0);
@@ -975,9 +980,16 @@ static void nvdimm_build_common_dsm(Aml *dev)
 
 aml_append(method, aml_store(aml_name(NVDIMM_ACPI_MEM_ADDR), dsm_mem));
 
+if (acpi_nvdimm_state->dsm_io.space_id == AML_AS_SYSTEM_IO) {
+rs = AML_SYSTEM_IO;
+} else {
+rs = AML_SYSTEM_MEMORY;
+}
+
 /* map DSM memory and IO into ACPI namespace. */
-aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, AML_SYSTEM_IO,
-   aml_int(NVDIMM_ACPI_IO_BASE), NVDIMM_ACPI_IO_LEN));
+aml_append(method, aml_operation_region(NVDIMM_DSM_IOPORT, rs,
+   aml_int(acpi_nvdimm_state->dsm_io.address),
+   acpi_nvdimm_state->dsm_io.bit_width >> 3));
 aml_append(method, aml_operation_region(NVDIMM_DSM_MEMORY,
AML_SYSTEM_MEMORY, dsm_mem, sizeof(NvdimmDsmIn)));
 
@@ -1260,7 +1272,8 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
 }
 
 static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
-  BIOSLinker *linker, GArray *dsm_dma_arrea,
+  BIOSLinker *linker,
+  AcpiNVDIMMState *acpi_nvdimm_state,
   uint32_t ram_slots)
 {
 Aml *ssdt, *sb_scope, *dev;
@@ -1288,7 +1301,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
  */
 aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
 
-nvdimm_build_common_dsm(dev);
+nvdimm_build_common_dsm(dev, acpi_nvdimm_state);
 
 /* 0 is reserved for root device. */
 nvdimm_build_device_dsm(dev, 0);
@@ -1307,7 +1320,7 @@ static void nvdimm_build_ssdt(GArray *table_offsets, GArray *table_data,
NVDIMM_ACPI_MEM_ADDR);
 
 bios_linker_loader_alloc(linker,
- NVDIMM_DSM_MEM_FILE, dsm_dma_arrea,
+ NVDIMM_DSM_MEM_FILE, acpi_nvdimm_state->dsm_mem,
  sizeof(NvdimmDsmIn), false /* high memory */);
 bios_linker_loader_add_pointer(linker,
 ACPI_BUILD_TABLE_FILE, mem_addr_offset, sizeof(uint32_t),
@@ -1329,7 +1342,7 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
 return;
 }
 
-nvdimm_build_ssdt(table_offsets, table_data, linker, state->dsm_mem,
+nvdimm_build_ssdt(table_offsets, table_data, linker, state,
   ram_slots);
 
 device_list = nvdimm_get_device_list();
diff --git 

Re: [Qemu-devel] [PATCH v6 00/73] per-CPU locks

2019-02-20 Thread Emilio G. Cota
On Wed, Feb 20, 2019 at 09:27:06 -0800, Richard Henderson wrote:
> Thanks for the patience.  Both Alex and I have now completed review, and I
> think this is ready for merge.
> 
> There are some patch conflicts with master, so if you can fix those and post a
> v7, we'll get it merged right away.

Thanks for reviewing! Will send a v7 in a few days.

Emilio



[Qemu-devel] [PATCH v7 11/17] hw/arm/boot: Expose the PC-DIMM nodes in the DT

2019-02-20 Thread Eric Auger
From: Shameer Kolothum 

This patch adds memory nodes corresponding to PC-DIMM regions.

NV_DIMM and ACPI_NVDIMM configs are not yet set for ARM so we
don't need to care about NV-DIMM at this stage.

Signed-off-by: Shameer Kolothum 
Signed-off-by: Eric Auger 

---
v6 -> v7:
- rework the error messages, use a switch/case
v3 -> v4:
- get rid of @base and @len in fdt_add_hotpluggable_memory_nodes

v1 -> v2:
- added qapi_free_MemoryDeviceInfoList and simplify the loop
---
 hw/arm/boot.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index a830655e1a..255aaca0cf 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -19,6 +19,7 @@
 #include "sysemu/numa.h"
 #include "hw/boards.h"
 #include "hw/loader.h"
+#include "hw/mem/memory-device.h"
 #include "elf.h"
 #include "sysemu/device_tree.h"
 #include "qemu/config-file.h"
@@ -522,6 +523,41 @@ static void fdt_add_psci_node(void *fdt)
 qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
 }
 
+static int fdt_add_hotpluggable_memory_nodes(void *fdt,
+ uint32_t acells, uint32_t scells) {
+MemoryDeviceInfoList *info, *info_list = qmp_memory_device_list();
+MemoryDeviceInfo *mi;
+int ret = 0;
+
+for (info = info_list; info != NULL; info = info->next) {
+mi = info->value;
+switch (mi->type) {
+case MEMORY_DEVICE_INFO_KIND_DIMM:
+{
+PCDIMMDeviceInfo *di = mi->u.dimm.data;
+
+ret = fdt_add_memory_node(fdt, acells, di->addr,
+  scells, di->size, di->node);
+if (ret) {
+fprintf(stderr,
+"couldn't add PCDIMM /memory@%"PRIx64" node\n",
+di->addr);
+goto out;
+}
+break;
+}
+default:
+fprintf(stderr, "%s memory nodes are not yet supported\n",
+MemoryDeviceInfoKind_str(mi->type));
+ret = -ENOENT;
+goto out;
+}
+}
+out:
+qapi_free_MemoryDeviceInfoList(info_list);
+return ret;
+}
+
 int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
  hwaddr addr_limit, AddressSpace *as)
 {
@@ -621,6 +657,12 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
 }
 }
 
+rc = fdt_add_hotpluggable_memory_nodes(fdt, acells, scells);
+if (rc < 0) {
+fprintf(stderr, "couldn't add hotpluggable memory nodes\n");
+goto fail;
+}
+
 rc = fdt_path_offset(fdt, "/chosen");
 if (rc < 0) {
 qemu_fdt_add_subnode(fdt, "/chosen");
-- 
2.20.1




[Qemu-devel] [PATCH v7 12/17] hw/arm/virt-acpi-build: Add PC-DIMM in SRAT

2019-02-20 Thread Eric Auger
From: Shameer Kolothum 

Generate Memory Affinity Structures for PC-DIMM ranges.

Signed-off-by: Shameer Kolothum 
Signed-off-by: Eric Auger 
Reviewed-by: Igor Mammedov 

---

v6 -> v7:
- add Igor's R-b

v5 -> v6:
- fix mingw compilation issue

v4 -> v5:
- Align to x86 code and especially
  "pc: acpi: revert back to 1 SRAT entry for hotpluggable area"

v3 -> v4:
- do not use vms->bootinfo.device_memory_start/device_memory_size anymore

v1 -> v2:
- build_srat_hotpluggable_memory moved to aml-build
---
 hw/arm/virt-acpi-build.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 829d2f0035..781eafaf5e 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -516,6 +516,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 int i, srat_start;
 uint64_t mem_base;
 MachineClass *mc = MACHINE_GET_CLASS(vms);
+MachineState *ms = MACHINE(vms);
 const CPUArchIdList *cpu_list = mc->possible_cpu_arch_ids(MACHINE(vms));
 
 srat_start = table_data->len;
@@ -541,6 +542,14 @@ build_srat(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 }
 }
 
+if (ms->device_memory) {
+numamem = acpi_data_push(table_data, sizeof *numamem);
+build_srat_memory(numamem, ms->device_memory->base,
+  memory_region_size(&ms->device_memory->mr),
+  nb_numa_nodes - 1,
+  MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+}
+
 build_header(linker, table_data, (void *)(table_data->data + srat_start),
  "SRAT", table_data->len - srat_start, 3, NULL, NULL);
 }
-- 
2.20.1




[Qemu-devel] [PATCH v7 17/17] hw/arm/virt: Add nvdimm and nvdimm-persistence options

2019-02-20 Thread Eric Auger
The nvdimm machine option turns NVDIMM support on.

Signed-off-by: Eric Auger 
---
 hw/arm/virt.c | 59 +--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1896920570..c7e68e2428 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1814,6 +1814,47 @@ static void virt_set_iommu(Object *obj, const char *value, Error **errp)
 }
 }
 
+static bool virt_get_nvdimm(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->acpi_nvdimm_state.is_enabled;
+}
+
+static void virt_set_nvdimm(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->acpi_nvdimm_state.is_enabled = value;
+}
+
+static char *virt_get_nvdimm_persistence(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return g_strdup(vms->acpi_nvdimm_state.persistence_string);
+}
+
+static void virt_set_nvdimm_persistence(Object *obj, const char *value,
+Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+AcpiNVDIMMState *nvdimm_state = &vms->acpi_nvdimm_state;
+
+if (strcmp(value, "cpu") == 0)
+nvdimm_state->persistence = 3;
+else if (strcmp(value, "mem-ctrl") == 0)
+nvdimm_state->persistence = 2;
+else {
+error_report("-machine nvdimm-persistence=%s: unsupported option",
+ value);
+exit(EXIT_FAILURE);
+}
+
+g_free(nvdimm_state->persistence_string);
+nvdimm_state->persistence_string = g_strdup(value);
+}
+
 static CpuInstanceProperties
 virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 {
@@ -1856,13 +1897,14 @@ static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
  Error **errp)
 {
 const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
+VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
 
 if (dev->hotplugged) {
 error_setg(errp, "memory hotplug is not supported");
 }
 
-if (is_nvdimm) {
-error_setg(errp, "nvdimm is not yet supported");
+if (is_nvdimm && !vms->acpi_nvdimm_state.is_enabled) {
+error_setg(errp, "nvdimm is not enabled: missing 'nvdimm' in '-M'");
 return;
 }
 
@@ -2076,6 +2118,19 @@ static void virt_instance_init(Object *obj)
 vms->extended_memmap = true;
 }
 
+object_property_add_bool(obj, "nvdimm",
+ virt_get_nvdimm, virt_set_nvdimm, NULL);
+object_property_set_description(obj, "nvdimm",
+ "Set on/off to enable/disable NVDIMM "
+ "instantiation", NULL);
+
+object_property_add_str(obj, "nvdimm-persistence",
+virt_get_nvdimm_persistence,
+virt_set_nvdimm_persistence, NULL);
+object_property_set_description(obj, "nvdimm-persistence",
+"Set NVDIMM persistence. "
+"Valid values are cpu and mem-ctrl", NULL);
+
 vms->irqmap = a15irqmap;
 }
 
-- 
2.20.1
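
The mapping from the nvdimm-persistence string to its numeric value can be
sketched on its own as follows. This is only an illustration under stated
assumptions: parse_nvdimm_persistence() is a hypothetical helper, not a QEMU
function, and it returns -1 for an unsupported string where the patch above
reports an error and exits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: map the option string to the value stored in
 * nvdimm_state->persistence by the patch ("cpu" -> 3, "mem-ctrl" -> 2).
 * Returns -1 for any other string instead of exiting.
 */
static int parse_nvdimm_persistence(const char *value)
{
    assert(value != NULL);

    if (strcmp(value, "cpu") == 0) {
        return 3;
    }
    if (strcmp(value, "mem-ctrl") == 0) {
        return 2;
    }
    return -1;
}
```

Returning an error code keeps the sketch testable; a real setter would more
likely report the failure through its Error **errp argument.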




[Qemu-devel] [PATCH v7 13/17] hw/arm/virt: Allocate device_memory

2019-02-20 Thread Eric Auger
The device memory region is located after the initial RAM.
Its start and size are 1GB aligned.

Signed-off-by: Eric Auger 
Signed-off-by: Kwangwoo Lee 

---
v6 -> v7:
- check the device memory top does not wrap
- check the device memory can fit the slots

v4 -> v5:
- device memory set after the initial RAM

v3 -> v4:
- remove bootinfo.device_memory_start/device_memory_size
- rename VIRT_HOTPLUG_MEM into VIRT_DEVICE_MEM
---
 hw/arm/virt.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 470ca0ce2d..33ad9b3f63 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -62,6 +62,7 @@
 #include "target/arm/internals.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
+#include "hw/acpi/acpi.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
 static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1263,6 +1264,34 @@ static void create_secure_ram(VirtMachineState *vms,
 g_free(nodename);
 }
 
+static void create_device_memory(VirtMachineState *vms, MemoryRegion *sysmem)
+{
+MachineState *ms = MACHINE(vms);
+
+if (!vms->device_memory_size) {
+return;
+}
+
+if (ms->ram_slots > ACPI_MAX_RAM_SLOTS) {
+error_report("unsupported number of memory slots: %"PRIu64,
+ ms->ram_slots);
+exit(EXIT_FAILURE);
+}
+
+if (QEMU_ALIGN_UP(ms->maxram_size, GiB) != ms->maxram_size) {
+error_report("maximum memory size must be GiB aligned");
+exit(EXIT_FAILURE);
+}
+
+ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
+ms->device_memory->base = vms->device_memory_base;
+
+memory_region_init(&ms->device_memory->mr, OBJECT(vms),
+   "device-memory", vms->device_memory_size);
+memory_region_add_subregion(sysmem, ms->device_memory->base,
+&ms->device_memory->mr);
+}
+
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 {
 const VirtMachineState *board = container_of(binfo, VirtMachineState,
@@ -1610,6 +1639,10 @@ static void machvirt_init(MachineState *machine)
  machine->ram_size);
 memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
 
+if (vms->extended_memmap) {
+create_device_memory(vms, sysmem);
+}
+
 create_flash(vms, sysmem, secure_sysmem ? secure_sysmem : sysmem);
 
 create_gic(vms, pic);
-- 
2.20.1




[Qemu-devel] [PATCH v7 08/17] hw/arm/virt: Implement kvm_type function for 4.0 machine

2019-02-20 Thread Eric Auger
This patch implements the machine class kvm_type() callback.
It returns the number of bits requested to implement the whole GPA
range including the RAM and IO regions located beyond.
The returned value is passed through the KVM_CREATE_VM ioctl and
this allows KVM to set the stage2 tables dynamically.

Signed-off-by: Eric Auger 

---

v6 -> v7:
- Introduce RAMBASE and add the LEGACY_ prefix in that patch
- use local variables with explicit names in virt_set_memmap:
  device_memory_base, device_memory_size
- add an extended_memmap field in the class

v5 -> v6:
- add some comments
- high IO region cannot start before 256GiB
---
 hw/arm/virt.c | 50 ++-
 include/hw/arm/virt.h |  2 ++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9db602457b..ad3a0ad73d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1437,7 +1437,14 @@ static void machvirt_init(MachineState *machine)
 bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
 bool aarch64 = true;
 
-virt_set_memmap(vms);
+/*
+ * In accelerated mode, the memory map is computed in kvm_type(),
+ * if set, to create a VM with the right number of IPA bits.
+ */
+
+if (!mc->kvm_type || !kvm_enabled()) {
+virt_set_memmap(vms);
+}
 
 /* We can probe only here because during property set
  * KVM is not available yet
@@ -1814,6 +1821,36 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
 return NULL;
 }
 
+/*
+ * for arm64 kvm_type [7-0] encodes the requested number of bits
+ * in the IPA address space
+ */
+static int virt_kvm_type(MachineState *ms, const char *type_str)
+{
+VirtMachineState *vms = VIRT_MACHINE(ms);
+int max_vm_pa_size = kvm_arm_get_max_vm_ipa_size(ms);
+int requested_pa_size;
+
+/* we freeze the memory map to compute the highest gpa */
+virt_set_memmap(vms);
+
+requested_pa_size = 64 - clz64(vms->highest_gpa);
+
+if (requested_pa_size > max_vm_pa_size) {
+error_report("-m and ,maxmem option values "
+ "require an IPA range (%d bits) larger than "
+ "the one supported by the host (%d bits)",
+ requested_pa_size, max_vm_pa_size);
+   exit(1);
+}
+/*
+ * By default we return 0 which corresponds to an implicit legacy
+ * 40b IPA setting. Otherwise we return the actual requested PA
+ * logsize
+ */
+return requested_pa_size > 40 ? requested_pa_size : 0;
+}
+
 static void virt_machine_class_init(ObjectClass *oc, void *data)
 {
 MachineClass *mc = MACHINE_CLASS(oc);
@@ -1838,6 +1875,7 @@ static void virt_machine_class_init(ObjectClass *oc, void *data)
 mc->cpu_index_to_instance_props = virt_cpu_index_to_props;
 mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
 mc->get_default_cpu_node_id = virt_get_default_cpu_node_id;
+mc->kvm_type = virt_kvm_type;
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
 hc->plug = virt_machine_device_plug_cb;
@@ -1909,6 +1947,12 @@ static void virt_instance_init(Object *obj)
 "Valid values are none and smmuv3",
 NULL);
 
+if (vmc->no_extended_memmap) {
+vms->extended_memmap = false;
+} else {
+vms->extended_memmap = true;
+}
+
 vms->irqmap = a15irqmap;
 }
 
@@ -1939,8 +1983,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(4, 0)
 
 static void virt_machine_3_1_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
 virt_machine_4_0_options(mc);
 compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
+
+/* extended memory map is enabled from 4.0 onwards */
+vmc->no_extended_memmap = true;
 }
 DEFINE_VIRT_MACHINE(3, 1)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index acad0400d8..7798462cb0 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -106,6 +106,7 @@ typedef struct {
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
 bool no_highmem_ecam;
+bool no_extended_memmap;
 } VirtMachineClass;
 
 typedef struct {
@@ -135,6 +136,7 @@ typedef struct {
 hwaddr highest_gpa;
 hwaddr device_memory_base;
 hwaddr device_memory_size;
+bool extended_memmap;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-- 
2.20.1
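
The computation done in virt_kvm_type() can be sketched standalone as shown
below. This is a hedged illustration, not the QEMU code: gpa_bits() is a
portable stand-in for "64 - clz64()", kvm_type_for() is a hypothetical name,
and it returns -1 when the host limit is exceeded where the patch reports an
error and exits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent highest_gpa (0 for an input of 0). */
static int gpa_bits(uint64_t highest_gpa)
{
    int bits = 0;

    while (highest_gpa) {
        bits++;
        highest_gpa >>= 1;
    }
    return bits;
}

/*
 * Hypothetical helper mirroring the decision in virt_kvm_type():
 * - -1 if the guest needs more IPA bits than the host supports,
 * - 0 if the legacy implicit 40-bit setting is sufficient,
 * - the requested number of bits otherwise.
 */
static int kvm_type_for(uint64_t highest_gpa, int host_max_bits)
{
    int requested = gpa_bits(highest_gpa);

    assert(host_max_bits > 0 && host_max_bits <= 64);

    if (requested > host_max_bits) {
        return -1;
    }
    return requested > 40 ? requested : 0;
}
```

With the legacy map, the highest GPA is 1TiB - 1, which needs exactly 40 bits,
so the helper returns 0 and the VM is created as before.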




[Qemu-devel] [PATCH v7 06/17] vl: Set machine ram_size, maxram_size and ram_slots earlier

2019-02-20 Thread Eric Auger
The machine RAM attributes will need to be analyzed during the
configure_accelerator() process. In particular, the arm64 kvm_type()
machine callback will use them to know how many IPA/GPA bits are
needed to model the whole RAM range. So let's assign those machine
state fields before calling configure_accelerator().

Signed-off-by: Eric Auger 
Reviewed-by: Peter Maydell 

---
v6 -> v7:
- add Peter's R-b

v4: new
---
 vl.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/vl.c b/vl.c
index 502857a176..fd0d51320d 100644
--- a/vl.c
+++ b/vl.c
@@ -4239,6 +4239,9 @@ int main(int argc, char **argv, char **envp)
 machine_opts = qemu_get_machine_opts();
 qemu_opt_foreach(machine_opts, machine_set_property, current_machine,
  &error_fatal);
+current_machine->ram_size = ram_size;
+current_machine->maxram_size = maxram_size;
+current_machine->ram_slots = ram_slots;
 
 configure_accelerator(current_machine, argv[0]);
 
@@ -4434,9 +4437,6 @@ int main(int argc, char **argv, char **envp)
 replay_checkpoint(CHECKPOINT_INIT);
 qdev_machine_init();
 
-current_machine->ram_size = ram_size;
-current_machine->maxram_size = maxram_size;
-current_machine->ram_slots = ram_slots;
 current_machine->boot_order = boot_order;
 
 /* parse features once if machine provides default cpu_type */
-- 
2.20.1




[Qemu-devel] [PATCH v7 07/17] hw/arm/virt: Dynamic memory map depending on RAM requirements

2019-02-20 Thread Eric Auger
Up to now the memory map has been static and the high IO region
base has always been 256GiB.

This patch modifies the virt_set_memmap() function, which freezes
the memory map, so that the high IO range base becomes floating,
located after the initial RAM and the device memory.

The function computes
- the base of the device memory,
- the size of the device memory and
- the highest GPA used in the memory map.

The first two will be used when defining the device memory region
while the last one will be used at VM creation to choose the requested
IPA size.

Setting all the existing highmem IO regions beyond the RAM
allows us to have a single contiguous RAM region (initial RAM and
possible hotpluggable device memory). That way we do not need
to make invasive changes in the EDK2 FW to support a dynamic
RAM base.

Still the user cannot request an initial RAM size greater than 255GB.
Also we handle the case where maxmem or slots options are passed,
although no device memory is usable at the moment. In this case, we
just ignore those settings.
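
The arithmetic described above can be sketched in isolation as follows. This
is a minimal standalone illustration, not the QEMU code: struct layout,
compute_layout() and round_up_gib() are hypothetical names, while the 1GiB RAM
base, the 1GiB alignment and the 256GiB floor are taken from this patch. Since
maxmem and slots are still ignored at this stage of the series, the device
memory size is zero in practice for now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define RAMBASE GiB

/* Round n up to the next multiple of 1GiB. */
static uint64_t round_up_gib(uint64_t n)
{
    return (n + GiB - 1) & ~(GiB - 1);
}

/* Hypothetical container for the three values the patch computes. */
struct layout {
    uint64_t device_memory_base;
    uint64_t device_memory_size;
    uint64_t high_io_base;
};

/* Returns false when the top of the device memory wraps around. */
static bool compute_layout(uint64_t ram_size, uint64_t maxram_size,
                           uint64_t ram_slots, struct layout *out)
{
    assert(maxram_size >= ram_size);

    out->device_memory_base = round_up_gib(RAMBASE + ram_size);
    /* One extra GiB per slot leaves room for 1GiB alignment of each DIMM. */
    out->device_memory_size = maxram_size - ram_size + ram_slots * GiB;
    out->high_io_base = out->device_memory_base +
                        round_up_gib(out->device_memory_size);
    if (out->high_io_base < out->device_memory_base) {
        return false;
    }
    /* Never place the high IO region below the legacy 256GiB mark. */
    if (out->high_io_base < 256 * GiB) {
        out->high_io_base = 256 * GiB;
    }
    return true;
}
```

For example, 4GiB of initial RAM with maxmem=8GiB and 2 slots keeps the high IO
base at 256GiB, whereas 300GiB of initial RAM with maxmem=400GiB and 4 slots
moves it to 405GiB.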

Signed-off-by: Eric Auger 
---
 hw/arm/virt.c | 47 ++-
 include/hw/arm/virt.h |  3 +++
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 12039a0367..9db602457b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -107,8 +107,9 @@
  * of a terabyte of RAM will be doing it on a host with more than a
  * terabyte of physical address space.)
  */
-#define RAMLIMIT_GB 255
-#define RAMLIMIT_BYTES (RAMLIMIT_GB * 1024ULL * 1024 * 1024)
+#define RAMBASE GiB
+#define LEGACY_RAMLIMIT_GB 255
+#define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
 
 /* Addresses and sizes of our components.
  * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
@@ -149,7 +150,7 @@ static const MemMapEntry base_memmap[] = {
 [VIRT_PCIE_MMIO] =  { 0x10000000, 0x2eff0000 },
 [VIRT_PCIE_PIO] =   { 0x3eff0000, 0x00010000 },
 [VIRT_PCIE_ECAM] =  { 0x3f000000, 0x01000000 },
-[VIRT_MEM] ={ 0x40000000, RAMLIMIT_BYTES },
+[VIRT_MEM] ={ RAMBASE, LEGACY_RAMLIMIT_BYTES },
 };
 
 /*
@@ -1367,16 +1368,48 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 
 static void virt_set_memmap(VirtMachineState *vms)
 {
+MachineState *ms = MACHINE(vms);
 hwaddr base;
 int i;
 
+if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
+error_report("mach-virt: does not support device memory: "
+ "ignore maxmem and slots options");
+ms->maxram_size = ms->ram_size;
+ms->ram_slots = 0;
+}
+if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
+error_report("mach-virt: cannot model more than %dGB RAM",
+ LEGACY_RAMLIMIT_GB);
+exit(1);
+}
+
 vms->memmap = extended_memmap;
 
 for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
 vms->memmap[i] = base_memmap[i];
 }
 
-vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
+/*
+ * We compute the base of the high IO region depending on the
+ * amount of initial and device memory. The device memory start/size
+ * is aligned on 1GiB. We never put the high IO region below 256GiB
+ * so that if maxram_size is < 255GiB we keep the legacy memory map.
+ * The device region size assumes 1GiB page max alignment per slot.
+ */
+vms->device_memory_base = ROUND_UP(RAMBASE + ms->ram_size, GiB);
+vms->device_memory_size = ms->maxram_size - ms->ram_size +
+  ms->ram_slots * GiB;
+
+vms->high_io_base = vms->device_memory_base +
+ROUND_UP(vms->device_memory_size, GiB);
+if (vms->high_io_base < vms->device_memory_base) {
+error_report("maxmem/slots too huge");
+exit(EXIT_FAILURE);
+}
+if (vms->high_io_base < 256 * GiB) {
+vms->high_io_base = 256 * GiB;
+}
 base = vms->high_io_base;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
@@ -1387,6 +1420,7 @@ static void virt_set_memmap(VirtMachineState *vms)
 vms->memmap[i].size = size;
 base += size;
 }
+vms->highest_gpa = base - 1;
 }
 
 static void machvirt_init(MachineState *machine)
@@ -1470,11 +1504,6 @@ static void machvirt_init(MachineState *machine)
 
 vms->smp_cpus = smp_cpus;
 
-if (machine->ram_size > vms->memmap[VIRT_MEM].size) {
-error_report("mach-virt: cannot model more than %dGB RAM", RAMLIMIT_GB);
-exit(1);
-}
-
 if (vms->virt && kvm_enabled()) {
 error_report("mach-virt: KVM does not support providing "
  "Virtualization extensions to the guest CPU");
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 3dc7a6c5d5..acad0400d8 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -132,6 +132,9 @@ typedef struct {
 

[Qemu-devel] [PATCH v7 05/17] kvm: add kvm_arm_get_max_vm_ipa_size

2019-02-20 Thread Eric Auger
Add the kvm_arm_get_max_vm_ipa_size() helper that returns the
number of bits in the IPA address space supported by KVM.

This capability needs to be known to create the VM with a
specific IPA max size (kvm_type passed along with the KVM_CREATE_VM ioctl).

Signed-off-by: Eric Auger 

---
v6 -> v7:
- s/kvm_arm_get_max_vm_phys_shift/kvm_arm_get_max_vm_ipa_size
- reword the comment

v4 -> v5:
- return 40 if the host does not support the capability

v3 -> v4:
- s/s/ms in kvm_arm_get_max_vm_phys_shift function comment
- check KVM_CAP_ARM_VM_IPA_SIZE extension

v1 -> v2:
- put this in ARM specific code
---
 target/arm/kvm.c | 10 ++
 target/arm/kvm_arm.h | 13 +
 2 files changed, 23 insertions(+)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e00ccf9c98..79a79f0190 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -18,6 +18,7 @@
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
+#include "sysemu/kvm_int.h"
 #include "kvm_arm.h"
 #include "cpu.h"
 #include "trace.h"
@@ -162,6 +163,15 @@ void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
 env->features = arm_host_cpu_features.features;
 }
 
+int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
+{
+KVMState *s = KVM_STATE(ms->accelerator);
+int ret;
+
+ret = kvm_check_extension(s, KVM_CAP_ARM_VM_IPA_SIZE);
+return ret > 0 ? ret : 40;
+}
+
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
 /* For ARM interrupt delivery is always asynchronous,
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 6393455b1d..2a07333c61 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -207,6 +207,14 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf);
  */
 void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu);
 
+/**
+ * kvm_arm_get_max_vm_ipa_size - Returns the number of bits in the
+ * IPA address space supported by KVM
+ *
+ * @ms: Machine state handle
+ */
+int kvm_arm_get_max_vm_ipa_size(MachineState *ms);
+
 /**
  * kvm_arm_sync_mpstate_to_kvm
  * @cpu: ARMCPU
@@ -239,6 +247,11 @@ static inline void kvm_arm_set_cpu_features_from_host(ARMCPU *cpu)
 cpu->host_cpu_probe_failed = true;
 }
 
+static inline int kvm_arm_get_max_vm_ipa_size(MachineState *ms)
+{
+return -ENOENT;
+}
+
 static inline int kvm_arm_vgic_probe(void)
 {
 return 0;
-- 
2.20.1




Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support

2019-02-20 Thread Auger Eric
Hi Peter,

On 2/20/19 11:39 PM, Eric Auger wrote:
> This series aims to bump the 255GB RAM limit in machvirt and to
> support device memory in general, and especially PCDIMM/NVDIMM.
> 
> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> grow up to 255GB. From 256GB onwards we find IO regions such as the
> additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
> MMIO region. The address map was 1TB large. This corresponded to
> the max IPA capacity KVM was able to manage.
> 
> Since 4.20, the host kernel is able to support a larger and dynamic
> IPA range. So the guest physical address can go beyond the 1TB. The
> max GPA size depends on the host kernel configuration and physical CPUs.
> 
> In this series we use this feature and allow the RAM to grow without
> any other limit than the one put by the host kernel.
> 
> The RAM still starts at 1GB. First comes the initial ram (-m) of size
> ram_size and then comes the device memory (,maxmem) of size
> maxram_size - ram_size. The device memory is potentially hotpluggable
> depending on the instantiated memory objects.
> 
> IO regions previously located between 256GB and 1TB are moved after
> the RAM. Their offset is dynamically computed, depends on ram_size
> and maxram_size. Size alignment is enforced.
> 
> If the maxmem value is lower than 255GB, the legacy memory map
> is still used. The change of memory map becomes effective from 4.0
> onwards.
> 
> As we keep the initial RAM at 1GB base address, we do not need to do
> invasive changes in the EDK2 FW. It seems nobody is eager to do
> that job at the moment.
> 
> Device memory being put just after the initial RAM, it is possible
> to get access to this feature while keeping a 1TB address map.
> 
> This series reuses/rebases patches initially submitted by Shameer
> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
> 
> Functionally, the series is split into 3 parts:
> 1) bump of the initial RAM limit [1 - 9] and change in
>the memory map
I respun the whole series, including the PCDIMM and NVDIMM parts, as Igor
did a first review pass on the latter. However, the first objective is
to get [1 - 9] upstreamed, as we discussed earlier. So please consider
those patches independently of the others.

Thanks

Eric
> 2) Support of PC-DIMM [10 - 13]
> 3) Support of NV-DIMM [14 - 17]
> 
> 1) can be upstreamed before 2 and 2 can be upstreamed before 3.
> 
> Work is ongoing to transform the whole memory as device memory.
> However this move is not trivial and, to me, is independent of
> the improvements brought by this series:
> - if we were to use DIMM for initial RAM, those DIMMs would
>   use slots. Although they would not be part of the ones provided
>   using the ",slots" options, they are ACPI limited resources.
> - DT and ACPI description needs to be reworked
> - NUMA integration needs special care
> - a special device memory object may be required to avoid consuming
>   slots and easing the FW description.
> 
> So I preferred to separate the concerns. This new implementation
> based on device memory could be candidate for another virt
> version.
> 
> Best Regards
> 
> Eric
> 
> References:
> 
> [0] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> http://patchwork.ozlabs.org/cover/914694/
> 
> [1] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7
> 
> History:
> 
> v6 -> v7:
> - Addressed Peter and Igor comments (exceptions sent by email)
> - Fixed TCG case. Now device memory works also for TCG and vcpu
>   pamax is checked
> - See individual logs for more details
> 
> v5 -> v6:
> - mingw compilation issue fix
> - kvm_arm_get_max_vm_phys_shift always returns the number of supported
>   IPA bits
> - new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
>   of "hw/arm/virt: Split the memory map description"
> - "hw/arm/virt: Move memory map initialization into machvirt_init"
>   squashed into the previous patch
> - change alignment of IO regions beyond the RAM so that it matches their
>   size
> 
> v4 -> v5:
> - change in the memory map
> - see individual logs
> 
> v3 -> v4:
> - rebase on David's "pc-dimm: next bunch of cleanups" and
>   "pc-dimm: pre_plug "slot" and "addr" assignment"
> - kvm-type option not used anymore. We directly use
>   maxram_size and ram_size machine fields to compute the
>   MAX IPA range. Migration is naturally handled as CLI
>   option are kept between source and destination. This was
>   suggested by David.
> - device_memory_start and device_memory_size not stored
>   anymore in vms->bootinfo
> - I did not take into account 2 Igor's comments: the one
>   related to the refactoring of arm_load_dtb and the one
>   related to the generation of the dtb after system_reset
>   which would contain nodes of hotplugged devices (we do
>   not support 

[Qemu-devel] [PATCH v7 03/17] hw/arm/virt: Split the memory map description

2019-02-20 Thread Eric Auger
In preparation for introducing an extended memory map supporting more
RAM, let's split the memory map array into two parts:

- the former a15memmap contains regions below and including the RAM
- extended_memmap, only initialized with entries located after the RAM.
  Only the size of the region is initialized there since their base
  address will be dynamically computed, depending on the top of the
  RAM (initial RAM at the moment), with same alignment as their size.

This new split will allow growing the RAM size without changing the
description of the high regions.

The patch also moves the memory map setup into machvirt_init().
The rationale is that the memory map will soon be affected by the
kvm_type() call that happens after virt_instance_init() and
before machvirt_init().

The memory map is unchanged (the top of the initial RAM still is
256GiB). Then come the high IO regions with same layout as before.
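
The placement rule for the floating high IO regions (each base aligned on the
region size) can be sketched standalone as below. This is an illustration
under stated assumptions: struct region, place_regions() and round_up_pow2()
are hypothetical names, and the sketch assumes power-of-two region sizes, which
holds for the three sizes used in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical stand-in for a memory map entry. */
struct region {
    uint64_t base;
    uint64_t size;
};

/* Round n up to the next multiple of d; d must be a power of two. */
static uint64_t round_up_pow2(uint64_t n, uint64_t d)
{
    assert(d != 0 && (d & (d - 1)) == 0);
    return (n + d - 1) & ~(d - 1);
}

/*
 * Place each region at the first address at or above the running base
 * that is aligned on the region size. Returns the first address located
 * after the last region.
 */
static uint64_t place_regions(struct region *r, size_t n, uint64_t start)
{
    uint64_t base = start;

    for (size_t i = 0; i < n; i++) {
        base = round_up_pow2(base, r[i].size);
        r[i].base = base;
        base += r[i].size;
    }
    return base;
}
```

Starting at 256GiB with regions of 64MiB, 256MiB and 512GiB, this yields bases
of 256GiB, 256GiB + 256MiB and 512GiB, and a map ending at 1TiB, which matches
the statement that the layout of the high IO regions is unchanged.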

Signed-off-by: Eric Auger 
Reviewed-by: Peter Maydell 

---
v6 -> v7:
- s/a15memmap/base_memmap
- slight rewording of the commit message
- add "if there is less than 256GiB of RAM then the floating area
  starts at the 256GiB mark" in the comment associated to the floating
  memory map
- Added Peter's R-b

v5 -> v6
- removal of many macros in units.h
- introduce the virt_set_memmap helper
- new computation for offsets of high IO regions
- add comments
---
 hw/arm/virt.c | 48 +--
 include/hw/arm/virt.h | 14 +
 2 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index a1955e7764..12039a0367 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -29,6 +29,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "qapi/error.h"
 #include "hw/sysbus.h"
 #include "hw/arm/arm.h"
@@ -121,7 +122,7 @@
  * Note that devices should generally be placed at multiples of 0x10000,
  * to accommodate guests using 64K pages.
  */
-static const MemMapEntry a15memmap[] = {
+static const MemMapEntry base_memmap[] = {
 /* Space up to 0x8000000 is reserved for a boot ROM */
 [VIRT_FLASH] =  {  0, 0x08000000 },
 [VIRT_CPUPERIPHS] = { 0x08000000, 0x00020000 },
@@ -149,11 +150,21 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_PCIE_PIO] =   { 0x3eff0000, 0x00010000 },
 [VIRT_PCIE_ECAM] =  { 0x3f000000, 0x01000000 },
 [VIRT_MEM] ={ 0x40000000, RAMLIMIT_BYTES },
+};
+
+/*
+ * Highmem IO Regions: This memory map is floating, located after the RAM.
+ * Each IO region offset will be dynamically computed, depending on the
+ * top of the RAM, so that its base get the same alignment as the size,
+ * ie. a 512GiB region will be aligned on a 512GiB boundary. If there is
+ * less than 256GiB of RAM, the floating area starts at the 256GiB mark.
+ */
+static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
-[VIRT_HIGH_GIC_REDIST2] =   { 0x4000000000ULL, 0x4000000 },
-[VIRT_HIGH_PCIE_ECAM] = { 0x4010000000ULL, 0x10000000 },
-/* Second PCIe window, 512GB wide at the 512GB boundary */
-[VIRT_HIGH_PCIE_MMIO] = { 0x8000000000ULL, 0x8000000000ULL },
+[VIRT_HIGH_GIC_REDIST2] =   { 0x0, 64 * MiB },
+[VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
+/* Second PCIe window */
+[VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },
 };
 
 static const int a15irqmap[] = {
@@ -1354,6 +1365,30 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_memmap(VirtMachineState *vms)
+{
+hwaddr base;
+int i;
+
+vms->memmap = extended_memmap;
+
+for (i = 0; i < ARRAY_SIZE(base_memmap); i++) {
+vms->memmap[i] = base_memmap[i];
+}
+
+vms->high_io_base = 256 * GiB; /* Top of the legacy initial RAM region */
+base = vms->high_io_base;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+base += size;
+}
+}
+
 static void machvirt_init(MachineState *machine)
 {
 VirtMachineState *vms = VIRT_MACHINE(machine);
@@ -1368,6 +1403,8 @@ static void machvirt_init(MachineState *machine)
 bool firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
 bool aarch64 = true;
 
+virt_set_memmap(vms);
+
 /* We can probe only here because during property set
  * KVM is not available yet
  */
@@ -1843,7 +1880,6 @@ static void virt_instance_init(Object *obj)
 "Valid values are none and smmuv3",
 NULL);
 
-vms->memmap = a15memmap;
 vms->irqmap = a15irqmap;
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index a27086d524..3dc7a6c5d5 100644
--- 

[Qemu-devel] [PATCH v7 04/17] hw/boards: Add a MachineState parameter to kvm_type callback

2019-02-20 Thread Eric Auger
On ARM, the kvm_type will be resolved by querying the KVMState.
Let's add the MachineState handle to the callback so that we
can retrieve the KVMState handle. In kvm_init, when the callback
is called, the kvm_state variable is not yet set.

Signed-off-by: Eric Auger 
Acked-by: David Gibson 
[ppc parts]
Reviewed-by: Peter Maydell 

---
v6 -> v7:
- add a comment for kvm_type
- use machine instead of ms in the declaration
- add Peter's R-b
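
For readers unfamiliar with the callback, the dispatch done in kvm_init can
be sketched roughly as below. This is only an illustration: machine_state,
machine_class, demo_kvm_type and resolve_kvm_type are hypothetical stand-ins,
not QEMU types or functions, and the default type of 0 is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for MachineState: only what the demo needs. */
struct machine_state {
    int host_ipa_bits;
};

/* Hypothetical stand-in for MachineClass, with the new callback shape. */
struct machine_class {
    int (*kvm_type)(struct machine_state *machine, const char *arg);
};

/* Example callback: derives the type from machine state, ignoring arg. */
static int demo_kvm_type(struct machine_state *machine, const char *arg)
{
    (void)arg;
    return machine->host_ipa_bits;
}

/*
 * Mirrors the shape of the dispatch: call the callback if the machine
 * provides one, otherwise reject an explicit kvm-type string.
 * Returns 0 on success and -1 on an unsupported option.
 */
static int resolve_kvm_type(const struct machine_class *mc,
                            struct machine_state *ms,
                            const char *arg, int *type)
{
    if (mc->kvm_type) {
        *type = mc->kvm_type(ms, arg);
        return 0;
    }
    if (arg) {
        return -1; /* option given, but the machine cannot interpret it */
    }
    *type = 0; /* assumed default when nothing is specified */
    return 0;
}
```

The point of passing the machine handle is that the callback no longer needs
a global to reach per-machine state.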
---
 accel/kvm/kvm-all.c   | 2 +-
 hw/ppc/mac_newworld.c | 3 +--
 hw/ppc/mac_oldworld.c | 2 +-
 hw/ppc/spapr.c| 2 +-
 include/hw/boards.h   | 5 -
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index fd92b6f375..241db496c3 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1593,7 +1593,7 @@ static int kvm_init(MachineState *ms)
 
 kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type");
 if (mc->kvm_type) {
-type = mc->kvm_type(kvm_type);
+type = mc->kvm_type(ms, kvm_type);
 } else if (kvm_type) {
 ret = -EINVAL;
 fprintf(stderr, "Invalid argument kvm-type=%s\n", kvm_type);
diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 98461052ac..97e8817145 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -564,8 +564,7 @@ static char *core99_fw_dev_path(FWPathProvider *p, BusState 
*bus,
 
 return NULL;
 }
-
-static int core99_kvm_type(const char *arg)
+static int core99_kvm_type(MachineState *machine, const char *arg)
 {
 /* Always force PR KVM */
 return 2;
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 284431ddd6..cc1e463466 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -420,7 +420,7 @@ static char *heathrow_fw_dev_path(FWPathProvider *p, 
BusState *bus,
 return NULL;
 }
 
-static int heathrow_kvm_type(const char *arg)
+static int heathrow_kvm_type(MachineState *machine, const char *arg)
 {
 /* Always force PR KVM */
 return 2;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index abf9ebce59..3d0811fa81 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2920,7 +2920,7 @@ static void spapr_machine_init(MachineState *machine)
 }
 }
 
-static int spapr_kvm_type(const char *vm_type)
+static int spapr_kvm_type(MachineState *machine, const char *vm_type)
 {
 if (!vm_type) {
 return 0;
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 05f9f45c3d..ed2fec82d5 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -156,6 +156,9 @@ typedef struct {
  *should instead use "unimplemented-device" for all memory ranges where
  *the guest will attempt to probe for a device that QEMU doesn't
  *implement and a stub device is required.
+ * @kvm_type:
+ *Return the type of KVM corresponding to the kvm-type string option or
+ *computed based on other criteria such as the host kernel capabilities.
  */
 struct MachineClass {
 /*< private >*/
@@ -171,7 +174,7 @@ struct MachineClass {
 void (*init)(MachineState *state);
 void (*reset)(void);
 void (*hot_add_cpu)(const int64_t id, Error **errp);
-int (*kvm_type)(const char *arg);
+int (*kvm_type)(MachineState *machine, const char *arg);
 
 BlockInterfaceType block_default_type;
 int units_per_default_bus;
-- 
2.20.1




[Qemu-devel] [PATCH v7 15/17] hw/arm/virt: Add nvdimm hot-plug infrastructure

2019-02-20 Thread Eric Auger
From: Kwangwoo Lee 

Pre-plug and plug handlers are prepared for NVDIMM support.

Signed-off-by: Eric Auger 
Signed-off-by: Kwangwoo Lee 
---
 default-configs/arm-softmmu.mak |  2 ++
 hw/arm/virt-acpi-build.c|  6 ++
 hw/arm/virt.c   | 22 ++
 include/hw/arm/virt.h   |  3 +++
 4 files changed, 33 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 0a78421f72..03dbebb197 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -165,3 +165,5 @@ CONFIG_HIGHBANK=y
 CONFIG_MUSICPAL=y
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
+CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 781eafaf5e..f086adfa82 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -784,6 +784,7 @@ static
 void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 {
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
+MachineState *ms = MACHINE(vms);
 GArray *table_offsets;
 unsigned dsdt, xsdt;
 GArray *tables_blob = tables->table_data;
@@ -824,6 +825,11 @@ void virt_acpi_build(VirtMachineState *vms, 
AcpiBuildTables *tables)
 }
 }
 
+if (vms->acpi_nvdimm_state.is_enabled) {
+nvdimm_build_acpi(table_offsets, tables_blob, tables->linker,
+  &vms->acpi_nvdimm_state, ms->ram_slots);
+}
+
 if (its_class_name() && !vmc->no_its) {
 acpi_add_table(table_offsets, tables_blob);
 build_iort(tables_blob, tables->linker, vms);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 33ad9b3f63..1896920570 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -134,6 +134,7 @@ static const MemMapEntry base_memmap[] = {
 [VIRT_GPIO] =   { 0x0903, 0x1000 },
 [VIRT_SECURE_UART] ={ 0x0904, 0x1000 },
 [VIRT_SMMU] =   { 0x0905, 0x0002 },
+[VIRT_ACPI_IO] ={ 0x0907, 0x0001 },
 [VIRT_MMIO] =   { 0x0a00, 0x0200 },
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
@@ -1675,6 +1676,18 @@ static void machvirt_init(MachineState *machine)
 
 create_platform_bus(vms, pic);
 
+if (vms->acpi_nvdimm_state.is_enabled) {
+AcpiNVDIMMState *acpi_nvdimm_state = &vms->acpi_nvdimm_state;
+
+acpi_nvdimm_state->dsm_io.space_id = AML_AS_SYSTEM_MEMORY;
+acpi_nvdimm_state->dsm_io.address =
+vms->memmap[VIRT_ACPI_IO].base + NVDIMM_ACPI_IO_BASE;
+acpi_nvdimm_state->dsm_io.bit_width = NVDIMM_ACPI_IO_LEN << 3;
+
+nvdimm_init_acpi_state(acpi_nvdimm_state, sysmem,
+   vms->fw_cfg, OBJECT(vms));
+}
+
 vms->bootinfo.ram_size = machine->ram_size;
 vms->bootinfo.kernel_filename = machine->kernel_filename;
 vms->bootinfo.kernel_cmdline = machine->kernel_cmdline;
@@ -1860,10 +1873,19 @@ static void virt_memory_plug(HotplugHandler 
*hotplug_dev,
  DeviceState *dev, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
+bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
 Error *local_err = NULL;
 
 pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), &local_err);
+if (local_err) {
+goto out;
+}
 
+if (is_nvdimm) {
+nvdimm_plug(&vms->acpi_nvdimm_state);
+}
+
+out:
 error_propagate(errp, local_err);
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 7798462cb0..bd9cf68311 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -37,6 +37,7 @@
 #include "hw/arm/arm.h"
 #include "sysemu/kvm.h"
 #include "hw/intc/arm_gicv3_common.h"
+#include "hw/mem/nvdimm.h"
 
 #define NUM_GICV2M_SPIS   64
 #define NUM_VIRTIO_TRANSPORTS 32
@@ -77,6 +78,7 @@ enum {
 VIRT_GPIO,
 VIRT_SECURE_UART,
 VIRT_SECURE_MEM,
+VIRT_ACPI_IO,
 VIRT_LOWMEMMAP_LAST,
 };
 
@@ -137,6 +139,7 @@ typedef struct {
 hwaddr device_memory_base;
 hwaddr device_memory_size;
 bool extended_memmap;
+AcpiNVDIMMState acpi_nvdimm_state;
 } VirtMachineState;
 
 #define VIRT_ECAM_ID(high) (high ? VIRT_HIGH_PCIE_ECAM : VIRT_PCIE_ECAM)
-- 
2.20.1




[Qemu-devel] [PATCH v7 16/17] hw/arm/boot: Expose the pmem nodes in the DT

2019-02-20 Thread Eric Auger
In case of NV-DIMM slots, let's add /pmem DT nodes.

Signed-off-by: Eric Auger 

---

v6 -> v7
- does the same rework as for fdt_add_memory_node
---
 hw/arm/boot.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 255aaca0cf..66caf005e5 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -450,6 +450,32 @@ out:
 return ret;
 }
 
+static int fdt_add_pmem_node(void *fdt, uint32_t acells, hwaddr mem_base,
+ uint32_t scells, hwaddr mem_len,
+ int numa_node_id)
+{
+char *nodename;
+int ret;
+
+nodename = g_strdup_printf("/pmem@%" PRIx64, mem_base);
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_string(fdt, nodename, "compatible", "pmem-region");
+ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
+   scells, mem_len);
+if (ret < 0) {
+goto out;
+}
+
+/* only set the NUMA ID if it is specified */
+if (numa_node_id >= 0) {
+ret = qemu_fdt_setprop_cell(fdt, nodename,
+"numa-node-id", numa_node_id);
+}
+out:
+g_free(nodename);
+return ret;
+}
+
 static void fdt_add_psci_node(void *fdt)
 {
 uint32_t cpu_suspend_fn;
@@ -546,6 +572,20 @@ static int fdt_add_hotpluggable_memory_nodes(void *fdt,
 }
 break;
 }
+case MEMORY_DEVICE_INFO_KIND_NVDIMM:
+{
+PCDIMMDeviceInfo *di = mi->u.nvdimm.data;
+
+ret = fdt_add_pmem_node(fdt, acells, di->addr,
+scells, di->size, di->node);
+if (ret) {
+fprintf(stderr,
+"couldn't add NVDIMM /memory@%"PRIx64" node\n",
+di->addr);
+goto out;
+}
+break;
+}
 default:
 fprintf(stderr, "%s memory nodes are not yet supported\n",
 MemoryDeviceInfoKind_str(mi->type));
-- 
2.20.1




[Qemu-devel] [PATCH v7 01/17] hw/arm/boot: introduce fdt_add_memory_node helper

2019-02-20 Thread Eric Auger
From: Shameer Kolothum 

We introduce a helper to create a memory node.

Signed-off-by: Eric Auger 
Signed-off-by: Shameer Kolothum 

---

v6 -> v7:
- msg error in the caller
- add comment about NUMA ID
---
 hw/arm/boot.c | 54 ---
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d90af2f17d..a830655e1a 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -423,6 +423,32 @@ static void set_kernel_args_old(const struct arm_boot_info 
*info,
 }
 }
 
+static int fdt_add_memory_node(void *fdt, uint32_t acells, hwaddr mem_base,
+   uint32_t scells, hwaddr mem_len,
+   int numa_node_id)
+{
+char *nodename;
+int ret;
+
+nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+qemu_fdt_add_subnode(fdt, nodename);
+qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
+ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, mem_base,
+   scells, mem_len);
+if (ret < 0) {
+goto out;
+}
+
+/* only set the NUMA ID if it is specified */
+if (numa_node_id >= 0) {
+ret = qemu_fdt_setprop_cell(fdt, nodename,
+"numa-node-id", numa_node_id);
+}
+out:
+g_free(nodename);
+return ret;
+}
+
 static void fdt_add_psci_node(void *fdt)
 {
 uint32_t cpu_suspend_fn;
@@ -502,7 +528,6 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 void *fdt = NULL;
 int size, rc, n = 0;
 uint32_t acells, scells;
-char *nodename;
 unsigned int i;
 hwaddr mem_base, mem_len;
 char **node_path;
@@ -576,35 +601,24 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info 
*binfo,
 mem_base = binfo->loader_start;
 for (i = 0; i < nb_numa_nodes; i++) {
 mem_len = numa_info[i].node_mem;
-nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
-qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
-  acells, mem_base,
-  scells, mem_len);
+rc = fdt_add_memory_node(fdt, acells, mem_base,
+ scells, mem_len, i);
 if (rc < 0) {
-fprintf(stderr, "couldn't set %s/reg for node %d\n", nodename,
-i);
+fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n",
+mem_base);
 goto fail;
 }
 
-qemu_fdt_setprop_cell(fdt, nodename, "numa-node-id", i);
 mem_base += mem_len;
-g_free(nodename);
 }
 } else {
-nodename = g_strdup_printf("/memory@%" PRIx64, binfo->loader_start);
-qemu_fdt_add_subnode(fdt, nodename);
-qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
-
-rc = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg",
-  acells, binfo->loader_start,
-  scells, binfo->ram_size);
+rc = fdt_add_memory_node(fdt, acells, binfo->loader_start,
+ scells, binfo->ram_size, -1);
 if (rc < 0) {
-fprintf(stderr, "couldn't set %s reg\n", nodename);
+fprintf(stderr, "couldn't add /memory@%"PRIx64" node\n",
+binfo->loader_start);
 goto fail;
 }
-g_free(nodename);
 }
 
 rc = fdt_path_offset(fdt, "/chosen");
-- 
2.20.1




[Qemu-devel] [PATCH v7 09/17] hw/arm/virt: Bump the 255GB initial RAM limit

2019-02-20 Thread Eric Auger
Now that we have the extended memory map (high IO regions beyond the
scalable RAM) and dynamic IPA range support at KVM/ARM level,
we can bump the legacy 255GB initial RAM limit. The actual maximum
RAM size now depends on the physical CPU and host kernel.

Signed-off-by: Eric Auger 

---
v6 -> v7
- handle TCG case
- set_memmap modifications moved to previous patches
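
The TCG check in this patch compares the number of physical address bits
required by the highest guest physical address with what the VCPU supports.
A minimal standalone sketch of that arithmetic follows; pa_bits_needed and
memmap_fits are hypothetical helper names, and the loop is meant to give the
same result as 64 - clz64(x).

```c
#include <stdint.h>

/* Number of bits needed to represent addr (0 when addr is 0). */
static int pa_bits_needed(uint64_t addr)
{
    int bits = 0;

    while (addr) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/* Non-zero when a CPU with pamax PA bits can address highest_gpa. */
static int memmap_fits(uint64_t highest_gpa, int pamax)
{
    return pa_bits_needed(highest_gpa) <= pamax;
}
```

For instance, a highest address of 2^40 - 1 needs 40 bits, while 2^40
itself needs 41 and would be rejected on a 40-bit VCPU.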
---
 hw/arm/virt.c | 54 ---
 1 file changed, 30 insertions(+), 24 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ad3a0ad73d..5b656f9db5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -59,6 +59,7 @@
 #include "qapi/visitor.h"
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
+#include "target/arm/internals.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
 static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -93,21 +94,8 @@
 
 #define PLATFORM_BUS_NUM_IRQS 64
 
-/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
- * RAM can go up to the 256GB mark, leaving 256GB of the physical
- * address space unallocated and free for future use between 256G and 512G.
- * If we need to provide more RAM to VMs in the future then we need to:
- *  * allocate a second bank of RAM starting at 2TB and working up
- *  * fix the DT and ACPI table generation code in QEMU to correctly
- *report two split lumps of RAM to the guest
- *  * fix KVM in the host kernel to allow guests with >40 bit address spaces
- * (We don't want to fill all the way up to 512GB with RAM because
- * we might want it for non-RAM purposes later. Conversely it seems
- * reasonable to assume that anybody configuring a VM with a quarter
- * of a terabyte of RAM will be doing it on a host with more than a
- * terabyte of physical address space.)
- */
 #define RAMBASE GiB
+/* Legacy RAM limit in GB (< version 4.0) */
 #define LEGACY_RAMLIMIT_GB 255
 #define LEGACY_RAMLIMIT_BYTES (LEGACY_RAMLIMIT_GB * GiB)
 
@@ -1372,16 +1360,18 @@ static void virt_set_memmap(VirtMachineState *vms)
 hwaddr base;
 int i;
 
-if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
-error_report("mach-virt: does not support device memory: "
- "ignore maxmem and slots options");
-ms->maxram_size = ms->ram_size;
-ms->ram_slots = 0;
-}
-if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
-error_report("mach-virt: cannot model more than %dGB RAM",
- LEGACY_RAMLIMIT_GB);
-exit(1);
+if (!vms->extended_memmap) {
+if (ms->maxram_size > ms->ram_size || ms->ram_slots > 0) {
+error_report("mach-virt: does not support device memory: "
+ "ignore maxmem and slots options");
+ms->maxram_size = ms->ram_size;
+ms->ram_slots = 0;
+}
+if (ms->ram_size > (ram_addr_t)LEGACY_RAMLIMIT_BYTES) {
+error_report("mach-virt: cannot model more than %dGB RAM",
+ LEGACY_RAMLIMIT_GB);
+exit(1);
+}
 }
 
 vms->memmap = extended_memmap;
@@ -1598,6 +1588,22 @@ static void machvirt_init(MachineState *machine)
 fdt_add_timer_nodes(vms);
 fdt_add_cpu_nodes(vms);
 
+   if (!kvm_enabled()) {
+ARMCPU *cpu = ARM_CPU(first_cpu);
+bool aarch64 = object_property_get_bool(OBJECT(cpu), "aarch64", NULL);
+
+if (aarch64 && vms->highmem) {
+int requested_pa_size, pamax = arm_pamax(cpu);
+
+requested_pa_size = 64 - clz64(vms->highest_gpa);
+if (pamax < requested_pa_size) {
+error_report("VCPU supports less PA bits (%d) than requested "
+"by the memory map (%d)", pamax, 
requested_pa_size);
+exit(1);
+}
+}
+}
+
 memory_region_allocate_system_memory(ram, NULL, "mach-virt.ram",
  machine->ram_size);
 memory_region_add_subregion(sysmem, vms->memmap[VIRT_MEM].base, ram);
-- 
2.20.1




[Qemu-devel] [PATCH v7 10/17] hw/arm/virt: Add memory hotplug framework

2019-02-20 Thread Eric Auger
This patch adds the memory hot-plug/hot-unplug infrastructure
in machvirt. It is not enabled yet, as no device memory is allocated.

Signed-off-by: Eric Auger 
Signed-off-by: Shameer Kolothum 
Signed-off-by: Kwangwoo Lee 

---
v4 -> v5:
- change in pc_dimm_pre_plug signature
- CONFIG_MEM_HOTPLUG replaced by CONFIG_MEM_DEVICE and CONFIG_DIMM

v3 -> v4:
- check the memory device is not hotplugged

v2 -> v3:
- change in pc_dimm_plug()'s signature
- add pc_dimm_pre_plug call

v1 -> v2:
- s/virt_dimm_plug|unplug/virt_memory_plug|unplug
- s/pc_dimm_memory_plug/pc_dimm_plug
- reworded title and commit message
- added pre_plug cb
- don't handle get_memory_region failure anymore
---
 default-configs/arm-softmmu.mak |  2 ++
 hw/arm/virt.c   | 64 -
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 734ca721e9..0a78421f72 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -163,3 +163,5 @@ CONFIG_PCI_EXPRESS_DESIGNWARE=y
 CONFIG_STRONGARM=y
 CONFIG_HIGHBANK=y
 CONFIG_MUSICPAL=y
+CONFIG_MEM_DEVICE=y
+CONFIG_DIMM=y
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 5b656f9db5..470ca0ce2d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -60,6 +60,8 @@
 #include "standard-headers/linux/input.h"
 #include "hw/arm/smmuv3.h"
 #include "target/arm/internals.h"
+#include "hw/mem/pc-dimm.h"
+#include "hw/mem/nvdimm.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
 static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1804,6 +1806,49 @@ static const CPUArchIdList 
*virt_possible_cpu_arch_ids(MachineState *ms)
 return ms->possible_cpus;
 }
 
+static void virt_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+ Error **errp)
+{
+const bool is_nvdimm = object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM);
+
+if (dev->hotplugged) {
+error_setg(errp, "memory hotplug is not supported");
+}
+
+if (is_nvdimm) {
+error_setg(errp, "nvdimm is not yet supported");
+return;
+}
+
+pc_dimm_pre_plug(PC_DIMM(dev), MACHINE(hotplug_dev), NULL, errp);
+}
+
+static void virt_memory_plug(HotplugHandler *hotplug_dev,
+ DeviceState *dev, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(hotplug_dev);
+Error *local_err = NULL;
+
+pc_dimm_plug(PC_DIMM(dev), MACHINE(vms), &local_err);
+
+error_propagate(errp, local_err);
+}
+
+static void virt_memory_unplug(HotplugHandler *hotplug_dev,
+   DeviceState *dev, Error **errp)
+{
+pc_dimm_unplug(PC_DIMM(dev), MACHINE(hotplug_dev));
+object_unparent(OBJECT(dev));
+}
+
+static void virt_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
+DeviceState *dev, Error **errp)
+{
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+virt_memory_pre_plug(hotplug_dev, dev, errp);
+}
+}
+
 static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 DeviceState *dev, Error **errp)
 {
@@ -1815,12 +1860,27 @@ static void virt_machine_device_plug_cb(HotplugHandler 
*hotplug_dev,
  SYS_BUS_DEVICE(dev));
 }
 }
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+virt_memory_plug(hotplug_dev, dev, errp);
+}
+}
+
+static void virt_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
+  DeviceState *dev, Error **errp)
+{
+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+virt_memory_unplug(hotplug_dev, dev, errp);
+} else {
+error_setg(errp, "device unplug request for unsupported device"
+   " type: %s", object_get_typename(OBJECT(dev)));
+}
 }
 
 static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
 DeviceState *dev)
 {
-if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE)) {
+if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
+   (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
 return HOTPLUG_HANDLER(machine);
 }
 
@@ -1884,7 +1944,9 @@ static void virt_machine_class_init(ObjectClass *oc, void 
*data)
 mc->kvm_type = virt_kvm_type;
 assert(!mc->get_hotplug_handler);
 mc->get_hotplug_handler = virt_machine_get_hotplug_handler;
+hc->pre_plug = virt_machine_device_pre_plug_cb;
 hc->plug = virt_machine_device_plug_cb;
+hc->unplug = virt_machine_device_unplug_cb;
 }
 
 static void virt_instance_init(Object *obj)
-- 
2.20.1




[Qemu-devel] [PATCH v7 02/17] hw/arm/virt: Rename highmem IO regions

2019-02-20 Thread Eric Auger
In preparation for a split of the memory map into a static
part and a dynamic part floating after the RAM, let's rename the
regions located after the RAM.

Signed-off-by: Eric Auger 
Reviewed-by: Peter Maydell 

---
v7: added Peter's R-b
v6: creation
---
 hw/arm/virt-acpi-build.c |  8 
 hw/arm/virt.c| 21 +++--
 include/hw/arm/virt.h|  8 
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 04b62c714d..829d2f0035 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -229,8 +229,8 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry 
*memmap,
  size_pio));
 
 if (use_highmem) {
-hwaddr base_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].base;
-hwaddr size_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].size;
+hwaddr base_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].base;
+hwaddr size_mmio_high = memmap[VIRT_HIGH_PCIE_MMIO].size;
 
 aml_append(rbuf,
 aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
@@ -663,8 +663,8 @@ build_madt(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 gicr = acpi_data_push(table_data, sizeof(*gicr));
 gicr->type = ACPI_APIC_GENERIC_REDISTRIBUTOR;
 gicr->length = sizeof(*gicr);
-gicr->base_address = cpu_to_le64(memmap[VIRT_GIC_REDIST2].base);
-gicr->range_length = cpu_to_le32(memmap[VIRT_GIC_REDIST2].size);
+gicr->base_address = 
cpu_to_le64(memmap[VIRT_HIGH_GIC_REDIST2].base);
+gicr->range_length = 
cpu_to_le32(memmap[VIRT_HIGH_GIC_REDIST2].size);
 }
 
 if (its_class_name() && !vmc->no_its) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 99c2b6e60d..a1955e7764 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -150,10 +150,10 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_PCIE_ECAM] =  { 0x3f00, 0x0100 },
 [VIRT_MEM] ={ 0x4000, RAMLIMIT_BYTES },
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
-[VIRT_GIC_REDIST2] ={ 0x40ULL, 0x400 },
-[VIRT_PCIE_ECAM_HIGH] = { 0x401000ULL, 0x1000 },
+[VIRT_HIGH_GIC_REDIST2] =   { 0x40ULL, 0x400 },
+[VIRT_HIGH_PCIE_ECAM] = { 0x401000ULL, 0x1000 },
 /* Second PCIe window, 512GB wide at the 512GB boundary */
-[VIRT_PCIE_MMIO_HIGH] =   { 0x80ULL, 0x80ULL },
+[VIRT_HIGH_PCIE_MMIO] = { 0x80ULL, 0x80ULL },
 };
 
 static const int a15irqmap[] = {
@@ -435,8 +435,8 @@ static void fdt_add_gic_node(VirtMachineState *vms)
  2, vms->memmap[VIRT_GIC_DIST].size,
  2, vms->memmap[VIRT_GIC_REDIST].base,
  2, vms->memmap[VIRT_GIC_REDIST].size,
- 2, vms->memmap[VIRT_GIC_REDIST2].base,
- 2, 
vms->memmap[VIRT_GIC_REDIST2].size);
+ 2, 
vms->memmap[VIRT_HIGH_GIC_REDIST2].base,
+ 2, 
vms->memmap[VIRT_HIGH_GIC_REDIST2].size);
 }
 
 if (vms->virt) {
@@ -584,7 +584,7 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
 
 if (nb_redist_regions == 2) {
 uint32_t redist1_capacity =
-vms->memmap[VIRT_GIC_REDIST2].size / GICV3_REDIST_SIZE;
+vms->memmap[VIRT_HIGH_GIC_REDIST2].size / 
GICV3_REDIST_SIZE;
 
 qdev_prop_set_uint32(gicdev, "redist-region-count[1]",
 MIN(smp_cpus - redist0_count, redist1_capacity));
@@ -601,7 +601,8 @@ static void create_gic(VirtMachineState *vms, qemu_irq *pic)
 if (type == 3) {
 sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_REDIST].base);
 if (nb_redist_regions == 2) {
-sysbus_mmio_map(gicbusdev, 2, vms->memmap[VIRT_GIC_REDIST2].base);
+sysbus_mmio_map(gicbusdev, 2,
+vms->memmap[VIRT_HIGH_GIC_REDIST2].base);
 }
 } else {
 sysbus_mmio_map(gicbusdev, 1, vms->memmap[VIRT_GIC_CPU].base);
@@ -1088,8 +1089,8 @@ static void create_pcie(VirtMachineState *vms, qemu_irq 
*pic)
 {
 hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
 hwaddr size_mmio = vms->memmap[VIRT_PCIE_MMIO].size;
-hwaddr base_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].base;
-hwaddr size_mmio_high = vms->memmap[VIRT_PCIE_MMIO_HIGH].size;
+hwaddr base_mmio_high = vms->memmap[VIRT_HIGH_PCIE_MMIO].base;
+hwaddr size_mmio_high = vms->memmap[VIRT_HIGH_PCIE_MMIO].size;
 hwaddr base_pio = vms->memmap[VIRT_PCIE_PIO].base;
 hwaddr size_pio = vms->memmap[VIRT_PCIE_PIO].size;
 hwaddr base_ecam, size_ecam;
@@ -1418,7 +1419,7 @@ static void 

[Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion and PCDIMM/NVDIMM support

2019-02-20 Thread Eric Auger
This series aims to bump the 255GB RAM limit in machvirt and to
support device memory in general, and especially PCDIMM/NVDIMM.

In machvirt versions < 4.0, the initial RAM starts at 1GB and can
grow up to 255GB. From 256GB onwards we find IO regions such as the
additional GICv3 RDIST region, high PCIe ECAM region and high PCIe
MMIO region. The address map was 1TB large. This corresponded to
the max IPA capacity KVM was able to manage.

Since 4.20, the host kernel is able to support a larger and dynamic
IPA range. So the guest physical address can go beyond 1TB. The
max GPA size depends on the host kernel configuration and physical CPUs.

In this series we use this feature and allow the RAM to grow without
any other limit than the one put by the host kernel.

The RAM still starts at 1GB. First comes the initial RAM (-m) of size
ram_size, followed by the device memory (,maxmem) of size
maxram_size - ram_size. The device memory is potentially hotpluggable
depending on the instantiated memory objects.
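
As a rough illustration of the layout described above, the RAM part of the
map can be computed as follows. This is a simplified sketch, not the code
from the series: struct ram_layout and compute_ram_layout are hypothetical
names, and any alignment or padding the real implementation may apply to
the device memory base is ignored.

```c
#include <stdint.h>

#define GiB (1ULL << 30)
#define RAM_BASE (1 * GiB) /* the initial RAM still starts at 1GB */

/* Hypothetical summary of the RAM part of the guest address map. */
struct ram_layout {
    uint64_t ram_base;        /* start of the initial RAM (-m) */
    uint64_t device_mem_base; /* start of the device memory (,maxmem) */
    uint64_t device_mem_size; /* maxram_size - ram_size */
    uint64_t ram_top;         /* first address past all RAM */
};

static struct ram_layout compute_ram_layout(uint64_t ram_size,
                                            uint64_t maxram_size)
{
    struct ram_layout l;

    l.ram_base = RAM_BASE;
    l.device_mem_base = RAM_BASE + ram_size;
    l.device_mem_size = maxram_size - ram_size;
    l.ram_top = l.device_mem_base + l.device_mem_size;
    return l;
}
```

With -m 4G,maxmem=16G this gives device memory starting at 5GB, 12GB in
size, and a RAM top at 17GB.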

IO regions previously located between 256GB and 1TB are moved after
the RAM. Their offset is dynamically computed and depends on ram_size
and maxram_size. Size alignment is enforced.
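
The placement of the floating IO regions can be sketched as below. The
region sizes (64MiB, 256MiB, 512GiB) and the 256GiB floor are taken from
the memory map patch of this series; round_up and place_high_io are
hypothetical names, and the real code operates on the machine's memmap
array rather than on a plain array of bases.

```c
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define NUM_HIGH_IO 3

/* Sizes of the redistributor, ECAM and MMIO regions, in placement order. */
static const uint64_t high_io_sizes[NUM_HIGH_IO] = {
    64 * MiB, 256 * MiB, 512 * GiB
};

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Place each region at the first address at or above the current base
 * that is aligned on the region size. Returns the end of the last region.
 */
static uint64_t place_high_io(uint64_t ram_top, uint64_t bases[NUM_HIGH_IO])
{
    /* The floating area never starts below the 256GiB mark. */
    uint64_t base = ram_top > 256 * GiB ? ram_top : 256 * GiB;
    size_t i;

    for (i = 0; i < NUM_HIGH_IO; i++) {
        base = round_up(base, high_io_sizes[i]);
        bases[i] = base;
        base += high_io_sizes[i];
    }
    return base;
}
```

With a RAM top at 17GiB, the regions land at 256GiB, 256GiB + 256MiB and
512GiB, and the map ends at 1TiB.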

If the maxmem value is below 255GB, the legacy memory map
is still used. The change of memory map becomes effective from 4.0
onwards.

As we keep the initial RAM at 1GB base address, we do not need to do
invasive changes in the EDK2 FW. It seems nobody is eager to do
that job at the moment.

Device memory being put just after the initial RAM, it is possible
to get access to this feature while keeping a 1TB address map.

This series reuses/rebases patches initially submitted by Shameer
in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.

Functionally, the series is split into 3 parts:
1) bump of the initial RAM limit [1 - 9] and change in
   the memory map
2) Support of PC-DIMM [10 - 13]
3) Support of NV-DIMM [14 - 17]

1) can be upstreamed before 2 and 2 can be upstreamed before 3.

Work is ongoing to transform the whole memory into device memory.
However, this move is not trivial and, to me, is independent of
the improvements brought by this series:
- if we were to use DIMMs for initial RAM, those DIMMs would
  use slots. Although they would not be part of the ones provided
  using the ",slots" option, they are ACPI limited resources.
- DT and ACPI descriptions need to be reworked
- NUMA integration needs special care
- a special device memory object may be required to avoid consuming
  slots and to ease the FW description.

So I preferred to separate the concerns. This new implementation
based on device memory could be a candidate for another virt
version.

Best Regards

Eric

References:

[1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
http://patchwork.ozlabs.org/cover/914694/

[2] [RFC PATCH 0/3] add nvdimm support on AArch64 virt platform
https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg04599.html

This series can be found at:
https://github.com/eauger/qemu/tree/v3.1.0-dimm-v7

History:

v6 -> v7:
- Addressed Peter's and Igor's comments (exceptions sent by email)
- Fixed TCG case. Now device memory works also for TCG and vcpu
  pamax is checked
- See individual logs for more details

v5 -> v6:
- mingw compilation issue fix
- kvm_arm_get_max_vm_phys_shift always returns the number of supported
  IPA bits
- new patch "hw/arm/virt: Rename highmem IO regions" that eases the review
  of "hw/arm/virt: Split the memory map description"
- "hw/arm/virt: Move memory map initialization into machvirt_init"
  squashed into the previous patch
- change alignment of IO regions beyond the RAM so that it matches their
  size

v4 -> v5:
- change in the memory map
- see individual logs

v3 -> v4:
- rebase on David's "pc-dimm: next bunch of cleanups" and
  "pc-dimm: pre_plug "slot" and "addr" assignment"
- kvm-type option not used anymore. We directly use
  maxram_size and ram_size machine fields to compute the
  MAX IPA range. Migration is naturally handled as CLI
  option are kept between source and destination. This was
  suggested by David.
- device_memory_start and device_memory_size not stored
  anymore in vms->bootinfo
- I did not take into account 2 Igor's comments: the one
  related to the refactoring of arm_load_dtb and the one
  related to the generation of the dtb after system_reset
  which would contain nodes of hotplugged devices (we do
  not support hotplug at this stage)
- check the end-user does not attempt to hotplug a device
- addition of "vl: Set machine ram_size, maxram_size and
  ram_slots earlier"

v2 -> v3:
- fix pc_q35 and pc_piix compilation error
- kwangwoo's email being not valid anymore, remove his address

v1 -> v2:
- kvm_get_max_vm_phys_shift moved in arch specific file
- addition of NVDIMM part
- single series
- rebase on David's refactoring

v1:
- was "[RFC 0/6] KVM/ARM: Dynamic and larger GPA size"
- was "[RFC 0/5] ARM virt: Support PC-DIMM at 2TB"

Best Regards

Eric



Re: [Qemu-devel] [PATCH v2 3/3] vfio/display: delay link up event

2019-02-20 Thread Alex Williamson
On Wed, 20 Feb 2019 09:47:53 +0100
Gerd Hoffmann  wrote:

> Kick the display link up event with a 0.1 sec delay,
> so the guest has a chance to notice the link down first.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  include/hw/vfio/vfio-common.h |  1 +
>  hw/vfio/display.c | 26 +++---
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 5f7f709b95..b65a2f0518 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -151,6 +151,7 @@ typedef struct VFIODisplay {
>  struct vfio_region_info *edid_info;
>  struct vfio_region_gfx_edid *edid_regs;
>  uint8_t *edid_blob;
> +QEMUTimer *edid_link_timer;
>  struct {
>  VFIORegion buffer;
>  DisplaySurface *surface;
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> index 7b9b604a64..361823b23b 100644
> --- a/hw/vfio/display.c
> +++ b/hw/vfio/display.c
> @@ -38,6 +38,21 @@
>  goto err;
>  
>  
> +static void vfio_display_edid_link_up(void *opaque)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +VFIODisplay *dpy = vdev->dpy;
> +int fd = vdev->vbasedev.fd;
> +
> +dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_UP;
> +pwrite_field(fd, dpy->edid_info, dpy->edid_regs, link_state);
> +trace_vfio_display_edid_link_up();
> +return;
> +
> +err:
> +trace_vfio_display_edid_write_error();

No jumps to here.  Thanks,

Alex

> +}
> +
>  static void vfio_display_edid_update(VFIOPCIDevice *vdev, bool enabled,
>   int prefx, int prefy)
>  {
> @@ -50,6 +65,7 @@ static void vfio_display_edid_update(VFIOPCIDevice *vdev, 
> bool enabled,
>  .prefy = prefy ?: vdev->display_yres,
>  };
>  
> +timer_del(dpy->edid_link_timer);
>  dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_DOWN;
>  pwrite_field(fd, dpy->edid_info, dpy->edid_regs, link_state);
>  trace_vfio_display_edid_link_down();
> @@ -77,9 +93,8 @@ static void vfio_display_edid_update(VFIOPCIDevice *vdev, 
> bool enabled,
>  goto err;
>  }
>  
> -dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_UP;
> -pwrite_field(fd, dpy->edid_info, dpy->edid_regs, link_state);
> -trace_vfio_display_edid_link_up();
> +timer_mod(dpy->edid_link_timer,
> +  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 100);
>  return;
>  
>  err:
> @@ -140,6 +155,9 @@ static void vfio_display_edid_init(VFIOPCIDevice *vdev)
>  vdev->display_yres = dpy->edid_regs->max_yres;
>  }
>  
> +dpy->edid_link_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
> +vfio_display_edid_link_up, vdev);
> +
>  vfio_display_edid_update(vdev, true, 0, 0);
>  return;
>  
> @@ -158,6 +176,8 @@ static void vfio_display_edid_exit(VFIODisplay *dpy)
>  
>  g_free(dpy->edid_regs);
>  g_free(dpy->edid_blob);
> +timer_del(dpy->edid_link_timer);
> +timer_free(dpy->edid_link_timer);
>  }
>  
>  static void vfio_display_update_cursor(VFIODMABuf *dmabuf,




Re: [Qemu-devel] [PATCH v2 2/3] vfio/display: add xres + yres properties

2019-02-20 Thread Alex Williamson
On Wed, 20 Feb 2019 09:47:52 +0100
Gerd Hoffmann  wrote:

> This allows configuring the display resolution which the vgpu should use.
> The information will be passed to the guest using EDID, so the mdev
> driver must support the vfio edid region for this to work.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/vfio/pci.h |  2 ++
>  hw/vfio/display.c | 16 ++--
>  hw/vfio/pci.c |  2 ++
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index b1ae4c0754..c11c3f1670 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -149,6 +149,8 @@ typedef struct VFIOPCIDevice {
>  #define VFIO_FEATURE_ENABLE_IGD_OPREGION \
>  (1 << VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT)
>  OnOffAuto display;
> +uint32_t display_xres;
> +uint32_t display_yres;
>  int32_t bootindex;
>  uint32_t igd_gms;
>  OffAutoPCIBAR msix_relo;
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> index ed2eb19ea3..7b9b604a64 100644
> --- a/hw/vfio/display.c
> +++ b/hw/vfio/display.c
> @@ -46,8 +46,8 @@ static void vfio_display_edid_update(VFIOPCIDevice *vdev, bool enabled,
>  qemu_edid_info edid = {
>  .maxx  = dpy->edid_regs->max_xres,
>  .maxy  = dpy->edid_regs->max_yres,
> -.prefx = prefx,
> -.prefy = prefy,
> +.prefx = prefx ?: vdev->display_xres,
> +.prefy = prefy ?: vdev->display_yres,
>  };
>  
>  dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_DOWN;
> @@ -117,6 +117,10 @@ static void vfio_display_edid_init(VFIOPCIDevice *vdev)
> VFIO_REGION_SUBTYPE_GFX_EDID,
> &dpy->edid_info);
>  if (ret) {
> +if (vdev->display_xres || vdev->display_yres) {
> +warn_report("vfio: no edid support available, "
> +"xres and yres properties have no effect.");
> +}

In order to get here the device needs to have a display option set to
'on' or 'auto' and that display needs to be backed by a dmabuf graphics
plane.  That means that QEMU is happy to run without any warning if a
user sets a resolution on a region backed display, or a device without
a display.  I think that QEMU should probably fail, not just warn, for
all cases where an option is not appropriate for a device.  Perhaps
EDID setup should set a feature bit or flag that we can test similar to
how and where we test for a stray ramfb option.
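
A minimal sketch of the kind of check meant here, assuming a made-up
device struct and a made-up has_edid flag that EDID setup would set on
success (neither name exists in the patch; they are for illustration
only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not the real VFIOPCIDevice. */
struct demo_dev {
    uint32_t xres;   /* user-supplied property, 0 means unset */
    uint32_t yres;   /* user-supplied property, 0 means unset */
    bool has_edid;   /* assumed flag, set only when EDID setup succeeded */
};

/*
 * Returns NULL when the option combination is acceptable, otherwise a
 * static message the caller would turn into a hard error rather than
 * a warning.
 */
static const char *demo_check_display_opts(const struct demo_dev *dev)
{
    assert(dev != NULL);
    if ((dev->xres || dev->yres) && !dev->has_edid) {
        return "xres and yres properties require display EDID support";
    }
    return NULL;
}
```

Doing the test once, after all display setup paths have run, would cover
the region-backed and no-display cases as well as the dmabuf one.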

>  return;
>  }
>  
> @@ -128,6 +132,14 @@ static void vfio_display_edid_init(VFIOPCIDevice *vdev)
>  pread_field(fd, dpy->edid_info, dpy->edid_regs, max_yres);
>  dpy->edid_blob = g_malloc0(dpy->edid_regs->edid_max_size);
>  
> +/* if xres + yres properties are unset use the maximum resolution */
> +if (!vdev->display_xres) {
> +vdev->display_xres = dpy->edid_regs->max_xres;
> +}
> +if (!vdev->display_yres) {
> +vdev->display_yres = dpy->edid_regs->max_yres;
> +}
> +
>  vfio_display_edid_update(vdev, true, 0, 0);
>  return;
>  
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index dd12f36391..edb8394038 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3182,6 +3182,8 @@ static Property vfio_pci_dev_properties[] = {
>  DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
>  DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
>  display, ON_OFF_AUTO_OFF),
> +DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
> +DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),

This is actually quite fun, I started my VM with arbitrary numbers and
the Windows GUI honored it every time.  Probably very useful for
playing with odd screen sizes.  I also tried to break it using
100x100, but the display came up as 1920x1200, the maximum
resolution GVT-g supports for this type.  I don't see that QEMU is
bounding this though, do we depend on the mdev device to ignore it if
we pass values it cannot support?  Thanks,

Alex
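
For reference, the only bounding visible in the quoted hunks of this
series is the upper clamp in vfio_display_edid_update() from patch 1/3.
A standalone sketch of that logic, with a made-up helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a preferred resolution to the reported maximum.
 * max == 0 is treated as "no limit reported", mirroring the
 * "if (edid.maxx && edid.prefx > edid.maxx)" checks in patch 1/3.
 */
static uint32_t demo_clamp_pref(uint32_t pref, uint32_t max)
{
    if (max && pref > max) {
        return max;
    }
    return pref;
}
```

There is no lower bound here, so a small value such as 100 passes
through unchanged and whatever happens to it afterwards is up to the
consumer of the EDID.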

>  DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
> intx.mmap_timeout, 1100),
>  DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,




Re: [Qemu-devel] [PATCH v2 1/3] vfio/display: add edid support.

2019-02-20 Thread Alex Williamson
On Wed, 20 Feb 2019 09:47:51 +0100
Gerd Hoffmann  wrote:

> This patch adds EDID support to the vfio display (aka vgpu) code.
> When supported by the mdev driver qemu will generate an EDID blob
> and pass it on using the new vfio edid region.  The EDID blob will
> be updated on UI changes (i.e. window resize), so the guest can
> adapt.

What are the requirements to enable this resizing feature?  I grabbed
the gvt-next-2019-02-01 branch and my ever expanding qemu:commandline
now looks like this:

  










  

Other relevant sections:


  
  


  
  


  

  
  
  


The 1600x900 is used, which is great, but neither virt-manager nor
virt-viewer is giving me any automatic resizing as this seems to
suggest it should.

One nit and one bug below.  Thanks,

Alex

> Signed-off-by: Gerd Hoffmann 
> ---
>  include/hw/vfio/vfio-common.h |   3 +
>  hw/vfio/display.c | 127 ++
>  hw/vfio/trace-events  |   7 +++
>  3 files changed, 137 insertions(+)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 7624c9f511..5f7f709b95 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -148,6 +148,9 @@ typedef struct VFIODMABuf {
>  typedef struct VFIODisplay {
>  QemuConsole *con;
>  RAMFBState *ramfb;
> +struct vfio_region_info *edid_info;
> +struct vfio_region_gfx_edid *edid_regs;
> +uint8_t *edid_blob;
>  struct {
>  VFIORegion buffer;
>  DisplaySurface *surface;
> diff --git a/hw/vfio/display.c b/hw/vfio/display.c
> index dead30e626..ed2eb19ea3 100644
> --- a/hw/vfio/display.c
> +++ b/hw/vfio/display.c
> @@ -15,15 +15,139 @@
>  #include 
>  
>  #include "sysemu/sysemu.h"
> +#include "hw/display/edid.h"
>  #include "ui/console.h"
>  #include "qapi/error.h"
>  #include "pci.h"
> +#include "trace.h"
>  
>  #ifndef DRM_PLANE_TYPE_PRIMARY
>  # define DRM_PLANE_TYPE_PRIMARY 1
>  # define DRM_PLANE_TYPE_CURSOR  2
>  #endif
>  
> +#define pread_field(_fd, _reg, _ptr, _fld)  \
> +if (sizeof(_ptr->_fld) !=   \
> +pread(_fd, &(_ptr->_fld), sizeof(_ptr->_fld),   \
> +  _reg->offset + offsetof(typeof(*_ptr), _fld)))\
> +goto err;
> +#define pwrite_field(_fd, _reg, _ptr, _fld) \
> +if (sizeof(_ptr->_fld) !=   \
> +pwrite(_fd, &(_ptr->_fld), sizeof(_ptr->_fld),  \
> +   _reg->offset + offsetof(typeof(*_ptr), _fld)))   \
> +goto err;
> +
> +
> +static void vfio_display_edid_update(VFIOPCIDevice *vdev, bool enabled,
> + int prefx, int prefy)
> +{
> +VFIODisplay *dpy = vdev->dpy;
> +int fd = vdev->vbasedev.fd;
> +qemu_edid_info edid = {
> +.maxx  = dpy->edid_regs->max_xres,
> +.maxy  = dpy->edid_regs->max_yres,
> +.prefx = prefx,
> +.prefy = prefy,
> +};
> +
> +dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_DOWN;
> +pwrite_field(fd, dpy->edid_info, dpy->edid_regs, link_state);
> +trace_vfio_display_edid_link_down();
> +
> +if (!enabled) {
> +return;
> +}
> +
> +if (edid.maxx && edid.prefx > edid.maxx) {
> +edid.prefx = edid.maxx;
> +}
> +if (edid.maxy && edid.prefy > edid.maxy) {
> +edid.prefy = edid.maxy;
> +}
> +qemu_edid_generate(dpy->edid_blob,
> +   dpy->edid_regs->edid_max_size,
> +   &edid);
> +trace_vfio_display_edid_update(edid.prefx, edid.prefy);
> +
> +dpy->edid_regs->edid_size = qemu_edid_size(dpy->edid_blob);
> +pwrite_field(fd, dpy->edid_info, dpy->edid_regs, edid_size);
> +if (pwrite(fd, dpy->edid_blob, dpy->edid_regs->edid_size,
> +   dpy->edid_info->offset + dpy->edid_regs->edid_offset)
> +!= dpy->edid_regs->edid_size) {
> +goto err;
> +}
> +
> +dpy->edid_regs->link_state = VFIO_DEVICE_GFX_LINK_STATE_UP;
> +pwrite_field(fd, dpy->edid_info, dpy->edid_regs, link_state);
> +trace_vfio_display_edid_link_up();
> +return;
> +
> +err:
> +trace_vfio_display_edid_write_error();
> +return;

nit, no unwind and only one call point, could probably do without the
goto.

> +}
> +
> +static int vfio_display_edid_ui_info(void *opaque, uint32_t idx,
> + QemuUIInfo *info)
> +{
> +VFIOPCIDevice *vdev = opaque;
> +VFIODisplay *dpy = vdev->dpy;
> +
> +if (!dpy->edid_regs) {
> +return 0;
> +}
> +
> +if (info->width && info->height) {
> +vfio_display_edid_update(vdev, true, info->width, info->height);
> +} else {
> +vfio_display_edid_update(vdev, false, 0, 0);
> +}
> +
> +

[Qemu-devel] [PATCH v5 07/14] dsoundaudio: port to -audiodev config

2019-02-20 Thread Kővágó, Zoltán
Signed-off-by: Kővágó, Zoltán 
---
 audio/dsound_template.h |  6 ++---
 audio/audio_legacy.c| 41 
 audio/dsoundaudio.c | 59 -
 3 files changed, 61 insertions(+), 45 deletions(-)

diff --git a/audio/dsound_template.h b/audio/dsound_template.h
index b439f33f58..8ece870c9e 100644
--- a/audio/dsound_template.h
+++ b/audio/dsound_template.h
@@ -167,17 +167,18 @@ static int dsound_init_out(HWVoiceOut *hw, struct audsettings *as,
 dsound *s = drv_opaque;
 WAVEFORMATEX wfx;
 struct audsettings obt_as;
-DSoundConf *conf = &s->conf;
 #ifdef DSBTYPE_IN
 const char *typ = "ADC";
 DSoundVoiceIn *ds = (DSoundVoiceIn *) hw;
 DSCBUFFERDESC bd;
 DSCBCAPS bc;
+AudiodevPerDirectionOptions *pdo = s->dev->u.dsound.in;
 #else
 const char *typ = "DAC";
 DSoundVoiceOut *ds = (DSoundVoiceOut *) hw;
 DSBUFFERDESC bd;
 DSBCAPS bc;
+AudiodevPerDirectionOptions *pdo = s->dev->u.dsound.out;
 #endif
 
 if (!s->FIELD2) {
@@ -193,8 +194,8 @@ static int dsound_init_out(HWVoiceOut *hw, struct audsettings *as,
 memset (&bd, 0, sizeof (bd));
 bd.dwSize = sizeof (bd);
 bd.lpwfxFormat = &wfx;
+bd.dwBufferBytes = audio_buffer_bytes(pdo, as, 92880);
 #ifdef DSBTYPE_IN
-bd.dwBufferBytes = conf->bufsize_in;
 hr = IDirectSoundCapture_CreateCaptureBuffer (
 s->dsound_capture,
 &bd,
@@ -203,7 +204,6 @@ static int dsound_init_out(HWVoiceOut *hw, struct audsettings *as,
 );
 #else
 bd.dwFlags = DSBCAPS_STICKYFOCUS | DSBCAPS_GETCURRENTPOSITION2;
-bd.dwBufferBytes = conf->bufsize_out;
 hr = IDirectSound_CreateSoundBuffer (
 s->dsound,
 &bd,
diff --git a/audio/audio_legacy.c b/audio/audio_legacy.c
index 086011fbff..17105e7239 100644
--- a/audio/audio_legacy.c
+++ b/audio/audio_legacy.c
@@ -120,6 +120,30 @@ static void get_frames_to_usecs(const char *env, uint32_t *dst, bool *has_dst,
 }
 }
 
+static uint32_t samples_to_usecs(uint32_t samples,
+ AudiodevPerDirectionOptions *pdo)
+{
+uint32_t channels = pdo->has_channels ? pdo->channels : 2;
+return frames_to_usecs(samples / channels, pdo);
+}
+
+static uint32_t bytes_to_usecs(uint32_t bytes, AudiodevPerDirectionOptions *pdo)
+{
+AudioFormat fmt = pdo->has_format ? pdo->format : AUDIO_FORMAT_S16;
+uint32_t bytes_per_sample = audioformat_bytes_per_sample(fmt);
+return samples_to_usecs(bytes / bytes_per_sample, pdo);
+}
+
+static void get_bytes_to_usecs(const char *env, uint32_t *dst, bool *has_dst,
+   AudiodevPerDirectionOptions *pdo)
+{
+const char *val = getenv(env);
+if (val) {
+*dst = bytes_to_usecs(toui32(val), pdo);
+*has_dst = true;
+}
+}
+
 /* backend specific functions */
 /* ALSA */
 static void handle_alsa_per_direction(
@@ -180,6 +204,19 @@ static void handle_coreaudio(Audiodev *dev)
 &dev->u.coreaudio.out->has_buffer_count);
 }
 
+/* dsound */
+static void handle_dsound(Audiodev *dev)
+{
+get_millis_to_usecs("QEMU_DSOUND_LATENCY_MILLIS",
+&dev->u.dsound.latency, &dev->u.dsound.has_latency);
+get_bytes_to_usecs("QEMU_DSOUND_BUFSIZE_OUT",
+   &dev->u.dsound.out->buffer_len,
+   &dev->u.dsound.out->has_buffer_len, dev->u.dsound.out);
+get_bytes_to_usecs("QEMU_DSOUND_BUFSIZE_IN",
+   &dev->u.dsound.in->buffer_len,
+   &dev->u.dsound.in->has_buffer_len, dev->u.dsound.in);
+}
+
 /* general */
 static void handle_per_direction(
 AudiodevPerDirectionOptions *pdo, const char *prefix)
@@ -229,6 +266,10 @@ static AudiodevListEntry *legacy_opt(const char *drvname)
 handle_coreaudio(e->dev);
 break;
 
+case AUDIODEV_DRIVER_DSOUND:
+handle_dsound(e->dev);
+break;
+
 default:
 break;
 }
diff --git a/audio/dsoundaudio.c b/audio/dsoundaudio.c
index 02fe777cba..a7d04b5033 100644
--- a/audio/dsoundaudio.c
+++ b/audio/dsoundaudio.c
@@ -32,6 +32,7 @@
 
 #define AUDIO_CAP "dsound"
 #include "audio_int.h"
+#include "qemu/host-utils.h"
 
 #include 
 #include 
@@ -42,17 +43,11 @@
 
 /* #define DEBUG_DSOUND */
 
-typedef struct {
-int bufsize_in;
-int bufsize_out;
-int latency_millis;
-} DSoundConf;
-
 typedef struct {
 LPDIRECTSOUND dsound;
 LPDIRECTSOUNDCAPTURE dsound_capture;
 struct audsettings settings;
-DSoundConf conf;
+Audiodev *dev;
 } dsound;
 
 typedef struct {
@@ -248,9 +243,9 @@ static void GCC_FMT_ATTR (3, 4) dsound_logerr2 (
 dsound_log_hresult (hr);
 }
 
-static DWORD millis_to_bytes (struct audio_pcm_info *info, DWORD millis)
+static uint64_t usecs_to_bytes(struct audio_pcm_info *info, uint32_t usecs)
 {
-return (millis * info->bytes_per_second) / 1000;
+return muldiv64(usecs, info->bytes_per_second, 1000000);
 }
 
 #ifdef DEBUG_DSOUND
@@ -478,7 +473,7 @@ static int dsound_run_out (HWVoiceOut *hw, int live)
 

[Qemu-devel] [PATCH v5 02/14] audio: use qapi AudioFormat instead of audfmt_e

2019-02-20 Thread Kővágó, Zoltán
I had to include an enum for audio sampling formats into qapi, but that
meant duplicating the audfmt_e enum.  This patch replaces audfmt_e and
associated values with the qapi generated AudioFormat enum.

This patch is mostly a search-and-replace, except for switches where the
qapi generated AUDIO_FORMAT_MAX caused problems.
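
To illustrate the switch issue: a generated enum carries an extra
sentinel enumerator, so a switch over it presumably has one more value
to account for than the hand-written audfmt_e had. A minimal sketch
with a stand-in enum (the Demo* names are made up and are not the
generated identifiers):

```c
#include <assert.h>

/* Stand-in for a generated enum; names are illustrative only. */
typedef enum {
    DEMO_FORMAT_U8,
    DEMO_FORMAT_S8,
    DEMO_FORMAT_U16,
    DEMO_FORMAT_S16,
    DEMO_FORMAT_U32,
    DEMO_FORMAT_S32,
    DEMO_FORMAT_MAX   /* sentinel added by the generator, not a format */
} DemoFormat;

static int demo_bytes_per_sample(DemoFormat fmt)
{
    switch (fmt) {
    case DEMO_FORMAT_U8:
    case DEMO_FORMAT_S8:
        return 1;
    case DEMO_FORMAT_U16:
    case DEMO_FORMAT_S16:
        return 2;
    case DEMO_FORMAT_U32:
    case DEMO_FORMAT_S32:
        return 4;
    case DEMO_FORMAT_MAX:
        break;   /* listed explicitly so every enumerator is handled */
    }
    assert(!"invalid sample format");
    return 0;
}
```

Listing the sentinel as its own case, instead of adding a default
label, keeps the compiler able to flag any format added later.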

Signed-off-by: Kővágó, Zoltán 
Reviewed-by: Thomas Huth 
---
 audio/audio.h | 12 +
 audio/alsaaudio.c | 53 +++--
 audio/audio.c | 97 +--
 audio/audio_win_int.c | 18 
 audio/ossaudio.c  | 30 ++--
 audio/paaudio.c   | 28 +--
 audio/sdlaudio.c  | 26 +--
 audio/spiceaudio.c|  4 +-
 audio/wavaudio.c  | 17 ---
 audio/wavcapture.c|  2 +-
 hw/arm/omap2.c|  2 +-
 hw/audio/ac97.c   |  2 +-
 hw/audio/adlib.c  |  2 +-
 hw/audio/cs4231a.c|  6 +--
 hw/audio/es1370.c |  4 +-
 hw/audio/gus.c|  2 +-
 hw/audio/hda-codec.c  | 18 
 hw/audio/lm4549.c |  6 +--
 hw/audio/milkymist-ac97.c |  2 +-
 hw/audio/pcspk.c  |  2 +-
 hw/audio/sb16.c   | 14 +++---
 hw/audio/wm8750.c |  6 +--
 hw/display/xlnx_dp.c  |  2 +-
 hw/input/tsc210x.c|  2 +-
 hw/usb/dev-audio.c|  2 +-
 ui/vnc.c  | 26 +--
 26 files changed, 196 insertions(+), 189 deletions(-)

diff --git a/audio/audio.h b/audio/audio.h
index f4339a185e..02f29a3b3e 100644
--- a/audio/audio.h
+++ b/audio/audio.h
@@ -26,18 +26,10 @@
 #define QEMU_AUDIO_H
 
 #include "qemu/queue.h"
+#include "qapi/qapi-types-audio.h"
 
 typedef void (*audio_callback_fn) (void *opaque, int avail);
 
-typedef enum {
-AUD_FMT_U8,
-AUD_FMT_S8,
-AUD_FMT_U16,
-AUD_FMT_S16,
-AUD_FMT_U32,
-AUD_FMT_S32
-} audfmt_e;
-
 #ifdef HOST_WORDS_BIGENDIAN
 #define AUDIO_HOST_ENDIANNESS 1
 #else
@@ -47,7 +39,7 @@ typedef enum {
 struct audsettings {
 int freq;
 int nchannels;
-audfmt_e fmt;
+AudioFormat fmt;
 int endianness;
 };
 
diff --git a/audio/alsaaudio.c b/audio/alsaaudio.c
index 635be73bf4..5bd034267f 100644
--- a/audio/alsaaudio.c
+++ b/audio/alsaaudio.c
@@ -87,7 +87,7 @@ struct alsa_params_req {
 
 struct alsa_params_obt {
 int freq;
-audfmt_e fmt;
+AudioFormat fmt;
 int endianness;
 int nchannels;
 snd_pcm_uframes_t samples;
@@ -294,16 +294,16 @@ static int alsa_write (SWVoiceOut *sw, void *buf, int len)
 return audio_pcm_sw_write (sw, buf, len);
 }
 
-static snd_pcm_format_t aud_to_alsafmt (audfmt_e fmt, int endianness)
+static snd_pcm_format_t aud_to_alsafmt (AudioFormat fmt, int endianness)
 {
 switch (fmt) {
-case AUD_FMT_S8:
+case AUDIO_FORMAT_S8:
 return SND_PCM_FORMAT_S8;
 
-case AUD_FMT_U8:
+case AUDIO_FORMAT_U8:
 return SND_PCM_FORMAT_U8;
 
-case AUD_FMT_S16:
+case AUDIO_FORMAT_S16:
 if (endianness) {
 return SND_PCM_FORMAT_S16_BE;
 }
@@ -311,7 +311,7 @@ static snd_pcm_format_t aud_to_alsafmt (audfmt_e fmt, int endianness)
 return SND_PCM_FORMAT_S16_LE;
 }
 
-case AUD_FMT_U16:
+case AUDIO_FORMAT_U16:
 if (endianness) {
 return SND_PCM_FORMAT_U16_BE;
 }
@@ -319,7 +319,7 @@ static snd_pcm_format_t aud_to_alsafmt (audfmt_e fmt, int endianness)
 return SND_PCM_FORMAT_U16_LE;
 }
 
-case AUD_FMT_S32:
+case AUDIO_FORMAT_S32:
 if (endianness) {
 return SND_PCM_FORMAT_S32_BE;
 }
@@ -327,7 +327,7 @@ static snd_pcm_format_t aud_to_alsafmt (audfmt_e fmt, int endianness)
 return SND_PCM_FORMAT_S32_LE;
 }
 
-case AUD_FMT_U32:
+case AUDIO_FORMAT_U32:
 if (endianness) {
 return SND_PCM_FORMAT_U32_BE;
 }
@@ -344,58 +344,58 @@ static snd_pcm_format_t aud_to_alsafmt (audfmt_e fmt, int endianness)
 }
 }
 
-static int alsa_to_audfmt (snd_pcm_format_t alsafmt, audfmt_e *fmt,
+static int alsa_to_audfmt (snd_pcm_format_t alsafmt, AudioFormat *fmt,
int *endianness)
 {
 switch (alsafmt) {
 case SND_PCM_FORMAT_S8:
 *endianness = 0;
-*fmt = AUD_FMT_S8;
+*fmt = AUDIO_FORMAT_S8;
 break;
 
 case SND_PCM_FORMAT_U8:
 *endianness = 0;
-*fmt = AUD_FMT_U8;
+*fmt = AUDIO_FORMAT_U8;
 break;
 
 case SND_PCM_FORMAT_S16_LE:
 *endianness = 0;
-*fmt = AUD_FMT_S16;
+*fmt = AUDIO_FORMAT_S16;
 break;
 
 case SND_PCM_FORMAT_U16_LE:
 *endianness = 0;
-*fmt = AUD_FMT_U16;
+*fmt = AUDIO_FORMAT_U16;
 break;
 
 case SND_PCM_FORMAT_S16_BE:
 *endianness = 1;
-*fmt = AUD_FMT_S16;
+*fmt = AUDIO_FORMAT_S16;
 break;
 
 case SND_PCM_FORMAT_U16_BE:
 *endianness = 1;
-*fmt = AUD_FMT_U16;

[Qemu-devel] [PATCH v5 05/14] alsaaudio: port to -audiodev config

2019-02-20 Thread Kővágó, Zoltán
Signed-off-by: Kővágó, Zoltán 
---
 audio/alsaaudio.c| 330 +--
 audio/audio_legacy.c |  84 ++-
 2 files changed, 182 insertions(+), 232 deletions(-)

diff --git a/audio/alsaaudio.c b/audio/alsaaudio.c
index 8302f3e882..ecd0474310 100644
--- a/audio/alsaaudio.c
+++ b/audio/alsaaudio.c
@@ -33,28 +33,9 @@
 #define AUDIO_CAP "alsa"
 #include "audio_int.h"
 
-typedef struct ALSAConf {
-int size_in_usec_in;
-int size_in_usec_out;
-const char *pcm_name_in;
-const char *pcm_name_out;
-unsigned int buffer_size_in;
-unsigned int period_size_in;
-unsigned int buffer_size_out;
-unsigned int period_size_out;
-unsigned int threshold;
-
-int buffer_size_in_overridden;
-int period_size_in_overridden;
-
-int buffer_size_out_overridden;
-int period_size_out_overridden;
-} ALSAConf;
-
 struct pollhlp {
 snd_pcm_t *handle;
 struct pollfd *pfds;
-ALSAConf *conf;
 int count;
 int mask;
 };
@@ -66,6 +47,7 @@ typedef struct ALSAVoiceOut {
 void *pcm_buf;
 snd_pcm_t *handle;
 struct pollhlp pollhlp;
+Audiodev *dev;
 } ALSAVoiceOut;
 
 typedef struct ALSAVoiceIn {
@@ -73,16 +55,13 @@ typedef struct ALSAVoiceIn {
 snd_pcm_t *handle;
 void *pcm_buf;
 struct pollhlp pollhlp;
+Audiodev *dev;
 } ALSAVoiceIn;
 
 struct alsa_params_req {
 int freq;
 snd_pcm_format_t fmt;
 int nchannels;
-int size_in_usec;
-int override_mask;
-unsigned int buffer_size;
-unsigned int period_size;
 };
 
 struct alsa_params_obt {
@@ -408,17 +387,19 @@ static int alsa_to_audfmt (snd_pcm_format_t alsafmt, AudioFormat *fmt,
 
 static void alsa_dump_info (struct alsa_params_req *req,
 struct alsa_params_obt *obt,
-snd_pcm_format_t obtfmt)
+snd_pcm_format_t obtfmt,
+AudiodevAlsaPerDirectionOptions *apdo)
 {
-dolog ("parameter | requested value | obtained value\n");
-dolog ("format|  %10d | %10d\n", req->fmt, obtfmt);
-dolog ("channels  |  %10d | %10d\n",
-   req->nchannels, obt->nchannels);
-dolog ("frequency |  %10d | %10d\n", req->freq, obt->freq);
-dolog ("\n");
-dolog ("requested: buffer size %d period size %d\n",
-   req->buffer_size, req->period_size);
-dolog ("obtained: samples %ld\n", obt->samples);
+dolog("parameter | requested value | obtained value\n");
+dolog("format|  %10d | %10d\n", req->fmt, obtfmt);
+dolog("channels  |  %10d | %10d\n",
+  req->nchannels, obt->nchannels);
+dolog("frequency |  %10d | %10d\n", req->freq, obt->freq);
+dolog("\n");
+dolog("requested: buffer len %" PRId32 " period len %" PRId32 "\n",
+  apdo->has_buffer_len ? apdo->buffer_len : 0,
+  apdo->has_period_len ? apdo->period_len : 0);
+dolog("obtained: samples %ld\n", obt->samples);
 }
 
 static void alsa_set_threshold (snd_pcm_t *handle, snd_pcm_uframes_t threshold)
@@ -451,23 +432,23 @@ static void alsa_set_threshold (snd_pcm_t *handle, snd_pcm_uframes_t threshold)
 }
 }
 
-static int alsa_open (int in, struct alsa_params_req *req,
-  struct alsa_params_obt *obt, snd_pcm_t **handlep,
-  ALSAConf *conf)
+static int alsa_open(bool in, struct alsa_params_req *req,
+ struct alsa_params_obt *obt, snd_pcm_t **handlep,
+ Audiodev *dev)
 {
+AudiodevAlsaOptions *aopts = &dev->u.alsa;
+AudiodevAlsaPerDirectionOptions *apdo = in ? aopts->in : aopts->out;
 snd_pcm_t *handle;
 snd_pcm_hw_params_t *hw_params;
 int err;
-int size_in_usec;
 unsigned int freq, nchannels;
-const char *pcm_name = in ? conf->pcm_name_in : conf->pcm_name_out;
+const char *pcm_name = apdo->has_dev ? apdo->dev : "default";
 snd_pcm_uframes_t obt_buffer_size;
 const char *typ = in ? "ADC" : "DAC";
 snd_pcm_format_t obtfmt;
 
 freq = req->freq;
 nchannels = req->nchannels;
-size_in_usec = req->size_in_usec;
 
 snd_pcm_hw_params_alloca (&hw_params);
 
@@ -527,79 +508,42 @@ static int alsa_open (int in, struct alsa_params_req *req,
 goto err;
 }
 
-if (req->buffer_size) {
-unsigned long obt;
+if (apdo->buffer_len) {
+int dir = 0;
+unsigned int btime = apdo->buffer_len;
 
-if (size_in_usec) {
-int dir = 0;
-unsigned int btime = req->buffer_size;
+err = snd_pcm_hw_params_set_buffer_time_near(
+handle, hw_params, &btime, &dir);
 
-err = snd_pcm_hw_params_set_buffer_time_near (
-handle,
-hw_params,
-&btime,
-&dir
-);
-obt = btime;
-}
-else {
-

[Qemu-devel] [PATCH v5 01/14] qapi: qapi for audio backends

2019-02-20 Thread Kővágó, Zoltán
This patch adds structures into qapi to replace the existing
configuration structures used by audio backends currently. This qapi
will be the base of the -audiodev command line parameter (that replaces
the old environment variables based config).

This is not a 1:1 translation of the old options; I've tried to make
them much more consistent (e.g. almost every backend had an option to
specify buffer size, but the name was different for every backend, and
some backends required usecs, while some other required frames, samples
or bytes). Also tried to reduce the number of abbreviations used by the
config keys.

Some of the more important changes:
* use `in` and `out` instead of `ADC` and `DAC`, as the former is more
  user friendly imho
* moved buffer settings into the global setting area (so it's the same
  for all backends that support it. Backends that can't change buffer
  size will simply ignore them). Also using usecs, as it's probably more
  user friendly than samples or bytes.
* try-poll is now an alsa backend specific option (as all other backends
  currently ignore it)
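
As a rough illustration of the unit problem, here is a sketch of how a
legacy buffer size given in bytes could be mapped onto the common
microseconds unit. The struct and function names are made up for this
example, and the round-to-nearest behaviour is an assumption rather
than something the schema specifies:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative per-direction settings; not the generated QAPI type. */
struct demo_pdo {
    uint32_t frequency;        /* Hz, e.g. 44100 */
    uint32_t channels;         /* e.g. 2 */
    uint32_t bytes_per_sample; /* 2 for s16 */
};

/* frames -> microseconds, rounded to nearest (rounding is an assumption) */
static uint32_t demo_frames_to_usecs(uint32_t frames, const struct demo_pdo *p)
{
    assert(p->frequency != 0);
    return (uint32_t)(((uint64_t)frames * 1000000 + p->frequency / 2)
                      / p->frequency);
}

/* samples -> microseconds: one frame holds one sample per channel */
static uint32_t demo_samples_to_usecs(uint32_t samples, const struct demo_pdo *p)
{
    assert(p->channels != 0);
    return demo_frames_to_usecs(samples / p->channels, p);
}

/* bytes -> microseconds: go through samples, then frames */
static uint32_t demo_bytes_to_usecs(uint32_t bytes, const struct demo_pdo *p)
{
    assert(p->bytes_per_sample != 0);
    return demo_samples_to_usecs(bytes / p->bytes_per_sample, p);
}
```

With 44100 Hz, 2 channels and 2 bytes per sample, a 16384 byte buffer
is 4096 frames, which this sketch converts to 92880 microseconds.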

Signed-off-by: Kővágó, Zoltán 
---

Notes:
Changes from v4:

* documentation fixes
* renamed pa's source/sink to pa-in/pa-out
* per-direction options changed per Markus Armbruster's comments

Changes from v2:

* update copyright, version numbers
* remove #optional
* per-direction options are now optional (needed for qobject_object_visitor_new_str)
* removed unnecessary AudiodevNoOptions
* changed integers to unsigned

 qapi/audio.json   | 304 ++
 qapi/qapi-schema.json |   1 +
 qapi/Makefile.objs|   6 +-
 3 files changed, 308 insertions(+), 3 deletions(-)
 create mode 100644 qapi/audio.json

diff --git a/qapi/audio.json b/qapi/audio.json
new file mode 100644
index 0000000000..2f203462c7
--- /dev/null
+++ b/qapi/audio.json
@@ -0,0 +1,304 @@
+# -*- mode: python -*-
+#
+# Copyright (C) 2015-2019 Zoltán Kővágó 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING file in the top-level directory.
+
+##
+# @AudiodevPerDirectionOptions:
+#
+# General audio backend options that are used for both playback and
+# recording.
+#
+# @fixed-settings: use fixed settings for host input/output. When off,
+#  frequency, channels and format must not be
+#  specified (default true)
+#
+# @frequency: frequency to use when using fixed settings
+# (default 44100)
+#
+# @channels: number of channels when using fixed settings (default 2)
+#
+# @voices: number of voices to use (default 1)
+#
+# @format: sample format to use when using fixed settings
+#  (default s16)
+#
+# @buffer-len: the buffer size in microseconds
+#
+# Since: 4.0
+##
+{ 'struct': 'AudiodevPerDirectionOptions',
+  'data': {
+'*fixed-settings': 'bool',
+'*frequency':  'uint32',
+'*channels':   'uint32',
+'*voices': 'uint32',
+'*format': 'AudioFormat',
+'*buffer-len': 'uint32' } }
+
+##
+# @AudiodevGenericOptions:
+#
+# Generic driver-specific options.
+#
+# @in: options of the capture stream
+#
+# @out: options of the playback stream
+#
+# Since: 4.0
+##
+{ 'struct': 'AudiodevGenericOptions',
+  'data': {
+'*in':  'AudiodevPerDirectionOptions',
+'*out': 'AudiodevPerDirectionOptions' } }
+
+##
+# @AudiodevAlsaPerDirectionOptions:
+#
+# Options of the alsa backend that are used for both playback and
+# recording.
+#
+# @dev: the name of the alsa device to use (default 'default')
+#
+# @period-len: the period length in microseconds
+#
+# @try-poll: attempt to use poll mode, falling back to non-polling
+#access on failure (default true)
+#
+# Since: 4.0
+##
+{ 'struct': 'AudiodevAlsaPerDirectionOptions',
+  'base': 'AudiodevPerDirectionOptions',
+  'data': {
+'*dev':'str',
+'*period-len': 'uint32',
+'*try-poll':   'bool' } }
+
+##
+# @AudiodevAlsaOptions:
+#
+# Options of the alsa audio backend.
+#
+# @in: options of the capture stream
+#
+# @out: options of the playback stream
+#
+# @threshold: set the threshold (in microseconds) when playback starts
+#
+# Since: 4.0
+##
+{ 'struct': 'AudiodevAlsaOptions',
+  'data': {
+'*in':'AudiodevAlsaPerDirectionOptions',
+'*out':   'AudiodevAlsaPerDirectionOptions',
+'*threshold': 'uint32' } }
+
+##
+# @AudiodevCoreaudioPerDirectionOptions:
+#
+# Options of the coreaudio backend that are used for both playback and
+# recording.
+#
+# @buffer-count: number of buffers
+#
+# Since: 4.0
+##
+{ 'struct': 'AudiodevCoreaudioPerDirectionOptions',
+  'base': 'AudiodevPerDirectionOptions',
+  'data': {
+'*buffer-count': 'uint32' } }
+
+##
+# @AudiodevCoreaudioOptions:
+#
+# Options of the coreaudio audio backend.
+#
+# @in: options of the capture stream
+#
+# @out: options of the playback stream
+#
+# Since: 4.0
+##
+{ 'struct': 

[Qemu-devel] [PATCH v5 09/14] ossaudio: port to -audiodev config

2019-02-20 Thread Kővágó, Zoltán
Signed-off-by: Kővágó, Zoltán 
---
 audio/audio_legacy.c |  32 +
 audio/ossaudio.c | 161 ++-
 2 files changed, 83 insertions(+), 110 deletions(-)

diff --git a/audio/audio_legacy.c b/audio/audio_legacy.c
index 17105e7239..bfba41fefe 100644
--- a/audio/audio_legacy.c
+++ b/audio/audio_legacy.c
@@ -217,6 +217,34 @@ static void handle_dsound(Audiodev *dev)
 &dev->u.dsound.in->has_buffer_len, dev->u.dsound.in);
 }
 
+/* OSS */
+static void handle_oss_per_direction(
+AudiodevOssPerDirectionOptions *opdo, const char *try_poll_env,
+const char *dev_env)
+{
+get_bool(try_poll_env, &opdo->try_poll, &opdo->has_try_poll);
+get_str(dev_env, &opdo->dev, &opdo->has_dev);
+
+get_bytes_to_usecs("QEMU_OSS_FRAGSIZE",
+   &opdo->buffer_len, &opdo->has_buffer_len,
+   qapi_AudiodevOssPerDirectionOptions_base(opdo));
+get_int("QEMU_OSS_NFRAGS", &opdo->buffer_count,
+&opdo->has_buffer_count);
+}
+
+static void handle_oss(Audiodev *dev)
+{
+AudiodevOssOptions *oopt = &dev->u.oss;
+handle_oss_per_direction(oopt->in, "QEMU_AUDIO_ADC_TRY_POLL",
+ "QEMU_OSS_ADC_DEV");
+handle_oss_per_direction(oopt->out, "QEMU_AUDIO_DAC_TRY_POLL",
+ "QEMU_OSS_DAC_DEV");
+
+get_bool("QEMU_OSS_MMAP", &oopt->try_mmap, &oopt->has_try_mmap);
+get_bool("QEMU_OSS_EXCLUSIVE", &oopt->exclusive, &oopt->has_exclusive);
+get_int("QEMU_OSS_POLICY", &oopt->dsp_policy, &oopt->has_dsp_policy);
+}
+
 /* general */
 static void handle_per_direction(
 AudiodevPerDirectionOptions *pdo, const char *prefix)
@@ -270,6 +298,10 @@ static AudiodevListEntry *legacy_opt(const char *drvname)
 handle_dsound(e->dev);
 break;
 
+case AUDIODEV_DRIVER_OSS:
+handle_oss(e->dev);
+break;
+
 default:
 break;
 }
diff --git a/audio/ossaudio.c b/audio/ossaudio.c
index e0cadbef29..fc28981a39 100644
--- a/audio/ossaudio.c
+++ b/audio/ossaudio.c
@@ -37,16 +37,6 @@
 #define USE_DSP_POLICY
 #endif
 
-typedef struct OSSConf {
-int try_mmap;
-int nfrags;
-int fragsize;
-const char *devpath_out;
-const char *devpath_in;
-int exclusive;
-int policy;
-} OSSConf;
-
 typedef struct OSSVoiceOut {
 HWVoiceOut hw;
 void *pcm_buf;
@@ -56,7 +46,7 @@ typedef struct OSSVoiceOut {
 int fragsize;
 int mmapped;
 int pending;
-OSSConf *conf;
+Audiodev *dev;
 } OSSVoiceOut;
 
 typedef struct OSSVoiceIn {
@@ -65,12 +55,12 @@ typedef struct OSSVoiceIn {
 int fd;
 int nfrags;
 int fragsize;
-OSSConf *conf;
+Audiodev *dev;
 } OSSVoiceIn;
 
 struct oss_params {
 int freq;
-AudioFormat fmt;
+int fmt;
 int nchannels;
 int nfrags;
 int fragsize;
@@ -262,19 +252,25 @@ static int oss_get_version (int fd, int *version, const char *typ)
 }
 #endif
 
-static int oss_open (int in, struct oss_params *req,
- struct oss_params *obt, int *pfd, OSSConf* conf)
+static int oss_open(int in, struct oss_params *req, audsettings *as,
+struct oss_params *obt, int *pfd, Audiodev *dev)
 {
+AudiodevOssOptions *oopts = &dev->u.oss;
+AudiodevOssPerDirectionOptions *opdo = in ? oopts->in : oopts->out;
 int fd;
-int oflags = conf->exclusive ? O_EXCL : 0;
+int oflags = (oopts->has_exclusive && oopts->exclusive) ? O_EXCL : 0;
 audio_buf_info abinfo;
 int fmt, freq, nchannels;
 int setfragment = 1;
-const char *dspname = in ? conf->devpath_in : conf->devpath_out;
+const char *dspname = opdo->has_dev ? opdo->dev : "/dev/dsp";
 const char *typ = in ? "ADC" : "DAC";
+#ifdef USE_DSP_POLICY
+int policy = oopts->has_dsp_policy ? oopts->dsp_policy : 5;
+#endif
 
 /* Kludge needed to have working mmap on Linux */
-oflags |= conf->try_mmap ? O_RDWR : (in ? O_RDONLY : O_WRONLY);
+oflags |= (oopts->has_try_mmap && oopts->try_mmap) ?
+O_RDWR : (in ? O_RDONLY : O_WRONLY);
 
 fd = open (dspname, oflags | O_NONBLOCK);
 if (-1 == fd) {
@@ -285,6 +281,9 @@ static int oss_open (int in, struct oss_params *req,
 freq = req->freq;
 nchannels = req->nchannels;
 fmt = req->fmt;
+req->nfrags = opdo->has_buffer_count ? opdo->buffer_count : 4;
+req->fragsize = audio_buffer_bytes(
+qapi_AudiodevOssPerDirectionOptions_base(opdo), as, 23220);
 
 if (ioctl (fd, SNDCTL_DSP_SAMPLESIZE, &fmt)) {
 oss_logerr2 (errno, typ, "Failed to set sample size %d\n", req->fmt);
@@ -308,18 +307,18 @@ static int oss_open (int in, struct oss_params *req,
 }
 
 #ifdef USE_DSP_POLICY
-if (conf->policy >= 0) {
+if (policy >= 0) {
 int version;
 
 if (!oss_get_version (fd, &version, typ)) {
 trace_oss_version(version);
 
 if (version >= 0x04) {
-int policy = conf->policy;
-if (ioctl (fd, SNDCTL_DSP_POLICY, &policy)) {
+int policy2 = policy;
+if (ioctl(fd, 

[Qemu-devel] [PATCH v5 03/14] audio: -audiodev command line option: documentation

2019-02-20 Thread Kővágó, Zoltán
This patch adds documentation of an -audiodev command line option, that
deprecates the old QEMU_* environment variables for audio backend
configuration.  Its syntax is similar to existing options (-netdev,
-device, etc):

  -audiodev driver_name,property=value,...

Although now it's possible to specify multiple -audiodev options on
command line, multiple audio backends are not supported yet.

Signed-off-by: Kővágó, Zoltán 
---

Notes:
Changes from v4:

* deprecated QEMU_AUDIO_ env vars
* updated to reflect qapi changes
* added info to qemu-deprecated.texi

 qemu-deprecated.texi |   7 ++
 qemu-options.hx  | 236 ++-
 2 files changed, 240 insertions(+), 3 deletions(-)

diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index 45c57952da..5c07ad4acb 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -60,6 +60,13 @@ Support for invalid topologies will be removed, the user must ensure
 topologies described with -smp include all possible cpus, i.e.
   @math{@var{sockets} * @var{cores} * @var{threads} = @var{maxcpus}}.
 
+@subsection QEMU_AUDIO_ environment variables and -audio-help (since 4.0)
+
+The ``-audiodev'' argument is now the preferred way to specify audio
+backend settings instead of environment variables.  To ease migration to
+the new format, the ``-audio-help'' option can be used to convert
+the current values of the environment variables to ``-audiodev'' options.
+
 @section QEMU Machine Protocol (QMP) commands
 
 @subsection block-dirty-bitmap-add "autoload" parameter (since 2.12.0)
diff --git a/qemu-options.hx b/qemu-options.hx
index 77bd98e20b..f77f4d89a7 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -416,14 +416,244 @@ The default is @code{en-us}.
 ETEXI
 
 
+HXCOMM Deprecated by -audiodev
 DEF("audio-help", 0, QEMU_OPTION_audio_help,
-"-audio-help print list of audio drivers and their options\n",
+"-audio-help show -audiodev equivalent of the currently specified audio settings\n",
 QEMU_ARCH_ALL)
 STEXI
 @item -audio-help
 @findex -audio-help
-Will show the audio subsystem help: list of drivers, tunable
-parameters.
+Will show the -audiodev equivalent of the currently specified
+(deprecated) environment variables.
+ETEXI
+
+DEF("audiodev", HAS_ARG, QEMU_OPTION_audiodev,
+"-audiodev [driver=]driver,id=id[,prop[=value][,...]]\n"
+"specifies the audio backend to use\n"
+"id= identifier of the backend\n"
+"timer-period= timer period in microseconds\n"
+"in|out.fixed-settings= use fixed settings for host audio\n"
+"in|out.frequency= frequency to use with fixed settings\n"
+"in|out.channels= number of channels to use with fixed settings\n"
+"in|out.format= sample format to use with fixed settings\n"
+"valid values: s8, s16, s32, u8, u16, u32\n"
+"in|out.voices= number of voices to use\n"
+"in|out.buffer-len= length of buffer in microseconds\n"
+"-audiodev none,id=id[,prop[=value][,...]]\n"
+"dummy driver that discards all output\n"
+#ifdef CONFIG_ALSA
+"-audiodev alsa,id=id[,prop[=value][,...]]\n"
+"in|out.dev= name of the audio device to use\n"
+"in|out.period-len= length of period in microseconds\n"
+"in|out.try-poll= attempt to use poll mode\n"
+"threshold= threshold (in microseconds) when playback starts\n"
+#endif
+#ifdef CONFIG_COREAUDIO
+"-audiodev coreaudio,id=id[,prop[=value][,...]]\n"
+"in|out.buffer-count= number of buffers\n"
+#endif
+#ifdef CONFIG_DSOUND
+"-audiodev dsound,id=id[,prop[=value][,...]]\n"
+"latency= add extra latency to playback in microseconds\n"
+#endif
+#ifdef CONFIG_OSS
+"-audiodev oss,id=id[,prop[=value][,...]]\n"
+"in|out.dev= path of the audio device to use\n"
+"in|out.buffer-count= number of buffers\n"
+"in|out.try-poll= attempt to use poll mode\n"
+"try-mmap= try using memory mapped access\n"
+"exclusive= open device in exclusive mode\n"
+"dsp-policy= set timing policy (0..10), -1 to use fragment mode\n"
+#endif
+#ifdef CONFIG_PA
+"-audiodev pa,id=id[,prop[=value][,...]]\n"
+"server= PulseAudio server address\n"
+"in|out.name= source/sink device name\n"
+#endif
+#ifdef CONFIG_SDL
+"-audiodev sdl,id=id[,prop[=value][,...]]\n"
+#endif
+#ifdef CONFIG_SPICE
+"-audiodev spice,id=id[,prop[=value][,...]]\n"
+#endif
+"-audiodev wav,id=id[,prop[=value][,...]]\n"
+"path= path of wav file to record\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -audiodev 
