Re: [Qemu-devel] [RFC 1/3] pci_expander_bridge: reserve enough mcfg space for pxb host

2018-05-21 Thread Zihan Yang
Hi Marcel,

Thanks a lot for your feedback.

> I don't think we should try to place the MMCFGs before 4G even if there
> is enough room. Is better to place them always after 4G.
>
> The "above_4g_mem" PCI hole is reserved for PCI device hotplug. We cannot
> use it for MMCFGs. What I think we can do is to "move" the 64-bit PCI hole
> after the MMCFGs. So the layout of the over-4G space will be:
>
> [RAM hotplug][MMCFGs][PCI hotplug]...

OK, I will reorganize the memory layout. Should the number of MMCFGs be
limited, since there might be insufficient memory above 4G?
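To make the [RAM hotplug][MMCFGs][PCI hotplug] layout concrete, here is a minimal sketch of how the base addresses could be computed. All names (round_up, mmcfg_base, pci_hole64_start) and the 256 MiB-per-MMCFG-window assumption are illustrative; this is not the actual QEMU code.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Round 'addr' up to a multiple of 'align' (align must be a power of 2). */
static uint64_t round_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Place MMCFG regions between the RAM-hotplug area and the 64-bit PCI
 * hole: [RAM hotplug][MMCFGs][PCI hotplug].  Assume each MMCFG window
 * is 256 MiB (a full 256-bus segment).
 */
static uint64_t mmcfg_base(uint64_t ram_hotplug_end, int mmcfg_index)
{
    return round_up(ram_hotplug_end, 256 * MiB) + mmcfg_index * 256 * MiB;
}

/* The 64-bit PCI hole then starts right after the last MMCFG window. */
static uint64_t pci_hole64_start(uint64_t ram_hotplug_end, int num_mmcfg)
{
    return mmcfg_base(ram_hotplug_end, num_mmcfg);
}
```

With this layout, adding another pxb host only shifts pci_hole64_start up by one window, which is why a single "extra-mmcfg"-style size property would be enough for the ACPI code to know the whole range.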

> Do you need the number of existing expander hosts? We have a
> pxbdev_list, just query it.

Great, I think I missed that list.

> The above will need to change. We move the pci hole, not resize it.
> I am not sure this is the right place to handle it; maybe we add a new
> property right beside the pci_hole ones (extra-mmcfg?) and default it to 0.

That sounds good: then we just need to check this range when setting the
mcfg table instead of traversing the host bridge list.

> You cannot use the MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT as it is in use
> by the "main" root complex MMCFG. Actually I don't think we can come up
> with a valid default.

I see, I'll replace it with unmapped then.

> Be aware this is used by both pxb and pxb-pcie devices. I think you
> should split the type for each one and leave the pxb's one as before.

OK, I had thought it would make the code simpler, since TYPE_PCIE_HOST_BRIDGE
is also a child of TYPE_PCI_HOST_BRIDGE, but I forgot about the pxb
devices. I'll split it in v2.

Thanks
Zihan


Re: [Qemu-devel] [PATCH v2 3/4] sdcard: Implement the UHS-I SWITCH_FUNCTION entries (Spec v3)

2018-05-21 Thread Philippe Mathieu-Daudé
On 05/14/2018 12:38 PM, Peter Maydell wrote:
> On 9 May 2018 at 07:01, Philippe Mathieu-Daudé  wrote:
>> [based on a patch from Alistair Francis 
>>  from qemu/xilinx tag xilinx-v2015.2]
>> Signed-off-by: Edgar E. Iglesias 
>> [PMD: rebased, changed magic by definitions, use stw_be_p, add tracing,
>>  check all functions in group are correct before setting the values]
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
> 
>> +/* Bits 376-399: function (4 bits each group)
>> + *
>> + * Do not write the values back directly:
>> + * Check everything first writing to 'tmpbuf'
>> + */
>> +data_p = tmpbuf;
> 
> You don't need a tmpbuf here, because it doesn't matter if we
> write something to the data array that it turns out we don't want
> to write; we can always rewrite it later...
> 
>> +for (fn_grp = SD_FG_MAX; fn_grp >= SD_FG_MIN; fn_grp--) {
>> +new_func = (arg >> ((fn_grp - 1) * 4)) & 0x0f;
>> +if (new_func == SD_FN_NO_INFLUENCE) {
>> +/* Guest requested no influence, so this function group
>> + * stays the same.
>> + */
>> +new_func = sd->function_group[fn_grp - 1];
>> +} else {
>> +const sd_fn_support *def =
>> +&sd_fn_support_defs[fn_grp][new_func];
>> +if (mode) {
>> +if (!def->name) {
>> +qemu_log_mask(LOG_GUEST_ERROR,
>> +  "Function %d not valid for "
>> +  "function group %d\n",
>> +  new_func, fn_grp);
>> +invalid_function_selected = true;
>> +new_func = SD_FN_NO_INFLUENCE;
>> +} else if (def->unimp) {
>> +qemu_log_mask(LOG_UNIMP,
>> +  "Function %s (fn grp %d) not implemented\n",
>> +  def->name, fn_grp);
>> +invalid_function_selected = true;
>> +new_func = SD_FN_NO_INFLUENCE;
>> +} else if (def->uhs_only && !sd->uhs_activated) {
>> +qemu_log_mask(LOG_GUEST_ERROR,
>> +  "Function %s (fn grp %d) only "
>> +  "valid in UHS mode\n",
>> +  def->name, fn_grp);
>> +invalid_function_selected = true;
>> +new_func = SD_FN_NO_INFLUENCE;
>> +} else {
>> +sd->function_group[fn_grp - 1] = new_func;
> 
> ...but don't want to update the function_group[n] to the new value until
> we've checked that all the selected values in the command are OK,
> so you either need a temporary for the new function values, or
> you need to do one pass over the inputs to check and another one to set.
> 
>> +}
>> +}
>> +trace_sdcard_function_select(def->name, sd_fn_grp_name[fn_grp],
>> + mode);
>> +}
>> +if (!(fn_grp & 0x1)) { /* evens go in high nibble */
>> +*data_p = new_func << 4;
>> +} else { /* odds go in low nibble */
>> +*(data_p++) |= new_func;
>> +}
>> +}
>> +if (invalid_function_selected) {
>> +/* Ignore all the set values */
>> +memset(&sd->data[14], 0, SD_FN_BUFSZ);
> 
> All-zeroes doesn't seem to match the spec. The spec says "The response
> to an invalid selection of function will be 0xF", which is a bit unclear,
> but has to mean at least that we return 0xf for the function groups which
> were invalid selections. I'm not sure what we should return as status
> for the function groups which weren't invalid; possibilities include:
>  * 0xf
>  * whatever the provided argument for that function group was
>  * whatever the current status for that function group is
If the selection is 0xF (no influence), i.e. a query, the response is
"whatever the current status for that function group is",
per group.

> 
> I don't suppose you're in a position to find out what an actual
> hardware SD card does?

I tested with a SanDisk 'Ultra' card.

Tests output posted on this thread:
http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg04840.html

I'll now rework sd_function_switch() before respinning.
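The check-then-commit approach Peter suggests could be sketched as a standalone model like the following. The per-group function tables are stubbed out (fn_valid is a placeholder), so this only illustrates the two-pass structure, not the real sd.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SD_FG_MIN 1
#define SD_FG_MAX 6
#define SD_FN_NO_INFLUENCE 0xf

/* Stub: the real device checks a table of supported functions per group. */
static bool fn_valid(int group, int fn)
{
    (void)group;
    return fn == 0;   /* in this toy model only function 0 is valid */
}

/*
 * Pass 1: validate every requested function.  Pass 2: commit only if
 * all of them were acceptable (an invalid selection of one function
 * means ALL set values are ignored).  Returns true on success.
 */
static bool switch_function(uint8_t *function_group, uint32_t arg)
{
    int grp;

    for (grp = SD_FG_MIN; grp <= SD_FG_MAX; grp++) {
        int fn = (arg >> ((grp - 1) * 4)) & 0xf;
        if (fn != SD_FN_NO_INFLUENCE && !fn_valid(grp, fn)) {
            return false;               /* ignore *all* set values */
        }
    }
    for (grp = SD_FG_MIN; grp <= SD_FG_MAX; grp++) {
        int fn = (arg >> ((grp - 1) * 4)) & 0xf;
        if (fn != SD_FN_NO_INFLUENCE) {
            function_group[grp - 1] = fn;
        }
    }
    return true;
}
```

The alternative mentioned in the review, a temporary array for the new values, trades the second decode pass for a small buffer; either way no state is written until everything has been validated.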

Regards,

Phil.



Re: [Qemu-devel] [PATCH v2 0/2] error-report: introduce {error|warn}_report_once

2018-05-21 Thread Peter Xu
On Tue, May 22, 2018 at 11:56:27AM +0800, Peter Xu wrote:
> v2:
> - for patch 1: replace tabs, add trivial comment [Markus]
>   (I didn't add much comment otherwise I'll need to duplicate what's
>there in error_report())
> - add patch 2
> 
> Patch 1 introduces the helpers.
> 
> Patch 2 uses it to replace the VT-d trace_vtd_err().
> 
> Please review.  Thanks.

Sorry I forgot to CC Eric in the series.  Adding in.

-- 
Peter Xu



Re: [Qemu-devel] [PATCH 5/8] sdcard: Implement the UHS-I SWITCH_FUNCTION entries (Spec v3)

2018-05-21 Thread Philippe Mathieu-Daudé
Hi Peter,

On 03/12/2018 10:16 AM, Peter Maydell wrote:
> On 12 March 2018 at 13:03, Philippe Mathieu-Daudé  wrote:
>> On 03/09/2018 06:03 PM, Peter Maydell wrote:
>>> I think the spec says that if the guest makes an invalid selection
>>> for one function in the group then we must ignore all the set values,
>>
>> ... for the current group? ...
>>
>>> not just the one that was wrong, so we need to check everything
>>> first before we start writing the new values back.
>>
>> I'm following the "Physical Layer Simplified Specification Version 3.01".
>>
>>   4.3.10.3 Mode 1 Operation - Set Function
>>
>>   Switching to a new functionality is done by:
>>   • When a function cannot be switched because it is busy,
>> the card returns the current function number (not returns 0xF),
>> the other functions in the other groups may still be switched.
>>
>>   In response to a set function, the switch function will return ...
>>   • The function that is the result of the switch command. In case
>> of invalid selection of one function or more, all set values
>> are ignored and no change will be done (identical to the case
>> where the host selects 0xF for all functions groups). The
>> response to an invalid selection of function will be 0xF.
>>
>> I'm not sure how to interpret this paragraph, I understand it as:
>> "all set values are ignored [in the current group]" but this is
>> confusing because of the "identical to ... all functions groups".
> 
> The command only lets you specify one value function in each
> group, so "all set values" must mean "the set values for every
> group", I think, and the parenthesised text confirms that --
> it should act as if the command specified 0xf for everything.
> It's slightly less clear what exactly the response should be:
> should it return 0xf for the groups where there was an invalid
> selection, and  for the groups
> where the selection request was ok, or just 0xf for everything ?
> (This is probably most easily answered by testing the behaviour
> of a real sd card I guess...)

Sorry to keep you waiting so long; it took me days to get a full setup and
run the tests :/

After testing, the behavior is as you said:
"it returns 0xf for the groups where there was an invalid selection, and
 for the groups where the selection
request was ok"

>>> do_cmd(6, 0x00fffff0)
"00648001800180018001c0018001"
0064 // Maximum current consumption: 64mA
8001 // Function group 6, information. If a bit i is set, function i is
supported. 0=default
8001 // 5: default
8001 // 4: default
8001 // 3: default
c001 // 2: 0 = default + 0xE = Vendor specific
8001 // 1: default
0 //6 The function which can be switched in function group 6. 0xF shows
function set error with the argument.
0 //5
0 //4
0 //3
0 //2
0 //1
00 // Data Structure Version: 00h – bits 511:376 are defined

// undef

>>> do_cmd(6, 0x00ffffff)
// same as do_cmd(6, 0x00fffff0)

Let's try to set Current Limit: 400mA (function name 1 to group No 4,
arg slice [15:12]):

We need to use the CMD6 argument:
(gdb) p/x 0x00fffff0 & ~(0xf << 12) | (1 << 12)
0xff1ff0

>>> do_cmd(6, 0x00ff1ff0)
"8001800180018001c001800100f0"
0000 // 0mA
8001 //6
8001 //5
8001 //4
8001 //3
c001 //2
8001 //1
0 //6
0 //5
f //function group 4 "0xF shows function set error with the argument."
0 //3
0 //2
0 //1
00 // v0


Now, let's try to set Command system: Advanced Security SD (function
name 4 to group No 2, arg slice [7:4]):

(gdb) p/x 0x00fffff0 & ~(0xf << 4) | (2 << 4)
0xffff20

>>> do_cmd(6, 0x00ffff20)
"8001800180018001c0018001f000"
0000 // 0mA
8001 //6
8001 //5
8001 //4
8001 //3
c001 //2
8001 //1
0 //6
0 //5
0 //4
0 //3
f //function group 2 "0xF shows function set error with the argument."
0 //1
00 // v0


Finally those 2 incorrect functions at once:

(gdb) p/x 0x00fffff0 & ~(0xf << 12 | 0xf << 4) | (1 << 12) | (2 << 4)
0xff1f20

>>> do_cmd(6, 0xff1f20)
"8001800180018001c001800100f0f000"
...
0 //6
0 //5
f //function group 4: error with the argument
0 //3
f //function group 2: error with the argument
0 //1
00 // v0
...
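The gdb expressions above each replace one 4-bit group field in the CMD6 argument; as a reusable helper this could be written as follows (a hypothetical helper for illustration, not part of the test harness used in this thread):

```c
#include <assert.h>
#include <stdint.h>

/*
 * CMD6 argument layout: one 4-bit field per function group, group 1 in
 * bits [3:0] up to group 6 in bits [23:20]; 0xf means "no influence".
 * Replace the field for 'group' (1..6) with function number 'fn'.
 */
static uint32_t cmd6_set_function(uint32_t arg, int group, int fn)
{
    int shift = (group - 1) * 4;

    return (arg & ~(0xfu << shift)) | ((uint32_t)fn << shift);
}
```

Starting from the all-no-influence argument, selecting function 1 in group 4 and function 2 in group 2 reproduces the arguments used in the tests above.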

Regards,

Phil.



[Qemu-devel] [PATCH v2 2/2] intel-iommu: start to use error_report_once

2018-05-21 Thread Peter Xu
Replace the existing trace_vtd_err() with error_report_once(), so stderr
will capture something if any of these errors happens, while we don't
suffer from any DDOS.  Then remove the trace point.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 34 +-
 hw/i386/trace-events  |  1 -
 2 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..cf655fb9f6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -285,14 +285,14 @@ static void vtd_generate_fault_event(IntelIOMMUState *s, 
uint32_t pre_fsts)
 {
 if (pre_fsts & VTD_FSTS_PPF || pre_fsts & VTD_FSTS_PFO ||
 pre_fsts & VTD_FSTS_IQE) {
-trace_vtd_err("There are previous interrupt conditions "
+error_report_once("There are previous interrupt conditions "
   "to be serviced by software, fault event "
   "is not generated.");
 return;
 }
 vtd_set_clear_mask_long(s, DMAR_FECTL_REG, 0, VTD_FECTL_IP);
 if (vtd_get_long_raw(s, DMAR_FECTL_REG) & VTD_FECTL_IM) {
-trace_vtd_err("Interrupt Mask set, irq is not generated.");
+error_report_once("Interrupt Mask set, irq is not generated.");
 } else {
 vtd_generate_interrupt(s, DMAR_FEADDR_REG, DMAR_FEDATA_REG);
 vtd_set_clear_mask_long(s, DMAR_FECTL_REG, VTD_FECTL_IP, 0);
@@ -400,19 +400,19 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, 
uint16_t source_id,
 trace_vtd_dmar_fault(source_id, fault, addr, is_write);
 
 if (fsts_reg & VTD_FSTS_PFO) {
-trace_vtd_err("New fault is not recorded due to "
+error_report_once("New fault is not recorded due to "
   "Primary Fault Overflow.");
 return;
 }
 
 if (vtd_try_collapse_fault(s, source_id)) {
-trace_vtd_err("New fault is not recorded due to "
+error_report_once("New fault is not recorded due to "
   "compression of faults.");
 return;
 }
 
 if (vtd_is_frcd_set(s, s->next_frcd_reg)) {
-trace_vtd_err("Next Fault Recording Reg is used, "
+error_report_once("Next Fault Recording Reg is used, "
   "new fault is not recorded, set PFO field.");
 vtd_set_clear_mask_long(s, DMAR_FSTS_REG, 0, VTD_FSTS_PFO);
 return;
@@ -421,7 +421,7 @@ static void vtd_report_dmar_fault(IntelIOMMUState *s, 
uint16_t source_id,
 vtd_record_frcd(s, s->next_frcd_reg, source_id, addr, fault, is_write);
 
 if (fsts_reg & VTD_FSTS_PPF) {
-trace_vtd_err("There are pending faults already, "
+error_report_once("There are pending faults already, "
   "fault event is not generated.");
 vtd_set_frcd_and_update_ppf(s, s->next_frcd_reg);
 s->next_frcd_reg++;
@@ -1339,7 +1339,7 @@ static uint64_t 
vtd_context_cache_invalidate(IntelIOMMUState *s, uint64_t val)
 break;
 
 default:
-trace_vtd_err("Context cache invalidate type error.");
+error_report_once("Context cache invalidate type error.");
 caig = 0;
 }
 return caig;
@@ -1445,7 +1445,7 @@ static uint64_t vtd_iotlb_flush(IntelIOMMUState *s, 
uint64_t val)
 am = VTD_IVA_AM(addr);
 addr = VTD_IVA_ADDR(addr);
 if (am > VTD_MAMV) {
-trace_vtd_err("IOTLB PSI flush: address mask overflow.");
+error_report_once("IOTLB PSI flush: address mask overflow.");
 iaig = 0;
 break;
 }
@@ -1454,7 +1454,7 @@ static uint64_t vtd_iotlb_flush(IntelIOMMUState *s, 
uint64_t val)
 break;
 
 default:
-trace_vtd_err("IOTLB flush: invalid granularity.");
+error_report_once("IOTLB flush: invalid granularity.");
 iaig = 0;
 }
 return iaig;
@@ -1604,7 +1604,7 @@ static void vtd_handle_ccmd_write(IntelIOMMUState *s)
 /* Context-cache invalidation request */
 if (val & VTD_CCMD_ICC) {
 if (s->qi_enabled) {
-trace_vtd_err("Queued Invalidation enabled, "
+error_report_once("Queued Invalidation enabled, "
   "should not use register-based invalidation");
 return;
 }
@@ -1625,7 +1625,7 @@ static void vtd_handle_iotlb_write(IntelIOMMUState *s)
 /* IOTLB invalidation request */
 if (val & VTD_TLB_IVT) {
 if (s->qi_enabled) {
-trace_vtd_err("Queued Invalidation enabled, "
+error_report_once("Queued Invalidation enabled, "
   "should not use register-based invalidation.");
 return;
 }
@@ -1644,7 +1644,7 @@ static bool vtd_get_inv_desc(dma_addr_t base_addr, 
uint32_t offset,
 dma_addr_t addr = base_addr + offset * sizeof(*inv_desc);
 if (dma_memory_read(&address_space_memory, addr, inv_desc,
 sizeof(*inv_desc))) {
-trace_vtd_err("Read INV DESC failed.");
+

[Qemu-devel] [PATCH v2 0/2] error-report: introduce {error|warn}_report_once

2018-05-21 Thread Peter Xu
v2:
- for patch 1: replace tabs, add trivial comment [Markus]
  (I didn't add much comment otherwise I'll need to duplicate what's
   there in error_report())
- add patch 2

Patch 1 introduces the helpers.

Patch 2 uses it to replace the VT-d trace_vtd_err().

Please review.  Thanks.

Peter Xu (2):
  qemu-error: introduce {error|warn}_report_once
  intel-iommu: start to use error_report_once

 include/qemu/error-report.h | 26 ++
 hw/i386/intel_iommu.c   | 34 +-
 hw/i386/trace-events|  1 -
 3 files changed, 43 insertions(+), 18 deletions(-)

-- 
2.17.0




[Qemu-devel] [PATCH v2 1/2] qemu-error: introduce {error|warn}_report_once

2018-05-21 Thread Peter Xu
I stole the printk_once() macro.

I always wanted to be able to print some error directly if there is a
buffer to dump; however, we can't use error_report() on code paths that
can be triggered by a DDOS attack.  To avoid that, we can introduce a
print-once-like function for it.  Meanwhile, we also introduce the
corresponding helper for warn_report().

CC: Markus Armbruster 
Signed-off-by: Peter Xu 
---
 include/qemu/error-report.h | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
index e1c8ae1a52..3e6e84801f 100644
--- a/include/qemu/error-report.h
+++ b/include/qemu/error-report.h
@@ -44,6 +44,32 @@ void error_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 void warn_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 void info_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 
+/* Similar to error_report(), but it only prints the message once. */
+#define error_report_once(fmt, ...) \
+({  \
+static bool __print_once;   \
+bool __ret_print_once = !__print_once;  \
+\
+if (!__print_once) {\
+__print_once = true;\
+error_report(fmt, ##__VA_ARGS__);   \
+}   \
+unlikely(__ret_print_once); \
+})
+
+/* Similar to warn_report(), but it only prints the message once. */
+#define warn_report_once(fmt, ...)  \
+({  \
+static bool __print_once;   \
+bool __ret_print_once = !__print_once;  \
+\
+if (!__print_once) {\
+__print_once = true;\
+warn_report(fmt, ##__VA_ARGS__);   \
+}   \
+unlikely(__ret_print_once); \
+})
+
 const char *error_get_progname(void);
 extern bool enable_timestamp_msg;
 
-- 
2.17.0
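For reference, the print-once pattern can be reproduced outside QEMU with plain stderr. The sketch below is a standalone analogue of the macro above (it relies on the same GCC/clang statement-expression and ##__VA_ARGS__ extensions that QEMU itself uses; report_once and call_it are invented names, not QEMU API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Print the message at most once per call site; the expression
 * evaluates to true only on the call that actually printed.  The
 * static bool lives inside the statement expression, so each
 * expansion gets its own "printed" flag.
 */
#define report_once(fmt, ...)                           \
    ({                                                  \
        static bool printed_;                           \
        bool first_ = !printed_;                        \
                                                        \
        if (first_) {                                   \
            printed_ = true;                            \
            fprintf(stderr, fmt "\n", ##__VA_ARGS__);   \
        }                                               \
        first_;                                         \
    })

/* Demo wrapper: a single call site triggered repeatedly. */
static bool call_it(void)
{
    return report_once("triggered once");
}
```

Note the return value mirrors error_report_once(): a caller can do extra work (e.g. dump a buffer) only on the first occurrence.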




Re: [Qemu-devel] [PATCH v3] monitor: let cur_mon be per-thread

2018-05-21 Thread Peter Xu
On Thu, Apr 12, 2018 at 02:11:08PM +0800, Peter Xu wrote:
> In the future the monitor iothread may be accessing the cur_mon as
> well (via monitor_qmp_dispatch_one()).  Before we introduce a real
> Out-Of-Band command, let's convert the cur_mon variable to be a
> per-thread variable to make sure there won't be a race between threads.
> 
> Note that thread variables are not initialized to a valid value when a new
> thread is created.  However, for our case we don't need to set it up,
> since the cur_mon variable is only used in such a pattern:
> 
>   old_mon = cur_mon;
>   cur_mon = xxx;
>   (do something, read cur_mon if necessary in the stack)
>   cur_mon = old_mon;
> 
> It plays the role of a stack variable, so it does not need to be
> initialized at all.  We only need to make sure the variable won't be
> changed unexpectedly by other threads.
> 
> Signed-off-by: Peter Xu 
> ---
> v3:
> - fix code style warning from patchew
> v2:
> - drop qemu-thread changes

Ping?

-- 
Peter Xu
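The save/restore pattern from the commit message, with cur_mon as a per-thread variable, can be sketched as a simplified standalone model using GCC's __thread (with_monitor, capture and the opaque Monitor type here are illustrative, not QEMU's actual declarations):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Monitor Monitor;

/* One cur_mon per thread: no cross-thread races on save/restore. */
static __thread Monitor *cur_mon;

/* Records what cur_mon was when the callback ran. */
static Monitor *seen;

static void capture(void)
{
    seen = cur_mon;
}

/* Run 'fn' with 'mon' temporarily installed as the current monitor. */
static void with_monitor(Monitor *mon, void (*fn)(void))
{
    Monitor *old_mon = cur_mon;

    cur_mon = mon;
    fn();                 /* callees may read cur_mon down the stack... */
    cur_mon = old_mon;    /* ...and it behaves like a stack variable */
}
```

Because the variable is thread-local, the save/restore in one thread cannot be observed (or corrupted) from another, which is exactly why no explicit initialization is needed.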



Re: [Qemu-devel] [PATCH v6 0/4] qemu-thread: support --enable-debug-mutex

2018-05-21 Thread Peter Xu
On Wed, Apr 25, 2018 at 10:54:55AM +0800, Peter Xu wrote:
> v6:
> - rename __QEMU_THREAD_COMMON_H__ to QEMU_THREAD_COMMON_H
> - collect r-bs for Emilio

Ping.

It's fine if we don't really want this, but just in case it falls
through the cracks...

Regards,

-- 
Peter Xu



Re: [Qemu-devel] [RFC] monitor: turn on Out-Of-Band by default again

2018-05-21 Thread Peter Xu
On Mon, May 21, 2018 at 09:13:06AM -0500, Eric Blake wrote:
> On 05/21/2018 03:42 AM, Peter Xu wrote:
> > We turned Out-Of-Band feature of monitors off for 2.12 release.  Now we
> > try to turn that on again.
> 
> "try to turn" sounds weak, like you aren't sure of this patch.  If you
> aren't sure, then why should we feel safe in applying it?  This text is
> going in the permanent git history, so sound bold, rather than hesitant!

Yeah, I don't feel strongly about turning it on by default; that's why I
marked the patch as RFC.  I wanted to hear your opinions.  For now, IMHO,
even with x-oob postcopy can start to work with network recovery, so the
requirement from my side is done.  However, I'm thinking maybe we should
still turn it on for everyone.  One reason is that we already have the QMP
capability negotiation, so it seems redundant (as you mentioned before);
meanwhile, exposing it to broader users lets them leverage this new bit
directly (and makes it easier to expose potential issues with OOB too).

Meanwhile, I'm also not confident that there aren't other test cases that
have not yet been run with Out-Of-Band, so even if we solve all existing
problems I can't be sure that no further test will break.  However, I
don't see that as a problem for merging, since AFAIU I can't really know
what will break again (if anything) unless we apply this to master
again... :)

> 
> "We have resolved the issues from last time (commit 3fd2457d reverted by
> commit a4f90923):
> - issue 1 ...
> - issue 2 ...
> So now we are ready to enable advertisement of the feature by default"
> 
> with better descriptions of the issues that you fixed (I can think of at
> least the fixes adding thread-safety to the current monitor, and fixing
> early use of the monitor before qmp_capabilities completes; there may also
> be other issues that you want to call out).

Some of the monitor patches are not really related to the previous OOB
breakage; the only one that really matters should be the ARM+Libvirt
one, which I will definitely mention in my next post.  The rest
(including per-thread cur_mon, monitor thread-safety, etc.) are mostly
for future new Out-Of-Band commands, not for now.  For example, current
OOB commands are rare, and they don't use the get_fd()/set_fd()
interface, so the mon_fdsets don't need to be protected at all.  But we
can't guarantee that new OOB commands won't use them too, so we still
need to protect them with locks.

> 
> > 
> > Signed-off-by: Peter Xu 
> > --
> > Now OOB should be okay with all known tests (except iotest qcow2, since
> > it is still broken on master),
> 
> Which tests are still failing for you?  Ideally, you can still demonstrate
> that the tests not failing without this patch continue to pass with this
> patch, even if you call out the tests that have known issues to still be
> resolved.

I don't remember.  We can first settle on whether we'd like to turn on
this default value; then I can run the tests for my next post to make
sure the good tests won't break.

> 
> > and AFAIK now we should also be okay with
> > ARM+Libvirt (not tested, but Eric Auger helped to verify that before
> > the release).  So I think it's now safe to turn OOB on again.  Please
> > feel free to test this against any of existing testsuites to see whether
> > it'll still break any stuff.  Thanks,
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >   monitor.c| 13 +++--
> >   tests/qmp-test.c |  2 +-
> >   vl.c |  9 -
> >   3 files changed, 8 insertions(+), 16 deletions(-)
> > 
> > diff --git a/monitor.c b/monitor.c
> > index 46814af533..ce5cc5e34e 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -4560,16 +4560,9 @@ void monitor_init(Chardev *chr, int flags)
> >   bool use_readline = flags & MONITOR_USE_READLINE;
> >   bool use_oob = flags & MONITOR_USE_OOB;
> > -if (use_oob) {
> > -if (CHARDEV_IS_MUX(chr)) {
> > -error_report("Monitor Out-Of-Band is not supported with "
> > - "MUX typed chardev backend");
> > -exit(1);
> > -}
> > -if (use_readline) {
> > -error_report("Monitor Out-Of-band is only supported by QMP");
> > -exit(1);
> > -}
> > +if (CHARDEV_IS_MUX(chr)) {
> > +/* MUX is still not supported for Out-Of-Band */
> > +use_oob = false;
> 
> This isn't a mere reinstatement of 3fd2457d, but is now advertising OOB when
> using readline (which presumably is a synonym for using HMP).  Is that
> intentional?  If so, the commit message should mention it.

At [1] below I directly moved the chunk into the "mode=control" path, so
the QMP check is already there.  Here I turn OOB off explicitly for MUX,
no matter HMP/QMP.  It should have the same effect as 3fd2457d.

> 
> >   }
> >   monitor_data_init(mon, false, use_oob);
> > diff --git a/tests/qmp-test.c b/tests/qmp-test.c
> > index 

Re: [Qemu-devel] [PATCH 14/27] iommu: Add IOMMU index concept to IOMMU API

2018-05-21 Thread Peter Xu
On Mon, May 21, 2018 at 03:03:49PM +0100, Peter Maydell wrote:
> If an IOMMU supports mappings that care about the memory
> transaction attributes, then it no longer has a unique
> address -> output mapping, but more than one. We can
> represent these using an IOMMU index, analogous to TCG's
> mmu indexes.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/exec/memory.h | 52 +++
>  memory.c  | 23 +++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 309fdfb89b..f6226fb263 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -206,6 +206,20 @@ enum IOMMUMemoryRegionAttr {
>   * to report whenever mappings are changed, by calling
>   * memory_region_notify_iommu() (or, if necessary, by calling
>   * memory_region_notify_one() for each registered notifier).
> + *
> + * Conceptually an IOMMU provides a mapping from input address
> + * to an output TLB entry. If the IOMMU is aware of memory transaction
> + * attributes and the output TLB entry depends on the transaction
> + * attributes, we represent this using IOMMU indexes. Each index

Hi, Peter,

In what case will an IOMMU translation depend on transaction
attributes?  It seems to me that we should always pass the transaction
attributes into the translate() function.  The translate() function can
omit that parameter if the specific IOMMU does not need the information,
but I am still confused about why we need to index the IOMMU by
transaction attributes.

> + * selects a particular translation table that the IOMMU has:
> + *   @attrs_to_index returns the IOMMU index for a set of transaction 
> attributes
> + *   @translate takes an input address and an IOMMU index
> + * and the mapping returned can only depend on the input address and the
> + * IOMMU index.
> + *
> + * Most IOMMUs don't care about the transaction attributes and support
> + * only a single IOMMU index. A more complex IOMMU might have one index
> + * for secure transactions and one for non-secure transactions.
>   */
>  typedef struct IOMMUMemoryRegionClass {
>  /* private */
> @@ -290,6 +304,26 @@ typedef struct IOMMUMemoryRegionClass {
>   */
>  int (*get_attr)(IOMMUMemoryRegion *iommu, enum IOMMUMemoryRegionAttr 
> attr,
>  void *data);
> +
> +/* Return the IOMMU index to use for a given set of transaction 
> attributes.
> + *
> + * Optional method: if an IOMMU only supports a single IOMMU index then
> + * the default implementation of memory_region_iommu_attrs_to_index()
> + * will return 0.
> + *
> + * The indexes supported by an IOMMU must be contiguous, starting at 0.
> + *
> + * @iommu: the IOMMUMemoryRegion
> + * @attrs: memory transaction attributes
> + */
> +int (*attrs_to_index)(IOMMUMemoryRegion *iommu, MemTxAttrs attrs);
> +
> +/* Return the number of IOMMU indexes this IOMMU supports.
> + *
> + * Optional method: if this method is not provided, then
> + * memory_region_iommu_num_indexes() will return 1, indicating that
> + * only a single IOMMU index is supported.
> + */

The num_indexes() definition is missing; I saw it in the next patch.
We'll probably want to move it here.

Regards,

-- 
Peter Xu
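A toy model of the index concept being discussed might look like this (a hypothetical IOMMU that distinguishes only secure vs. non-secure transactions; the names mimic, but are not, QEMU's real API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool secure;   /* the only attribute this toy IOMMU cares about */
} ToyTxAttrs;

enum { IOMMU_IDX_NS = 0, IOMMU_IDX_S = 1, IOMMU_NUM_IDX = 2 };

/* Map transaction attributes to an IOMMU index (contiguous from 0). */
static int toy_attrs_to_index(ToyTxAttrs attrs)
{
    return attrs.secure ? IOMMU_IDX_S : IOMMU_IDX_NS;
}

/*
 * One translation table per index: the result depends only on
 * (iommu_idx, addr), which is the invariant the API asks for.
 */
static uint64_t toy_translate(int iommu_idx, uint64_t addr)
{
    static const uint64_t base[IOMMU_NUM_IDX] = {
        [IOMMU_IDX_NS] = 0x80000000ULL,
        [IOMMU_IDX_S]  = 0xc0000000ULL,
    };

    return base[iommu_idx] + addr;
}
```

This also answers the question above: the attributes themselves still reach translate(), but only via the index, so notifiers and cached mappings can be kept per index instead of per arbitrary attribute combination.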



[Qemu-devel] [PATCH v10 2/5] i386: Populate AMD Processor Cache Information for cpuid 0x8000001D

2018-05-21 Thread Babu Moger
Add information for the CPUID 0x8000001D leaf. Populate cache topology
information for the different cache types (Data Cache, Instruction Cache,
L2 and L3) supported by the 0x8000001D leaf. Please refer to the Processor
Programming Reference (PPR) for AMD Family 17h Models for more details.

Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 103 ++
 target/i386/kvm.c |  29 +--
 2 files changed, 129 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d9773b6..1dd060a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -336,6 +336,85 @@ static void encode_cache_cpuid80000006(CPUCacheInfo *l2,
 }
 }
 
+/* Definitions used for building CPUID Leaf 0x8000001D and 0x8000001E */
+/* Please refer AMD64 Architecture Programmer’s Manual Volume 3 */
+#define MAX_CCX 2
+#define MAX_CORES_IN_CCX 4
+#define MAX_NODES_EPYC 4
+#define MAX_CORES_IN_NODE 8
+
+/* Number of logical processors sharing L3 cache */
+#define NUM_SHARING_CACHE(threads, num_sharing)   ((threads > 1) ? \
+ (((num_sharing - 1) * threads) + 1)  : \
+ (num_sharing - 1))
+/*
+ * L3 Cache is shared between all the cores in a core complex.
+ * Maximum cores that can share L3 is 4.
+ */
+static int num_sharing_l3_cache(int nr_cores)
+{
+int i, nodes = 1;
+
+/* Check if we can fit all the cores in one CCX */
+if (nr_cores <= MAX_CORES_IN_CCX) {
+return nr_cores;
+}
+/*
+ * Figure out the number of nodes(or dies) required to build
+ * this config. Max cores in a node is 8
+ */
+for (i = nodes; i <= MAX_NODES_EPYC; i++) {
+if (nr_cores <= (i * MAX_CORES_IN_NODE)) {
+nodes = i;
+break;
+}
+/* We support nodes 1, 2, 4 */
+if (i == 3) {
+continue;
+}
+}
+/* Spread the cores across all the CCXs and return the max cores in a CCX */
+return (nr_cores / (nodes * MAX_CCX)) +
+((nr_cores % (nodes * MAX_CCX)) ? 1 : 0);
+}
+
+/* Encode cache info for CPUID[8000001D] */
+static void encode_cache_cpuid8000001d(CPUCacheInfo *cache, CPUState *cs,
+uint32_t *eax, uint32_t *ebx,
+uint32_t *ecx, uint32_t *edx)
+{
+uint32_t num_share_l3;
+assert(cache->size == cache->line_size * cache->associativity *
+  cache->partitions * cache->sets);
+
+*eax = CACHE_TYPE(cache->type) | CACHE_LEVEL(cache->level) |
+   (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0);
+
+/* L3 is shared among multiple cores */
+if (cache->level == 3) {
+num_share_l3 = num_sharing_l3_cache(cs->nr_cores);
+*eax |= (NUM_SHARING_CACHE(cs->nr_threads, num_share_l3) << 14);
+} else {
+*eax |= ((cs->nr_threads - 1) << 14);
+}
+
+assert(cache->line_size > 0);
+assert(cache->partitions > 0);
+assert(cache->associativity > 0);
+/* We don't implement fully-associative caches */
+assert(cache->associativity < cache->sets);
+*ebx = (cache->line_size - 1) |
+   ((cache->partitions - 1) << 12) |
+   ((cache->associativity - 1) << 22);
+
+assert(cache->sets > 0);
+*ecx = cache->sets - 1;
+
+*edx = (cache->no_invd_sharing ? CACHE_NO_INVD_SHARING : 0) |
+   (cache->inclusive ? CACHE_INCLUSIVE : 0) |
+   (cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
+}
+
 /*
  * Definitions of the hardcoded cache entries we expose:
  * These are legacy cache values. If there is a need to change any
@@ -4005,6 +4084,30 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *edx = 0;
 }
 break;
+case 0x8000001D:
+*eax = 0;
+switch (count) {
+case 0: /* L1 dcache info */
+encode_cache_cpuid8000001d(env->cache_info_amd.l1d_cache, cs,
+   eax, ebx, ecx, edx);
+break;
+case 1: /* L1 icache info */
+encode_cache_cpuid8000001d(env->cache_info_amd.l1i_cache, cs,
+   eax, ebx, ecx, edx);
+break;
+case 2: /* L2 cache info */
+encode_cache_cpuid8000001d(env->cache_info_amd.l2_cache, cs,
+   eax, ebx, ecx, edx);
+break;
+case 3: /* L3 cache info */
+encode_cache_cpuid8000001d(env->cache_info_amd.l3_cache, cs,
+   eax, ebx, ecx, edx);
+break;
+default: /* end of info */
+*eax = *ebx = *ecx = *edx = 0;
+break;
+}
+break;
case 0xC0000000:
 *eax = env->cpuid_xlevel2;
 *ebx = 0;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index da4..a8bf7eb 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -979,9 +979,32 @@ int 

[Qemu-devel] [PATCH v10 0/5] i386: Enable TOPOEXT to support hyperthreading on AMD CPU

2018-05-21 Thread Babu Moger
This series enables the TOPOEXT feature for AMD CPUs. This is required to
support hyperthreading on kvm guests.

This addresses the issues reported in these bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1481253
https://bugs.launchpad.net/qemu/+bug/1703506 

v10:
 Based the patches on Eduardo's git://github.com/ehabkost/qemu.git x86-next.
 Some of the earlier patches are already queued, so I am submitting the rest
 of the series here.  This series adds a complete redesign of the CPU
 topology.  Based on the user-given parameters, we try to build a topology
 very close to the hardware, maintaining symmetry as much as possible.
 Added a new function epyc_build_topology to build the topology based on the
 user-given nr_cores and nr_threads.
 Summary of changes:
 1. Build the topology dynamically based on nr_cores and nr_threads
 2. Added new epyc_build_topology to build the new topology.
 3. Added new function num_sharing_l3_cache to calculate the L3 sharing
 4. Added a check to verify the topology.  Disabled TOPOEXT if the
    topology cannot be built.

v9:
 Based the patches on Eduardo's git://github.com/ehabkost/qemu.git x86-next
 tree. Following 3 patches from v8 are already queued.
  i386: Add cache information in X86CPUDefinition
  i386: Initialize cache information for EPYC family processors
  i386: Helpers to encode cache information consistently
 So, submitting the rest of the series here.

 Changes:
 1. Included Eduardo's clean up patch
 2. Added 2.13 machine types
 3. Disabled topoext for 2.12 and below versions.
 4. Added the assert to core_id as discussed.

v8:
 Addressed feedback from Eduardo. Thanks Eduardo for being patient with me.
 Tested on AMD EPYC server and also did some basic testing on intel box.
 Summary of changes.
 1. Reverted back l2 cache associativity. Kept it same as legacy.
 2. Changed cache_info structure in X86CPUDefinition and CPUX86State to 
pointers.
 3. Added legacy_cache property in PC_COMPAT_2_12 and initialized legacy_cache
based on static cache_info availability.
 4. Squashed patch 4 and 5 and applied it before patch 3.
 5. Added legacy cache check for cpuid[2] and cpuid[4] for consistency.
 6. Simplified the NUM_SHARING_CACHE definition for readability.
 7. Removed the assert for core_id as it appeared redundant.
 8. Simplified encode_cache_cpuid8000001d a little bit.
 9. A few more minor changes

v7:
 Rebased on top of the latest tree after the 2.12 release and did a few
 basic tests. There are no changes except for a few minor hunks. Hopefully
 this gets pulled into the 2.13 release. Please review and let me know of
 any feedback.

v6:
1.Fixed a problem with patch#4(Add new property to control cache info). The 
parameter
 legacy_cache should be "on" by default on machine type "pc-q35-2.10". This was
 found by Alexandr Iarygin.
2.Fixed the l3 cache size for EPYC based machines(patch#3). Also, fixed the 
number of
 logical processors sharing the cache(patch#6). Only the L3 cache is shared
 by multiple cores, not L1 or L2. This was a decoding bug, found by
 Geoffrey McRae, who verified the fix.

v5:
 In this series I tried to address the feedback from Eduardo Habkost.
 The discussion thread is here.
 https://patchwork.kernel.org/patch/10299745/
 The previous thread is here.
 http://patchwork.ozlabs.org/cover/884885/

Reason for these changes.
 The cache properties for AMD family of processors have changed from
 previous releases. We don't want to display the new information on the
 old family of processors as this might cause compatibility issues.

Changes:
1.Based the patches on top of Eduardo's(patch#1) patch.
  Changed a few things.
  Moved the Cache definitions to cpu.h file.
  Changed the CPUID_4 names to generic names.
2.Added a new property "legacy-cache" in cpu object(patch#2). This can be
  used to display the old property even if the host supports the new cache
  properties.
3.Added cache information in X86CPUDefinition and CPUX86State
4.Patch 6-7 changed quite a bit from the previous version due to the new approach.
5.Addressed a few issues with CPUID_8000_001D and CPUID_8000_001E.

v4:
1.Removed the checks under the cpuid 0x8000001D leaf(patch #2). These checks
  are not necessary. Found this during internal review.
2.Added the CPUID_EXT3_TOPOEXT feature for all of family 17h(patch #4). This was
  found by Kash Pande during his testing.
3.Removed the hardcoded cpuid xlevel and dynamically extended it if
  CPUID_EXT3_TOPOEXT is supported(Suggested by Brijesh Singh).

v3:
1.Removed the patch #1. Radim mentioned that the original typo problem is in
  the linux kernel headers; qemu just copies those files.
2.In the previous version, I used the cpuid 4 definitions for AMD's cpuid leaf
  0x8000001D. CPUID 4 is very Intel specific and we don't want to expose those
  details under AMD. I have renamed some of these definitions as generic.
  These changes are in patch#1. Radim, let me know if this is what you intended.
3.Added an assert for core_id(Suggested by Radim Krčmář).
4.Changed the if condition under "L3 cache 

[Qemu-devel] [PATCH v10 5/5] i386: Remove generic SMT thread check

2018-05-21 Thread Babu Moger
Remove the generic non-Intel check while validating hyperthreading support.
Certain AMD CPUs can support hyperthreading now.

CPU families with the TOPOEXT feature can support hyperthreading.

Signed-off-by: Babu Moger 
Tested-by: Geoffrey McRae 
Reviewed-by: Eduardo Habkost 
---
 target/i386/cpu.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d20b305..7eba8cc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4961,17 +4961,20 @@ static void x86_cpu_realizefn(DeviceState *dev, Error 
**errp)
 
 qemu_init_vcpu(cs);
 
-/* Only Intel CPUs support hyperthreading. Even though QEMU fixes this
- * issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
- * based on inputs (sockets,cores,threads), it is still better to gives
+/* Most Intel and certain AMD CPUs support hyperthreading. Even though QEMU
+ * fixes this issue by adjusting CPUID_0000_0001_EBX and CPUID_8000_0008_ECX
+ * based on inputs (sockets,cores,threads), it is still better to give
  * users a warning.
  *
  * NOTE: the following code has to follow qemu_init_vcpu(). Otherwise
  * cs->nr_threads hasn't be populated yet and the checking is incorrect.
  */
-if (!IS_INTEL_CPU(env) && cs->nr_threads > 1 && !ht_warned) {
-error_report("AMD CPU doesn't support hyperthreading. Please configure"
- " -smp options properly.");
+ if (IS_AMD_CPU(env) &&
+ !(env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) &&
+ cs->nr_threads > 1 && !ht_warned) {
+error_report("This family of AMD CPU doesn't support "
+ "hyperthreading(%d). Please configure -smp "
+ "options properly.", cs->nr_threads);
 ht_warned = true;
 }
 
-- 
1.8.3.1




[Qemu-devel] [PATCH v10 4/5] i386: Enable TOPOEXT feature on AMD EPYC CPU

2018-05-21 Thread Babu Moger
Enable the TOPOEXT feature on EPYC CPUs. This is required to support
hyperthreading on VM guests. Also extend xlevel to 0x8000001E.

Disable the TOPOEXT feature for legacy machines, and also disable it
if the configuration cannot be supported.

Signed-off-by: Babu Moger 
---
 include/hw/i386/pc.h |  4 
 target/i386/cpu.c| 37 +++--
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index a0c269f..9c8db3d 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -302,6 +302,10 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 .driver   = TYPE_X86_CPU,\
 .property = "legacy-cache",\
 .value= "on",\
+},{\
+.driver   = "EPYC-" TYPE_X86_CPU,\
+.property = "topoext",\
+.value= "off",\
 },
 
 #define PC_COMPAT_2_11 \
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d9ccaad..d20b305 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -496,6 +496,20 @@ static void encode_topo_cpuid8000001e(CPUState *cs, X86CPU *cpu,
 }
 
 /*
+ * Check if we can support this topology
+ * Fail if the number of cores is beyond the supported config
+ * or nr_threads is more than 2
+ */
+static int verify_topology(int nr_cores, int nr_threads)
+{
+if ((nr_cores > (MAX_CORES_IN_NODE * MAX_NODES_EPYC)) ||
+(nr_threads > 2)) {
+return 0;
+}
+return 1;
+}
+
+/*
  * Definitions of the hardcoded cache entries we expose:
  * These are legacy cache values. If there is a need to change any
  * of these values please use builtin_x86_defs
@@ -2541,7 +2555,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .features[FEAT_8000_0001_ECX] =
 CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
 CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+CPUID_EXT3_TOPOEXT,
 .features[FEAT_7_0_EBX] =
 CPUID_7_0_EBX_FSGSBASE | CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_AVX2 |
 CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_RDSEED |
@@ -2586,7 +2601,8 @@ static X86CPUDefinition builtin_x86_defs[] = {
 .features[FEAT_8000_0001_ECX] =
 CPUID_EXT3_OSVW | CPUID_EXT3_3DNOWPREFETCH |
 CPUID_EXT3_MISALIGNSSE | CPUID_EXT3_SSE4A | CPUID_EXT3_ABM |
-CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM,
+CPUID_EXT3_CR8LEG | CPUID_EXT3_SVM | CPUID_EXT3_LAHF_LM |
+CPUID_EXT3_TOPOEXT,
 .features[FEAT_8000_0008_EBX] =
 CPUID_8000_0008_EBX_IBPB,
 .features[FEAT_7_0_EBX] =
@@ -4166,6 +4182,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8000001D:
 *eax = 0;
+/* Check if we can support this topology */
+if (!verify_topology(cs->nr_cores, cs->nr_threads)) {
+/* Disable topology extension */
+env->features[FEAT_8000_0001_ECX] &= ~CPUID_EXT3_TOPOEXT;
+break;
+}
 switch (count) {
 case 0: /* L1 dcache info */
+encode_cache_cpuid8000001d(env->cache_info_amd.l1d_cache, cs,
@@ -4190,6 +4212,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 break;
 case 0x8000001E:
 assert(cpu->core_id <= 255);
+/* Check if we can support this topology */
+if (!verify_topology(cs->nr_cores, cs->nr_threads)) {
+/* Disable topology extension */
+env->features[FEAT_8000_0001_ECX] &= ~CPUID_EXT3_TOPOEXT;
+break;
+}
 encode_topo_cpuid8000001e(cs, cpu,
   eax, ebx, ecx, edx);
 break;
@@ -4654,6 +4682,11 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error 
**errp)
 x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000000A);
 }
 
+/* TOPOEXT feature requires 0x8000001E */
+if (env->features[FEAT_8000_0001_ECX] & CPUID_EXT3_TOPOEXT) {
+x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001E);
+}
+
 /* SEV requires CPUID[0x8000001F] */
 if (sev_enabled()) {
 x86_cpu_adjust_level(cpu, &env->cpuid_min_xlevel, 0x8000001F);
-- 
1.8.3.1




[Qemu-devel] [PATCH v10 1/5] i386: Clean up cache CPUID code

2018-05-21 Thread Babu Moger
From: Eduardo Habkost 

Always initialize CPUCaches structs with cache information, even
if legacy_cache=true.  Use different CPUCaches struct for
CPUID[2], CPUID[4], and the AMD CPUID leaves.

This will greatly simplify the logic inside cpu_x86_cpuid().

Signed-off-by: Eduardo Habkost 
Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 117 +++---
 target/i386/cpu.h |  14 ---
 2 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e5e66a7..d9773b6 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1114,7 +1114,7 @@ struct X86CPUDefinition {
 };
 
 static CPUCaches epyc_cache_info = {
-.l1d_cache = {
+.l1d_cache = &(CPUCacheInfo) {
 .type = DCACHE,
 .level = 1,
 .size = 32 * KiB,
@@ -1126,7 +1126,7 @@ static CPUCaches epyc_cache_info = {
 .self_init = 1,
 .no_invd_sharing = true,
 },
-.l1i_cache = {
+.l1i_cache = &(CPUCacheInfo) {
 .type = ICACHE,
 .level = 1,
 .size = 64 * KiB,
@@ -1138,7 +1138,7 @@ static CPUCaches epyc_cache_info = {
 .self_init = 1,
 .no_invd_sharing = true,
 },
-.l2_cache = {
+.l2_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
 .level = 2,
 .size = 512 * KiB,
@@ -1148,7 +1148,7 @@ static CPUCaches epyc_cache_info = {
 .sets = 1024,
 .lines_per_tag = 1,
 },
-.l3_cache = {
+.l3_cache = &(CPUCacheInfo) {
 .type = UNIFIED_CACHE,
 .level = 3,
 .size = 8 * MiB,
@@ -3342,9 +3342,8 @@ static void x86_cpu_load_def(X86CPU *cpu, 
X86CPUDefinition *def, Error **errp)
 env->features[w] = def->features[w];
 }
 
-/* Store Cache information from the X86CPUDefinition if available */
-env->cache_info = def->cache_info;
-cpu->legacy_cache = def->cache_info ? 0 : 1;
+/* legacy-cache defaults to 'off' if CPU model provides cache info */
+cpu->legacy_cache = !def->cache_info;
 
 /* Special cases not set in the X86CPUDefinition structs: */
 /* TODO: in-kernel irqchip for hvf */
@@ -3695,21 +3694,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 if (!cpu->enable_l3_cache) {
 *ecx = 0;
 } else {
-if (env->cache_info && !cpu->legacy_cache) {
-*ecx = cpuid2_cache_descriptor(&env->cache_info->l3_cache);
-} else {
-*ecx = cpuid2_cache_descriptor(&legacy_l3_cache);
-}
-}
-if (env->cache_info && !cpu->legacy_cache) {
-*edx = (cpuid2_cache_descriptor(&env->cache_info->l1d_cache) << 16) |
-   (cpuid2_cache_descriptor(&env->cache_info->l1i_cache) <<  8) |
-   (cpuid2_cache_descriptor(&env->cache_info->l2_cache));
-} else {
-*edx = (cpuid2_cache_descriptor(&legacy_l1d_cache) << 16) |
-   (cpuid2_cache_descriptor(&legacy_l1i_cache) <<  8) |
-   (cpuid2_cache_descriptor(&legacy_l2_cache_cpuid2));
+*ecx = cpuid2_cache_descriptor(env->cache_info_cpuid2.l3_cache);
 }
+*edx = (cpuid2_cache_descriptor(env->cache_info_cpuid2.l1d_cache) << 16) |
+   (cpuid2_cache_descriptor(env->cache_info_cpuid2.l1i_cache) <<  8) |
+   (cpuid2_cache_descriptor(env->cache_info_cpuid2.l2_cache));
 break;
 case 4:
 /* cache info: needed for Core compatibility */
@@ -3722,35 +3711,27 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 }
 } else {
 *eax = 0;
-CPUCacheInfo *l1d, *l1i, *l2, *l3;
-if (env->cache_info && !cpu->legacy_cache) {
-l1d = &env->cache_info->l1d_cache;
-l1i = &env->cache_info->l1i_cache;
-l2 = &env->cache_info->l2_cache;
-l3 = &env->cache_info->l3_cache;
-} else {
-l1d = &legacy_l1d_cache;
-l1i = &legacy_l1i_cache;
-l2 = &legacy_l2_cache;
-l3 = &legacy_l3_cache;
-}
 switch (count) {
 case 0: /* L1 dcache info */
-encode_cache_cpuid4(l1d, 1, cs->nr_cores,
+encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
+1, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 1: /* L1 icache info */
-encode_cache_cpuid4(l1i, 1, cs->nr_cores,
+encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
+1, cs->nr_cores,
 eax, ebx, ecx, edx);
 break;
 case 2: /* L2 cache info */
-encode_cache_cpuid4(l2, cs->nr_threads, cs->nr_cores,
+encode_cache_cpuid4(env->cache_info_cpuid4.l2_cache,
+   

[Qemu-devel] [PATCH v10 3/5] i386: Add support for CPUID_8000_001E for AMD

2018-05-21 Thread Babu Moger
Add support for cpuid leaf CPUID_8000_001E. Build a config that closely
matches the underlying hardware. Please refer to the Processor Programming
Reference (PPR) for AMD Family 17h Models for more details.

Signed-off-by: Babu Moger 
---
 target/i386/cpu.c | 85 +++
 1 file changed, 85 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 1dd060a..d9ccaad 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -415,6 +415,86 @@ static void encode_cache_cpuid8000001d(CPUCacheInfo *cache, CPUState *cs,
(cache->complex_indexing ? CACHE_COMPLEX_IDX : 0);
 }
 
+/* Data structure to hold the configuration info for a given core index */
+struct epyc_topo {
+/* core complex id of the current core index */
+int ccx_id;
+/* new core id for this core index in the topology */
+int core_id;
+/* Node (or die) id of this core index */
+int node_id;
+/* Number of nodes(or dies) in this config, 0 based */
+int num_nodes;
+};
+
+/*
+ * Build a configuration that closely matches the EPYC hardware
+ * nr_cores : Total number of cores in the config
+ * core_id  : Core index of the current CPU
+ * topo : Data structure to hold all the config info for this core index
+ * Rules
+ * Max ccx in a node(die) = 2
+ * Max cores in a ccx = 4
+ * Max nodes(dies)= 4 (1, 2, 4)
+ * Max sockets= 2
+ * Maintain symmetry as much as possible
+ */
+static void epyc_build_topology(int nr_cores, int core_id,
+struct epyc_topo *topo)
+{
+int nodes = 1, cores_in_ccx;
+int i;
+
+/* Let's see if we can fit all the cores in one ccx */
+if (nr_cores <= MAX_CORES_IN_CCX) {
+cores_in_ccx = nr_cores;
+goto topo;
+}
+/*
+ * Figure out the number of nodes(or dies) required to build
+ * this config. Max cores in a node is 8
+ */
+for (i = nodes; i <= MAX_NODES_EPYC; i++) {
+if (nr_cores <= (i * MAX_CORES_IN_NODE)) {
+nodes = i;
+break;
+}
+/* We support nodes 1, 2, 4 */
+if (i == 3) {
+continue;
+}
+}
+/* Spread the cores across all the CCXs and return max cores in a ccx */
+cores_in_ccx = (nr_cores / (nodes * MAX_CCX)) +
+   ((nr_cores % (nodes * MAX_CCX)) ? 1 : 0);
+
+topo:
+topo->node_id = core_id / (cores_in_ccx * MAX_CCX);
+topo->ccx_id = (core_id % (cores_in_ccx * MAX_CCX)) / cores_in_ccx;
+topo->core_id = core_id % cores_in_ccx;
+/* num_nodes is 0 based, return n - 1 */
+topo->num_nodes = nodes - 1;
+}
+
+/* Encode topology info for CPUID[8000001E] */
+static void encode_topo_cpuid8000001e(CPUState *cs, X86CPU *cpu,
+   uint32_t *eax, uint32_t *ebx,
+   uint32_t *ecx, uint32_t *edx)
+{
+struct epyc_topo topo = {0};
+
+*eax = cpu->apic_id;
+epyc_build_topology(cs->nr_cores, cpu->core_id, &topo);
+if (cs->nr_threads - 1) {
+*ebx = ((cs->nr_threads - 1) << 8) | (topo.node_id << 3) |
+(topo.ccx_id << 2) | topo.core_id;
+} else {
+*ebx = (topo.node_id << 4) | (topo.ccx_id << 3) | topo.core_id;
+}
+*ecx = (topo.num_nodes << 8) | (cpu->socket_id << 2) | topo.node_id;
+*edx = 0;
+}
+
 /*
  * Definitions of the hardcoded cache entries we expose:
  * These are legacy cache values. If there is a need to change any
@@ -4108,6 +4188,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 break;
 }
 break;
+case 0x8000001E:
+assert(cpu->core_id <= 255);
+encode_topo_cpuid8000001e(cs, cpu,
+  eax, ebx, ecx, edx);
+break;
 case 0xC0000000:
 *eax = env->cpuid_xlevel2;
 *ebx = 0;
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 02/27] Make tb_invalidate_phys_addr() take a MemTxAttrs argument

2018-05-21 Thread Richard Henderson
On 05/21/2018 07:03 AM, Peter Maydell wrote:
> As part of plumbing MemTxAttrs down to the IOMMU translate method,
> add MemTxAttrs as an argument to tb_invalidate_phys_addr().
> Its callers either have an attrs value to hand, or don't care
> and can use MEMTXATTRS_UNSPECIFIED.
> 
> Signed-off-by: Peter Maydell 
> ---

Reviewed-by: Richard Henderson 

r~



[Qemu-devel] [PATCH v3 14/17] translate-all: protect TB jumps with a per-destination-TB lock

2018-05-21 Thread Emilio G. Cota
This applies to both user-mode and !user-mode emulation.

Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock. This lock also protects tb->cflags,
so update all tb->cflags readers outside tb->jmp_lock to use
atomic reads via tb_cflags().

In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[].

I considered not using a linked list of jumps, which simplifies
code and makes the struct smaller. However, it unnecessarily increases
memory usage, which results in a performance decrease. See for
instance these numbers booting+shutting down debian-arm:
  Time (s)  Rel. err (%)  Abs. err (s)  Rel. slowdown (%)
--
 before  20.88  0.74  0.154512 0.
 after   20.81  0.38  0.079078-0.33524904
 GTree   21.02  0.28  0.058856 0.67049808
 GHashTable + xxhash 21.63  1.08  0.233604  3.5919540

Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.

n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁ ▁▁ ▁▁▁  ▁▁▁ ▁|[34.0,35.0]

Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |   6 +-
 include/exec/exec-all.h |  35 +++-
 accel/tcg/cpu-exec.c|  41 +-
 accel/tcg/translate-all.c   | 118 
 4 files changed, 124 insertions(+), 76 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index faf09c6..df83445 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -131,8 +131,10 @@ DESIGN REQUIREMENT: Safely handle invalidation of TBs
 
 The direct jump themselves are updated atomically by the TCG
 tb_set_jmp_target() code. Modification to the linked lists that allow
-searching for linked pages are done under the protect of the
-tb_lock().
+searching for linked pages are done under the protection of tb->jmp_lock,
+where tb is the destination block of a jump. Each origin block keeps a
+pointer to its destinations so that the appropriate lock can be acquired before
+iterating over a jump list.
 
 The global page table is a lockless radix tree; cmpxchg is used
 to atomically insert new elements.
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 66902f7..daac968 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -344,7 +344,7 @@ struct TranslationBlock {
 #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
 #define CF_NOCACHE 0x0001 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x0002
-#define CF_INVALID 0x0004 /* TB is stale. Setters need tb_lock */
+#define CF_INVALID 0x0004 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL0x0008 /* Generate code for a parallel context */
 /* cflags' mask for hashing/comparison */
 #define CF_HASH_MASK   \
@@ -363,6 +363,9 @@ struct TranslationBlock {
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
+/* jmp_lock placed here to fill a 4-byte hole. Its documentation is below */
+QemuSpin jmp_lock;
+
 /* The following data are used to directly call another TB from
  * the code of this one. This can be done either by emitting direct or
  * indirect native jump instructions. These jumps are reset so that the TB
@@ -374,20 +377,26 @@ struct TranslationBlock {
 #define TB_JMP_RESET_OFFSET_INVALID 0x /* indicates no jump generated */
 uintptr_t jmp_target_arg[2];  /* target address or offset */
 
-/* Each TB has an associated circular list of TBs jumping to this one.
- * jmp_list_first points to the first TB jumping to this one.
- * jmp_list_next is used to point to the next TB in a list.
- * Since each TB can have two jumps, it can participate in two lists.
- * jmp_list_first and jmp_list_next are 4-byte aligned pointers to a
- * TranslationBlock structure, but the two least significant bits of
- * them are used to encode which data field of the pointed TB should
- * be used to traverse the list further from that TB:
- * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
- * In other words, 0/1 tells which jump is used in the pointed TB,
- * and 2 means that this is a pointer back to the target TB of this list.
+/*
+ * Each TB has a NULL-terminated list (jmp_list_head) of incoming jumps.
+ * Each TB can 

[Qemu-devel] [PATCH v3 10/17] translate-all: use per-page locking in !user-mode

2018-05-21 Thread Emilio G. Cota
Groundwork for supporting parallel TCG generation.

Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.

Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).

This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.

While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.h |   3 +
 include/exec/exec-all.h   |   3 +-
 accel/tcg/translate-all.c | 444 +-
 3 files changed, 409 insertions(+), 41 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index ba8e4d6..6d1d258 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -23,6 +23,9 @@
 
 
 /* translate-all.c */
+struct page_collection *page_collection_lock(tb_page_addr_t start,
+ tb_page_addr_t end);
+void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
int is_cpu_write_access);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b2d8c8e..3fad93b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -358,7 +358,8 @@ struct TranslationBlock {
 /* original tb when cflags has CF_NOCACHE */
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
-   of the pointer tells the index in page_next[] */
+   of the pointer tells the index in page_next[].
+   The list is protected by the TB's page('s) lock(s) */
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index bd08bce..14c2c23 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -112,8 +112,55 @@ typedef struct PageDesc {
 #else
 unsigned long flags;
 #endif
+#ifndef CONFIG_USER_ONLY
+QemuSpin lock;
+#endif
 } PageDesc;
 
+/**
+ * struct page_entry - page descriptor entry
+ * @pd: pointer to the  PageDesc of the page this entry represents
+ * @index:  page index of the page
+ * @locked: whether the page is locked
+ *
+ * This struct helps us keep track of the locked state of a page, without
+ * bloating  PageDesc.
+ *
+ * A page lock protects accesses to all fields of  PageDesc.
+ *
+ * See also:  page_collection.
+ */
+struct page_entry {
+PageDesc *pd;
+tb_page_addr_t index;
+bool locked;
+};
+
+/**
+ * struct page_collection - tracks a set of pages (i.e.  page_entry's)
+ * @tree:   Binary search tree (BST) of the pages, with key == page index
+ * @max:Pointer to the page in @tree with the highest page index
+ *
+ * To avoid deadlock we lock pages in ascending order of page index.
+ * When operating on a set of pages, we need to keep track of them so that
+ * we can lock them in order and also unlock them later. For this we collect
+ * pages (i.e.  page_entry's) in a binary search @tree. Given that the
+ * @tree implementation we use does not provide an O(1) operation to obtain the
+ * highest-ranked element, we use @max to keep track of the inserted page
+ * with the highest index. This is valuable because if a page is not in
+ * the tree and its index is higher than @max's, then we can lock it
+ * without breaking the locking order rule.
+ *
+ * Note on naming: 'struct page_set' would be shorter, but we already have a 
few
+ * page_set_*() helpers, so page_collection is used instead to avoid confusion.
+ *
+ * See also: page_collection_lock().
+ */
+struct page_collection {
+GTree *tree;
+struct page_entry *max;
+};
+
 /* list iterators for lists of tagged pointers in TranslationBlock */
 #define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
 for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
@@ -507,6 +554,15 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int 
alloc)
 return NULL;
 }
 pd = g_new0(PageDesc, V_L2_SIZE);
+#ifndef CONFIG_USER_ONLY
+{
+int i;
+
+for (i = 0; i < V_L2_SIZE; i++) {
+qemu_spin_init(&pd[i].lock);
+}
+}
+#endif
 existing = atomic_cmpxchg(lp, NULL, pd);
 if (unlikely(existing)) {
 g_free(pd);
@@ -522,6 +578,253 @@ static inline PageDesc *page_find(tb_page_addr_t 

[Qemu-devel] [PATCH v3 11/17] translate-all: add page_locked assertions

2018-05-21 Thread Emilio G. Cota
This is only compiled under CONFIG_DEBUG_TCG to avoid
bloating the binary.

In user-mode, assert_page_locked is equivalent to assert_mmap_lock.

Note: There are some tb_lock assertions left that will be
removed by later patches.

Suggested-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 81 +--
 1 file changed, 78 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 14c2c23..8286203 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -583,6 +583,9 @@ static void page_lock_pair(PageDesc **ret_p1, 
tb_page_addr_t phys1,
 
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
+
+#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
+
 static inline void page_lock(PageDesc *pd)
 { }
 
@@ -605,14 +608,80 @@ void page_collection_unlock(struct page_collection *set)
 { }
 #else /* !CONFIG_USER_ONLY */
 
+#ifdef CONFIG_DEBUG_TCG
+
+static __thread GHashTable *ht_pages_locked_debug;
+
+static void ht_pages_locked_debug_init(void)
+{
+if (ht_pages_locked_debug) {
+return;
+}
+ht_pages_locked_debug = g_hash_table_new(NULL, NULL);
+}
+
+static bool page_is_locked(const PageDesc *pd)
+{
+PageDesc *found;
+
+ht_pages_locked_debug_init();
+found = g_hash_table_lookup(ht_pages_locked_debug, pd);
+return !!found;
+}
+
+static void page_lock__debug(PageDesc *pd)
+{
+ht_pages_locked_debug_init();
+g_assert(!page_is_locked(pd));
+g_hash_table_insert(ht_pages_locked_debug, pd, pd);
+}
+
+static void page_unlock__debug(const PageDesc *pd)
+{
+bool removed;
+
+ht_pages_locked_debug_init();
+g_assert(page_is_locked(pd));
+removed = g_hash_table_remove(ht_pages_locked_debug, pd);
+g_assert(removed);
+}
+
+static void
+do_assert_page_locked(const PageDesc *pd, const char *file, int line)
+{
+if (unlikely(!page_is_locked(pd))) {
+error_report("assert_page_lock: PageDesc %p not locked @ %s:%d",
+ pd, file, line);
+abort();
+}
+}
+
+#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
+
+#else /* !CONFIG_DEBUG_TCG */
+
+#define assert_page_locked(pd)
+
+static inline void page_lock__debug(const PageDesc *pd)
+{
+}
+
+static inline void page_unlock__debug(const PageDesc *pd)
+{
+}
+
+#endif /* CONFIG_DEBUG_TCG */
+
 static inline void page_lock(PageDesc *pd)
 {
+page_lock__debug(pd);
 qemu_spin_lock(&pd->lock);
 }
 
 static inline void page_unlock(PageDesc *pd)
 {
 qemu_spin_unlock(&pd->lock);
+page_unlock__debug(pd);
 }
 
 /* lock the page(s) of a TB in the correct acquisition order */
@@ -775,6 +844,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t 
end)
 g_tree_foreach(set->tree, page_entry_unlock, NULL);
 goto retry;
 }
+assert_page_locked(pd);
 PAGE_FOR_EACH_TB(pd, tb, n) {
 if (page_trylock_add(set, tb->page_addr[0]) ||
 (tb->page_addr[1] != -1 &&
@@ -1113,6 +1183,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* call with @p->lock held */
 static inline void invalidate_page_bitmap(PageDesc *p)
 {
+assert_page_locked(p);
 #ifdef CONFIG_SOFTMMU
 g_free(p->code_bitmap);
 p->code_bitmap = NULL;
@@ -1269,6 +1340,7 @@ static inline void tb_page_remove(PageDesc *pd, 
TranslationBlock *tb)
 uintptr_t *pprev;
 unsigned int n1;
 
+assert_page_locked(pd);
 pprev = &pd->first_tb;
 PAGE_FOR_EACH_TB(pd, tb1, n1) {
 if (tb1 == tb) {
@@ -1417,6 +1489,7 @@ static void build_page_bitmap(PageDesc *p)
 int n, tb_start, tb_end;
 TranslationBlock *tb;
 
+assert_page_locked(p);
 p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
 
 PAGE_FOR_EACH_TB(p, tb, n) {
@@ -1450,7 +1523,7 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
 bool page_already_protected;
 #endif
 
-assert_memory_lock();
+assert_page_locked(p);
 
 tb->page_addr[n] = page_addr;
 tb->page_next[n] = p->first_tb;
@@ -1721,8 +1794,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 uint32_t current_flags = 0;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
-assert_memory_lock();
-assert_tb_locked();
+assert_page_locked(p);
 
 #if defined(TARGET_HAS_PRECISE_SMC)
 if (cpu != NULL) {
@@ -1734,6 +1806,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 /* XXX: see if in some cases it could be faster to invalidate all
the code */
 PAGE_FOR_EACH_TB(p, tb, n) {
+assert_page_locked(p);
 /* NOTE: this is subtle as a TB may span two physical pages */
 if (n == 0) {
 /* NOTE: tb_end may be after the end of the page, but
@@ -1891,6 +1964,7 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, 
int len)
 }
 
 pages = 

[Qemu-devel] [PATCH v3 03/17] tcg: track TBs with per-region BST's

2018-05-21 Thread Emilio G. Cota
This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().

As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).
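The region lookup described above can be sketched as a stand-alone function. This is illustrative only (the parameter names are hypothetical; in QEMU the values live in a global region descriptor, and the result indexes an array of tcg_region_tree's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: map a translated-code pointer to the index of the region
 * (and hence the region tree) that contains it. The first and last
 * regions are special-cased since they may be of size != region.size;
 * every other region needs a single division by the stride. */
static size_t region_index(uintptr_t start_aligned, size_t stride,
                           size_t n_regions, uintptr_t tc_ptr)
{
    uintptr_t offset;

    if (tc_ptr < start_aligned) {
        return 0; /* unaligned head belongs to the first region */
    }
    offset = tc_ptr - start_aligned;
    if (offset > stride * (n_regions - 1)) {
        return n_regions - 1; /* the last region may be larger */
    }
    return offset / stride;
}
```

Each index would then select one GTree plus its lock, with padding between entries sized from qemu_dcache_linesize to avoid false sharing.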

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |   1 -
 include/exec/tb-context.h |   1 -
 tcg/tcg.h |   6 ++
 accel/tcg/cpu-exec.c  |   2 +-
 accel/tcg/translate-all.c | 101 
 tcg/tcg.c | 191 ++
 6 files changed, 213 insertions(+), 89 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bd68328..6207b4d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -404,7 +404,6 @@ static inline uint32_t curr_cflags(void)
  | (use_icount ? CF_USE_ICOUNT : 0);
 }
 
-void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1d41202..d8472c8 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -31,7 +31,6 @@ typedef struct TBContext TBContext;
 
 struct TBContext {
 
-GTree *tb_tree;
 struct qht htable;
 /* any access to the tbs or the page table must use this lock */
 QemuMutex tb_lock;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 08f8bbf..afe8492 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -866,6 +866,12 @@ void tcg_region_reset_all(void);
 size_t tcg_code_size(void);
 size_t tcg_code_capacity(void);
 
+void tcg_tb_insert(TranslationBlock *tb);
+void tcg_tb_remove(TranslationBlock *tb);
+TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
+void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
+size_t tcg_nb_tbs(void);
+
 /* user-mode: Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index aefc682..7b934a6 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -225,7 +225,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
 tb_lock();
 tb_phys_invalidate(tb, -1);
-tb_remove(tb);
+tcg_tb_remove(tb);
 tb_unlock();
 }
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8080eb7..e9341f3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -205,8 +205,6 @@ void tb_lock_reset(void)
 }
 }
 
-static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-
 void cpu_gen_init(void)
 {
tcg_context_init(&tcg_init_ctx);
@@ -375,13 +373,13 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc, bool will_exit)
 
 if (check_offset < tcg_init_ctx.code_gen_buffer_size) {
 tb_lock();
-tb = tb_find_pc(host_pc);
+tb = tcg_tb_lookup(host_pc);
 if (tb) {
 cpu_restore_state_from_tb(cpu, tb, host_pc, will_exit);
 if (tb->cflags & CF_NOCACHE) {
 /* one-shot translation, invalidate it immediately */
 tb_phys_invalidate(tb, -1);
-tb_remove(tb);
+tcg_tb_remove(tb);
 }
 r = true;
 }
@@ -728,48 +726,6 @@ static inline void 

[Qemu-devel] [PATCH v3 16/17] translate-all: remove tb_lock mention from cpu_restore_state_from_tb

2018-05-21 Thread Emilio G. Cota
tb_lock was needed when the function did retranslation. However,
since fca8a500d519 ("tcg: Save insn data and use it in
cpu_restore_state_from_tb") we don't do retranslation.

Get rid of the comment.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 031060f..6cc7b94 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -355,7 +355,6 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 }
 
 /* The cpu state corresponding to 'searched_pc' is restored.
- * Called with tb_lock held.
  * When reset_icount is true, current TB will be interrupted and
  * icount should be recalculated.
  */
-- 
2.7.4




[Qemu-devel] [PATCH v3 02/17] qht: return existing entry when qht_insert fails

2018-05-21 Thread Emilio G. Cota
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.
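The insert-or-return-existing contract can be demonstrated outside of QEMU with a toy linear-probing table (all names here are hypothetical; the real QHT also requires the stored hash to match, which this sketch approximates by probing from the hash):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_SLOTS 16

/* Toy stand-in for struct qht: slots plus a default comparison. */
struct toy_ht {
    void *slots[N_SLOTS];
    bool (*cmp)(const void *a, const void *b);
};

static bool int_eq(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}

/* Mirrors the new qht_insert() contract: returns true on success;
 * on an equivalent entry (cmp matches), returns false and, when
 * @existing is non-NULL, copies a pointer to the present entry. */
static bool toy_insert(struct toy_ht *ht, void *p, uint32_t hash,
                       void **existing)
{
    for (size_t i = 0; i < N_SLOTS; i++) {
        size_t idx = (hash + i) % N_SLOTS;

        if (ht->slots[idx] == NULL) {
            ht->slots[idx] = p;
            return true;
        }
        if (ht->cmp(ht->slots[idx], p)) {
            if (existing) {
                *existing = ht->slots[idx];
            }
            return false;
        }
    }
    return false; /* table full; resizing is out of scope here */
}
```

A caller that loses the insertion race can thus discard its own object and continue with the returned one, which is exactly how tb_link_page uses it later in this series.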

Suggested-by: Richard Henderson 
Reviewed-by:  Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h|  7 +--
 accel/tcg/translate-all.c |  2 +-
 tests/qht-bench.c |  4 ++--
 tests/test-qht.c  |  8 +++-
 util/qht.c| 27 +--
 5 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 5f03a0f..1fb9116 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -70,6 +70,7 @@ void qht_destroy(struct qht *ht);
  * @ht: QHT to insert to
  * @p: pointer to be inserted
  * @hash: hash corresponding to @p
+ * @existing: address where the pointer to an existing entry can be copied to
  *
  * Attempting to insert a NULL @p is a bug.
  * Inserting the same pointer @p with different @hash values is a bug.
@@ -78,9 +79,11 @@ void qht_destroy(struct qht *ht);
  * inserted into the hash table.
  *
  * Returns true on success.
- * Returns false if the @p-@hash pair already exists in the hash table.
+ * Returns false if there is an existing entry in the table that is equivalent
+ * (i.e. ht->cmp matches and the hash is the same) to @p-@h. If @existing
+ * is !NULL, a pointer to this existing entry is copied to it.
  */
-bool qht_insert(struct qht *ht, void *p, uint32_t hash);
+bool qht_insert(struct qht *ht, void *p, uint32_t hash, void **existing);
 
 /**
  * qht_lookup_custom - Look up a pointer using a custom comparison function.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5b7b91d..8080eb7 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1242,7 +1242,7 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 /* add in the hash table */
 h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
  tb->trace_vcpu_dstate);
-qht_insert(&tb_ctx.htable, tb, h);
+qht_insert(&tb_ctx.htable, tb, h, NULL);
 
 #ifdef CONFIG_USER_ONLY
 if (DEBUG_TB_CHECK_GATE) {
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index c94ac25..f492b3a 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -163,7 +163,7 @@ static void do_rw(struct thread_info *info)
 bool written = false;
 
 if (qht_lookup(&ht, p, hash) == NULL) {
-written = qht_insert(&ht, p, hash);
+written = qht_insert(&ht, p, hash, NULL);
 }
 if (written) {
 stats->in++;
@@ -322,7 +322,7 @@ static void htable_init(void)
 r = xorshift64star(r);
 p = [r & (init_range - 1)];
 hash = h(*p);
-if (qht_insert(&ht, p, hash)) {
+if (qht_insert(&ht, p, hash, NULL)) {
 break;
 }
 retries++;
diff --git a/tests/test-qht.c b/tests/test-qht.c
index b069881..dda6a06 100644
--- a/tests/test-qht.c
+++ b/tests/test-qht.c
@@ -27,11 +27,17 @@ static void insert(int a, int b)
 
 for (i = a; i < b; i++) {
 uint32_t hash;
+void *existing;
+bool inserted;
 
 arr[i] = i;
 hash = i;
 
-qht_insert(&ht, &arr[i], hash);
+inserted = qht_insert(&ht, &arr[i], hash, NULL);
+g_assert_true(inserted);
+inserted = qht_insert(&ht, &arr[i], hash, &existing);
+g_assert_false(inserted);
+g_assert_true(existing == &arr[i]);
 }
 }
 
diff --git a/util/qht.c b/util/qht.c
index 8610ce3..9d030e7 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -511,9 +511,9 @@ void *qht_lookup(struct qht *ht, const void *userp, uint32_t hash)
 }
 
 /* call with head->lock held */
-static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
-   struct qht_bucket *head, void *p, uint32_t hash,
-   bool *needs_resize)
+static void *qht_insert__locked(struct qht *ht, struct qht_map *map,
+struct qht_bucket *head, void *p, uint32_t hash,
+bool *needs_resize)
 {
 struct qht_bucket *b = head;
 struct qht_bucket *prev = NULL;
@@ -523,8 +523,9 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
 do {
 for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
 if (b->pointers[i]) {
-if (unlikely(b->pointers[i] == p)) {
-return false;
+if (unlikely(b->hashes[i] == hash &&
+ ht->cmp(b->pointers[i], p))) {
+return b->pointers[i];
 }
 } else {
 goto found;
@@ -553,7 +554,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
 atomic_set(&b->hashes[i], hash);
 atomic_set(&b->pointers[i], p);
 seqlock_write_end(&head->sequence);
-   

[Qemu-devel] [PATCH v3 13/17] translate-all: discard TB when tb_link_page returns an existing matching TB

2018-05-21 Thread Emilio G. Cota
Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |  3 +++
 accel/tcg/cpu-exec.c| 14 ++--
 accel/tcg/translate-all.c   | 50 +
 3 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index faf8918..faf09c6 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -140,6 +140,9 @@ to atomically insert new elements.
 The lookup caches are updated atomically and the lookup hash uses QHT
 which is designed for concurrent safe lookup.
 
+Parallel code generation is supported. QHT is used at insertion time
+as the synchronization point across threads, thereby ensuring that we only
+keep track of a single TranslationBlock for each guest code block.
 
 Memory maps and TLBs
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index ad1f0c4..b45d10f 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -246,10 +246,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 if (tb == NULL) {
 mmap_lock();
 tb_lock();
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
 tb_unlock();
 mmap_unlock();
 }
@@ -399,14 +396,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 tb_lock();
 acquired_tb_lock = true;
 
-/* There's a chance that our desired tb has been translated while
- * taking the locks so we check again inside the lock.
- */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-/* if no translated code available, then translate it now */
-tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
 
 mmap_unlock();
 /* We add the TB in the virtual pc hash table for the fast lookup */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f3a0ecb..f24dcb8 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1580,18 +1580,30 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb,
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
+ *
+ * Returns a pointer @tb, or a pointer to an existing TB that matches @tb.
+ * Note that in !user-mode, another thread might have already added a TB
+ * for the same block of guest code that @tb corresponds to. In that case,
+ * the caller should discard the original @tb, and use instead the returned TB.
  */
-static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
- tb_page_addr_t phys_page2)
+static TranslationBlock *
+tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+ tb_page_addr_t phys_page2)
 {
 PageDesc *p;
 PageDesc *p2 = NULL;
+void *existing_tb = NULL;
 uint32_t h;
 
 assert_memory_lock();
 
 /*
  * Add the TB to the page list, acquiring first the pages's locks.
+ * We keep the locks held until after inserting the TB in the hash table,
+ * so that if the insertion fails we know for sure that the TBs are still
+ * in the page descriptors.
+ * Note that inserting into the hash table first isn't an option, since
+ * we can only insert TBs that are fully initialized.
  */
 page_lock_pair(&p, phys_pc, &p2, phys_page2, 1);
 tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
@@ -1601,21 +1613,33 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 tb->page_addr[1] = -1;
 }
 
+/* add in the hash table */
+h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+ tb->trace_vcpu_dstate);
+qht_insert(&tb_ctx.htable, tb, h, &existing_tb);
+
+/* remove TB from the page(s) if we couldn't insert it */
+if (unlikely(existing_tb)) {
+tb_page_remove(p, tb);
+invalidate_page_bitmap(p);
+if (p2) {
+tb_page_remove(p2, tb);
+invalidate_page_bitmap(p2);
+}
+tb = existing_tb;
+}
+
 if (p2) {
 page_unlock(p2);
 }
 page_unlock(p);
 
-/* add in the hash table */
-h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
- tb->trace_vcpu_dstate);
-qht_insert(&tb_ctx.htable, tb, h, NULL);
-
 

[Qemu-devel] [PATCH v3 08/17] translate-all: work page-by-page in tb_invalidate_phys_range_1

2018-05-21 Thread Emilio G. Cota
So that we pass a same-page range to tb_invalidate_phys_page_range,
instead of always passing an end address that could be on a different
page.

As discussed with Peter Maydell on the list [1], tb_invalidate_phys_page_range
doesn't actually do much with 'end', which explains why we have never
hit a bug despite going against what the comment on top of
tb_invalidate_phys_page_range requires:

> * Invalidate all TBs which intersect with the target physical address range
> * [start;end[. NOTE: start and end must refer to the *same* physical page.

The appended honours the comment, which avoids confusion.

While at it, rework the loop into a for loop, which is less error prone
(e.g. "continue" won't result in an infinite loop).

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09165.html
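The loop structure from the patch can be checked in isolation. A sketch (hypothetical page size and names, not QEMU's) that splits an arbitrary [start, end) range into same-page chunks, exactly as tb_invalidate_phys_range_1 now does before calling into the per-page invalidation:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uintptr_t)(PAGE_SIZE - 1))

/* Count the same-page chunks [start, end) splits into when walked
 * page by page; each [start, bound) lies within a single page. */
static unsigned count_page_chunks(uintptr_t start, uintptr_t end)
{
    uintptr_t next;
    unsigned chunks = 0;

    for (next = (start & PAGE_MASK) + PAGE_SIZE;
         start < end;
         start = next, next += PAGE_SIZE) {
        uintptr_t bound = next < end ? next : end;

        (void)bound; /* the real code invalidates [start, bound) here */
        chunks++;
    }
    return chunks;
}
```

Note how the for-loop makes the advance explicit in the loop header, so a stray "continue" cannot skip it and spin forever.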

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 07674e4..8622f25 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1375,10 +1375,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  */
static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
 {
-while (start < end) {
-tb_invalidate_phys_page_range(start, end, 0);
-start &= TARGET_PAGE_MASK;
-start += TARGET_PAGE_SIZE;
+tb_page_addr_t next;
+
+for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ start < end;
+ start = next, next += TARGET_PAGE_SIZE) {
+tb_page_addr_t bound = MIN(next, end);
+
+tb_invalidate_phys_page_range(start, bound, 0);
 }
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 05/17] translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB

2018-05-21 Thread Emilio G. Cota
This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:

- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
  Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
  jmp_list_{next|first} fields of TB"), the rationale is the same: these
  are tagged pointers, not pointers. So use a more appropriate type.

- Only check the least significant bit of the tagged pointers. Masking
  with 3/~3 is unnecessary and confusing.

- Introduce the TB_FOR_EACH_TAGGED macro, and use it to define
  PAGE_FOR_EACH_TB, which improves readability. Note that
  TB_FOR_EACH_TAGGED will gain another user in a subsequent patch.

- Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there
  is a bug and we attempt to remove a TB that is not in the list, instead
  of segfaulting (since the list is NULL-terminated) we will reach
  g_assert_not_reached().
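The tagged-pointer iteration can be exercised outside of QEMU with a toy node type (names hypothetical; a TranslationBlock stores the same idea in its page_next[2] links, with the low bit of each link selecting the next slot):

```c
#include <assert.h>
#include <stdint.h>

/* Toy two-link node: the low bit of each tagged link tells which
 * next[] slot continues the chain, mirroring TB's page_next[]. */
struct node {
    uintptr_t next[2];
    int value;
};

/* Same shape as TB_FOR_EACH_TAGGED in the patch: peel the tag bit
 * off the head, then off each successor link as we advance. */
#define FOR_EACH_TAGGED(head, p, n)                                     \
    for (n = (head) & 1, p = (struct node *)((head) & ~(uintptr_t)1);   \
         p;                                                             \
         p = (struct node *)p->next[n], n = (uintptr_t)p & 1,           \
         p = (struct node *)((uintptr_t)p & ~(uintptr_t)1))

/* Build a NULL-terminated three-node chain that alternates link
 * slots, then sum the values by walking it with the macro. */
static int tagged_list_demo(void)
{
    static struct node a = { .value = 1 };
    static struct node b = { .value = 2 };
    static struct node c = { .value = 3 };
    uintptr_t head = (uintptr_t)&a; /* tag 0: a continues via slot 0 */
    struct node *p;
    uintptr_t n;
    int sum = 0;

    a.next[0] = (uintptr_t)&b | 1;  /* b continues via slot 1 */
    b.next[1] = (uintptr_t)&c;      /* c continues via slot 0 */
    c.next[0] = 0;                  /* end of list */

    FOR_EACH_TAGGED(head, p, n) {
        sum += p->value;
    }
    return sum;
}
```

Only the least significant bit carries the tag, which is why masking with 1/~1 suffices and the old 3/~3 masks were misleading.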

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |  2 +-
 accel/tcg/translate-all.c | 62 ++-
 2 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6207b4d..b2d8c8e 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -359,7 +359,7 @@ struct TranslationBlock {
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
of the pointer tells the index in page_next[] */
-struct TranslationBlock *page_next[2];
+uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
 /* The following data are used to directly call another TB from
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8caf28d..7302d05 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -103,7 +103,7 @@
 
 typedef struct PageDesc {
 /* list of TBs intersecting this ram page */
-TranslationBlock *first_tb;
+uintptr_t first_tb;
 #ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
@@ -114,6 +114,15 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/* list iterators for lists of tagged pointers in TranslationBlock */
+#define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
+for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
+ tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
+ tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+#define PAGE_FOR_EACH_TB(pagedesc, tb, n)   \
+TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
 /* In system mode we want L1_MAP to be based on ram offsets,
while in user mode we want it to be based on virtual addresses.  */
 #if !defined(CONFIG_USER_ONLY)
@@ -815,7 +824,7 @@ static void page_flush_tb_1(int level, void **lp)
 PageDesc *pd = *lp;
 
 for (i = 0; i < V_L2_SIZE; ++i) {
-pd[i].first_tb = NULL;
+pd[i].first_tb = (uintptr_t)NULL;
 invalidate_page_bitmap(pd + i);
 }
 } else {
@@ -943,21 +952,21 @@ static void tb_page_check(void)
 
 #endif /* CONFIG_USER_ONLY */
 
-static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
+static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
 {
 TranslationBlock *tb1;
+uintptr_t *pprev;
 unsigned int n1;
 
-for (;;) {
-tb1 = *ptb;
-n1 = (uintptr_t)tb1 & 3;
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
+pprev = &pd->first_tb;
+PAGE_FOR_EACH_TB(pd, tb1, n1) {
 if (tb1 == tb) {
-*ptb = tb1->page_next[n1];
-break;
+*pprev = tb1->page_next[n1];
+return;
 }
-ptb = &tb1->page_next[n1];
+pprev = &tb1->page_next[n1];
 }
+g_assert_not_reached();
 }
 
/* remove the TB from a list of TBs jumping to the n-th jump target of the TB */
@@ -1045,12 +1054,12 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 /* remove the TB from the page list */
 if (tb->page_addr[0] != page_addr) {
 p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-tb_page_remove(&p->first_tb, tb);
+tb_page_remove(p, tb);
 invalidate_page_bitmap(p);
 }
 if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
 p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-tb_page_remove(&p->first_tb, tb);
+tb_page_remove(p, tb);
 invalidate_page_bitmap(p);
 }
 
@@ -1081,10 +1090,7 @@ static void build_page_bitmap(PageDesc *p)
 
 p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
 
-tb = p->first_tb;
-while (tb != NULL) {
-n = (uintptr_t)tb & 3;
-tb = (TranslationBlock *)((uintptr_t)tb & ~3);
+PAGE_FOR_EACH_TB(p, tb, n) {
   

[Qemu-devel] [PATCH v3 15/17] cputlb: remove tb_lock from tlb_flush functions

2018-05-21 Thread Emilio G. Cota
The acquisition of tb_lock was added when the async tlb_flush
was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.")

tb_lock was there to allow us to do memset() on the tb_jmp_cache's.
However, since f3ced3c5928 ("tcg: consistently access cpu->tb_jmp_cache
atomically") all accesses to tb_jmp_cache are atomic, so tb_lock
is not needed here. Get rid of it.

Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cputlb.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 0543903..f5c3a09 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -125,8 +125,6 @@ static void tlb_flush_nocheck(CPUState *cpu)
 atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
 tlb_debug("(count: %zu)\n", tlb_flush_count());
 
-tb_lock();
-
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 cpu_tb_jmp_cache_clear(cpu);
@@ -135,8 +133,6 @@ static void tlb_flush_nocheck(CPUState *cpu)
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
 
-tb_unlock();
-
 atomic_mb_set(&cpu->pending_tlb_flush, 0);
 }
 
@@ -180,8 +176,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 
 assert_cpu_is_self(cpu);
 
-tb_lock();
-
 tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -197,8 +191,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 cpu_tb_jmp_cache_clear(cpu);
 
 tlb_debug("done\n");
-
-tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
-- 
2.7.4




[Qemu-devel] [PATCH v3 17/17] tcg: remove tb_lock

2018-05-21 Thread Emilio G. Cota
Use mmap_lock in user-mode to protect TCG state and the page
descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks
needed. Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

 ubuntu 17.04 ppc64 bootup+shutdown time

  700 +-+--++--++---+*--+-+
  |++  ++   +   *B|
  | before ***B***** *|
  |tb lock removal ###D### ***|
  600 +-+   *** +-+
  |   ** #|
  |*B*  #D|
  | *** * ##  |
  500 +-+***   ###  +-+
  | * ***   ###   |
  |*B*  # ##  |
  |  ** *  #D#|
  400 +-+  **## +-+
  |  **   ### |
  |**   ##|
  |  ** # ##  |
  300 +-+  *   B*  #D#  +-+
  |B ***###   |
  |*   **     |
  | *   ***  ###  |
  200 +-+   B  *B #D#   +-+
  | #B* *   ## #  |
  | #*##  |
  |+ D##D# ++   ++|
  100 +-+--++--++---++--+-+
 1 8 16  Guest CPUs   48   64
  png: https://imgur.com/HwmBHXe

  debian jessie aarch64 bootup+shutdown time

  90 +-+--+-+-++++--+-+
 |+ + ++++|
 | before ***B***B|
  80 +tb lock removal ###D###  **D  +-+
 |   **###|
 | **##   |
  70 +-+ ** #   +-+
 | ** ##  |
 |   **  #|
  60 +-+   *B  ##   +-+
 |   **  ##   |
 |***  #D |
  50 +-+   ***   ## +-+
 | * **   ### |
 |   **B*  ###|
  40 +-+   # ## +-+
 |    #D# |
 | ***B**  ###|
  30 +-+B***B** +-+
 |B *   * # ###   |
 | B   ###D#  |
  20 +-+   D  ##D## +-+
 |  D#|
 |+ + ++++|
  10 +-+--+-+-++++--+-+
 1 8 16  Guest CPUs   48   64
  png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.


[Qemu-devel] [PATCH v3 09/17] translate-all: move tb_invalidate_phys_page_range up in the file

2018-05-21 Thread Emilio G. Cota
This greatly simplifies next commit's diff.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 77 ---
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8622f25..bd08bce 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1365,44 +1365,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 /*
  * Invalidate all TBs which intersect with the target physical address range
- * [start;end[. NOTE: start and end may refer to *different* physical pages.
- * 'is_cpu_write_access' should be true if called from a real cpu write
- * access: the virtual CPU will exit the current TB if code is modified inside
- * this TB.
- *
- * Called with mmap_lock held for user-mode emulation, grabs tb_lock
- * Called with tb_lock held for system-mode emulation
- */
-static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
-{
-tb_page_addr_t next;
-
-for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
- start < end;
- start = next, next += TARGET_PAGE_SIZE) {
-tb_page_addr_t bound = MIN(next, end);
-
-tb_invalidate_phys_page_range(start, bound, 0);
-}
-}
-
-#ifdef CONFIG_SOFTMMU
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-assert_tb_locked();
-tb_invalidate_phys_range_1(start, end);
-}
-#else
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-assert_memory_lock();
-tb_lock();
-tb_invalidate_phys_range_1(start, end);
-tb_unlock();
-}
-#endif
-/*
- * Invalidate all TBs which intersect with the target physical address range
  * [start;end[. NOTE: start and end must refer to the *same* physical page.
  * 'is_cpu_write_access' should be true if called from a real cpu write
  * access: the virtual CPU will exit the current TB if code is modified inside
@@ -1500,6 +1462,45 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif
 }
 
+/*
+ * Invalidate all TBs which intersect with the target physical address range
+ * [start;end[. NOTE: start and end may refer to *different* physical pages.
+ * 'is_cpu_write_access' should be true if called from a real cpu write
+ * access: the virtual CPU will exit the current TB if code is modified inside
+ * this TB.
+ *
+ * Called with mmap_lock held for user-mode emulation, grabs tb_lock
+ * Called with tb_lock held for system-mode emulation
+ */
+static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
+{
+tb_page_addr_t next;
+
+for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ start < end;
+ start = next, next += TARGET_PAGE_SIZE) {
+tb_page_addr_t bound = MIN(next, end);
+
+tb_invalidate_phys_page_range(start, bound, 0);
+}
+}
+
+#ifdef CONFIG_SOFTMMU
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+assert_tb_locked();
+tb_invalidate_phys_range_1(start, end);
+}
+#else
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+assert_memory_lock();
+tb_lock();
+tb_invalidate_phys_range_1(start, end);
+tb_unlock();
+}
+#endif
+
 #ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len.
  * Called via softmmu_template.h when code areas are written to with
-- 
2.7.4




[Qemu-devel] [PATCH v3 01/17] qht: require a default comparison function

2018-05-21 Thread Emilio G. Cota
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is, a cmp function is explicitly provided.

qht_insert will gain use of the default cmp in the next patch.

Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
00010750 <qht_lookup>:
   10750:   89 d1   mov%edx,%ecx
   10752:   48 89 f2mov%rsi,%rdx
   10755:   48 8b 77 08 mov0x8(%rdi),%rsi
   10759:   e9 22 ff ff ff  jmpq   10680 <qht_lookup_custom>
   1075e:   66 90   xchg   %ax,%ax

We get:
00010740 <qht_lookup>:
   10740:   48 8b 4f 08 mov0x8(%rdi),%rcx
   10744:   e9 37 ff ff ff  jmpq   10680 <qht_lookup_custom>
   10749:   0f 1f 80 00 00 00 00nopl   0x0(%rax)
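The effect is easier to see in C. With @func moved last, the default-cmp wrapper only has to load the stored function pointer into the last argument register and tail-call. A sketch with toy types (not QEMU's code; a single-entry "table" stands in for struct qht so the example is self-contained):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*cmp_fn)(const void *a, const void *b);

/* Toy single-entry table standing in for struct qht. */
struct toy_table {
    cmp_fn cmp;   /* default comparison function */
    void *entry;
};

static void *toy_lookup_custom(struct toy_table *t, const void *userp,
                               uint32_t hash, cmp_fn func)
{
    (void)hash;
    return (t->entry && func(t->entry, userp)) ? t->entry : NULL;
}

/* Because @func is the last parameter, the first three arguments are
 * already in the right registers: the wrapper is one load plus a
 * jump, matching the two-instruction disassembly above. */
static void *toy_lookup(struct toy_table *t, const void *userp,
                        uint32_t hash)
{
    return toy_lookup_custom(t, userp, hash, t->cmp);
}

static bool int_eq(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}
```

Had @func stayed second, every earlier argument would need shuffling before the jump, as the first disassembly shows.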

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h| 25 -
 accel/tcg/cpu-exec.c  |  4 ++--
 accel/tcg/translate-all.c | 16 +++-
 tests/qht-bench.c | 14 +++---
 tests/test-qht.c  | 15 ++-
 util/qht.c| 14 +++---
 6 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 531aa95..5f03a0f 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -11,8 +11,11 @@
 #include "qemu/thread.h"
 #include "qemu/qdist.h"
 
+typedef bool (*qht_cmp_func_t)(const void *a, const void *b);
+
 struct qht {
 struct qht_map *map;
+qht_cmp_func_t cmp;
 QemuMutex lock; /* serializes setters of ht->map */
 unsigned int mode;
 };
@@ -47,10 +50,12 @@ typedef void (*qht_iter_func_t)(struct qht *ht, void *p, uint32_t h, void *up);
 /**
  * qht_init - Initialize a QHT
  * @ht: QHT to be initialized
+ * @cmp: default comparison function. Cannot be NULL.
  * @n_elems: number of entries the hash table should be optimized for.
  * @mode: bitmask with OR'ed QHT_MODE_*
  */
-void qht_init(struct qht *ht, size_t n_elems, unsigned int mode);
+void qht_init(struct qht *ht, qht_cmp_func_t cmp, size_t n_elems,
+  unsigned int mode);
 
 /**
  * qht_destroy - destroy a previously initialized QHT
@@ -78,11 +83,11 @@ void qht_destroy(struct qht *ht);
 bool qht_insert(struct qht *ht, void *p, uint32_t hash);
 
 /**
- * qht_lookup - Look up a pointer in a QHT
+ * qht_lookup_custom - Look up a pointer using a custom comparison function.
  * @ht: QHT to be looked up
- * @func: function to compare existing pointers against @userp
  * @userp: pointer to pass to @func
  * @hash: hash of the pointer to be looked up
+ * @func: function to compare existing pointers against @userp
  *
  * Needs to be called under an RCU read-critical section.
  *
@@ -94,8 +99,18 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
  * Returns the corresponding pointer when a match is found.
  * Returns NULL otherwise.
  */
-void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
- uint32_t hash);
+void *qht_lookup_custom(struct qht *ht, const void *userp, uint32_t hash,
+qht_lookup_func_t func);
+
+/**
+ * qht_lookup - Look up a pointer in a QHT
+ * @ht: QHT to be looked up
+ * @userp: pointer to pass to the comparison function
+ * @hash: hash of the pointer to be looked up
+ *
+ * Calls qht_lookup_custom() using @ht's default comparison function.
+ */
+void *qht_lookup(struct qht *ht, const void *userp, uint32_t hash);
 
 /**
  * qht_remove - remove a pointer from the hash table
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 0b154cc..aefc682 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -296,7 +296,7 @@ struct tb_desc {
 uint32_t trace_vcpu_dstate;
 };
 
-static bool tb_cmp(const void *p, const void *d)
+static bool tb_lookup_cmp(const void *p, const void *d)
 {
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
@@ -341,7 +341,7 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
 h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
-return qht_lookup(&tb_ctx.htable, tb_cmp, &desc, h);
+return qht_lookup_custom(&tb_ctx.htable, &desc, h, tb_lookup_cmp);
 }
 
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 732c919..5b7b91d 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -782,11 +782,25 @@ static inline void code_gen_alloc(size_t tb_size)
 qemu_mutex_init(&tb_ctx.tb_lock);
 }
 
+static bool tb_cmp(const void *ap, const void *bp)
+{
+const TranslationBlock *a = ap;
+const TranslationBlock *b = bp;
+
+return 

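The split between a stored default comparator and a per-call custom one can be sketched with a toy table (one slot per bucket; all `ht_*` names here are illustrative, not QEMU's qht API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16

typedef bool (*ht_cmp_func_t)(const void *obj, const void *userp);

/* Toy table: one slot per bucket, default comparison captured at init. */
struct ht {
    void *slot[NBUCKETS];
    ht_cmp_func_t cmp;
};

static void ht_init(struct ht *ht, ht_cmp_func_t cmp)
{
    for (int i = 0; i < NBUCKETS; i++) {
        ht->slot[i] = NULL;
    }
    ht->cmp = cmp;
}

static void ht_insert(struct ht *ht, void *p, uint32_t hash)
{
    ht->slot[hash % NBUCKETS] = p;
}

/* Flexible entry point: caller supplies the comparison function. */
static void *ht_lookup_custom(struct ht *ht, const void *userp, uint32_t hash,
                              ht_cmp_func_t func)
{
    void *p = ht->slot[hash % NBUCKETS];

    return (p && func(p, userp)) ? p : NULL;
}

/* Common case: reuse the default comparison given at init time. */
static void *ht_lookup(struct ht *ht, const void *userp, uint32_t hash)
{
    return ht_lookup_custom(ht, userp, hash, ht->cmp);
}

static bool int_eq(const void *obj, const void *userp)
{
    return *(const int *)obj == *(const int *)userp;
}

/* Returns 1 when both lookup paths behave as expected. */
static int ht_demo(void)
{
    static int v = 42;
    struct ht t;
    int hit = 42, miss = 7;

    ht_init(&t, int_eq);
    ht_insert(&t, &v, 5);
    return ht_lookup(&t, &hit, 5) == &v &&
           ht_lookup_custom(&t, &miss, 5, int_eq) == NULL;
}
```

The payoff is at the call sites: the common case (`ht_lookup`) no longer repeats the comparator at every call, while the TB lookup, which compares against a stack-local descriptor, keeps the custom path.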
[Qemu-devel] [PATCH v3 04/17] tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx

2018-05-21 Thread Emilio G. Cota
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/tb-context.h |  1 -
 tcg/tcg.h |  3 +++
 accel/tcg/translate-all.c |  5 +++--
 tcg/tcg.c | 14 ++
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index d8472c8..8c9b49c 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -37,7 +37,6 @@ struct TBContext {
 
 /* statistics */
 unsigned tb_flush_count;
-int tb_phys_invalidate_count;
 };
 
 extern TBContext tb_ctx;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index afe8492..ec8027b 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -695,6 +695,8 @@ struct TCGContext {
 /* Threshold to flush the translated code buffer.  */
 void *code_gen_highwater;
 
+size_t tb_phys_invalidate_count;
+
 /* Track which vCPU triggers events */
 CPUState *cpu;  /* *_trans */
 
@@ -868,6 +870,7 @@ size_t tcg_code_capacity(void);
 
 void tcg_tb_insert(TranslationBlock *tb);
 void tcg_tb_remove(TranslationBlock *tb);
+size_t tcg_tb_phys_invalidate_count(void);
 TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
 void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
 size_t tcg_nb_tbs(void);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e9341f3..8caf28d 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1069,7 +1069,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 /* suppress any remaining jumps to this TB */
 tb_jmp_unlink(tb);
 
-tb_ctx.tb_phys_invalidate_count++;
+atomic_set(&tcg_ctx->tb_phys_invalidate_count,
+   tcg_ctx->tb_phys_invalidate_count + 1);
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1855,7 +1856,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 cpu_fprintf(f, "\nStatistics:\n");
 cpu_fprintf(f, "TB flush count  %u\n",
atomic_read(&tb_ctx.tb_flush_count));
-cpu_fprintf(f, "TB invalidate count %d\n", tb_ctx.tb_phys_invalidate_count);
+cpu_fprintf(f, "TB invalidate count %zu\n", tcg_tb_phys_invalidate_count());
 cpu_fprintf(f, "TLB flush count %zu\n", tlb_flush_count());
 tcg_dump_info(f, cpu_fprintf);
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 62e3391..1d1dfd7 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -791,6 +791,20 @@ size_t tcg_code_capacity(void)
 return capacity;
 }
 
+size_t tcg_tb_phys_invalidate_count(void)
+{
+unsigned int n_ctxs = atomic_read(&n_tcg_ctxs);
+unsigned int i;
+size_t total = 0;
+
+for (i = 0; i < n_ctxs; i++) {
+const TCGContext *s = atomic_read(&tcg_ctxs[i]);
+
+total += atomic_read(&s->tb_phys_invalidate_count);
+}
+return total;
+}
+
 /* pool based memory allocation */
 void *tcg_malloc_internal(TCGContext *s, int size)
 {
-- 
2.7.4
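The aggregation pattern in `tcg_tb_phys_invalidate_count()` — each context bumps only its own counter, readers sum across all of them — can be sketched with C11 atomics in place of QEMU's `atomic_read`/`atomic_set` wrappers (struct and function names here are illustrative):

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CTXS 4

/* Stand-in for TCGContext: only the owning thread writes its counter,
 * so a relaxed load + store suffices; no atomic RMW is needed. */
struct ctx {
    _Atomic size_t tb_phys_invalidate_count;
};

static struct ctx ctxs[MAX_CTXS];

static void ctx_record_invalidate(struct ctx *s)
{
    size_t cur = atomic_load_explicit(&s->tb_phys_invalidate_count,
                                      memory_order_relaxed);
    atomic_store_explicit(&s->tb_phys_invalidate_count, cur + 1,
                          memory_order_relaxed);
}

/* Readers sum over all contexts; the result may be momentarily stale,
 * which is acceptable for a statistics counter. */
static size_t total_invalidate_count(void)
{
    size_t total = 0;

    for (int i = 0; i < MAX_CTXS; i++) {
        total += atomic_load_explicit(&ctxs[i].tb_phys_invalidate_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

This trades a cheap per-invalidation write for a slightly more expensive read in `dump_exec_info()`, which is the right trade since invalidations are frequent and the statistics dump is rare.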




[Qemu-devel] [PATCH v3 06/17] translate-all: make l1_map lockless

2018-05-21 Thread Emilio G. Cota
Groundwork for supporting parallel TCG generation.

We never remove entries from the radix tree, so we can use cmpxchg
to implement lockless insertions.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |  4 ++--
 accel/tcg/translate-all.c   | 24 ++--
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index a99b456..faf8918 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -134,8 +134,8 @@ tb_set_jmp_target() code. Modification to the linked lists 
that allow
 searching for linked pages are done under the protect of the
 tb_lock().
 
-The global page table is protected by the tb_lock() in system-mode and
-mmap_lock() in linux-user mode.
+The global page table is a lockless radix tree; cmpxchg is used
+to atomically insert new elements.
 
 The lookup caches are updated atomically and the lookup hash uses QHT
 which is designed for concurrent safe lookup.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7302d05..38e712d 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -469,20 +469,12 @@ static void page_init(void)
 #endif
 }
 
-/* If alloc=1:
- * Called with tb_lock held for system emulation.
- * Called with mmap_lock held for user-mode emulation.
- */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 {
 PageDesc *pd;
 void **lp;
 int i;
 
-if (alloc) {
-assert_memory_lock();
-}
-
 /* Level 1.  Always allocated.  */
 lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
 
@@ -491,11 +483,17 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 void **p = atomic_rcu_read(lp);
 
 if (p == NULL) {
+void *existing;
+
 if (!alloc) {
 return NULL;
 }
 p = g_new0(void *, V_L2_SIZE);
-atomic_rcu_set(lp, p);
+existing = atomic_cmpxchg(lp, NULL, p);
+if (unlikely(existing)) {
+g_free(p);
+p = existing;
+}
 }
 
 lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
@@ -503,11 +501,17 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 
 pd = atomic_rcu_read(lp);
 if (pd == NULL) {
+void *existing;
+
 if (!alloc) {
 return NULL;
 }
 pd = g_new0(PageDesc, V_L2_SIZE);
-atomic_rcu_set(lp, pd);
+existing = atomic_cmpxchg(lp, NULL, pd);
+if (unlikely(existing)) {
+g_free(pd);
+pd = existing;
+}
 }
 
 return pd + (index & (V_L2_SIZE - 1));
-- 
2.7.4
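The allocate-then-publish idiom used in `page_find_alloc()` can be sketched with C11 atomics (a generic helper under assumed names, not QEMU's code):

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Lockless lazy allocation of a radix-tree slot: allocate, publish with
 * compare-exchange, and on a lost race free our copy and adopt the
 * winner's. This is only safe because entries are never removed from
 * the tree, so a published pointer stays valid forever. */
static void *slot_get_alloc(_Atomic(void *) *lp, size_t size)
{
    void *p = atomic_load(lp);

    if (p == NULL) {
        void *expected = NULL;

        p = calloc(1, size);
        if (!atomic_compare_exchange_strong(lp, &expected, p)) {
            /* Another thread installed a table first; use theirs. */
            free(p);
            p = expected;
        }
    }
    return p;
}

static _Atomic(void *) demo_slot;
```

Note the asymmetry with the locked version: instead of guaranteeing a single allocator, each racing thread may allocate, but exactly one allocation wins and the losers discard theirs — an occasional wasted `calloc` in exchange for dropping the lock entirely.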




[Qemu-devel] [PATCH v3 12/17] translate-all: introduce assert_no_pages_locked

2018-05-21 Thread Emilio G. Cota
The appended patch adds assertions to make sure we do not longjmp with page
locks held. Note that user-mode has nothing to check, since page locks
are !user-mode only.

Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   | 8 
 accel/tcg/cpu-exec.c  | 1 +
 accel/tcg/translate-all.c | 7 +++
 3 files changed, 16 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3fad93b..66902f7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -434,6 +434,14 @@ void tb_lock(void);
 void tb_unlock(void);
 void tb_lock_reset(void);
 
+#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
+void assert_no_pages_locked(void);
+#else
+static inline void assert_no_pages_locked(void)
+{
+}
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 
 struct MemoryRegion *iotlb_to_region(CPUState *cpu,
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 7b934a6..ad1f0c4 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -274,6 +274,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 tcg_debug_assert(!have_mmap_lock());
 #endif
 tb_lock_reset();
+assert_no_pages_locked();
 }
 
 if (in_exclusive_region) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8286203..f3a0ecb 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -658,6 +658,12 @@ do_assert_page_locked(const PageDesc *pd, const char *file, int line)
 
 #define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
 
+void assert_no_pages_locked(void)
+{
+ht_pages_locked_debug_init();
+g_assert(g_hash_table_size(ht_pages_locked_debug) == 0);
+}
+
 #else /* !CONFIG_DEBUG_TCG */
 
 #define assert_page_locked(pd)
@@ -828,6 +834,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t end)
 set->tree = g_tree_new_full(tb_page_addr_cmp, NULL, NULL,
 page_entry_destroy);
 set->max = NULL;
+assert_no_pages_locked();
 
  retry:
 g_tree_foreach(set->tree, page_entry_lock, NULL);
-- 
2.7.4
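The debug-only bookkeeping behind `assert_no_pages_locked()` can be modelled with a fixed thread-local array standing in for the per-thread GHashTable the patch uses (all names below are illustrative, not QEMU's):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 64

/* Which pages this thread currently holds locked. Thread-local, so no
 * synchronization is needed for the tracking itself. */
static __thread const void *locked_pages[MAX_TRACKED];
static __thread int nr_locked;

static void track_page_lock(const void *pd)
{
    assert(nr_locked < MAX_TRACKED);
    locked_pages[nr_locked++] = pd;
}

static void track_page_unlock(const void *pd)
{
    for (int i = 0; i < nr_locked; i++) {
        if (locked_pages[i] == pd) {
            /* swap-remove: order of tracked pages does not matter */
            locked_pages[i] = locked_pages[--nr_locked];
            return;
        }
    }
    assert(!"unlocking a page that was never locked");
}

/* The longjmp-safety check: call on exception paths (and before taking
 * a fresh page_collection) to catch leaked page locks. */
static void check_no_pages_locked(void)
{
    assert(nr_locked == 0);
}
```

The hook points mirror the patch: every page lock/unlock updates the set, and the check runs wherever control can unwind abruptly, such as `cpu_exec_step_atomic()`'s error path.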




[Qemu-devel] [PATCH v3 07/17] translate-all: remove hole in PageDesc

2018-05-21 Thread Emilio G. Cota
Groundwork for supporting parallel TCG generation.

Move the hole to the end of the struct, so that a u32
field can be added there without bloating the struct.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 38e712d..07674e4 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -107,8 +107,8 @@ typedef struct PageDesc {
 #ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
-unsigned int code_write_count;
 unsigned long *code_bitmap;
+unsigned int code_write_count;
 #else
 unsigned long flags;
 #endif
-- 
2.7.4
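The effect of moving the hole can be seen with two toy structs mirroring the PageDesc layout before and after the patch, under the usual LP64 assumption (8-byte pointers, 4-byte `unsigned int`; field names are illustrative):

```c
#include <stddef.h>

/* u32 before the pointer: the compiler inserts 4 bytes of padding to
 * align the pointer, and a u32 added later pads again at the tail. */
struct page_hole_mid {
    unsigned int code_write_count;   /* 4 bytes + 4 bytes padding */
    unsigned long *code_bitmap;      /* 8 bytes */
    unsigned int added_later;        /* 4 bytes + 4 bytes tail padding */
};

/* Hole moved to the end: the two u32s pack together after the pointer,
 * so adding the second one does not grow the struct at all. */
struct page_hole_end {
    unsigned long *code_bitmap;
    unsigned int code_write_count;
    unsigned int added_later;
};
```

On a typical LP64 target the first layout is 24 bytes and the second 16, which is exactly why the patch reorders the fields before the follow-up patch adds its u32.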




[Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3

2018-05-21 Thread Emilio G. Cota
v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html

Changes since v2:

- rebase onto master, fixing conflicts

- add R-b's

- add a missing page_lock to page_collection_lock

- add a couple of missing assert_page_locked assertions

- add page_lock_pair, as suggested by Alex and Richard

- use a per-thread GHashTable to keep track of locked pages

- get rid of page_collection assertions, and just export
  assert_no_pages_locked() [Alex: I removed your R-b.]

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v2 1/1] tests/docker: Add an Avocado Docker test

2018-05-21 Thread Philippe Mathieu-Daudé
On 05/21/2018 07:33 PM, Alistair Francis wrote:
> On Sun, May 20, 2018 at 8:16 PM, Fam Zheng  wrote:
>> On Fri, 05/18 11:34, Alistair Francis wrote:
>>> Avocado is not trivial to set up on non-Fedora systems. To simplify
>>> future testing, add a docker test image that runs Avocado tests.
>>>
>>> Signed-off-by: Alistair Francis 
>>> ---
>>> v2:
>>>  - Add a seperate fedora-avocado Docker image
>>>  - Move the avocado vt-bootstrap into the Docker file
>>>
>>>  tests/docker/Makefile.include |  1 +
>>>  .../docker/dockerfiles/fedora-avocado.docker  | 25 +
>>>  tests/docker/test-avocado | 28 +++
>>>  3 files changed, 54 insertions(+)
>>>  create mode 100644 tests/docker/dockerfiles/fedora-avocado.docker
>>>  create mode 100755 tests/docker/test-avocado
>>>
>>> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
>>> index ef1a3e62eb..0e3d108dde 100644
>>> --- a/tests/docker/Makefile.include
>>> +++ b/tests/docker/Makefile.include
>>> @@ -60,6 +60,7 @@ docker-image-debian-ppc64el-cross: docker-image-debian9
>>>  docker-image-debian-s390x-cross: docker-image-debian9
>>>  docker-image-debian-win32-cross: docker-image-debian8-mxe
>>>  docker-image-debian-win64-cross: docker-image-debian8-mxe
>>> +docker-image-fedora-avocado: docker-image-fedora
>>>  docker-image-travis: NOUSER=1
>>>
>>>  # Expand all the pre-requistes for each docker image and test combination
>>> diff --git a/tests/docker/dockerfiles/fedora-avocado.docker 
>>> b/tests/docker/dockerfiles/fedora-avocado.docker
>>> new file mode 100644
>>> index 00..55b19eebbf
>>> --- /dev/null
>>> +++ b/tests/docker/dockerfiles/fedora-avocado.docker
>>> @@ -0,0 +1,25 @@
>>> +FROM qemu:fedora
>>> +
>>> +ENV PACKAGES \
>>> +libvirt-devel \
>>> +nc \
>>> +python-avocado \
>>> +python2-devel python3-devel \
>>> +qemu-kvm \
>>> +tcpdump \
>>> +xz
>>> +ENV PIP_PACKAGES \
>>> +avocado-qemu \
>>> +avocado-framework-plugin-runner-remote \
>>> +avocado-framework-plugin-runner-vm \
>>> +avocado-framework-plugin-vt
>>> +
>>> +ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
>>
>> I think this is inherited from qemu:fedora, no?
> 
> It is, I have removed it.
> 
>>
>>> +
>>> +RUN dnf install -y $PACKAGES
>>> +RUN pip install $PIP_PACKAGES
>>> +RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
>>> +
>>> +RUN rpm -q $PACKAGES | sort > /packages.txt
>>
>> Can you keep the parent image's list with ">>" or appending to the old 
>> $PACKAGES
>> in the above "ENV" directive?
>>
>>> +
>>> +ENV FEATURES mingw clang pyyaml asan avocado
>>
>> Similarly, is it possible to append to the parent list instead of overriding?
> 
> I have changed both of these to append to the original variables.

Feel free to add my R-b then:
Reviewed-by: Philippe Mathieu-Daudé 

I'll wait your respin to try again, hoping I got my proxy issues solved.

>>
>>> diff --git a/tests/docker/test-avocado b/tests/docker/test-avocado
>>> new file mode 100755
>>> index 00..40474db2ce
>>> --- /dev/null
>>> +++ b/tests/docker/test-avocado
>>> @@ -0,0 +1,28 @@
>>> +#!/bin/bash -e
>>> +#
>>> +# Avocado tests on Fedora, as these are a real pain on Debian systems
>>
>> Shouldn't pip packages work just as well on Debian too? What is the pain?
>> (Cc'ing Cleber who may want to know this).
> 
> There is no debian package at the moment.
> 
> Alistair
> 
>>
>> Fam
>>
>>> +#
>>> +# Copyright (c) 2018 Western Digital.
>>> +#
>>> +# Authors:
>>> +#  Alistair Francis 
>>> +#
>>> +# This work is licensed under the terms of the GNU GPL, version 2
>>> +# or (at your option) any later version. See the COPYING file in
>>> +# the top-level directory.
>>> +#
>>> +# Run this test: NOUSER=1 make docker-test-avocado@fedora-avocado
>>> +
>>> +. common.rc
>>> +
>>> +requires avocado
>>> +
>>> +cd "$BUILD_DIR"
>>> +
>>> +DEF_TARGET_LIST="x86_64-softmmu"
>>> +TARGET_LIST=${TARGET_LIST:-$DEF_TARGET_LIST} \
>>> +build_qemu
>>> +install_qemu
>>> +
>>> +export PATH="${PATH}:$(pwd)"
>>> +avocado run boot --vt-qemu-bin ./x86_64-softmmu/qemu-system-x86_64
>>> --
>>> 2.17.0
>>>



Re: [Qemu-devel] [PATCH v2 1/1] tests/docker: Add an Avocado Docker test

2018-05-21 Thread Philippe Mathieu-Daudé
On 05/21/2018 07:37 PM, Alistair Francis wrote:
> On Mon, May 21, 2018 at 10:26 AM, Philippe Mathieu-Daudé
>  wrote:
>> Hi Alistair, Fam,
>>
>> On 05/21/2018 12:16 AM, Fam Zheng wrote:
>>> On Fri, 05/18 11:34, Alistair Francis wrote:
 Avocado is not trivial to set up on non-Fedora systems. To simplify
 future testing, add a docker test image that runs Avocado tests.
>>
>> Can you add an entry in the "make docker" help menu?
> 
> The one in tests/docker/Makefile.include? It seems like it's mostly
> auto generated. What do you think I should add?

Ah you right, that's fine then:

$ make docker
[...]
Available tests:
test-block test-debug test-clang test-build test-full test-avocado
test-mingw test-quick

> 
>>

 Signed-off-by: Alistair Francis 
 ---
 v2:
  - Add a seperate fedora-avocado Docker image
  - Move the avocado vt-bootstrap into the Docker file

  tests/docker/Makefile.include |  1 +
  .../docker/dockerfiles/fedora-avocado.docker  | 25 +
  tests/docker/test-avocado | 28 +++
  3 files changed, 54 insertions(+)
  create mode 100644 tests/docker/dockerfiles/fedora-avocado.docker
  create mode 100755 tests/docker/test-avocado

 diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
 index ef1a3e62eb..0e3d108dde 100644
 --- a/tests/docker/Makefile.include
 +++ b/tests/docker/Makefile.include
 @@ -60,6 +60,7 @@ docker-image-debian-ppc64el-cross: docker-image-debian9
  docker-image-debian-s390x-cross: docker-image-debian9
  docker-image-debian-win32-cross: docker-image-debian8-mxe
  docker-image-debian-win64-cross: docker-image-debian8-mxe
 +docker-image-fedora-avocado: docker-image-fedora
  docker-image-travis: NOUSER=1

  # Expand all the pre-requistes for each docker image and test combination
 diff --git a/tests/docker/dockerfiles/fedora-avocado.docker 
 b/tests/docker/dockerfiles/fedora-avocado.docker
 new file mode 100644
 index 00..55b19eebbf
 --- /dev/null
 +++ b/tests/docker/dockerfiles/fedora-avocado.docker
 @@ -0,0 +1,25 @@
 +FROM qemu:fedora
 +
 +ENV PACKAGES \
 +libvirt-devel \
 +nc \
 +python-avocado \
 +python2-devel python3-devel \
 +qemu-kvm \
 +tcpdump \
 +xz
 +ENV PIP_PACKAGES \
 +avocado-qemu \
 +avocado-framework-plugin-runner-remote \
 +avocado-framework-plugin-runner-vm \
 +avocado-framework-plugin-vt
 +
 +ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
>>>
>>> I think this is inherited from qemu:fedora, no?
>>
>> Yes.
>>
>>>
 +
 +RUN dnf install -y $PACKAGES
 +RUN pip install $PIP_PACKAGES
 +RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
 +
 +RUN rpm -q $PACKAGES | sort > /packages.txt
>>>
>>> Can you keep the parent image's list with ">>" or appending to the old 
>>> $PACKAGES
>>> in the above "ENV" directive?
>>
>> Appending looks cleaner to me.
>>
>>>
 +
 +ENV FEATURES mingw clang pyyaml asan avocado
>>>
>>> Similarly, is it possible to append to the parent list instead of 
>>> overriding?
>>>
 diff --git a/tests/docker/test-avocado b/tests/docker/test-avocado
 new file mode 100755
 index 00..40474db2ce
 --- /dev/null
 +++ b/tests/docker/test-avocado
 @@ -0,0 +1,28 @@
 +#!/bin/bash -e
 +#
 +# Avocado tests on Fedora, as these are a real pain on Debian systems
>>>
>>> Shouldn't pip packages work just as well on Debian too? What is the pain?
>>> (Cc'ing Cleber who may want to know this).
>>
>> Avocado isn't packaged (yet?) on Debian.
>>
>>>
>>> Fam
>>>
 +#
 +# Copyright (c) 2018 Western Digital.
 +#
 +# Authors:
 +#  Alistair Francis 
 +#
 +# This work is licensed under the terms of the GNU GPL, version 2
 +# or (at your option) any later version. See the COPYING file in
 +# the top-level directory.
 +#
 +# Run this test: NOUSER=1 make docker-test-avocado@fedora-avocado
 +
 +. common.rc
 +
 +requires avocado
 +
 +cd "$BUILD_DIR"
 +
 +DEF_TARGET_LIST="x86_64-softmmu"
 +TARGET_LIST=${TARGET_LIST:-$DEF_TARGET_LIST} \
 +build_qemu
 +install_qemu
 +
 +export PATH="${PATH}:$(pwd)"
 +avocado run boot --vt-qemu-bin ./x86_64-softmmu/qemu-system-x86_64
>>
>> This failed when testing (I suppose due to too old corporate proxy...):
>>
>> Step 7/11 : RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
>>  ---> Running in 008e494971c7
>> [...]
>> 8 - Verifying (and possibly downloading) guest image
>> Verifying expected SHA1 sum from
>> http://avocado-project.org/data/assets/jeos/27/SHA1SUM_JEOS_27_64
>> Failed to get SHA1 from file: HTTP Error 403: Forbidden file type or
>> location: 

[Qemu-devel] Virtio-net drivers immune to Nethammer?

2018-05-21 Thread procmem


Hi, I'm a privacy distro maintainer investigating the implications of the
newly published Nethammer attack [0] on KVM guests, particularly the
virtio-net drivers. The summary of the paper is that rowhammer can be
triggered remotely by feeding a susceptible* network driver crafted
traffic. This attack can do all kinds of nasty things, such as modifying
SSL certs on the victim system.

* Susceptible drivers are those relying on Intel CAT, uncached memory or
the clflush instruction.

My question is, do virtio-net drivers do any of these things?

***

[0] https://arxiv.org/abs/1805.04956





Re: [Qemu-devel] [PATCH v2 1/1] tests/docker: Add an Avocado Docker test

2018-05-21 Thread Alistair Francis
On Mon, May 21, 2018 at 10:26 AM, Philippe Mathieu-Daudé
 wrote:
> Hi Alistair, Fam,
>
> On 05/21/2018 12:16 AM, Fam Zheng wrote:
>> On Fri, 05/18 11:34, Alistair Francis wrote:
>>> Avocado is not trivial to set up on non-Fedora systems. To simplify
>>> future testing, add a docker test image that runs Avocado tests.
>
> Can you add an entry in the "make docker" help menu?

The one in tests/docker/Makefile.include? It seems like it's mostly
auto generated. What do you think I should add?

>
>>>
>>> Signed-off-by: Alistair Francis 
>>> ---
>>> v2:
>>>  - Add a seperate fedora-avocado Docker image
>>>  - Move the avocado vt-bootstrap into the Docker file
>>>
>>>  tests/docker/Makefile.include |  1 +
>>>  .../docker/dockerfiles/fedora-avocado.docker  | 25 +
>>>  tests/docker/test-avocado | 28 +++
>>>  3 files changed, 54 insertions(+)
>>>  create mode 100644 tests/docker/dockerfiles/fedora-avocado.docker
>>>  create mode 100755 tests/docker/test-avocado
>>>
>>> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
>>> index ef1a3e62eb..0e3d108dde 100644
>>> --- a/tests/docker/Makefile.include
>>> +++ b/tests/docker/Makefile.include
>>> @@ -60,6 +60,7 @@ docker-image-debian-ppc64el-cross: docker-image-debian9
>>>  docker-image-debian-s390x-cross: docker-image-debian9
>>>  docker-image-debian-win32-cross: docker-image-debian8-mxe
>>>  docker-image-debian-win64-cross: docker-image-debian8-mxe
>>> +docker-image-fedora-avocado: docker-image-fedora
>>>  docker-image-travis: NOUSER=1
>>>
>>>  # Expand all the pre-requistes for each docker image and test combination
>>> diff --git a/tests/docker/dockerfiles/fedora-avocado.docker 
>>> b/tests/docker/dockerfiles/fedora-avocado.docker
>>> new file mode 100644
>>> index 00..55b19eebbf
>>> --- /dev/null
>>> +++ b/tests/docker/dockerfiles/fedora-avocado.docker
>>> @@ -0,0 +1,25 @@
>>> +FROM qemu:fedora
>>> +
>>> +ENV PACKAGES \
>>> +libvirt-devel \
>>> +nc \
>>> +python-avocado \
>>> +python2-devel python3-devel \
>>> +qemu-kvm \
>>> +tcpdump \
>>> +xz
>>> +ENV PIP_PACKAGES \
>>> +avocado-qemu \
>>> +avocado-framework-plugin-runner-remote \
>>> +avocado-framework-plugin-runner-vm \
>>> +avocado-framework-plugin-vt
>>> +
>>> +ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
>>
>> I think this is inherited from qemu:fedora, no?
>
> Yes.
>
>>
>>> +
>>> +RUN dnf install -y $PACKAGES
>>> +RUN pip install $PIP_PACKAGES
>>> +RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
>>> +
>>> +RUN rpm -q $PACKAGES | sort > /packages.txt
>>
>> Can you keep the parent image's list with ">>" or appending to the old 
>> $PACKAGES
>> in the above "ENV" directive?
>
> Appending looks cleaner to me.
>
>>
>>> +
>>> +ENV FEATURES mingw clang pyyaml asan avocado
>>
>> Similarly, is it possible to append to the parent list instead of overriding?
>>
>>> diff --git a/tests/docker/test-avocado b/tests/docker/test-avocado
>>> new file mode 100755
>>> index 00..40474db2ce
>>> --- /dev/null
>>> +++ b/tests/docker/test-avocado
>>> @@ -0,0 +1,28 @@
>>> +#!/bin/bash -e
>>> +#
>>> +# Avocado tests on Fedora, as these are a real pain on Debian systems
>>
>> Shouldn't pip packages work just as well on Debian too? What is the pain?
>> (Cc'ing Cleber who may want to know this).
>
> Avocado isn't packaged (yet?) on Debian.
>
>>
>> Fam
>>
>>> +#
>>> +# Copyright (c) 2018 Western Digital.
>>> +#
>>> +# Authors:
>>> +#  Alistair Francis 
>>> +#
>>> +# This work is licensed under the terms of the GNU GPL, version 2
>>> +# or (at your option) any later version. See the COPYING file in
>>> +# the top-level directory.
>>> +#
>>> +# Run this test: NOUSER=1 make docker-test-avocado@fedora-avocado
>>> +
>>> +. common.rc
>>> +
>>> +requires avocado
>>> +
>>> +cd "$BUILD_DIR"
>>> +
>>> +DEF_TARGET_LIST="x86_64-softmmu"
>>> +TARGET_LIST=${TARGET_LIST:-$DEF_TARGET_LIST} \
>>> +build_qemu
>>> +install_qemu
>>> +
>>> +export PATH="${PATH}:$(pwd)"
>>> +avocado run boot --vt-qemu-bin ./x86_64-softmmu/qemu-system-x86_64
>
> This failed when testing (I suppose due to too old corporate proxy...):
>
> Step 7/11 : RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
>  ---> Running in 008e494971c7
> [...]
> 8 - Verifying (and possibly downloading) guest image
> Verifying expected SHA1 sum from
> http://avocado-project.org/data/assets/jeos/27/SHA1SUM_JEOS_27_64
> Failed to get SHA1 from file: HTTP Error 403: Forbidden file type or
> location: http://avocado-project.org/data/assets/jeos/27/SHA1SUM_JEOS_27_64
> File /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2.xz not
> found
> Check your internet connection: HTTP Error 403: Forbidden file type or
> location: http://avocado-project.org/data/assets/jeos/27/jeos-27-64.qcow2.xz
> Uncompressing
> /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2.xz ->

Re: [Qemu-devel] [PATCH v2 1/1] tests/docker: Add an Avocado Docker test

2018-05-21 Thread Alistair Francis
On Sun, May 20, 2018 at 8:16 PM, Fam Zheng  wrote:
> On Fri, 05/18 11:34, Alistair Francis wrote:
>> Avocado is not trivial to set up on non-Fedora systems. To simplify
>> future testing, add a docker test image that runs Avocado tests.
>>
>> Signed-off-by: Alistair Francis 
>> ---
>> v2:
>>  - Add a seperate fedora-avocado Docker image
>>  - Move the avocado vt-bootstrap into the Docker file
>>
>>  tests/docker/Makefile.include |  1 +
>>  .../docker/dockerfiles/fedora-avocado.docker  | 25 +
>>  tests/docker/test-avocado | 28 +++
>>  3 files changed, 54 insertions(+)
>>  create mode 100644 tests/docker/dockerfiles/fedora-avocado.docker
>>  create mode 100755 tests/docker/test-avocado
>>
>> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
>> index ef1a3e62eb..0e3d108dde 100644
>> --- a/tests/docker/Makefile.include
>> +++ b/tests/docker/Makefile.include
>> @@ -60,6 +60,7 @@ docker-image-debian-ppc64el-cross: docker-image-debian9
>>  docker-image-debian-s390x-cross: docker-image-debian9
>>  docker-image-debian-win32-cross: docker-image-debian8-mxe
>>  docker-image-debian-win64-cross: docker-image-debian8-mxe
>> +docker-image-fedora-avocado: docker-image-fedora
>>  docker-image-travis: NOUSER=1
>>
>>  # Expand all the pre-requistes for each docker image and test combination
>> diff --git a/tests/docker/dockerfiles/fedora-avocado.docker 
>> b/tests/docker/dockerfiles/fedora-avocado.docker
>> new file mode 100644
>> index 00..55b19eebbf
>> --- /dev/null
>> +++ b/tests/docker/dockerfiles/fedora-avocado.docker
>> @@ -0,0 +1,25 @@
>> +FROM qemu:fedora
>> +
>> +ENV PACKAGES \
>> +libvirt-devel \
>> +nc \
>> +python-avocado \
>> +python2-devel python3-devel \
>> +qemu-kvm \
>> +tcpdump \
>> +xz
>> +ENV PIP_PACKAGES \
>> +avocado-qemu \
>> +avocado-framework-plugin-runner-remote \
>> +avocado-framework-plugin-runner-vm \
>> +avocado-framework-plugin-vt
>> +
>> +ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
>
> I think this is inherited from qemu:fedora, no?

It is, I have removed it.

>
>> +
>> +RUN dnf install -y $PACKAGES
>> +RUN pip install $PIP_PACKAGES
>> +RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
>> +
>> +RUN rpm -q $PACKAGES | sort > /packages.txt
>
> Can you keep the parent image's list with ">>" or appending to the old 
> $PACKAGES
> in the above "ENV" directive?
>
>> +
>> +ENV FEATURES mingw clang pyyaml asan avocado
>
> Similarly, is it possible to append to the parent list instead of overriding?

I have changed both of these to append to the original variables.

>
>> diff --git a/tests/docker/test-avocado b/tests/docker/test-avocado
>> new file mode 100755
>> index 00..40474db2ce
>> --- /dev/null
>> +++ b/tests/docker/test-avocado
>> @@ -0,0 +1,28 @@
>> +#!/bin/bash -e
>> +#
>> +# Avocado tests on Fedora, as these are a real pain on Debian systems
>
> Shouldn't pip packages work just as well on Debian too? What is the pain?
> (Cc'ing Cleber who may want to know this).

There is no debian package at the moment.

Alistair

>
> Fam
>
>> +#
>> +# Copyright (c) 2018 Western Digital.
>> +#
>> +# Authors:
>> +#  Alistair Francis 
>> +#
>> +# This work is licensed under the terms of the GNU GPL, version 2
>> +# or (at your option) any later version. See the COPYING file in
>> +# the top-level directory.
>> +#
>> +# Run this test: NOUSER=1 make docker-test-avocado@fedora-avocado
>> +
>> +. common.rc
>> +
>> +requires avocado
>> +
>> +cd "$BUILD_DIR"
>> +
>> +DEF_TARGET_LIST="x86_64-softmmu"
>> +TARGET_LIST=${TARGET_LIST:-$DEF_TARGET_LIST} \
>> +build_qemu
>> +install_qemu
>> +
>> +export PATH="${PATH}:$(pwd)"
>> +avocado run boot --vt-qemu-bin ./x86_64-softmmu/qemu-system-x86_64
>> --
>> 2.17.0
>>



[Qemu-devel] [PULL 1/3] i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)

2018-05-21 Thread Eduardo Habkost
From: Daniel P. Berrangé 

New microcode introduces the "Speculative Store Bypass Disable"
CPUID feature bit. This needs to be exposed to guest OS to allow
them to protect against CVE-2018-3639.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Konrad Rzeszutek Wilk 
Message-Id: <20180521215424.13520-2-berra...@redhat.com>
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.h | 1 +
 target/i386/cpu.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8bc54d70bf..f0b68905de 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -685,6 +685,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural Network Instructions */
 #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control */
+#define CPUID_7_0_EDX_SPEC_CTRL_SSBD  (1U << 31) /* Speculative Store Bypass Disable */
 
 #define KVM_HINTS_DEDICATED (1U << 0)
 
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e5e66a75d4..a1185b17d1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -805,7 +805,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, "spec-ctrl", NULL,
-NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, "ssbd",
 },
 .cpuid_eax = 7,
 .cpuid_needs_ecx = true, .cpuid_ecx = 0,
-- 
2.14.3
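The new bit can be checked the same way guests will check it: mask EDX of CPUID leaf 7, subleaf 0. A minimal sketch using the two feature-bit constants from the patch (the `edx_has_ssbd` helper is illustrative, not QEMU code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID.(EAX=7,ECX=0):EDX, as defined in the patch. */
#define CPUID_7_0_EDX_SPEC_CTRL       (1U << 26)
#define CPUID_7_0_EDX_SPEC_CTRL_SSBD  (1U << 31)

/* True when a guest-visible EDX value advertises SSBD. */
static bool edx_has_ssbd(uint32_t edx)
{
    return (edx & CPUID_7_0_EDX_SPEC_CTRL_SSBD) != 0;
}
```

Since bit 31 is the sign bit of a 32-bit value, `1U` (not plain `1`) matters here: `1 << 31` would be undefined behaviour on a 32-bit `int`.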




[Qemu-devel] [PULL 3/3] i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)

2018-05-21 Thread Eduardo Habkost
From: Konrad Rzeszutek Wilk 

AMD Zen exposes the Intel equivalent of Speculative Store Bypass Disable
via the 0x8000_0008_EBX[25] CPUID feature bit.

This needs to be exposed to guest OS to allow them to protect
against CVE-2018-3639.

Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Daniel P. Berrangé 
Message-Id: <20180521215424.13520-3-berra...@redhat.com>
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a1185b17d1..d95310ffd4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -836,7 +836,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "ibpb", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL,
+NULL, "virt-ssbd", NULL, NULL,
 NULL, NULL, NULL, NULL,
 },
.cpuid_eax = 0x80000008,
-- 
2.14.3




[Qemu-devel] [PULL 2/3] i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)

2018-05-21 Thread Eduardo Habkost
From: Konrad Rzeszutek Wilk 

"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD).  To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f.  With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).

Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Daniel P. Berrangé 
Message-Id: <20180521215424.13520-4-berra...@redhat.com>
Signed-off-by: Eduardo Habkost 
---
 target/i386/cpu.h |  2 ++
 target/i386/kvm.c | 16 ++--
 target/i386/machine.c | 20 
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f0b68905de..8ac13f6c2c 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -351,6 +351,7 @@ typedef enum X86Seg {
 #define MSR_IA32_FEATURE_CONTROL 0x0000003a
 #define MSR_TSC_ADJUST  0x0000003b
 #define MSR_IA32_SPEC_CTRL  0x48
+#define MSR_VIRT_SSBD   0xc001011f
 #define MSR_IA32_TSCDEADLINE 0x6e0
 
 #define FEATURE_CONTROL_LOCKED (1<<0)
@@ -1210,6 +1211,7 @@ typedef struct CPUX86State {
 uint32_t pkru;
 
 uint64_t spec_ctrl;
+uint64_t virt_ssbd;
 
 /* End of state preserved by INIT (dummy marker).  */
 struct {} end_init_save;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index da4b19..0c656a91a4 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -93,6 +93,7 @@ static bool has_msr_hv_frequencies;
 static bool has_msr_hv_reenlightenment;
 static bool has_msr_xss;
 static bool has_msr_spec_ctrl;
+static bool has_msr_virt_ssbd;
 static bool has_msr_smi_count;
 
 static uint32_t has_architectural_pmu_version;
@@ -1233,6 +1234,9 @@ static int kvm_get_supported_msrs(KVMState *s)
 case MSR_IA32_SPEC_CTRL:
 has_msr_spec_ctrl = true;
 break;
+case MSR_VIRT_SSBD:
+has_msr_virt_ssbd = true;
+break;
 }
 }
 }
@@ -1721,6 +1725,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (has_msr_spec_ctrl) {
 kvm_msr_entry_add(cpu, MSR_IA32_SPEC_CTRL, env->spec_ctrl);
 }
+if (has_msr_virt_ssbd) {
+kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, env->virt_ssbd);
+}
+
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
 kvm_msr_entry_add(cpu, MSR_CSTAR, env->cstar);
@@ -2100,8 +2108,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_spec_ctrl) {
 kvm_msr_entry_add(cpu, MSR_IA32_SPEC_CTRL, 0);
 }
-
-
+if (has_msr_virt_ssbd) {
+kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, 0);
+}
 if (!env->tsc_valid) {
 kvm_msr_entry_add(cpu, MSR_IA32_TSC, 0);
 env->tsc_valid = !runstate_is_running();
@@ -2481,6 +2490,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_SPEC_CTRL:
 env->spec_ctrl = msrs[i].data;
 break;
+case MSR_VIRT_SSBD:
+env->virt_ssbd = msrs[i].data;
+break;
 case MSR_IA32_RTIT_CTL:
 env->msr_rtit_ctrl = msrs[i].data;
 break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index fd99c0bbb4..4d98d367c1 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -916,6 +916,25 @@ static const VMStateDescription vmstate_msr_intel_pt = {
 }
 };
 
+static bool virt_ssbd_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->virt_ssbd != 0;
+}
+
+static const VMStateDescription vmstate_msr_virt_ssbd = {
+.name = "cpu/virt_ssbd",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virt_ssbd_needed,
+.fields = (VMStateField[]){
+VMSTATE_UINT64(env.virt_ssbd, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -1039,6 +1058,7 @@ VMStateDescription vmstate_x86_cpu = {
 &vmstate_spec_ctrl,
 &vmstate_mcg_ext_ctl,
 &vmstate_msr_intel_pt,
+&vmstate_msr_virt_ssbd,
 NULL
 }
 };
-- 
2.14.3




[Qemu-devel] [PULL 0/3] Speculative store buffer bypass mitigation (CVE-2018-3639)

2018-05-21 Thread Eduardo Habkost
This provides the QEMU part of the mitigations for the speculative
store buffer bypass vulnerabilities on the x86 platform[1], and is
the companion of the kernel patches merged in:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b78ce4a34b761c7fe13520de822984019ff1a8f

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=1528
https://access.redhat.com/security/vulnerabilities/ssbd

The following changes since commit 9802316ed6c19fd45b4c498523df02ca370d0586:

  Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into 
staging (2018-05-21 10:50:32 +0100)

are available in the Git repository at:

  git://github.com/ehabkost/qemu.git tags/x86-next-pull-request

for you to fetch changes up to 403503b162ffc33fb64cfefdf7b880acf41772cd:

  i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639) 
(2018-05-21 18:59:08 -0300)


Speculative store buffer bypass mitigation (CVE-2018-3639)



Daniel P. Berrangé (1):
  i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)

Konrad Rzeszutek Wilk (2):
  i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)
  i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)

 target/i386/cpu.h |  3 +++
 target/i386/cpu.c |  4 ++--
 target/i386/kvm.c | 16 ++--
 target/i386/machine.c | 20 
 4 files changed, 39 insertions(+), 4 deletions(-)

-- 
2.14.3




Re: [Qemu-devel] [PATCH 0/3] i386: speculative store buffer bypass mitigation (CVE-2018-3639)

2018-05-21 Thread Eduardo Habkost
On Mon, May 21, 2018 at 10:54:21PM +0100, Daniel P. Berrangé wrote:
> This provides the QEMU part of the mitigations for the speculative
> store buffer bypass vulnerabilities on the x86 platform[1], and is
> the companion of the kernel patches merged in:
> 
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b78ce4a34b761c7fe13520de822984019ff1a8f
> 
> [1] https://bugs.chromium.org/p/project-zero/issues/detail?id=1528
> https://access.redhat.com/security/vulnerabilities/ssbd

Queued, but reordered patch 2 and patch 3 so the flag can't be
enabled without the corresponding MSR migration code being
available.

-- 
Eduardo



[Qemu-devel] [PATCH 0/3] i386: speculative store buffer bypass mitigation (CVE-2018-3639)

2018-05-21 Thread Daniel P . Berrangé
This provides the QEMU part of the mitigations for the speculative
store buffer bypass vulnerabilities on the x86 platform[1], and is
the companion of the kernel patches merged in:

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b78ce4a34b761c7fe13520de822984019ff1a8f

[1] https://bugs.chromium.org/p/project-zero/issues/detail?id=1528
https://access.redhat.com/security/vulnerabilities/ssbd

Daniel P. Berrangé (1):
  i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)

Konrad Rzeszutek Wilk (2):
  i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)
  i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)

 target/i386/cpu.c |  4 ++--
 target/i386/cpu.h |  3 +++
 target/i386/kvm.c | 16 ++--
 target/i386/machine.c | 20 
 4 files changed, 39 insertions(+), 4 deletions(-)

-- 
2.17.0




[Qemu-devel] [PATCH 1/3] i386: define the 'ssbd' CPUID feature bit (CVE-2018-3639)

2018-05-21 Thread Daniel P . Berrangé
New microcode introduces the "Speculative Store Bypass Disable"
CPUID feature bit. This needs to be exposed to guest OS to allow
them to protect against CVE-2018-3639.

Signed-off-by: Daniel P. Berrangé 
Reviewed-by: Konrad Rzeszutek Wilk 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 target/i386/cpu.c | 2 +-
 target/i386/cpu.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index e5e66a75d4..a1185b17d1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -805,7 +805,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, "spec-ctrl", NULL,
-NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, "ssbd",
 },
 .cpuid_eax = 7,
 .cpuid_needs_ecx = true, .cpuid_ecx = 0,
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 8bc54d70bf..f0b68905de 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -685,6 +685,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural Network Instructions */
 #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
 #define CPUID_7_0_EDX_SPEC_CTRL (1U << 26) /* Speculation Control */
+#define CPUID_7_0_EDX_SPEC_CTRL_SSBD  (1U << 31) /* Speculative Store Bypass Disable */
 
 #define KVM_HINTS_DEDICATED (1U << 0)
 
-- 
2.17.0




[Qemu-devel] [PATCH 3/3] i386: Define the Virt SSBD MSR and handling of it (CVE-2018-3639)

2018-05-21 Thread Daniel P . Berrangé
From: Konrad Rzeszutek Wilk 

"Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD).  To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f.  With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present." (from x86/speculation: Add virtualized
speculative store bypass disable support in Linux).

Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/cpu.h |  2 ++
 target/i386/kvm.c | 16 ++--
 target/i386/machine.c | 20 
 3 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f0b68905de..8ac13f6c2c 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -351,6 +351,7 @@ typedef enum X86Seg {
 #define MSR_IA32_FEATURE_CONTROL 0x003a
 #define MSR_TSC_ADJUST 0x003b
 #define MSR_IA32_SPEC_CTRL 0x48
+#define MSR_VIRT_SSBD 0xc001011f
 #define MSR_IA32_TSCDEADLINE 0x6e0
 
 #define FEATURE_CONTROL_LOCKED (1<<0)
@@ -1210,6 +1211,7 @@ typedef struct CPUX86State {
 uint32_t pkru;
 
 uint64_t spec_ctrl;
+uint64_t virt_ssbd;
 
 /* End of state preserved by INIT (dummy marker).  */
 struct {} end_init_save;
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index da4b19..0c656a91a4 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -93,6 +93,7 @@ static bool has_msr_hv_frequencies;
 static bool has_msr_hv_reenlightenment;
 static bool has_msr_xss;
 static bool has_msr_spec_ctrl;
+static bool has_msr_virt_ssbd;
 static bool has_msr_smi_count;
 
 static uint32_t has_architectural_pmu_version;
@@ -1233,6 +1234,9 @@ static int kvm_get_supported_msrs(KVMState *s)
 case MSR_IA32_SPEC_CTRL:
 has_msr_spec_ctrl = true;
 break;
+case MSR_VIRT_SSBD:
+has_msr_virt_ssbd = true;
+break;
 }
 }
 }
@@ -1721,6 +1725,10 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 if (has_msr_spec_ctrl) {
 kvm_msr_entry_add(cpu, MSR_IA32_SPEC_CTRL, env->spec_ctrl);
 }
+if (has_msr_virt_ssbd) {
+kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, env->virt_ssbd);
+}
+
 #ifdef TARGET_X86_64
 if (lm_capable_kernel) {
 kvm_msr_entry_add(cpu, MSR_CSTAR, env->cstar);
@@ -2100,8 +2108,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 if (has_msr_spec_ctrl) {
 kvm_msr_entry_add(cpu, MSR_IA32_SPEC_CTRL, 0);
 }
-
-
+if (has_msr_virt_ssbd) {
+kvm_msr_entry_add(cpu, MSR_VIRT_SSBD, 0);
+}
 if (!env->tsc_valid) {
 kvm_msr_entry_add(cpu, MSR_IA32_TSC, 0);
 env->tsc_valid = !runstate_is_running();
@@ -2481,6 +2490,9 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_SPEC_CTRL:
 env->spec_ctrl = msrs[i].data;
 break;
+case MSR_VIRT_SSBD:
+env->virt_ssbd = msrs[i].data;
+break;
 case MSR_IA32_RTIT_CTL:
 env->msr_rtit_ctrl = msrs[i].data;
 break;
diff --git a/target/i386/machine.c b/target/i386/machine.c
index fd99c0bbb4..4d98d367c1 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -916,6 +916,25 @@ static const VMStateDescription vmstate_msr_intel_pt = {
 }
 };
 
+static bool virt_ssbd_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->virt_ssbd != 0;
+}
+
+static const VMStateDescription vmstate_msr_virt_ssbd = {
+.name = "cpu/virt_ssbd",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = virt_ssbd_needed,
+.fields = (VMStateField[]){
+VMSTATE_UINT64(env.virt_ssbd, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
 VMStateDescription vmstate_x86_cpu = {
 .name = "cpu",
 .version_id = 12,
@@ -1039,6 +1058,7 @@ VMStateDescription vmstate_x86_cpu = {
 &vmstate_spec_ctrl,
 &vmstate_mcg_ext_ctl,
 &vmstate_msr_intel_pt,
+&vmstate_msr_virt_ssbd,
 NULL
 }
 };
-- 
2.17.0




[Qemu-devel] [PATCH 2/3] i386: define the AMD 'virt-ssbd' CPUID feature bit (CVE-2018-3639)

2018-05-21 Thread Daniel P . Berrangé
From: Konrad Rzeszutek Wilk 

AMD Zen exposes the Intel equivalent of Speculative Store Bypass Disable
via the 0x80000008_EBX[25] CPUID feature bit.

This needs to be exposed to guest OS to allow them to protect
against CVE-2018-3639.

Signed-off-by: Konrad Rzeszutek Wilk 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Daniel P. Berrangé 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a1185b17d1..d95310ffd4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -836,7 +836,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 "ibpb", NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-NULL, NULL, NULL, NULL,
+NULL, "virt-ssbd", NULL, NULL,
 NULL, NULL, NULL, NULL,
 },
 .cpuid_eax = 0x80000008,
-- 
2.17.0




Re: [Qemu-devel] storing machine data in qcow images?

2018-05-21 Thread Eduardo Habkost
On Mon, May 21, 2018 at 09:18:17PM +0100, Daniel P. Berrangé wrote:
> On Fri, May 18, 2018 at 02:41:33PM -0300, Eduardo Habkost wrote:
> > On Fri, May 18, 2018 at 06:09:56PM +0100, Daniel P. Berrangé wrote:
> > > On Fri, May 18, 2018 at 06:30:38PM +0300, Michael S. Tsirkin wrote:
> > > > Hi!
> > > > Right now, QEMU supports multiple machine types within
> > > > a given architecture. This was the case for many architectures
> > > > (like ARM) for a while, somewhat more recently this is the case
> > > > for x86 with I440FX and Q35 options.
> > > > 
> > > > Unfortunately this means that it's no longer possible
> > > > to more or less reliably boot a VM just given a disk image,
> > > > even if you select the correct QEMU binary:
> > > > you must supply the correct machine type.
> > > 
> > > You must /sometimes/ supply the correct machine type.
> > > 
> > > It is quite dependent on the guest OS you have installed, and even
> > > just how the guest OS is configured.  In general Linux is very
> > > flexible and can adapt to a wide range of hardware, automatically
> > > detecting things as needed. It is possible for a sysadmin to build
> > > a Linux image in a way that would only work with I440FX, but I
> > > don't think it would be common to see that. Many distros build
> > > and distribute disk images that can work across VMWare, KVM,
> > > and VirtualBox, which all have quite different hardware.
> > > Non-x86 archs may be more fussy, but I don't have personal
> > > experience with them
> > > 
> > > Windows is probably where things get more tricky, as it is not
> > > happy with disks moving between different controller types
> > > for example, and you might trigger license activation again.
> > 
> > All I'm suggesting here is just adding extra hints that OpenStack
> > can use.
> > 
> > I have very specific goal here: the goal is to make it less
> > painful to users when OpenStack+libvirt+QEMU switch to using a
> > different machine-type by default (q35), and/or when guest OSes
> > stop supporting pc-i440fx.  I assume this is a goal for OpenStack
> > as well.
> > 
> > We can make the solution to be more extensible and solve other
> > problems as well, but my original goal is the one above.
> 
> Configuring the machine type is just one thing that users
> would do with OpenStack though.  A simple example might be
> 
> openstack image set \
>  --property hw_disk_bus=scsi \
>  --property hw_vif_model=e1000e
> 
> Or if they're using libosinfo to set preferred devices 
> 
> openstack image set \
>  --property os_distro=fedora26
> 
> which will identify virtio-blk & virtio-net as disk+nic
> respectively. Using libosinfo is more flexible than setting
> the hw_disk_bus & hw_vif_model  explicitly, because libosinfo
> will report multiple devices that can be used, and the virt
> driver can then pick one which best suits the particular
> host or hypervisor.
> 
> Setting a non-default machine type is one extra prop
> 
> openstack image set \
>  --property hw_machine_type=q35 \
>  --property os_distro=fedora26

Nice.  Are these just hypothetical examples, or something that
already works?


> 
> So while your immediate motivation is only considering the
> machine type, from the Openstack POV thats only one property
> out of many that users might be setting.

Agreed.


> > > That said I'm not really convinced that using the qcow2 headers is
> > > a good plan. We have many disk image formats in common use, qcow2
> > > is just one. Even if the user provides the image in qcow2 format,
> > > that doesn't mean that mgmt apps actually store the qcow2 file.
> > > 
> > 
> > Why this OpenStack implementation detail matters?  Once the hints
> > are included in the input, it's up to OpenStack to choose how to
> > deal with it.
> 
> Well openstack aims to support multiple hypervisors - if there's a
> choice between implementing something that is a cross-vendor standard
> like OVF, or implementing something that only works with qcow2, the
> latter is not very appealing to support.

I still don't understand why you claim this would only work with
qcow2.  If somebody wants to implement the same functionality in
OVF, it's also possible.


> > > The closest to a cross-hypervisor standard is OVF which can store
> > > metadata about required hardware for a VM. I'm pretty sure it does
> > > not have the concept of machine types, but maybe it has a way for
> > > people to define metadata extensions. Since it is just XML at the
> > > end of the day, even if there was nothing official in OVF, it would
> > > be possible to just define a custom XML namespace and declare a
> > > schema for that to follow.
> > 
> > There's nothing preventing OVF from supporting the same kind of
> > hints.
> > 
> > I just don't think we should require people to migrate to OVF if
> > all they need is to tell OpenStack what's the recommended
> > machine-type for a guest image.
> > 
> > Requiring a different image format 

Re: [Qemu-devel] [PATCH v7 1/3] qmp: adding 'wakeup-suspend-support' in query-target

2018-05-21 Thread Eduardo Habkost
On Mon, May 21, 2018 at 04:46:36PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 05/21/2018 03:14 PM, Eduardo Habkost wrote:
> > > Issue#2: the flag isn't a property of the target.  Due to -no-acpi, it's
> > > not even a property of the machine type.  If it was, query-machines
> > > would be the natural owner of the flag.
> > > 
> > > Perhaps query-machines is still the proper owner.  The value of
> > > wakeup-suspend-support would have to depend on -no-acpi for the machine
> > > types that honor it.  Not ideal; I'd prefer MachineInfo to be static.
> > > Tolerable?  I guess that's also a libvirt question.
> > It depends when libvirt is going to query it.  Is it OK to only
> > query it after the VM is already up and running?  If it is, then
> > we can simply expose it as a read-only property of the machine
> > object.
> > 
> > Or, if we don't want to rely on qom-get as a stable API, we can
> > add a new query command (query-machine? query-power-management?)
> > 
> In the first version this logic was included in a new query command called
> "query-wakeup-from-suspend-support":
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00889.html
> 
> In that review it was suggested that this logic could be a flag in either
> query-target
> or query-machines API. Before sending the v2 I sent the following comment:
> 
> "After investigating, I think that it's simpler to hook the wakeup support
> info into
> TargetInfo than MachineInfo, given that the detection I'm using for this new
> property
> is based on the current runtime state. Hooking into MachineInfo would
> require to
> change the MachineClass to add a new property, then setting it up for the
> machines
> that have the wakeup support (only x86 so far). Definitely doable, but if we
> don't
> have any favorites between MachineInfo and TargetInfo I'd rather pick the
> simpler
> route.
> 
> So, if no one objects, I'll rework this series by putting the logic inside
> query-target
> instead of a new API."

Apologies for not noticing this series months ago.  :(


> 
> Since no objection was made back then, this logic was put into query-target
> starting
> in v2. Still, I don't have any favorites though: query-target looks ok,
> query-machine
> looks ok and a new API looks ok too. It's all about what makes (more) sense
> in the
> management level, I think.

I understand the original objection from Eric: having to add a
new command for every runtime flag we want to expose to the user
looks wrong to me.

However, extending query-machines and query-target looks wrong
too.  query-target looks wrong because this is not a
property of the target.  query-machines is wrong because this is
not a static property of the machine-type, but of the running
machine instance.

Can we have a new query command that could be an obvious
container for simple machine capabilities that are not static?  A
name like "query-machine" would be generic enough for that, I
guess.

Markus, Eric, what do you think?

-- 
Eduardo



Re: [Qemu-devel] storing machine data in qcow images?

2018-05-21 Thread Daniel P . Berrangé
On Fri, May 18, 2018 at 02:41:33PM -0300, Eduardo Habkost wrote:
> On Fri, May 18, 2018 at 06:09:56PM +0100, Daniel P. Berrangé wrote:
> > On Fri, May 18, 2018 at 06:30:38PM +0300, Michael S. Tsirkin wrote:
> > > Hi!
> > > Right now, QEMU supports multiple machine types within
> > > a given architecture. This was the case for many architectures
> > > (like ARM) for a while, somewhat more recently this is the case
> > > for x86 with I440FX and Q35 options.
> > > 
> > > Unfortunately this means that it's no longer possible
> > > to more or less reliably boot a VM just given a disk image,
> > > even if you select the correct QEMU binary:
> > > you must supply the correct machine type.
> > 
> > You must /sometimes/ supply the correct machine type.
> > 
> > It is quite dependent on the guest OS you have installed, and even
> > just how the guest OS is configured.  In general Linux is very
> > flexible and can adapt to a wide range of hardware, automatically
> > detecting things as needed. It is possible for a sysadmin to build
> > a Linux image in a way that would only work with I440FX, but I
> > don't think it would be common to see that. Many distros build
> > and distribute disk images that can work across VMWare, KVM,
> > and VirtualBox, which all have quite different hardware.
> > Non-x86 archs may be more fussy, but I don't have personal
> > experience with them
> > 
> > Windows is probably where things get more tricky, as it is not
> > happy with disks moving between different controller types
> > for example, and you might trigger license activation again.
> 
> All I'm suggesting here is just adding extra hints that OpenStack
> can use.
> 
> I have very specific goal here: the goal is to make it less
> painful to users when OpenStack+libvirt+QEMU switch to using a
> different machine-type by default (q35), and/or when guest OSes
> stop supporting pc-i440fx.  I assume this is a goal for OpenStack
> as well.
> 
> We can make the solution to be more extensible and solve other
> problems as well, but my original goal is the one above.

Configuring the machine type is just one thing that users
would do with OpenStack though.  A simple example might be

openstack image set \
 --property hw_disk_bus=scsi \
 --property hw_vif_model=e1000e

Or if they're using libosinfo to set preferred devices 

openstack image set \
 --property os_distro=fedora26

which will identify virtio-blk & virtio-net as disk+nic
respectively. Using libosinfo is more flexible than setting
the hw_disk_bus & hw_vif_model  explicitly, because libosinfo
will report multiple devices that can be used, and the virt
driver can then pick one which best suits the particular
host or hypervisor.

Setting a non-default machine type is one extra prop

openstack image set \
 --property hw_machine_type=q35 \
 --property os_distro=fedora26

So while your immediate motivation is only considering the
machine type, from the Openstack POV thats only one property
out of many that users might be setting.


> > That said I'm not really convinced that using the qcow2 headers is
> > a good plan. We have many disk image formats in common use, qcow2
> > is just one. Even if the user provides the image in qcow2 format,
> > that doesn't mean that mgmt apps actually store the qcow2 file.
> > 
> 
> Why this OpenStack implementation detail matters?  Once the hints
> are included in the input, it's up to OpenStack to choose how to
> deal with it.

Well openstack aims to support multiple hypervisors - if there's a
choice between implementing something that is a cross-vendor standard
like OVF, or implementing something that only works with qcow2, the
latter is not very appealing to support.

> > The closest to a cross-hypervisor standard is OVF which can store
> > metadata about required hardware for a VM. I'm pretty sure it does
> > not have the concept of machine types, but maybe it has a way for
> > people to define metadata extensions. Since it is just XML at the
> > end of the day, even if there was nothing official in OVF, it would
> > be possible to just define a custom XML namespace and declare a
> > schema for that to follow.
> 
> There's nothing preventing OVF from supporting the same kind of
> hints.
> 
> I just don't think we should require people to migrate to OVF if
> all they need is to tell OpenStack what's the recommended
> machine-type for a guest image.
> 
> Requiring a different image format seems very likely to not
> fulfill the goal I stated above: it will require using different
> tools to create the guest images, and we can't force everybody
> publishing guest images to stop using qcow2.

It doesn't have to require different tools - existing tools could
create an OVF/OVA file for the disk image as part of an "export"
process.


> > > - We most likely shouldn't get backend parameters from the image
> > > 
> > > Thoughts?
> > 
> > I tend to think we'd be better looking at what we can do in the 

Re: [Qemu-devel] [PATCH] target/arm: Honour FPCR.FZ in FRECPX

2018-05-21 Thread Richard Henderson
On 05/21/2018 10:27 AM, Peter Maydell wrote:
> The FRECPX instructions should (like most other floating point operations)
> honour the FPCR.FZ bit which specifies whether input denormals should
> be flushed to zero (or FZ16 for the half-precision version).
> We forgot to implement this, which doesn't affect the results (since
> the calculation doesn't actually care about the mantissa bits) but did
> mean we were failing to set the FPSR.IDC bit.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper-a64.c | 6 ++
>  1 file changed, 6 insertions(+)

Reviewed-by: Richard Henderson 

r~




Re: [Qemu-devel] [PATCH v7 1/3] qmp: adding 'wakeup-suspend-support' in query-target

2018-05-21 Thread Daniel Henrique Barboza



On 05/21/2018 03:14 PM, Eduardo Habkost wrote:

Issue#2: the flag isn't a property of the target.  Due to -no-acpi, it's
not even a property of the machine type.  If it was, query-machines
would be the natural owner of the flag.

Perhaps query-machines is still the proper owner.  The value of
wakeup-suspend-support would have to depend on -no-acpi for the machine
types that honor it.  Not ideal; I'd prefer MachineInfo to be static.
Tolerable?  I guess that's also a libvirt question.

It depends when libvirt is going to query it.  Is it OK to only
query it after the VM is already up and running?  If it is, then
we can simply expose it as a read-only property of the machine
object.

Or, if we don't want to rely on qom-get as a stable API, we can
add a new query command (query-machine? query-power-management?)


In the first version this logic was included in a new query command called
"query-wakeup-from-suspend-support":

https://lists.gnu.org/archive/html/qemu-devel/2017-12/msg00889.html

In that review it was suggested that this logic could be a flag in
either query-target or query-machines API. Before sending the v2 I
sent the following comment:

"After investigating, I think that it's simpler to hook the wakeup
support info into TargetInfo than MachineInfo, given that the
detection I'm using for this new property is based on the current
runtime state. Hooking into MachineInfo would require to change the
MachineClass to add a new property, then setting it up for the
machines that have the wakeup support (only x86 so far). Definitely
doable, but if we don't have any favorites between MachineInfo and
TargetInfo I'd rather pick the simpler route.

So, if no one objects, I'll rework this series by putting the logic
inside query-target instead of a new API."

Since no objection was made back then, this logic was put into
query-target starting in v2. Still, I don't have any favorites
though: query-target looks ok, query-machine looks ok and a new API
looks ok too. It's all about what makes (more) sense in the
management level, I think.


danielhb






[Qemu-devel] [PULL 12/15] xen_disk: remove use of grant map/unmap

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Now that the (native or emulated) xen_be_copy_grant_refs() helper is
always available, the xen_disk code can be significantly simplified by
removing direct use of grant map and unmap operations.

Signed-off-by: Paul Durrant 
Acked-by: Anthony Perard 
Signed-off-by: Stefano Stabellini 
---
 hw/block/xen_disk.c | 352 
 1 file changed, 25 insertions(+), 327 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index d3be45a..28be8b6 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -36,27 +36,9 @@
 
 /* - */
 
-static int batch_maps   = 0;
-
-/* - */
-
 #define BLOCK_SIZE  512
 #define IOCB_COUNT  (BLKIF_MAX_SEGMENTS_PER_REQUEST + 2)
 
-struct PersistentGrant {
-void *page;
-struct XenBlkDev *blkdev;
-};
-
-typedef struct PersistentGrant PersistentGrant;
-
-struct PersistentRegion {
-void *addr;
-int num;
-};
-
-typedef struct PersistentRegion PersistentRegion;
-
 struct ioreq {
 blkif_request_t req;
 int16_t status;
@@ -65,14 +47,11 @@ struct ioreq {
 off_t   start;
 QEMUIOVectorv;
 int presync;
-uint8_t mapped;
 
 /* grant mapping */
 uint32_trefs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
-int prot;
 void*page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 void*pages;
-int num_unmap;
 
 /* aio status */
 int aio_inflight;
@@ -103,7 +82,6 @@ struct XenBlkDev {
 int protocol;
 blkif_back_rings_t  rings;
 int more_work;
-int cnt_map;
 
 /* request lists */
 QLIST_HEAD(inflight_head, ioreq) inflight;
@@ -114,13 +92,7 @@ struct XenBlkDev {
 int requests_finished;
 unsigned intmax_requests;
 
-/* Persistent grants extension */
 gbooleanfeature_discard;
-gbooleanfeature_persistent;
-GTree   *persistent_gnts;
-GSList  *persistent_regions;
-unsigned intpersistent_gnt_count;
-unsigned intmax_grants;
 
 /* qemu block driver */
 DriveInfo   *dinfo;
@@ -139,10 +111,8 @@ static void ioreq_reset(struct ioreq *ioreq)
 ioreq->status = 0;
 ioreq->start = 0;
 ioreq->presync = 0;
-ioreq->mapped = 0;
 
 memset(ioreq->refs, 0, sizeof(ioreq->refs));
-ioreq->prot = 0;
 memset(ioreq->page, 0, sizeof(ioreq->page));
 ioreq->pages = NULL;
 
@@ -156,37 +126,6 @@ static void ioreq_reset(struct ioreq *ioreq)
 qemu_iovec_reset(&ioreq->v);
 }
 
-static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
-{
-uint ua = GPOINTER_TO_UINT(a);
-uint ub = GPOINTER_TO_UINT(b);
-return (ua > ub) - (ua < ub);
-}
-
-static void destroy_grant(gpointer pgnt)
-{
-PersistentGrant *grant = pgnt;
-struct XenBlkDev *blkdev = grant->blkdev;
-struct XenDevice *xendev = &blkdev->xendev;
-
-xen_be_unmap_grant_ref(xendev, grant->page);
-grant->blkdev->persistent_gnt_count--;
-xen_pv_printf(xendev, 3, "unmapped grant %p\n", grant->page);
-g_free(grant);
-}
-
-static void remove_persistent_region(gpointer data, gpointer dev)
-{
-PersistentRegion *region = data;
-struct XenBlkDev *blkdev = dev;
-struct XenDevice *xendev = &blkdev->xendev;
-
-xen_be_unmap_grant_refs(xendev, region->addr, region->num);
-xen_pv_printf(xendev, 3, "unmapped grant region %p with %d pages\n",
-  region->addr, region->num);
-g_free(region);
-}
-
 static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
 {
 struct ioreq *ioreq = NULL;
@@ -254,7 +193,6 @@ static int ioreq_parse(struct ioreq *ioreq)
   ioreq->req.handle, ioreq->req.id, ioreq->req.sector_number);
 switch (ioreq->req.operation) {
 case BLKIF_OP_READ:
-ioreq->prot = PROT_WRITE; /* to memory */
 break;
 case BLKIF_OP_FLUSH_DISKCACHE:
 ioreq->presync = 1;
@@ -263,7 +201,6 @@ static int ioreq_parse(struct ioreq *ioreq)
 }
 /* fall through */
 case BLKIF_OP_WRITE:
-ioreq->prot = PROT_READ; /* from memory */
 break;
 case BLKIF_OP_DISCARD:
 return 0;
@@ -310,171 +247,6 @@ err:
 return -1;
 }
 
-static void ioreq_unmap(struct ioreq *ioreq)
-{
-struct XenBlkDev *blkdev = ioreq->blkdev;
-struct XenDevice *xendev = &blkdev->xendev;
-int i;
-
-if (ioreq->num_unmap == 0 || ioreq->mapped == 0) {
-return;
-}
-if (batch_maps) {
-if (!ioreq->pages) {
-return;
-}
-xen_be_unmap_grant_refs(xendev, ioreq->pages, ioreq->num_unmap);
-

[Qemu-devel] [PULL 15/15] xen_disk: be consistent with use of xendev and blkdev->xendev

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Certain functions in xen_disk are called with a pointer to xendev
(struct XenDevice *). They then use container_of() to access the surrounding
blkdev (struct XenBlkDev) but then in various places use &blkdev->xendev
when use of the original xendev pointer is shorter to express and clearly
equivalent.

This patch is purely cosmetic: it makes sure there is a xendev pointer on
the stack in any function where the pointer is needed on multiple occasions,
and modifies those functions to use it consistently.

Signed-off-by: Paul Durrant 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/block/xen_disk.c | 90 +++--
 1 file changed, 46 insertions(+), 44 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 28651c5..9fbc0cd 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -178,10 +178,11 @@ static void ioreq_release(struct ioreq *ioreq, bool 
finish)
 static int ioreq_parse(struct ioreq *ioreq)
 {
 struct XenBlkDev *blkdev = ioreq->blkdev;
+struct XenDevice *xendev = &blkdev->xendev;
 size_t len;
 int i;
 
-xen_pv_printf(&blkdev->xendev, 3,
+xen_pv_printf(xendev, 3,
   "op %d, nr %d, handle %d, id %" PRId64 ", sector %" PRId64 
"\n",
   ioreq->req.operation, ioreq->req.nr_segments,
   ioreq->req.handle, ioreq->req.id, ioreq->req.sector_number);
@@ -199,28 +200,28 @@ static int ioreq_parse(struct ioreq *ioreq)
 case BLKIF_OP_DISCARD:
 return 0;
 default:
-xen_pv_printf(&blkdev->xendev, 0, "error: unknown operation (%d)\n",
+xen_pv_printf(xendev, 0, "error: unknown operation (%d)\n",
   ioreq->req.operation);
 goto err;
 };
 
 if (ioreq->req.operation != BLKIF_OP_READ && blkdev->mode[0] != 'w') {
-xen_pv_printf(&blkdev->xendev, 0, "error: write req for ro device\n");
+xen_pv_printf(xendev, 0, "error: write req for ro device\n");
 goto err;
 }
 
 ioreq->start = ioreq->req.sector_number * blkdev->file_blk;
 for (i = 0; i < ioreq->req.nr_segments; i++) {
 if (i == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
-xen_pv_printf(&blkdev->xendev, 0, "error: nr_segments too big\n");
+xen_pv_printf(xendev, 0, "error: nr_segments too big\n");
 goto err;
 }
 if (ioreq->req.seg[i].first_sect > ioreq->req.seg[i].last_sect) {
-xen_pv_printf(&blkdev->xendev, 0, "error: first > last sector\n");
+xen_pv_printf(xendev, 0, "error: first > last sector\n");
 goto err;
 }
 if (ioreq->req.seg[i].last_sect * BLOCK_SIZE >= XC_PAGE_SIZE) {
-xen_pv_printf(&blkdev->xendev, 0, "error: page crossing\n");
+xen_pv_printf(xendev, 0, "error: page crossing\n");
 goto err;
 }
 
@@ -228,7 +229,7 @@ static int ioreq_parse(struct ioreq *ioreq)
 ioreq->size += len;
 }
 if (ioreq->start + ioreq->size > blkdev->file_size) {
-xen_pv_printf(&blkdev->xendev, 0, "error: access beyond end of 
file\n");
+xen_pv_printf(xendev, 0, "error: access beyond end of file\n");
 goto err;
 }
 return 0;
@@ -244,7 +245,7 @@ static int ioreq_grant_copy(struct ioreq *ioreq)
 struct XenDevice *xendev = &blkdev->xendev;
 XenGrantCopySegment segs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 int i, count, rc;
-int64_t file_blk = ioreq->blkdev->file_blk;
+int64_t file_blk = blkdev->file_blk;
 bool to_domain = (ioreq->req.operation == BLKIF_OP_READ);
 void *virt = ioreq->buf;
 
@@ -272,7 +273,7 @@ static int ioreq_grant_copy(struct ioreq *ioreq)
 rc = xen_be_copy_grant_refs(xendev, to_domain, segs, count);
 
 if (rc) {
-xen_pv_printf(&ioreq->blkdev->xendev, 0,
+xen_pv_printf(xendev, 0,
   "failed to copy data %d\n", rc);
 ioreq->aio_errors++;
 return -1;
@@ -287,11 +288,12 @@ static void qemu_aio_complete(void *opaque, int ret)
 {
 struct ioreq *ioreq = opaque;
 struct XenBlkDev *blkdev = ioreq->blkdev;
+struct XenDevice *xendev = &blkdev->xendev;
 
 aio_context_acquire(blkdev->ctx);
 
 if (ret != 0) {
-xen_pv_printf(&blkdev->xendev, 0, "%s I/O error\n",
+xen_pv_printf(xendev, 0, "%s I/O error\n",
   ioreq->req.operation == BLKIF_OP_READ ? "read" : 
"write");
 ioreq->aio_errors++;
 }
@@ -625,16 +627,17 @@ static void blk_alloc(struct XenDevice *xendev)
 
 static void blk_parse_discard(struct XenBlkDev *blkdev)
 {
+struct XenDevice *xendev = &blkdev->xendev;
 int enable;
 
 blkdev->feature_discard = true;
 
-if (xenstore_read_be_int(&blkdev->xendev, "discard-enable", &enable) == 0) 
{
+if (xenstore_read_be_int(xendev, "discard-enable", &enable) == 0) {
 blkdev->feature_discard = !!enable;
 }
 
 if (blkdev->feature_discard) {
-

Re: [Qemu-devel] [PATCH 01/27] memory.h: Improve IOMMU related documentation

2018-05-21 Thread Richard Henderson
On 05/21/2018 07:03 AM, Peter Maydell wrote:
> Add more detail to the documentation for memory_region_init_iommu()
> and other IOMMU-related functions and data structures.
> 
> Signed-off-by: Peter Maydell 
> ---
> v2->v3 changes:
>  * minor wording tweaks per Eric's review
>  * moved the bit about requirements to notify out from the translate
>method docs to the top level class doc comment
>  * added description of flags argument and in particular that it's
>just an optimization and callers can pass IOMMU_NONE to get the
>full permissions
> v1 -> v2 changes:
>  * documented replay method
>  * added note about wanting RCU or big qemu lock while calling
>translate
> ---
>  include/exec/memory.h | 105 ++
>  1 file changed, 95 insertions(+), 10 deletions(-)

Reviewed-by: Richard Henderson 

r~



[Qemu-devel] [PULL 09/15] xen_disk: remove open-coded use of libxengnttab

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Now that helpers are present in xen_backend, this patch removes open-coded
calls to libxengnttab from the xen_disk code.

This patch also fixes one whitespace error in the assignment of the
XenDevOps initialise method.

Signed-off-by: Paul Durrant 
Acked-by: Anthony Perard 
Signed-off-by: Stefano Stabellini 
---
 hw/block/xen_disk.c | 122 ++--
 1 file changed, 32 insertions(+), 90 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 78bfb41..d3be45a 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -68,7 +68,6 @@ struct ioreq {
 uint8_t mapped;
 
 /* grant mapping */
-uint32_t domids[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 uint32_t refs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 int prot;
 void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
@@ -142,7 +141,6 @@ static void ioreq_reset(struct ioreq *ioreq)
 ioreq->presync = 0;
 ioreq->mapped = 0;
 
-memset(ioreq->domids, 0, sizeof(ioreq->domids));
 memset(ioreq->refs, 0, sizeof(ioreq->refs));
 ioreq->prot = 0;
 memset(ioreq->page, 0, sizeof(ioreq->page));
@@ -168,16 +166,12 @@ static gint int_cmp(gconstpointer a, gconstpointer b, 
gpointer user_data)
 static void destroy_grant(gpointer pgnt)
 {
 PersistentGrant *grant = pgnt;
-xengnttab_handle *gnt = grant->blkdev->xendev.gnttabdev;
+struct XenBlkDev *blkdev = grant->blkdev;
+struct XenDevice *xendev = &blkdev->xendev;
 
-if (xengnttab_unmap(gnt, grant->page, 1) != 0) {
-xen_pv_printf(&grant->blkdev->xendev, 0,
-  "xengnttab_unmap failed: %s\n",
-  strerror(errno));
-}
+xen_be_unmap_grant_ref(xendev, grant->page);
 grant->blkdev->persistent_gnt_count--;
-xen_pv_printf(&grant->blkdev->xendev, 3,
-  "unmapped grant %p\n", grant->page);
+xen_pv_printf(xendev, 3, "unmapped grant %p\n", grant->page);
 g_free(grant);
 }
 
@@ -185,15 +179,10 @@ static void remove_persistent_region(gpointer data, 
gpointer dev)
 {
 PersistentRegion *region = data;
 struct XenBlkDev *blkdev = dev;
-xengnttab_handle *gnt = blkdev->xendev.gnttabdev;
+struct XenDevice *xendev = &blkdev->xendev;
 
-if (xengnttab_unmap(gnt, region->addr, region->num) != 0) {
-xen_pv_printf(&blkdev->xendev, 0,
-  "xengnttab_unmap region %p failed: %s\n",
-  region->addr, strerror(errno));
-}
-xen_pv_printf(&blkdev->xendev, 3,
-  "unmapped grant region %p with %d pages\n",
+xen_be_unmap_grant_refs(xendev, region->addr, region->num);
+xen_pv_printf(xendev, 3, "unmapped grant region %p with %d pages\n",
   region->addr, region->num);
 g_free(region);
 }
@@ -304,7 +293,6 @@ static int ioreq_parse(struct ioreq *ioreq)
 goto err;
 }
 
-ioreq->domids[i] = blkdev->xendev.dom;
 ioreq->refs[i]   = ioreq->req.seg[i].gref;
 
 mem = ioreq->req.seg[i].first_sect * blkdev->file_blk;
@@ -324,7 +312,8 @@ err:
 
 static void ioreq_unmap(struct ioreq *ioreq)
 {
-xengnttab_handle *gnt = ioreq->blkdev->xendev.gnttabdev;
+struct XenBlkDev *blkdev = ioreq->blkdev;
+struct XenDevice *xendev = &blkdev->xendev;
 int i;
 
 if (ioreq->num_unmap == 0 || ioreq->mapped == 0) {
@@ -334,11 +323,7 @@ static void ioreq_unmap(struct ioreq *ioreq)
 if (!ioreq->pages) {
 return;
 }
-if (xengnttab_unmap(gnt, ioreq->pages, ioreq->num_unmap) != 0) {
-xen_pv_printf(&ioreq->blkdev->xendev, 0,
-  "xengnttab_unmap failed: %s\n",
-  strerror(errno));
-}
+xen_be_unmap_grant_refs(xendev, ioreq->pages, ioreq->num_unmap);
 ioreq->blkdev->cnt_map -= ioreq->num_unmap;
 ioreq->pages = NULL;
 } else {
@@ -346,11 +331,7 @@ static void ioreq_unmap(struct ioreq *ioreq)
 if (!ioreq->page[i]) {
 continue;
 }
-if (xengnttab_unmap(gnt, ioreq->page[i], 1) != 0) {
-xen_pv_printf(&ioreq->blkdev->xendev, 0,
-  "xengnttab_unmap failed: %s\n",
-  strerror(errno));
-}
+xen_be_unmap_grant_ref(xendev, ioreq->page[i]);
 ioreq->blkdev->cnt_map--;
 ioreq->page[i] = NULL;
 }
@@ -360,14 +341,14 @@ static void ioreq_unmap(struct ioreq *ioreq)
 
 static int ioreq_map(struct ioreq *ioreq)
 {
-xengnttab_handle *gnt = ioreq->blkdev->xendev.gnttabdev;
-uint32_t domids[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+struct XenBlkDev *blkdev = ioreq->blkdev;
+struct XenDevice *xendev = &blkdev->xendev;
 uint32_t refs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
 int i, j, new_maps = 0;

[Qemu-devel] [PULL 03/15] configure: Add explanation for --enable-xen-pci-passthrough

2018-05-21 Thread Stefano Stabellini
From: Anthony PERARD 

Signed-off-by: Anthony PERARD 
Reviewed-by: Markus Armbruster 
Signed-off-by: Stefano Stabellini 
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 59f91ab..a8498ab 100755
--- a/configure
+++ b/configure
@@ -1588,7 +1588,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   virtfs  VirtFS
   mpath   Multipath persistent reservation passthrough
   xen xen backend driver support
-  xen-pci-passthrough
+  xen-pci-passthroughPCI passthrough support for Xen
   brlapi  BrlAPI (Braile)
   curlcurl connectivity
   membarrier  membarrier system call (for Linux 4.14+ or Windows)
-- 
1.9.1




[Qemu-devel] [PULL 10/15] xen: remove other open-coded use of libxengnttab

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Now that helpers are available in xen_backend, use them throughout all
Xen PV backends.

Signed-off-by: Paul Durrant 
Acked-by: Anthony Perard 
Signed-off-by: Stefano Stabellini 
---
 hw/9pfs/xen-9p-backend.c | 32 +++-
 hw/char/xen_console.c|  9 -
 hw/net/xen_nic.c | 33 ++---
 hw/usb/xen-usb.c | 37 +
 4 files changed, 50 insertions(+), 61 deletions(-)

diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
index 95e50c4..6026780 100644
--- a/hw/9pfs/xen-9p-backend.c
+++ b/hw/9pfs/xen-9p-backend.c
@@ -331,14 +331,14 @@ static int xen_9pfs_free(struct XenDevice *xendev)
 
 for (i = 0; i < xen_9pdev->num_rings; i++) {
 if (xen_9pdev->rings[i].data != NULL) {
-xengnttab_unmap(xen_9pdev->xendev.gnttabdev,
-xen_9pdev->rings[i].data,
-(1 << xen_9pdev->rings[i].ring_order));
-xen_be_unmap_grant_refs(&xen_9pdev->xendev,
+xen_9pdev->rings[i].data,
+(1 << xen_9pdev->rings[i].ring_order));
 }
 if (xen_9pdev->rings[i].intf != NULL) {
-xengnttab_unmap(xen_9pdev->xendev.gnttabdev,
-xen_9pdev->rings[i].intf,
-1);
-xen_be_unmap_grant_refs(&xen_9pdev->xendev,
+xen_9pdev->rings[i].intf,
+1);
 }
 if (xen_9pdev->rings[i].bh != NULL) {
 qemu_bh_delete(xen_9pdev->rings[i].bh);
@@ -390,11 +390,10 @@ static int xen_9pfs_connect(struct XenDevice *xendev)
 }
 g_free(str);
 
-xen_9pdev->rings[i].intf =  xengnttab_map_grant_ref(
-xen_9pdev->xendev.gnttabdev,
-xen_9pdev->xendev.dom,
-xen_9pdev->rings[i].ref,
-PROT_READ | PROT_WRITE);
+xen_9pdev->rings[i].intf =
+xen_be_map_grant_ref(&xen_9pdev->xendev,
+ xen_9pdev->rings[i].ref,
+ PROT_READ | PROT_WRITE);
 if (!xen_9pdev->rings[i].intf) {
 goto out;
 }
@@ -403,12 +402,11 @@ static int xen_9pfs_connect(struct XenDevice *xendev)
 goto out;
 }
 xen_9pdev->rings[i].ring_order = ring_order;
-xen_9pdev->rings[i].data = xengnttab_map_domain_grant_refs(
-xen_9pdev->xendev.gnttabdev,
-(1 << ring_order),
-xen_9pdev->xendev.dom,
-xen_9pdev->rings[i].intf->ref,
-PROT_READ | PROT_WRITE);
+xen_9pdev->rings[i].data =
+xen_be_map_grant_refs(&xen_9pdev->xendev,
+  xen_9pdev->rings[i].intf->ref,
+  (1 << ring_order),
+  PROT_READ | PROT_WRITE);
 if (!xen_9pdev->rings[i].data) {
 goto out;
 }
diff --git a/hw/char/xen_console.c b/hw/char/xen_console.c
index bdfaa40..8b4b4bf 100644
--- a/hw/char/xen_console.c
+++ b/hw/char/xen_console.c
@@ -233,12 +233,11 @@ static int con_initialise(struct XenDevice *xendev)
 if (!xendev->dev) {
 xen_pfn_t mfn = con->ring_ref;
 con->sring = xenforeignmemory_map(xen_fmem, con->xendev.dom,
-  PROT_READ|PROT_WRITE,
+  PROT_READ | PROT_WRITE,
   1, &mfn, NULL);
 } else {
-con->sring = xengnttab_map_grant_ref(xendev->gnttabdev, 
con->xendev.dom,
- con->ring_ref,
- PROT_READ|PROT_WRITE);
+con->sring = xen_be_map_grant_ref(xendev, con->ring_ref,
+  PROT_READ | PROT_WRITE);
 }
 if (!con->sring)
return -1;
@@ -267,7 +266,7 @@ static void con_disconnect(struct XenDevice *xendev)
 if (!xendev->dev) {
 xenforeignmemory_unmap(xen_fmem, con->sring, 1);
 } else {
-xengnttab_unmap(xendev->gnttabdev, con->sring, 1);
+xen_be_unmap_grant_ref(xendev, con->sring);
 }
 con->sring = NULL;
 }
diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 20c43a6..46a8dbf 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -160,9 +160,8 @@ static void net_tx_packets(struct XenNetDev *netdev)
   (txreq.flags & NETTXF_more_data)  ? " more_data" 
 : "",
   (txreq.flags & NETTXF_extra_info) ? " 
extra_info" : "");
 
-page = xengnttab_map_grant_ref(netdev->xendev.gnttabdev,
-   netdev->xendev.dom,
-

[Qemu-devel] [PULL 06/15] All the xen stable APIs define handle types of the form:

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

xen<foo>_handle

and some define additional handle types of the form:

xen<foo>_<bar>_handle

Examples of these are xenforeignmemory_handle and
xenforeignmemory_resource_handle.

Both of these types will be misparsed by checkpatch if they appear as the
first token in a line since, as types defined by an external library, they
do not conform to the QEMU CODING_STYLE, which suggests CamelCase.

A previous patch (5ac067a24a8) added xendevicemodel_handle to the list
of types. This patch changes that to xen\w+_handle such that it will
match all Xen stable API handles of the forms detailed above.

Signed-off-by: Paul Durrant 
Reviewed-by: Eric Blake 
Signed-off-by: Stefano Stabellini 
---
 dtc   | 2 +-
 scripts/checkpatch.pl | 2 +-
 ui/keycodemapdb   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/dtc b/dtc
index e543880..558cd81 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit e54388015af1fb4bf04d0bca99caba1074d9cc42
+Subproject commit 558cd81bdd432769b59bff01240c44f82cfb1a9d
diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index cb1b652..e3d8c2c 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -271,7 +271,7 @@ our @typeList = (
qr{hwaddr},
 # external libraries
qr{xml${Ident}},
-   qr{xendevicemodel_handle},
+   qr{xen\w+_handle},
# Glib definitions
qr{gchar},
qr{gshort},
diff --git a/ui/keycodemapdb b/ui/keycodemapdb
index 6b3d716..10739aa 16
--- a/ui/keycodemapdb
+++ b/ui/keycodemapdb
@@ -1 +1 @@
-Subproject commit 6b3d716e2b6472eb7189d3220552280ef3d832ce
+Subproject commit 10739aa26051a5d49d88132604539d3ed085e72e
-- 
1.9.1




[Qemu-devel] [PULL 14/15] xen_disk: use a single entry iovec

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Since xen_disk now always copies data to and from a guest there is no need
to maintain a vector entry corresponding to every page of a request.
This means there is less per-request state to maintain so the ioreq
structure can shrink significantly.
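The per-request byte count that replaces the per-page vector follows directly
from the blkif segment bounds. A minimal sketch of the accumulation (the
struct is a simplified stand-in for blkif_request_segment):

```c
#include <assert.h>
#include <stddef.h>

/* Sector-granular bounds, as carried in each blkif request segment */
struct seg {
    unsigned int first_sect;
    unsigned int last_sect;
};

/* Total byte count of a request: what the patch stores in ioreq->size
 * instead of building one iovec entry per segment. */
static size_t ioreq_size(const struct seg *segs, int nr, size_t file_blk)
{
    size_t size = 0;

    for (int i = 0; i < nr; i++) {
        /* last_sect is inclusive, hence the +1 */
        size += (segs[i].last_sect - segs[i].first_sect + 1) * file_blk;
    }
    return size;
}
```

With only the total size needed, a single contiguous buffer plus a
one-entry iovec suffices for the whole request.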

Signed-off-by: Paul Durrant 
Acked-by: Anthony Perard 
Signed-off-by: Stefano Stabellini 
---
 hw/block/xen_disk.c | 76 +++--
 1 file changed, 21 insertions(+), 55 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 28be8b6..28651c5 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -46,13 +46,10 @@ struct ioreq {
 /* parsed request */
 off_t   start;
 QEMUIOVectorv;
+void *buf;
+size_t  size;
 int presync;
 
-/* grant mapping */
-uint32_t refs[BLKIF_MAX_SEGMENTS_PER_REQUEST];
-void *page[BLKIF_MAX_SEGMENTS_PER_REQUEST];
-void *pages;
-
 /* aio status */
 int aio_inflight;
 int aio_errors;
@@ -110,12 +107,10 @@ static void ioreq_reset(struct ioreq *ioreq)
 memset(&ioreq->req, 0, sizeof(ioreq->req));
 ioreq->status = 0;
 ioreq->start = 0;
+ioreq->buf = NULL;
+ioreq->size = 0;
 ioreq->presync = 0;
 
-memset(ioreq->refs, 0, sizeof(ioreq->refs));
-memset(ioreq->page, 0, sizeof(ioreq->page));
-ioreq->pages = NULL;
-
 ioreq->aio_inflight = 0;
 ioreq->aio_errors = 0;
 
@@ -138,7 +133,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
 ioreq = g_malloc0(sizeof(*ioreq));
 ioreq->blkdev = blkdev;
 blkdev->requests_total++;
-qemu_iovec_init(&ioreq->v, BLKIF_MAX_SEGMENTS_PER_REQUEST);
+qemu_iovec_init(&ioreq->v, 1);
 } else {
 /* get one from freelist */
 ioreq = QLIST_FIRST(&blkdev->freelist);
@@ -183,7 +178,6 @@ static void ioreq_release(struct ioreq *ioreq, bool finish)
 static int ioreq_parse(struct ioreq *ioreq)
 {
 struct XenBlkDev *blkdev = ioreq->blkdev;
-uintptr_t mem;
 size_t len;
 int i;
 
@@ -230,13 +224,10 @@ static int ioreq_parse(struct ioreq *ioreq)
 goto err;
 }
 
-ioreq->refs[i]   = ioreq->req.seg[i].gref;
-
-mem = ioreq->req.seg[i].first_sect * blkdev->file_blk;
 len = (ioreq->req.seg[i].last_sect - ioreq->req.seg[i].first_sect + 1) 
* blkdev->file_blk;
-qemu_iovec_add(&ioreq->v, (void*)mem, len);
+ioreq->size += len;
 }
-if (ioreq->start + ioreq->v.size > blkdev->file_size) {
+if (ioreq->start + ioreq->size > blkdev->file_size) {
 xen_pv_printf(&blkdev->xendev, 0, "error: access beyond end of 
file\n");
 goto err;
 }
@@ -247,35 +238,6 @@ err:
 return -1;
 }
 
-static void ioreq_free_copy_buffers(struct ioreq *ioreq)
-{
-int i;
-
-for (i = 0; i < ioreq->v.niov; i++) {
-ioreq->page[i] = NULL;
-}
-
-qemu_vfree(ioreq->pages);
-}
-
-static int ioreq_init_copy_buffers(struct ioreq *ioreq)
-{
-int i;
-
-if (ioreq->v.niov == 0) {
-return 0;
-}
-
-ioreq->pages = qemu_memalign(XC_PAGE_SIZE, ioreq->v.niov * XC_PAGE_SIZE);
-
-for (i = 0; i < ioreq->v.niov; i++) {
-ioreq->page[i] = ioreq->pages + i * XC_PAGE_SIZE;
-ioreq->v.iov[i].iov_base = ioreq->page[i];
-}
-
-return 0;
-}
-
 static int ioreq_grant_copy(struct ioreq *ioreq)
 {
 struct XenBlkDev *blkdev = ioreq->blkdev;
@@ -284,25 +246,27 @@ static int ioreq_grant_copy(struct ioreq *ioreq)
 int i, count, rc;
 int64_t file_blk = ioreq->blkdev->file_blk;
 bool to_domain = (ioreq->req.operation == BLKIF_OP_READ);
+void *virt = ioreq->buf;
 
-if (ioreq->v.niov == 0) {
+if (ioreq->req.nr_segments == 0) {
 return 0;
 }
 
-count = ioreq->v.niov;
+count = ioreq->req.nr_segments;
 
 for (i = 0; i < count; i++) {
 if (to_domain) {
-segs[i].dest.foreign.ref = ioreq->refs[i];
+segs[i].dest.foreign.ref = ioreq->req.seg[i].gref;
 segs[i].dest.foreign.offset = ioreq->req.seg[i].first_sect * 
file_blk;
-segs[i].source.virt = ioreq->v.iov[i].iov_base;
+segs[i].source.virt = virt;
 } else {
-segs[i].source.foreign.ref = ioreq->refs[i];
+segs[i].source.foreign.ref = ioreq->req.seg[i].gref;
 segs[i].source.foreign.offset = ioreq->req.seg[i].first_sect * 
file_blk;
-segs[i].dest.virt = ioreq->v.iov[i].iov_base;
+segs[i].dest.virt = virt;
 }
 segs[i].len = (ioreq->req.seg[i].last_sect
- ioreq->req.seg[i].first_sect + 1) * file_blk;
+virt += segs[i].len;
 }
 
 rc = xen_be_copy_grant_refs(xendev, to_domain, segs, count);
@@ -348,14 +312,14 @@ static void 

[Qemu-devel] [PULL 07/15] xen: add a meaningful declaration of grant_copy_segment into xen_common.h

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Currently the xen_disk source has to carry #ifdef exclusions to compile
against Xen older than 4.8. This is a bit messy so this patch lifts the
definition of struct xengnttab_grant_copy_segment and adds it into the
pre-4.8 compat area in xen_common.h, which allows xen_disk to be cleaned
up.

Signed-off-by: Paul Durrant 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/block/xen_disk.c | 18 --
 include/hw/xen/xen_common.h | 17 +++--
 2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index f74fcd4..78bfb41 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -496,8 +496,6 @@ static int ioreq_map(struct ioreq *ioreq)
 return 0;
 }
 
-#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 40800
-
 static void ioreq_free_copy_buffers(struct ioreq *ioreq)
 {
 int i;
@@ -579,22 +577,6 @@ static int ioreq_grant_copy(struct ioreq *ioreq)
 
 return rc;
 }
-#else
-static void ioreq_free_copy_buffers(struct ioreq *ioreq)
-{
-abort();
-}
-
-static int ioreq_init_copy_buffers(struct ioreq *ioreq)
-{
-abort();
-}
-
-static int ioreq_grant_copy(struct ioreq *ioreq)
-{
-abort();
-}
-#endif
 
 static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
 
diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 5f1402b..bbf207d 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -667,8 +667,21 @@ static inline int xen_domain_create(xc_interface *xc, 
uint32_t ssidref,
 
 #if CONFIG_XEN_CTRL_INTERFACE_VERSION < 40800
 
-
-typedef void *xengnttab_grant_copy_segment_t;
+struct xengnttab_grant_copy_segment {
+union xengnttab_copy_ptr {
+void *virt;
+struct {
+uint32_t ref;
+uint16_t offset;
+uint16_t domid;
+} foreign;
+} source, dest;
+uint16_t len;
+uint16_t flags;
+int16_t status;
+};
+
+typedef struct xengnttab_grant_copy_segment xengnttab_grant_copy_segment_t;
 
 static inline int xengnttab_grant_copy(xengnttab_handle *xgt, uint32_t count,
xengnttab_grant_copy_segment_t *segs)
-- 
1.9.1




[Qemu-devel] [PULL 13/15] xen_backend: make the xen_feature_grant_copy flag private

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

There is no longer any use of this flag outside of the xen_backend code.

Signed-off-by: Paul Durrant 
Acked-by: Anthony Perard 
Signed-off-by: Stefano Stabellini 
---
 hw/xen/xen_backend.c | 2 +-
 include/hw/xen/xen_backend.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index 3c3fc2c..9a8e877 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -44,9 +44,9 @@ BusState *xen_sysbus;
 /* public */
 struct xs_handle *xenstore = NULL;
 const char *xen_protocol;
-bool xen_feature_grant_copy;
 
 /* private */
+static bool xen_feature_grant_copy;
 static int debug;
 
 int xenstore_write_be_str(struct XenDevice *xendev, const char *node, const 
char *val)
diff --git a/include/hw/xen/xen_backend.h b/include/hw/xen/xen_backend.h
index 29bf1c3..9c17fdd 100644
--- a/include/hw/xen/xen_backend.h
+++ b/include/hw/xen/xen_backend.h
@@ -16,7 +16,6 @@
 /* variables */
 extern struct xs_handle *xenstore;
 extern const char *xen_protocol;
-extern bool xen_feature_grant_copy;
 extern DeviceState *xen_sysdev;
 extern BusState *xen_sysbus;
 
-- 
1.9.1




[Qemu-devel] [PULL 05/15] xen-hvm: create separate function for ioreq server initialization

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

The code is sufficiently substantial that it improves code readability
to put it in a new function called by xen_hvm_init() rather than having
it inline.

Signed-off-by: Paul Durrant 
Reviewed-by: Anthony Perard 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/xen/xen-hvm.c | 76 +++
 1 file changed, 46 insertions(+), 30 deletions(-)

diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index caa563b..6ffa3c2 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -95,7 +95,8 @@ typedef struct XenIOState {
 CPUState **cpu_by_vcpu_id;
 /* the evtchn port for polling the notification, */
 evtchn_port_t *ioreq_local_port;
-/* evtchn local port for buffered io */
+/* evtchn remote and local ports for buffered io */
+evtchn_port_t bufioreq_remote_port;
 evtchn_port_t bufioreq_local_port;
 /* the evtchn fd for polling */
 xenevtchn_handle *xce_handle;
@@ -1236,12 +1237,52 @@ static void xen_wakeup_notifier(Notifier *notifier, 
void *data)
 xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 0);
 }
 
-void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
+static int xen_map_ioreq_server(XenIOState *state)
 {
-int i, rc;
 xen_pfn_t ioreq_pfn;
 xen_pfn_t bufioreq_pfn;
 evtchn_port_t bufioreq_evtchn;
+int rc;
+
+rc = xen_get_ioreq_server_info(xen_domid, state->ioservid,
+   &ioreq_pfn, &bufioreq_pfn,
+   &bufioreq_evtchn);
+if (rc < 0) {
+error_report("failed to get ioreq server info: error %d handle=%p",
+ errno, xen_xc);
+return rc;
+}
+
+DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
+DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
+DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
+
+state->shared_page = xenforeignmemory_map(xen_fmem, xen_domid,
+  PROT_READ | PROT_WRITE,
+  1, &ioreq_pfn, NULL);
+if (state->shared_page == NULL) {
+error_report("map shared IO page returned error %d handle=%p",
+ errno, xen_xc);
+return -1;
+}
+
+state->buffered_io_page = xenforeignmemory_map(xen_fmem, xen_domid,
+   PROT_READ | PROT_WRITE,
+   1, &bufioreq_pfn, NULL);
+if (state->buffered_io_page == NULL) {
+error_report("map buffered IO page returned error %d", errno);
+return -1;
+}
+
+state->bufioreq_remote_port = bufioreq_evtchn;
+
+return 0;
+}
+
+void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
+{
+int i, rc;
+xen_pfn_t ioreq_pfn;
 XenIOState *state;
 
 state = g_malloc0(sizeof (XenIOState));
@@ -1269,25 +1310,8 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 state->wakeup.notify = xen_wakeup_notifier;
 qemu_register_wakeup_notifier(&state->wakeup);
 
-rc = xen_get_ioreq_server_info(xen_domid, state->ioservid,
-   &ioreq_pfn, &bufioreq_pfn,
-   &bufioreq_evtchn);
+rc = xen_map_ioreq_server(state);
 if (rc < 0) {
-error_report("failed to get ioreq server info: error %d handle=%p",
- errno, xen_xc);
-goto err;
-}
-
-DPRINTF("shared page at pfn %lx\n", ioreq_pfn);
-DPRINTF("buffered io page at pfn %lx\n", bufioreq_pfn);
-DPRINTF("buffered io evtchn is %x\n", bufioreq_evtchn);
-
-state->shared_page = xenforeignmemory_map(xen_fmem, xen_domid,
-  PROT_READ|PROT_WRITE,
-  1, &ioreq_pfn, NULL);
-if (state->shared_page == NULL) {
-error_report("map shared IO page returned error %d handle=%p",
- errno, xen_xc);
 goto err;
 }
 
@@ -1308,14 +1332,6 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 goto err;
 }
 
-state->buffered_io_page = xenforeignmemory_map(xen_fmem, xen_domid,
-   PROT_READ|PROT_WRITE,
-   1, &bufioreq_pfn, NULL);
-if (state->buffered_io_page == NULL) {
-error_report("map buffered IO page returned error %d", errno);
-goto err;
-}
-
 /* Note: cpus is empty at this point in init */
 state->cpu_by_vcpu_id = g_malloc0(max_cpus * sizeof(CPUState *));
 
@@ -1340,7 +1356,7 @@ void xen_hvm_init(PCMachineState *pcms, MemoryRegion 
**ram_memory)
 }
 
 rc = xenevtchn_bind_interdomain(state->xce_handle, xen_domid,
-bufioreq_evtchn);
+state->bufioreq_remote_port);
 if (rc == -1) {
 

[Qemu-devel] [PULL 02/15] xen/pt: use address_space_memory object for memory region hooks

2018-05-21 Thread Stefano Stabellini
From: Igor Druzhinin 

Commit 99605175c (xen-pt: Fix PCI devices re-attach failed) introduced
a subtle bug. As soon as the guest switches off Bus Mastering on the
device it immediately causes all the BARs be unmapped due to the DMA
address space of the device being changed. This is undesired behavior
because the guest may try to communicate with the device after that
which triggers the following errors in the logs:

[00:05.0] xen_pt_bar_read: Error: Should not read BAR through QEMU. 
@0x0200
[00:05.0] xen_pt_bar_write: Error: Should not write BAR through QEMU. 
@0x0200

The issue that the original patch tried to workaround (uneven number of
region_add/del calls on device attach/detach) was fixed in d25836cafd
(memory: do explicit cleanup when remove listeners).

Signed-off-by: Igor Druzhinin 
Reported-by: Ross Lagerwall 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/xen/xen_pt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 9b7a960..e5a6eff 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -907,7 +907,7 @@ out:
 }
 }
 
-memory_listener_register(&s->memory_listener, &s->dev.bus_master_as);
+memory_listener_register(&s->memory_listener, &address_space_memory);
 memory_listener_register(&s->io_listener, &address_space_io);
 s->listener_set = true;
 XEN_PT_LOG(d,
-- 
1.9.1




[Qemu-devel] [PULL 11/15] xen_backend: add an emulation of grant copy

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

Not all Xen environments support the xengnttab_grant_copy() operation.
E.g. where the OS is FreeBSD or Xen is older than 4.8.0.

This patch introduces an emulation of that operation using
xengnttab_map_domain_grant_refs() and memcpy() for those environments.
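The fallback reduces a grant copy to a map plus memcpy. Stripped of the
actual grant mapping, the copy step looks like the sketch below; the segment
layout is a simplified stand-in for XenGrantCopySegment, with `page` playing
the role of the page that xengnttab_map_domain_grant_refs() would have mapped:

```c
#include <assert.h>
#include <string.h>

/* Simplified segment: `page` stands in for the mapped grant page */
typedef struct {
    void *virt;            /* local (QEMU-side) buffer */
    char *page;            /* "mapped" guest page */
    unsigned int offset;   /* offset within that page */
    unsigned int len;
} CompatSeg;

/* The memcpy loop at the heart of the compat path: direction depends on
 * whether data flows to the domain (write into the grant) or from it. */
static void compat_copy(CompatSeg *segs, unsigned int nr_segs, int to_domain)
{
    for (unsigned int i = 0; i < nr_segs; i++) {
        CompatSeg *seg = &segs[i];

        if (to_domain) {
            memcpy(seg->page + seg->offset, seg->virt, seg->len);
        } else {
            memcpy(seg->virt, seg->page + seg->offset, seg->len);
        }
    }
}
```

The real implementation brackets this loop with a single bulk map of all the
refs and a single unmap, so the emulation costs one map/unmap pair per batch
rather than per segment.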

Signed-off-by: Paul Durrant 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/xen/xen_backend.c | 53 
 1 file changed, 53 insertions(+)

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index 50412d6..3c3fc2c 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -146,6 +146,55 @@ void xen_be_unmap_grant_refs(struct XenDevice *xendev, 
void *ptr,
 }
 }
 
+static int compat_copy_grant_refs(struct XenDevice *xendev,
+  bool to_domain,
+  XenGrantCopySegment segs[],
+  unsigned int nr_segs)
+{
+uint32_t *refs = g_new(uint32_t, nr_segs);
+int prot = to_domain ? PROT_WRITE : PROT_READ;
+void *pages;
+unsigned int i;
+
+for (i = 0; i < nr_segs; i++) {
+XenGrantCopySegment *seg = &segs[i];
+
+refs[i] = to_domain ?
+seg->dest.foreign.ref : seg->source.foreign.ref;
+}
+
+pages = xengnttab_map_domain_grant_refs(xendev->gnttabdev, nr_segs,
+xen_domid, refs, prot);
+if (!pages) {
+xen_pv_printf(xendev, 0,
+  "xengnttab_map_domain_grant_refs failed: %s\n",
+  strerror(errno));
+g_free(refs);
+return -1;
+}
+
+for (i = 0; i < nr_segs; i++) {
+XenGrantCopySegment *seg = &segs[i];
+void *page = pages + (i * XC_PAGE_SIZE);
+
+if (to_domain) {
+memcpy(page + seg->dest.foreign.offset, seg->source.virt,
+   seg->len);
+} else {
+memcpy(seg->dest.virt, page + seg->source.foreign.offset,
+   seg->len);
+}
+}
+
+if (xengnttab_unmap(xendev->gnttabdev, pages, nr_segs)) {
+xen_pv_printf(xendev, 0, "xengnttab_unmap failed: %s\n",
+  strerror(errno));
+}
+
+g_free(refs);
+return 0;
+}
+
 int xen_be_copy_grant_refs(struct XenDevice *xendev,
bool to_domain,
XenGrantCopySegment segs[],
@@ -157,6 +206,10 @@ int xen_be_copy_grant_refs(struct XenDevice *xendev,
 
 assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
 
+if (!xen_feature_grant_copy) {
+return compat_copy_grant_refs(xendev, to_domain, segs, nr_segs);
+}
+
 xengnttab_segs = g_new0(xengnttab_grant_copy_segment_t, nr_segs);
 
 for (i = 0; i < nr_segs; i++) {
-- 
1.9.1




[Qemu-devel] [PULL 01/15] xen-pvdevice: Introduce a simplistic xen-pvdevice save state

2018-05-21 Thread Stefano Stabellini
From: Igor Druzhinin 

This should help to avoid problems with accessing the device after
migration/resume without PV drivers by migrating its PCI configuration
space state. Without an explicitly defined state record it resets
every time a VM migrates which confuses the OS and makes every
access to xen-pvdevice MMIO region to fail. PV tools enable some
logic to save and restore PCI configuration state from within the VM
every time it migrates which basically hides the issue.

Older systems will acquire the new record when migrated which should
not change their state for worse.

Signed-off-by: Igor Druzhinin 
Reviewed-by: Paul Durrant 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/i386/xen/xen_pvdevice.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/i386/xen/xen_pvdevice.c b/hw/i386/xen/xen_pvdevice.c
index f748823..a146f18 100644
--- a/hw/i386/xen/xen_pvdevice.c
+++ b/hw/i386/xen/xen_pvdevice.c
@@ -71,6 +71,16 @@ static const MemoryRegionOps xen_pv_mmio_ops = {
 .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
+static const VMStateDescription vmstate_xen_pvdevice = {
+.name = "xen-pvdevice",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_PCI_DEVICE(parent_obj, XenPVDevice),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static void xen_pv_realize(PCIDevice *pci_dev, Error **errp)
 {
 XenPVDevice *d = XEN_PV_DEVICE(pci_dev);
@@ -120,6 +130,7 @@ static void xen_pv_class_init(ObjectClass *klass, void *data)
 k->class_id = PCI_CLASS_SYSTEM_OTHER;
 dc->desc = "Xen PV Device";
 dc->props = xen_pv_props;
+dc->vmsd = &vmstate_xen_pvdevice;
 }
 
 static const TypeInfo xen_pv_type_info = {
-- 
1.9.1




[Qemu-devel] [PULL 08/15] xen_backend: add grant table helpers

2018-05-21 Thread Stefano Stabellini
From: Paul Durrant 

This patch adds grant table helper functions to the xen_backend code to
localize error reporting and use of xen_domid.

The patch also defers the call to xengnttab_open() until just before the
initialise method in XenDevOps is invoked. This method is responsible for
mapping the shared ring. No prior method requires access to the grant table.

Signed-off-by: Paul Durrant 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/xen/xen_backend.c | 123 ++-
 include/hw/xen/xen_backend.h |  33 
 2 files changed, 144 insertions(+), 12 deletions(-)

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index 7445b50..50412d6 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -106,6 +106,103 @@ int xen_be_set_state(struct XenDevice *xendev, enum xenbus_state state)
 return 0;
 }
 
+void xen_be_set_max_grant_refs(struct XenDevice *xendev,
+   unsigned int nr_refs)
+{
+assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
+
+if (xengnttab_set_max_grants(xendev->gnttabdev, nr_refs)) {
+xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
+  strerror(errno));
+}
+}
+
+void *xen_be_map_grant_refs(struct XenDevice *xendev, uint32_t *refs,
+unsigned int nr_refs, int prot)
+{
+void *ptr;
+
+assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
+
+ptr = xengnttab_map_domain_grant_refs(xendev->gnttabdev, nr_refs,
+  xen_domid, refs, prot);
+if (!ptr) {
+xen_pv_printf(xendev, 0,
+  "xengnttab_map_domain_grant_refs failed: %s\n",
+  strerror(errno));
+}
+
+return ptr;
+}
+
+void xen_be_unmap_grant_refs(struct XenDevice *xendev, void *ptr,
+ unsigned int nr_refs)
+{
+assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
+
+if (xengnttab_unmap(xendev->gnttabdev, ptr, nr_refs)) {
+xen_pv_printf(xendev, 0, "xengnttab_unmap failed: %s\n",
+  strerror(errno));
+}
+}
+
+int xen_be_copy_grant_refs(struct XenDevice *xendev,
+   bool to_domain,
+   XenGrantCopySegment segs[],
+   unsigned int nr_segs)
+{
+xengnttab_grant_copy_segment_t *xengnttab_segs;
+unsigned int i;
+int rc;
+
+assert(xendev->ops->flags & DEVOPS_FLAG_NEED_GNTDEV);
+
+xengnttab_segs = g_new0(xengnttab_grant_copy_segment_t, nr_segs);
+
+for (i = 0; i < nr_segs; i++) {
+XenGrantCopySegment *seg = &segs[i];
+xengnttab_grant_copy_segment_t *xengnttab_seg = &xengnttab_segs[i];
+
+if (to_domain) {
+xengnttab_seg->flags = GNTCOPY_dest_gref;
+xengnttab_seg->dest.foreign.domid = xen_domid;
+xengnttab_seg->dest.foreign.ref = seg->dest.foreign.ref;
+xengnttab_seg->dest.foreign.offset = seg->dest.foreign.offset;
+xengnttab_seg->source.virt = seg->source.virt;
+} else {
+xengnttab_seg->flags = GNTCOPY_source_gref;
+xengnttab_seg->source.foreign.domid = xen_domid;
+xengnttab_seg->source.foreign.ref = seg->source.foreign.ref;
+xengnttab_seg->source.foreign.offset =
+seg->source.foreign.offset;
+xengnttab_seg->dest.virt = seg->dest.virt;
+}
+
+xengnttab_seg->len = seg->len;
+}
+
+rc = xengnttab_grant_copy(xendev->gnttabdev, nr_segs, xengnttab_segs);
+
+if (rc) {
+xen_pv_printf(xendev, 0, "xengnttab_copy failed: %s\n",
+  strerror(errno));
+}
+
+for (i = 0; i < nr_segs; i++) {
+xengnttab_grant_copy_segment_t *xengnttab_seg =
+&xengnttab_segs[i];
+
+if (xengnttab_seg->status != GNTST_okay) {
+xen_pv_printf(xendev, 0, "segment[%u] status: %d\n", i,
+  xengnttab_seg->status);
+rc = -1;
+}
+}
+
+g_free(xengnttab_segs);
+return rc;
+}
+
 /*
  * get xen backend device, allocate a new one if it doesn't exist.
  */
@@ -149,18 +246,6 @@ static struct XenDevice *xen_be_get_xendev(const char *type, int dom, int dev,
 }
 qemu_set_cloexec(xenevtchn_fd(xendev->evtchndev));
 
-if (ops->flags & DEVOPS_FLAG_NEED_GNTDEV) {
-xendev->gnttabdev = xengnttab_open(NULL, 0);
-if (xendev->gnttabdev == NULL) {
-xen_pv_printf(NULL, 0, "can't open gnttab device\n");
-xenevtchn_close(xendev->evtchndev);
-qdev_unplug(DEVICE(xendev), NULL);
-return NULL;
-}
-} else {
-xendev->gnttabdev = NULL;
-}
-
 xen_pv_insert_xendev(xendev);
 
 if (xendev->ops->alloc) {
@@ -322,6 +407,16 @@ static int xen_be_try_initialise(struct 

[Qemu-devel] [PULL 04/15] xen_pt: Present the size of 64 bit BARs correctly

2018-05-21 Thread Stefano Stabellini
From: Ross Lagerwall 

The full size of the BAR is stored in the lower PCIIORegion.size; the
upper PCIIORegion.size is 0. Calculate the size of the upper half
correctly from the lower half, otherwise the size read by the guest
will be incorrect.
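
As a quick sanity check of that calculation, here is a self-contained
sketch (the helper name and the test values are illustrative, not the
QEMU code itself):

```c
#include <assert.h>
#include <stdint.h>

/* For a 64-bit BAR whose full size is stored with the lower half,
 * derive the size contribution of the upper 32 bits and the resulting
 * read-only bit mask, mirroring the "r_size ? r_size - 1 : 0"
 * expression in the patch. */
static uint32_t upper_bar_ro_mask(uint64_t bar_size)
{
    uint32_t r_size = bar_size >> 32;   /* upper half of the BAR size */
    return r_size ? r_size - 1 : 0;
}
```

An 8 GiB BAR yields a mask of 0x1, while any BAR of 4 GiB or smaller
yields 0, so the upper dword now reads back with the correct
size-probing behaviour.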

Signed-off-by: Ross Lagerwall 
Acked-by: Anthony PERARD 
Signed-off-by: Stefano Stabellini 
---
 hw/xen/xen_pt_config_init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index a3ce33e..aee31c6 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -504,6 +504,8 @@ static int xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
 bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
 break;
 case XEN_PT_BAR_FLAG_UPPER:
+assert(index > 0);
+r_size = d->io_regions[index - 1].size >> 32;
 bar_emu_mask = XEN_PT_BAR_ALLF;
 bar_ro_mask = r_size ? r_size - 1 : 0;
 break;
-- 
1.9.1




[Qemu-devel] [PULL 00/15] xen-20180521-tag

2018-05-21 Thread Stefano Stabellini
The following changes since commit d32e41a1188e929cc0fb16829ce3736046951e39:

  Merge remote-tracking branch 
'remotes/famz/tags/docker-and-block-pull-request' into staging (2018-05-18 
14:11:52 +0100)

are available in the git repository at:


  http://xenbits.xenproject.org/git-http/people/sstabellini/qemu-dm.git 
tags/xen-20180521-tag

for you to fetch changes up to f03df99f09ee0ca27ea2298a1b77438e7999044d:

  xen_disk: be consistent with use of xendev and blkdev->xendev (2018-05-18 
11:13:01 -0700)


Xen 2018/05/21


Anthony PERARD (1):
  configure: Add explanation for --enable-xen-pci-passthrough

Igor Druzhinin (2):
  xen-pvdevice: Introduce a simplistic xen-pvdevice save state
  xen/pt: use address_space_memory object for memory region hooks

Paul Durrant (11):
  xen-hvm: create separate function for ioreq server initialization
  All the xen stable APIs define handle types of the form:
  xen: add a meaningful declaration of grant_copy_segment into xen_common.h
  xen_backend: add grant table helpers
  xen_disk: remove open-coded use of libxengnttab
  xen: remove other open-coded use of libxengnttab
  xen_backend: add an emulation of grant copy
  xen_disk: remove use of grant map/unmap
  xen_backend: make the xen_feature_grant_copy flag private
  xen_disk: use a single entry iovec
  xen_disk: be consistent with use of xendev and blkdev->xendev

Ross Lagerwall (1):
  xen_pt: Present the size of 64 bit BARs correctly

 configure|   2 +-
 dtc  |   2 +-
 hw/9pfs/xen-9p-backend.c |  32 ++-
 hw/block/xen_disk.c  | 614 +++
 hw/char/xen_console.c|   9 +-
 hw/i386/xen/xen-hvm.c|  76 +++---
 hw/i386/xen/xen_pvdevice.c   |  11 +
 hw/net/xen_nic.c |  33 +--
 hw/usb/xen-usb.c |  37 ++-
 hw/xen/xen_backend.c | 178 -
 hw/xen/xen_pt.c  |   2 +-
 hw/xen/xen_pt_config_init.c  |   2 +
 include/hw/xen/xen_backend.h |  34 ++-
 include/hw/xen/xen_common.h  |  17 +-
 scripts/checkpatch.pl|   2 +-
 ui/keycodemapdb  |   2 +-
 16 files changed, 429 insertions(+), 624 deletions(-)



Re: [Qemu-devel] storing machine data in qcow images?

2018-05-21 Thread Eduardo Habkost
On Mon, May 21, 2018 at 07:44:40PM +0100, Daniel P. Berrangé wrote:
> On Mon, May 21, 2018 at 03:29:28PM -0300, Eduardo Habkost wrote:
> > On Sat, May 19, 2018 at 08:05:06AM +0200, Markus Armbruster wrote:
> > > Eduardo Habkost  writes:
> > > 
> > > [...]
> > > > About being more expressive than just a single list of key,value
> > > > pairs, I don't see any evidence of that being necessary for the
> > > > problems we're trying to address.
> > > 
> > > Short history of a configuration format you might have encountered:
> 
> [snip]
> 
> > > How confident are we a single list of (key, value) is really all we're
> > > going to need?
> > > 
> > > Even if we think it is, would it be possible to provide for a future
> > > extension to trees at next to no cost?
> > 
> > I'm confident that a list of key,values is all we need for the
> > current problem.
> 
> I'm not convinced. A disk image may work with Q35 or i440fx,  or
> work with any of virtio, ide or sata disk. So that already means
> values have to be arrays, not scalars. You could do that with a
> simple key,value list, but only by defining a mapping of arrays
> into a flattened form. eg do we allow repeated keys, or do we
> allow array indexes on keys. 

No problem, we can support trees if it's necessary.


> > The point here is to allow users to simply copy an existing disk
> > image, and it will contain enough hints for a cloud stack to
> > choose reasonable defaults for machine-type and disk type
> > automatically.  Requiring the user to perform a separate step to
> > encapsulate the disk image in another file format defeats the
> > whole purpose of the proposal.
> 
> It doesn't have to mean more work for the user - the application
> that is used to create the image can do that on their behalf.
> oVirt for example can import/export OVA files, containing OVF
> metadata. I could imagine virt-manager, and other tools adding
> export ability without much trouble if this was deemed a desirable
> thing. Bundling gives ability to have multiple disk images in one
> archive, which is something OVF does.

I have the impression that "the application that is used to
create the image" is a very large set.  It can be virt-manager,
virt-install, virt-manager, or even QEMU itself.

Today people can simply create a VM on virt-manager, or run QEMU
manually, and upload the qcow2 image directly from its original
location (they don't need to copy/export it).  Don't we want the
same procedure to keep working instead of requiring users to use
another tool?

-- 
Eduardo



Re: [Qemu-devel] storing machine data in qcow images?

2018-05-21 Thread Daniel P . Berrangé
On Mon, May 21, 2018 at 03:29:28PM -0300, Eduardo Habkost wrote:
> On Sat, May 19, 2018 at 08:05:06AM +0200, Markus Armbruster wrote:
> > Eduardo Habkost  writes:
> > 
> > [...]
> > > About being more expressive than just a single list of key,value
> > > pairs, I don't see any evidence of that being necessary for the
> > > problems we're trying to address.
> > 
> > Short history of a configuration format you might have encountered:

[snip]

> > How confident are we a single list of (key, value) is really all we're
> > going to need?
> > 
> > Even if we think it is, would it be possible to provide for a future
> > extension to trees at next to no cost?
> 
> I'm confident that a list of key,values is all we need for the
> current problem.

I'm not convinced. A disk image may work with Q35 or i440fx,  or
work with any of virtio, ide or sata disk. So that already means
values have to be arrays, not scalars. You could do that with a
simple key,value list, but only by defining a mapping of arrays
into a flattened form. eg do we allow repeated keys, or do we
allow array indexes on keys. 
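
To make that concrete, here is one purely hypothetical flattening
convention for array values ("key.N=value"); the key name and values
are made up, and this is not a proposed format:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: emit one "key.N=value" line per array element,
 * the kind of indexing convention a flat key/value list would need
 * in order to encode array-valued hints. */
static int flatten(char *out, size_t outsz, const char *key,
                   const char *const *values, size_t n)
{
    size_t pos = 0;

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + pos, outsz - pos, "%s.%zu=%s\n",
                         key, i, values[i]);
        if (w < 0 || (size_t)w >= outsz - pos) {
            return -1;  /* output buffer too small */
        }
        pos += w;
    }
    return 0;
}
```

Reading such a list back requires agreeing up front on whether repeated
keys or index suffixes denote arrays, which is exactly the ambiguity
raised above.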

> The point here is to allow users to simply copy an existing disk
> image, and it will contain enough hints for a cloud stack to
> choose reasonable defaults for machine-type and disk type
> automatically.  Requiring the user to perform a separate step to
> encapsulate the disk image in another file format defeats the
> whole purpose of the proposal.

It doesn't have to mean more work for the user - the application
that is used to create the image can do that on their behalf.
oVirt for example can import/export OVA files, containing OVF
metadata. I could imagine virt-manager, and other tools adding
export ability without much trouble if this was deemed a desirable
thing. Bundling gives ability to have multiple disk images in one
archive, which is something OVF does.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] storing machine data in qcow images?

2018-05-21 Thread Eduardo Habkost
On Sat, May 19, 2018 at 08:05:06AM +0200, Markus Armbruster wrote:
> Eduardo Habkost  writes:
> 
> [...]
> > About being more expressive than just a single list of key,value
> > pairs, I don't see any evidence of that being necessary for the
> > problems we're trying to address.
> 
> Short history of a configuration format you might have encountered:
> 
> 1. A couple of (key, value) is all we need for the problems we're
> trying to address.  (v0.4, 2003)
> 
> 2.1. I got this one special snowflake problem where I actually need a few
> related values.  Fortunately, this little ad hoc parser can take apart
> the key's single value easily.  (ca. v0.8, 2005)
> 
> ...
> 
> 2.n. Snowflakes are surprisingly common, but fortunately one more little
> ad hoc parser can't hurt.
> 
> 3. Umm, this is getting messy.  Let's have proper infrastructure for
> two-level keys.  Surely two levels are all we need for the problems
> we're trying to address.  Fortunately, we can bolt them on without too
> much trouble.  (v0.12, 2009)
> 
> 4. Err, trees, I'm afraid we actually need trees.  Fortunately, we can
> hack them into the existing two-level infrastructure without too much
> trouble.  (v1.3, 2013)
> 
> 5. You are in a maze of twisting little passages, all different.
> (today)
> 
> 
> How confident are we a single list of (key, value) is really all we're
> going to need?
> 
> Even if we think it is, would it be possible to provide for a future
> extension to trees at next to no cost?

I'm confident that a list of key,values is all we need for the
current problem.

I also agree that being possible to represent trees is a good
idea, and it would probably have next to no cost.

But I disagree if the point here is "we will eventually need much
more complex data in the future, so let's require users to move
to OVF instead".

The point here is to allow users to simply copy an existing disk
image, and it will contain enough hints for a cloud stack to
choose reasonable defaults for machine-type and disk type
automatically.  Requiring the user to perform a separate step to
encapsulate the disk image in another file format defeats the
whole purpose of the proposal.

-- 
Eduardo



Re: [Qemu-devel] [qemu PATCH v4 2/4] tests/.gitignore: add entry for generated file

2018-05-21 Thread Eric Blake

On 05/21/2018 12:32 PM, Philippe Mathieu-Daudé wrote:



>> tests/test-block-backend
>
>> +test-block-backend
>>   test-blockjob
>>   test-blockjob-txn
>>   test-bufferiszero
>
> What about using gitignore negated pattern in tests/?


Or, what we've threatened to do in the past: rename all unit tests to 
the pattern *-test instead of test-*, as a suffix is a lot easier to 
exclude via glob than a prefix.  And while we're renaming things, sort 
tests into separate subdirectories according to whether they are run as 
part of 'make check-unit' or 'make check-qtest'.  But until someone does 
that work, tweaking the .gitignore for individual tests as they keep 
getting added is no worse than what we've been doing.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-devel] [PATCH v7 1/3] qmp: adding 'wakeup-suspend-support' in query-target

2018-05-21 Thread Eduardo Habkost
On Fri, May 18, 2018 at 10:48:31AM +0200, Markus Armbruster wrote:
> Cc'ing a few more people.
> 
> Daniel Henrique Barboza  writes:
> 
> > When issuing the qmp/hmp 'system_wakeup' command, what happens in a
> > nutshell is:
> >
> > - qmp_system_wakeup_request set runstate to RUNNING, sets a wakeup_reason
> > and notify the event
> > - in the main_loop, all vcpus are paused, a system reset is issued, all
> > subscribers of wakeup_notifiers receives a notification, vcpus are then
> > resumed and the wake up QAPI event is fired
> >
> > Note that this procedure alone doesn't ensure that the guest will awake
> > from SUSPENDED state - the subscribers of the wake up event must take
> > action to resume the guest, otherwise the guest will simply reboot.
> >
> > At this moment there are only two subscribers of the wake up event: one
> > in hw/acpi/core.c and another one in hw/i386/xen/xen-hvm.c. This means
> > that system_wakeup does not work as intended with other architectures.
> >
> > However, only the presence of 'system_wakeup' is required for QGA to
> > support 'guest-suspend-ram' and 'guest-suspend-hybrid' at this moment.
> > This means that the user/management will expect to suspend the guest using
> > one of those suspend commands and then resume execution using system_wakeup,
> > regardless of the support offered in system_wakeup in the first place.
> >
> > This patch adds a new flag called 'wakeup-suspend-support' in TargetInfo
> > that allows the caller to query if the guest supports wake up from
> > suspend via system_wakeup. It goes over the subscribers of the wake up
> > event and, if it's empty, it assumes that the guest does not support
> > wake up from suspend (and thus, pm-suspend itself).
> >
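
The emptiness check described above can be modeled minimally (the types
here are hypothetical, not QEMU's actual NotifierList):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: wake-up support is reported only if at least
 * one subscriber is registered on the wake-up notifier list. */
struct wakeup_notifier {
    struct wakeup_notifier *next;
};

static bool wakeup_suspend_support(const struct wakeup_notifier *head)
{
    return head != NULL;
}
```
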
> > This is the expected output of query-target when running a x86 guest:
> >
> > {"execute" : "query-target"}
> > {"return": {"arch": "x86_64", "wakeup-suspend-support": true}}
> >
> > This is the output when running a pseries guest:
> >
> > {"execute" : "query-target"}
> > {"return": {"arch": "ppc64", "wakeup-suspend-support": false}}
> >
> > Given that the TargetInfo structure is read-only, adding a new flag to
> > it is backwards compatible. There is no need to deprecate the old
> > TargetInfo format.
> >
> > With this extra tool, management can avoid situations where a guest
> > that does not have proper suspend/wake capabilities ends up in
> > inconsistent state (e.g.
> > https://github.com/open-power-host-os/qemu/issues/31).
> >
> > Reported-by: Balamuruhan S 
> > Signed-off-by: Daniel Henrique Barboza 
> 
> Is query-target is the right place to carry this flag?  v7 is rather
> late for this kind of question; my sincere apologies.
[...]
> 
> Issue#2: the flag isn't a property of the target.  Due to -no-acpi, it's
> not even a property of the machine type.  If it was, query-machines
> would be the natural owner of the flag.
> 
> Perhaps query-machines is still the proper owner.  The value of
> wakeup-suspend-support would have to depend on -no-acpi for the machine
> types that honor it.  Not ideal; I'd prefer MachineInfo to be static.
> Tolerable?  I guess that's also a libvirt question.

It depends when libvirt is going to query it.  Is it OK to only
query it after the VM is already up and running?  If it is, then
we can simply expose it as a read-only property of the machine
object.

Or, if we don't want to rely on qom-get as a stable API, we can
add a new query command (query-machine? query-power-management?)

-- 
Eduardo



Re: [Qemu-devel] [PATCH v3 0/8] linux-user: move socket.h definitions to CPU directories

2018-05-21 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180519092956.15134-1-laur...@vivier.eu
Subject: [Qemu-devel] [PATCH v3 0/8] linux-user: move socket.h definitions to 
CPU directories

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20180519092956.15134-1-laur...@vivier.eu -> 
patchew/20180519092956.15134-1-laur...@vivier.eu
Switched to a new branch 'test'
4302e15b3c linux-user: define TARGET_SO_REUSEPORT
79f9d462e0 linux-user: copy sparc/sockbits.h definitions from linux
4d4d80f6dd linux-user: update ARCH_HAS_SOCKET_TYPES use
f90eb19225 linux-user: move ppc socket.h definitions to ppc/sockbits.h
a159233782 linux-user: move socket.h generic definitions to generic/sockbits.h
7bda50c2b9 linux-user: move sparc/sparc64 socket.h definitions to 
sparc/sockbits.h
1e5e7d107e linux-user: move alpha socket.h definitions to alpha/sockbits.h
9675bdc0c5 linux-user: move mips socket.h definitions to mips/sockbits.h

=== OUTPUT BEGIN ===
Checking PATCH 1/8: linux-user: move mips socket.h definitions to 
mips/sockbits.h...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#17: 
new file mode 100644

ERROR: if this code is redundant consider removing it
#52: FILE: linux-user/mips/sockbits.h:31:
+#if 0

total: 1 errors, 1 warnings, 227 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 2/8: linux-user: move alpha socket.h definitions to 
alpha/sockbits.h...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#18: 
new file mode 100644

total: 0 errors, 1 warnings, 224 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 3/8: linux-user: move sparc/sparc64 socket.h definitions to 
sparc/sockbits.h...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#74: 
new file mode 100644

total: 0 errors, 1 warnings, 146 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 4/8: linux-user: move socket.h generic definitions to 
generic/sockbits.h...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#14: 
new file mode 100644

total: 0 errors, 1 warnings, 148 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 5/8: linux-user: move ppc socket.h definitions to 
ppc/sockbits.h...
Checking PATCH 6/8: linux-user: update ARCH_HAS_SOCKET_TYPES use...
Checking PATCH 7/8: linux-user: copy sparc/sockbits.h definitions from linux...
Checking PATCH 8/8: linux-user: define TARGET_SO_REUSEPORT...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [qemu PATCH v4 2/4] tests/.gitignore: add entry for generated file

2018-05-21 Thread Philippe Mathieu-Daudé
On 05/21/2018 01:41 PM, Eric Blake wrote:
> On 05/21/2018 11:32 AM, Ross Zwisler wrote:
>> After a "make check" we end up with the following:
>>
>> $ git status
>> On branch master
>> Your branch is up-to-date with 'origin/master'.
>>
>> Untracked files:
>>    (use "git add ..." to include in what will be committed)
>>
>> tests/test-block-backend
>>
>> nothing added to commit but untracked files present (use "git add" to
>> track)
>>
>> Signed-off-by: Ross Zwisler 
>> Fixes: commit ad0df3e0fdac ("block: test blk_aio_flush() with
>> blk->root == NULL")
>> Cc: Kevin Wolf 
>> ---
>>   tests/.gitignore | 1 +
>>   1 file changed, 1 insertion(+)
> 
> Reviewed-by: Eric Blake 
> 
>>
>> diff --git a/tests/.gitignore b/tests/.gitignore
>> index fb62d2299b..2bc61a9a58 100644
>> --- a/tests/.gitignore
>> +++ b/tests/.gitignore
>> @@ -21,6 +21,7 @@ test-base64
>>   test-bdrv-drain
>>   test-bitops
>>   test-bitcnt
>> +test-block-backend
>>   test-blockjob
>>   test-blockjob-txn
>>   test-bufferiszero

What about using gitignore negated pattern in tests/?



[Qemu-devel] [PATCH] target/arm: Honour FPCR.FZ in FRECPX

2018-05-21 Thread Peter Maydell
The FRECPX instructions should (like most other floating point operations)
honour the FPCR.FZ bit which specifies whether input denormals should
be flushed to zero (or FZ16 for the half-precision version).
We forgot to implement this, which doesn't affect the results (since
the calculation doesn't actually care about the mantissa bits) but did
mean we were failing to set the FPSR.IDC bit.

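
As background, a standalone sketch of the IEEE-754 binary16
input-denormal rule the new calls implement (this models the semantics
only, not QEMU's softfloat code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IEEE-754 binary16: a denormal has all-zero exponent bits and a
 * non-zero fraction. Flushing it to zero keeps only the sign bit; a
 * real implementation would also raise the input-denormal flag that
 * feeds FPSR.IDC. */
static bool f16_is_input_denormal(uint16_t v)
{
    return ((v >> 10) & 0x1f) == 0 && (v & 0x3ff) != 0;
}

static uint16_t f16_flush_input_denormal(uint16_t v)
{
    return f16_is_input_denormal(v) ? (uint16_t)(v & 0x8000) : v;
}
```

The FRECPX result itself is unaffected because only the sign and
exponent are used, but squashing the input is what lets the status
flag be raised.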
Signed-off-by: Peter Maydell 
---
 target/arm/helper-a64.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index f92bdea732..c4d2a04827 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -384,6 +384,8 @@ float16 HELPER(frecpx_f16)(float16 a, void *fpstp)
 return nan;
 }
 
+a = float16_squash_input_denormal(a, fpst);
+
 val16 = float16_val(a);
 sbit = 0x8000 & val16;
 exp = extract32(val16, 10, 5);
@@ -413,6 +415,8 @@ float32 HELPER(frecpx_f32)(float32 a, void *fpstp)
 return nan;
 }
 
+a = float32_squash_input_denormal(a, fpst);
+
 val32 = float32_val(a);
 sbit = 0x8000ULL & val32;
 exp = extract32(val32, 23, 8);
@@ -442,6 +446,8 @@ float64 HELPER(frecpx_f64)(float64 a, void *fpstp)
 return nan;
 }
 
+a = float64_squash_input_denormal(a, fpst);
+
 val64 = float64_val(a);
 sbit = 0x8000ULL & val64;
 exp = extract64(float64_val(a), 52, 11);
-- 
2.17.0




Re: [Qemu-devel] [PATCH v2 1/1] tests/docker: Add a Avocado Docker test

2018-05-21 Thread Philippe Mathieu-Daudé
Hi Alistair, Fam,

On 05/21/2018 12:16 AM, Fam Zheng wrote:
> On Fri, 05/18 11:34, Alistair Francis wrote:
>> Avocado is not trivial to set up on non-Fedora systems. To simplify
>> future testing, add a docker test image that runs Avocado tests.

Can you add an entry in the "make docker" help menu?

>>
>> Signed-off-by: Alistair Francis 
>> ---
>> v2:
>>  - Add a separate fedora-avocado Docker image
>>  - Move the avocado vt-bootstrap into the Docker file
>>
>>  tests/docker/Makefile.include |  1 +
>>  .../docker/dockerfiles/fedora-avocado.docker  | 25 +
>>  tests/docker/test-avocado | 28 +++
>>  3 files changed, 54 insertions(+)
>>  create mode 100644 tests/docker/dockerfiles/fedora-avocado.docker
>>  create mode 100755 tests/docker/test-avocado
>>
>> diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
>> index ef1a3e62eb..0e3d108dde 100644
>> --- a/tests/docker/Makefile.include
>> +++ b/tests/docker/Makefile.include
>> @@ -60,6 +60,7 @@ docker-image-debian-ppc64el-cross: docker-image-debian9
>>  docker-image-debian-s390x-cross: docker-image-debian9
>>  docker-image-debian-win32-cross: docker-image-debian8-mxe
>>  docker-image-debian-win64-cross: docker-image-debian8-mxe
>> +docker-image-fedora-avocado: docker-image-fedora
>>  docker-image-travis: NOUSER=1
>>  
>>  # Expand all the pre-requistes for each docker image and test combination
>> diff --git a/tests/docker/dockerfiles/fedora-avocado.docker b/tests/docker/dockerfiles/fedora-avocado.docker
>> new file mode 100644
>> index 00..55b19eebbf
>> --- /dev/null
>> +++ b/tests/docker/dockerfiles/fedora-avocado.docker
>> @@ -0,0 +1,25 @@
>> +FROM qemu:fedora
>> +
>> +ENV PACKAGES \
>> +libvirt-devel \
>> +nc \
>> +python-avocado \
>> +python2-devel python3-devel \
>> +qemu-kvm \
>> +tcpdump \
>> +xz
>> +ENV PIP_PACKAGES \
>> +avocado-qemu \
>> +avocado-framework-plugin-runner-remote \
>> +avocado-framework-plugin-runner-vm \
>> +avocado-framework-plugin-vt
>> +
>> +ENV QEMU_CONFIGURE_OPTS --python=/usr/bin/python3
> 
> I think this is inherited from qemu:fedora, no?

Yes.

> 
>> +
>> +RUN dnf install -y $PACKAGES
>> +RUN pip install $PIP_PACKAGES
>> +RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
>> +
>> +RUN rpm -q $PACKAGES | sort > /packages.txt
> 
> Can you keep the parent image's list with ">>" or appending to the old 
> $PACKAGES
> in the above "ENV" directive?

Appending looks cleaner to me.

> 
>> +
>> +ENV FEATURES mingw clang pyyaml asan avocado
> 
> Similarly, is it possible to append to the parent list instead of overriding?
> 
>> diff --git a/tests/docker/test-avocado b/tests/docker/test-avocado
>> new file mode 100755
>> index 00..40474db2ce
>> --- /dev/null
>> +++ b/tests/docker/test-avocado
>> @@ -0,0 +1,28 @@
>> +#!/bin/bash -e
>> +#
>> +# Avocado tests on Fedora, as these are a real pain on Debian systems
> 
> Shouldn't pip packages work just well on Debian too? What are the pain?
> (Cc'ing Cleber who may want to know this).

Avocado isn't packaged (yet?) on Debian.

> 
> Fam
> 
>> +#
>> +# Copyright (c) 2018 Western Digital.
>> +#
>> +# Authors:
>> +#  Alistair Francis 
>> +#
>> +# This work is licensed under the terms of the GNU GPL, version 2
>> +# or (at your option) any later version. See the COPYING file in
>> +# the top-level directory.
>> +#
>> +# Run this test: NOUSER=1 make docker-test-avocado@fedora-avocado
>> +
>> +. common.rc
>> +
>> +requires avocado
>> +
>> +cd "$BUILD_DIR"
>> +
>> +DEF_TARGET_LIST="x86_64-softmmu"
>> +TARGET_LIST=${TARGET_LIST:-$DEF_TARGET_LIST} \
>> +build_qemu
>> +install_qemu
>> +
>> +export PATH="${PATH}:$(pwd)"
>> +avocado run boot --vt-qemu-bin ./x86_64-softmmu/qemu-system-x86_64

This failed when testing (I suppose due to too old corporate proxy...):

Step 7/11 : RUN avocado vt-bootstrap --yes-to-all --vt-type qemu
 ---> Running in 008e494971c7
[...]
8 - Verifying (and possibly downloading) guest image
Verifying expected SHA1 sum from
http://avocado-project.org/data/assets/jeos/27/SHA1SUM_JEOS_27_64
Failed to get SHA1 from file: HTTP Error 403: Forbidden file type or
location: http://avocado-project.org/data/assets/jeos/27/SHA1SUM_JEOS_27_64
File /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2.xz not
found
Check your internet connection: HTTP Error 403: Forbidden file type or
location: http://avocado-project.org/data/assets/jeos/27/jeos-27-64.qcow2.xz
Uncompressing
/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2.xz ->
/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
Bootstrap command failed
Command: xz -cd
/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2.xz >
/var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2
stderr output:
xz: /var/lib/avocado/data/avocado-vt/images/jeos-27-x86_64.qcow2.xz:
File format not recognized

The command 

Re: [Qemu-devel] [qemu PATCH v4 2/4] tests/.gitignore: add entry for generated file

2018-05-21 Thread Eric Blake

On 05/21/2018 11:32 AM, Ross Zwisler wrote:

> After a "make check" we end up with the following:
>
> $ git status
> On branch master
> Your branch is up-to-date with 'origin/master'.
>
> Untracked files:
>    (use "git add ..." to include in what will be committed)
>
> tests/test-block-backend
>
> nothing added to commit but untracked files present (use "git add" to track)
>
> Signed-off-by: Ross Zwisler 
> Fixes: commit ad0df3e0fdac ("block: test blk_aio_flush() with blk->root == NULL")
> Cc: Kevin Wolf 
> ---
>   tests/.gitignore | 1 +
>   1 file changed, 1 insertion(+)

Reviewed-by: Eric Blake 

> diff --git a/tests/.gitignore b/tests/.gitignore
> index fb62d2299b..2bc61a9a58 100644
> --- a/tests/.gitignore
> +++ b/tests/.gitignore
> @@ -21,6 +21,7 @@ test-base64
>   test-bdrv-drain
>   test-bitops
>   test-bitcnt
> +test-block-backend
>   test-blockjob
>   test-blockjob-txn
>   test-bufferiszero



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-devel] Ping [PATCH RFC v2 0/2] Fix UART serial implementation

2018-05-21 Thread Calvin Lee
Hello,

It's been about a week since the last email to my patches, in case anyone would
like to review but missed them.

Patches are the following on patchwork:
http://patchwork.ozlabs.org/patch/912281/
http://patchwork.ozlabs.org/patch/912282/

And the following on patchew:
http://patchew.org/QEMU/20180512000545.966-1-cyrus...@gmail.com/

Thank you,
Calvin Lee



[Qemu-devel] [qemu PATCH v4 3/4] nvdimm, acpi: support NFIT platform capabilities

2018-05-21 Thread Ross Zwisler
Add a machine command line option to allow the user to control the Platform
Capabilities Structure in the virtualized NFIT.  This Platform Capabilities
Structure was added in ACPI 6.2 Errata A.

Signed-off-by: Ross Zwisler 
---
 docs/nvdimm.txt | 27 +++
 hw/acpi/nvdimm.c| 45 +
 hw/i386/pc.c| 31 +++
 include/hw/i386/pc.h|  1 +
 include/hw/mem/nvdimm.h |  5 +
 5 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/docs/nvdimm.txt b/docs/nvdimm.txt
index e903d8bb09..8b48fb4633 100644
--- a/docs/nvdimm.txt
+++ b/docs/nvdimm.txt
@@ -153,3 +153,30 @@ guest NVDIMM region mapping structure.  This unarmed flag 
indicates
 guest software that this vNVDIMM device contains a region that cannot
 accept persistent writes. In result, for example, the guest Linux
 NVDIMM driver, marks such vNVDIMM device as read-only.
+
+Platform Capabilities
+---------------------
+
+ACPI 6.2 Errata A added support for a new Platform Capabilities Structure
+which allows the platform to communicate what features it supports related to
+NVDIMM data durability.  Users can provide a capabilities value to a guest via
+the optional "nvdimm-cap" machine command line option:
+
+-machine pc,accel=kvm,nvdimm,nvdimm-cap=2
+
+This "nvdimm-cap" field is an integer, and is the combined value of the
+various capability bits defined in table 5-137 of the ACPI 6.2 Errata A spec.
+
+Here is a quick summary of the three bits that are defined as of that spec:
+
+Bit[0] - CPU Cache Flush to NVDIMM Durability on Power Loss Capable.
+Bit[1] - Memory Controller Flush to NVDIMM Durability on Power Loss Capable.
+ Note: If bit 0 is set to 1 then this bit shall be set to 1 as well.
+Bit[2] - Byte Addressable Persistent Memory Hardware Mirroring Capable.
+
+So, a "nvdimm-cap" value of 2 would mean that the platform supports Memory
+Controller Flush on Power Loss, a value of 3 would mean that the platform
+supports CPU Cache Flush and Memory Controller Flush on Power Loss, etc.
+
+For a complete list of the flags available and for more detailed descriptions,
+please consult the ACPI spec.
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 59d6e4254c..87e4280c71 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -169,6 +169,21 @@ struct NvdimmNfitControlRegion {
 } QEMU_PACKED;
 typedef struct NvdimmNfitControlRegion NvdimmNfitControlRegion;
 
+/*
+ * NVDIMM Platform Capabilities Structure
+ *
+ * Defined in section 5.2.25.9 of ACPI 6.2 Errata A, September 2017
+ */
+struct NvdimmNfitPlatformCaps {
+uint16_t type;
+uint16_t length;
+uint8_t highest_cap;
+uint8_t reserved[3];
+uint32_t capabilities;
+uint8_t reserved2[4];
+} QEMU_PACKED;
+typedef struct NvdimmNfitPlatformCaps NvdimmNfitPlatformCaps;
+
 /*
  * Module serial number is a unique number for each device. We use the
  * slot id of NVDIMM device to generate this number so that each device
@@ -351,7 +366,23 @@ static void nvdimm_build_structure_dcr(GArray *structures, 
DeviceState *dev)
  JEDEC Annex L Release 3. */);
 }
 
-static GArray *nvdimm_build_device_structure(void)
+/*
+ * ACPI 6.2 Errata A: 5.2.25.9 NVDIMM Platform Capabilities Structure
+ */
+static void
+nvdimm_build_structure_caps(GArray *structures, uint32_t capabilities)
+{
+NvdimmNfitPlatformCaps *nfit_caps;
+
+nfit_caps = acpi_data_push(structures, sizeof(*nfit_caps));
+
+nfit_caps->type = cpu_to_le16(7 /* NVDIMM Platform Capabilities */);
+nfit_caps->length = cpu_to_le16(sizeof(*nfit_caps));
+nfit_caps->highest_cap = 31 - clz32(capabilities);
+nfit_caps->capabilities = cpu_to_le32(capabilities);
+}
+
+static GArray *nvdimm_build_device_structure(AcpiNVDIMMState *state)
 {
 GSList *device_list = nvdimm_get_device_list();
 GArray *structures = g_array_new(false, true /* clear */, 1);
@@ -373,6 +404,10 @@ static GArray *nvdimm_build_device_structure(void)
 }
 g_slist_free(device_list);
 
+if (state->capabilities) {
+nvdimm_build_structure_caps(structures, state->capabilities);
+}
+
 return structures;
 }
 
@@ -381,16 +416,18 @@ static void nvdimm_init_fit_buffer(NvdimmFitBuffer 
*fit_buf)
 fit_buf->fit = g_array_new(false, true /* clear */, 1);
 }
 
-static void nvdimm_build_fit_buffer(NvdimmFitBuffer *fit_buf)
+static void nvdimm_build_fit_buffer(AcpiNVDIMMState *state)
 {
+NvdimmFitBuffer *fit_buf = &state->fit_buf;
+
 g_array_free(fit_buf->fit, true);
-fit_buf->fit = nvdimm_build_device_structure();
+fit_buf->fit = nvdimm_build_device_structure(state);
 fit_buf->dirty = true;
 }
 
 void nvdimm_plug(AcpiNVDIMMState *state)
 {
-nvdimm_build_fit_buffer(&state->fit_buf);
+nvdimm_build_fit_buffer(state);
 }
 
 static void nvdimm_build_nfit(AcpiNVDIMMState *state, GArray *table_offsets,
diff --git a/hw/i386/pc.c 

[Qemu-devel] [qemu PATCH v4 2/4] tests/.gitignore: add entry for generated file

2018-05-21 Thread Ross Zwisler
After a "make check" we end up with the following:

$ git status
On branch master
Your branch is up-to-date with 'origin/master'.

Untracked files:
  (use "git add ..." to include in what will be committed)

tests/test-block-backend

nothing added to commit but untracked files present (use "git add" to track)

Signed-off-by: Ross Zwisler 
Fixes: commit ad0df3e0fdac ("block: test blk_aio_flush() with blk->root == 
NULL")
Cc: Kevin Wolf 
---
 tests/.gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/.gitignore b/tests/.gitignore
index fb62d2299b..2bc61a9a58 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -21,6 +21,7 @@ test-base64
 test-bdrv-drain
 test-bitops
 test-bitcnt
+test-block-backend
 test-blockjob
 test-blockjob-txn
 test-bufferiszero
-- 
2.14.3




[Qemu-devel] [qemu PATCH v4 1/4] nvdimm: fix typo in label-size definition

2018-05-21 Thread Ross Zwisler
Signed-off-by: Ross Zwisler 
Fixes: commit da6789c27c2e ("nvdimm: add a macro for property "label-size"")
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Igor Mammedov 
Cc: Haozhong Zhang 
Cc: Michael S. Tsirkin 
Cc: Stefan Hajnoczi 
---
 hw/mem/nvdimm.c | 2 +-
 include/hw/mem/nvdimm.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index acb656b672..4087aca25e 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -89,7 +89,7 @@ static void nvdimm_set_unarmed(Object *obj, bool value, Error 
**errp)
 
 static void nvdimm_init(Object *obj)
 {
-object_property_add(obj, NVDIMM_LABLE_SIZE_PROP, "int",
+object_property_add(obj, NVDIMM_LABEL_SIZE_PROP, "int",
 nvdimm_get_label_size, nvdimm_set_label_size, NULL,
 NULL, NULL);
 object_property_add_bool(obj, NVDIMM_UNARMED_PROP,
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 7fd87c4e1c..74c60332e1 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -48,7 +48,7 @@
 #define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
TYPE_NVDIMM)
 
-#define NVDIMM_LABLE_SIZE_PROP "label-size"
+#define NVDIMM_LABEL_SIZE_PROP "label-size"
 #define NVDIMM_UNARMED_PROP    "unarmed"
 
 struct NVDIMMDevice {
-- 
2.14.3




[Qemu-devel] [qemu PATCH v4 4/4] ACPI testing: test NFIT platform capabilities

2018-05-21 Thread Ross Zwisler
Add testing for the newly added NFIT Platform Capabilities Structure.

Signed-off-by: Ross Zwisler 
Suggested-by: Igor Mammedov 
---
 tests/acpi-test-data/pc/NFIT.dimmpxm  | Bin 224 -> 240 bytes
 tests/acpi-test-data/q35/NFIT.dimmpxm | Bin 224 -> 240 bytes
 tests/bios-tables-test.c  |   2 +-
 3 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/acpi-test-data/pc/NFIT.dimmpxm 
b/tests/acpi-test-data/pc/NFIT.dimmpxm
index 
2bfc6c51f31c25a052803c494c933d4948fc0106..598d331b751cd3cb2137d431c1f34bb8957a0d31
 100644
GIT binary patch
delta 35
lcmaFB_<@nj&&@OB0|NsCqsm0CYXa;H0t}2m9y1Vw005dH1-}3Q

delta 18
Zcmeys_<)hi&&@OB0RsaAqyI#%YXCU~1+M@A

diff --git a/tests/acpi-test-data/q35/NFIT.dimmpxm 
b/tests/acpi-test-data/q35/NFIT.dimmpxm
index 
2bfc6c51f31c25a052803c494c933d4948fc0106..598d331b751cd3cb2137d431c1f34bb8957a0d31
 100644
GIT binary patch
delta 35
lcmaFB_<@nj&&@OB0|NsCqsm0CYXa;H0t}2m9y1Vw005dH1-}3Q

delta 18
Zcmeys_<)hi&&@OB0RsaAqyI#%YXCU~1+M@A

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index bf3e193ae9..256d463cb8 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -830,7 +830,7 @@ static void test_acpi_tcg_dimm_pxm(const char *machine)
 memset(, 0, sizeof(data));
 data.machine = machine;
 data.variant = ".dimmpxm";
-test_acpi_one(" -machine nvdimm=on"
+test_acpi_one(" -machine nvdimm=on,nvdimm-cap=3"
   " -smp 4,sockets=4"
   " -m 128M,slots=3,maxmem=1G"
   " -numa node,mem=32M,nodeid=0"
-- 
2.14.3




[Qemu-devel] [qemu PATCH v4 0/4] support NFIT platform capabilities

2018-05-21 Thread Ross Zwisler
Changes since v3:
 * Updated the text in docs/nvdimm.txt to make it clear that the value
   being passed in on the command line is an integer made up of various
   bit fields. (Rob Elliott)
 
 * Updated the "Highest Valid Capability" byte to be dynamic based on
   the highest valid bit in the user's input. (Rob Elliott)

---

The first 2 patches in this series clean up some things I noticed while
coding.

Patch 3 adds support for the new Platform Capabilities Structure, which
was added to the NFIT in ACPI 6.2 Errata A.  We add a machine command
line option "nvdimm-cap":

-machine pc,accel=kvm,nvdimm,nvdimm-cap=2

which allows the user to pass in a value for this structure.  When such
a value is passed in we will generate the new NFIT subtable.

Patch 4 adds code to the "make check" self test infrastructure so that
we generate the new Platform Capabilities Structure, and adds it to the
expected NFIT output so that we test for it.

Ross Zwisler (4):
  nvdimm: fix typo in label-size definition
  tests/.gitignore: add entry for generated file
  nvdimm, acpi: support NFIT platform capabilities
  ACPI testing: test NFIT platform capabilities

 docs/nvdimm.txt   |  27 
 hw/acpi/nvdimm.c  |  45 +++---
 hw/i386/pc.c  |  31 +++
 hw/mem/nvdimm.c   |   2 +-
 include/hw/i386/pc.h  |   1 +
 include/hw/mem/nvdimm.h   |   7 +-
 tests/.gitignore  |   1 +
 tests/acpi-test-data/pc/NFIT.dimmpxm  | Bin 224 -> 240 bytes
 tests/acpi-test-data/q35/NFIT.dimmpxm | Bin 224 -> 240 bytes
 tests/bios-tables-test.c  |   2 +-
 10 files changed, 109 insertions(+), 7 deletions(-)

-- 
2.14.3




[Qemu-devel] [PATCH v2] Delete AF_UNIX socket after close

2018-05-21 Thread Pavel Balaev
This is a second attempt at sending this patch:

http://lists.nongnu.org/archive/html/qemu-devel/2018-05/msg04697.html

Signed-off-by: Pavel Balaev 
---
 io/channel-socket.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/io/channel-socket.c b/io/channel-socket.c
index 57cfb4d3a6..b50e63a053 100644
--- a/io/channel-socket.c
+++ b/io/channel-socket.c
@@ -685,8 +685,10 @@ qio_channel_socket_close(QIOChannel *ioc,
  Error **errp)
 {
 QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc);
+int rc = 0;
 
 if (sioc->fd != -1) {
+SocketAddress *addr = socket_local_address(sioc->fd, errp);
 #ifdef WIN32
 WSAEventSelect(sioc->fd, NULL, 0);
 #endif
@@ -697,8 +699,22 @@ qio_channel_socket_close(QIOChannel *ioc,
 return -1;
 }
 sioc->fd = -1;
+
+if (addr && addr->type == SOCKET_ADDRESS_TYPE_UNIX
+&& addr->u.q_unix.path) {
+if (unlink(addr->u.q_unix.path) < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno,
+ "Failed to unlink socket %s",
+ addr->u.q_unix.path);
+rc = -1;
+}
+}
+
+if (addr) {
+qapi_free_SocketAddress(addr);
+}
 }
-return 0;
+return rc;
 }
 
 static int
-- 
2.16.1




Re: [Qemu-devel] [PATCH 00/27] iommu: support txattrs, support TCG execution, implement TZ MPC

2018-05-21 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180521140402.23318-1-peter.mayd...@linaro.org
Subject: [Qemu-devel] [PATCH 00/27] iommu: support txattrs, support TCG 
execution, implement TZ MPC

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
   5bcf917ee3..9802316ed6  master -> master
 t [tag update]patchew/20180509034658.26455-1-f4...@amsat.org -> 
patchew/20180509034658.26455-1-f4...@amsat.org
 * [new tag]   
patchew/20180521140402.23318-1-peter.mayd...@linaro.org -> 
patchew/20180521140402.23318-1-peter.mayd...@linaro.org
Switched to a new branch 'test'
19069d042c hw/arm/mps2-tz.c: Instantiate MPCs
545f7d3702 hw/arm/iotkit: Wire up MPC interrupt lines
965dcdf3c3 hw/arm/iotkit: Instantiate MPC
5aeb41635d hw/misc/iotkit-secctl.c: Implement SECMPCINTSTATUS
06b812c890 hw/core/or-irq: Support more than 16 inputs to an OR gate
d29f89b59f vmstate.h: Provide VMSTATE_BOOL_SUB_ARRAY
417b50c6f7 hw/misc/tz_mpc.c: Honour the BLK_LUT settings in translate
4d9bb0adb6 hw/misc/tz-mpc.c: Implement correct blocked-access behaviour
8b1bbc0790 hw/misc/tz-mpc.c: Implement registers
66fad85c8d hw/misc/tz-mpc.c: Implement the Arm TrustZone Memory Protection 
Controller
1e200a4e8f exec.c: Handle IOMMUs in address_space_translate_for_iotlb()
1b9a2ae51d iommu: Add IOMMU index argument to translate method
b022117fca iommu: Add IOMMU index argument to notifier APIs
71890073bb iommu: Add IOMMU index concept to IOMMU API
0093540ad1 Make address_space_translate_iommu take a MemTxAttrs argument
09947170a9 Make flatview_do_translate() take a MemTxAttrs argument
8388c25d74 Make address_space_get_iotlb_entry() take a MemTxAttrs argument
4f8f7862e7 Make flatview_translate() take a MemTxAttrs argument
29b8404366 Make flatview_access_valid() take a MemTxAttrs argument
2779af384f Make MemoryRegion valid.accepts callback take a MemTxAttrs argument
7ec3d4eee3 Make memory_region_access_valid() take a MemTxAttrs argument
2dbbe355a7 Make flatview_extend_translation() take a MemTxAttrs argument
b1a96a2a28 Make address_space_access_valid() take a MemTxAttrs argument
07fc6cedd7 Make address_space_map() take a MemTxAttrs argument
b99085d422 Make address_space_translate{, _cached}() take a MemTxAttrs argument
a8e73cc870 Make tb_invalidate_phys_addr() take a MemTxAttrs argument
c6d0746766 memory.h: Improve IOMMU related documentation

=== OUTPUT BEGIN ===
Checking PATCH 1/27: memory.h: Improve IOMMU related documentation...
Checking PATCH 2/27: Make tb_invalidate_phys_addr() take a MemTxAttrs 
argument...
Checking PATCH 3/27: Make address_space_translate{, _cached}() take a 
MemTxAttrs argument...
Checking PATCH 4/27: Make address_space_map() take a MemTxAttrs argument...
Checking PATCH 5/27: Make address_space_access_valid() take a MemTxAttrs 
argument...
Checking PATCH 6/27: Make flatview_extend_translation() take a MemTxAttrs 
argument...
Checking PATCH 7/27: Make memory_region_access_valid() take a MemTxAttrs 
argument...
Checking PATCH 8/27: Make MemoryRegion valid.accepts callback take a MemTxAttrs 
argument...
Checking PATCH 9/27: Make flatview_access_valid() take a MemTxAttrs argument...
Checking PATCH 10/27: Make flatview_translate() take a MemTxAttrs argument...
Checking PATCH 11/27: Make address_space_get_iotlb_entry() take a MemTxAttrs 
argument...
Checking PATCH 12/27: Make flatview_do_translate() take a MemTxAttrs argument...
Checking PATCH 13/27: Make address_space_translate_iommu take a MemTxAttrs 
argument...
WARNING: line over 80 characters
#30: FILE: exec.c:492:
+ AddressSpace 
**target_as,

total: 0 errors, 1 warnings, 32 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 14/27: iommu: Add IOMMU index concept to IOMMU API...
Checking PATCH 15/27: iommu: Add IOMMU index argument to notifier APIs...
Checking PATCH 16/27: iommu: Add IOMMU index argument to translate method...
Checking PATCH 17/27: exec.c: Handle IOMMUs in 
address_space_translate_for_iotlb()...
Checking PATCH 18/27: hw/misc/tz-mpc.c: Implement the Arm TrustZone Memory 
Protection Controller...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#81: 
new file mode 100644


Re: [Qemu-devel] [qemu PATCH v2 3/4] nvdimm, acpi: support NFIT platform capabilities

2018-05-21 Thread Ross Zwisler
On Fri, May 18, 2018 at 04:37:10PM +, Elliott, Robert (Persistent Memory) 
wrote:
> 
> 
> ...
> > Would it help to show them in hex?
> > 
> >   As of ACPI 6.2 Errata A, the following values are valid for the bottom
> >   two bits:
> > 
> >   0x2 - Memory Controller Flush to NVDIMM Durability on Power Loss Capable.
> >   0x3 - CPU Cache Flush to NVDIMM Durability on Power Loss Capable.
> 
> Yes, that helps (unless the parser for that command-line does not 
> accept hex values).

Yep, the command-line parser does accept hex values.  I ended up just trying
to make the text clearer, though.

> It would also help to make the text be:
>   "CPU Cache and Memory Controller Flush"

My descriptions for the bits are coming straight out of ACPI. :)  I'd prefer
to stay consistent with what's written in the spec.



[Qemu-devel] [PULL 1/1] lm32: take BQL before writing IP/IM register

2018-05-21 Thread Michael Walle
Writing to these registers may raise an interrupt request. Actually,
this prevents the milkymist board from starting.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Michael Walle 
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
---
 target/lm32/op_helper.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/lm32/op_helper.c b/target/lm32/op_helper.c
index 577f8306e3..234d55e056 100644
--- a/target/lm32/op_helper.c
+++ b/target/lm32/op_helper.c
@@ -102,12 +102,16 @@ void HELPER(wcsr_dc)(CPULM32State *env, uint32_t dc)
 
 void HELPER(wcsr_im)(CPULM32State *env, uint32_t im)
 {
+qemu_mutex_lock_iothread();
 lm32_pic_set_im(env->pic_state, im);
+qemu_mutex_unlock_iothread();
 }
 
 void HELPER(wcsr_ip)(CPULM32State *env, uint32_t im)
 {
+qemu_mutex_lock_iothread();
 lm32_pic_set_ip(env->pic_state, im);
+qemu_mutex_unlock_iothread();
 }
 
 void HELPER(wcsr_jtx)(CPULM32State *env, uint32_t jtx)
-- 
2.11.0




[Qemu-devel] [PULL 0/1] target/lm32 BQL patch

2018-05-21 Thread Michael Walle
The following changes since commit 81e9cbd0ca1131012b058df6804b1f626a6b730c:

  lm32: take BQL before writing IP/IM register (2018-05-21 13:37:12 +0200)

are available in the git repository at:

  git://github.com/mwalle/qemu.git tags/lm32-queue/20180521

for you to fetch changes up to 81e9cbd0ca1131012b058df6804b1f626a6b730c:

  lm32: take BQL before writing IP/IM register (2018-05-21 13:37:12 +0200)



Michael Walle (1):
  lm32: take BQL before writing IP/IM register

 target/lm32/op_helper.c | 4 
 1 file changed, 4 insertions(+)

-- 
2.11.0




Re: [Qemu-devel] [PATCH v2 2/2] target/lm32: hold BQL in gdbstub

2018-05-21 Thread Michael Walle

Am 2018-05-21 14:25, schrieb Peter Maydell:

On 21 May 2018 at 13:21, Michael Walle  wrote:

Changing the IP/IM registers may cause interrupts, so hold the BQL.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Michael Walle 
---
 target/lm32/gdbstub.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/lm32/gdbstub.c b/target/lm32/gdbstub.c
index cf929dd392..dac9418a2b 100644
--- a/target/lm32/gdbstub.c
+++ b/target/lm32/gdbstub.c
@@ -18,6 +18,7 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>.

  */
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "qemu-common.h"
 #include "cpu.h"
 #include "exec/gdbstub.h"
@@ -82,10 +83,14 @@ int lm32_cpu_gdb_write_register(CPUState *cs, 
uint8_t *mem_buf, int n)

 env->ie = tmp;
 break;
 case 37:
+qemu_mutex_lock_iothread();
 lm32_pic_set_im(env->pic_state, tmp);
+qemu_mutex_unlock_iothread();
 break;
 case 38:
+qemu_mutex_lock_iothread();
 lm32_pic_set_ip(env->pic_state, tmp);
+qemu_mutex_unlock_iothread();
 break;
 }
 }


Are you sure this is necessary? I would have expected the gdbstub to
be operating under the qemu lock anyway.



You're right. The gdbstub is already holding the lock. So I'll drop this 
and send the pull request right now.


-michael



[Qemu-devel] [PATCH v3 5/5] qmp: add pmemload command

2018-05-21 Thread Simon Ruderich
Adapted patch from Baojun Wang [1] with the following commit message:

I found this could be useful to have qemu-softmmu as a cross
debugger (launch with -s -S command line option), then if we can
have a command to load guest physical memory, we can use cross gdb
to do some target debug which gdb cannot do directly.

pmemload is necessary to directly write physical memory, which is not
possible with gdb alone as it uses only logical addresses.

The QAPI for pmemload uses "val" as parameter name for the physical
address. This name is not very descriptive but is consistent with the
existing pmemsave. Changing the parameter name of pmemsave is not
possible without breaking the existing API.

[1]: https://lists.gnu.org/archive/html/qemu-trivial/2014-04/msg00074.html

Based-on-patch-by: Baojun Wang 
Signed-off-by: Simon Ruderich 
---
 cpus.c  | 41 +
 hmp-commands.hx | 14 ++
 hmp.c   | 12 
 hmp.h   |  1 +
 qapi/misc.json  | 20 
 5 files changed, 88 insertions(+)

diff --git a/cpus.c b/cpus.c
index 49d4d44916..9b105336af 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2367,6 +2367,47 @@ exit:
 qemu_close(fd);
 }
 
+void qmp_pmemload(int64_t addr, int64_t size, int64_t offset,
+  const char *filename, Error **errp)
+{
+int fd;
+size_t l;
+ssize_t r;
+uint8_t buf[1024];
+
+fd = qemu_open(filename, O_RDONLY | O_BINARY);
+if (fd < 0) {
+error_setg_file_open(errp, errno, filename);
+return;
+}
+if (offset > 0) {
+if (lseek(fd, offset, SEEK_SET) != offset) {
+error_setg_errno(errp, errno,
+ "could not seek to offset %" PRIx64, offset);
+goto exit;
+}
+}
+
+while (size != 0) {
+l = sizeof(buf);
+if (l > size) {
+l = size;
+}
+r = read(fd, buf, l);
+if (r <= 0) {
+error_setg(errp, QERR_IO_ERROR);
+goto exit;
+}
+l = r; /* in case of short read */
+cpu_physical_memory_write(addr, buf, l);
+addr += l;
+size -= l;
+}
+
+exit:
+qemu_close(fd);
+}
+
 void qmp_inject_nmi(Error **errp)
 {
 nmi_monitor_handle(monitor_get_cpu_index(), errp);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 0734fea931..84647c7c1d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -822,6 +822,20 @@ STEXI
 @item pmemsave @var{addr} @var{size} @var{file}
 @findex pmemsave
 save to disk physical memory dump starting at @var{addr} of size @var{size}.
+ETEXI
+
+{
+.name   = "pmemload",
+.args_type  = "val:l,size:i,offset:i,filename:s",
+.params = "addr size offset file",
+.help   = "load from disk physical memory dump starting at 'addr' 
of size 'size' at file offset 'offset'",
+.cmd= hmp_pmemload,
+},
+
+STEXI
+@item pmemload @var{addr} @var{size} @var{offset} @var{file}
+@findex pmemload
+load from disk physical memory dump starting at @var{addr} of size @var{size} 
at file offset @var{offset}.
 ETEXI
 
 {
diff --git a/hmp.c b/hmp.c
index a4d28913bb..b85c943b63 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1105,6 +1105,18 @@ void hmp_pmemsave(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &err);
 }
 
+void hmp_pmemload(Monitor *mon, const QDict *qdict)
+{
+uint64_t size = qdict_get_int(qdict, "size");
+uint64_t offset = qdict_get_int(qdict, "offset");
+const char *filename = qdict_get_str(qdict, "filename");
+uint64_t addr = qdict_get_int(qdict, "val");
+Error *err = NULL;
+
+qmp_pmemload(addr, size, offset, filename, &err);
+hmp_handle_error(mon, &err);
+}
+
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict)
 {
 const char *chardev = qdict_get_str(qdict, "device");
diff --git a/hmp.h b/hmp.h
index 20f27439d3..31767ea4a8 100644
--- a/hmp.h
+++ b/hmp.h
@@ -47,6 +47,7 @@ void hmp_system_powerdown(Monitor *mon, const QDict *qdict);
 void hmp_cpu(Monitor *mon, const QDict *qdict);
 void hmp_memsave(Monitor *mon, const QDict *qdict);
 void hmp_pmemsave(Monitor *mon, const QDict *qdict);
+void hmp_pmemload(Monitor *mon, const QDict *qdict);
 void hmp_ringbuf_write(Monitor *mon, const QDict *qdict);
 void hmp_ringbuf_read(Monitor *mon, const QDict *qdict);
 void hmp_cont(Monitor *mon, const QDict *qdict);
diff --git a/qapi/misc.json b/qapi/misc.json
index f5988cc0b5..b4c0065b02 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1219,6 +1219,26 @@
 { 'command': 'pmemsave',
   'data': {'val': 'int', 'size': 'int', 'filename': 'str'} }
 
+##
+# @pmemload:
+#
+# Load a portion of guest physical memory from a file.
+#
+# @val: the physical address of the guest to start from
+#
+# @size: the size of memory region to load
+#
+# @offset: the offset in the file to start from
+#
+# @filename: the file to load the memory from as binary data
+#

[Qemu-devel] [PATCH v3 2/5] cpus: convert qmp_memsave/qmp_pmemsave to use qemu_open

2018-05-21 Thread Simon Ruderich
qemu_open() allows passing file descriptors to qemu which is used in
restricted environments like libvirt where open() is prohibited.

Suggested-by: Eric Blake 
Signed-off-by: Simon Ruderich 
Reviewed-by: Eric Blake 
---
 cpus.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/cpus.c b/cpus.c
index 4b1609fe90..7fd8d3c32e 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2291,7 +2291,7 @@ CpuInfoFastList *qmp_query_cpus_fast(Error **errp)
 void qmp_memsave(int64_t addr, int64_t size, const char *filename,
  bool has_cpu, int64_t cpu_index, Error **errp)
 {
-FILE *f;
+int fd;
 uint32_t l;
 CPUState *cpu;
 uint8_t buf[1024];
@@ -2308,8 +2308,8 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 return;
 }
 
-f = fopen(filename, "wb");
-if (!f) {
+fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
+if (fd < 0) {
 error_setg_file_open(errp, errno, filename);
 return;
 }
@@ -2324,7 +2324,7 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
  " specified", orig_addr, orig_size);
 goto exit;
 }
-if (fwrite(buf, 1, l, f) != l) {
+if (qemu_write_full(fd, buf, l) != l) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
@@ -2333,18 +2333,18 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 }
 
 exit:
-fclose(f);
+qemu_close(fd);
 }
 
 void qmp_pmemsave(int64_t addr, int64_t size, const char *filename,
   Error **errp)
 {
-FILE *f;
+int fd;
 uint32_t l;
 uint8_t buf[1024];
 
-f = fopen(filename, "wb");
-if (!f) {
+fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
+if (fd < 0) {
 error_setg_file_open(errp, errno, filename);
 return;
 }
@@ -2355,7 +2355,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
 l = size;
 }
 cpu_physical_memory_read(addr, buf, l);
-if (fwrite(buf, 1, l, f) != l) {
+if (qemu_write_full(fd, buf, l) != l) {
 error_setg(errp, QERR_IO_ERROR);
 goto exit;
 }
@@ -2364,7 +2364,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
 }
 
 exit:
-fclose(f);
+qemu_close(fd);
 }
 
 void qmp_inject_nmi(Error **errp)
-- 
2.15.0




[Qemu-devel] [PATCH v3 3/5] cpus: use size_t in qmp_memsave/qmp_pmemsave

2018-05-21 Thread Simon Ruderich
It's the natural type for object sizes and matches the return value of
sizeof(buf).

Signed-off-by: Simon Ruderich 
Reviewed-by: Eric Blake 
---
 cpus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 7fd8d3c32e..49d4d44916 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2292,7 +2292,7 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
  bool has_cpu, int64_t cpu_index, Error **errp)
 {
 int fd;
-uint32_t l;
+size_t l;
 CPUState *cpu;
 uint8_t buf[1024];
 int64_t orig_addr = addr, orig_size = size;
@@ -2340,7 +2340,7 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
   Error **errp)
 {
 int fd;
-uint32_t l;
+size_t l;
 uint8_t buf[1024];
 
 fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0600);
-- 
2.15.0




[Qemu-devel] [PATCH v3 1/5] cpus: correct coding style in qmp_memsave/qmp_pmemsave

2018-05-21 Thread Simon Ruderich
Signed-off-by: Simon Ruderich 
Reviewed-by: Eric Blake 
---
 cpus.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index d1f16296de..4b1609fe90 100644
--- a/cpus.c
+++ b/cpus.c
@@ -2316,8 +2316,9 @@ void qmp_memsave(int64_t addr, int64_t size, const char 
*filename,
 
 while (size != 0) {
 l = sizeof(buf);
-if (l > size)
+if (l > size) {
 l = size;
+}
 if (cpu_memory_rw_debug(cpu, addr, buf, l, 0) != 0) {
 error_setg(errp, "Invalid addr 0x%016" PRIx64 "/size %" PRId64
  " specified", orig_addr, orig_size);
@@ -2350,8 +2351,9 @@ void qmp_pmemsave(int64_t addr, int64_t size, const char 
*filename,
 
 while (size != 0) {
 l = sizeof(buf);
-if (l > size)
+if (l > size) {
 l = size;
+}
 cpu_physical_memory_read(addr, buf, l);
 if (fwrite(buf, 1, l, f) != l) {
 error_setg(errp, QERR_IO_ERROR);
-- 
2.15.0




[Qemu-devel] [PATCH v3 4/5] hmp: don't truncate size in hmp_memsave/hmp_pmemsave

2018-05-21 Thread Simon Ruderich
The called function takes a uint64_t as size parameter and
qdict_get_int() returns a uint64_t. Don't truncate it needlessly to a
uint32_t.

Signed-off-by: Simon Ruderich 
Reviewed-by: Eric Blake 
---
 hmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hmp.c b/hmp.c
index ef93f4878b..a4d28913bb 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1079,7 +1079,7 @@ void hmp_cpu(Monitor *mon, const QDict *qdict)
 
 void hmp_memsave(Monitor *mon, const QDict *qdict)
 {
-uint32_t size = qdict_get_int(qdict, "size");
+uint64_t size = qdict_get_int(qdict, "size");
 const char *filename = qdict_get_str(qdict, "filename");
 uint64_t addr = qdict_get_int(qdict, "val");
 Error *err = NULL;
@@ -1096,7 +1096,7 @@ void hmp_memsave(Monitor *mon, const QDict *qdict)
 
 void hmp_pmemsave(Monitor *mon, const QDict *qdict)
 {
-uint32_t size = qdict_get_int(qdict, "size");
+uint64_t size = qdict_get_int(qdict, "size");
 const char *filename = qdict_get_str(qdict, "filename");
 uint64_t addr = qdict_get_int(qdict, "val");
 Error *err = NULL;
-- 
2.15.0




[Qemu-devel] [PATCH v3 0/5] qmp: add pmemload command

2018-05-21 Thread Simon Ruderich
Hello,

This is third version of this patch set, rebased on current
master.

As I've received no answers to [1] (and I'd prefer to keep the
patch as is for now if possible) this doesn't include any changes
to address the comments to [2].

If there's anything else I can do to get these patches merged
please tell me.

Regards
Simon

[1]: <20180424145053.ga21...@ruderich.org>
 https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03894.html
[2]: 
<6f775e11a75a2faa1c66a86e6d23a97f695c2ca1.1523537181.git.si...@ruderich.org>
 https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg01757.html

Simon Ruderich (5):
  cpus: correct coding style in qmp_memsave/qmp_pmemsave
  cpus: convert qmp_memsave/qmp_pmemsave to use qemu_open
  cpus: use size_t in qmp_memsave/qmp_pmemsave
  hmp: don't truncate size in hmp_memsave/hmp_pmemsave
  qmp: add pmemload command

 cpus.c  | 71 +
 hmp-commands.hx | 14 
 hmp.c   | 16 +++--
 hmp.h   |  1 +
 qapi/misc.json  | 20 
 5 files changed, 106 insertions(+), 16 deletions(-)

-- 
2.15.0


