Re: [Qemu-devel] [PATCH for 2.13 1/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"

2018-04-05 Thread Serhii Popovych
Bharata B Rao wrote:
> On Thu, Apr 05, 2018 at 10:35:22AM -0400, Serhii Popovych wrote:
>> This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.
>>
>> Leave change @node type from uint32_t to to int from reverted commit
>> because node < 0 is always false.
>>
>> Signed-off-by: Serhii Popovych 
>> ---
>>  hw/ppc/spapr.c | 22 --
>>  1 file changed, 22 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 2c0be8c..3ad4545 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3477,28 +3477,6 @@ static void spapr_machine_device_plug(HotplugHandler 
>> *hotplug_dev,
>>  return;
>>  }
>>
>> -/*
>> - * Currently PowerPC kernel doesn't allow hot-adding memory to
>> - * memory-less node, but instead will silently add the memory
>> - * to the first node that has some memory. This causes two
>> - * unexpected behaviours for the user.
>> - *
>> - * - Memory gets hotplugged to a different node than what the user
>> - *   specified.
>> - * - Since pc-dimm subsystem in QEMU still thinks that memory 
>> belongs
>> - *   to memory-less node, a reboot will set things accordingly
>> - *   and the previously hotplugged memory now ends in the right 
>> node.
>> - *   This appears as if some memory moved from one node to another.
>> - *
>> - * So until kernel starts supporting memory hotplug to memory-less
>> - * nodes, just prevent such attempts upfront in QEMU.
>> - */
>> -if (nb_numa_nodes && !numa_info[node].node_mem) {
>> -error_setg(errp, "Can't hotplug memory to memory-less node %d",
>> -   node);
>> -return;
>> -}
>> -
> 
> If you remove this unconditionally, wouldn't it be a problem in case
> of newer QEMU with older guest kernels ?

Yes, that definitely would affect guest kernels without such support. We
probably need to add some capability to test for guest kernel
functionality presence.

> 
> Regards,
> Bharata.
> 
> 


-- 
Thanks,
Serhii



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] kvmclock: fix clock_is_reliable on migration from QEMU < 2.9

2018-04-05 Thread Michael Chapman
When migrating from a pre-2.9 QEMU, no clock_is_reliable flag is
transferred. We should assume that the source host has an unreliable
KVM_GET_CLOCK, rather than using whatever was determined locally, to
ensure that any drift from the TSC-based value calculated by the guest
is corrected.

Signed-off-by: Michael Chapman 
---
 hw/i386/kvm/clock.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 1707434db3..7dac319403 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -241,6 +241,19 @@ static const VMStateDescription 
kvmclock_reliable_get_clock = {
 }
 };
 
+/*
+ * When migrating, assume the source has an unreliable
+ * KVM_GET_CLOCK unless told otherwise.
+ */
+static int kvmclock_pre_load(void *opaque)
+{
+KVMClockState *s = opaque;
+
+s->clock_is_reliable = false;
+
+return 0;
+}
+
 /*
  * When migrating, read the clock just before migration,
  * so that the guest clock counts during the events
@@ -268,6 +281,7 @@ static const VMStateDescription kvmclock_vmsd = {
 .name = "kvmclock",
 .version_id = 1,
 .minimum_version_id = 1,
+.pre_load = kvmclock_pre_load,
 .pre_save = kvmclock_pre_save,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(clock, KVMClockState),
-- 
2.14.3




Re: [Qemu-devel] [PATCH 17/19] uninorth: create new uninorth device

2018-04-05 Thread Mark Cave-Ayland

On 25/03/18 22:11, Mark Cave-Ayland wrote:

Just to follow up on this, I spent a bit looking at what this register 
is trying to do and from the Darwin source I can see that in fact it is 
simply a hard-wired hardware register which should return the revision 
of the UniNorth hardware.


So in fact the code in its current form is completely bogus which is 
visible when trying to boot FreeBSD, which as the register is never 
written to, returns a completely different random number each time.


David - are you okay to change DEVICE_NATIVE_ENDIAN to DEVICE_BIG_ENDIAN 
and then apply this and the final patch to your for-2.13 queue? I can 
then follow up with another patch later that will implement this 
register (and also the matching PCI revision ID) correctly.


Ping? I can see that more patches are being added to the for-2.13 branch 
so I was just wondering if there is now anything else needed from me in 
order to get the last 3 patches from this patchset queued?



ATB,

Mark.



[Qemu-devel] [PATCH resend] kvmclock: fix clock_is_reliable on migration from QEMU < 2.9

2018-04-05 Thread Michael Chapman
When migrating from a pre-2.9 QEMU, no clock_is_reliable flag is
transferred. We should assume that the source host has an unreliable
KVM_GET_CLOCK, rather than using whatever was determined locally, to
ensure that any drift from the TSC-based value calculated by the guest
is corrected.

Signed-off-by: Michael Chapman 
---
Resent due to missing CCs.

 hw/i386/kvm/clock.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
index 1707434db3..7dac319403 100644
--- a/hw/i386/kvm/clock.c
+++ b/hw/i386/kvm/clock.c
@@ -241,6 +241,19 @@ static const VMStateDescription 
kvmclock_reliable_get_clock = {
 }
 };
 
+/*
+ * When migrating, assume the source has an unreliable
+ * KVM_GET_CLOCK unless told otherwise.
+ */
+static int kvmclock_pre_load(void *opaque)
+{
+KVMClockState *s = opaque;
+
+s->clock_is_reliable = false;
+
+return 0;
+}
+
 /*
  * When migrating, read the clock just before migration,
  * so that the guest clock counts during the events
@@ -268,6 +281,7 @@ static const VMStateDescription kvmclock_vmsd = {
 .name = "kvmclock",
 .version_id = 1,
 .minimum_version_id = 1,
+.pre_load = kvmclock_pre_load,
 .pre_save = kvmclock_pre_save,
 .fields = (VMStateField[]) {
 VMSTATE_UINT64(clock, KVMClockState),
-- 
2.14.3




Re: [Qemu-devel] [RFC PATCH v2 4/4] memory: Add memory_region_set_priority()

2018-04-05 Thread Alexey Kardashevskiy
On 5/4/18 6:44 pm, KONRAD Frederic wrote:
> 
> 
> On 04/05/2018 03:22 AM, Philippe Mathieu-Daudé wrote:
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>> Sadly I'm missing something, this does not work.
> 
> Hmmm is that supposed to work dynamically?
> 
> If so I think you might need to call
> memory_region_update_container_subregions(..) because it will
> actually change the map?
> 
> Take a look at how memory_add_subregion works.


Ah, right, I guess it should be:

memory_region_transaction_begin();
mr->priority = priority;
memory_region_update_pending = true;
memory_region_transaction_commit();


as the priority may change the final layout of flatview.


> 
> Thanks,
> Fred
> 
>>
>>   memory.c | 18 +-
>>   1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/memory.c b/memory.c
>> index eaa5fa7f23..ae45ea7779 100644
>> --- a/memory.c
>> +++ b/memory.c
>> @@ -1225,6 +1225,22 @@ static void memory_region_get_priority(Object
>> *obj, Visitor *v,
>>   visit_type_int32(v, name, , errp);
>>   }
>>   +static void memory_region_set_priority(Object *obj, Visitor *v,
>> +   const char *name, void *opaque,
>> +   Error **errp)
>> +{
>> +    MemoryRegion *mr = MEMORY_REGION(obj);
>> +    int32_t priority;
>> +    Error *local_err = NULL;
>> +
>> +    visit_type_int32(v, name, , _err);
>> +    if (local_err) {
>> +    error_propagate(errp, local_err);
>> +    return;
>> +    }
>> +    mr->priority = priority;
>> +}
>> +
>>   static void memory_region_get_size(Object *obj, Visitor *v, const char
>> *name,
>>  void *opaque, Error **errp)
>>   {
>> @@ -1260,7 +1276,7 @@ static void memory_region_initfn(Object *obj)
>>   NULL, NULL, _abort);
>>   object_property_add(OBJECT(mr), "priority", "int32",
>>   memory_region_get_priority,
>> -    NULL, /* memory_region_set_priority */
>> +    memory_region_set_priority,
>>   NULL, NULL, _abort);
>>   object_property_add(OBJECT(mr), "size", "uint64",
>>   memory_region_get_size,
>>


-- 
Alexey



Re: [Qemu-devel] [PATCH for-2.13 09/13] target/ppc: Move 1T segment and AMR options to PPCHash64Options

2018-04-05 Thread David Gibson
On Thu, Apr 05, 2018 at 02:06:19PM +0200, Greg Kurz wrote:
> On Thu,  5 Apr 2018 12:14:33 +1000
> David Gibson  wrote:
> 
> > Currently env->mmu_model is a bit of an unholy mess of an enum of distinct
> > MMU types, with various flag bits as well.  This makes which bits of the
> > field should be compared pretty confusing.
> > 
> > Make a start on cleaning that up by moving two of the flags bits -
> > POWERPC_MMU_1TSEG and POWERPC_MMU_AMR - which are specific to the 64-bit
> > hash MMU into a new flags field in PPCHash64Options structure.
> > 
> > Signed-off-by: David Gibson 
> > Reviewed-by: Cédric Le Goater 
> > Reviewed-by: Greg Kurz 
> > ---
> 
> While investigating a migration failure from an older QEMU, I realized
> this patch has a problem. The *cpu->hash64_opts structure is zeroed in
> kvm_fixup_page_sizes(), which has now the unwanted effect of clearing
> the cpu->hash64_opts->flags as well.
> 
> We only need to zero the segment page sizes actually. The following
> fixes migration:
> 
> @@ -442,7 +442,7 @@ static void kvm_fixup_page_sizes(PowerPCCPU *cpu)
>  }
>  
>  /* Convert to QEMU form */
> -memset(cpu->hash64_opts, 0, sizeof(*cpu->hash64_opts));
> +memset(>hash64_opts->sps, 0, sizeof(cpu->hash64_opts->sps));
>  
>  /* If we have HV KVM, we need to forbid CI large pages if our
>   * host page size is smaller than 64K.

Ah, good catch.  I mistakenly thought that kvm_fixup_page_sizes() was
completely rewriting those flags as well.  I've made this fix now.

I'm also currently working on some other patches which amongst other
things get rid of kvm_fixup_page_sizes(), because it's completely
broken by design.  But might as well not break things excessively in
the meantime.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH for 2.13 1/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"

2018-04-05 Thread Bharata B Rao
On Thu, Apr 05, 2018 at 10:35:22AM -0400, Serhii Popovych wrote:
> This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.
> 
> Leave change @node type from uint32_t to to int from reverted commit
> because node < 0 is always false.
> 
> Signed-off-by: Serhii Popovych 
> ---
>  hw/ppc/spapr.c | 22 --
>  1 file changed, 22 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 2c0be8c..3ad4545 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3477,28 +3477,6 @@ static void spapr_machine_device_plug(HotplugHandler 
> *hotplug_dev,
>  return;
>  }
> 
> -/*
> - * Currently PowerPC kernel doesn't allow hot-adding memory to
> - * memory-less node, but instead will silently add the memory
> - * to the first node that has some memory. This causes two
> - * unexpected behaviours for the user.
> - *
> - * - Memory gets hotplugged to a different node than what the user
> - *   specified.
> - * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
> - *   to memory-less node, a reboot will set things accordingly
> - *   and the previously hotplugged memory now ends in the right node.
> - *   This appears as if some memory moved from one node to another.
> - *
> - * So until kernel starts supporting memory hotplug to memory-less
> - * nodes, just prevent such attempts upfront in QEMU.
> - */
> -if (nb_numa_nodes && !numa_info[node].node_mem) {
> -error_setg(errp, "Can't hotplug memory to memory-less node %d",
> -   node);
> -return;
> -}
> -

If you remove this unconditionally, wouldn't it be a problem in case
of newer QEMU with older guest kernels ?

Regards,
Bharata.




Re: [Qemu-devel] [PATCH v2 00/17] tcg: tb_lock_removal redux v2

2018-04-05 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 1522980788-1252-1-git-send-email-c...@braap.org
Subject: [Qemu-devel] [PATCH v2 00/17] tcg: tb_lock_removal redux v2

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   
patchew/1522980788-1252-1-git-send-email-c...@braap.org -> 
patchew/1522980788-1252-1-git-send-email-c...@braap.org
Switched to a new branch 'test'
596b0ae126 tcg: remove tb_lock
393740f4bc translate-all: remove tb_lock mention from cpu_restore_state_from_tb
1d225d49e8 cputlb: remove tb_lock from tlb_flush functions
7117367de6 translate-all: protect TB jumps with a per-destination-TB lock
1d1fd21dd7 translate-all: discard TB when tb_link_page returns an existing 
matching TB
ca22fa4417 translate-all: add page_collection assertions
50b6081956 translate-all: add page_locked assertions
5b6d7dd39f translate-all: use per-page locking in !user-mode
9fb729b4a1 translate-all: move tb_invalidate_phys_page_range up in the file
e3435070c4 translate-all: work page-by-page in tb_invalidate_phys_range_1
b7f40fb599 translate-all: remove hole in PageDesc
256fc42714 translate-all: make l1_map lockless
9bb165ec14 translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB
8a38ca3168 tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx
c0e7a50dea tcg: track TBs with per-region BST's
812e356d97 qht: return existing entry when qht_insert fails
aa639c6e6f qht: require a default comparison function

=== OUTPUT BEGIN ===
Checking PATCH 1/17: qht: require a default comparison function...
Checking PATCH 2/17: qht: return existing entry when qht_insert fails...
Checking PATCH 3/17: tcg: track TBs with per-region BST's...
Checking PATCH 4/17: tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx...
Checking PATCH 5/17: translate-all: iterate over TBs in a page with 
PAGE_FOR_EACH_TB...
ERROR: braces {} are necessary for all arms of this statement
#50: FILE: accel/tcg/translate-all.c:119:
+for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
[...]

total: 1 errors, 0 warnings, 168 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 6/17: translate-all: make l1_map lockless...
Checking PATCH 7/17: translate-all: remove hole in PageDesc...
Checking PATCH 8/17: translate-all: work page-by-page in 
tb_invalidate_phys_range_1...
Checking PATCH 9/17: translate-all: move tb_invalidate_phys_page_range up in 
the file...
Checking PATCH 10/17: translate-all: use per-page locking in !user-mode...
Checking PATCH 11/17: translate-all: add page_locked assertions...
Checking PATCH 12/17: translate-all: add page_collection assertions...
Checking PATCH 13/17: translate-all: discard TB when tb_link_page returns an 
existing matching TB...
Checking PATCH 14/17: translate-all: protect TB jumps with a per-destination-TB 
lock...
Checking PATCH 15/17: cputlb: remove tb_lock from tlb_flush functions...
Checking PATCH 16/17: translate-all: remove tb_lock mention from 
cpu_restore_state_from_tb...
Checking PATCH 17/17: tcg: remove tb_lock...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Qemu-devel] [PATCH] tcg: Fix out-of-line generic vector compares

2018-04-05 Thread Emilio G. Cota
On Fri, Apr 06, 2018 at 11:00:15 +1000, Richard Henderson wrote:
> A mistake in the type passed to sizeof, that happens to work
> when the out-of-line fallback itself is using host vectors,
> but fails when using only the base types.
> 
> Reported-by: Emilio G. Cota 
> Signed-off-by: Richard Henderson 

Tested-by: Emilio G. Cota 

Thanks!

Emilio



[Qemu-devel] [PATCH v2 17/17] tcg: remove tb_lock

2018-04-05 Thread Emilio G. Cota
Use mmap_lock in user-mode to protect TCG state and the page
descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks
needed. Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

 ubuntu 17.04 ppc64 bootup+shutdown time

  700 +-+--++--++---+*--+-+
  |++  ++   +   *B|
  | before ***B***** *|
  |tb lock removal ###D### ***|
  600 +-+   *** +-+
  |   ** #|
  |*B*  #D|
  | *** * ##  |
  500 +-+***   ###  +-+
  | * ***   ###   |
  |*B*  # ##  |
  |  ** *  #D#|
  400 +-+  **## +-+
  |  **   ### |
  |**   ##|
  |  ** # ##  |
  300 +-+  *   B*  #D#  +-+
  |B ***###   |
  |*   **     |
  | *   ***  ###  |
  200 +-+   B  *B #D#   +-+
  | #B* *   ## #  |
  | #*##  |
  |+ D##D# ++   ++|
  100 +-+--++--++---++--+-+
   18  16  Guest CPUs   48   64
  png: https://imgur.com/HwmBHXe

  debian jessie aarch64 bootup+shutdown time

  90 +-+--+-+-++++--+-+
 |+ + ++++|
 | before ***B***B|
  80 +tb lock removal ###D###  **D  +-+
 |   **###|
 | **##   |
  70 +-+ ** #   +-+
 | ** ##  |
 |   **  #|
  60 +-+   *B  ##   +-+
 |   **  ##   |
 |***  #D |
  50 +-+   ***   ## +-+
 | * **   ### |
 |   **B*  ###|
  40 +-+   # ## +-+
 |    #D# |
 | ***B**  ###|
  30 +-+B***B** +-+
 |B *   * # ###   |
 | B   ###D#  |
  20 +-+   D  ##D## +-+
 |  D#|
 |+ + ++++|
  10 +-+--+-+-++++--+-+
  1 8 16  Guest CPUs48   64
  png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.


[Qemu-devel] [PATCH v2 10/17] translate-all: use per-page locking in !user-mode

2018-04-05 Thread Emilio G. Cota
Groundwork for supporting parallel TCG generation.

Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.

Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).

This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.

While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.h |   3 +
 include/exec/exec-all.h   |   3 +-
 accel/tcg/translate-all.c | 432 +-
 3 files changed, 396 insertions(+), 42 deletions(-)

diff --git a/accel/tcg/translate-all.h b/accel/tcg/translate-all.h
index ba8e4d6..6d1d258 100644
--- a/accel/tcg/translate-all.h
+++ b/accel/tcg/translate-all.h
@@ -23,6 +23,9 @@
 
 
 /* translate-all.c */
+struct page_collection *page_collection_lock(tb_page_addr_t start,
+ tb_page_addr_t end);
+void page_collection_unlock(struct page_collection *set);
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
int is_cpu_write_access);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5f7e65a..aeaa127 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -355,7 +355,8 @@ struct TranslationBlock {
 /* original tb when cflags has CF_NOCACHE */
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
-   of the pointer tells the index in page_next[] */
+   of the pointer tells the index in page_next[].
+   The list is protected by the TB's page('s) lock(s) */
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4d4391f..042378a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -112,8 +112,55 @@ typedef struct PageDesc {
 #else
 unsigned long flags;
 #endif
+#ifndef CONFIG_USER_ONLY
+QemuSpin lock;
+#endif
 } PageDesc;
 
+/**
+ * struct page_entry - page descriptor entry
+ * @pd: pointer to the  PageDesc of the page this entry represents
+ * @index:  page index of the page
+ * @locked: whether the page is locked
+ *
+ * This struct helps us keep track of the locked state of a page, without
+ * bloating  PageDesc.
+ *
+ * A page lock protects accesses to all fields of  PageDesc.
+ *
+ * See also:  page_collection.
+ */
+struct page_entry {
+PageDesc *pd;
+tb_page_addr_t index;
+bool locked;
+};
+
+/**
+ * struct page_collection - tracks a set of pages (i.e.  page_entry's)
+ * @tree:   Binary search tree (BST) of the pages, with key == page index
+ * @max:Pointer to the page in @tree with the highest page index
+ *
+ * To avoid deadlock we lock pages in ascending order of page index.
+ * When operating on a set of pages, we need to keep track of them so that
+ * we can lock them in order and also unlock them later. For this we collect
+ * pages (i.e.  page_entry's) in a binary search @tree. Given that the
+ * @tree implementation we use does not provide an O(1) operation to obtain the
+ * highest-ranked element, we use @max to keep track of the inserted page
+ * with the highest index. This is valuable because if a page is not in
+ * the tree and its index is higher than @max's, then we can lock it
+ * without breaking the locking order rule.
+ *
+ * Note on naming: 'struct page_set' would be shorter, but we already have a 
few
+ * page_set_*() helpers, so page_collection is used instead to avoid confusion.
+ *
+ * See also: page_collection_lock().
+ */
+struct page_collection {
+GTree *tree;
+struct page_entry *max;
+};
+
 /* list iterators for lists of tagged pointers in TranslationBlock */
 #define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
 for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
@@ -507,6 +554,15 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int 
alloc)
 return NULL;
 }
 pd = g_new0(PageDesc, V_L2_SIZE);
+#ifndef CONFIG_USER_ONLY
+{
+int i;
+
+for (i = 0; i < V_L2_SIZE; i++) {
+qemu_spin_init([i].lock);
+}
+}
+#endif
 existing = atomic_cmpxchg(lp, NULL, pd);
 if (unlikely(existing)) {
 g_free(pd);
@@ -522,6 +578,228 @@ static inline PageDesc *page_find(tb_page_addr_t 

[Qemu-devel] [PATCH v2 13/17] translate-all: discard TB when tb_link_page returns an existing matching TB

2018-04-05 Thread Emilio G. Cota
Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.

Suggested-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |  3 +++
 accel/tcg/cpu-exec.c| 14 ++--
 accel/tcg/translate-all.c   | 50 +
 3 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index faf8918..faf09c6 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -140,6 +140,9 @@ to atomically insert new elements.
 The lookup caches are updated atomically and the lookup hash uses QHT
 which is designed for concurrent safe lookup.
 
+Parallel code generation is supported. QHT is used at insertion time
+as the synchronization point across threads, thereby ensuring that we only
+keep track of a single TranslationBlock for each guest code block.
 
 Memory maps and TLBs
 
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 6a3a21d..c79c43b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 if (tb == NULL) {
 mmap_lock();
 tb_lock();
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
 tb_unlock();
 mmap_unlock();
 }
@@ -396,14 +393,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 tb_lock();
 acquired_tb_lock = true;
 
-/* There's a chance that our desired tb has been translated while
- * taking the locks so we check again inside the lock.
- */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-/* if no translated code available, then translate it now */
-tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
 
 mmap_unlock();
 /* We add the TB in the virtual pc hash table for the fast lookup */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index f8862f6..aabde7e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1581,12 +1581,19 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
+ *
+ * Returns a pointer @tb, or a pointer to an existing TB that matches @tb.
+ * Note that in !user-mode, another thread might have already added a TB
+ * for the same block of guest code that @tb corresponds to. In that case,
+ * the caller should discard the original @tb, and use instead the returned TB.
  */
-static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
- tb_page_addr_t phys_page2)
+static TranslationBlock *
+tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+ tb_page_addr_t phys_page2)
 {
 PageDesc *p;
 PageDesc *p2 = NULL;
+void *existing_tb = NULL;
 uint32_t h;
 
 assert_memory_lock();
@@ -1594,6 +1601,11 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 /*
  * Add the TB to the page list.
  * To avoid deadlock, acquire first the lock of the lower-addressed page.
+ * We keep the locks held until after inserting the TB in the hash table,
+ * so that if the insertion fails we know for sure that the TBs are still
+ * in the page descriptors.
+ * Note that inserting into the hash table first isn't an option, since
+ * we can only insert TBs that are fully initialized.
  */
 p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
 if (likely(phys_page2 == -1)) {
@@ -1613,21 +1625,33 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 tb_page_add(p2, tb, 1, phys_page2);
 }
 
+/* add in the hash table */
+h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+ tb->trace_vcpu_dstate);
+qht_insert(_ctx.htable, tb, h, _tb);
+
+/* remove TB from the page(s) if we couldn't insert it */
+if (unlikely(existing_tb)) {
+tb_page_remove(p, tb);
+invalidate_page_bitmap(p);
+if (p2) {
+tb_page_remove(p2, tb);
+invalidate_page_bitmap(p2);
+}
+tb = existing_tb;
+}
+
 if (p2) {
 page_unlock(p2);
 }
 page_unlock(p);
 
-/* add in the hash table */
-h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
- 

[Qemu-devel] [PATCH v2 01/17] qht: require a default comparison function

2018-04-05 Thread Emilio G. Cota
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is a cmp function is explicitly provided.

qht_insert will gain use of the default cmp in the next patch.

Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
00010750 :
   10750:   89 d1   mov%edx,%ecx
   10752:   48 89 f2mov%rsi,%rdx
   10755:   48 8b 77 08 mov0x8(%rdi),%rsi
   10759:   e9 22 ff ff ff  jmpq   10680 
   1075e:   66 90   xchg   %ax,%ax

We get:
00010740 :
   10740:   48 8b 4f 08 mov0x8(%rdi),%rcx
   10744:   e9 37 ff ff ff  jmpq   10680 
   10749:   0f 1f 80 00 00 00 00nopl   0x0(%rax)

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h| 25 -
 accel/tcg/cpu-exec.c  |  4 ++--
 accel/tcg/translate-all.c | 16 +++-
 tests/qht-bench.c | 14 +++---
 tests/test-qht.c  | 15 ++-
 util/qht.c| 14 +++---
 6 files changed, 65 insertions(+), 23 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 531aa95..5f03a0f 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -11,8 +11,11 @@
 #include "qemu/thread.h"
 #include "qemu/qdist.h"
 
+typedef bool (*qht_cmp_func_t)(const void *a, const void *b);
+
 struct qht {
 struct qht_map *map;
+qht_cmp_func_t cmp;
 QemuMutex lock; /* serializes setters of ht->map */
 unsigned int mode;
 };
@@ -47,10 +50,12 @@ typedef void (*qht_iter_func_t)(struct qht *ht, void *p, 
uint32_t h, void *up);
 /**
  * qht_init - Initialize a QHT
  * @ht: QHT to be initialized
+ * @cmp: default comparison function. Cannot be NULL.
  * @n_elems: number of entries the hash table should be optimized for.
  * @mode: bitmask with OR'ed QHT_MODE_*
  */
-void qht_init(struct qht *ht, size_t n_elems, unsigned int mode);
+void qht_init(struct qht *ht, qht_cmp_func_t cmp, size_t n_elems,
+  unsigned int mode);
 
 /**
  * qht_destroy - destroy a previously initialized QHT
@@ -78,11 +83,11 @@ void qht_destroy(struct qht *ht);
 bool qht_insert(struct qht *ht, void *p, uint32_t hash);
 
 /**
- * qht_lookup - Look up a pointer in a QHT
+ * qht_lookup_custom - Look up a pointer using a custom comparison function.
  * @ht: QHT to be looked up
- * @func: function to compare existing pointers against @userp
  * @userp: pointer to pass to @func
  * @hash: hash of the pointer to be looked up
+ * @func: function to compare existing pointers against @userp
  *
  * Needs to be called under an RCU read-critical section.
  *
@@ -94,8 +99,18 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
  * Returns the corresponding pointer when a match is found.
  * Returns NULL otherwise.
  */
-void *qht_lookup(struct qht *ht, qht_lookup_func_t func, const void *userp,
- uint32_t hash);
+void *qht_lookup_custom(struct qht *ht, const void *userp, uint32_t hash,
+qht_lookup_func_t func);
+
+/**
+ * qht_lookup - Look up a pointer in a QHT
+ * @ht: QHT to be looked up
+ * @userp: pointer to pass to the comparison function
+ * @hash: hash of the pointer to be looked up
+ *
+ * Calls qht_lookup_custom() using @ht's default comparison function.
+ */
+void *qht_lookup(struct qht *ht, const void *userp, uint32_t hash);
 
 /**
  * qht_remove - remove a pointer from the hash table
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 9cc6972..dabf292 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -293,7 +293,7 @@ struct tb_desc {
 uint32_t trace_vcpu_dstate;
 };
 
-static bool tb_cmp(const void *p, const void *d)
+static bool tb_lookup_cmp(const void *p, const void *d)
 {
 const TranslationBlock *tb = p;
 const struct tb_desc *desc = d;
@@ -338,7 +338,7 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, 
target_ulong pc,
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
 h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
-return qht_lookup(_ctx.htable, tb_cmp, , h);
+return qht_lookup_custom(_ctx.htable, , h, tb_lookup_cmp);
 }
 
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d419060..7af5f36 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -785,11 +785,25 @@ static inline void code_gen_alloc(size_t tb_size)
 qemu_mutex_init(_ctx.tb_lock);
 }
 
+static bool tb_cmp(const void *ap, const void *bp)
+{
+const TranslationBlock *a = ap;
+const TranslationBlock *b = bp;
+
+return 

[Qemu-devel] [PATCH v2 11/17] translate-all: add page_locked assertions

2018-04-05 Thread Emilio G. Cota
This is only compiled under CONFIG_DEBUG_TCG to avoid
bloating the binary.

In user-mode, assert_page_locked is equivalent to assert_mmap_lock.

Note: There are some tb_lock assertions left that will be
removed by later patches.

Suggested-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 90 +--
 1 file changed, 87 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 042378a..29bc1da 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -580,6 +580,9 @@ static inline PageDesc *page_find(tb_page_addr_t index)
 
 /* In user-mode page locks aren't used; mmap_lock is enough */
 #ifdef CONFIG_USER_ONLY
+
+#define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
+
 static inline void page_lock(PageDesc *pd)
 { }
 
@@ -602,14 +605,91 @@ void page_collection_unlock(struct page_collection *set)
 { }
 #else /* !CONFIG_USER_ONLY */
 
+#ifdef CONFIG_DEBUG_TCG
+
+struct page_lock_debug {
+const PageDesc *pd;
+QLIST_ENTRY(page_lock_debug) entry;
+};
+
+static __thread QLIST_HEAD(, page_lock_debug) page_lock_debug_head;
+
+static struct page_lock_debug *get_page_lock_debug(const PageDesc *pd)
+{
+struct page_lock_debug *pld;
+
+QLIST_FOREACH(pld, _lock_debug_head, entry) {
+if (pld->pd == pd) {
+return pld;
+}
+}
+return NULL;
+}
+
+static bool page_is_locked(const PageDesc *pd)
+{
+struct page_lock_debug *pld;
+
+pld = get_page_lock_debug(pd);
+return pld != NULL;
+}
+
+static void page_lock__debug(const PageDesc *pd)
+{
+struct page_lock_debug *pld;
+
+g_assert(!page_is_locked(pd));
+pld = g_new(struct page_lock_debug, 1);
+pld->pd = pd;
+QLIST_INSERT_HEAD(_lock_debug_head, pld, entry);
+}
+
+static void page_unlock__debug(const PageDesc *pd)
+{
+struct page_lock_debug *pld;
+
+pld = get_page_lock_debug(pd);
+g_assert(pld);
+QLIST_REMOVE(pld, entry);
+g_free(pld);
+}
+
+static void
+do_assert_page_locked(const PageDesc *pd, const char *file, int line)
+{
+if (unlikely(!page_is_locked(pd))) {
+error_report("assert_page_lock: PageDesc %p not locked @ %s:%d",
+ pd, file, line);
+abort();
+}
+}
+
+#define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
+
+#else /* !CONFIG_DEBUG_TCG */
+
+#define assert_page_locked(pd)
+
+static inline void page_lock__debug(const PageDesc *pd)
+{
+}
+
+static inline void page_unlock__debug(const PageDesc *pd)
+{
+}
+
+#endif /* CONFIG_DEBUG_TCG */
+
 static inline void page_lock(PageDesc *pd)
 {
+page_lock__debug(pd);
 qemu_spin_lock(>lock);
 }
 
 static inline void page_unlock(PageDesc *pd)
 {
 qemu_spin_unlock(>lock);
+page_unlock__debug(pd);
 }
 
 /* lock the page(s) of a TB in the correct acquisition order */
@@ -1091,6 +1171,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* call with @p->lock held */
 static inline void invalidate_page_bitmap(PageDesc *p)
 {
+assert_page_locked(p);
 #ifdef CONFIG_SOFTMMU
 g_free(p->code_bitmap);
 p->code_bitmap = NULL;
@@ -1247,6 +1328,7 @@ static inline void tb_page_remove(PageDesc *pd, 
TranslationBlock *tb)
 uintptr_t *pprev;
 unsigned int n1;
 
+assert_page_locked(pd);
 pprev = >first_tb;
 PAGE_FOR_EACH_TB(pd, tb1, n1) {
 if (tb1 == tb) {
@@ -1395,6 +1477,7 @@ static void build_page_bitmap(PageDesc *p)
 int n, tb_start, tb_end;
 TranslationBlock *tb;
 
+assert_page_locked(p);
 p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
 
 PAGE_FOR_EACH_TB(p, tb, n) {
@@ -1428,7 +1511,7 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
 bool page_already_protected;
 #endif
 
-assert_memory_lock();
+assert_page_locked(p);
 
 tb->page_addr[n] = page_addr;
 tb->page_next[n] = p->first_tb;
@@ -1710,8 +1793,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 uint32_t current_flags = 0;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
-assert_memory_lock();
-assert_tb_locked();
+assert_page_locked(p);
 
 #if defined(TARGET_HAS_PRECISE_SMC)
 if (cpu != NULL) {
@@ -1723,6 +1805,7 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 /* XXX: see if in some cases it could be faster to invalidate all
the code */
 PAGE_FOR_EACH_TB(p, tb, n) {
+assert_page_locked(p);
 /* NOTE: this is subtle as a TB may span two physical pages */
 if (n == 0) {
 /* NOTE: tb_end may be after the end of the page, but
@@ -1879,6 +1962,7 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, 
int len)
 }
 
 pages = page_collection_lock(start, start + len);
+assert_page_locked(p);
 if (!p->code_bitmap &&
 ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
 

[Qemu-devel] [PATCH v2 05/17] translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB

2018-04-05 Thread Emilio G. Cota
This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:

- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
  Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
  jmp_list_{next|first} fields of TB"), the rationale is the same: these
  are tagged pointers, not pointers. So use a more appropriate type.

- Only check the least significant bit of the tagged pointers. Masking
  with 3/~3 is unnecessary and confusing.

- Introduce the TB_FOR_EACH_TAGGED macro, and use it to define
  PAGE_FOR_EACH_TB, which improves readability. Note that
  TB_FOR_EACH_TAGGED will gain another user in a subsequent patch.

- Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there
  is a bug and we attempt to remove a TB that is not in the list, instead
  of segfaulting (since the list is NULL-terminated) we will reach
  g_assert_not_reached().

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |  2 +-
 accel/tcg/translate-all.c | 62 ++-
 2 files changed, 30 insertions(+), 34 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 17e08b3..5f7e65a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -356,7 +356,7 @@ struct TranslationBlock {
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
of the pointer tells the index in page_next[] */
-struct TranslationBlock *page_next[2];
+uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
 /* The following data are used to directly call another TB from
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index feac8ce..ecf898c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -103,7 +103,7 @@
 
 typedef struct PageDesc {
 /* list of TBs intersecting this ram page */
-TranslationBlock *first_tb;
+uintptr_t first_tb;
 #ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
@@ -114,6 +114,15 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
+/* list iterators for lists of tagged pointers in TranslationBlock */
+#define TB_FOR_EACH_TAGGED(head, tb, n, field)  \
+for (n = (head) & 1, tb = (TranslationBlock *)((head) & ~1);\
+ tb; tb = (TranslationBlock *)tb->field[n], n = (uintptr_t)tb & 1, \
+ tb = (TranslationBlock *)((uintptr_t)tb & ~1))
+
+#define PAGE_FOR_EACH_TB(pagedesc, tb, n)   \
+TB_FOR_EACH_TAGGED((pagedesc)->first_tb, tb, n, page_next)
+
 /* In system mode we want L1_MAP to be based on ram offsets,
while in user mode we want it to be based on virtual addresses.  */
 #if !defined(CONFIG_USER_ONLY)
@@ -818,7 +827,7 @@ static void page_flush_tb_1(int level, void **lp)
 PageDesc *pd = *lp;
 
 for (i = 0; i < V_L2_SIZE; ++i) {
-pd[i].first_tb = NULL;
+pd[i].first_tb = (uintptr_t)NULL;
 invalidate_page_bitmap(pd + i);
 }
 } else {
@@ -946,21 +955,21 @@ static void tb_page_check(void)
 
 #endif /* CONFIG_USER_ONLY */
 
-static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
+static inline void tb_page_remove(PageDesc *pd, TranslationBlock *tb)
 {
 TranslationBlock *tb1;
+uintptr_t *pprev;
 unsigned int n1;
 
-for (;;) {
-tb1 = *ptb;
-n1 = (uintptr_t)tb1 & 3;
-tb1 = (TranslationBlock *)((uintptr_t)tb1 & ~3);
+pprev = >first_tb;
+PAGE_FOR_EACH_TB(pd, tb1, n1) {
 if (tb1 == tb) {
-*ptb = tb1->page_next[n1];
-break;
+*pprev = tb1->page_next[n1];
+return;
 }
-ptb = >page_next[n1];
+pprev = >page_next[n1];
 }
+g_assert_not_reached();
 }
 
 /* remove the TB from a list of TBs jumping to the n-th jump target of the TB 
*/
@@ -1048,12 +1057,12 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 /* remove the TB from the page list */
 if (tb->page_addr[0] != page_addr) {
 p = page_find(tb->page_addr[0] >> TARGET_PAGE_BITS);
-tb_page_remove(>first_tb, tb);
+tb_page_remove(p, tb);
 invalidate_page_bitmap(p);
 }
 if (tb->page_addr[1] != -1 && tb->page_addr[1] != page_addr) {
 p = page_find(tb->page_addr[1] >> TARGET_PAGE_BITS);
-tb_page_remove(>first_tb, tb);
+tb_page_remove(p, tb);
 invalidate_page_bitmap(p);
 }
 
@@ -1084,10 +1093,7 @@ static void build_page_bitmap(PageDesc *p)
 
 p->code_bitmap = bitmap_new(TARGET_PAGE_SIZE);
 
-tb = p->first_tb;
-while (tb != NULL) {
-n = (uintptr_t)tb & 3;
-tb = (TranslationBlock *)((uintptr_t)tb & ~3);
+PAGE_FOR_EACH_TB(p, tb, n) {
   

[Qemu-devel] [PATCH v2 14/17] translate-all: protect TB jumps with a per-destination-TB lock

2018-04-05 Thread Emilio G. Cota
This applies to both user-mode and !user-mode emulation.

Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock. This lock also protects tb->cflags,
so update all tb->cflags readers outside tb->jmp_lock to use
atomic reads via tb_cflags().

In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[].

I considered not using a linked list of jumps, which simplifies
code and makes the struct smaller. However, it unnecessarily increases
memory usage, which results in a performance decrease. See for
instance these numbers booting+shutting down debian-arm:
  Time (s)  Rel. err (%)  Abs. err (s)  Rel. slowdown (%)
--
 before  20.88  0.74  0.154512 0.
 after   20.81  0.38  0.079078-0.33524904
 GTree   21.02  0.28  0.058856 0.67049808
 GHashTable + xxhash 21.63  1.08  0.233604  3.5919540

Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.

n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁ ▁▁ ▁▁▁  ▁▁▁ ▁|[34.0,35.0]

Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |   6 +-
 include/exec/exec-all.h |  35 +++-
 accel/tcg/cpu-exec.c|  41 +-
 accel/tcg/translate-all.c   | 118 
 4 files changed, 124 insertions(+), 76 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index faf09c6..df83445 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -131,8 +131,10 @@ DESIGN REQUIREMENT: Safely handle invalidation of TBs
 
 The direct jump themselves are updated atomically by the TCG
 tb_set_jmp_target() code. Modification to the linked lists that allow
-searching for linked pages are done under the protect of the
-tb_lock().
+searching for linked pages are done under the protection of tb->jmp_lock,
+where tb is the destination block of a jump. Each origin block keeps a
+pointer to its destinations so that the appropriate lock can be acquired before
+iterating over a jump list.
 
 The global page table is a lockless radix tree; cmpxchg is used
 to atomically insert new elements.
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 7911e69..f8adeb8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -341,7 +341,7 @@ struct TranslationBlock {
 #define CF_LAST_IO 0x8000 /* Last insn may be an IO access.  */
 #define CF_NOCACHE 0x0001 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x0002
-#define CF_INVALID 0x0004 /* TB is stale. Setters need tb_lock */
+#define CF_INVALID 0x0004 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL0x0008 /* Generate code for a parallel context */
 /* cflags' mask for hashing/comparison */
 #define CF_HASH_MASK   \
@@ -360,6 +360,9 @@ struct TranslationBlock {
 uintptr_t page_next[2];
 tb_page_addr_t page_addr[2];
 
+/* jmp_lock placed here to fill a 4-byte hole. Its documentation is below 
*/
+QemuSpin jmp_lock;
+
 /* The following data are used to directly call another TB from
  * the code of this one. This can be done either by emitting direct or
  * indirect native jump instructions. These jumps are reset so that the TB
@@ -371,20 +374,26 @@ struct TranslationBlock {
 #define TB_JMP_RESET_OFFSET_INVALID 0x /* indicates no jump generated */
 uintptr_t jmp_target_arg[2];  /* target address or offset */
 
-/* Each TB has an associated circular list of TBs jumping to this one.
- * jmp_list_first points to the first TB jumping to this one.
- * jmp_list_next is used to point to the next TB in a list.
- * Since each TB can have two jumps, it can participate in two lists.
- * jmp_list_first and jmp_list_next are 4-byte aligned pointers to a
- * TranslationBlock structure, but the two least significant bits of
- * them are used to encode which data field of the pointed TB should
- * be used to traverse the list further from that TB:
- * 0 => jmp_list_next[0], 1 => jmp_list_next[1], 2 => jmp_list_first.
- * In other words, 0/1 tells which jump is used in the pointed TB,
- * and 2 means that this is a pointer back to the target TB of this list.
+/*
+ * Each TB has a NULL-terminated list (jmp_list_head) of incoming jumps.
+ * Each TB can 

[Qemu-devel] [PATCH v2 15/17] cputlb: remove tb_lock from tlb_flush functions

2018-04-05 Thread Emilio G. Cota
The acquisition of tb_lock was added when the async tlb_flush
was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.")

tb_lock was there to allow us to do memset() on the tb_jmp_cache's.
However, since f3ced3c5928 ("tcg: consistently access cpu->tb_jmp_cache
atomically") all accesses to tb_jmp_cache are atomic, so tb_lock
is not needed here. Get rid of it.

Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cputlb.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 0543903..f5c3a09 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -125,8 +125,6 @@ static void tlb_flush_nocheck(CPUState *cpu)
 atomic_set(>tlb_flush_count, env->tlb_flush_count + 1);
 tlb_debug("(count: %zu)\n", tlb_flush_count());
 
-tb_lock();
-
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 cpu_tb_jmp_cache_clear(cpu);
@@ -135,8 +133,6 @@ static void tlb_flush_nocheck(CPUState *cpu)
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
 
-tb_unlock();
-
 atomic_mb_set(>pending_tlb_flush, 0);
 }
 
@@ -180,8 +176,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, 
run_on_cpu_data data)
 
 assert_cpu_is_self(cpu);
 
-tb_lock();
-
 tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
 
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -197,8 +191,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, 
run_on_cpu_data data)
 cpu_tb_jmp_cache_clear(cpu);
 
 tlb_debug("done\n");
-
-tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
-- 
2.7.4




[Qemu-devel] [PATCH v2 03/17] tcg: track TBs with per-region BST's

2018-04-05 Thread Emilio G. Cota
This paves the way for enabling scalable parallel generation of TCG code.

Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.

The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.

Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.

Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().

As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |   1 -
 include/exec/tb-context.h |   1 -
 tcg/tcg.h |   6 ++
 accel/tcg/cpu-exec.c  |   2 +-
 accel/tcg/translate-all.c | 101 
 tcg/tcg.c | 191 ++
 6 files changed, 213 insertions(+), 89 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e5afd2e..17e08b3 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -401,7 +401,6 @@ static inline uint32_t curr_cflags(void)
  | (use_icount ? CF_USE_ICOUNT : 0);
 }
 
-void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1d41202..d8472c8 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -31,7 +31,6 @@ typedef struct TBContext TBContext;
 
 struct TBContext {
 
-GTree *tb_tree;
 struct qht htable;
 /* any access to the tbs or the page table must use this lock */
 QemuMutex tb_lock;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 9e2d909..8bf29cc 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -850,6 +850,12 @@ void tcg_region_reset_all(void);
 size_t tcg_code_size(void);
 size_t tcg_code_capacity(void);
 
+void tcg_tb_insert(TranslationBlock *tb);
+void tcg_tb_remove(TranslationBlock *tb);
+TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
+void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
+size_t tcg_nb_tbs(void);
+
 /* user-mode: Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index dabf292..778801a 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -222,7 +222,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
 tb_lock();
 tb_phys_invalidate(tb, -1);
-tb_remove(tb);
+tcg_tb_remove(tb);
 tb_unlock();
 }
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d16e094..449d4de 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -205,8 +205,6 @@ void tb_lock_reset(void)
 }
 }
 
-static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-
 void cpu_gen_init(void)
 {
 tcg_context_init(_init_ctx);
@@ -375,13 +373,13 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc)
 
 if (check_offset < tcg_init_ctx.code_gen_buffer_size) {
 tb_lock();
-tb = tb_find_pc(host_pc);
+tb = tcg_tb_lookup(host_pc);
 if (tb) {
 cpu_restore_state_from_tb(cpu, tb, host_pc);
 if (tb->cflags & CF_NOCACHE) {
 /* one-shot translation, invalidate it immediately */
 tb_phys_invalidate(tb, -1);
-tb_remove(tb);
+tcg_tb_remove(tb);
 }
 r = true;
 }
@@ -731,48 +729,6 @@ static inline void *alloc_code_gen_buffer(void)
 }
 #endif /* 

[Qemu-devel] [PATCH v2 12/17] translate-all: add page_collection assertions

2018-04-05 Thread Emilio G. Cota
The appended adds assertions to make sure we do not longjmp with page
locks held. Some notes:

- user-mode has nothing to check, since page_locks are !user-mode only.

- The checks only apply to page collections, since these have relatively
  complex callers.

- Some simple page_lock/unlock callers have been left unchecked --
  namely page_lock_tb, tb_phys_invalidate and tb_link_page.

Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |  8 
 accel/tcg/cpu-exec.c  |  1 +
 accel/tcg/translate-all.c | 20 
 3 files changed, 29 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index aeaa127..7911e69 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -431,6 +431,14 @@ void tb_lock(void);
 void tb_unlock(void);
 void tb_lock_reset(void);
 
+#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG)
+void assert_page_collection_locked(bool val);
+#else
+static inline void assert_page_collection_locked(bool val)
+{
+}
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 
 struct MemoryRegion *iotlb_to_region(CPUState *cpu,
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 778801a..6a3a21d 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -271,6 +271,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 tcg_debug_assert(!have_mmap_lock());
 #endif
 tb_lock_reset();
+assert_page_collection_locked(false);
 }
 
 if (in_exclusive_region) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 29bc1da..f8862f6 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -666,6 +666,18 @@ do_assert_page_locked(const PageDesc *pd, const char 
*file, int line)
 
 #define assert_page_locked(pd) do_assert_page_locked(pd, __FILE__, __LINE__)
 
+static __thread bool page_collection_locked;
+
+void assert_page_collection_locked(bool val)
+{
+tcg_debug_assert(page_collection_locked == val);
+}
+
+static inline void set_page_collection_locked(bool val)
+{
+page_collection_locked = val;
+}
+
 #else /* !CONFIG_DEBUG_TCG */
 
 #define assert_page_locked(pd)
@@ -678,6 +690,10 @@ static inline void page_unlock__debug(const PageDesc *pd)
 {
 }
 
+static inline void set_page_collection_locked(bool val)
+{
+}
+
 #endif /* CONFIG_DEBUG_TCG */
 
 static inline void page_lock(PageDesc *pd)
@@ -754,6 +770,7 @@ static void do_page_entry_lock(struct page_entry *pe)
 page_lock(pe->pd);
 g_assert(!pe->locked);
 pe->locked = true;
+set_page_collection_locked(true);
 }
 
 static gboolean page_entry_lock(gpointer key, gpointer value, gpointer data)
@@ -846,6 +863,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t 
end)
 set->tree = g_tree_new_full(tb_page_addr_cmp, NULL, NULL,
 page_entry_destroy);
 set->max = NULL;
+assert_page_collection_locked(false);
 
  retry:
 g_tree_foreach(set->tree, page_entry_lock, NULL);
@@ -864,6 +882,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t 
end)
  page_trylock_add(set, tb->page_addr[1]))) {
 /* drop all locks, and reacquire in order */
 g_tree_foreach(set->tree, page_entry_unlock, NULL);
+set_page_collection_locked(false);
 goto retry;
 }
 }
@@ -876,6 +895,7 @@ void page_collection_unlock(struct page_collection *set)
 /* entries are unlocked and freed via page_entry_destroy */
 g_tree_destroy(set->tree);
 g_free(set);
+set_page_collection_locked(false);
 }
 
 #endif /* !CONFIG_USER_ONLY */
-- 
2.7.4




[Qemu-devel] [PATCH v2 08/17] translate-all: work page-by-page in tb_invalidate_phys_range_1

2018-04-05 Thread Emilio G. Cota
So that we pass a same-page range to tb_invalidate_phys_page_range,
instead of always passing an end address that could be on a different
page.

As discussed with Peter Maydell on the list [1], tb_invalidate_phys_page_range
doesn't actually do much with 'end', which explains why we have never
hit a bug despite going against what the comment on top of
tb_invalidate_phys_page_range requires:

> * Invalidate all TBs which intersect with the target physical address range
> * [start;end[. NOTE: start and end must refer to the *same* physical page.

The appended honours the comment, which avoids confusion.

While at it, rework the loop into a for loop, which is less error prone
(e.g. "continue" won't result in an infinite loop).

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09165.html

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7c72354..b27aacc 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1378,10 +1378,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  */
 static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t 
end)
 {
-while (start < end) {
-tb_invalidate_phys_page_range(start, end, 0);
-start &= TARGET_PAGE_MASK;
-start += TARGET_PAGE_SIZE;
+tb_page_addr_t next;
+
+for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ start < end;
+ start = next, next += TARGET_PAGE_SIZE) {
+tb_page_addr_t bound = MIN(next, end);
+
+tb_invalidate_phys_page_range(start, bound, 0);
 }
 }
 
-- 
2.7.4




[Qemu-devel] [PATCH v2 04/17] tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx

2018-04-05 Thread Emilio G. Cota
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/tb-context.h |  1 -
 tcg/tcg.h |  3 +++
 accel/tcg/translate-all.c |  5 +++--
 tcg/tcg.c | 14 ++
 4 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index d8472c8..8c9b49c 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -37,7 +37,6 @@ struct TBContext {
 
 /* statistics */
 unsigned tb_flush_count;
-int tb_phys_invalidate_count;
 };
 
 extern TBContext tb_ctx;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 8bf29cc..9dd9448 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -694,6 +694,8 @@ struct TCGContext {
 /* Threshold to flush the translated code buffer.  */
 void *code_gen_highwater;
 
+size_t tb_phys_invalidate_count;
+
 /* Track which vCPU triggers events */
 CPUState *cpu;  /* *_trans */
 
@@ -852,6 +854,7 @@ size_t tcg_code_capacity(void);
 
 void tcg_tb_insert(TranslationBlock *tb);
 void tcg_tb_remove(TranslationBlock *tb);
+size_t tcg_tb_phys_invalidate_count(void);
 TranslationBlock *tcg_tb_lookup(uintptr_t tc_ptr);
 void tcg_tb_foreach(GTraverseFunc func, gpointer user_data);
 size_t tcg_nb_tbs(void);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 449d4de..feac8ce 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1072,7 +1072,8 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 /* suppress any remaining jumps to this TB */
 tb_jmp_unlink(tb);
 
-tb_ctx.tb_phys_invalidate_count++;
+atomic_set(_ctx->tb_phys_invalidate_count,
+   tcg_ctx->tb_phys_invalidate_count + 1);
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1857,7 +1858,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 cpu_fprintf(f, "\nStatistics:\n");
 cpu_fprintf(f, "TB flush count  %u\n",
 atomic_read(_ctx.tb_flush_count));
-cpu_fprintf(f, "TB invalidate count %d\n", 
tb_ctx.tb_phys_invalidate_count);
+cpu_fprintf(f, "TB invalidate count %zu\n", 
tcg_tb_phys_invalidate_count());
 cpu_fprintf(f, "TLB flush count %zu\n", tlb_flush_count());
 tcg_dump_info(f, cpu_fprintf);
 }
diff --git a/tcg/tcg.c b/tcg/tcg.c
index b471708..a7b596e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -791,6 +791,20 @@ size_t tcg_code_capacity(void)
 return capacity;
 }
 
+size_t tcg_tb_phys_invalidate_count(void)
+{
+unsigned int n_ctxs = atomic_read(_tcg_ctxs);
+unsigned int i;
+size_t total = 0;
+
+for (i = 0; i < n_ctxs; i++) {
+const TCGContext *s = atomic_read(_ctxs[i]);
+
+total += atomic_read(>tb_phys_invalidate_count);
+}
+return total;
+}
+
 /* pool based memory allocation */
 void *tcg_malloc_internal(TCGContext *s, int size)
 {
-- 
2.7.4




[Qemu-devel] [PATCH v2 02/17] qht: return existing entry when qht_insert fails

2018-04-05 Thread Emilio G. Cota
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.

Suggested-by: Richard Henderson 
Reviewed-by:  Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/qemu/qht.h|  7 +--
 accel/tcg/translate-all.c |  2 +-
 tests/qht-bench.c |  4 ++--
 tests/test-qht.c  |  8 +++-
 util/qht.c| 27 +--
 5 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 5f03a0f..1fb9116 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -70,6 +70,7 @@ void qht_destroy(struct qht *ht);
  * @ht: QHT to insert to
  * @p: pointer to be inserted
  * @hash: hash corresponding to @p
+ * @existing: address where the pointer to an existing entry can be copied to
  *
  * Attempting to insert a NULL @p is a bug.
  * Inserting the same pointer @p with different @hash values is a bug.
@@ -78,9 +79,11 @@ void qht_destroy(struct qht *ht);
  * inserted into the hash table.
  *
  * Returns true on success.
- * Returns false if the @p-@hash pair already exists in the hash table.
+ * Returns false if there is an existing entry in the table that is equivalent
+ * (i.e. ht->cmp matches and the hash is the same) to @p-@h. If @existing
+ * is !NULL, a pointer to this existing entry is copied to it.
  */
-bool qht_insert(struct qht *ht, void *p, uint32_t hash);
+bool qht_insert(struct qht *ht, void *p, uint32_t hash, void **existing);
 
 /**
  * qht_lookup_custom - Look up a pointer using a custom comparison function.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 7af5f36..d16e094 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1245,7 +1245,7 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 /* add in the hash table */
 h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
  tb->trace_vcpu_dstate);
-qht_insert(_ctx.htable, tb, h);
+qht_insert(_ctx.htable, tb, h, NULL);
 
 #ifdef CONFIG_USER_ONLY
 if (DEBUG_TB_CHECK_GATE) {
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index c94ac25..f492b3a 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -163,7 +163,7 @@ static void do_rw(struct thread_info *info)
 bool written = false;
 
 if (qht_lookup(, p, hash) == NULL) {
-written = qht_insert(, p, hash);
+written = qht_insert(, p, hash, NULL);
 }
 if (written) {
 stats->in++;
@@ -322,7 +322,7 @@ static void htable_init(void)
 r = xorshift64star(r);
 p = [r & (init_range - 1)];
 hash = h(*p);
-if (qht_insert(, p, hash)) {
+if (qht_insert(, p, hash, NULL)) {
 break;
 }
 retries++;
diff --git a/tests/test-qht.c b/tests/test-qht.c
index b069881..dda6a06 100644
--- a/tests/test-qht.c
+++ b/tests/test-qht.c
@@ -27,11 +27,17 @@ static void insert(int a, int b)
 
 for (i = a; i < b; i++) {
 uint32_t hash;
+void *existing;
+bool inserted;
 
 arr[i] = i;
 hash = i;
 
-qht_insert(, [i], hash);
+inserted = qht_insert(, [i], hash, NULL);
+g_assert_true(inserted);
+inserted = qht_insert(, [i], hash, );
+g_assert_false(inserted);
+g_assert_true(existing == [i]);
 }
 }
 
diff --git a/util/qht.c b/util/qht.c
index 8610ce3..9d030e7 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -511,9 +511,9 @@ void *qht_lookup(struct qht *ht, const void *userp, 
uint32_t hash)
 }
 
 /* call with head->lock held */
-static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
-   struct qht_bucket *head, void *p, uint32_t hash,
-   bool *needs_resize)
+static void *qht_insert__locked(struct qht *ht, struct qht_map *map,
+struct qht_bucket *head, void *p, uint32_t 
hash,
+bool *needs_resize)
 {
 struct qht_bucket *b = head;
 struct qht_bucket *prev = NULL;
@@ -523,8 +523,9 @@ static bool qht_insert__locked(struct qht *ht, struct 
qht_map *map,
 do {
 for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
 if (b->pointers[i]) {
-if (unlikely(b->pointers[i] == p)) {
-return false;
+if (unlikely(b->hashes[i] == hash &&
+ ht->cmp(b->pointers[i], p))) {
+return b->pointers[i];
 }
 } else {
 goto found;
@@ -553,7 +554,7 @@ static bool qht_insert__locked(struct qht *ht, struct 
qht_map *map,
 atomic_set(>hashes[i], hash);
 atomic_set(>pointers[i], p);
 seqlock_write_end(>sequence);
-   

[Qemu-devel] [PATCH v2 16/17] translate-all: remove tb_lock mention from cpu_restore_state_from_tb

2018-04-05 Thread Emilio G. Cota
tb_lock was needed when the function did retranslation. However,
since fca8a500d519 ("tcg: Save insn data and use it in
cpu_restore_state_from_tb") we don't do retranslation.

Get rid of the comment.

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 486c9df..62e5796 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -354,9 +354,7 @@ static int encode_search(TranslationBlock *tb, uint8_t 
*block)
 return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.
- * Called with tb_lock held.
- */
+/* The cpu state corresponding to 'searched_pc' is restored */
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
  uintptr_t searched_pc)
 {
-- 
2.7.4




[Qemu-devel] [PATCH v2 00/17] tcg: tb_lock_removal redux v2

2018-04-05 Thread Emilio G. Cota
v1: http://lists.gnu.org/archive/html/qemu-devel/2018-02/msg06499.html

Changes since v1:

- Add R-b's

- Rebase onto master

- qht_lookup_custom: move @func to be the last argument, which
  simplifies the new qht_lookup function. (I've kept R-b's tags
  here because this is a very simple change.)

- qht_insert: add an **existing argument and keep the bool return value,
  as suggested by Alex.

- Fix indentation of TB_FOR_EACH_TAGGED macro

- Add page_locked assertions, as suggested by Alex.

- Expand comment in tb_link_page and in docs/mttcg about parallel
  code insertion.

- Fix stale comment about tb_lock next to CF_INVALID

- Fix stale comment in cpu_restore_state, as suggested by Alex.

There is only one checkpatch error for the entire series -- it is
a false positive.

You can fetch the tree from:
  https://github.com/cota/qemu/tree/tb-lock-removal-redux-v2

Thanks,

Emilio
---
 accel/tcg/cpu-exec.c|   96 ++-
 accel/tcg/cputlb.c  |8 -
 accel/tcg/translate-all.c   | 1053 ++--
 accel/tcg/translate-all.h   |6 +-
 docs/devel/multi-thread-tcg.txt |   24 +-
 exec.c  |   25 +-
 include/exec/cpu-common.h   |2 +-
 include/exec/exec-all.h |   51 +-
 include/exec/memory-internal.h  |6 +-
 include/exec/tb-context.h   |4 -
 include/qemu/qht.h  |   32 +-
 linux-user/main.c   |3 -
 tcg/tcg.c   |  205 +++
 tcg/tcg.h   |   13 +-
 tests/qht-bench.c   |   18 +-
 tests/test-qht.c|   23 +-
 util/qht.c  |   41 +-
 17 files changed, 1133 insertions(+), 477 deletions(-)




[Qemu-devel] [PATCH v2 09/17] translate-all: move tb_invalidate_phys_page_range up in the file

2018-04-05 Thread Emilio G. Cota
This greatly simplifies next commit's diff.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 77 ---
 1 file changed, 39 insertions(+), 38 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index b27aacc..4d4391f 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1368,44 +1368,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 
 /*
  * Invalidate all TBs which intersect with the target physical address range
- * [start;end[. NOTE: start and end may refer to *different* physical pages.
- * 'is_cpu_write_access' should be true if called from a real cpu write
- * access: the virtual CPU will exit the current TB if code is modified inside
- * this TB.
- *
- * Called with mmap_lock held for user-mode emulation, grabs tb_lock
- * Called with tb_lock held for system-mode emulation
- */
-static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t 
end)
-{
-tb_page_addr_t next;
-
-for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
- start < end;
- start = next, next += TARGET_PAGE_SIZE) {
-tb_page_addr_t bound = MIN(next, end);
-
-tb_invalidate_phys_page_range(start, bound, 0);
-}
-}
-
-#ifdef CONFIG_SOFTMMU
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-assert_tb_locked();
-tb_invalidate_phys_range_1(start, end);
-}
-#else
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
-{
-assert_memory_lock();
-tb_lock();
-tb_invalidate_phys_range_1(start, end);
-tb_unlock();
-}
-#endif
-/*
- * Invalidate all TBs which intersect with the target physical address range
  * [start;end[. NOTE: start and end must refer to the *same* physical page.
  * 'is_cpu_write_access' should be true if called from a real cpu write
  * access: the virtual CPU will exit the current TB if code is modified inside
@@ -1502,6 +1464,45 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif
 }
 
+/*
+ * Invalidate all TBs which intersect with the target physical address range
+ * [start;end[. NOTE: start and end may refer to *different* physical pages.
+ * 'is_cpu_write_access' should be true if called from a real cpu write
+ * access: the virtual CPU will exit the current TB if code is modified inside
+ * this TB.
+ *
+ * Called with mmap_lock held for user-mode emulation, grabs tb_lock
+ * Called with tb_lock held for system-mode emulation
+ */
+static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t 
end)
+{
+tb_page_addr_t next;
+
+for (next = (start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ start < end;
+ start = next, next += TARGET_PAGE_SIZE) {
+tb_page_addr_t bound = MIN(next, end);
+
+tb_invalidate_phys_page_range(start, bound, 0);
+}
+}
+
+#ifdef CONFIG_SOFTMMU
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+assert_tb_locked();
+tb_invalidate_phys_range_1(start, end);
+}
+#else
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+assert_memory_lock();
+tb_lock();
+tb_invalidate_phys_range_1(start, end);
+tb_unlock();
+}
+#endif
+
 #ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len.
  * Called via softmmu_template.h when code areas are written to with
-- 
2.7.4




[Qemu-devel] [PATCH v2 06/17] translate-all: make l1_map lockless

2018-04-05 Thread Emilio G. Cota
Groundwork for supporting parallel TCG generation.

We never remove entries from the radix tree, so we can use cmpxchg
to implement lockless insertions.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 docs/devel/multi-thread-tcg.txt |  4 ++--
 accel/tcg/translate-all.c   | 24 ++--
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.txt
index a99b456..faf8918 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.txt
@@ -134,8 +134,8 @@ tb_set_jmp_target() code. Modification to the linked lists 
that allow
 searching for linked pages are done under the protect of the
 tb_lock().
 
-The global page table is protected by the tb_lock() in system-mode and
-mmap_lock() in linux-user mode.
+The global page table is a lockless radix tree; cmpxchg is used
+to atomically insert new elements.
 
 The lookup caches are updated atomically and the lookup hash uses QHT
 which is designed for concurrent safe lookup.
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index ecf898c..83aec43 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -469,20 +469,12 @@ static void page_init(void)
 #endif
 }
 
-/* If alloc=1:
- * Called with tb_lock held for system emulation.
- * Called with mmap_lock held for user-mode emulation.
- */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
 {
 PageDesc *pd;
 void **lp;
 int i;
 
-if (alloc) {
-assert_memory_lock();
-}
-
 /* Level 1.  Always allocated.  */
 lp = l1_map + ((index >> v_l1_shift) & (v_l1_size - 1));
 
@@ -491,11 +483,17 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 void **p = atomic_rcu_read(lp);
 
 if (p == NULL) {
+void *existing;
+
 if (!alloc) {
 return NULL;
 }
 p = g_new0(void *, V_L2_SIZE);
-atomic_rcu_set(lp, p);
+existing = atomic_cmpxchg(lp, NULL, p);
+if (unlikely(existing)) {
+g_free(p);
+p = existing;
+}
 }
 
 lp = p + ((index >> (i * V_L2_BITS)) & (V_L2_SIZE - 1));
@@ -503,11 +501,17 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, 
int alloc)
 
 pd = atomic_rcu_read(lp);
 if (pd == NULL) {
+void *existing;
+
 if (!alloc) {
 return NULL;
 }
 pd = g_new0(PageDesc, V_L2_SIZE);
-atomic_rcu_set(lp, pd);
+existing = atomic_cmpxchg(lp, NULL, pd);
+if (unlikely(existing)) {
+g_free(pd);
+pd = existing;
+}
 }
 
 return pd + (index & (V_L2_SIZE - 1));
-- 
2.7.4




[Qemu-devel] [PATCH v2 07/17] translate-all: remove hole in PageDesc

2018-04-05 Thread Emilio G. Cota
Groundwork for supporting parallel TCG generation.

Move the hole to the end of the struct, so that a u32
field can be added there without bloating the struct.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 83aec43..7c72354 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -107,8 +107,8 @@ typedef struct PageDesc {
 #ifdef CONFIG_SOFTMMU
 /* in order to optimize self modifying code, we count the number
of lookups we do to a given page to use a bitmap */
-unsigned int code_write_count;
 unsigned long *code_bitmap;
+unsigned int code_write_count;
 #else
 unsigned long flags;
 #endif
-- 
2.7.4




Re: [Qemu-devel] [PATCH 15/16] translate-all: remove tb_lock mention from cpu_restore_state_from_tb

2018-04-05 Thread Emilio G. Cota
On Thu, Mar 29, 2018 at 17:06:56 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota  writes:
> 
> > tb_lock was needed when the function did retranslation. However,
> > since fca8a500d519 ("tcg: Save insn data and use it in
> > cpu_restore_state_from_tb") we don't do retranslation.
> >
> > Get rid of the comment.
> 
> I think we need to modify the comment in cpu_restore_state as well:
> 
>   Either way we need return early as we can't resolve it here.

Thanks, I've added this suggestion to patch 16/16.

E.



[Qemu-devel] [PATCH v4] linux-user: fix preadv/pwritev offsets

2018-04-05 Thread Max Filippov
preadv/pwritev accept low and high parts of file offset in two separate
parameters. When host bitness doesn't match guest bitness these parts
must be appropriately recombined.
Introduce target_to_host_low_high that does this recombination and use
it in preadv/pwritev syscalls.

This fixes glibc testsuite test misc/tst-preadvwritev64.

Signed-off-by: Max Filippov 
---
Changes v3->v4:
- recombine offset into unsigned long long and split it into two longs.

Changes v2->v3:
- rename helper to target_to_host_low_high;
- handle cases TARGET_LONG_BITS * 2 > HOST_LONG_BITS and
  TARGET_LONG_BITS < 2 * HOST_LONG_BITS correctly.

Changes v1->v2:
- fix host high computation in TARGET_LONG_BITS > HOST_LONG_BITS case

 linux-user/syscall.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 5ef517613577..eb9a31f1e092 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3386,6 +3386,20 @@ static abi_long do_getsockopt(int sockfd, int level, int 
optname,
 return ret;
 }
 
+static void target_to_host_low_high(abi_ulong tlow,
+abi_ulong thigh,
+unsigned long *hlow,
+unsigned long *hhigh)
+{
+unsigned long long off = tlow |
+((unsigned long long)thigh << TARGET_LONG_BITS / 2) <<
+TARGET_LONG_BITS / 2;
+
+*hlow = (unsigned long)off;
+*hhigh = (unsigned long)((off >> HOST_LONG_BITS / 2) >>
+ HOST_LONG_BITS / 2);
+}
+
 static struct iovec *lock_iovec(int type, abi_ulong target_addr,
 abi_ulong count, int copy)
 {
@@ -10449,7 +10463,10 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 {
 struct iovec *vec = lock_iovec(VERIFY_WRITE, arg2, arg3, 0);
 if (vec != NULL) {
-ret = get_errno(safe_preadv(arg1, vec, arg3, arg4, arg5));
+unsigned long low, high;
+
+target_to_host_low_high(arg4, arg5, , );
+ret = get_errno(safe_preadv(arg1, vec, arg3, low, high));
 unlock_iovec(vec, arg2, arg3, 1);
 } else {
 ret = -host_to_target_errno(errno);
@@ -10462,7 +10479,10 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
arg1,
 {
 struct iovec *vec = lock_iovec(VERIFY_READ, arg2, arg3, 1);
 if (vec != NULL) {
-ret = get_errno(safe_pwritev(arg1, vec, arg3, arg4, arg5));
+unsigned long low, high;
+
+target_to_host_low_high(arg4, arg5, , );
+ret = get_errno(safe_pwritev(arg1, vec, arg3, low, high));
 unlock_iovec(vec, arg2, arg3, 0);
 } else {
 ret = -host_to_target_errno(errno);
-- 
2.11.0




Re: [Qemu-devel] [PATCH for-2.13 13/13] target/ppc: Fold slb_nr into PPCHash64Options

2018-04-05 Thread David Gibson
On Thu, Apr 05, 2018 at 03:27:34PM +0200, Cornelia Huck wrote:
> On Thu, 5 Apr 2018 15:12:55 +0200
> Greg Kurz  wrote:
> 
> > On Thu,  5 Apr 2018 12:14:37 +1000
> > David Gibson  wrote:
> 
> > > @@ -4000,7 +4000,12 @@ DEFINE_SPAPR_MACHINE(2_13, "2.13", true);
> > >   * pseries-2.12
> > >   */
> > >  #define SPAPR_COMPAT_2_12  \
> > > -HW_COMPAT_2_12  
> > 
> > This hunk doesn't apply on master, nor on your ppc-for-2.13 branch...
> > 
> > It looks like a patch to introduce the 2.13 machine type is missing.
> > 
> > FWIW, Connie has already queued a patch to do so for s390x, that also
> > introduces HW_COMPAT_2_12.
> > 
> > https://github.com/cohuck/qemu/commit/b54cde7350b6681b4349b904e0f9a8a8d58c0951
> > 
> > Maybe the HW_COMPAT_ macros should be added in a standalone patch ?
> > 
> > Cc'ing Connie for insights.
> > 
> > > +HW_COMPAT_2_12 \
> > > +{  \
> > > +.driver = TYPE_POWERPC_CPU,\
> > > +.property = "pre-2.13-migration",  \
> > > +.value= "on",  \ 
> > >  
> 
> I think the usual procedure is
> 
> - every arch that uses compat machines queues a patch that creates the
>   new compat machine(s) and adds an empty HW_COMPAT_
> - whoever has their queue pulled first wins wrt hw_compat

That's my understanding as well.  It's an easy conflict to resolve.

> So, I'm happy with anyone adding the empty HW_COMPAT_2_12 -- it needn't
> be me :)

Likewise.  I'm planning to keep it in my tree for the time being, so
as not to rely on external patches, but when the 2.13 tree opens, who
wins the race is mostly chance, and that's fine.

> [We could also introduce the 2.13 machines for all architectures in one
> sweep, but I think that would be generating needless churn for arch
> maintainers.]
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-arm] [PATCH v2 06/67] target/arm: Implement SVE predicate test

2018-04-05 Thread Richard Henderson
On 04/03/2018 07:16 PM, Alex Bennée wrote:
>> +/* Set the cpu flags as per a return from an SVE helper.  */
>> +static void do_pred_flags(TCGv_i32 t)
>> +{
>> +tcg_gen_mov_i32(cpu_NF, t);
>> +tcg_gen_andi_i32(cpu_ZF, t, 2);
>> +tcg_gen_andi_i32(cpu_CF, t, 1);
>> +tcg_gen_movi_i32(cpu_VF, 0);
>> +}
> 
> Why bother returning a value from the helper to then spend time
> shuffling it into env->cpu_FLAG when we could do this directly? Does
> this aid code generation when flag values are queried?

It means that the helper itself clobbers no TCG global temps, and so does not
invalidate any of the guest integer registers that might be live in host 
registers.

The arithmetic above is approximately as efficient as plain moves, so I don't
see this as "spending time shuffling" per se.

> Also from above:
> 
>> + * The return value has bit 31 set if N is set, bit 1 set if Z is clear,
>> + * and bit 0 set if C is set.
> 
> So there is assumed knowledge in the encoding of cpu_NF here - maybe a
> reference to cpu.h where this is codified.

I suppose, sure.


r~



Re: [Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB

2018-04-05 Thread Emilio G. Cota
On Thu, Mar 29, 2018 at 16:19:39 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota  writes:
> 
> > Use the recently-gained QHT feature of returning the matching TB if it
> > already exists. This allows us to get rid of the lookup we perform
> > right after acquiring tb_lock.
> >
> > Suggested-by: Richard Henderson 
> > Signed-off-by: Emilio G. Cota 
> > ---
> >  accel/tcg/cpu-exec.c  | 14 ++
> >  accel/tcg/translate-all.c | 47 
> > ++-
> >  2 files changed, 40 insertions(+), 21 deletions(-)
> >
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 7c83887..8aed38c 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
> >  if (tb == NULL) {
> >  mmap_lock();
> >  tb_lock();
> > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
> > -if (likely(tb == NULL)) {
> > -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> > -}
> > +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> 
> tb_gen_code needs to be renamed to reflect it's semantics.
> tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the
> generation.

I think it can remain as tb_gen_code. The caller still gets
a TB, and whether that TB has been generated by this thread or
any other thread is irrelevant.

(snip)
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, 
> > TranslationBlock *tb,
> >   * (-1) to indicate that only one page contains the TB.
> >   *
> >   * Called with mmap_lock held for user-mode emulation.
> > + *
> > + * Returns @tb or an existing TB that matches @tb.
> 
> That's just confusing to read. So this returns a TB like the @tb we
> passed in but actually a different one matching the same conditions?

Good point. Here tb_link_page is not a great name, but instead
of adding a long name such as tb_link_page_or_get_existing, in
v2 I've expanded the above comment. It now looks as follows:

 * Returns a pointer @tb, or a pointer to an existing TB that matches @tb.
 * Note that in !user-mode, another thread might have already added a TB
 * for the same block of guest code that @tb corresponds to. In that case,
 * the caller should discard the original @tb, and use instead the returned TB.
 
> > @@ -1706,7 +1727,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >   * memory barrier is required before tb_link_page() makes the TB 
> > visible
> >   * through the physical hash table and physical page list.
> >   */
> > -tb_link_page(tb, phys_pc, phys_page2);
> > +existing_tb = tb_link_page(tb, phys_pc, phys_page2);
> > +/* if the TB already exists, discard what we just translated */
> 
> So are we in the position now that we could potentially do a translation
> but be beaten by another thread generating the same code?

Exactly.

> I suspect we could
> do with a bit of explanatory commentary for the tb_gen_code functions.

As I said above I don't think tb_gen_code changes at all
to its callers, since the caller still gets a TB pointer that it
did not have before.

tb_link_page is the key here -- I hope the updated comment
I quoted above is enough to make things clear.

> Also I think the "Translation Blocks" section needs updating in the
> MTTCG design document to make this clear.

I've added a comment at the bottom of that section:

Parallel code generation is supported. QHT is used at insertion time
as the synchronization point across threads, thereby ensuring that we only
keep track of a single TranslationBlock for each guest code block.

> I'm curious if we should be counting unused translations somewhere in
> the JIT stats. I'm guessing you need to work at a pathalogical case to
> hit this much?

This should be extremely rare on most workloads. Given that and the
fact that we won't have unused translated code (we discard it by
resetting code_gen_ptr), I wouldn't worry too much about this.
In the unlikely case that it ever became a problem, TCG profiling
time would account for it, and on a perf profile we'd see the slow
path in tb_link_page being taken.

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v2 05/67] target/arm: Implement SVE load vector/predicate

2018-04-05 Thread Richard Henderson
On 04/03/2018 07:26 PM, Alex Bennée wrote:
> You don't use it yet but probably worth a:
> 
> static inline int ffr_full_reg_offset(DisasContext *s)
> {
> return pred_full_reg_offset(s, 16);
> }
> 
> here when you get to it to avoid the magic 16 appearing in the main code.

Hum.  Most of the places that ffr is touched is in sve.decode.
I could certainly enhance the grammar there to allow a symbol
there instead of a number.

But I don't think treating ffr differently from a regular pr
register, as above, is a good idea.  At best I would use

  pred_full_reg_offset(s, FFR_PRED_NUM)

or something.


r~



Re: [Qemu-devel] [RfC PATCH] Add udmabuf misc device

2018-04-05 Thread Matt Roper
On Thu, Apr 05, 2018 at 10:32:04PM +0200, Daniel Vetter wrote:
> Pulling this out of the shadows again.
> 
> We now also have xen-zcopy from Oleksandr and the hyper dmabuf stuff
> from Matt and Dongwong.
> 
> At least from the intel side there seems to be the idea to just have 1
> special device that can handle cross-gues/host sharing for all kinds
> of hypervisors, so I guess you all need to work together :-)
> 
> Or we throw out the idea that hyper dmabuf will be cross-hypervisor
> (not sure how useful/reasonable that is, someone please convince me
> one way or the other).
> 
> Cheers, Daniel

Dongwon (DW) is the one doing all the real work on hyper_dmabuf, but I'm
familiar with the use cases he's trying to address, and I think there
are a couple high-level goals of his work that are worth calling out as
we discuss the various options for sharing buffers produced in one VM
with a consumer running in another VM:

 * We should try to keep the interface/usage separate from the
   underlying hypervisor implementation details.  I.e., in DW's design
   the sink/source drivers that handle the actual buffer passing in the
   two VM's should provide a generic interface that does not depend on a
   specific hypervisor.  Behind the scenes there could be various
   implementations for specific hypervisors (Xen, KVM, ACRN, etc.), and
   some of those backends may have additional restrictions, but it would
   be best if userspace didn't have to know the specific hypervisor
   running on the system and could just query the general capabilities
   available to it.  We've already got projects in flight that are
   wanting this functionality on Xen and ACRN today.

 * The general interface should be able to express sharing from any
   guest:guest, not just guest:host.  Arbitrary G:G sharing might be
   something some hypervisors simply aren't able to support, but the
   userspace API itself shouldn't make assumptions or restrict that.  I
   think ideally the sharing API would include some kind of
   query_targets interface that would return a list of VM's that your
   current OS is allowed to share with; that list would be depend on the
   policy established by the system integrator, but obviously wouldn't
   include targets that the hypervisor itself wouldn't be capable of
   handling.
   
 * A lot of the initial use cases are in the realm of graphics, but this
   shouldn't be a graphics-specific API.  Buffers might contain other
   types of content as well (e.g., audio).  Really the content producer
   could potentially be any driver (or userspace) running in the VM that
   knows how to import/export dma_buf's (or maybe just import given
   danvet's suggestion that we should make the sink driver do all the
   actual memory allocation for any buffers that may be shared).

 * We need to be able to handle cross-VM coordination of buffer usage as
   well, so I think we'd want to include fence forwarding support in the
   API as well to signal back and forth about production/consumption
   completion.  And of course document really well what should happen
   if, for example, the entire VM you're sharing with/from dies.

 * The sharing API could be used to share multiple kinds of content in a
   single system.  The sharing sink driver running in the content
   producer's VM should accept some additional metadata that will be
   passed over to the target VM as well.  The sharing source driver
   running in the content consumer's VM would then be able to use this
   metadata to determine the purpose of a new buffer that arrives and
   filter/dispatch it to the appropriate consumer.


For reference, the terminology I'm using is

 /--\  dma_buf   /--\ HV /\  dma_buf   /--\
 | Producer |--->| Sink | HV | Source |--->| Consumer |
 \--/   ioctls   \--/ HV \/  uevents   \--/



In the realm of graphics, "Producer" could potentially be something like
an EGL client that sends the buffer at context setup and then signals
with fences on each SwapBuffers.  "Consumer" could be a Wayland client
that proxies the buffers into surfaces or dispatches them to other
userspace software that's waiting for buffers.

With the hyper_dmabuf approach, there's a lot of ABI details that need
to be worked out and really clearly documented before we worry too much
about the backend hypervisor-specific stuff.

I'm not super familiar with xen-zcopy and udmabuf, but it sounds like
they're approaching similar problems from slightly different directions,
so we should make sure we can come up with something that satisfies
everyone's requirements. 


Matt

> 
> On Wed, Mar 14, 2018 at 9:03 AM, Gerd Hoffmann  wrote:
> >   Hi,
> >
> >> Either mlock account (because it's mlocked defacto), and get_user_pages
> >> won't do that for you.
> >>
> >> Or you write the full-blown userptr implementation, including mmu_notifier
> >> support (see i915 or amdgpu), but that also requires 

Re: [Qemu-devel] [PATCH for-2.13 13/13] target/ppc: Fold slb_nr into PPCHash64Options

2018-04-05 Thread David Gibson
On Thu, Apr 05, 2018 at 03:12:55PM +0200, Greg Kurz wrote:
> On Thu,  5 Apr 2018 12:14:37 +1000
> David Gibson  wrote:
> 
> > The env->slb_nr field gives the size of the SLB (Segment Lookaside Buffer).
> > This is another static-after-initialization parameter of the specific
> > version of the 64-bit hash MMU in the CPU.  So, this patch folds the field
> > into PPCHash64Options with the other hash MMU options.
> > 
> > This is a bit more complicated that the things previously put in there,
> > because slb_nr was foolishly included in the migration stream.  So we need
> > some of the usual dance to handle backwards compatible migration.
> > 
> > Signed-off-by: David Gibson 
> > ---
> >  hw/ppc/pnv.c|  2 +-
> >  hw/ppc/spapr.c  | 11 ---
> >  target/ppc/cpu.h|  3 ++-
> >  target/ppc/kvm.c|  2 +-
> >  target/ppc/machine.c| 23 ---
> >  target/ppc/mmu-hash64.c | 15 +--
> >  target/ppc/mmu-hash64.h |  1 +
> >  target/ppc/translate_init.c | 15 ---
> >  8 files changed, 42 insertions(+), 30 deletions(-)
> > 
> > diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> > index 5905be3f71..53f672afa8 100644
> > --- a/hw/ppc/pnv.c
> > +++ b/hw/ppc/pnv.c
> > @@ -180,7 +180,7 @@ static void pnv_dt_core(PnvChip *chip, PnvCore *pc, 
> > void *fdt)
> >  
> >  _FDT((fdt_setprop_cell(fdt, offset, "timebase-frequency", tbfreq)));
> >  _FDT((fdt_setprop_cell(fdt, offset, "clock-frequency", cpufreq)));
> > -_FDT((fdt_setprop_cell(fdt, offset, "ibm,slb-size", env->slb_nr)));
> > +_FDT((fdt_setprop_cell(fdt, offset, "ibm,slb-size", 
> > cpu->hash64_opts->slb_size)));
> >  _FDT((fdt_setprop_string(fdt, offset, "status", "okay")));
> >  _FDT((fdt_setprop(fdt, offset, "64-bit", NULL, 0)));
> >  
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 60bc8417b6..6021631722 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -547,8 +547,8 @@ static void spapr_populate_cpu_dt(CPUState *cs, void 
> > *fdt, int offset,
> >  
> >  _FDT((fdt_setprop_cell(fdt, offset, "timebase-frequency", tbfreq)));
> >  _FDT((fdt_setprop_cell(fdt, offset, "clock-frequency", cpufreq)));
> > -_FDT((fdt_setprop_cell(fdt, offset, "slb-size", env->slb_nr)));
> > -_FDT((fdt_setprop_cell(fdt, offset, "ibm,slb-size", env->slb_nr)));
> > +_FDT((fdt_setprop_cell(fdt, offset, "slb-size", 
> > cpu->hash64_opts->slb_size)));
> > +_FDT((fdt_setprop_cell(fdt, offset, "ibm,slb-size", 
> > cpu->hash64_opts->slb_size)));
> >  _FDT((fdt_setprop_string(fdt, offset, "status", "okay")));
> >  _FDT((fdt_setprop(fdt, offset, "64-bit", NULL, 0)));
> >  
> > @@ -4000,7 +4000,12 @@ DEFINE_SPAPR_MACHINE(2_13, "2.13", true);
> >   * pseries-2.12
> >   */
> >  #define SPAPR_COMPAT_2_12  \
> > -HW_COMPAT_2_12
> 
> This hunk doesn't apply on master, nor on your ppc-for-2.13 branch...

Uh.. I think you need to pull the ppc-for-2.13 brnch again, then..

> It looks like a patch to introduce the 2.13 machine type is missing.

..since I added a patch to do exactly that.

> FWIW, Connie has already queued a patch to do so for s390x, that also
> introduces HW_COMPAT_2_12.
> 
> https://github.com/cohuck/qemu/commit/b54cde7350b6681b4349b904e0f9a8a8d58c0951
> 
> Maybe the HW_COMPAT_ macros should be added in a standalone patch ?
> 
> Cc'ing Connie for insights.
> 
> > +HW_COMPAT_2_12 \
> > +{  \
> > +.driver = TYPE_POWERPC_CPU,\
> > +.property = "pre-2.13-migration",  \
> > +.value= "on",  \
> 
> indentation ?

Oops, adjusted.

> 
> Also, this property must be added to the TYPE_POWERPC_CPU class.
> 
> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -10471,6 +10471,8 @@ static Property ppc_cpu_properties[] = {
>  DEFINE_PROP_BOOL("pre-2.8-migration", PowerPCCPU, pre_2_8_migration, 
> false),
>  DEFINE_PROP_BOOL("pre-2.10-migration", PowerPCCPU, pre_2_10_migration,
>   false),
> +DEFINE_PROP_BOOL("pre-2.13-migration", PowerPCCPU, pre_2_13_migration,
> + false),
>  DEFINE_PROP_END_OF_LIST(),
>  };

Oops again, adjusted.

> Appart from that, the patch looks good, so:
> 
> Reviewed-by: Greg Kurz 
> 
> With all the above points addressed, I could successfully migrate from an
> older QEMU and back, so:
> 
> Tested-by: Greg Kurz 
> 
> > +},
> >  
> >  static void spapr_machine_2_12_instance_options(MachineState *machine)
> >  {
> > diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> > index c0c44fb91d..8c9e03f54d 100644
> > --- a/target/ppc/cpu.h
> > +++ 

[Qemu-devel] [PATCH] tcg: Fix out-of-line generic vector compares

2018-04-05 Thread Richard Henderson
A mistake in the type passed to sizeof, that happens to work
when the out-of-line fallback itself is using host vectors,
but fails when using only the base types.

Reported-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime-gvec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 8bf8d63912..90340e56e0 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -705,7 +705,7 @@ void HELPER(NAME)(void *d, void *a, void *b, uint32_t desc) 
   \
 {  \
 intptr_t oprsz = simd_oprsz(desc); \
 intptr_t i;\
-for (i = 0; i < oprsz; i += sizeof(vec64)) {   \
+for (i = 0; i < oprsz; i += sizeof(TYPE)) {\
 *(TYPE *)(d + i) = DO_CMP0(*(TYPE *)(a + i) OP *(TYPE *)(b + i));  \
 }  \
 clear_high(d, oprsz, desc);\
-- 
2.14.3




Re: [Qemu-devel] [PULL 16/20] target/arm: Use vector infrastructure for aa64 compares

2018-04-05 Thread Richard Henderson
On 04/05/2018 03:06 PM, Thomas Huth wrote:
> On 05.04.2018 02:54, Richard Henderson wrote:
>> On 04/05/2018 10:07 AM, Richard Henderson wrote:
>>> On 04/05/2018 02:49 AM, Emilio G. Cota wrote:
 1. grab this binary:
   http://cs.columbia.edu/~cota/qemu/nbench-aarch64
 2. run it on a PowerPC host with:
   $ aarch64-linux-user/qemu-aarch64 nbench-aarch64 -V

 Note: the "-V" (or "-v") flag is important! Without it, there's no 
 segfault.
>>>
>>> How quickly?  I did not see one up until it exited for lack of NNET.DAT.
>>>
>>> I will note that I am using gcc 7.2 on gcc112 
>>> (/opt/cfarm/gcc-latest/bin/gcc).
>>>
>>> Are you using gcc 4.8.5, and is commit 74912f6dad in your tree?
>>> That commit might have made a difference...
>>
>> Bah.  I confirm that it doesn't, and that the test still fails.
>>
>> Since qemu does work on the same host when built with gcc 7.2, I can only 
>> blame
>> this failure on a compiler bug wrt gcc 4.8 on ppc64.
>>
>> I have not tracked down exactly what's going wrong, and probably won't unless
>> someone feels that it's worthwhile.  This only begs the question of whether 
>> we
>> should blacklist this compiler entirely.  Certainly it's old enough that it's
>> probably not worth fixing.
> 
> GCC 4.8 is still used in enterprise distros like RHEL7 and AFAIK also in
> SLES 12 ... so blacklisting this compiler does not sound like a good
> idea to me.

I have now tracked down this failure and it turns out not to be a compiler bug
at all; patch to follow.


r~




Re: [Qemu-devel] [PATCH 10/16] translate-all: use per-page locking in !user-mode

2018-04-05 Thread Emilio G. Cota
On Thu, Mar 29, 2018 at 15:55:13 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota  writes:
(snip)
> > +/* lock the page(s) of a TB in the correct acquisition order */
> > +static inline void page_lock_tb(const TranslationBlock *tb)
> > +{
> > +if (likely(tb->page_addr[1] == -1)) {
> > +page_lock(page_find(tb->page_addr[0] >> TARGET_PAGE_BITS));
> > +return;
> > +}
> > +if (tb->page_addr[0] < tb->page_addr[1]) {
> > +page_lock(page_find(tb->page_addr[0] >> TARGET_PAGE_BITS));
> > +page_lock(page_find(tb->page_addr[1] >> TARGET_PAGE_BITS));
> > +} else {
> > +page_lock(page_find(tb->page_addr[1] >> TARGET_PAGE_BITS));
> > +page_lock(page_find(tb->page_addr[0] >> TARGET_PAGE_BITS));
> > +}
> > +}
(snip)
> > +/*
> > + * Add the TB to the page list.
> > + * To avoid deadlock, acquire first the lock of the lower-addressed 
> > page.
> > + */
> > +p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
> > +if (likely(phys_page2 == -1)) {
> >  tb->page_addr[1] = -1;
> > +page_lock(p);
> > +tb_page_add(p, tb, 0, phys_pc & TARGET_PAGE_MASK);
> > +} else {
> > +p2 = page_find_alloc(phys_page2 >> TARGET_PAGE_BITS, 1);
> > +if (phys_pc < phys_page2) {
> > +page_lock(p);
> > +page_lock(p2);
> > +} else {
> > +page_lock(p2);
> > +page_lock(p);
> > +}
> 
> Give we repeat this check further up perhaps a:
> 
>   page_lock_pair(PageDesc *p1, th_page_addr_t phys1, PageDesc *p2,  
> tb_page_addr_t phys2)

After trying, I don't think it's worth the trouble.

Note that page_lock_tb expands to nothing in user-mode,
whereas the latter snippet is shared by user-mode and
!user-mode. Dealing with that gets ugly quickly;
besides, we'd have to optionally return *p1 and *p2,
plus choose whether to use page_find or page_find_alloc..

(snip)
> The diff is a little messy around tb_page_add but I think we need an
> assert_page_lock(p) which compiles to check mmap_lock in CONFIG_USER
> instead of the assert_memory_lock().
> 
> Then we can be clear about tb, memory and page locks.

I've added an extra patch to v2 to do this, since this patch is
already huge.

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v4 0/2] hw/scsi: support SCSI-2 passthrough without PI

2018-04-05 Thread Daniel Henrique Barboza

Hi,


On 04/05/2018 01:23 PM, Paolo Bonzini wrote:

This is my version of Daniel's patch.  In order to keep the
RD/WRPROTECT check for emulated SCSI disks, I've added a new property
to scsi-disk/hd/cd devices as well.  The property, similar to the
earlier versions posted by Daniel, is available to the user, but for
scsi-disk/hd/cd it affects the INQUIRY value returned by the emulated
INQUIRY command.

Daniel, can you please test it?


Emulated case worked as intended: I was able to set the scsi_version of 
the emulated device via command line.


One thing I noticed is that I am also able to set the scsi_version of 
scsi-block devices - if I set scsi_version=6 in a SCSI passthrough 
device that has version=2, the version that ends up being passed to the 
guest is 6. In a quick look I believe that this behavior is caused by 
setting


+    s->qdev.scsi_version = s->qdev.default_scsi_version;

in scsi_disk_reset, which is the reset function that is also used for 
scsi-block devices. So, setting scsi_version=6 in the command line 
overwrites the default_scsi_version = -1 that is being set in patch 2, 
making scsi_version =! -1 and bypassing the scsi_version assignment code.


A quick fix is to get rid of the scsi_version == -1 restriction to set 
the scsi_version. This is something that Fam proposed in one of the 
reviews. Given that a well behaved guest will not spam INQUIRY messages 
(causing the code to always recalculate the same version again), I am 
fine with it.


Another quick fix is to make a new scsi_block_reset function that calls 
scsi_disk_reset and  sets s->scsi_version = -1 right after.


Yet another fix is to fully prohibit the user to set scsi_version 
scsi-block and scsi-generic cases, returning an error message right off 
the start. Not sure how hard this would be - perhaps the above 
alternatives are cleaner.


Another fix is ... no fix. I am not sure how far the design philosophy 
of passthrough devices allows the user to overwrite device parameters in 
despite of their real values, but   what if the user wants to 
enforce a scsi_version when using a scsi-block device? The device will 
surely behave badly, but the user explicitly enforced it via command 
line, so perhaps let him/her have at it?



Thanks,


Daniel






Thanks,

Paolo

Daniel Henrique Barboza (1):
   hw/scsi: support SCSI-2 passthrough without PI

Paolo Bonzini (1):
   scsi-disk: allow customizing the SCSI version

  hw/scsi/scsi-disk.c| 29 -
  hw/scsi/scsi-generic.c | 48 +---
  include/hw/scsi/scsi.h |  2 ++
  3 files changed, 63 insertions(+), 16 deletions(-)






[Qemu-devel] [PATCH for-2.12] sam460ex: Fix timer frequency and clock multipliers

2018-04-05 Thread BALATON Zoltan
We only emulate timer running at CPU frequency which is what most
guests expect so set the frequency to match real hardware. This also
allows setting clock multipliers which caused slowdown previously due
to wrong timer frequency.

Signed-off-by: BALATON Zoltan 
---
 hw/ppc/ppc440_uc.c | 3 +--
 hw/ppc/sam460ex.c  | 7 ---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 976ab2b..e312fdb 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -392,8 +392,7 @@ static uint32_t dcr_read_sdr(void *opaque, int dcrn)
 case SDR0_CFGDATA:
 switch (sdr->addr) {
 case SDR0_STRP0:
-/* FIXME: Is this correct? This breaks timing in U-Boot */
-ret = 0; /*(0xb5 << 8) | (1 << 4) | 9 */
+ret = (0xb5 << 8) | (1 << 4) | 9;
 break;
 case SDR0_STRP1:
 ret = (5 << 29) | (2 << 26) | (1 << 24);
diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 70b8e76..dfff262 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -67,6 +67,7 @@
IRQ12 = SM502_INT
 */
 
+#define CPU_FREQ 115000
 #define SDRAM_NR_BANKS 4
 
 /* FIXME: See u-boot.git 8ac41e, also fix in ppc440_uc.c */
@@ -253,8 +254,8 @@ static int sam460ex_load_device_tree(hwaddr addr,
 char *filename;
 int fdt_size;
 void *fdt;
-uint32_t tb_freq = 5000;
-uint32_t clock_freq = 5000;
+uint32_t tb_freq = CPU_FREQ;
+uint32_t clock_freq = CPU_FREQ;
 
 filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, BINARY_DEVICE_TREE_FILE);
 if (!filename) {
@@ -416,7 +417,7 @@ static void sam460ex_init(MachineState *machine)
 boot_info = g_malloc0(sizeof(*boot_info));
 env->load_info = boot_info;
 
-ppc_booke_timers_init(cpu, 5000, 0);
+ppc_booke_timers_init(cpu, CPU_FREQ, 0);
 ppc_dcr_init(env, NULL, NULL);
 
 /* PLB arbitrer */
-- 
2.7.6




Re: [Qemu-devel] [PATCH 1/2] scsi-disk: allow customizing the SCSI version

2018-04-05 Thread Daniel Henrique Barboza



On 04/05/2018 01:23 PM, Paolo Bonzini wrote:

We would like to have different behavior for passthrough devices
depending on the SCSI version they expose.  To prepare for that,
allow the user of emulated devices to specify the desired SCSI
level, and adjust the emulation according to the property value.
The next patch will set the level for scsi-block and scsi-generic
devices.

Based on a patch by Daniel Henrique Barboza
.

Signed-off-by: Paolo Bonzini 
---


Reviewed-by: Daniel Henrique Barboza 
Tested-by: Daniel Henrique Barboza 



  hw/scsi/scsi-disk.c| 29 -
  hw/scsi/scsi-generic.c |  1 +
  include/hw/scsi/scsi.h |  2 ++
  3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f8ed8cf2b4..9400b97387 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -825,7 +825,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
uint8_t *outbuf)
   * block characteristics VPD page by default.  Not all of SPC-3
   * is actually implemented, but we're good enough.
   */
-outbuf[2] = 5;
+outbuf[2] = s->qdev.default_scsi_version;
  outbuf[3] = 2 | 0x10; /* Format 2, HiSup */

  if (buflen > 36) {
@@ -2193,7 +2193,11 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, 
uint8_t *buf)
  case READ_12:
  case READ_16:
  DPRINTF("Read (sector %" PRId64 ", count %u)\n", r->req.cmd.lba, len);
-if (r->req.cmd.buf[1] & 0xe0) {
+/* Protection information is not supported.  For SCSI versions 2 and
+ * older (as determined by snooping the guest's INQUIRY commands),
+ * there is no RD/WR/VRPROTECT, so skip this check in these versions.
+ */
+if (s->qdev.scsi_version > 2 && (r->req.cmd.buf[1] & 0xe0)) {
  goto illegal_request;
  }
  if (!check_lba_range(s, r->req.cmd.lba, len)) {
@@ -2224,7 +2228,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, 
uint8_t *buf)
   * As far as DMA is concerned, we can treat it the same as a write;
   * scsi_block_do_sgio will send VERIFY commands.
   */
-if (r->req.cmd.buf[1] & 0xe0) {
+if (s->qdev.scsi_version > 2 && (r->req.cmd.buf[1] & 0xe0)) {
  goto illegal_request;
  }
  if (!check_lba_range(s, r->req.cmd.lba, len)) {
@@ -2270,6 +2274,8 @@ static void scsi_disk_reset(DeviceState *dev)
  /* reset tray statuses */
  s->tray_locked = 0;
  s->tray_open = 0;
+
+s->qdev.scsi_version = s->qdev.default_scsi_version;
  }

  static void scsi_disk_resize_cb(void *opaque)
@@ -2814,6 +2820,8 @@ static bool scsi_block_is_passthrough(SCSIDiskState *s, 
uint8_t *buf)
  static int32_t scsi_block_dma_command(SCSIRequest *req, uint8_t *buf)
  {
  SCSIBlockReq *r = (SCSIBlockReq *)req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+
  r->cmd = req->cmd.buf[0];
  switch (r->cmd >> 5) {
  case 0:
@@ -2839,8 +2847,11 @@ static int32_t scsi_block_dma_command(SCSIRequest *req, 
uint8_t *buf)
  abort();
  }

-if (r->cdb1 & 0xe0) {
-/* Protection information is not supported.  */
+/* Protection information is not supported.  For SCSI versions 2 and
+ * older (as determined by snooping the guest's INQUIRY commands),
+ * there is no RD/WR/VRPROTECT, so skip this check in these versions.
+ */
+if (s->qdev.scsi_version > 2 && (req->cmd.buf[1] & 0xe0)) {
  scsi_check_condition(>req, SENSE_CODE(INVALID_FIELD));
  return 0;
  }
@@ -2952,6 +2963,8 @@ static Property scsi_hd_properties[] = {
  DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
 DEFAULT_MAX_IO_SIZE),
  DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0),
+DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
+  5),
  DEFINE_BLOCK_CHS_PROPERTIES(SCSIDiskState, qdev.conf),
  DEFINE_PROP_END_OF_LIST(),
  };
@@ -2997,6 +3010,8 @@ static Property scsi_cd_properties[] = {
  DEFINE_PROP_UINT16("port_index", SCSIDiskState, port_index, 0),
  DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
 DEFAULT_MAX_IO_SIZE),
+DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
+  5),
  DEFINE_PROP_END_OF_LIST(),
  };

@@ -3025,6 +3040,8 @@ static Property scsi_block_properties[] = {
  DEFINE_PROP_DRIVE("drive", SCSIDiskState, qdev.conf.blk),
  DEFINE_PROP_BOOL("share-rw", SCSIDiskState, qdev.conf.share_rw, false),
  DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0),
+DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
+  5),
  DEFINE_PROP_END_OF_LIST(),
  };

@@ -3065,6 

[Qemu-devel] [Bug 1740219] Re: static linux-user ARM emulation has several-second startup time

2018-04-05 Thread LukeShu
I'm not on a Debian/Ubuntu-ish system, but extracting

qemu-user-static_2.11+dfsg-1ubuntu6~ppa3_amd64.deb : data.tar.xz :
usr/bin/qemu-arm-static

and testing with that binary:

$ time usr/bin/qemu-arm-static 
/var/lib/archbuild/dbscripts@armv7h/luke/usr/bin/ldconfig --help
Usage: ldconfig [OPTION...]
  ...
.

real0m0.068s
user0m0.067s
sys 0m0.000s

That is: LGTM.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1740219

Title:
  static linux-user ARM emulation has several-second startup time

Status in QEMU:
  Fix Committed
Status in qemu package in Ubuntu:
  Triaged

Bug description:
  static linux-user emulation has several-second startup time

  My problem: I'm a Parabola packager, and I'm updating our
  qemu-user-static package from 2.8 to 2.11.  With my new
  statically-linked 2.11, running `qemu-arm /my/arm-chroot/bin/true`
  went from taking 0.006s to 3s!  This does not happen with the normal
  dynamically linked 2.11, or the old static 2.8.

  What happens is it gets stuck in
  `linux-user/elfload.c:init_guest_space()`.  What `init_guest_space`
  does is map 2 parts of the address space: `[base, base+guest_size]`
  and `[base+0x, base+0x+page_size]`; where it must find
  an acceptable `base`.  Its strategy is to `mmap(NULL, guest_size,
  ...)` decide where the first range is, and then check if that
  +0x is also available.  If it isn't, then it starts trying
  `mmap(base, ...)` for the entire address space from low-address to
  high-address.

  "Normally," it finds an accaptable `base` within the first 2 tries.
  With a static 2.11, it's taking thousands of tries.

  

  Now, from my understanding, there are 2 factors working together to
  cause that in static 2.11 but not the other builds:

   - 2.11 increased the default `guest_size` from 0xf700 to 0x
   - PIE (and thus ASLR) is disabled for static builds

  For some reason that I don't understand, with the smaller
  `guest_size` the initial `mmap(NULL, guest_size, ...)` usually
  returns an acceptable address range; but larger `guest_size` makes it
  consistently return a block of memory that butts right up against
  another already mapped chunk of memory.  This isn't just true on the
  older builds, it's true with the 2.11 builds if I use the `-R` flag to
  shrink the `guest_size` back down to 0xf700.  That is with
  linux-hardened 4.13.13 on x86-64.

  So then, it it falls back to crawling the entire address space; so it
  tries base=0x1000.  With ASLR, that probably succeeds.  But with
  ASLR being disabled on static builds, the text segment is at
  0x6000; which is does not leave room for the needed
  0x1000-size block before it.  So then it tries base=0x2000.
  And so on, more than 6000 times until it finally gets to and passes
  the text segment; calling mmap more than 12000 times.

  

  I'm not sure what the fix is.  Perhaps try to mmap a continuous chunk
  of size 0x1000, then munmap it and then mmap the 2 chunks that we
  actually need.  The disadvantage to that is that it does not support
  the sparse address space that the current algorithm supports for
  `guest_size < 0x`.  If `guest_size < 0x` *and* the big
  mmap fails, then it could fall back to a sparse search; though I'm not
  sure the current algorithm is a good choice for it, as we see in this
  bug.  Perhaps it should inspect /proc/self/maps to try to find a
  suitable range before ever calling mmap?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1740219/+subscriptions



Re: [Qemu-devel] [PULL 11/26] pci-bridge/i82801b11: clear bridge registers on platform reset

2018-04-05 Thread Michael Roth
Quoting Laszlo Ersek (2018-03-23 13:42:07)
> Michael, Peter,
> 
> On 02/08/18 20:09, Michael S. Tsirkin wrote:
> > From: Laszlo Ersek 
> > 
> > The "i82801b11-bridge" device model is a descendant of "base-pci-bridge"
> > (TYPE_PCI_BRIDGE). However, unlike other similar devices, such as
> > 
> > - pci-bridge,
> > - pcie-pci-bridge,
> > - PCIE Root Port,
> > - xio3130 switch upstream and downstream ports,
> > - dec-21154-p2p-bridge,
> > - pbm-bridge,
> > - xilinx-pcie-root,
> > 
> > "i82801b11-bridge" does not clear the bridge specific registers at
> > platform reset.
> > 
> > This is a problem because devices on "i82801b11-bridge" continue to
> > respond to config space cycles after platform reset, when addressed with
> > the bus number that was previously programmed into the secondary bus
> > number register of "i82801b11-bridge". This error breaks OVMF's search for
> > extra (PXB) root buses, for example.
> > 
> > The device class reset method for "i82801b11-bridge" is currently NULL;
> > set it directly to pci_bridge_reset(), like the last three bridge models
> > in the above listing do.
> > 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Marcel Apfelbaum 
> > Cc: qemu-sta...@nongnu.org
> > Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1541839
> > Signed-off-by: Laszlo Ersek 
> > Reviewed-by: Marcel Apfelbaum 
> > Reviewed-by: Michael S. Tsirkin 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  hw/pci-bridge/i82801b11.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/pci-bridge/i82801b11.c b/hw/pci-bridge/i82801b11.c
> > index cb522bf..ebf7f5f 100644
> > --- a/hw/pci-bridge/i82801b11.c
> > +++ b/hw/pci-bridge/i82801b11.c
> > @@ -98,6 +98,7 @@ static void i82801b11_bridge_class_init(ObjectClass 
> > *klass, void *data)
> >  k->realize = i82801b11_bridge_realize;
> >  k->config_write = pci_bridge_write_config;
> >  dc->vmsd = _bridge_dev_vmstate;
> > +dc->reset = pci_bridge_reset;
> >  set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
> >  }
> >  
> > 
> 
> this patch didn't get included in the 2.11.1 round-up:
> 
> http://mid.mail-archive.com/20180206191515.25830-1-mdroth@linux.vnet.ibm.com
> 
> (The patch was posted, CC stable, between Mike's round-up set and the
> actual 2.11.1 release.)

Looks like I had it tagged as pending but the PULL got held up for a while
and it didn't hit master till the day before release. I'm okay with
cherry-picking from a maintainer tree but generally only do it if I get
poked about it or it's on my radar for other reasons.

> 
> Can we please make sure that the commit doesn't miss the 2.11.2 bus?
> 2.12 is still a month out, and without this patch, rebooting an UEFI
> guest OS (with OVMF) hangs, on certain VM configs (see the commit message).
> 
> (Apologies if there is a "queue" or "next" branch for stable releases --
> in that case I should have probably checked that branch before sending
> this email.)

There's https://github.com/mdroth/qemu/commits/stable-2.11-staging but
as you can see it generally doesn't see much action in-between releases.

Will start queueing patches for 2.11.2 soon and make sure it gets in,
but probably looking at a similar release timeframe as 2.12.

Thanks for the heads up!

> 
> The commit hash on the master branch is
> ed247f40db84c8bd4bb7d10948702cd47cc4d5ae; it's part of v2.12.0-rc0.
> 
> Thanks!
> Laszlo
> 




Re: [Qemu-devel] -icount changes physical address assignments in QEMU 2.10/2.11

2018-04-05 Thread alarson
"Peter Maydell"  wrote on 04/05/2018 12:28:01 
PM:

> From: "Peter Maydell" 
> To: alar...@ddci.com
> Cc: "QEMU Developers" 
> Date: 04/05/2018 12:28 PM
> Subject: Re: [Qemu-devel] -icount changes physical address assignments 
in QEMU 2.10/2.11
> 
> On 5 April 2018 at 17:44,   wrote:
> > "Peter Maydell"  wrote on 04/05/2018 
09:05:53
> > AM:
> >> I've just tried your attached test image ...
> >
> > Curious.  I just downloaded qemu-2.12.0-rc2.tar.xz and built it using
> > Cygwin (a version from about a month ago) using mingw compilers
> > (mingw64-i686-gcc-g++ 6.4.0), and I still see the issue when the
> > resulting QEMU binary is run using -icount 2 against my test binary,
> > but not when run without -icount.  Here are the commands used:
> >
> > ../qemu-2.12.0-rc2/configure --python=/usr/bin/python \
> >'--with-pkgversion=DDCI QEMU 2.12.0-rc2' \
> >--prefix=/usr/local/qemu \
> >--enable-sdl --with-sdlabi=2.0 \
> >'--target-list=aarch64-softmmu ppc64-softmmu x86_64-softmmu' \
> >--cross-prefix=i686-w64-mingw32-
> > /usr/bin/make -Otarget -j 8
> >
> > Any suggestions of things to try?
> 
> Can you reproduce the problem on a Linux host? It would
> be interesting to identify if this is a Windows-host
> specific issue somehow.

Linux "works".  I installed ubuntu 17.10 in a VM on my windows box, 
recompiled QEMU 2.12.0-rc2 (same sources as above), using a configure 
line the same as above except omitting --cross-prefix and --with-sdlabi.
Both with "-icount 2" and without show expected results.

I installed a fresh Cygwin with just the packages suggested at
https://wiki.qemu.org/Hosts/W32#Native_builds_with_Mingw-w64 (plus
some obviously missing ones like python, make, etc.) and the problem 
persists.  The updated configure line is:

../qemu-2.12.0-rc2/configure \
'--with-pkgversion=DDCI QEMU 2.12.0-rc2' \
--prefix=/usr/local/qemu \
'--target-list=aarch64-softmmu ppc64-softmmu x86_64-softmmu' \
--cross-prefix=i686-w64-mingw32-



[Qemu-devel] [Bug 1441443] Re: Is there a way to create a 10G network interface for VMs in KVM2.0?

2018-04-05 Thread liang yan
Unless you are using SRIOV or DPDK which both need hardware support. If
could support SRIOV, then using IOMMU+VFIO, and pass-through to VM, this
will get a close number. Or DPDK, using a user-space driver + vhost-net,
will also get a pretty good value.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1441443

Title:
  Is there a way to create a 10G network interface for VMs in KVM2.0?

Status in QEMU:
  Incomplete

Bug description:

  We have installed & configured the KVM 2.0 (qemu-kvm 2.0.0+dfsg-
  2ubuntu1.10) on Ubuntu 14.04. The physical server is connected to 10G
  network, KVM is configured in Bridged mode But the issue is, when we
  create Network interface on VMs, we have only 1G device as an options
  for vmhosts. Is this the limit of the KVM or is there a way to create
  a 10G network interface for VMs? Available device models

  E1000
  Ne2k_pci
  Pcnet
  Rtl8139
  virtio

  Please find the network configuration details

  Source device : Host device vnet1 (Bridge ‘br0’)
  Device model : virtio 

  Network configuration in the host /etc/network/interfaces

  auto br0
  iface br0 inet static
  address 10.221.x.10
  netmask 255.255.255.0
  network 10.221.x.0
  broadcast 10.221.x.255
  gateway 10.221.x.1
  # dns-* options are implemented by the resolvconf package, if 
installed
  dns-nameservers X.X.X.X
  dns-search XXX.NET
  bridge_ports em1
  bridge_fd 0
  bridge_stp off
  bridge_maxwait 0

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1441443/+subscriptions



Re: [Qemu-devel] [RfC PATCH] Add udmabuf misc device

2018-04-05 Thread Daniel Vetter
Pulling this out of the shadows again.

We now also have xen-zcopy from Oleksandr and the hyper dmabuf stuff
from Matt and Dongwong.

At least from the intel side there seems to be the idea to just have 1
special device that can handle cross-gues/host sharing for all kinds
of hypervisors, so I guess you all need to work together :-)

Or we throw out the idea that hyper dmabuf will be cross-hypervisor
(not sure how useful/reasonable that is, someone please convince me
one way or the other).

Cheers, Daniel

On Wed, Mar 14, 2018 at 9:03 AM, Gerd Hoffmann  wrote:
>   Hi,
>
>> Either mlock account (because it's mlocked defacto), and get_user_pages
>> won't do that for you.
>>
>> Or you write the full-blown userptr implementation, including mmu_notifier
>> support (see i915 or amdgpu), but that also requires Christian Königs
>> latest ->invalidate_mapping RFC for dma-buf (since atm exporting userptr
>> buffers is a no-go).
>
> I guess I'll look at mlock accounting for starters then.  Easier for
> now, and leaves the door open to switch to userptr later as this should
> be transparent to userspace.
>
>> > Known issue:  Driver API isn't complete yet.  Need add some flags, for
>> > example to support read-only buffers.
>>
>> dma-buf has no concept of read-only. I don't think we can even enforce
>> that (not many iommus can enforce this iirc), so pretty much need to
>> require r/w memory.
>
> Ah, ok.  Just saw the 'write' arg for get_user_pages_fast and figured we
> might support that, but if iommus can't handle that anyway it's
> pointless indeed.
>
>> > Cc: David Airlie 
>> > Cc: Tomeu Vizoso 
>> > Signed-off-by: Gerd Hoffmann 
>>
>> btw there's also the hyperdmabuf stuff from the xen folks, but imo their
>> solution of forwarding the entire dma-buf api is over the top. This here
>> looks _much_ better, pls cc all the hyperdmabuf people on your next
>> version.
>
> Fun fact: googling for "hyperdmabuf" found me your mail and nothing else :-o
> (Trying "hyper dmabuf" instead worked better then).
>
> Yes, will cc them on the next version.  Not sure it'll help much on xen
> though due to the memory management being very different.  Basically xen
> owns the memory, not the kernel of the control domain (dom0), so
> creating dmabufs for guest memory chunks isn't that simple ...
>
> Also it's not clear whenever they really need guest -> guest exports or
> guest -> dom0 exports.
>
>> Overall I like the idea, but too lazy to review.
>
> Cool.  General comments on the idea was all I was looking for for the
> moment.  Spare yor review cycles for the next version ;)
>
>> Oh, some kselftests for this stuff would be lovely.
>
> I'll look into it.
>
> thanks,
>   Gerd
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



Re: [Qemu-devel] [PATCH] tcg: fix 16-byte vector operations detection

2018-04-05 Thread Paolo Bonzini
On 28/03/2018 15:31, Laurent Vivier wrote:
> configure tries to detect if the compiler
> supports 16-byte vector operations.
> 
> As stated in the comment of the detection
> program, there is a problem with the system
> compiler on GCC on Centos 7.
> 
> This program doesn't actually detect the problem
> with GCC on RHEL7 on PPC64LE (Red Hat 4.8.5-28).
> 
> This patch updates the test to look more like
> it is in QEMU helpers, and now detects the problem.
> 
> The error reported is:
> 
>   CC  ppc64-softmmu/accel/tcg/tcg-runtime-gvec.o
>   ..//accel/tcg/tcg-runtime-gvec.c: In function ‘helper_gvec_shl8i’:
>   ../accel/tcg/tcg-runtime-gvec.c:558:26: internal compiler error: in 
> emit_move_insn, at expr.c:3495
>*(vec8 *)(d + i) = *(vec8 *)(a + i) << shift;
> ^
> Fixes: db43267 "tcg: Add generic vector expanders"
> Signed-off-by: Laurent Vivier 
> ---
>  configure | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/configure b/configure
> index 4d0e92c96c..a2301dd0dc 100755
> --- a/configure
> +++ b/configure
> @@ -5054,6 +5054,14 @@ static S2 c2;
>  static S4 c4;
>  static S8 c8;
>  static int i;
> +void helper(void *d, void *a, int shift, int i);
> +void helper(void *d, void *a, int shift, int i)
> +{
> +  *(U1 *)(d + i) = *(U1 *)(a + i) << shift;
> +  *(U2 *)(d + i) = *(U2 *)(a + i) << shift;
> +  *(U4 *)(d + i) = *(U4 *)(a + i) << shift;
> +  *(U8 *)(d + i) = *(U8 *)(a + i) << shift;
> +}
>  int main(void)
>  {
>a1 += b1; a2 += b2; a4 += b4; a8 += b8;
> 

Queued, thanks.

Paolo



Re: [Qemu-devel] TCG icount interaction with timer deadlines

2018-04-05 Thread Paolo Bonzini


- Original Message -
> From: "Peter Maydell" 
> To: "Paolo Bonzini" 
> Cc: "QEMU Developers" , "Alex Bennée" 
> , "Richard Henderson"
> , "Emilio G. Cota" , "Pavel Dovgalyuk" 
> 
> Sent: Thursday, April 5, 2018 7:35:56 PM
> Subject: Re: TCG icount interaction with timer deadlines
> 
> On 5 April 2018 at 18:07, Paolo Bonzini  wrote:
> > On 05/04/2018 18:01, Peter Maydell wrote:
> >>  * however, if the guest reprograms the clock during the tcg_cpu_exec()
> >>run, we don't do anything to cause us to stop earlier
> >
> > Anything that does this from the vCPU thread should be between
> > gen_icount_start and gen_icount_end.  (In fact, it's the entire reason
> > why cpu_io_recompile exists).
> 
> Yes, and this does cause us to do a cpu_io_recompile, which
> rebuilds the TB and does a longjmp. However:
>  (1) that only takes us out to cpu_exec(), which will then
>  just go ahead and execute the next TB, whereas the
>  recalculation of deadlines happens at the next level out
>  in tcg_cpu_exec()
>  (2) the io_recompile happens *before* the guest writes to
>  the timer register that reprograms the deadline, so even
>  if we recomputed deadlines after this longjmp they wouldn't
>  be correct

Right - that part would be handled here:

void qemu_timer_notify_cb(void *opaque, QEMUClockType type)
{
if (!use_icount || type != QEMU_CLOCK_VIRTUAL) {
qemu_notify_event();
return;
}

if (!qemu_in_vcpu_thread() && first_cpu) {
/* qemu_cpu_kick is not enough to kick a halted CPU out of
 * qemu_tcg_wait_io_event.  async_run_on_cpu, instead,
 * causes cpu_thread_is_idle to return false.  This way,
 * handle_icount_deadline can run.
 */
async_run_on_cpu(first_cpu, do_nothing, RUN_ON_CPU_NULL);
}
}

(called by timerlist_notify, called in turn by timerlist_rearm)
but that second "if" is too restrictive.  Maybe just removing
the first arm is enough.  All this was broken by MTTCG.

Paolo



Re: [Qemu-devel] [PATCH v2] vhost: Allow adjoining regions

2018-04-05 Thread Alex Williamson
On Thu, 5 Apr 2018 20:46:55 +0100
"Dr. David Alan Gilbert"  wrote:

> * Alex Williamson (alex.william...@redhat.com) wrote:
> > On Fri, 23 Mar 2018 15:39:39 +
> > "Dr. David Alan Gilbert (git)"  wrote:
> >   
> > > From: "Dr. David Alan Gilbert" 
> > > 
> > > My rework of section adding combines overlapping or adjoining regions,
> > > but checks they're actually the same underlying RAM block.
> > > Fix the case where two blocks adjoin but don't overlap; that new region
> > > should get added (but not combined), but my previous patch was 
> > > disallowing it.
> > > 
> > > Fixes: c1ece84e7c9
> > > 
> > > Reported-by: Alex Williamson 
> > > Signed-off-by: Dr. David Alan Gilbert 
> > > ---  
> > 
> > 
> > Ping?  This error report is a regression and should be considered for
> > 2.12.
> > 
> > Tested-by: Alex Williamson   
> 
> It's a change in vhost - so mst is taking it ?

That's my assumption, sorry, I should have set to/cc explicitly rather
than just replying.  Thanks,

Alex

> > >  hw/virtio/vhost.c | 13 +
> > >  1 file changed, 9 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index 250f886acb..2a720cca3d 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -595,10 +595,15 @@ static void vhost_region_add_section(struct 
> > > vhost_dev *dev,
> > >  
> > > prev_sec->offset_within_address_space,
> > >  prev_sec->offset_within_region);
> > >  } else {
> > > -error_report("%s: Overlapping but not coherent sections "
> > > - "at %"PRIx64,
> > > - __func__, mrs_gpa);
> > > -return;
> > > +/* adjoining regions are fine, but overlapping ones with
> > > + * different blocks/offsets shouldn't happen
> > > + */
> > > +if (mrs_gpa != prev_gpa_end + 1) {
> > > +error_report("%s: Overlapping but not coherent 
> > > sections "
> > > + "at %"PRIx64,
> > > + __func__, mrs_gpa);
> > > +return;
> > > +}
> > >  }
> > >  }
> > >  }  
> >   
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [Qemu-devel] [PATCH v2] vhost: Allow adjoining regions

2018-04-05 Thread Dr. David Alan Gilbert
* Alex Williamson (alex.william...@redhat.com) wrote:
> On Fri, 23 Mar 2018 15:39:39 +
> "Dr. David Alan Gilbert (git)"  wrote:
> 
> > From: "Dr. David Alan Gilbert" 
> > 
> > My rework of section adding combines overlapping or adjoining regions,
> > but checks they're actually the same underlying RAM block.
> > Fix the case where two blocks adjoin but don't overlap; that new region
> > should get added (but not combined), but my previous patch was disallowing 
> > it.
> > 
> > Fixes: c1ece84e7c9
> > 
> > Reported-by: Alex Williamson 
> > Signed-off-by: Dr. David Alan Gilbert 
> > ---
> 
> 
> Ping?  This error report is a regression and should be considered for
> 2.12.
> 
> Tested-by: Alex Williamson 

It's a change in vhost - so mst is taking it ?

Dave

> 
> >  hw/virtio/vhost.c | 13 +
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > index 250f886acb..2a720cca3d 100644
> > --- a/hw/virtio/vhost.c
> > +++ b/hw/virtio/vhost.c
> > @@ -595,10 +595,15 @@ static void vhost_region_add_section(struct vhost_dev 
> > *dev,
> >  
> > prev_sec->offset_within_address_space,
> >  prev_sec->offset_within_region);
> >  } else {
> > -error_report("%s: Overlapping but not coherent sections "
> > - "at %"PRIx64,
> > - __func__, mrs_gpa);
> > -return;
> > +/* adjoining regions are fine, but overlapping ones with
> > + * different blocks/offsets shouldn't happen
> > + */
> > +if (mrs_gpa != prev_gpa_end + 1) {
> > +error_report("%s: Overlapping but not coherent 
> > sections "
> > + "at %"PRIx64,
> > + __func__, mrs_gpa);
> > +return;
> > +}
> >  }
> >  }
> >  }
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



[Qemu-devel] [PULL for-2.12 1/1] vfio: Use a trace point when a RAM section cannot be DMA mapped

2018-04-05 Thread Alex Williamson
From: Eric Auger 

Commit 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions")
added an error message if a passed memory section address or size
is not aligned to the page size and thus cannot be DMA mapped.

This patch fixes the trace by printing the region name and the
memory region section offset within the address space (instead of
offset_within_region).

We also turn the error_report into a trace event. Indeed, In some
cases, the traces can be confusing to non expert end-users and
let think the use case does not work (whereas it works as before).

This is the case where a BAR is successively mapped at different
GPAs and its sections are not compatible with dma map. The listener
is called several times and traces are issued for each intermediate
mapping.  The end-user cannot easily match those GPAs against the
final GPA output by lscpi. So let's keep those information to
informed users. In mid term, the plan is to advise the user about
BAR relocation relevance.

Fixes: 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions")
Signed-off-by: Eric Auger 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alexey Kardashevskiy 
Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   11 +--
 hw/vfio/trace-events |1 +
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5e8471621800..07ffa0ba1062 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -548,12 +548,11 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
 
 if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
-error_report("Region 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx
- " is not aligned to 0x%"HWADDR_PRIx
- " and cannot be mapped for DMA",
- section->offset_within_region,
- int128_getlo(section->size),
- pgmask + 1);
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+pgmask + 1);
 return;
 }
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 79f63a2ff6f7..20109cb7581f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -90,6 +90,7 @@ vfio_iommu_map_notify(const char *op, uint64_t iova_start, 
uint64_t iova_end) "i
 vfio_listener_region_add_skip(uint64_t start, uint64_t end) "SKIPPING 
region_add 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_add_iommu(uint64_t start, uint64_t end) "region_add 
[iommu] 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_add_ram(uint64_t iova_start, uint64_t iova_end, void 
*vaddr) "region_add [ram] 0x%"PRIx64" - 0x%"PRIx64" [%p]"
+vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t 
size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not 
aligned to 0x%"PRIx64" and cannot be mapped for DMA"
 vfio_listener_region_del_skip(uint64_t start, uint64_t end) "SKIPPING 
region_del 0x%"PRIx64" - 0x%"PRIx64
 vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" 
- 0x%"PRIx64
 vfio_disconnect_container(int fd) "close container->fd=%d"




[Qemu-devel] [PULL for-2.12 0/1] vfio fix

2018-04-05 Thread Alex Williamson
The following changes since commit 0e87fdc966d05f4e5ad868034fcd8ee2a08ca62d:

  Update version for v2.12.0-rc2 release (2018-04-04 20:37:20 +0100)

are available in the Git repository at:

  git://github.com/awilliam/qemu-vfio.git tags/vfio-fixes-20180405.0

for you to fetch changes up to 5c08600547c059e3fd072995f9f367cdaf3c7d9d:

  vfio: Use a trace point when a RAM section cannot be DMA mapped (2018-04-05 
10:48:52 -0600)


VFIO fixes 2018-04-05

 - Convert over zealous error_report to trace point and make more
   meaningful (Eric Auger)


Eric Auger (1):
  vfio: Use a trace point when a RAM section cannot be DMA mapped

 hw/vfio/common.c | 11 +--
 hw/vfio/trace-events |  1 +
 2 files changed, 6 insertions(+), 6 deletions(-)



Re: [Qemu-devel] [PATCH v2] vhost: Allow adjoining regions

2018-04-05 Thread Alex Williamson
On Fri, 23 Mar 2018 15:39:39 +
"Dr. David Alan Gilbert (git)"  wrote:

> From: "Dr. David Alan Gilbert" 
> 
> My rework of section adding combines overlapping or adjoining regions,
> but checks they're actually the same underlying RAM block.
> Fix the case where two blocks adjoin but don't overlap; that new region
> should get added (but not combined), but my previous patch was disallowing it.
> 
> Fixes: c1ece84e7c9
> 
> Reported-by: Alex Williamson 
> Signed-off-by: Dr. David Alan Gilbert 
> ---


Ping?  This error report is a regression and should be considered for
2.12.

Tested-by: Alex Williamson 


>  hw/virtio/vhost.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 250f886acb..2a720cca3d 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -595,10 +595,15 @@ static void vhost_region_add_section(struct vhost_dev 
> *dev,
>  
> prev_sec->offset_within_address_space,
>  prev_sec->offset_within_region);
>  } else {
> -error_report("%s: Overlapping but not coherent sections "
> - "at %"PRIx64,
> - __func__, mrs_gpa);
> -return;
> +/* adjoining regions are fine, but overlapping ones with
> + * different blocks/offsets shouldn't happen
> + */
> +if (mrs_gpa != prev_gpa_end + 1) {
> +error_report("%s: Overlapping but not coherent sections "
> + "at %"PRIx64,
> + __func__, mrs_gpa);
> +return;
> +}
>  }
>  }
>  }




[Qemu-devel] [PATCH v2] e1000e: Prevent MSI/MSI-X storms

2018-04-05 Thread Jan Kiszka
From: Jan Kiszka 

Only signal MSI/MSI-X events on rising edges. So far we re-triggered the
interrupt sources even if the guest did no consumed the pending one,
easily causing interrupt storms.

Issue was observable with Linux 4.16 e1000e driver when MSI-X was used.
Vector 2 was causing interrupt storms after the driver activated the
device.

Signed-off-by: Jan Kiszka 
---

Changes in v2:
 - also update msi_causes_pending after EIAC changes (required because
   there is no e1000e_update_interrupt_state after that

 hw/net/e1000e_core.c | 11 +++
 hw/net/e1000e_core.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index c93c4661ed..d6ddd59986 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2027,6 +2027,7 @@ e1000e_msix_notify_one(E1000ECore *core, uint32_t cause, 
uint32_t int_cfg)
 }
 
 core->mac[ICR] &= ~effective_eiac;
+core->msi_causes_pending &= ~effective_eiac;
 
 if (!(core->mac[CTRL_EXT] & E1000_CTRL_EXT_IAME)) {
 core->mac[IMS] &= ~effective_eiac;
@@ -2123,6 +2124,13 @@ e1000e_send_msi(E1000ECore *core, bool msix)
 {
 uint32_t causes = core->mac[ICR] & core->mac[IMS] & ~E1000_ICR_ASSERTED;
 
+core->msi_causes_pending &= causes;
+causes ^= core->msi_causes_pending;
+if (causes == 0) {
+return;
+}
+core->msi_causes_pending |= causes;
+
 if (msix) {
 e1000e_msix_notify(core, causes);
 } else {
@@ -2160,6 +2168,9 @@ e1000e_update_interrupt_state(E1000ECore *core)
 core->mac[ICS] = core->mac[ICR];
 
 interrupts_pending = (core->mac[IMS] & core->mac[ICR]) ? true : false;
+if (!interrupts_pending) {
+core->msi_causes_pending = 0;
+}
 
 trace_e1000e_irq_pending_interrupts(core->mac[ICR] & core->mac[IMS],
 core->mac[ICR], core->mac[IMS]);
diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index 7d8ff41890..63a15510cc 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -109,6 +109,8 @@ struct E1000Core {
 NICState *owner_nic;
 PCIDevice *owner;
 void (*owner_start_recv)(PCIDevice *d);
+
+uint32_t msi_causes_pending;
 };
 
 void
-- 
2.13.6



Re: [Qemu-devel] TCG icount interaction with timer deadlines

2018-04-05 Thread Peter Maydell
On 5 April 2018 at 18:07, Paolo Bonzini  wrote:
> On 05/04/2018 18:01, Peter Maydell wrote:
>>  * however, if the guest reprograms the clock during the tcg_cpu_exec()
>>run, we don't do anything to cause us to stop earlier
>
> Anything that does this from the vCPU thread should be between
> gen_icount_start and gen_icount_end.  (In fact, it's the entire reason
> why cpu_io_recompile exists).

Yes, and this does cause us to do a cpu_io_recompile, which
rebuilds the TB and does a longjmp. However:
 (1) that only takes us out to cpu_exec(), which will then
 just go ahead and execute the next TB, whereas the
 recalculation of deadlines happens at the next level out
 in tcg_cpu_exec()
 (2) the io_recompile happens *before* the guest writes to
 the timer register that reprograms the deadline, so even
 if we recomputed deadlines after this longjmp they wouldn't
 be correct

thanks
-- PMM



Re: [Qemu-devel] [Qemu-arm] [PATCH for-2.12] target/arm: Report unsupported MPU region sizes more clearly

2018-04-05 Thread Philippe Mathieu-Daudé
On 04/05/2018 02:25 PM, Peter Maydell wrote:
> Currently our PMSAv7 and ARMv7M MPU implementation cannot handle
> MPU region sizes smaller than our TARGET_PAGE_SIZE. However we
> report that in a slightly confusing way:
> 
>  DRSR[3]: No support for MPU (sub)region alignment of 9 bits. Minimum is 10
> 
> The problem is not the alignment of the region, but its size;
> tweak the error message to say so:
>  DRSR[3]: No support for MPU (sub)region size of 512 bytes. Minimum is 1024.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
> This doesn't really need to go into 2.12, but as a cleanup to a warning
> message it's pretty harmless.
> 
>  target/arm/helper.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index dcb8476d9e..b14fdab140 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -9625,9 +9625,9 @@ static bool get_phys_addr_pmsav7(CPUARMState *env, 
> uint32_t address,
>  }
>  if (rsize < TARGET_PAGE_BITS) {
>  qemu_log_mask(LOG_UNIMP,
> -  "DRSR[%d]: No support for MPU (sub)region "
> -  "alignment of %" PRIu32 " bits. Minimum is 
> %d\n",
> -  n, rsize, TARGET_PAGE_BITS);
> +  "DRSR[%d]: No support for MPU (sub)region size 
> of"
> +  " %" PRIu32 " bytes. Minimum is %d.\n",
> +  n, (1 << rsize), TARGET_PAGE_SIZE);
>  continue;
>  }
>  if (srdis) {
> 



Re: [Qemu-devel] [PATCHv5] dma/i82374: avoid double creation of i82374 device

2018-04-05 Thread Philippe Mathieu-Daudé
Hi Paolo,

On 04/05/2018 02:02 PM, Paolo Bonzini wrote:
> On 26/03/2018 17:05, Eduardo Otubo wrote:
>> QEMU fails when used with the following command line:
>>
>> ./ppc64-softmmu/qemu-system-ppc64 -S -machine 40p,accel=tcg -device 
>> i82374
>> qemu-system-ppc64: hw/isa/isa-bus.c:110: isa_bus_dma: Assertion 
>> `!bus->dma[0] && !bus->dma[1]' failed.
>> Aborted (core dumped)
>>
>> The 40p machine type already creates the device i82374. If specified in
>> the command line, it will try to create it again, hence generating the
>> error. The function isa_bus_dma() isn't supposed to be called twice for
>> the same bus. This patch fixes this issue by propagating back the error
>> so QEMU can fail nicely without Abort or core dump.
>>
>> Signed-off-by: Eduardo Otubo 
> 
> Queued, thanks.

The fix is not this patch but:
http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg06678.html

with
http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg06844.html
Reviewed-by: Thomas Huth 
Tested-by: Thomas Huth 

and
http://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg06827.html
Reviewed-by: Eduardo Otubo 

Regards,

Phil.



Re: [Qemu-devel] -icount changes physical address assignments in QEMU 2.10/2.11

2018-04-05 Thread Peter Maydell
On 5 April 2018 at 17:44,   wrote:
> "Peter Maydell"  wrote on 04/05/2018 09:05:53
> AM:
>> I've just tried your attached test image ...
>
> Curious.  I just downloaded qemu-2.12.0-rc2.tar.xz and built it using
> Cygwin (a version from about a month ago) using mingw compilers
> (mingw64-i686-gcc-g++ 6.4.0), and I still see the issue when the
> resulting QEMU binary is run using -icount 2 against my test binary,
> but not when run without -icount.  Here are the commands used:
>
> ../qemu-2.12.0-rc2/configure --python=/usr/bin/python \
>'--with-pkgversion=DDCI QEMU 2.12.0-rc2' \
>--prefix=/usr/local/qemu \
>--enable-sdl --with-sdlabi=2.0 \
>'--target-list=aarch64-softmmu ppc64-softmmu x86_64-softmmu' \
>--cross-prefix=i686-w64-mingw32-
> /usr/bin/make -Otarget -j 8
>
> Any suggestions of things to try?

Can you reproduce the problem on a Linux host? It would
be interesting to identify if this is a Windows-host
specific issue somehow.

thanks
-- PMM



[Qemu-devel] [PATCH for-2.12] target/arm: Report unsupported MPU region sizes more clearly

2018-04-05 Thread Peter Maydell
Currently our PMSAv7 and ARMv7M MPU implementation cannot handle
MPU region sizes smaller than our TARGET_PAGE_SIZE. However we
report that in a slightly confusing way:

 DRSR[3]: No support for MPU (sub)region alignment of 9 bits. Minimum is 10

The problem is not the alignment of the region, but its size;
tweak the error message to say so:
 DRSR[3]: No support for MPU (sub)region size of 512 bytes. Minimum is 1024.

Signed-off-by: Peter Maydell 
---
This doesn't really need to go into 2.12, but as a cleanup to a warning
message it's pretty harmless.

 target/arm/helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index dcb8476d9e..b14fdab140 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9625,9 +9625,9 @@ static bool get_phys_addr_pmsav7(CPUARMState *env, 
uint32_t address,
 }
 if (rsize < TARGET_PAGE_BITS) {
 qemu_log_mask(LOG_UNIMP,
-  "DRSR[%d]: No support for MPU (sub)region "
-  "alignment of %" PRIu32 " bits. Minimum is %d\n",
-  n, rsize, TARGET_PAGE_BITS);
+  "DRSR[%d]: No support for MPU (sub)region size 
of"
+  " %" PRIu32 " bytes. Minimum is %d.\n",
+  n, (1 << rsize), TARGET_PAGE_SIZE);
 continue;
 }
 if (srdis) {
-- 
2.16.2




Re: [Qemu-devel] [PATCH 04/16] tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx

2018-04-05 Thread Emilio G. Cota
On Thu, Mar 29, 2018 at 11:06:07 +0100, Alex Bennée wrote:
> Emilio G. Cota  writes:
(snip)
> > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> > index 3a51d49..20ad3fc 100644
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1072,7 +1072,8 @@ void tb_phys_invalidate(TranslationBlock *tb, 
> > tb_page_addr_t page_addr)
> >  /* suppress any remaining jumps to this TB */
> >  tb_jmp_unlink(tb);
> >
> > -tb_ctx.tb_phys_invalidate_count++;
> > +atomic_set(_ctx->tb_phys_invalidate_count,
> > +   tcg_ctx->tb_phys_invalidate_count + 1);
> 
> We do have an atomic_inc helper for this or we need comment that we only
> have atomic_reads() to worry about hence no races.
> 
> Otherwise:
> 
> Reviewed-by: Alex Bennée 

tcg_ctx is per-thread (or otherwise tb_lock is held here), so
yes we only worry here about concurrent atomic_reads. This
is common enough not to need a comment, me thinks.
[we have quite a few more instances of atomic_set(foo, foo + 1),
for instance when incrementing TCGProfile counts.]

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v3 6/7] s390x/kvm: handle AP instruction interception

2018-04-05 Thread Halil Pasic


On 04/05/2018 06:38 PM, Tony Krowiak wrote:
>> Hard to really give good advice without access to the documentation, but:
>> - If we tell the guest that the feature is available, but it does not
>>    get any cards to use, returning an empty matrix makes the most sense
>>    to me.
>> - I would not tie starting the guest to the presence of a vfio-ap
>>    device. Having the feature available in theory but without any
>>    devices actually being usable by the guest does not really sound
>>    wrong (can we hotplug this later?)
> For this phase of development, it is my opinion that introducing AP 
> instruction
> interception handlers is superfluous for the following reasons:
> 
> 1. Interception handling was introduced solely to ensure an operation 
> exception would
>    not be injected into the guest when CPU model feature for AP (i.e., ap=on)
>    is specified but a VFIO AP device (i.e., -device vfio-ap,sysfsdev=$path)
>    is not.
> 
> 2. The implementation of guest dedicated crypto adapters uses AP instruction
>    interpretation to virtualize AP devices for a guest. As such, the NQAP,
>    DQAP and most variants of the PQAP instructions will not be
>    intercepted.
> 
> 3. Hot plugging AP devices is not being supported for this phase of 
> development.
> 
> It is my opinion that introducing these interception handlers at this time is
> unnecessary and risks opening a can of worms that would be
> better dealt with in a subsequent phase. For that reason and the reasons 
> stated
> above, I think the better option is to terminate starting the guest if the
> CPU model feature for AP is enabled but an AP device is not defined for the
> guest. This restriction, of course, will be removed when hot plug is 
> implemented
> in a subsequent development phase.

I second that! I agree that having ap instructions but not having the
possibility to actually do AP crypto is probably not what the user wants.
Preventing such a guest form starting (with a nice message) sounds reasonable
to me.

I agree with Connie, the approach 'hold the line' (until future hotplugs)
is the most reasonable thing to do *in the long run*. But I think it's better
to limit ourselves to the simplest case for now, I don't see any problems
with doing the hotplug support later.

Regards,
Halil



Re: [Qemu-devel] [PATCH 02/16] qht: return existing entry when qht_insert fails

2018-04-05 Thread Emilio G. Cota
On Wed, Mar 28, 2018 at 17:33:09 +0100, Alex Bennée wrote:
> Emilio G. Cota  writes:
> > -bool qht_insert(struct qht *ht, void *p, uint32_t hash);
> > +void *qht_insert(struct qht *ht, void *p, uint32_t hash);
> 
> Hmm this seems needlessly counter intuitive. I realise the potential
> efficiency in overloading success/fail but wouldn't a:
> 
>   bool qht_insert(struct qht *ht, void *p, uint32_t hash, void **existing);
> 
> be conceptually nicer?

Good point, fixed in v2.

Thanks,

E.



Re: [Qemu-devel] TCG icount interaction with timer deadlines

2018-04-05 Thread Paolo Bonzini
On 05/04/2018 18:01, Peter Maydell wrote:
>  * however, if the guest reprograms the clock during the tcg_cpu_exec()
>run, we don't do anything to cause us to stop earlier

Anything that does this from the vCPU thread should be between
gen_icount_start and gen_icount_end.  (In fact, it's the entire reason
why cpu_io_recompile exists).

For completeness, if the I/O thread does it, it should (in icount mode
only) kick the running VCPU.

Am I missing something?

Paolo



[Qemu-devel] [RFC][BROKEN] rbd: Allow configuration of authentication scheme

2018-04-05 Thread Kevin Wolf
The legacy command line syntax supports a "password-secret" option that
allows to pass an authentication key to Ceph. This was not supported in
QMP so far.

This patch introduces authentication options in the QAPI schema, makes
them do the corresponding rados_conf_set() calls and adds compatibility
code that translates the old "password-secret" option both for opening
and creating images to the new set of options.

Note that the old option didn't allow to explicitly specify the set of
allowed authentication schemes. The compatibility code assumes that if
"password-secret" is given, only the cephx scheme is allowed. If it's
missing, both none and cephx are allowed because the configuration file
could still provide a key.

Signed-off-by: Kevin Wolf 
---

This doesn't actually work correctly yet because the way that options
are passed through the block layer (QAPI -> QemuOpts -> QDict). Before
we fix or hack around this, let's make sure this is the schema that we
want.

The two known problems are:

1. QDict *options in qemu_rbd_open() may contain options either in their
   proper QObject types (if passed from blockdev-add) or as strings
   (-drive). Both forms can be mixed in the same options QDict.

   rbd uses the keyval visitor to convert the options into a QAPI
   object. This means that it requires all options to be strings. This
   patch, however, introduces a bool property, so the visitor breaks
   when it gets its input from blockdev-add.

   Options to hack around the problem:

   a. Do an extra conversion step or two and go through QemuOpts like
  some other block drivers. When I offered something like this to
  Markus a while ago in a similar case, he rejected the idea.

   b. Introduce a qdict_stringify_entries() that just does what its name
  says. It would be called before the running keyval visitor so that
  only strings will be present in the QDict.

   c. Do a local one-off hack that checks if the entry with the key
  "auth-none" is a QBool, and if so, overwrite it with a string. The
  problem will reappear with the next non-string option.

   (d. Get rid of the QDict detour and work only with QAPI objects
   everywhere. Support rbd authentication only in QEMU 4.0.)

2. The proposed schema allows 'auth-cephx': {} as a valid option with
   the meaning that the cephx authentication scheme is enabled, but no
   key is given (e.g. it is taken from the config file).

   However, an empty dict cannot currently be represented by flattened
   QDicts. We need to find a way to enable this. I think this will be
   externally visible because it directly translates into the dotted
   syntax of -blockdev, so we may want to be careful.

Any thoughts on the proposed QAPI schema or the two implementation
problems are welcome.

 qapi/block-core.json |  22 +++
 block/rbd.c  | 102 ++-
 2 files changed, 99 insertions(+), 25 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c50517bff3..d5ce588add 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3170,6 +3170,19 @@
 
 
 ##
+# @RbdAuthCephx:
+#
+# @key-secret: ID of a QCryptoSecret object providing a key for cephx
+#  authentication. If not specified, a key from the
+#  specified configuration file, or the system default
+#  configuration is used, if present.
+#
+# Since: 2.13
+##
+{ 'struct': 'RbdAuthCephx',
+  'data': { '*key-secret': 'str' } }
+
+##
 # @BlockdevOptionsRbd:
 #
 # @pool:   Ceph pool name.
@@ -3184,6 +3197,13 @@
 #
 # @user:   Ceph id name.
 #
+# @auth-none:  true if connecting to a server without authentication
+#  should be allowed (default: false; since 2.13)
+#
+# @auth-cephx: Configuration for cephx authentication if specified. If
+#  not specified, cephx authentication is not allowed.
+#  (since 2.13)
+#
 # @server: Monitor host address and port.  This maps
 #  to the "mon_host" Ceph option.
 #
@@ -3195,6 +3215,8 @@
 '*conf': 'str',
 '*snapshot': 'str',
 '*user': 'str',
+'*auth-none': 'bool',
+'*auth-cephx': 'RbdAuthCephx',
 '*server': ['InetSocketAddressBase'] } }
 
 ##
diff --git a/block/rbd.c b/block/rbd.c
index 5b64849dc6..c2a9a92dd5 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -105,8 +105,7 @@ typedef struct BDRVRBDState {
 
 static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
 BlockdevOptionsRbd *opts, bool cache,
-const char *keypairs, const char *secretid,
-Error **errp);
+const char *keypairs, Error **errp);
 
 static char *qemu_rbd_next_tok(char *src, char delim, char **p)
 {
@@ -231,21 

Re: [Qemu-devel] [PATCH v3 1/1] hw/scsi: support SCSI-2 passthrough without PI

2018-04-05 Thread Paolo Bonzini
On 05/04/2018 18:09, Daniel Henrique Barboza wrote:
>>>
>> This also has to check for "s->qdev.scsi_version != -1" so that the
>> behavior of emulated SCSI isn't changed (they claim SPC-3).  I made this
>> change and queued the patch.
> 
> Good catch.

Since there were some more changes for emulated devices, I sent out a
v4.  Thanks,

Paolo



Re: [Qemu-devel] [PATCHv5] dma/i82374: avoid double creation of i82374 device

2018-04-05 Thread Paolo Bonzini
On 26/03/2018 17:05, Eduardo Otubo wrote:
> QEMU fails when used with the following command line:
> 
> ./ppc64-softmmu/qemu-system-ppc64 -S -machine 40p,accel=tcg -device i82374
> qemu-system-ppc64: hw/isa/isa-bus.c:110: isa_bus_dma: Assertion 
> `!bus->dma[0] && !bus->dma[1]' failed.
> Aborted (core dumped)
> 
> The 40p machine type already creates the device i82374. If specified in
> the command line, it will try to create it again, hence generating the
> error. The function isa_bus_dma() isn't supposed to be called twice for
> the same bus. This patch fixes this issue by propagating back the error
> so QEMU can fail nicely without Abort or core dump.
> 
> Signed-off-by: Eduardo Otubo 

Queued, thanks.

> ---
> v5:
>  * Remove qdev_cleanup_nofail() and call object_property_set_bool() and
>object_unparent() directly.
>  * Fix wrong usage of local and global error variables
> 
> v4:
>  * Change return value from int8_t to int
>  * Changed function calling for other architectures.
> 
> v3:
>  * Removed all unecessary local_err
>  * Change return of isa_bus_dma() and DMA_init() from void to int8_t,
>returning -EBUSY on error and 0 on success
>  * Added qdev_cleanup_nofail() in case isa_bus_dma() returns error. The
>cleanup looks safe, but please review if I didn't miss any detail
> 
> v2:
>  * Removed user_creatable=false and replaced by error handling using
>Error **errp and error_propagate();
> 
>  hw/dma/i82374.c |  8 +++-
>  hw/dma/i8257.c  | 37 +
>  hw/i386/pc.c|  2 +-
>  hw/isa/isa-bus.c|  8 ++--
>  hw/mips/mips_fulong2e.c |  2 +-
>  hw/mips/mips_jazz.c |  2 +-
>  hw/mips/mips_malta.c|  2 +-
>  include/hw/dma/i8257.h  |  2 +-
>  include/hw/isa/isa.h|  2 +-
>  9 files changed, 40 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/dma/i82374.c b/hw/dma/i82374.c
> index 83c87d92e0..08c06020bf 100644
> --- a/hw/dma/i82374.c
> +++ b/hw/dma/i82374.c
> @@ -25,6 +25,7 @@
>  #include "qemu/osdep.h"
>  #include "hw/isa/isa.h"
>  #include "hw/dma/i8257.h"
> +#include "qapi/error.h"
>  
>  #define TYPE_I82374 "i82374"
>  #define I82374(obj) OBJECT_CHECK(I82374State, (obj), TYPE_I82374)
> @@ -118,13 +119,18 @@ static const MemoryRegionPortio i82374_portio_list[] = {
>  static void i82374_realize(DeviceState *dev, Error **errp)
>  {
>  I82374State *s = I82374(dev);
> +Error *local_err = NULL;
> +i8257_dma_init(isa_bus_from_device(ISA_DEVICE(dev)), true, _err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
>  
>  portio_list_init(>port_list, OBJECT(s), i82374_portio_list, s,
>   "i82374");
>  portio_list_add(>port_list, isa_address_space_io(>parent_obj),
>  s->iobase);
>  
> -i8257_dma_init(isa_bus_from_device(ISA_DEVICE(dev)), true);
>  memset(s->commands, 0, sizeof(s->commands));
>  }
>  
> diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
> index 52675e97c9..e8155cf608 100644
> --- a/hw/dma/i8257.c
> +++ b/hw/dma/i8257.c
> @@ -622,26 +622,31 @@ static void i8257_register_types(void)
>  
>  type_init(i8257_register_types)
>  
> -void i8257_dma_init(ISABus *bus, bool high_page_enable)
> +void i8257_dma_init(ISABus *bus, bool high_page_enable, Error **errp)
>  {
>  ISADevice *isa1, *isa2;
> -DeviceState *d;
> +DeviceState *d1, *d2;
>  
>  isa1 = isa_create(bus, TYPE_I8257);
> -d = DEVICE(isa1);
> -qdev_prop_set_int32(d, "base", 0x00);
> -qdev_prop_set_int32(d, "page-base", 0x80);
> -qdev_prop_set_int32(d, "pageh-base", high_page_enable ? 0x480 : -1);
> -qdev_prop_set_int32(d, "dshift", 0);
> -qdev_init_nofail(d);
> +d1 = DEVICE(isa1);
> +qdev_prop_set_int32(d1, "base", 0x00);
> +qdev_prop_set_int32(d1, "page-base", 0x80);
> +qdev_prop_set_int32(d1, "pageh-base", high_page_enable ? 0x480 : -1);
> +qdev_prop_set_int32(d1, "dshift", 0);
> +qdev_init_nofail(d1);
>  
>  isa2 = isa_create(bus, TYPE_I8257);
> -d = DEVICE(isa2);
> -qdev_prop_set_int32(d, "base", 0xc0);
> -qdev_prop_set_int32(d, "page-base", 0x88);
> -qdev_prop_set_int32(d, "pageh-base", high_page_enable ? 0x488 : -1);
> -qdev_prop_set_int32(d, "dshift", 1);
> -qdev_init_nofail(d);
> -
> -isa_bus_dma(bus, ISADMA(isa1), ISADMA(isa2));
> +d2 = DEVICE(isa2);
> +qdev_prop_set_int32(d2, "base", 0xc0);
> +qdev_prop_set_int32(d2, "page-base", 0x88);
> +qdev_prop_set_int32(d2, "pageh-base", high_page_enable ? 0x488 : -1);
> +qdev_prop_set_int32(d2, "dshift", 1);
> +qdev_init_nofail(d2);
> +
> +if (isa_bus_dma(bus, ISADMA(isa1), ISADMA(isa2), errp) < 0) {
> +object_property_set_bool(OBJECT(d1), false, "realized", errp);
> +object_unparent(OBJECT(d1));
> +object_property_set_bool(OBJECT(d2), false, "realized", errp);
> +object_unparent(OBJECT(d2));
> +}
>  }
> 

Re: [Qemu-devel] [PATCH 2/3] s390: Ensure IPL from SCSI works as expected

2018-04-05 Thread Farhan Ali



On 04/05/2018 11:07 AM, Viktor Mihajlovski wrote:

Operating systems may request an IPL from a virtio-scsi device
by specifying an IPL parameter type of CCW. In this case QEMU
won't set up the IPLB correctly. The BIOS will still detect
it's a SCSI device to boot from, but it will now have to search
for the first LUN and attempt to boot from there.
However this may not be the original boot LUN if there's more than
one SCSI disk attached to the HBA.

With this change QEMU will detect that the request is for a
SCSI device and will rebuild the initial IPL parameter info
if it's the SCSI device used for the first boot. In consequence
the BIOS can use the boot LUN from the IPL information block.

In case a different SCSI device has been set, the BIOS will find
and use the first available LUN.

Signed-off-by: Viktor Mihajlovski 
---
  hw/s390x/ipl.c | 31 +--
  1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 58e33c5..fb554ab 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -427,7 +427,8 @@ unref_mr:
  return img_size;
  }

-static bool is_virtio_net_device(IplParameterBlock *iplb)
+static bool is_virtio_ccw_device_of_type(IplParameterBlock *iplb,
+ int virtio_id)
  {
  uint8_t cssid;
  uint8_t ssid;
@@ -447,13 +448,23 @@ static bool is_virtio_net_device(IplParameterBlock *iplb)
  sch = css_find_subch(1, cssid, ssid, schid);

  if (sch && sch->devno == devno) {
-return sch->id.cu_model == VIRTIO_ID_NET;
+return sch->id.cu_model == virtio_id;
  }
  }
  }
  return false;
  }

+static bool is_virtio_net_device(IplParameterBlock *iplb)
+{
+return is_virtio_ccw_device_of_type(iplb, VIRTIO_ID_NET);
+}
+
+static bool is_virtio_scsi_device(IplParameterBlock *iplb)
+{
+return is_virtio_ccw_device_of_type(iplb, VIRTIO_ID_SCSI);
+}
+
  void s390_ipl_update_diag308(IplParameterBlock *iplb)
  {
  S390IPLState *ipl = get_ipl_device();
@@ -478,6 +489,22 @@ void s390_reipl_request(void)
  S390IPLState *ipl = get_ipl_device();

  ipl->reipl_requested = true;
+if (ipl->iplb_valid &&
+!ipl->netboot &&
+ipl->iplb.pbt == S390_IPL_TYPE_CCW &&
+is_virtio_scsi_device(>iplb)) {
+CcwDevice *ccw_dev = s390_get_ccw_device(get_boot_device(0));
+
+if (ccw_dev &&
+cpu_to_be16(ccw_dev->sch->devno) == ipl->iplb.ccw.devno &&
+(ccw_dev->sch->ssid & 3) == ipl->iplb.ccw.ssid) {
+/*
+ * this is the original boot device's SCSI
+ * so restore IPL parameter info from it
+ */
+ipl->iplb_valid = s390_gen_initial_iplb(ipl);
+}
+}
  qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
  }

This does feel a little hackish, but I can't think of a better way to 
ensure the correct SCSI device is used for booting.


Reviewed-by: Farhan Ali 







Re: [Qemu-devel] -icount changes physical address assignments in QEMU 2.10/2.11

2018-04-05 Thread alarson
"Peter Maydell"  wrote on 04/05/2018 09:05:53 
AM:

> From: "Peter Maydell" 
> To: alar...@ddci.com
> Cc: "QEMU Developers" 
> Date: 04/05/2018 09:06 AM
> Subject: Re: [Qemu-devel] -icount changes physical address assignments 
in QEMU 2.10/2.11
> 
> On 22 March 2018 at 05:31,   wrote:
> > Your patch (applied to 2.11 source release) changed the behavior
> > somewhat, but did not fix the problem.  Attached is a binary that when
> > run should show a CGA fontset and color bars.
> >
> > This command should "work":
> >
> > qemu-system-aarch64 -M virt,virtualization=on -cpu cortex-a53 -vga std
> > -device secondary-vga -device virtio-net,netdev=vlan0,addr=2 -kernel
> > icount-bug.bin -netdev user,id=vlan0
> >
> > If you add "-icount 2" the display will appear, but be mangled.
> >
> > I didn't spend too much time trimming the source code, so if you need
> > to step by step debug walking through the guest code, I'll have to
> > prune it down some more.
> >
> > For the record, the QEMU source I have is modified slightly to add ARM 
WFE
> > support, something I will submit once this is all straightened out,
> > but this bug appeared before I made that patch.
> 
> Hi; sorry for the delay in testing this. I've just tried your
> attached test image with current head of git QEMU (commit
> 0e87fdc966d05f4e5ad86, which is the 2.12.0-rc2 release candidate),
> and it gives me a correct display both with and without -icount 2.
> So I think we've fixed this bug, probably with the combination
> of commits 0790f86861079b19 and 87f963be66a3245, or possibly
> a75a52d62418da.
> 
> thanks
> -- PMM

Thank you for your help.

> I've just tried your attached test image ...

Curious.  I just downloaded qemu-2.12.0-rc2.tar.xz and built it using
Cygwin (a version from about a month ago) using mingw compilers
(mingw64-i686-gcc-g++ 6.4.0), and I still see the issue when the
resulting QEMU binary is run using -icount 2 against my test binary,
but not when run without -icount.  Here are the commands used:

../qemu-2.12.0-rc2/configure --python=/usr/bin/python \
   '--with-pkgversion=DDCI QEMU 2.12.0-rc2' \
   --prefix=/usr/local/qemu \
   --enable-sdl --with-sdlabi=2.0 \
   '--target-list=aarch64-softmmu ppc64-softmmu x86_64-softmmu' \
   --cross-prefix=i686-w64-mingw32-
/usr/bin/make -Otarget -j 8

Any suggestions of things to try?

If you think this is a build environment issue, I can try MSYS and
report my results.



Re: [Qemu-devel] [PATCH v3 6/7] s390x/kvm: handle AP instruction interception

2018-04-05 Thread Tony Krowiak

On 04/03/2018 05:36 AM, Cornelia Huck wrote:

On Mon, 2 Apr 2018 12:36:27 -0400
Tony Krowiak  wrote:


On 03/26/2018 05:03 AM, Pierre Morel wrote:

On 26/03/2018 10:32, David Hildenbrand wrote:

On 16.03.2018 00:24, Tony Krowiak wrote:

+/*
+ * The Query Configuration Information (QCI) function (fc == 4)
does not
+ * set a response code in reg 1, so check for that along with the
+ * AP feature.
+ */
+if ((fc != 4) && s390_has_feat(S390_FEAT_AP)) {
+env->regs[1] = 0x1;
+
+return 0;
+}

This would imply an operation exception in case fc==4, which sounds very
wrong.

It depends but I think that the S390_FEAT_AP_QUERY_CONFIG_INFO must be
tested
to know what to answer.
If the feature is there, QCI must be answered correctly.

This is an interesting proposition which raises several issues that will
need to
be addressed. The only time the PQAP(QCI) instruction is intercepted is
when:
* A vfio-ap device is NOT defined for the guest because the vfio_ap
device driver
will set ECA.28 and the PQAP(QCI) instruction will be interpreted
* STFLE.12 is set for the guest

You say that the QCI must be answered correctly, but what is the correct
response?
If we return the matrix - i.e., APM, ADM and AQM - configured via the
mediated
matrix device's sysfs attributes files, then if any APQNs are defined in
the matrix,
we will have to handle *ALL* AP instructions by executing them on behalf
of the
guest. I suppose we could return an empty matrix in which case the AP
bus will come
up without any devices on the guest, but what is the expectation of an
admin who
deliberately configures the mediated matrix device? Should we forego
handling interception
of AP instructions and consider a guest configuration that turns on
S390_FEAT_AP but
does not define a vfio-ap device to be erroneous and terminate starting
of the guest?
Anybody have any thoughts?

Hard to really give good advice without access to the documentation, but:
- If we tell the guest that the feature is available, but it does not
   get any cards to use, returning an empty matrix makes the most sense
   to me.
- I would not tie starting the guest to the presence of a vfio-ap
   device. Having the feature available in theory but without any
   devices actually being usable by the guest does not really sound
   wrong (can we hotplug this later?)
For this phase of development, it is my opinion that introducing AP 
instruction

interception handlers is superfluous for the following reasons:

1. Interception handling was introduced solely to ensure an operation 
exception would
   not be injected into the guest when CPU model feature for AP (i.e., 
ap=on)
   is specified but a VFIO AP device (i.e., -device 
vfio-ap,sysfsdev=$path)

   is not.

2. The implementation of guest dedicated crypto adapters uses AP 
instruction

   interpretation to virtualize AP devices for a guest. As such, the NQAP,
   DQAP and most variants of the PQAP instructions will not be
   intercepted.

3. Hot plugging AP devices is not being supported for this phase of 
development.


It is my opinion that introducing these interception handlers at this 
time is

unnecessary and risks opening a can of worms that would be
better dealt with in a subsequent phase. For that reason and the reasons 
stated

above, I think the better option is to terminate starting the guest if the
CPU model feature for AP is enabled but an AP device is not defined for the
guest. This restriction, of course, will be removed when hot plug is 
implemented

in a subsequent development phase.







Re: [Qemu-devel] [PATCH v3] linux-user: fix preadv/pwritev offsets

2018-04-05 Thread Laurent Vivier
Le 05/04/2018 à 18:27, Max Filippov a écrit :
> On Thu, Apr 5, 2018 at 8:52 AM, Laurent Vivier  wrote:
>> Why don't you try to de-construct then re-construct the offset?
> 
> It would require 128-bit arithmetic on 64-bit host.
> 
>> Kernel commit
>>   601cc11d054a "Make non-compat preadv/pwritev use native register size"
>> is interesting.
>>
>> static inline abi_llong target_pos_from_hilo(abi_ulong high,
>>  abi_ulong low)
>> {
>> #define TARGET_HALF_LONG_BITS (TARGET_LONG_BITS / 2)
>>return (((abi_llong)high << TARGET_HALF_LONG_BITS) <<
>>  TARGET_HALF_LONG_BITS) | low;
>> }
>>
>> #define HOST_HALF_LONG_BITS (HOST_LONG_BITS / 2)
>>
>>  abi_llong pos = target_pos_from_hilo(arg4, arg5);
>>  ret = get_errno(safe_preadv(arg1, vec, arg3,
>>   pos,
>>   (pos >> HOST_HALF_LONG_BITS) >> HOST_HALF_LONG_BITS));
>>
>> It looks simpler, but perhaps I miss something
>> (it's just cut, I don't pretend to understand that code...)?
> 
> If we decide that host unsigned long long is wide enough to represent
> meaningful file positions then this function may be simplified to
> something like
> 
> unsigned long long off = (unsigned long long)tlow |
> ((unsigned long long)thigh << (TARGET_LONG_BITS / 2)) << (TARGET_LONG_BITS / 
> 2);
> *hlow = (unsigned long)off;
> *hhigh = (unsigned long)((off >> (HOST_LONG_BITS / 2)) >> (HOST_LONG_BITS / 
> 2));
> 
> There's also a bug that the target may specify an offset not representable
> on the host, that will get truncated and point to a different location in the
> file, possibly resulting in data corruption.

I don't think it can happen, because "long long" is always 64bit, so it
fits in two 32bit (long is 32bit on 32bit, 64bit on 64bit).

Thanks,
Laurent




Re: [Qemu-devel] [PATCH 1/3] s390: Refactor IPL parameter block generation

2018-04-05 Thread Farhan Ali



On 04/05/2018 11:07 AM, Viktor Mihajlovski wrote:

Splitting out the the CCW device extraction allows reuse.

Signed-off-by: Viktor Mihajlovski
---
  hw/s390x/ipl.c | 81 --
  1 file changed, 51 insertions(+), 30 deletions(-)


Reviewed-by: Farhan Ali 




[Qemu-devel] [Bug 1761535] [NEW] qemu-aarch64-static docker arm64v8/openjdk coredump

2018-04-05 Thread Richard Henwood
Public bug reported:

I am using qemu-aarch64-static to run the arm64v8/openjdk official image
on my x86 machine. Using QEMU master, I immediately hit a bug which
hangs the container. With Ubuntu default version qemu-aarch64 version
2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.24) and qemu-aarch64 version 2.11.1
(v2.11.1-dirty) the hang does not take place.

To reproduce (and get to the core dump):

$ /tmp/tmptgyg3nvh/qemu-aarch64-static/qemu-aarch64-static -version
qemu-aarch64 version 2.11.91 (v2.12.0-rc1-5-g47d3b60-dirty)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

$ docker run -it -v 
/tmp/tmptgyg3nvh/qemu-aarch64-static:/usr/bin/qemu-aarch64-static 
arm64v8/openjdk /bin/bash
root@bf75cf45d311:/# javac
Usage: javac  
where possible options include:
  -g Generate all debugging info
<...snip...>
  @Read options and filenames from file

qemu: uncaught target signal 11 (Segmentation fault) - core dumped
...TERMINAL HANGS...


To get the core dump, In a separate terminal:

# snapshot the file system of the hung image
$ docker commit $(docker ps -aqf "name=latest_qemu") qemu_coredump

# connect with known working qemu
$ docker run -t -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static  
-i qemu_coredump /bin/bash

$$ ls -lat
total 10608

-rw-r--r--   1 root root 10792960 Mar 29 18:02 qemu_bash_20180329-180251_1.core
drwxrwxrwt   5 root root 4096 Mar 29 18:02 tmp


** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: aarch64 core dump hang static

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1761535

Title:
  qemu-aarch64-static docker arm64v8/openjdk coredump

Status in QEMU:
  New

Bug description:
  I am using qemu-aarch64-static to run the arm64v8/openjdk official
  image on my x86 machine. Using QEMU master, I immediately hit a bug
  which hangs the container. With Ubuntu default version qemu-aarch64
  version 2.5.0 (Debian 1:2.5+dfsg-5ubuntu10.24) and qemu-aarch64
  version 2.11.1 (v2.11.1-dirty) the hang does not take place.

  To reproduce (and get to the core dump):

  $ /tmp/tmptgyg3nvh/qemu-aarch64-static/qemu-aarch64-static -version
  qemu-aarch64 version 2.11.91 (v2.12.0-rc1-5-g47d3b60-dirty)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  $ docker run -it -v 
/tmp/tmptgyg3nvh/qemu-aarch64-static:/usr/bin/qemu-aarch64-static 
arm64v8/openjdk /bin/bash
  root@bf75cf45d311:/# javac
  Usage: javac  
  where possible options include:
-g Generate all debugging info
  <...snip...>
@Read options and filenames from file

  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  ...TERMINAL HANGS...

  
  To get the core dump, In a separate terminal:

  # snapshot the file system of the hung image
  $ docker commit $(docker ps -aqf "name=latest_qemu") qemu_coredump

  # connect with known working qemu
  $ docker run -t -v /usr/bin/qemu-aarch64-static:/usr/bin/qemu-aarch64-static  
-i qemu_coredump /bin/bash

  $$ ls -lat
  total 10608
  
  -rw-r--r--   1 root root 10792960 Mar 29 18:02 
qemu_bash_20180329-180251_1.core
  drwxrwxrwt   5 root root 4096 Mar 29 18:02 tmp
  

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1761535/+subscriptions



Re: [Qemu-devel] [PATCH v3] linux-user: fix preadv/pwritev offsets

2018-04-05 Thread Max Filippov
On Thu, Apr 5, 2018 at 8:52 AM, Laurent Vivier  wrote:
> Why don't you try to de-construct then re-construct the offset?

It would require 128-bit arithmetic on 64-bit host.

> Kernel commit
>   601cc11d054a "Make non-compat preadv/pwritev use native register size"
> is interesting.
>
> static inline abi_llong target_pos_from_hilo(abi_ulong high,
>  abi_ulong low)
> {
> #define TARGET_HALF_LONG_BITS (TARGET_LONG_BITS / 2)
>return (((abi_llong)high << TARGET_HALF_LONG_BITS) <<
>  TARGET_HALF_LONG_BITS) | low;
> }
>
> #define HOST_HALF_LONG_BITS (HOST_LONG_BITS / 2)
>
>  abi_llong pos = target_pos_from_hilo(arg4, arg5);
>  ret = get_errno(safe_preadv(arg1, vec, arg3,
>   pos,
>   (pos >> HOST_HALF_LONG_BITS) >> HOST_HALF_LONG_BITS));
>
> It looks simpler, but perhaps I miss something
> (it's just cut, I don't pretend to understand that code...)?

If we decide that host unsigned long long is wide enough to represent
meaningful file positions then this function may be simplified to
something like

unsigned long long off = (unsigned long long)tlow |
((unsigned long long)thigh << (TARGET_LONG_BITS / 2)) << (TARGET_LONG_BITS / 2);
*hlow = (unsigned long)off;
*hhigh = (unsigned long)((off >> (HOST_LONG_BITS / 2)) >> (HOST_LONG_BITS / 2));

There's also a bug that the target may specify an offset not representable
on the host, that will get truncated and point to a different location in the
file, possibly resulting in data corruption.

-- 
Thanks.
-- Max



Re: [Qemu-devel] [PATCH] hw/sparc64/sun4u: Fix introspection by converting prom instance_init to realize

2018-04-05 Thread Philippe Mathieu-Daudé
On 04/05/2018 01:22 PM, Philippe Mathieu-Daudé wrote:
> On 04/05/2018 06:32 AM, Thomas Huth wrote:
>> The instance_init function of devices should always succeed to be able
>> to introspect the device. However, the instance_init function of the
>> "openprom" device can currently fail, for example like this:
>>
>> $ echo "{'execute':'qmp_capabilities'}"\
>>"{'execute':'device-list-properties',"\
>>" 'arguments':{'typename':'openprom'}}" \
>>| sparc64-softmmu/qemu-system-sparc64 -M sun4v,accel=qtest -qmp stdio
>> {"QMP": {"version": {"qemu": {"micro": 91, "minor": 11, "major": 2},
>>  "package": "build-all"}, "capabilities": []}}
>> {"return": {}}
>> RAMBlock "sun4u.prom" already registered, abort!
>> Aborted (core dumped)
>>
>> This should not happen. Fix this problem by moving the affected code from
>> instance_init into a realize function instead.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>  hw/sparc64/sun4u.c | 18 --
>>  1 file changed, 12 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
>> index 2044a52..d62f5a2 100644
>> --- a/hw/sparc64/sun4u.c
>> +++ b/hw/sparc64/sun4u.c
>> @@ -425,13 +425,19 @@ static void prom_init(hwaddr addr, const char 
>> *bios_name)
>>  }
>>  }
>>  
>> -static void prom_init1(Object *obj)
>> +static void prom_realize(DeviceState *ds, Error **errp)
>>  {
>> -PROMState *s = OPENPROM(obj);
>> -SysBusDevice *dev = SYS_BUS_DEVICE(obj);
>> +PROMState *s = OPENPROM(ds);
>> +SysBusDevice *dev = SYS_BUS_DEVICE(ds);
>> +Error *local_err = NULL;
>> +
>> +memory_region_init_ram_nomigrate(>prom, OBJECT(ds), "sun4u.prom",
>> + PROM_SIZE_MAX, _err);
> 
> This looks the memory_region_init_ram() pattern...
> 
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +return;
>> +}
>>  
>> -memory_region_init_ram_nomigrate(>prom, obj, "sun4u.prom", 
>> PROM_SIZE_MAX,
>> -   _fatal);
>>  vmstate_register_ram_global(>prom);
> 
> ...^
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> 
>>  memory_region_set_readonly(>prom, true);

Maybe memory_region_init_rom_nomigrate() even?

>>  sysbus_init_mmio(dev, >prom);
>> @@ -446,6 +452,7 @@ static void prom_class_init(ObjectClass *klass, void 
>> *data)
>>  DeviceClass *dc = DEVICE_CLASS(klass);
>>  
>>  dc->props = prom_properties;
>> +dc->realize = prom_realize;
>>  }
>>  
>>  static const TypeInfo prom_info = {
>> @@ -453,7 +460,6 @@ static const TypeInfo prom_info = {
>>  .parent= TYPE_SYS_BUS_DEVICE,
>>  .instance_size = sizeof(PROMState),
>>  .class_init= prom_class_init,
>> -.instance_init = prom_init1,
>>  };
>>  
>>  
>>



[Qemu-devel] [PATCH 2/2] hw/scsi: support SCSI-2 passthrough without PI

2018-04-05 Thread Paolo Bonzini
From: Daniel Henrique Barboza 

QEMU SCSI code makes assumptions about how the PROTECT and BYTCHK
works in the protocol, denying support for PI (Protection
Information) in case the guest OS requests it. However, in SCSI versions 2
and older, there is no PI concept in the protocol.

This means that when dealing with such devices:

- there is no PROTECT bit in byte 5 of the standard INQUIRY response. The
whole byte is marked as "Reserved";

- there is no RDPROTECT in byte 2 of READ. We have 'Logical Unit Number'
in this field instead;

- there is no VRPROTECT in byte 2 of VERIFY. We have 'Logical Unit Number'
in this field instead. This also means that the BYTCHK bit in this case
is not related to PI.

Since QEMU does not consider these changes, a SCSI passthrough using
a SCSI-2 device will not work. It will mistake these fields with
PI information and return Illegal Request SCSI SENSE thinking
that the driver is asking for PI support.

This patch fixes it by adding a new attribute called 'scsi_version'
that is read from the standard INQUIRY response of passthrough
devices. This allows for a version verification before applying
conditions related to PI that doesn't apply for older versions.

Reported-by: Dac Nguyen 
Signed-off-by: Daniel Henrique Barboza 
Message-Id: <20180327211451.14647-1-danie...@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c|  2 +-
 hw/scsi/scsi-generic.c | 47 ---
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 9400b97387..ded23d36ca 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -3041,7 +3041,7 @@ static Property scsi_block_properties[] = {
 DEFINE_PROP_BOOL("share-rw", SCSIDiskState, qdev.conf.share_rw, false),
 DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0),
 DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
-  5),
+  -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 1870085762..381f04e339 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -194,17 +194,40 @@ static void scsi_read_complete(void * opaque, int ret)
 r->buf[3] |= 0x80;
 }
 }
-if (s->type == TYPE_DISK &&
-r->req.cmd.buf[0] == INQUIRY &&
-r->req.cmd.buf[2] == 0xb0) {
-uint32_t max_transfer =
-blk_get_max_transfer(s->conf.blk) / s->blocksize;
-
-assert(max_transfer);
-stl_be_p(>buf[8], max_transfer);
-/* Also take care of the opt xfer len. */
-stl_be_p(>buf[12],
- MIN_NON_ZERO(max_transfer, ldl_be_p(>buf[12])));
+if (r->req.cmd.buf[0] == INQUIRY) {
+/*
+ *  EVPD set to zero returns the standard INQUIRY data.
+ *
+ *  Check if scsi_version is unset (-1) to avoid re-defining it
+ *  each time an INQUIRY with standard data is received.
+ *  scsi_version is initialized with -1 in scsi_generic_reset
+ *  and scsi_disk_reset, making sure that we'll set the
+ *  scsi_version after a reset. If the version field of the
+ *  INQUIRY response somehow changes after a guest reboot,
+ *  we'll be able to keep track of it.
+ *
+ *  On SCSI-2 and older, first 3 bits of byte 2 is the
+ *  ANSI-approved version, while on later versions the
+ *  whole byte 2 contains the version. Check if we're dealing
+ *  with a newer version and, in that case, assign the
+ *  whole byte.
+ */
+if (s->scsi_version == -1 && !(r->req.cmd.buf[1] & 0x01)) {
+s->scsi_version = r->buf[2] & 0x07;
+if (s->scsi_version > 2) {
+s->scsi_version = r->buf[2];
+}
+}
+if (s->type == TYPE_DISK && r->req.cmd.buf[2] == 0xb0) {
+uint32_t max_transfer =
+blk_get_max_transfer(s->conf.blk) / s->blocksize;
+
+assert(max_transfer);
+stl_be_p(>buf[8], max_transfer);
+/* Also take care of the opt xfer len. */
+stl_be_p(>buf[12],
+ MIN_NON_ZERO(max_transfer, ldl_be_p(>buf[12])));
+}
 }
 scsi_req_data(>req, len);
 scsi_req_unref(>req);
@@ -550,6 +573,8 @@ static void scsi_generic_realize(SCSIDevice *s, Error 
**errp)
 
 DPRINTF("block size %d\n", s->blocksize);
 
+/* Only used by scsi-block, but initialize it nevertheless to be clean.  */
+s->default_scsi_version = -1;
 scsi_generic_read_device_identification(s);
 }
 
-- 
2.16.2




[Qemu-devel] [PATCH 1/2] scsi-disk: allow customizing the SCSI version

2018-04-05 Thread Paolo Bonzini
We would like to have different behavior for passthrough devices
depending on the SCSI version they expose.  To prepare for that,
allow the user of emulated devices to specify the desired SCSI
level, and adjust the emulation according to the property value.
The next patch will set the level for scsi-block and scsi-generic
devices.

Based on a patch by Daniel Henrique Barboza
.

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c| 29 -
 hw/scsi/scsi-generic.c |  1 +
 include/hw/scsi/scsi.h |  2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f8ed8cf2b4..9400b97387 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -825,7 +825,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
uint8_t *outbuf)
  * block characteristics VPD page by default.  Not all of SPC-3
  * is actually implemented, but we're good enough.
  */
-outbuf[2] = 5;
+outbuf[2] = s->qdev.default_scsi_version;
 outbuf[3] = 2 | 0x10; /* Format 2, HiSup */
 
 if (buflen > 36) {
@@ -2193,7 +2193,11 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, 
uint8_t *buf)
 case READ_12:
 case READ_16:
 DPRINTF("Read (sector %" PRId64 ", count %u)\n", r->req.cmd.lba, len);
-if (r->req.cmd.buf[1] & 0xe0) {
+/* Protection information is not supported.  For SCSI versions 2 and
+ * older (as determined by snooping the guest's INQUIRY commands),
+ * there is no RD/WR/VRPROTECT, so skip this check in these versions.
+ */
+if (s->qdev.scsi_version > 2 && (r->req.cmd.buf[1] & 0xe0)) {
 goto illegal_request;
 }
 if (!check_lba_range(s, r->req.cmd.lba, len)) {
@@ -2224,7 +2228,7 @@ static int32_t scsi_disk_dma_command(SCSIRequest *req, 
uint8_t *buf)
  * As far as DMA is concerned, we can treat it the same as a write;
  * scsi_block_do_sgio will send VERIFY commands.
  */
-if (r->req.cmd.buf[1] & 0xe0) {
+if (s->qdev.scsi_version > 2 && (r->req.cmd.buf[1] & 0xe0)) {
 goto illegal_request;
 }
 if (!check_lba_range(s, r->req.cmd.lba, len)) {
@@ -2270,6 +2274,8 @@ static void scsi_disk_reset(DeviceState *dev)
 /* reset tray statuses */
 s->tray_locked = 0;
 s->tray_open = 0;
+
+s->qdev.scsi_version = s->qdev.default_scsi_version;
 }
 
 static void scsi_disk_resize_cb(void *opaque)
@@ -2814,6 +2820,8 @@ static bool scsi_block_is_passthrough(SCSIDiskState *s, 
uint8_t *buf)
 static int32_t scsi_block_dma_command(SCSIRequest *req, uint8_t *buf)
 {
 SCSIBlockReq *r = (SCSIBlockReq *)req;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
+
 r->cmd = req->cmd.buf[0];
 switch (r->cmd >> 5) {
 case 0:
@@ -2839,8 +2847,11 @@ static int32_t scsi_block_dma_command(SCSIRequest *req, 
uint8_t *buf)
 abort();
 }
 
-if (r->cdb1 & 0xe0) {
-/* Protection information is not supported.  */
+/* Protection information is not supported.  For SCSI versions 2 and
+ * older (as determined by snooping the guest's INQUIRY commands),
+ * there is no RD/WR/VRPROTECT, so skip this check in these versions.
+ */
+if (s->qdev.scsi_version > 2 && (req->cmd.buf[1] & 0xe0)) {
 scsi_check_condition(>req, SENSE_CODE(INVALID_FIELD));
 return 0;
 }
@@ -2952,6 +2963,8 @@ static Property scsi_hd_properties[] = {
 DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
DEFAULT_MAX_IO_SIZE),
 DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0),
+DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
+  5),
 DEFINE_BLOCK_CHS_PROPERTIES(SCSIDiskState, qdev.conf),
 DEFINE_PROP_END_OF_LIST(),
 };
@@ -2997,6 +3010,8 @@ static Property scsi_cd_properties[] = {
 DEFINE_PROP_UINT16("port_index", SCSIDiskState, port_index, 0),
 DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
DEFAULT_MAX_IO_SIZE),
+DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
+  5),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -3025,6 +3040,8 @@ static Property scsi_block_properties[] = {
 DEFINE_PROP_DRIVE("drive", SCSIDiskState, qdev.conf.blk),
 DEFINE_PROP_BOOL("share-rw", SCSIDiskState, qdev.conf.share_rw, false),
 DEFINE_PROP_UINT16("rotation_rate", SCSIDiskState, rotation_rate, 0),
+DEFINE_PROP_INT32("scsi_version", SCSIDiskState, qdev.default_scsi_version,
+  5),
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -3065,6 +3082,8 @@ static Property scsi_disk_properties[] = {
DEFAULT_MAX_UNMAP_SIZE),
 DEFINE_PROP_UINT64("max_io_size", SCSIDiskState, max_io_size,
DEFAULT_MAX_IO_SIZE),
+

[Qemu-devel] [PATCH v4 0/2] hw/scsi: support SCSI-2 passthrough without PI

2018-04-05 Thread Paolo Bonzini
This is my version of Daniel's patch.  In order to keep the
RD/WRPROTECT check for emulated SCSI disks, I've added a new property
to scsi-disk/hd/cd devices as well.  The property, similar to the
earlier versions posted by Daniel, is available to the user, but for
scsi-disk/hd/cd it affects the INQUIRY value returned by the emulated
INQUIRY command.

Daniel, can you please test it?

Thanks,

Paolo

Daniel Henrique Barboza (1):
  hw/scsi: support SCSI-2 passthrough without PI

Paolo Bonzini (1):
  scsi-disk: allow customizing the SCSI version

 hw/scsi/scsi-disk.c| 29 -
 hw/scsi/scsi-generic.c | 48 +---
 include/hw/scsi/scsi.h |  2 ++
 3 files changed, 63 insertions(+), 16 deletions(-)

-- 
2.16.2




Re: [Qemu-devel] [PATCH] hw/sparc64/sun4u: Fix introspection by converting prom instance_init to realize

2018-04-05 Thread Philippe Mathieu-Daudé
On 04/05/2018 06:32 AM, Thomas Huth wrote:
> The instance_init function of devices should always succeed to be able
> to introspect the device. However, the instance_init function of the
> "openprom" device can currently fail, for example like this:
> 
> $ echo "{'execute':'qmp_capabilities'}"\
>"{'execute':'device-list-properties',"\
>" 'arguments':{'typename':'openprom'}}" \
>| sparc64-softmmu/qemu-system-sparc64 -M sun4v,accel=qtest -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 91, "minor": 11, "major": 2},
>  "package": "build-all"}, "capabilities": []}}
> {"return": {}}
> RAMBlock "sun4u.prom" already registered, abort!
> Aborted (core dumped)
> 
> This should not happen. Fix this problem by moving the affected code from
> instance_init into a realize function instead.
> 
> Signed-off-by: Thomas Huth 
> ---
>  hw/sparc64/sun4u.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
> index 2044a52..d62f5a2 100644
> --- a/hw/sparc64/sun4u.c
> +++ b/hw/sparc64/sun4u.c
> @@ -425,13 +425,19 @@ static void prom_init(hwaddr addr, const char 
> *bios_name)
>  }
>  }
>  
> -static void prom_init1(Object *obj)
> +static void prom_realize(DeviceState *ds, Error **errp)
>  {
> -PROMState *s = OPENPROM(obj);
> -SysBusDevice *dev = SYS_BUS_DEVICE(obj);
> +PROMState *s = OPENPROM(ds);
> +SysBusDevice *dev = SYS_BUS_DEVICE(ds);
> +Error *local_err = NULL;
> +
> +memory_region_init_ram_nomigrate(>prom, OBJECT(ds), "sun4u.prom",
> + PROM_SIZE_MAX, _err);

This looks the memory_region_init_ram() pattern...

> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
>  
> -memory_region_init_ram_nomigrate(>prom, obj, "sun4u.prom", 
> PROM_SIZE_MAX,
> -   _fatal);
>  vmstate_register_ram_global(>prom);

...^

Reviewed-by: Philippe Mathieu-Daudé 

>  memory_region_set_readonly(>prom, true);
>  sysbus_init_mmio(dev, >prom);
> @@ -446,6 +452,7 @@ static void prom_class_init(ObjectClass *klass, void 
> *data)
>  DeviceClass *dc = DEVICE_CLASS(klass);
>  
>  dc->props = prom_properties;
> +dc->realize = prom_realize;
>  }
>  
>  static const TypeInfo prom_info = {
> @@ -453,7 +460,6 @@ static const TypeInfo prom_info = {
>  .parent= TYPE_SYS_BUS_DEVICE,
>  .instance_size = sizeof(PROMState),
>  .class_init= prom_class_init,
> -.instance_init = prom_init1,
>  };
>  
>  
> 



Re: [Qemu-devel] [PATCH v3 1/1] hw/scsi: support SCSI-2 passthrough without PI

2018-04-05 Thread Daniel Henrique Barboza



On 04/05/2018 12:56 PM, Paolo Bonzini wrote:

On 27/03/2018 23:14, Daniel Henrique Barboza wrote:

  /* We get here only for BYTCHK == 0x01 and only for scsi-block.
   * As far as DMA is concerned, we can treat it the same as a write;
   * scsi_block_do_sgio will send VERIFY commands.
+ *
+ * For scsi versions 2 and older, the BYTCHK isn't related
+ * to VRPROTECT (in fact, there is no VRPROTECT). Skip
+ * this check in these versions.
   */
-if (r->req.cmd.buf[1] & 0xe0) {
+if ((r->req.cmd.buf[1] & 0xe0) && (s->qdev.scsi_version > 2)) {
  goto illegal_request;
  }

This also has to check for "s->qdev.scsi_version != -1" so that the
behavior of emulated SCSI isn't changed (they claim SPC-3).  I made this
change and queued the patch.


Good catch.


Daniel



Paolo






[Qemu-devel] TCG icount interaction with timer deadlines

2018-04-05 Thread Peter Maydell
Does anybody understand how icount TCG is supposed to arrange to
respect timer deadlines?

https://bugs.launchpad.net/qemu/+bug/1754038 has a test case which
shows that we don't get this right.

At the moment what happens is:
 * when we're about to call tcg_cpu_exec(), we call
   prepare_icount_for_run(), which looks at the closest clock deadline
   to figure out how many insns to execute before dropping out, with
   a cap at INT32_MAX nanoseconds
 * however, if the guest reprograms the clock during the tcg_cpu_exec()
   run, we don't do anything to cause us to stop earlier
 * so we blithely continue on til that INT32_MAX cap, and then
   belatedly notice that we should have fired a timer interrupt

In the test case this manifests as the first timer interrupt being
very delayed, because the first tcg_cpu_exec() goes from "start of
program" to INT32_MAX nanoseconds. Later interrupts happen OK, because
the guest isn't reprogramming the timer interrupt, so the deadline
picked at the start of each tcg_cpu_exec run is correct.

What should we be doing to arrange to stop execution of the
tcg_cpu_exec() earlier when the deadline moves closer?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v3 1/1] hw/scsi: support SCSI-2 passthrough without PI

2018-04-05 Thread Paolo Bonzini
On 27/03/2018 23:14, Daniel Henrique Barboza wrote:
>  /* We get here only for BYTCHK == 0x01 and only for scsi-block.
>   * As far as DMA is concerned, we can treat it the same as a write;
>   * scsi_block_do_sgio will send VERIFY commands.
> + *
> + * For scsi versions 2 and older, the BYTCHK isn't related
> + * to VRPROTECT (in fact, there is no VRPROTECT). Skip
> + * this check in these versions.
>   */
> -if (r->req.cmd.buf[1] & 0xe0) {
> +if ((r->req.cmd.buf[1] & 0xe0) && (s->qdev.scsi_version > 2)) {
>  goto illegal_request;
>  }

This also has to check for "s->qdev.scsi_version != -1" so that the
behavior of emulated SCSI isn't changed (they claim SPC-3).  I made this
change and queued the patch.

Paolo



Re: [Qemu-devel] [PATCH v3] linux-user: fix preadv/pwritev offsets

2018-04-05 Thread Laurent Vivier
Le 05/04/2018 à 15:47, Max Filippov a écrit :
> preadv/pwritev accept low and high parts of file offset in two separate
> parameters. When host bitness doesn't match guest bitness these parts
> must be appropriately recombined.
> Introduce target_to_host_low_high that does this recombination and use
> it in preadv/pwritev syscalls.
> 
> This fixes glibc testsuite test misc/tst-preadvwritev64.
> 
> Signed-off-by: Max Filippov 
> ---
> Changes v2->v3:
> - rename helper to target_to_host_low_high;
> - handle cases TARGET_LONG_BITS * 2 > HOST_LONG_BITS and
>   TARGET_LONG_BITS < 2 * HOST_LONG_BITS correctly.
> 
> Changes v1->v2:
> - fix host high computation in TARGET_LONG_BITS > HOST_LONG_BITS case
> 
>  linux-user/syscall.c | 34 --
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 5ef517613577..13ad572e0d3b 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -3386,6 +3386,30 @@ static abi_long do_getsockopt(int sockfd, int level, 
> int optname,
>  return ret;
>  }
>  
> +static void target_to_host_low_high(abi_ulong tlow,
> +abi_ulong thigh,
> +unsigned long *hlow,
> +unsigned long *hhigh)
> +{
> +#if TARGET_LONG_BITS == HOST_LONG_BITS
> +*hlow = tlow;
> +*hhigh = thigh;
> +#elif TARGET_LONG_BITS < HOST_LONG_BITS
> +*hlow = tlow | (unsigned long)thigh << TARGET_LONG_BITS;
> +#if TARGET_LONG_BITS * 2 <= HOST_LONG_BITS
> +*hhigh = 0;
> +#else
> +*hhigh = (unsigned long)thigh >> (HOST_LONG_BITS - TARGET_LONG_BITS);
> +#endif
> +#else
> +*hlow = (unsigned long)tlow;
> +*hhigh = (unsigned long)(tlow >> HOST_LONG_BITS);
> +#if TARGET_LONG_BITS < 2 * HOST_LONG_BITS
> +*hhigh |= (unsigned long)thigh << (TARGET_LONG_BITS - 
> HOST_LONG_BITS);
> +#endif
> +#endif
> +}

Why don't you try to de-construct then re-construct the offset?

Kernel commit
  601cc11d054a "Make non-compat preadv/pwritev use native register size"
is interesting.

static inline abi_llong target_pos_from_hilo(abi_ulong high,
 abi_ulong low)
{
#define TARGET_HALF_LONG_BITS (TARGET_LONG_BITS / 2)
   return (((abi_llong)high << TARGET_HALF_LONG_BITS) <<
 TARGET_HALF_LONG_BITS) | low;
}

#define HOST_HALF_LONG_BITS (HOST_LONG_BITS / 2)

 abi_llong pos = target_pos_from_hilo(arg4, arg5);
 ret = get_errno(safe_preadv(arg1, vec, arg3,
  pos,
  (pos >> HOST_HALF_LONG_BITS) >> HOST_HALF_LONG_BITS));

It looks simpler, but perhaps I miss something
(it's just cut, I don't pretend to understand that code...)?

Richard?

Thanks,
Laurent



Re: [Qemu-devel] [PATCH v2] scsi-disk: Don't enlarge min_io_size to max_io_size

2018-04-05 Thread Paolo Bonzini
On 27/03/2018 18:41, Fam Zheng wrote:
> Some backends report big max_io_sectors. Making min_io_size the same
> value in this case will make it impossible for guest to align memory,
> therefore the disk may not be usable at all.
> 
> Do not enlarge them when they are zero.
> 
> Reported-by: David Gibson 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v2: Leave the values alone if zero. [Paolo]
> At least we can consult block layer for a slightly more sensible
> opt_io_size, but that's for another patch.
> ---
>  hw/scsi/scsi-disk.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index f5ab767ab5..f8ed8cf2b4 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -714,10 +714,12 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, 
> uint8_t *outbuf)
>  
>  /* min_io_size and opt_io_size can't be greater than
>   * max_io_sectors */
> -min_io_size =
> -MIN_NON_ZERO(min_io_size, max_io_sectors);
> -opt_io_size =
> -MIN_NON_ZERO(opt_io_size, max_io_sectors);
> +if (min_io_size) {
> +min_io_size = MIN(min_io_size, max_io_sectors);
> +}
> +if (opt_io_size) {
> +opt_io_size = MIN(opt_io_size, max_io_sectors);
> +}
>  }
>  /* required VPD size with unmap support */
>  buflen = 0x40;
> 

Queued, thanks.

Paolo



Re: [Qemu-devel] [PATCH] configure: Add missing configure options to help text

2018-04-05 Thread Paolo Bonzini
On 27/03/2018 17:09, Thomas Huth wrote:
> We forgot to mention --with-git, --libexecdir and --with-pkgversion
> so far.
> 
> Signed-off-by: Thomas Huth 
> ---
>  configure | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/configure b/configure
> index 4d0e92c..a473609 100755
> --- a/configure
> +++ b/configure
> @@ -1497,16 +1497,19 @@ Advanced options (experts only):
>--install=INSTALLuse specified install [$install]
>--python=PYTHON  use specified python [$python]
>--smbd=SMBD  use specified smbd [$smbd]
> +  --with-git=GIT   use specified git [$git]
>--static enable static build [$static]
>--mandir=PATHinstall man pages in PATH
>--datadir=PATH   install firmware in PATH$confsuffix
>--docdir=PATHinstall documentation in PATH$confsuffix
>--bindir=PATHinstall binaries in PATH
>--libdir=PATHinstall libraries in PATH
> +  --libexecdir=PATHinstall helper binaries in PATH
>--sysconfdir=PATHinstall config in PATH$confsuffix
>--localstatedir=PATH install local state in PATH (set at runtime on 
> win32)
>--firmwarepath=PATH  search PATH for firmware files
>--with-confsuffix=SUFFIX suffix for QEMU data inside 
> datadir/libdir/sysconfdir [$confsuffix]
> +  --with-pkgversion=VERS   use specified string as sub-version of the package
>--enable-debug   enable common debug build options
>--enable-sanitizers  enable default sanitizers
>--disable-strip  disable stripping binaries
> 

Queued, thanks.

Paolo




Re: [Qemu-devel] [PATCH for-2.12 v3 0/2] i386/hyperv: fully control Hyper-V features in CPUID

2018-04-05 Thread Paolo Bonzini
On 30/03/2018 19:02, Roman Kagan wrote:
> In order to guarantee compatibility on migration, QEMU should have
> complete control over the features it announces to the guest via CPUID.
> 
> However, a number of Hyper-V-related features happen to depend on the
> support in the underlying KVM, with no regard to QEMU configuration.
> 
> Make QEMU regain control over what Hyper-V features it announces to the
> guest.
> 
> Note #1: the patches are also being proposed[*] for stable-2.11, even
> though one of them introduces a new cpu property.  This is done to
> minimize the number of published QEMU releases where the behavior of the
> features is unpredictable, with potentially fatal consequences for the
> guest.
> 
> Note #2: there are other problems in the surrounding code, like ugly
> error reporting or inconsistent population of MSRs.  I think this can be
> put off to post-2.12.
> 
> [*] for the stable branch the second patch will have error returns
> replaced with warnings; I'll post a separate series.
> 
> v2 -> v3:
>  - include the fix for 'hv-time' missed previously
> 
> v1 -> v2:
>  - indicate what flag requested the feature that can't be enabled in the
>error message
>  - fix a typo in the error message for VP_RUNTIME

Queued, thanks.

Paolo

> 
> Roman Kagan (2):
>   i386/hyperv: add hv-frequencies cpu property
>   i386/hyperv: error out if features requested but unsupported
> 
>  target/i386/cpu.h |  1 +
>  target/i386/cpu.c |  1 +
>  target/i386/kvm.c | 56 
> ++-
>  3 files changed, 45 insertions(+), 13 deletions(-)
> 




Re: [Qemu-devel] [PATCH] WHPX fixes an issue with CPUID 1 not returning CPUID_EXT_HYPERVISOR

2018-04-05 Thread Paolo Bonzini
On 28/03/2018 22:48, Justin Terry (VM) wrote:
> 1. (As the code is doing now). At partition creation time you can
> register for specific CPUID exits and then respond to the CPUID with
> your custom answer or with the Hypervisor defaults that were forwarded
> to you. Unfortunately, QEMU has no way to know the Hypervisor default
> ahead of time but QEMU can make at least make a runtime decision about
> how to respond.
>
> 2. At partition creation time the platform allows QEMU to inject (set)
> the default responses for specific CPUID exits.

... but it still cannot access the hypervisor defaults, right?

> This can now be done by
> setting the `WHV_X64_CPUID_RESULT` in the `CpuidResultList` of
> `WHV_PARTITION_PROPERTY` to the exit values QEMU wants. So effectively
> you can know the answers ahead of time for any that you set but the
> answers are not dynamic.
> 
> The only issues/questions I have there are:
> 
> If we use [1] (like the code is now) I don't see any way to keep the
> exits in cpu_x86_cpuid() matched up with the registered exits to WHPX.

You'd have to do that on a case-by-case basis.  For example, the number
of leaves can be the minimum of the hypervisor and QEMU values, or just
the QEMU value; for "feature" leaves the results will be the AND of the
QEMU and WHPX features; for the XSAVE/XSAVEC/XSAVES (size, offset)
tuples you have to use WHPX's; for family/model/stepping/name/vendor
your mileage may vary but I suppose you can just use WHPX's; and so on.

You can take a look at target/i386/cpu.c's array feature_word_info and
add a function like

void x86_is_feature_word(uint32_t eax, uint32_t ecx, int reg)
{
for (w = 0; w < FEATURE_WORDS; w++) {
FeatureWordInfo *wi = _word_info[w];
if (eax == wi->cpuid_eax &&
(ecx == wi->cpuid_ecx || wi->cpuid_needs_ecx) &&
reg == wi->cpuid_reg) {
return true;
}
}
return false;
}

so that the code in the end is

switch (eax) {
/* yadda yadda... code to special case leaves whose value
 * comes from WHPX.
 */
...
default:
if (x86_is_feature_word(eax, ecx, R_EAX)) {
rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRax;
}
if (x86_is_feature_word(eax, ecx, R_EBX)) {
rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRbx;
}
if (x86_is_feature_word(eax, ecx, R_ECX)) {
rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRcx;
}
if (x86_is_feature_word(eax, ecx, R_EDX)) {
rax &= vcpu->exit_ctx.CpuidAccess.DefaultResultRdx;
}
}

> If we use [2] to inject the answers at creation time WHPX needs access
> to the CPUX86State at accel init which also doesn't seem to be possible
> in QEMU today. WHPX could basically just call cpu_x86_cpuid() for each
> CPUID QEMU cares about and plumb the answer before start. This has the
> best performance as we avoid the additional exits but has an issue in
> that the results must be known ahead of time.

The earliest where you have access to that is x86_cpu_initfn.

Paolo



Re: [Qemu-devel] [PATCH for-2.12] memfd: fix vhost-user-test on non-memfd capable host

2018-04-05 Thread Paolo Bonzini
On 28/03/2018 14:18, Marc-André Lureau wrote:
> On RHEL7, memfd is not supported, and vhost-user-test fails:
> TEST: tests/vhost-user-test... (pid=10248)
>   /x86_64/vhost-user/migrate:
>   qemu-system-x86_64: -object memory-backend-memfd,id=mem,size=2M,: failed to 
> create memfd
> FAIL
> 
> There is a qemu_memfd_check() to prevent running memfd path, but it
> also checks for fallback implementation. Let's specialize
> qemu_memfd_check() to check memfd only, while qemu_memfd_alloc_check()
> checks for the qemu_memfd_alloc() API.
> 
> Reported-by: Miroslav Rezanina 
> Tested-by: Miroslav Rezanina 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/qemu/memfd.h |  1 +
>  hw/virtio/vhost.c|  2 +-
>  util/memfd.c | 30 +-
>  3 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu/memfd.h b/include/qemu/memfd.h
> index de10198ed6..49e79634da 100644
> --- a/include/qemu/memfd.h
> +++ b/include/qemu/memfd.h
> @@ -18,6 +18,7 @@
>  
>  int qemu_memfd_create(const char *name, size_t size, bool hugetlb,
>uint64_t hugetlbsize, unsigned int seals, Error 
> **errp);
> +bool qemu_memfd_alloc_check(void);
>  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
> int *fd, Error **errp);
>  void qemu_memfd_free(void *ptr, size_t size, int fd);
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 250f886acb..27c1ec5fe8 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -1223,7 +1223,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
>  if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
>  error_setg(>migration_blocker,
> "Migration disabled: vhost lacks VHOST_F_LOG_ALL 
> feature.");
> -} else if (vhost_dev_log_is_shared(hdev) && !qemu_memfd_check()) {
> +} else if (vhost_dev_log_is_shared(hdev) && 
> !qemu_memfd_alloc_check()) {
>  error_setg(>migration_blocker,
> "Migration disabled: failed to allocate shared 
> memory");
>  }
> diff --git a/util/memfd.c b/util/memfd.c
> index 07d579ea7d..277f7211af 100644
> --- a/util/memfd.c
> +++ b/util/memfd.c
> @@ -173,7 +173,13 @@ enum {
>  MEMFD_TODO
>  };
>  
> -bool qemu_memfd_check(void)
> +/**
> + * qemu_memfd_alloc_check():
> + *
> + * Check if qemu_memfd_alloc() can allocate, including using a
> + * fallback implementation when host doesn't support memfd.
> + */
> +bool qemu_memfd_alloc_check(void)
>  {
>  static int memfd_check = MEMFD_TODO;
>  
> @@ -188,3 +194,25 @@ bool qemu_memfd_check(void)
>  
>  return memfd_check == MEMFD_OK;
>  }
> +
> +/**
> + * qemu_memfd_check():
> + *
> + * Check if host supports memfd.
> + */
> +bool qemu_memfd_check(void)
> +{
> +static int memfd_check = MEMFD_TODO;
> +
> +if (memfd_check == MEMFD_TODO) {
> +int mfd = memfd_create("test", 0);
> +if (mfd >= 0) {
> +memfd_check = MEMFD_OK;
> +close(mfd);
> +} else {
> +memfd_check = MEMFD_KO;
> +}
> +}
> +
> +return memfd_check == MEMFD_OK;
> +}
> 

Queued, thanks.

Paolo



Re: [Qemu-devel] [PATCH 3/3] s390: Do not pass inofficial IPL type to the guest

2018-04-05 Thread Viktor VM Mihajlovski
On 05.04.2018 17:11, David Hildenbrand wrote:
> On 05.04.2018 17:07, Viktor Mihajlovski wrote:
>> IPL over a virtio-scsi device requires special handling not
>> available in the real architecture. For this purpose the IPL
>> type 0xFF has been chosen as means of communication between
>> QEMU and the pc-bios. However, a guest OS could be confused
>> by seeing an unknown IPL type.
>>
>> This change sets the IPL parameter type to 0x02 (CCW) to prevent
>> this. Pre-existing Linux has looked up the IPL parameters only in
>> the case of FCP IPL. This means that the behavior should stay
>> the same even if Linux checks for the IPL type unconditionally.
>>
>> Signed-off-by: Viktor Mihajlovski 
>> ---
>>  pc-bios/s390-ccw/bootmap.c |  7 +++
>>  pc-bios/s390-ccw/iplb.h| 15 +--
>>  2 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
>> index fc2a9fe..9287b7a 100644
>> --- a/pc-bios/s390-ccw/bootmap.c
>> +++ b/pc-bios/s390-ccw/bootmap.c
>> @@ -70,6 +70,13 @@ static void jump_to_IPL_code(uint64_t address)
>>  {
>>  /* store the subsystem information _after_ the bootmap was loaded */
>>  write_subsystem_identification();
>> +
>> +/* prevent unknown IPL types in the guest */
>> +if (iplb.pbt == S390_IPL_TYPE_QEMU_SCSI) {
>> +iplb.pbt = S390_IPL_TYPE_CCW;
>> +set_iplb();
>> +}
> 
> Confused, doesn't this imply that a system reset immediately after this
> instruction will result in something different getting booted?
> 
Not if the other (QEMU) patches of this series are applied. Without
them, the behavior is the same as with an re-ipl (Today's Linux will
always request a CCW re-ipl).
[...]

-- 
Regards,
  Viktor Mihajlovski




Re: [Qemu-devel] [PATCH 3/3] s390: Do not pass inofficial IPL type to the guest

2018-04-05 Thread David Hildenbrand
On 05.04.2018 17:07, Viktor Mihajlovski wrote:
> IPL over a virtio-scsi device requires special handling not
> available in the real architecture. For this purpose the IPL
> type 0xFF has been chosen as means of communication between
> QEMU and the pc-bios. However, a guest OS could be confused
> by seeing an unknown IPL type.
> 
> This change sets the IPL parameter type to 0x02 (CCW) to prevent
> this. Pre-existing Linux has looked up the IPL parameters only in
> the case of FCP IPL. This means that the behavior should stay
> the same even if Linux checks for the IPL type unconditionally.
> 
> Signed-off-by: Viktor Mihajlovski 
> ---
>  pc-bios/s390-ccw/bootmap.c |  7 +++
>  pc-bios/s390-ccw/iplb.h| 15 +--
>  2 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
> index fc2a9fe..9287b7a 100644
> --- a/pc-bios/s390-ccw/bootmap.c
> +++ b/pc-bios/s390-ccw/bootmap.c
> @@ -70,6 +70,13 @@ static void jump_to_IPL_code(uint64_t address)
>  {
>  /* store the subsystem information _after_ the bootmap was loaded */
>  write_subsystem_identification();
> +
> +/* prevent unknown IPL types in the guest */
> +if (iplb.pbt == S390_IPL_TYPE_QEMU_SCSI) {
> +iplb.pbt = S390_IPL_TYPE_CCW;
> +set_iplb();
> +}

Confused, doesn't this imply that a system reset immediately after this
instruction will result in something different getting booted?

> +
>  /*
>   * The IPL PSW is at address 0. We also must not overwrite the
>   * content of non-BIOS memory after we loaded the guest, so we
> diff --git a/pc-bios/s390-ccw/iplb.h b/pc-bios/s390-ccw/iplb.h
> index 7dfce4f..5357a36 100644
> --- a/pc-bios/s390-ccw/iplb.h
> +++ b/pc-bios/s390-ccw/iplb.h
> @@ -97,16 +97,27 @@ extern QemuIplParameters qipl;
>  #define S390_IPL_TYPE_CCW 0x02
>  #define S390_IPL_TYPE_QEMU_SCSI 0xff
>  
> -static inline bool store_iplb(IplParameterBlock *iplb)
> +static inline bool manage_iplb(IplParameterBlock *iplb, bool store)
>  {
>  register unsigned long addr asm("0") = (unsigned long) iplb;
>  register unsigned long rc asm("1") = 0;
>  
>  asm volatile ("diag %0,%2,0x308\n"
>: "+d" (addr), "+d" (rc)
> -  : "d" (6)
> +  : "d" (store ? 6 : 5)
>: "memory", "cc");
>  return rc == 0x01;
>  }
>  
> +
> +static inline bool store_iplb(IplParameterBlock *iplb)
> +{
> +return manage_iplb(iplb, true);
> +}
> +
> +static inline bool set_iplb(IplParameterBlock *iplb)
> +{
> +return manage_iplb(iplb, false);
> +}
> +
>  #endif /* IPLB_H */
> 


-- 

Thanks,

David / dhildenb



[Qemu-devel] [PATCH 1/3] s390: Refactor IPL parameter block generation

2018-04-05 Thread Viktor Mihajlovski
Splitting out the the CCW device extraction allows reuse.

Signed-off-by: Viktor Mihajlovski 
---
 hw/s390x/ipl.c | 81 --
 1 file changed, 51 insertions(+), 30 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index fdeaec3..58e33c5 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -279,44 +279,52 @@ static void s390_ipl_set_boot_menu(S390IPLState *ipl)
 *timeout = cpu_to_be32(splash_time);
 }
 
+static CcwDevice *s390_get_ccw_device(DeviceState *dev_st)
+{
+CcwDevice *ccw_dev = NULL;
+
+if (dev_st) {
+VirtioCcwDevice *virtio_ccw_dev = (VirtioCcwDevice *)
+object_dynamic_cast(OBJECT(qdev_get_parent_bus(dev_st)->parent),
+TYPE_VIRTIO_CCW_DEVICE);
+if (virtio_ccw_dev) {
+ccw_dev = CCW_DEVICE(virtio_ccw_dev);
+} else {
+SCSIDevice *sd = (SCSIDevice *)
+object_dynamic_cast(OBJECT(dev_st),
+TYPE_SCSI_DEVICE);
+if (sd) {
+SCSIBus *bus = scsi_bus_from_device(sd);
+VirtIOSCSI *vdev = container_of(bus, VirtIOSCSI, bus);
+VirtIOSCSICcw *scsi_ccw = container_of(vdev, VirtIOSCSICcw,
+   vdev);
+
+ccw_dev = (CcwDevice *)object_dynamic_cast(OBJECT(scsi_ccw),
+   TYPE_CCW_DEVICE);
+}
+}
+}
+return ccw_dev;
+}
+
 static bool s390_gen_initial_iplb(S390IPLState *ipl)
 {
 DeviceState *dev_st;
+CcwDevice *ccw_dev = NULL;
 
 dev_st = get_boot_device(0);
 if (dev_st) {
-VirtioCcwDevice *virtio_ccw_dev = (VirtioCcwDevice *)
-object_dynamic_cast(OBJECT(qdev_get_parent_bus(dev_st)->parent),
-TYPE_VIRTIO_CCW_DEVICE);
+ccw_dev = s390_get_ccw_device(dev_st);
+}
+
+/*
+ * Currently allow IPL only from CCW devices.
+ */
+if (ccw_dev) {
 SCSIDevice *sd = (SCSIDevice *) object_dynamic_cast(OBJECT(dev_st),
 TYPE_SCSI_DEVICE);
-VirtIONet *vn = (VirtIONet *) object_dynamic_cast(OBJECT(dev_st),
-  TYPE_VIRTIO_NET);
-
-if (vn) {
-ipl->netboot = true;
-}
-if (virtio_ccw_dev) {
-CcwDevice *ccw_dev = CCW_DEVICE(virtio_ccw_dev);
-
-ipl->iplb.len = cpu_to_be32(S390_IPLB_MIN_CCW_LEN);
-ipl->iplb.blk0_len =
-cpu_to_be32(S390_IPLB_MIN_CCW_LEN - S390_IPLB_HEADER_LEN);
-ipl->iplb.pbt = S390_IPL_TYPE_CCW;
-ipl->iplb.ccw.devno = cpu_to_be16(ccw_dev->sch->devno);
-ipl->iplb.ccw.ssid = ccw_dev->sch->ssid & 3;
-} else if (sd) {
-SCSIBus *bus = scsi_bus_from_device(sd);
-VirtIOSCSI *vdev = container_of(bus, VirtIOSCSI, bus);
-VirtIOSCSICcw *scsi_ccw = container_of(vdev, VirtIOSCSICcw, vdev);
-CcwDevice *ccw_dev;
-
-ccw_dev = (CcwDevice *)object_dynamic_cast(OBJECT(scsi_ccw),
-   TYPE_CCW_DEVICE);
-if (!ccw_dev) {   /* It might be a PCI device instead */
-return false;
-}
 
+if (sd) {
 ipl->iplb.len = cpu_to_be32(S390_IPLB_MIN_QEMU_SCSI_LEN);
 ipl->iplb.blk0_len =
 cpu_to_be32(S390_IPLB_MIN_QEMU_SCSI_LEN - 
S390_IPLB_HEADER_LEN);
@@ -327,12 +335,25 @@ static bool s390_gen_initial_iplb(S390IPLState *ipl)
 ipl->iplb.scsi.devno = cpu_to_be16(ccw_dev->sch->devno);
 ipl->iplb.scsi.ssid = ccw_dev->sch->ssid & 3;
 } else {
-return false; /* unknown device */
+VirtIONet *vn = (VirtIONet *) object_dynamic_cast(OBJECT(dev_st),
+  TYPE_VIRTIO_NET);
+
+ipl->iplb.len = cpu_to_be32(S390_IPLB_MIN_CCW_LEN);
+ipl->iplb.blk0_len =
+cpu_to_be32(S390_IPLB_MIN_CCW_LEN - S390_IPLB_HEADER_LEN);
+ipl->iplb.pbt = S390_IPL_TYPE_CCW;
+ipl->iplb.ccw.devno = cpu_to_be16(ccw_dev->sch->devno);
+ipl->iplb.ccw.ssid = ccw_dev->sch->ssid & 3;
+
+if (vn) {
+ipl->netboot = true;
+}
 }
 
 if (!s390_ipl_set_loadparm(ipl->iplb.loadparm)) {
 ipl->iplb.flags |= DIAG308_FLAGS_LP_VALID;
 }
+
 return true;
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 3/3] s390: Do not pass inofficial IPL type to the guest

2018-04-05 Thread Viktor Mihajlovski
IPL over a virtio-scsi device requires special handling not
available in the real architecture. For this purpose the IPL
type 0xFF has been chosen as means of communication between
QEMU and the pc-bios. However, a guest OS could be confused
by seeing an unknown IPL type.

This change sets the IPL parameter type to 0x02 (CCW) to prevent
this. Pre-existing Linux has looked up the IPL parameters only in
the case of FCP IPL. This means that the behavior should stay
the same even if Linux checks for the IPL type unconditionally.

Signed-off-by: Viktor Mihajlovski 
---
 pc-bios/s390-ccw/bootmap.c |  7 +++
 pc-bios/s390-ccw/iplb.h| 15 +--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/pc-bios/s390-ccw/bootmap.c b/pc-bios/s390-ccw/bootmap.c
index fc2a9fe..9287b7a 100644
--- a/pc-bios/s390-ccw/bootmap.c
+++ b/pc-bios/s390-ccw/bootmap.c
@@ -70,6 +70,13 @@ static void jump_to_IPL_code(uint64_t address)
 {
 /* store the subsystem information _after_ the bootmap was loaded */
 write_subsystem_identification();
+
+/* prevent unknown IPL types in the guest */
+if (iplb.pbt == S390_IPL_TYPE_QEMU_SCSI) {
+iplb.pbt = S390_IPL_TYPE_CCW;
+set_iplb();
+}
+
 /*
  * The IPL PSW is at address 0. We also must not overwrite the
  * content of non-BIOS memory after we loaded the guest, so we
diff --git a/pc-bios/s390-ccw/iplb.h b/pc-bios/s390-ccw/iplb.h
index 7dfce4f..5357a36 100644
--- a/pc-bios/s390-ccw/iplb.h
+++ b/pc-bios/s390-ccw/iplb.h
@@ -97,16 +97,27 @@ extern QemuIplParameters qipl;
 #define S390_IPL_TYPE_CCW 0x02
 #define S390_IPL_TYPE_QEMU_SCSI 0xff
 
-static inline bool store_iplb(IplParameterBlock *iplb)
+static inline bool manage_iplb(IplParameterBlock *iplb, bool store)
 {
 register unsigned long addr asm("0") = (unsigned long) iplb;
 register unsigned long rc asm("1") = 0;
 
 asm volatile ("diag %0,%2,0x308\n"
   : "+d" (addr), "+d" (rc)
-  : "d" (6)
+  : "d" (store ? 6 : 5)
   : "memory", "cc");
 return rc == 0x01;
 }
 
+
+static inline bool store_iplb(IplParameterBlock *iplb)
+{
+return manage_iplb(iplb, true);
+}
+
+static inline bool set_iplb(IplParameterBlock *iplb)
+{
+return manage_iplb(iplb, false);
+}
+
 #endif /* IPLB_H */
-- 
1.9.1




[Qemu-devel] [PATCH 2/3] s390: Ensure IPL from SCSI works as expected

2018-04-05 Thread Viktor Mihajlovski
Operating systems may request an IPL from a virtio-scsi device
by specifying an IPL parameter type of CCW. In this case QEMU
won't set up the IPLB correctly. The BIOS will still detect
it's a SCSI device to boot from, but it will now have to search
for the first LUN and attempt to boot from there.
However this may not be the original boot LUN if there's more than
one SCSI disk attached to the HBA.

With this change QEMU will detect that the request is for a
SCSI device and will rebuild the initial IPL parameter info
if it's the SCSI device used for the first boot. In consequence
the BIOS can use the boot LUN from the IPL information block.

In case a different SCSI device has been set, the BIOS will find
and use the first available LUN.

Signed-off-by: Viktor Mihajlovski 
---
 hw/s390x/ipl.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 58e33c5..fb554ab 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -427,7 +427,8 @@ unref_mr:
 return img_size;
 }
 
-static bool is_virtio_net_device(IplParameterBlock *iplb)
+static bool is_virtio_ccw_device_of_type(IplParameterBlock *iplb,
+ int virtio_id)
 {
 uint8_t cssid;
 uint8_t ssid;
@@ -447,13 +448,23 @@ static bool is_virtio_net_device(IplParameterBlock *iplb)
 sch = css_find_subch(1, cssid, ssid, schid);
 
 if (sch && sch->devno == devno) {
-return sch->id.cu_model == VIRTIO_ID_NET;
+return sch->id.cu_model == virtio_id;
 }
 }
 }
 return false;
 }
 
+static bool is_virtio_net_device(IplParameterBlock *iplb)
+{
+return is_virtio_ccw_device_of_type(iplb, VIRTIO_ID_NET);
+}
+
+static bool is_virtio_scsi_device(IplParameterBlock *iplb)
+{
+return is_virtio_ccw_device_of_type(iplb, VIRTIO_ID_SCSI);
+}
+
 void s390_ipl_update_diag308(IplParameterBlock *iplb)
 {
 S390IPLState *ipl = get_ipl_device();
@@ -478,6 +489,22 @@ void s390_reipl_request(void)
 S390IPLState *ipl = get_ipl_device();
 
 ipl->reipl_requested = true;
+if (ipl->iplb_valid &&
+!ipl->netboot &&
+ipl->iplb.pbt == S390_IPL_TYPE_CCW &&
+is_virtio_scsi_device(>iplb)) {
+CcwDevice *ccw_dev = s390_get_ccw_device(get_boot_device(0));
+
+if (ccw_dev &&
+cpu_to_be16(ccw_dev->sch->devno) == ipl->iplb.ccw.devno &&
+(ccw_dev->sch->ssid & 3) == ipl->iplb.ccw.ssid) {
+/*
+ * this is the original boot device's SCSI
+ * so restore IPL parameter info from it
+ */
+ipl->iplb_valid = s390_gen_initial_iplb(ipl);
+}
+}
 qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 }
 
-- 
1.9.1




[Qemu-devel] [PATCH 0/3] s390: Fix virtio-scsi IPL quirks

2018-04-05 Thread Viktor Mihajlovski
IPL from virtio-scsi currently uses a non-standard parameter
type definition to pass boot parameters from QEMU to the
BIOS.

There are two potential issues with this approach:
o If the guest operating systems requests a re-ipl of type CCW
  where the boot device is a virtio-scsi HBA, this goes unnoticed
  by QEMU. The BIOS will detect that it's IPLing from a SCSI
  device, but it will boot the first LUN found, which might not
  be the one used for the initial boot.
o The guest operating system can be confused by an unknown
  IPL parameter block type. If the OS hasn't previously used 
  diag308 to store the IPL info but is changed to do so, a
  user-observable change in behavior will happen.

The following patches address the issues above. 

Viktor Mihajlovski (3):
  s390: Refactor IPL parameter block generation
  s390: Ensure IPL from SCSI works as expected
  s390: Do not pass inofficial IPL type to the guest

 hw/s390x/ipl.c | 112 -
 pc-bios/s390-ccw/bootmap.c |   7 +++
 pc-bios/s390-ccw/iplb.h|  15 +-
 3 files changed, 100 insertions(+), 34 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH for 2.13 0/2] target/ppc: Support adding memory to initially memory-less NUMA nodes

2018-04-05 Thread Serhii Popovych
Now PowerPC Linux kernel supports hot-add to NUMA nodes not populated
initially with memory we can enable such support in qemu. This requires
two changes:

  o Add device tree property "ibm,max-associativity-domains" to let
guest kernel chance to find max possible NUMA node

  o Revert  commit b556854bd852 ("spapr: Don't allow memory hotplug to
memory less nodes") to remove check for hot-add to memory-less node.

See description messges for individual changes for more details.

Serhii Popovych (2):
  Revert "spapr: Don't allow memory hotplug to memory less nodes"
  spapr: Add ibm,max-associativity-domains property

 hw/ppc/spapr.c | 33 +++--
 1 file changed, 11 insertions(+), 22 deletions(-)

-- 
1.8.3.1




[Qemu-devel] [PATCH for 2.13 2/2] spapr: Add ibm, max-associativity-domains property

2018-04-05 Thread Serhii Popovych
Now recent kernels (i.e. since linux-stable commit a346137e9142
("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes")
support this property to mark initially memory-less NUMA nodes as "possible"
to allow further memory hot-add to them.

Advertise this property for pSeries machines to let guest kernels detect
maximum supported node configuration and benefit from kernel side change
when hot-add memory to specific, possibly empty before, NUMA node.

Signed-off-by: Serhii Popovych 
---
 hw/ppc/spapr.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3ad4545..e02fc94 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -909,6 +909,14 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void 
*fdt)
 0, cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE),
 cpu_to_be32(max_cpus / smp_threads),
 };
+uint32_t maxdomains[] = {
+cpu_to_be32(5),
+cpu_to_be32(0),
+cpu_to_be32(0),
+cpu_to_be32(0),
+cpu_to_be32(nb_numa_nodes - 1),
+cpu_to_be32(max_cpus - 1),
+};
 
 _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
 
@@ -945,6 +953,9 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void 
*fdt)
 _FDT(fdt_setprop(fdt, rtas, "ibm,associativity-reference-points",
  refpoints, sizeof(refpoints)));
 
+_FDT(fdt_setprop(fdt, rtas, "ibm,max-associativity-domains",
+ maxdomains, sizeof(maxdomains)));
+
 _FDT(fdt_setprop_cell(fdt, rtas, "rtas-error-log-max",
   RTAS_ERROR_LOG_MAX));
 _FDT(fdt_setprop_cell(fdt, rtas, "rtas-event-scan-rate",
-- 
1.8.3.1




[Qemu-devel] [PATCH for 2.13 1/2] Revert "spapr: Don't allow memory hotplug to memory less nodes"

2018-04-05 Thread Serhii Popovych
This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.

Leave change @node type from uint32_t to to int from reverted commit
because node < 0 is always false.

Signed-off-by: Serhii Popovych 
---
 hw/ppc/spapr.c | 22 --
 1 file changed, 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2c0be8c..3ad4545 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3477,28 +3477,6 @@ static void spapr_machine_device_plug(HotplugHandler 
*hotplug_dev,
 return;
 }
 
-/*
- * Currently PowerPC kernel doesn't allow hot-adding memory to
- * memory-less node, but instead will silently add the memory
- * to the first node that has some memory. This causes two
- * unexpected behaviours for the user.
- *
- * - Memory gets hotplugged to a different node than what the user
- *   specified.
- * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
- *   to memory-less node, a reboot will set things accordingly
- *   and the previously hotplugged memory now ends in the right node.
- *   This appears as if some memory moved from one node to another.
- *
- * So until kernel starts supporting memory hotplug to memory-less
- * nodes, just prevent such attempts upfront in QEMU.
- */
-if (nb_numa_nodes && !numa_info[node].node_mem) {
-error_setg(errp, "Can't hotplug memory to memory-less node %d",
-   node);
-return;
-}
-
 spapr_memory_plug(hotplug_dev, dev, node, errp);
 } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
 spapr_core_plug(hotplug_dev, dev, errp);
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH for-2.13 0/2] net: Get rid of 'vlan' terminology and use 'hub' instead

2018-04-05 Thread Stefan Hajnoczi
On Wed, Apr 04, 2018 at 05:55:57PM +0200, Thomas Huth wrote:
> On 04.04.2018 17:49, Paolo Bonzini wrote:
> > On 04/04/2018 17:33, Thomas Huth wrote:
> >> The 'vlan' term caused a lot of confusion in the past, so let's try
> >> to switch to the better word "hub" everywhere where it is appropriate.
> >> (Note: The CLI parameter "vlan=xxx" still remains as is to not break the
> >> compatibility with older versions)
> >>
> >> Thomas Huth (2):
> >>   net: Get rid of 'vlan' terminology and use 'hub' instead in the source
> >> files
> >>   net: Get rid of 'vlan' terminology and use 'hub' instead in the doc
> >> files
> > 
> > Bold proposal: remove 'vlan' property/suboption in 2.13, so that we can
> > remove the "aka"s altogether.  Yes, it's early, but even though I've
> > occasionally used vlans I've never used more than one...
> 
> Fine for me, too - if we all feel confident enough already for that. The
> "vlan=..." parameter of "-net" has been declared as deprecated since
> QEMU v2.9, so unless somebody sees an urgent need to still keep it, I
> can add another patch to this series to remove it ("-net" will then only
> work with the default hub #0).

Sound good to me.

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-arm] [PATCH v2 2/2] arm_gicv3_kvm: kvm_dist_get/put: skip the registers banked by GICR

2018-04-05 Thread Peter Maydell
On 29 March 2018 at 11:54, Peter Maydell  wrote:
> On 23 March 2018 at 12:08, Peter Maydell  wrote:
>> On 21 March 2018 at 08:00, Shannon Zhao  wrote:
>>> On 2018/3/20 19:54, Peter Maydell wrote:
 Can you still successfully migrate a VM from a QEMU version
 without this bugfix to one with the bugfix ?

>>> I've tested this case. I can migrate a VM between these two versions.
>>
>> Hmm. Looking at the code I can't see how that would work,
>> except by accident. Let me see if I understand what's happening
>> here:

> I was thinking a bit more about how to handle this, and
> my best idea was:
>
> (1) send something in the migration stream that says
> "I don't have this bug" (version number change?
> vmstate field that's just a "no bug" flag? subsection
> with no contents?)
>
> (2) on the destination, if the source doesn't tell us
> it doesn't have this bug, and we are running KVM, then
> shift all the data in the arrays down to fix it up
> [Strictly what we want to know is if the source is
> running KVM, not if the destination is, but I don't
> know of a way to find that out, and in practice TCG->KVM
> migrations don't work anyway, so it's not a big deal.]

Shannon, are you planning to look at this for 2.12, or should
we postpone it to 2.13? (It's not a regression, right? So
we don't necessarily have to urgently fix it for 2.12.)

thanks
-- PMM



Re: [Qemu-devel] -icount changes physical address assignments in QEMU 2.10/2.11

2018-04-05 Thread Peter Maydell
On 22 March 2018 at 05:31,   wrote:
> Your patch (applied to 2.11 source release) changed the behavior
> somewhat, but did not fix the problem.  Attached is a binary that when
> run should show a CGA fontset and color bars.
>
> This command should "work":
>
> qemu-system-aarch64 -M virt,virtualization=on -cpu cortex-a53 -vga std
> -device secondary-vga -device virtio-net,netdev=vlan0,addr=2 -kernel
> icount-bug.bin -netdev user,id=vlan0
>
> If you add "-icount 2" the display will appear, but be mangled.
>
> I didn't spend too much time trimming the source code, so if you need
> to step by step debug walking through the guest code, I'll have to
> prune it down some more.
>
> For the record, the QEMU source I have is modified slightly to add ARM WFE
> support, something I will submit once this is all straightened out,
> but this bug appeared before I made that patch.

Hi; sorry for the delay in testing this. I've just tried your
attached test image with current head of git QEMU (commit
0e87fdc966d05f4e5ad86, which is the 2.12.0-rc2 release candidate),
and it gives me a correct display both with and without -icount 2.
So I think we've fixed this bug, probably with the combination
of commits 0790f86861079b19 and 87f963be66a3245, or possibly
a75a52d62418da.

thanks
-- PMM



Re: [Qemu-devel] [Qemu-ppc] [PATCHv2 for-2.13 2/2] Add host_memory_backend_pagesize() helper

2018-04-05 Thread Greg Kurz
On Thu,  5 Apr 2018 12:20:02 +1000
David Gibson  wrote:

> There are a couple places (one generic, one target specific) where we need
> to get the host page size associated with a particular memory backend.  I
> have some upcoming code which will add another place which wants this.  So,
> for convenience, add a helper function to calculate this.
> 
> host_memory_backend_pagesize() returns the host pagesize for a given
> HostMemoryBackend object.
> 
> Signed-off-by: David Gibson 
> ---

Reviewed-by: Greg Kurz 

>  backends/hostmem.c   | 11 +++
>  exec.c   |  5 ++---
>  include/sysemu/hostmem.h |  2 ++
>  target/ppc/kvm.c |  6 +-
>  4 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index f61093654e..ff2bb0489c 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -18,6 +18,7 @@
>  #include "qapi/visitor.h"
>  #include "qemu/config-file.h"
>  #include "qom/object_interfaces.h"
> +#include "qemu/mmap-alloc.h"
>  
>  #ifdef CONFIG_NUMA
>  #include 
> @@ -262,6 +263,16 @@ bool host_memory_backend_is_mapped(HostMemoryBackend 
> *backend)
>  return backend->is_mapped;
>  }
>  
> +size_t host_memory_backend_pagesize(HostMemoryBackend *memdev)
> +{
> +Object *obj = OBJECT(memdev);
> +char *path = object_property_get_str(obj, "mem-path", NULL);
> +size_t pagesize = qemu_mempath_getpagesize(path);
> +
> +g_free(path);
> +return pagesize;
> +}
> +
>  static void
>  host_memory_backend_memory_complete(UserCreatable *uc, Error **errp)
>  {
> diff --git a/exec.c b/exec.c
> index b38b004563..c7fcefa851 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1491,9 +1491,8 @@ static int find_max_supported_pagesize(Object *obj, 
> void *opaque)
>  long *hpsize_min = opaque;
>  
>  if (object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)) {
> -char *mem_path = object_property_get_str(obj, "mem-path", NULL);
> -long hpsize = qemu_mempath_getpagesize(mem_path);
> -g_free(mem_path);
> +long hpsize = host_memory_backend_pagesize(MEMORY_BACKEND(obj));
> +
>  if (hpsize < *hpsize_min) {
>  *hpsize_min = hpsize;
>  }
> diff --git a/include/sysemu/hostmem.h b/include/sysemu/hostmem.h
> index 47bc9846ac..bc36899bb8 100644
> --- a/include/sysemu/hostmem.h
> +++ b/include/sysemu/hostmem.h
> @@ -68,4 +68,6 @@ MemoryRegion 
> *host_memory_backend_get_memory(HostMemoryBackend *backend,
>  
>  void host_memory_backend_set_mapped(HostMemoryBackend *backend, bool mapped);
>  bool host_memory_backend_is_mapped(HostMemoryBackend *backend);
> +size_t host_memory_backend_pagesize(HostMemoryBackend *memdev);
> +
>  #endif
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index f393eae127..1ab9cf3a8a 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -493,11 +493,7 @@ static void kvm_fixup_page_sizes(PowerPCCPU *cpu)
>  bool kvmppc_is_mem_backend_page_size_ok(const char *obj_path)
>  {
>  Object *mem_obj = object_resolve_path(obj_path, NULL);
> -char *mempath = object_property_get_str(mem_obj, "mem-path", NULL);
> -long pagesize;
> -
> -pagesize = qemu_mempath_getpagesize(mempath);
> -g_free(mempath);
> +long pagesize = host_memory_backend_pagesize(MEMORY_BACKEND(mem_obj));
>  
>  return pagesize >= max_cpu_page_size;
>  }




  1   2   >