date:20160302

Re: [Qemu-devel] [PATCH v7 5/6] s390x/cpu: Add error handling to cpu creation

2016-03-02 Thread David Hildenbrand

> >> +static void s390_cpu_get_id(Object *obj, Visitor *v, const char *name,
> >> +void *opaque, Error **errp)
> >> +{
> >> +S390CPU *cpu = S390_CPU(obj);
> >> +int64_t value = cpu->id;
> >> +
> >> +visit_type_int(v, name, &value, errp);
> >> +}
> >> +
> >> +static void s390_cpu_set_id(Object *obj, Visitor *v, const char *name,
> >> +void *opaque, Error **errp)
> >> +{
> >> +S390CPU *cpu = S390_CPU(obj);
> >> +DeviceState *dev = DEVICE(obj);
> >> +const int64_t min = 0;
> >> +const int64_t max = UINT32_MAX;
> >> +Error *local_err = NULL;
> >> +int64_t value;
> >> +
> >> +if (dev->realized) {
> >> +error_setg(errp, "Attempt to set property '%s' on '%s' after "
> >> +   "it was realized", name, object_get_typename(obj));
> >> +return;
> >> +}
> >> +
> >> +visit_type_int(v, name, &value, &local_err);
> >> +if (local_err) {
> >> +error_propagate(errp, local_err);
> >> +return;
> >> +}
> >> +if (value < min || value > max) {
> >> +error_setg(errp, "Property %s.%s doesn't take value %" PRId64
> >> +   " (minimum: %" PRId64 ", maximum: %" PRId64 ")" ,
> >> +   object_get_typename(obj), name, value, min, max);
> >> +return;
> >> +}
> >> +if ((value != cpu->id) && cpu_exists(value)) {
> >> +error_setg(errp, "CPU with ID %" PRIi64 " exists", value);
> >> +return;
> >> +}
> >> +cpu->id = value;
> >> +}  
> > 
> > Just curious, what about using a simple
> > 
> > object_property_set_int() and doing all the checks in realize() ?
> > 
> > Then we could live without manual getter/setter (and without the realize 
> > check).
> >   
> 
> I think we still need at least a manual setter, even if you want to move
> the checks to realize.
> 
> See something like object_property_add_uint64_ptr() -- It sets a
> boilerplate get routine, and no set routine -- I think this presumes you
> set your property upfront (at add time), never change it for the life of
> the object, but want to read it later.
> By comparison, S390CPU.id is set sometime after instance_init, based on
> input.
> 
> So, we call object_property_set_int() to update it --  This just passes
> the provided int value to the setter routine associated with the
> property.  If one doesn't exist, you get:
> qemu: Insufficient permission to perform this operation
> 
> I think this is also why we want to check for dev->realized in the
> setter routine, to make sure the property is not being changed "too
> late" -- Once the cpu is realized, the ID is baked and can't be changed.
> 
> Or did I misunderstand your idea here?

If we care about malicious users wanting to set IDs after realize, that is
true. But I am no QOM expert and don't know if that is a scenario that
has to be taken care of. As I see similar code for other properties,
I assume we are better off doing it that way as well.

David
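
The setter pattern discussed in this thread (refuse writes once the device is realized, then range-check the value) can be sketched outside QEMU as follows. This is a minimal illustration, not the patch's code: `MockCPU`, `mock_cpu_set_id()` and the `SET_ID_*` result codes are hypothetical stand-ins for `S390CPU`, the QOM setter and `Error **` reporting, and the uniqueness check done by `cpu_exists()` is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a QOM device; not the real S390CPU. */
typedef struct MockCPU {
    bool realized;   /* set once the device has been realized */
    int64_t id;      /* the property being guarded */
} MockCPU;

/* Hypothetical result codes standing in for Error ** reporting. */
enum SetIdResult {
    SET_ID_OK = 0,
    SET_ID_ALREADY_REALIZED,
    SET_ID_OUT_OF_RANGE
};

/* Reject late writes first, then validate the range, then store. */
enum SetIdResult mock_cpu_set_id(MockCPU *cpu, int64_t value)
{
    if (cpu->realized) {
        return SET_ID_ALREADY_REALIZED;
    }
    if (value < 0 || value > UINT32_MAX) {
        return SET_ID_OUT_OF_RANGE;
    }
    cpu->id = value;
    return SET_ID_OK;
}
```

Keeping the realized check inside the setter means a late write is refused at the point where it happens, which is the behaviour the thread settles on.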

Re: [Qemu-devel] [PATCH v3] net: check packet payload length

2016-03-02 Thread Jason Wang



On 03/02/2016 07:59 PM, P J P wrote:
> From: Prasad J Pandit 
>
> While computing IP checksum, 'net_checksum_calculate' reads
> payload length from the packet. It could exceed the given 'data'
> buffer size. Add a check to avoid it.
>
> Reported-by: Liu Ling 
> Signed-off-by: Prasad J Pandit 
> ---
>  net/checksum.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> Update as per review:
>   -> https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg06121.html
>
> diff --git a/net/checksum.c b/net/checksum.c
> index 14c0855..0942437 100644
> --- a/net/checksum.c
> +++ b/net/checksum.c
> @@ -59,6 +59,11 @@ void net_checksum_calculate(uint8_t *data, int length)
>  int hlen, plen, proto, csum_offset;
>  uint16_t csum;
>  
> +/* Ensure data has complete L2 & L3 headers. */
> +if (length < 14 + 20) {
> +return;
> +}
> +
>  if ((data[14] & 0xf0) != 0x40)
>   return; /* not IPv4 */
>  hlen  = (data[14] & 0x0f) * 4;
> @@ -76,8 +81,9 @@ void net_checksum_calculate(uint8_t *data, int length)
>   return;
>  }
>  
> -if (plen < csum_offset+2)
> - return;
> +if (plen < csum_offset + 2 || 14 + hlen + plen > length) {
> +return;
> +}
>  
>  data[14+hlen+csum_offset]   = 0;
>  data[14+hlen+csum_offset+1] = 0;

Applied to -net.

Thanks
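
The two length checks added by the patch above can be exercised in isolation. The following is a minimal sketch, assuming an Ethernet frame that carries IPv4; `ipv4_lengths_fit()` is a hypothetical helper name, and the extra `hlen < 20` test is an addition of this sketch rather than part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true only if a frame of `length` bytes fully contains the
 * Ethernet header (14 bytes), the IPv4 header and the payload that the
 * IPv4 header claims to carry.
 */
bool ipv4_lengths_fit(const uint8_t *data, int length)
{
    int hlen, plen;

    /* Ensure data has complete L2 and minimal L3 headers. */
    if (length < 14 + 20) {
        return false;
    }
    if ((data[14] & 0xf0) != 0x40) {
        return false; /* not IPv4 */
    }
    hlen = (data[14] & 0x0f) * 4;
    plen = ((data[16] << 8) | data[17]) - hlen; /* total length minus header */
    if (hlen < 20 || plen < 0) {
        return false; /* malformed header (sketch-only check) */
    }
    /* The claimed payload must not run past the end of the buffer. */
    return 14 + hlen + plen <= length;
}
```

A caller would run this before touching `data[14 + hlen + csum_offset]`, so that a packet claiming more payload than the buffer holds is skipped rather than read out of bounds.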

Re: [Qemu-devel] [PATCH V2 1/3] net/filter-traffic: add filter-traffic.h

2016-03-02 Thread Jason Wang



On 03/03/2016 03:20 PM, Zhang Chen wrote:
>
>
> On 03/03/2016 02:48 PM, Jason Wang wrote:
>>
>> On 03/02/2016 03:25 PM, Zhang Chen wrote:
>>>
>>> On 03/02/2016 02:19 PM, Jason Wang wrote:
>>>> On 02/29/2016 08:23 PM, Zhang Chen wrote:
>>>>> We can reuse filter-traffic by filter-mirror,
>>>>> filter-redirector and so on.
>>>> I think we could share more than this. E.g just use filter-mirror.c to
>>>> implement both mirror and redirector.
>>> OK, should we change the name to filter-mirror? like filter-traffic?
>>> and add redirect=on/off
>>>
>>> -object
>>> filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,indev=redirect0,redirect=on,outdev=mirror0
>>>
>>>
>> I think we'd better use another type for this.
>>
>
> What is the "another type " ?
> please tell me the detail for this.

E.g. register another type like "filter-redirector" in register_types()
of filter-mirror.c.

Thanks

>
> Thanks
> zhangchen
>
>>
>>
>

Re: [Qemu-devel] [PATCH V2 1/3] net/filter-traffic: add filter-traffic.h

2016-03-02 Thread Zhang Chen




On 03/03/2016 02:48 PM, Jason Wang wrote:
>
> On 03/02/2016 03:25 PM, Zhang Chen wrote:
>>
>> On 03/02/2016 02:19 PM, Jason Wang wrote:
>>> On 02/29/2016 08:23 PM, Zhang Chen wrote:
>>>> We can reuse filter-traffic by filter-mirror,
>>>> filter-redirector and so on.
>>> I think we could share more than this. E.g just use filter-mirror.c to
>>> implement both mirror and redirector.
>> OK, should we change the name to filter-mirror? like filter-traffic?
>> and add redirect=on/off
>>
>> -object
>> filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,indev=redirect0,redirect=on,outdev=mirror0
>>
> I think we'd better use another type for this.
>

What is the "another type " ?
please tell me the detail for this.

Thanks
zhangchen

--
Thanks
zhangchen

Re: [Qemu-devel] [PATCH v2 11/19] qapi: Add type.is_empty() helper

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> On 03/02/2016 12:04 PM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> And use it in qapi-types and qapi-event.  Down the road, we may
>>> want to lift our artificial restriction of no variants at the
>>> top level of an event, at which point, inlining our check for
>>> whether members is empty will no longer be sufficient, but
>>> adding a check for variants adds verbosity; in the meantime,
>>> add some asserts in places where we don't handle variants.
>> 
>> Perhaps I'm just running out of steam for today, but I've read this
>> twice, and still don't get why adding these assertions goes in the same
>> patch as adding the helper, or what it has to do with events.
>
> Okay, will split this into two patches for v3.

A better commit message might do.  Use your judgement.

Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Wed, Mar 2, 2016 at 8:35 PM, Markus Armbruster  wrote:
>> You know, I'd prefer that, too, and I've argued for it unsuccessfully.
>> As it is, we fairly consistently return void when the function returns
>> errors through Error ** and has no non-error value.
>
> Good to know we are on same side.
>
>>>>  {
>>>>  PCIDevice *pdev = PCI_DEVICE(s);
>>>>  MSIMessage msg = msix_get_message(pdev, vector);
>>>> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>>>>
>>>>  ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>>>>  if (ret < 0) {
>>>> -error_report("ivshmem: kvm_irqchip_add_msi_route failed");
>>>> -return -1;
>>>> +error_setg(errp, "kvm_irqchip_add_msi_route failed");
>>>> +return;
>>>>  }
>>>>
>>>>  s->msi_vectors[vector].virq = ret;
>>>>  s->msi_vectors[vector].pdev = pdev;
>>>> -
>>>> -return 0;
>>>>  }
>>>>
>>>> -static void setup_interrupt(IVShmemState *s, int vector)
>>>> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>>>>  {
>>>>  EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>>>>  bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>>>>  ivshmem_has_feature(s, IVSHMEM_MSI);
>>>>  PCIDevice *pdev = PCI_DEVICE(s);
>>>> +Error *err = NULL;
>>>>
>>>>  IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>>>>
>>>> @@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int vector)
>>>>  watch_vector_notifier(s, n, vector);
>>>>  } else if (msix_enabled(pdev)) {
>>>>  IVSHMEM_DPRINTF("with irqfd\n");
>>>> -if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
>>>> +ivshmem_add_kvm_msi_virq(s, vector, &err);
>>>> +if (err) {
>>>> +error_propagate(errp, err);
>>>>  return;
>>>
>>> That would make this simpler, avoiding local err variables, and
>>> propagate. But you seem to prefer that form. I don't know if there is
>>> any conventions (I am used to glib conventions, and usually a bool
>>> success is returned, even if the function takes a GError)
>>
>> Does GLib spell out this convention somewhere?
>
> For glib, there is a paragraph about return bool/GError conventions
> (which is usually adapted to other return type):
> https://developer.gnome.org/glib/unstable/glib-Error-Reporting.html

While I can't see a hard-and-fast rule there, the text clearly shows a
strong preference for making the function value a reliable error
indicator whenever possible.

Thanks!

>>
>> I can perhaps try to cook up a patch to demonstrate the advantages of
>> returning a success/failure value even with Error **, and try to get our
>> convention changed.
>>
>> Until then, we better stick to the existing convention, whether we like
>> it or not.
>
> ok
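
The trade-off debated in this thread (a void function that reports only through an error out-parameter, versus one that also returns a success flag) can be sketched with a toy error type. Everything here is hypothetical: `SketchError` and the `add_route_*()` / `setup_*()` functions are illustrative stand-ins, not QEMU's `Error` API or the ivshmem code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an error object. */
typedef struct SketchError {
    const char *msg;
} SketchError;

static SketchError route_failed = { "add route failed" };

/* Style 1: void return; failure is visible only through *errp. */
void add_route_void(int vector, SketchError **errp)
{
    if (vector < 0 && errp) {
        *errp = &route_failed;
    }
}

/* Style 2: the bool return value doubles as the error indicator. */
bool add_route_bool(int vector, SketchError **errp)
{
    if (vector < 0) {
        if (errp) {
            *errp = &route_failed;
        }
        return false;
    }
    return true;
}

/* A caller of style 1 needs a local error plus explicit propagation. */
void setup_void(int vector, SketchError **errp)
{
    SketchError *err = NULL;

    add_route_void(vector, &err);
    if (err) {
        if (errp) {
            *errp = err; /* stands in for error_propagate() */
        }
        return;
    }
    /* further setup would go here */
}

/* A caller of style 2 can pass errp straight through. */
bool setup_bool(int vector, SketchError **errp)
{
    if (!add_route_bool(vector, errp)) {
        return false;
    }
    /* further setup would go here */
    return true;
}
```

The second caller is shorter because the return value already says whether `*errp` was set, which is the simplification argued for above.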

Re: [Qemu-devel] [PATCH v2 11/19] qapi: Add type.is_empty() helper

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> On 03/02/2016 12:04 PM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> And use it in qapi-types and qapi-event.  Down the road, we may
>>> want to lift our artificial restriction of no variants at the
>>> top level of an event, at which point, inlining our check for
>>> whether members is empty will no longer be sufficient, but
>>> adding a check for variants adds verbosity; in the meantime,
>>> add some asserts in places where we don't handle variants.
>> 
>> Perhaps I'm just running out of steam for today, but I've read this
>> twice, and still don't get why adding these assertions goes in the same
>> patch as adding the helper, or what it has to do with events.
>
> And yet it was the review on the earlier posting that caused me to add
> asserts; maybe re-reading that thread will help refresh memory, and spur
> an idea for how to better express it in the commit message:
> https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04726.html

I suspect what's needed is a clearer commit message, or a fresher mind
reading it.

>>> More immediately, the new .is_empty() helper will help fix a bug
>>> in qapi-visit in the next patch, where the generator did not
>>> handle an explicit empty type in the same was as a missing type.
>> 
>> same way
>
> [Ever wonder if I intentionally stick in a typo, just to see who will
> notice? Or maybe it really was a slip of the finger...]
>
>>> +++ b/scripts/qapi-event.py
>>> @@ -39,7 +39,7 @@ def gen_event_send(name, arg_type):
>>>  ''',
>>>  proto=gen_event_send_proto(name, arg_type))
>>>
>>> -if arg_type and arg_type.members:
>>> +if arg_type and not arg_type.is_empty():
>>>  ret += mcgen('''
>>>  QmpOutputVisitor *qov;
>>>  Visitor *v;
>> 
>> Oh, you don't just add a helper, you actually *change* the condition!
>> Perhaps the commit message would be easier to understand if it explained
>> that first.
>
> The old condition:
> arg_type and arg_type.members
>
> New condition:
> arg_type and (arg_type.members or arg_type.variants)
>
> But we know there are no variants, since unions cannot (yet) be passed
> as event 'data', so the condition is the same effect now, and
> future-proofing for a future patch when I do allow unions in events.

Unless allowing unions makes generators nicer, I'll want to see a
compelling use case.

Can you give me an idea what .is_empty() does for *this* patch series?
Wait, you did: "will help fix a bug [...] in the next patch".

Perhaps a slight rearrangement would make things easier to grok: start
with the bug fix, and introduce .is_empty() there.  In the next patch,
say "In these places, the intent is to test for empty, but the actual
code exploits that there can be no variants.  Using .is_empty() instead
is a bit clearer and a bit more robust."

>>> +++ b/scripts/qapi-types.py
>>> @@ -90,7 +90,7 @@ struct %(c_name)s {
>>>  # potential issues with attempting to malloc space for zero-length
>>>  # structs in C, and also incompatibility with C++ (where an empty
>>>  # struct is size 1).
>>> -if not (base and base.members) and not members and not variants:
>>> +if (not base or base.is_empty()) and not members and not variants:
>>>  ret += mcgen('''
>>>  char qapi_dummy_for_empty_struct;
>>>  ''')
>> 
>> I figure the case for the helper based on this patch alone is making the
>> code a bit more future-proof.  Suggest you try to explain that in your
>> commit message, including against what future change exactly you're
>> proofing the code.
>
> And here, bases cannot (yet) have variants, but that's also on my plate
> of things I'd like to support in the future.

Also needs a use case.

>> Haven't reviewed for completeness.

Re: [Qemu-devel] [PATCH 1/3] arm: gic: add GICType

2016-03-02 Thread Peter Xu

On Thu, Mar 03, 2016 at 07:34:21AM +0100, Markus Armbruster wrote:
> Peter Xu  writes:
> > I see that qapi-introspect branch is not there now. Is it merged to
> > some other branch already? When will it be there in QEMU master
> > (still not in, right?)? Just curious about it.
> 
> Merged in commit 9e72681.
> 
> Between the talk and the merge, query-schema got renamed to
> query-qmp-schema.  Sometimes I relapse.  Sorry for the confusion!

Got it!

> > Now I can understand. For this case, I guess both ways work, right?
> > Considering that if "query-schema" is still not there, I'd still
> > prefer the "array" solution. At least, it can keep the schema
> > several lines shorter (as you have mentioned already, it's *big*
> > enough :). Also, even we would have "query-schema", I would still
> > prefer not change schema unless necessary. What do you think?
> 
> I can't say without understanding what the introspection question would
> be.  That needs actual thought, which is in short supply, especially
> before breakfast ;)

Yah, anyway, looking forward to further review comments! For now,
maybe I can start to work on v2 if no big problem. If there is
better way, v3 is ready to go. :)

Peter

Re: [Qemu-devel] [PATCH 4/7] target-i386: Dump illegal opcodes with -d unimp

2016-03-02 Thread Hervé Poussineau


On 03/03/2016 06:30, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  target-i386/translate.c | 22 +++---
  1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index b73c237..aa423cb 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -99,6 +99,7 @@ typedef struct DisasContext {
  int prefix;
  TCGMemOp aflag;
  TCGMemOp dflag;
+target_ulong pc_start;
  target_ulong pc; /* pc = eip + cs_base */
  int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
 static state change (stop translation) */
@@ -2368,6 +2369,21 @@ static void gen_exception(DisasContext *s, int trapno, 
target_ulong cur_eip)
  s->is_jmp = DISAS_TB_JUMP;
  }

+static void gen_illop(CPUX86State *env, DisasContext *s)
+{
+target_ulong pc = s->pc_start;
+gen_exception(s, EXCP06_ILLOP, pc - s->cs_base);
+
+if (qemu_loglevel_mask(LOG_UNIMP)) {


Do you want LOG_UNIMP or LOG_GUEST_ERROR?
Both are possible. Either you decide that the guest works well, and an unknown
instruction is a valid instruction not yet implemented on the QEMU side,
or you decide that the guest can do invalid things, in which case
LOG_GUEST_ERROR is probably better.


+target_ulong end = s->pc;
+qemu_log("ILLOPC: " TARGET_FMT_lx ":", pc);
+for (; pc < end; ++pc) {
+qemu_log(" %02x", cpu_ldub_code(env, pc));
+}
+qemu_log("\n");
+}
+}
+
  /* an interrupt is different from an exception because of the
 privilege checks */
  static void gen_interrupt(DisasContext *s, int intno,
@@ -2893,7 +2909,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
int b,
  }
  if (s->flags & HF_EM_MASK) {
  illegal_op:
-gen_exception(s, EXCP06_ILLOP, pc_start - s->cs_base);
+gen_illop(env, s);
  return;
  }
  if (is_xmm && !(s->flags & HF_OSFXSR_MASK))
@@ -4293,7 +4309,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
  target_ulong next_eip, tval;
  int rex_w, rex_r;

-s->pc = pc_start;
+s->pc_start = s->pc = pc_start;
  prefixes = 0;
  s->override = -1;
  rex_w = -1;
@@ -8031,7 +8047,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
  if (s->prefix & PREFIX_LOCK)
  gen_helper_unlock();
  /* XXX: ensure that no lock was generated */
-gen_exception(s, EXCP06_ILLOP, pc_start - s->cs_base);
+gen_illop(env, s);
  return s->pc;
  }





This patch is not quiet on some operating systems:
OS/2:
ILLOPC: 000172e1: 0f a6

Windows XP:
ILLOPC: 00020d1a: c4 c4

And very verbose in Windows 3.11, Windows 9x:
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 27fe: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 00011b36: 63
ILLOPC: 000ffb17: 63
ILLOPC: 00011b3d: 63
ILLOPC: 00011b36: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 0001e3b9: 0f ff
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 00011b36: 63
ILLOPC: 00011b3d: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 00014d8a: 0f ff
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000ffb17: 63
ILLOPC: 000118ca: 63
ILLOPC: 000ffb17: 63

Is it normal?

Regards,

Hervé
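
The log lines quoted above follow a simple "address, then opcode bytes" layout. The sketch below reproduces that layout in a standalone helper so the format can be checked without QEMU. It assumes a 32-bit address printed as eight hex digits; `format_illop()` is a hypothetical name, and the real patch reads guest memory with `cpu_ldub_code()` instead of taking a byte array.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format "ILLOPC: <address>: <byte> <byte> ..." into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
int format_illop(char *buf, size_t size, uint32_t pc_start,
                 const uint8_t *bytes, size_t nbytes)
{
    size_t used;
    size_t i;
    int n = snprintf(buf, size, "ILLOPC: %08x:", (unsigned)pc_start);

    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    used = (size_t)n;
    for (i = 0; i < nbytes; i++) {
        n = snprintf(buf + used, size - used, " %02x", (unsigned)bytes[i]);
        if (n < 0 || (size_t)n >= size - used) {
            return -1;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

Called with address `0x000172e1` and the bytes `0f a6`, it produces the same text as the OS/2 line in the report above.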

Re: [Qemu-devel] [PATCH 0/7] target-i386 fixes

2016-03-02 Thread Hervé Poussineau


On 03/03/2016 06:30, Richard Henderson wrote:

This is primarily patches fixing Windows booting regressions
introduced by myself.  Many thanks to Herve for reporting them
and Paolo for fixing two of them.

Changes from patches previously seen on-list:
   * Tweak the bnd_jmp patch to test MPX enabled properly.
   * Dump illegal opcode data with -d unimp.
   * Use gen_nop_modrm for prefetch.
   * Fix Windows XP booting.

The illegal opcode dumping was useful for debugging.  The prefetch
patch I noticed while reviewing the addr16 patch wrt lea.


r~


Paolo Bonzini (3):
   target-i386: avoid repeated calls to the bnd_jmp helper
   target-i386: fix smsw and lmsw from/to register
   target-i386: fix addr16 prefix

Richard Henderson (4):
   target-i386: Fix SMSW for 64-bit mode
   target-i386: Dump illegal opcodes with -d unimp
   target-i386: Use gen_nop_modrm for prefetch instructions
   target-i386: Fix inhibit irq mask handling

  target-i386/translate.c | 179 +++-
  1 file changed, 100 insertions(+), 79 deletions(-)



Tested-by: Hervé Poussineau 

Regressions in MS-DOS 6 / Windows 9x / Windows XP are fixed.

Hervé

Re: [Qemu-devel] [PATCH V2 1/3] net/filter-traffic: add filter-traffic.h

2016-03-02 Thread Jason Wang



On 03/02/2016 03:25 PM, Zhang Chen wrote:
>
>
> On 03/02/2016 02:19 PM, Jason Wang wrote:
>>
>> On 02/29/2016 08:23 PM, Zhang Chen wrote:
>>> We can reuse filter-traffic by filter-mirror,
>>> filter-redirector and so on.
>> I think we could share more than this. E.g just use filter-mirror.c to
>> implement both mirror and redirector.
>
> OK, should we change the name to filter-mirror? like filter-traffic?
> and add redirect=on/off
>
> -object
> filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,indev=redirect0,redirect=on,outdev=mirror0
>

I think we'd better use another type for this.

Re: [Qemu-devel] [PATCH 1/3] arm: gic: add GICType

2016-03-02 Thread Markus Armbruster

Peter Xu  writes:

> On Wed, Mar 02, 2016 at 02:59:57PM +0100, Markus Armbruster wrote:
>> Peter Xu  writes:
>> > What's "query-schema"? Is that a QMP command?
>> 
>> Yes.
>> 
>> More than you ever wanted to know:
>> http://events.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
>> https://www.youtube.com/watch?v=IEa8Ao8_B9o&list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep&index=28
>
> Thanks for the pointers and cool slides! It's a good thing to pick
> up. :-)
>
> It'll be cool we treat data as codes, and codes as data.
>
> I see that qapi-introspect branch is not there now. Is it merged to
> some other branch already? When will it be there in QEMU master
> (still not in, right?)? Just curious about it.

Merged in commit 9e72681.

Between the talk and the merge, query-schema got renamed to
query-qmp-schema.  Sometimes I relapse.  Sorry for the confusion!

>> > What I meant is that, we can define the following (for example):
>> >
>> > { 'struct': 'GICCapInfo',
>> >   'data': [
>> > 'version': 'int',
>> > 'emulated': 'bool',
>> > 'kernel': 'bool'] }
>> >
>> > And:
>> >
>> > { 'command': 'query-gic-capability',
>> >   'returns': ['GICCapInfo'] }
>> >
>> > So we can keep this schema as it is when new versions arrive. We
>> > can just push another element in.
>> 
>> To answer questions of the sort "can this QEMU version do X?", it's
>> often useful to tie X to a schema change that is visible in the result
>> of query-schema.
>
> Now I can understand. For this case, I guess both ways work, right?
> Considering that if "query-schema" is still not there, I'd still
> prefer the "array" solution. At least, it can keep the schema
> several lines shorter (as you have mentioned already, it's *big*
> enough :). Also, even we would have "query-schema", I would still
> prefer not change schema unless necessary. What do you think?

I can't say without understanding what the introspection question would
be.  That needs actual thought, which is in short supply, especially
before breakfast ;)

Re: [Qemu-devel] [PATCH v5 0/8] QOM'ify hw/timer/*

2016-03-02 Thread hitmoon


On 25 February 2016 at 10:30, xiaoqiang zhao  wrote:


> This patch series QOM'ify timer code under hw/timer directory.
> Main idea is to split the initfn's work, some to TypeInfo.instance_init
> and some is placed in DeviceClass::realize.
> Drop the use of SysBusDeviceClass::init if possible.
>
> Patch 3,4 (m48t59) has been tested in a sparc vm with debian linux guest
> and savevm/loadvm looks fine.
> Comments from the relevant maintainers are needed!

ping ...
http://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg05859.html

[Qemu-devel] [PATCH] hw/usb: whitespace fix in tusb6010.c

2016-03-02 Thread xiaoqiang zhao

use spaces instead of Tabs

Signed-off-by: xiaoqiang zhao 
---
 hw/usb/tusb6010.c | 318 +++---
 1 file changed, 159 insertions(+), 159 deletions(-)

diff --git a/hw/usb/tusb6010.c b/hw/usb/tusb6010.c
index 9f6af90..4daa3a5 100644
--- a/hw/usb/tusb6010.c
+++ b/hw/usb/tusb6010.c
@@ -68,178 +68,178 @@ typedef struct TUSBState {
 uint32_t otg_timer_val;
 } TUSBState;
 
-#define TUSB_DEVCLOCK  6000/* 60 MHz */
+#define TUSB_DEVCLOCK   6000/* 60 MHz */
 
-#define TUSB_VLYNQ_CTRL0x004
+#define TUSB_VLYNQ_CTRL 0x004
 
 /* Mentor Graphics OTG core registers.  */
-#define TUSB_BASE_OFFSET   0x400
+#define TUSB_BASE_OFFSET0x400
 
 /* FIFO registers, 32-bit.  */
-#define TUSB_FIFO_BASE 0x600
+#define TUSB_FIFO_BASE  0x600
 
 /* Device System & Control registers, 32-bit.  */
-#define TUSB_SYS_REG_BASE  0x800
+#define TUSB_SYS_REG_BASE   0x800
 
-#define TUSB_DEV_CONF  (TUSB_SYS_REG_BASE + 0x000)
-#defineTUSB_DEV_CONF_USB_HOST_MODE (1 << 16)
-#defineTUSB_DEV_CONF_PROD_TEST_MODE(1 << 15)
-#defineTUSB_DEV_CONF_SOFT_ID   (1 << 1)
-#defineTUSB_DEV_CONF_ID_SEL(1 << 0)
+#define TUSB_DEV_CONF   (TUSB_SYS_REG_BASE + 0x000)
+#define TUSB_DEV_CONF_USB_HOST_MODE (1 << 16)
+#define TUSB_DEV_CONF_PROD_TEST_MODE(1 << 15)
+#define TUSB_DEV_CONF_SOFT_ID   (1 << 1)
+#define TUSB_DEV_CONF_ID_SEL(1 << 0)
 
-#define TUSB_PHY_OTG_CTRL_ENABLE   (TUSB_SYS_REG_BASE + 0x004)
-#define TUSB_PHY_OTG_CTRL  (TUSB_SYS_REG_BASE + 0x008)
-#defineTUSB_PHY_OTG_CTRL_WRPROTECT (0xa5 << 24)
-#defineTUSB_PHY_OTG_CTRL_O_ID_PULLUP   (1 << 23)
-#defineTUSB_PHY_OTG_CTRL_O_VBUS_DET_EN (1 << 19)
-#defineTUSB_PHY_OTG_CTRL_O_SESS_END_EN (1 << 18)
-#defineTUSB_PHY_OTG_CTRL_TESTM2(1 << 17)
-#defineTUSB_PHY_OTG_CTRL_TESTM1(1 << 16)
-#defineTUSB_PHY_OTG_CTRL_TESTM0(1 << 15)
-#defineTUSB_PHY_OTG_CTRL_TX_DATA2  (1 << 14)
-#defineTUSB_PHY_OTG_CTRL_TX_GZ2(1 << 13)
-#defineTUSB_PHY_OTG_CTRL_TX_ENABLE2(1 << 12)
-#defineTUSB_PHY_OTG_CTRL_DM_PULLDOWN   (1 << 11)
-#defineTUSB_PHY_OTG_CTRL_DP_PULLDOWN   (1 << 10)
-#defineTUSB_PHY_OTG_CTRL_OSC_EN(1 << 9)
-#defineTUSB_PHY_OTG_CTRL_PHYREF_CLK(v) (((v) & 3) << 7)
-#defineTUSB_PHY_OTG_CTRL_PD(1 << 6)
-#defineTUSB_PHY_OTG_CTRL_PLL_ON(1 << 5)
-#defineTUSB_PHY_OTG_CTRL_EXT_RPU   (1 << 4)
-#defineTUSB_PHY_OTG_CTRL_PWR_GOOD  (1 << 3)
-#defineTUSB_PHY_OTG_CTRL_RESET (1 << 2)
-#defineTUSB_PHY_OTG_CTRL_SUSPENDM  (1 << 1)
-#defineTUSB_PHY_OTG_CTRL_CLK_MODE  (1 << 0)
+#define TUSB_PHY_OTG_CTRL_ENABLE(TUSB_SYS_REG_BASE + 0x004)
+#define TUSB_PHY_OTG_CTRL   (TUSB_SYS_REG_BASE + 0x008)
+#define TUSB_PHY_OTG_CTRL_WRPROTECT (0xa5 << 24)
+#define TUSB_PHY_OTG_CTRL_O_ID_PULLUP   (1 << 23)
+#define TUSB_PHY_OTG_CTRL_O_VBUS_DET_EN (1 << 19)
+#define TUSB_PHY_OTG_CTRL_O_SESS_END_EN (1 << 18)
+#define TUSB_PHY_OTG_CTRL_TESTM2(1 << 17)
+#define TUSB_PHY_OTG_CTRL_TESTM1(1 << 16)
+#define TUSB_PHY_OTG_CTRL_TESTM0(1 << 15)
+#define TUSB_PHY_OTG_CTRL_TX_DATA2  (1 << 14)
+#define TUSB_PHY_OTG_CTRL_TX_GZ2(1 << 13)
+#define TUSB_PHY_OTG_CTRL_TX_ENABLE2(1 << 12)
+#define TUSB_PHY_OTG_CTRL_DM_PULLDOWN   (1 << 11)
+#define TUSB_PHY_OTG_CTRL_DP_PULLDOWN   (1 << 10)
+#define TUSB_PHY_OTG_CTRL_OSC_EN(1 << 9)
+#define TUSB_PHY_OTG_CTRL_PHYREF_CLK(v) (((v) & 3) << 7)
+#define TUSB_PHY_OTG_CTRL_PD(1 << 6)
+#define TUSB_PHY_OTG_CTRL_PLL_ON(1 << 5)
+#define TUSB_PHY_OTG_CTRL_EXT_RPU   (1 << 4)
+#define TUSB_PHY_OTG_CTRL_PWR_GOOD  (1 << 3)
+#define TUSB_PHY_OTG_CTRL_RESET (1 << 2)
+#define TUSB_PHY_OTG_CTRL_SUSPENDM  (1 << 1)
+#define TUSB_PHY_OTG_CTRL_CLK_MODE  (1 << 0)
 
 /* OTG status register */
-#define TUSB_DEV_OTG_STAT  (TUSB_SYS_REG_BASE + 0x00c)
-#defineTUSB_DEV_OTG_STAT_PWR_CLK_GOOD  (1 << 8)
-#defineTUSB_DEV_OTG_STAT_SESS_END  (1 << 7)
-#defineTUSB_DEV_OTG_STAT_SESS_VALID(1 << 6)
-#defineTUSB_DEV_OTG_STAT_VBUS_VALID(1 << 5)
-#defineTUSB_DEV_OTG_STAT_VBUS_SENSE(1 << 4)
-#defineTUSB_DEV_OTG_STAT_ID_STATUS (1 << 3)
-#defineTUSB_DEV_OTG_STAT_HOST_DISCON   (1 << 2)
-#defineTUSB_DEV_OTG_STAT_LINE_STATE(3 << 0)
-#defineTUSB_DEV_OTG_STAT_DP_ENABLE (1 << 1)
-#defineTUSB_DEV_OTG_STAT_DM_ENABLE (1 << 0)
+#define TUSB_DEV_OTG_STAT   (TUSB_SYS_REG_BASE + 0x00c)
+#define

[Qemu-devel] Doubts regarding parallelism on KVM, IO threads

2016-03-02 Thread Gaurav Sharma

Hi, I was trying to do some digging into multi-core scenarios, both with and
without KVM.

In short, I have some devices and a user application that performs read/write
operations on those devices.

As per my understanding, when binary translation using TCG is invoked,
we only create a single QemuThread for all vCPUs. With KVM, we have a
QemuThread for each vCPU, as TCG is bypassed in this case.


[Test scenario]
Let's say I have devices dev1 and dev2. The test application for dev1 is
executed on core0 and the one for dev2 on core1.
For device dev1, for testing purposes, I added a sleep whenever a
read comes in.
When KVM is enabled, my whole VM freezes whenever the sleep is hit.

1. Are all devices emulated in a separate single thread?
2. How and where in the code do we switch between the CPU thread and
the IOthread?

Regards,
Gaurav

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 09/16] vfio: Generalize IOMMU memory listener

2016-03-02 Thread Alexey Kardashevskiy


On 03/03/2016 04:36 PM, David Gibson wrote:
> On Tue, Mar 01, 2016 at 08:10:34PM +1100, Alexey Kardashevskiy wrote:
>> At the moment VFIOContainer uses one memory listener which listens on
>> PCI address space for both Type1 and sPAPR IOMMUs. Soon we will need
>> another listener to listen on RAM; this will do DMA memory
>> pre-registration for sPAPR guests which basically pins all guest
>> pages in the host physical RAM.
>>
>> This introduces VFIOMemoryListener which is wrapper for MemoryListener
>> and stores a pointer to the container. This allows having multiple
>> memory listeners for the same container. This replaces the existing
>> @listener with @iommu_listener.
>>
>> This should cause no change in behavior.
>
> This is nonsense.
>
> The two listeners you're talking about have (or should have) both a
> different AS they're listening on,

They do have different AS.

> *and* different notification
> functions.

They do use totally different region_add/region_del, later in the series.

> Since they have nothing in common, there's no point trying
> to build a common structure for them.

They use the same VFIOContainer pointer. VFIOMemoryListener is made of
MemoryListener and VFIOContainer, and that's it.

Ok, I'll get rid of VFIOMemoryListener. It is just hard sometimes to
understand which bits I have to reuse and which I do not, constant argument...
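
The wrapper described in this exchange (a listener embedded in a struct that also carries a pointer back to its container) relies on the container_of idiom. The following is a minimal standalone sketch; `Listener`, `Container`, `WrappedListener` and `SKETCH_CONTAINER_OF` are hypothetical simplifications, not the VFIO or memory API types.

```c
#include <stddef.h>

/* Local definition for this sketch; QEMU provides its own container_of. */
#define SKETCH_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Listener Listener;
struct Listener {
    void (*region_add)(Listener *l, int section);
};

typedef struct Container {
    int last_section; /* records the most recent notification */
} Container;

/* Wrapper: the generic listener plus a back pointer to its owner. */
typedef struct WrappedListener {
    Container *container;
    Listener listener;
} WrappedListener;

/* Callback receives only the embedded Listener and recovers the wrapper. */
void wrapped_region_add(Listener *l, int section)
{
    WrappedListener *w = SKETCH_CONTAINER_OF(l, WrappedListener, listener);

    w->container->last_section = section;
}
```

Because the back pointer lives in the wrapper, several such listeners can point at the same container, which is the property the patch description relies on.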






Signed-off-by: Alexey Kardashevskiy 
---
  hw/vfio/common.c  | 41 +++--
  include/hw/vfio/vfio-common.h |  9 -
  2 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ca3fd47..0e67a5a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -318,10 +318,10 @@ static hwaddr vfio_container_granularity(VFIOContainer 
*container)
  return (hwaddr)1 << ctz64(container->iova_pgsizes);
  }

-static void vfio_listener_region_add(MemoryListener *listener,
+static void vfio_listener_region_add(VFIOMemoryListener *vlistener,
   MemoryRegionSection *section)
  {
-VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+VFIOContainer *container = vlistener->container;
  hwaddr iova, end;
  Int128 llend;
  void *vaddr;
@@ -425,10 +425,10 @@ fail:
  }
  }

-static void vfio_listener_region_del(MemoryListener *listener,
+static void vfio_listener_region_del(VFIOMemoryListener *vlistener,
   MemoryRegionSection *section)
  {
-VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+VFIOContainer *container = vlistener->container;
  hwaddr iova, end;
  int ret;
  MemoryRegion *iommu = NULL;
@@ -492,14 +492,33 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
  }
  }

-static const MemoryListener vfio_memory_listener = {
-.region_add = vfio_listener_region_add,
-.region_del = vfio_listener_region_del,
+static void vfio_iommu_listener_region_add(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+VFIOMemoryListener *vlistener = container_of(listener, VFIOMemoryListener,
+ listener);
+
+vfio_listener_region_add(vlistener, section);
+}
+
+
+static void vfio_iommu_listener_region_del(MemoryListener *listener,
+   MemoryRegionSection *section)
+{
+VFIOMemoryListener *vlistener = container_of(listener, VFIOMemoryListener,
+ listener);
+
+vfio_listener_region_del(vlistener, section);
+}
+
+static const MemoryListener vfio_iommu_listener = {
+.region_add = vfio_iommu_listener_region_add,
+.region_del = vfio_iommu_listener_region_del,
  };

  static void vfio_listener_release(VFIOContainer *container)
  {
-memory_listener_unregister(&container->listener);
+memory_listener_unregister(&container->iommu_listener.listener);
  }

  int vfio_mmap_region(Object *obj, VFIORegion *region,
@@ -768,9 +787,11 @@ static int vfio_connect_container(VFIOGroup *group, 
AddressSpace *as)
  goto free_container_exit;
  }

-container->listener = vfio_memory_listener;
+container->iommu_listener.container = container;
+container->iommu_listener.listener = vfio_iommu_listener;

-memory_listener_register(&container->listener, container->space->as);
+memory_listener_register(&container->iommu_listener.listener,
+ container->space->as);

  if (container->error) {
  ret = container->error;
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9ffa681..b6b736c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -57,12 +57,19 @@ typedef struct VFIOAddressSpace {
  QLIST_ENTRY(VFIOAddressSpace) list;
  } VFIOAddressSpace;

+typedef struct VFIOContainer VFIOContainer;
+
+typedef struct VFIOMemoryListener {
+struct MemoryListener

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 07/16] vfio, memory: Notify IOMMU about starting/stopping being used by VFIO

2016-03-02 Thread Alexey Kardashevskiy


On 03/03/2016 04:28 PM, David Gibson wrote:

On Tue, Mar 01, 2016 at 08:10:32PM +1100, Alexey Kardashevskiy wrote:

This adds a vfio_notify() callback to inform an IOMMU (and then its owner)
that VFIO started using the IOMMU. This is used by the pseries machine to
enable/disable in-kernel acceleration of TCE hypercalls.

Signed-off-by: Alexey Kardashevskiy 


Hmm.. the current approach of having a hook when vfio-pci devices are
attached is pretty ugly, but what exactly is the case that it doesn't
handle and this approach does?


Sorry, I am not following you here. What hook do you mean here?

My hook handles the case where I want to enable/disable KVM acceleration. 
Without these patches, I need to re-count how many vfio-pci devices there 
are, which is even uglier with PCI hotplug/unplug...
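
As an illustration of the hook described above, here is a minimal sketch of an ops table with an optional notify callback. All type and function names (Iommu, IommuOps, toy_vfio_notify, iommu_vfio_notify) are hypothetical, and the need_vfio field is only a stand-in for whatever state the owner toggles. The NULL check on the caller side is an assumption added for this sketch; the patch as posted calls the hook unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Iommu Iommu;

typedef struct IommuOps {
    /* Optional: called when a VFIO user starts or stops using the IOMMU. */
    int (*vfio_notify)(Iommu *iommu, bool attached);
} IommuOps;

struct Iommu {
    const IommuOps *ops;
    bool need_vfio;     /* toy stand-in for the state the owner toggles */
};

static int toy_vfio_notify(Iommu *iommu, bool attached)
{
    iommu->need_vfio = attached;
    return 0;
}

static const IommuOps toy_ops = { .vfio_notify = toy_vfio_notify };
static const IommuOps no_ops = { .vfio_notify = NULL };

/* Caller side: invoke the hook only if the IOMMU implements it. */
static int iommu_vfio_notify(Iommu *iommu, bool attached)
{
    if (iommu->ops && iommu->ops->vfio_notify) {
        return iommu->ops->vfio_notify(iommu, attached);
    }
    return 0;
}
```

With a hook like this the IOMMU is told directly when its first listener appears or its last one goes away, so nothing has to count devices on hotplug and unplug.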




This two-tiered notify system for a single bit is also kinda ugly.


---
  hw/ppc/spapr_iommu.c   |  9 +
  hw/ppc/spapr_pci.c | 14 --
  hw/vfio/common.c   |  7 +++
  include/exec/memory.h  |  2 ++
  include/hw/ppc/spapr.h |  4 
  5 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 8a88a74..67a8356 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -136,6 +136,13 @@ static IOMMUTLBEntry 
spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
  return ret;
  }

+static int spapr_tce_vfio_notify(MemoryRegion *iommu, bool attached)
+{
+sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+
+return spapr_tce_vfio_notify_owner(tcet->owner, tcet, attached);


I'm guessing the "owner" is the PHB, but I'm not entirely clear.

Could you use the QOM parent to get the PHB instead of storing it
explicitly?



I am pretty sure I am not allowed to use the QOM parent; this is why there 
is no object_get_parent() helper.






+}
+
  static int spapr_tce_table_post_load(void *opaque, int version_id)
  {
  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
@@ -167,6 +174,7 @@ static const VMStateDescription vmstate_spapr_tce_table = {

  static MemoryRegionIOMMUOps spapr_iommu_ops = {
  .translate = spapr_tce_translate_iommu,
+.vfio_notify = spapr_tce_vfio_notify,
  };

  static int spapr_tce_table_realize(DeviceState *dev)
@@ -235,6 +243,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
uint32_t liobn)

  tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
  tcet->liobn = liobn;
+tcet->owner = owner;

  snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
  object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index ee0fecf..b0cd148 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1084,6 +1084,14 @@ static int spapr_populate_pci_child_dt(PCIDevice *dev, 
void *fdt, int offset,
  return 0;
  }

+int spapr_tce_vfio_notify_owner(DeviceState *dev, sPAPRTCETable *tcet,
+bool attached)
+{
+spapr_tce_set_need_vfio(tcet, attached);


Hmm.. you go to the trouble of storing owner in dev, then don't
actually use it.



Yeah, I need to clean this up. I have already removed 
spapr_tce_vfio_notify_owner() from my working branch and now call 
spapr_tce_set_need_vfio() directly from spapr_tce_vfio_notify().






+return 0;
+}
+
  /* create OF node for pci device and required OF DT properties */
  static int spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
   void *fdt, int node_offset)
@@ -1118,12 +1126,6 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector 
*drc,
  void *fdt = NULL;
  int fdt_start_offset = 0, fdt_size;

-if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
-sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
-
-spapr_tce_set_need_vfio(tcet, true);
-}
-
  if (dev->hotplugged) {
  fdt = create_device_tree(&fdt_size);
  fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9bf4c3b..ca3fd47 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -384,6 +384,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);

  memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
+giommu->iommu->iommu_ops->vfio_notify(section->mr, true);
  memory_region_iommu_replay(giommu->iommu, &giommu->n,
 vfio_container_granularity(container),
 false);
@@ -430,6 +431,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
  VFIOContainer *container = container_of(listener, VFIOContainer, 
listener);
  hwaddr iova, end;
  int ret;
+MemoryRegion *iommu = NULL;

  if (vfio_listener_skipped_section(section)) {
  trace_vfio_listener_region_del_skip(
@@ -451,6 +453,7 @@ static

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 09/16] vfio: Generalize IOMMU memory listener

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:34PM +1100, Alexey Kardashevskiy wrote:
> At the moment VFIOContainer uses one memory listener which listens on
> PCI address space for both Type1 and sPAPR IOMMUs. Soon we will need
> another listener to listen on RAM; this will do DMA memory
> pre-registration for sPAPR guests which basically pins all guest
> pages in the host physical RAM.
> 
> This introduces VFIOMemoryListener which is wrapper for MemoryListener
> and stores a pointer to the container. This allows having multiple
> memory listeners for the same container. This replaces the existing
> @listener with @iommu_listener.
> 
> This should cause no change in behavior.

This is nonsense.

The two listeners you're talking about have (or should have) both a
different AS they're listening on, *and* different notification
functions.  Since they have nothing in common, there's no point trying
to build a common structure for them.

> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  hw/vfio/common.c  | 41 +++--
>  include/hw/vfio/vfio-common.h |  9 -
>  2 files changed, 39 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ca3fd47..0e67a5a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -318,10 +318,10 @@ static hwaddr vfio_container_granularity(VFIOContainer 
> *container)
>  return (hwaddr)1 << ctz64(container->iova_pgsizes);
>  }
>  
> -static void vfio_listener_region_add(MemoryListener *listener,
> +static void vfio_listener_region_add(VFIOMemoryListener *vlistener,
>   MemoryRegionSection *section)
>  {
> -VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
> +VFIOContainer *container = vlistener->container;
>  hwaddr iova, end;
>  Int128 llend;
>  void *vaddr;
> @@ -425,10 +425,10 @@ fail:
>  }
>  }
>  
> -static void vfio_listener_region_del(MemoryListener *listener,
> +static void vfio_listener_region_del(VFIOMemoryListener *vlistener,
>   MemoryRegionSection *section)
>  {
> -VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
> +VFIOContainer *container = vlistener->container;
>  hwaddr iova, end;
>  int ret;
>  MemoryRegion *iommu = NULL;
> @@ -492,14 +492,33 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>  }
>  }
>  
> -static const MemoryListener vfio_memory_listener = {
> -.region_add = vfio_listener_region_add,
> -.region_del = vfio_listener_region_del,
> +static void vfio_iommu_listener_region_add(MemoryListener *listener,
> +   MemoryRegionSection *section)
> +{
> +VFIOMemoryListener *vlistener = container_of(listener, 
> VFIOMemoryListener,
> + listener);
> +
> +vfio_listener_region_add(vlistener, section);
> +}
> +
> +
> +static void vfio_iommu_listener_region_del(MemoryListener *listener,
> +   MemoryRegionSection *section)
> +{
> +VFIOMemoryListener *vlistener = container_of(listener, 
> VFIOMemoryListener,
> + listener);
> +
> +vfio_listener_region_del(vlistener, section);
> +}
> +
> +static const MemoryListener vfio_iommu_listener = {
> +.region_add = vfio_iommu_listener_region_add,
> +.region_del = vfio_iommu_listener_region_del,
>  };
>  
>  static void vfio_listener_release(VFIOContainer *container)
>  {
> -memory_listener_unregister(&container->listener);
> +memory_listener_unregister(&container->iommu_listener.listener);
>  }
>  
>  int vfio_mmap_region(Object *obj, VFIORegion *region,
> @@ -768,9 +787,11 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as)
>  goto free_container_exit;
>  }
>  
> -container->listener = vfio_memory_listener;
> +container->iommu_listener.container = container;
> +container->iommu_listener.listener = vfio_iommu_listener;
>  
> -memory_listener_register(&container->listener, container->space->as);
> +memory_listener_register(&container->iommu_listener.listener,
> + container->space->as);
>  
>  if (container->error) {
>  ret = container->error;
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 9ffa681..b6b736c 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -57,12 +57,19 @@ typedef struct VFIOAddressSpace {
>  QLIST_ENTRY(VFIOAddressSpace) list;
>  } VFIOAddressSpace;
>  
> +typedef struct VFIOContainer VFIOContainer;
> +
> +typedef struct VFIOMemoryListener {
> +struct MemoryListener listener;
> +VFIOContainer *container;
> +} VFIOMemoryListener;
> +
>  struct VFIOGroup;
>  
>  typedef struct VFIOContainer {
>  VFIOAddressSpace *space;
>  int fd; /* /dev/vfio/vfio, empowered by the

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 08/16] memory: Add reporting of supported page sizes

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:33PM +1100, Alexey Kardashevskiy wrote:
> Every IOMMU has some granularity which MemoryRegionIOMMUOps::translate
> uses when translating, however this information is not available outside
> the translate context for various checks.
> 
> This adds a get_page_sizes callback to MemoryRegionIOMMUOps and
> a wrapper for it so IOMMU users (such as VFIO) can know the actual
> page size(s) used by an IOMMU.
> 
> The qemu_real_host_page_mask is used as fallback.
> 
> Signed-off-by: Alexey Kardashevskiy 

I'm going to have to see how this gets used to really analyze it.
But, a preliminary comment:

Once this is added, it should be possible to remove the explicit page
size parameter from the iommu_replay function (since it could be
derived from the IOMMU page sizes).
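
To make the suggestion concrete: if the callback returns a bitmap in which bit N set means that pages of 2^N bytes are supported (an assumption for this sketch, consistent with the sPAPR implementation returning 1ULL << page_shift), the smallest supported page size is simply the lowest set bit. The helper names below are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Smallest page size in a bitmap where bit N set means that pages of
 * 2^N bytes are supported. Returns 0 for an empty bitmap.
 */
static uint64_t min_page_size(uint64_t page_sizes)
{
    return page_sizes & (~page_sizes + 1);   /* isolate lowest set bit */
}

/* True if both the address and the length are multiples of page_size. */
static int is_page_aligned(uint64_t addr, uint64_t len, uint64_t page_size)
{
    return page_size != 0 && ((addr | len) & (page_size - 1)) == 0;
}
```

A replay function could compute its step this way instead of taking an explicit granularity argument. This gives the same value as the 1 << ctz64(...) expression used in vfio_container_granularity().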

> ---
> Changes:
> v4:
> * s/1< ---
>  hw/ppc/spapr_iommu.c  |  8 
>  include/exec/memory.h | 11 +++
>  memory.c  |  9 +
>  3 files changed, 28 insertions(+)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 67a8356..4c52cf4 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -143,6 +143,13 @@ static int spapr_tce_vfio_notify(MemoryRegion *iommu, 
> bool attached)
>  return spapr_tce_vfio_notify_owner(tcet->owner, tcet, attached);
>  }
>  
> +static uint64_t spapr_tce_get_page_sizes(MemoryRegion *iommu)
> +{
> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +
> +return 1ULL << tcet->page_shift;
> +}
> +
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> @@ -175,6 +182,7 @@ static const VMStateDescription vmstate_spapr_tce_table = 
> {
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  .translate = spapr_tce_translate_iommu,
>  .vfio_notify = spapr_tce_vfio_notify,
> +.get_page_sizes = spapr_tce_get_page_sizes,
>  };
>  
>  static int spapr_tce_table_realize(DeviceState *dev)
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 9f82629..c34e67c 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -152,6 +152,8 @@ struct MemoryRegionIOMMUOps {
>  IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool 
> is_write);
>  /* Called when VFIO starts/stops using this */
>  int (*vfio_notify)(MemoryRegion *iommu, bool attached);
> +/* Returns supported page sizes */
> +uint64_t (*get_page_sizes)(MemoryRegion *iommu);
>  };
>  
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> @@ -576,6 +578,15 @@ static inline bool memory_region_is_iommu(MemoryRegion 
> *mr)
>  
>  
>  /**
> + * memory_region_iommu_get_page_sizes: get supported page sizes in an iommu
> + *
> + * Returns %bitmap of supported page sizes for an iommu.
> + *
> + * @mr: the memory region being queried
> + */
> +uint64_t memory_region_iommu_get_page_sizes(MemoryRegion *mr);
> +
> +/**
>   * memory_region_notify_iommu: notify a change in an IOMMU translation entry.
>   *
>   * @mr: the memory region that was changed
> diff --git a/memory.c b/memory.c
> index 0dd9695..5d8453d 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1462,6 +1462,15 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>  notifier_list_notify(&mr->iommu_notify, &entry);
>  }
>  
> +uint64_t memory_region_iommu_get_page_sizes(MemoryRegion *mr)
> +{
> +assert(memory_region_is_iommu(mr));
> +if (mr->iommu_ops && mr->iommu_ops->get_page_sizes) {
> +return mr->iommu_ops->get_page_sizes(mr);
> +}
> +return qemu_real_host_page_size;
> +}
> +
>  void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client)
>  {
>  uint8_t mask = 1 << client;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 07/16] vfio, memory: Notify IOMMU about starting/stopping being used by VFIO

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:32PM +1100, Alexey Kardashevskiy wrote:
> This adds a vfio_notify() callback to inform an IOMMU (and then its owner)
> that VFIO started using the IOMMU. This is used by the pseries machine to
> enable/disable in-kernel acceleration of TCE hypercalls.
> 
> Signed-off-by: Alexey Kardashevskiy 

Hmm.. the current approach of having a hook when vfio-pci devices are
attached is pretty ugly, but what exactly is the case that it doesn't
handle and this approach does?

This two-tiered notify system for a single bit is also kinda ugly.

> ---
>  hw/ppc/spapr_iommu.c   |  9 +
>  hw/ppc/spapr_pci.c | 14 --
>  hw/vfio/common.c   |  7 +++
>  include/exec/memory.h  |  2 ++
>  include/hw/ppc/spapr.h |  4 
>  5 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 8a88a74..67a8356 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -136,6 +136,13 @@ static IOMMUTLBEntry 
> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>  return ret;
>  }
>  
> +static int spapr_tce_vfio_notify(MemoryRegion *iommu, bool attached)
> +{
> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +
> +return spapr_tce_vfio_notify_owner(tcet->owner, tcet, attached);

I'm guessing the "owner" is the PHB, but I'm not entirely clear.

Could you use the QOM parent to get the PHB instead of storing it
explicitly?

> +}
> +
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> @@ -167,6 +174,7 @@ static const VMStateDescription vmstate_spapr_tce_table = 
> {
>  
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  .translate = spapr_tce_translate_iommu,
> +.vfio_notify = spapr_tce_vfio_notify,
>  };
>  
>  static int spapr_tce_table_realize(DeviceState *dev)
> @@ -235,6 +243,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn)
>  
>  tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>  tcet->liobn = liobn;
> +tcet->owner = owner;
>  
>  snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
>  object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index ee0fecf..b0cd148 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1084,6 +1084,14 @@ static int spapr_populate_pci_child_dt(PCIDevice *dev, 
> void *fdt, int offset,
>  return 0;
>  }
>  
> +int spapr_tce_vfio_notify_owner(DeviceState *dev, sPAPRTCETable *tcet,
> +bool attached)
> +{
> +spapr_tce_set_need_vfio(tcet, attached);

Hmm.. you go to the trouble of storing owner in dev, then don't
actually use it.

> +return 0;
> +}
> +
>  /* create OF node for pci device and required OF DT properties */
>  static int spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
>   void *fdt, int node_offset)
> @@ -1118,12 +1126,6 @@ static void spapr_phb_add_pci_device(sPAPRDRConnector 
> *drc,
>  void *fdt = NULL;
>  int fdt_start_offset = 0, fdt_size;
>  
> -if (object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> -sPAPRTCETable *tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
> -
> -spapr_tce_set_need_vfio(tcet, true);
> -}
> -
>  if (dev->hotplugged) {
>  fdt = create_device_tree(&fdt_size);
>  fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 9bf4c3b..ca3fd47 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -384,6 +384,7 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>  memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> +giommu->iommu->iommu_ops->vfio_notify(section->mr, true);
>  memory_region_iommu_replay(giommu->iommu, &giommu->n,
> vfio_container_granularity(container),
> false);
> @@ -430,6 +431,7 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>  VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
>  hwaddr iova, end;
>  int ret;
> +MemoryRegion *iommu = NULL;
>  
>  if (vfio_listener_skipped_section(section)) {
>  trace_vfio_listener_region_del_skip(
> @@ -451,6 +453,7 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>  QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
>  if (giommu->iommu == section->mr) {
>  memory_region_unregister_iommu_notifier(&giommu->n);
> +iommu = giommu->iommu;
>  QLIST_REMOVE(giommu, giommu_next);
>  g_free(giommu);
>  break;
> @@ -483,6 +486,10 @@ static void

Re: [Qemu-devel] Performance Profiling 2 VMs

2016-03-02 Thread kalyan tata

Thanks a lot for the quick reply Stefan

The following is from the problem VM:
18:56:29  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
18:56:44    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.20      0.00
18:56:49    1    0.00    0.00    0.00    0.00    3.21   10.22    0.00   79.56    908.22
18:56:54    1    0.00    0.00    0.00    0.00   11.47   54.93    0.00    2.82   5527.77
18:56:59    1    0.00    0.00    0.00    0.00   10.04   66.06    0.00    2.21   7160.64
18:57:04    1    0.00    0.00    0.00    0.00   10.42   65.13    0.00    2.00   7295.99
18:57:09    1    0.00    0.00    0.00    0.00   12.53   50.51    0.00    4.04   5700.20
18:57:14    1    0.00    0.00    0.00    0.00   16.43   65.53    0.00    8.62   9572.34
18:57:19    1    0.00    0.00    0.00    0.00   11.45   60.64    0.00    4.02   5798.19
18:57:24    1    0.00    0.00    0.00    0.00   11.45   81.33    0.00    0.80   6064.26
18:57:29    1    0.00    0.00    0.00    0.00    7.65   85.11    0.00    0.80   7578.27
18:57:34    1    0.00    0.00    0.00    0.00    9.42   84.17    0.00    1.40   9083.97
18:57:39    1    0.00    0.00    0.00    0.00    7.78   82.83    0.00    1.60   7264.87
18:57:44    1    0.00    0.00    0.00    0.00    8.62   87.78    0.00    0.60   8597.80
18:57:49    1    0.00    0.00    0.00    0.00   10.02   82.16    0.00    2.40   7750.90
18:57:54    1    0.00    0.00    0.00    0.00    8.42   81.76    0.00    1.00   6303.41
18:57:59    1    0.00    0.00    0.00    0.00    7.63   87.35    0.00    1.20   9422.49
18:58:04    1    0.00    0.00    0.00    0.00   10.44   80.32    0.00    2.21   7496.79
18:58:09    1    0.00    0.00    0.00    0.00    6.43   59.84    0.00   26.91   5019.28
18:58:14    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      1.00
18:58:19    1    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00

I set the affinity of both the tx and rx interfaces to CPU 1, so I am only
showing CPU 1.

NAPI weight is 128 in this version; I changed it to 64 just to see the effect.
This version of the code seems to change quota and budget (which I did not
see in newer versions), so I am thinking of experimenting with that.
I also see that this version kicks for every packet on the tx side.
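
To show what changing the weight does in principle, here is a deliberately simplified model of budgeted polling. It is a toy simulation, not kernel or virtio-net code, and RxQueue, toy_poll and rounds_to_drain are hypothetical names. Each poll round handles at most "budget" packets; interrupts are re-enabled only once a round finishes with budget to spare.

```c
#include <assert.h>

typedef struct RxQueue {
    int pending;        /* packets waiting in the ring */
    int irq_enabled;    /* 1 while the device may raise interrupts */
} RxQueue;

/*
 * One poll round: handle at most `budget` packets (budget must be > 0).
 * If the ring was drained with budget to spare, re-enable interrupts;
 * otherwise stay in polling mode.
 */
static int toy_poll(RxQueue *q, int budget)
{
    int done = q->pending < budget ? q->pending : budget;

    q->pending -= done;
    if (done < budget) {
        q->irq_enabled = 1;
    }
    return done;
}

/* Number of poll rounds needed to drain the queue with a given weight. */
static int rounds_to_drain(int pending, int budget)
{
    RxQueue q = { pending, 0 };
    int rounds = 0;

    do {
        toy_poll(&q, budget);
        rounds++;
    } while (!q.irq_enabled);
    return rounds;
}
```

In this model a smaller weight only means more poll rounds for the same backlog. It does not reduce the work per packet, so by itself it would not be expected to lower the %soft figures above.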

Any other pointers would be really helpful.

Thanks
Kal




On Wed, Mar 2, 2016 at 3:28 AM, Stefan Hajnoczi  wrote:

> On Tue, Mar 01, 2016 at 04:06:16PM -0800, kalyan tata wrote:
> > Hi All,
> >
> > I am new to qemu development.
> > Sorry if this is not the correct forum for this question; it would be great
> > if you could direct me to the correct forum.
> >
> > I am seeing very low virtio network throughput on an older (2.6.18) linux
> > guest  vs another newer guest (3.10) both running on the same host. (same
> > config 2 vcpus, no multi Q etc.)  I see very high CPU usage on the 2.6.18
> > guest at very low network throughput and want to profile to find
> > bottleneck.
> >
> > I tried to use "perf kvm" but the analysis shows a maximum overhead of 0.25%,
> > whereas top in the VM shows 100% CPU. (I used the following as a guide:
> > https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Virtualization_Tuning_and_Optimization_Guide/index.html#sect-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools-perf_kvm
> > )
> >
> >  0.25%  :5235  [uhci_hcd]     [g] 0x80182236
> >  0.24%  :5235  [uhci_hcd]     [g] 0x8018226a
> >  0.23%  :5235  [virtio_ring]  [g] vring_new_virtqueue
> >  0.20%  :5236  [uhci_hcd]     [g] 0x80182236
> >  0.18%  :5236  [uhci_hcd]     [g] 0x8018226a
> >  0.18%  :5235  [uhci_hcd]     [g] 0x8016f385
> >  0.14%  :5236  [uhci_hcd]     [g] 0x802fbe0f
> >  0.14%  :5235  [uhci_hcd]     [g] 0x8001161a
> >  0.14%  :5235  [virtio_ring]  [g] virtqueue_is_broken
> >
> >
> > My basic question is - Is there a way I can profile the older version of
> > linux guest so i can see the bottleneck (where the guest is spending CPU
> > cycles) My aim is to see if i can patch the older version in the critical
> > path with improvements made in newer version
>
> What is the output of "mpstat 5" in the guest and on the host?  mpstat
> is part of the "sysstat" package.
>
> mpstat is similar to vmstat but also shows "guest time" and "steal
> time".  Both are relevant to virtualization and will help show which
> component is using so much CPU time.
>
> Stefan
>

[Qemu-devel] [PATCH 4/7] target-i386: Dump illegal opcodes with -d unimp

2016-03-02 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target-i386/translate.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index b73c237..aa423cb 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -99,6 +99,7 @@ typedef struct DisasContext {
 int prefix;
 TCGMemOp aflag;
 TCGMemOp dflag;
+target_ulong pc_start;
 target_ulong pc; /* pc = eip + cs_base */
 int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
static state change (stop translation) */
@@ -2368,6 +2369,21 @@ static void gen_exception(DisasContext *s, int trapno, 
target_ulong cur_eip)
 s->is_jmp = DISAS_TB_JUMP;
 }
 
+static void gen_illop(CPUX86State *env, DisasContext *s)
+{
+target_ulong pc = s->pc_start;
+gen_exception(s, EXCP06_ILLOP, pc - s->cs_base);
+
+if (qemu_loglevel_mask(LOG_UNIMP)) {
+target_ulong end = s->pc;
+qemu_log("ILLOPC: " TARGET_FMT_lx ":", pc);
+for (; pc < end; ++pc) {
+qemu_log(" %02x", cpu_ldub_code(env, pc));
+}
+qemu_log("\n");
+}
+}
+
 /* an interrupt is different from an exception because of the
privilege checks */
 static void gen_interrupt(DisasContext *s, int intno,
@@ -2893,7 +2909,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
int b,
 }
 if (s->flags & HF_EM_MASK) {
 illegal_op:
-gen_exception(s, EXCP06_ILLOP, pc_start - s->cs_base);
+gen_illop(env, s);
 return;
 }
 if (is_xmm && !(s->flags & HF_OSFXSR_MASK))
@@ -4293,7 +4309,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 target_ulong next_eip, tval;
 int rex_w, rex_r;
 
-s->pc = pc_start;
+s->pc_start = s->pc = pc_start;
 prefixes = 0;
 s->override = -1;
 rex_w = -1;
@@ -8031,7 +8047,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 if (s->prefix & PREFIX_LOCK)
 gen_helper_unlock();
 /* XXX: ensure that no lock was generated */
-gen_exception(s, EXCP06_ILLOP, pc_start - s->cs_base);
+gen_illop(env, s);
 return s->pc;
 }
 
-- 
2.5.0

[Qemu-devel] [PATCH 7/7] target-i386: Fix inhibit irq mask handling

2016-03-02 Thread Richard Henderson

The patch in 7f0b714 was too simplistic, in that we wound up setting
the flag and then resetting it immediately in gen_eob.

Fixes the reported boot problem with Windows XP.
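
The ordering problem described above can be illustrated with a toy model. This is a sketch only: it collapses the translation-time and run-time flags, which the real translator keeps separate, into a single value, and INHIBIT_IRQ, old_end_of_insn and new_end_of_insn are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INHIBIT_IRQ 0x1u    /* toy stand-in for HF_INHIBIT_IRQ_MASK */

/*
 * Old ordering: the instruction sets the flag, then the generic
 * end-of-block path clears it again, so the interrupt shadow is lost.
 */
static uint32_t old_end_of_insn(uint32_t flags, bool wants_inhibit)
{
    if (wants_inhibit) {
        flags |= INHIBIT_IRQ;
    }
    flags &= ~INHIBIT_IRQ;      /* unconditional reset at end of block */
    return flags;
}

/*
 * New ordering: the end-of-block path itself decides. Only the first of
 * several consecutive inhibiting instructions sets the flag.
 */
static uint32_t new_end_of_insn(uint32_t flags, bool wants_inhibit)
{
    if (wants_inhibit && !(flags & INHIBIT_IRQ)) {
        flags |= INHIBIT_IRQ;
    } else {
        flags &= ~INHIBIT_IRQ;
    }
    return flags;
}
```

In the old ordering the flag never survives the end of the instruction. In the new one it survives exactly one instruction, and a second inhibiting instruction in a row clears it.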

Reported-by: Hervé Poussineau 
Signed-off-by: Richard Henderson 
---
 target-i386/translate.c | 76 -
 1 file changed, 37 insertions(+), 39 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7455a18..513b53a 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -2441,12 +2441,19 @@ static void gen_bnd_jmp(DisasContext *s)
 }
 }
 
-/* generate a generic end of block. Trace exception is also generated
-   if needed */
-static void gen_eob(DisasContext *s)
+/* Generate an end of block. Trace exception is also generated if needed.
+   If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.  */
+static void gen_eob_inhibit_irq(DisasContext *s, bool inhibit)
 {
 gen_update_cc_op(s);
-gen_reset_hflag(s, HF_INHIBIT_IRQ_MASK);
+
+/* If several instructions disable interrupts, only the first does it.  */
+if (inhibit && !(s->flags & HF_INHIBIT_IRQ_MASK)) {
+gen_set_hflag(s, HF_INHIBIT_IRQ_MASK);
+} else {
+gen_reset_hflag(s, HF_INHIBIT_IRQ_MASK);
+}
+
 if (s->tb->flags & HF_RF_MASK) {
 gen_helper_reset_rf(cpu_env);
 }
@@ -2460,6 +2467,12 @@ static void gen_eob(DisasContext *s)
 s->is_jmp = DISAS_TB_JUMP;
 }
 
+/* End of block, resetting the inhibit irq flag.  */
+static void gen_eob(DisasContext *s)
+{
+gen_eob_inhibit_irq(s, false);
+}
+
 /* generate a jump to eip. No segment change must happen before as a
direct call to the next block may occur */
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num)
@@ -5193,16 +5206,15 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 ot = gen_pop_T0(s);
 gen_movl_seg_T0(s, reg);
 gen_pop_update(s, ot);
-if (reg == R_SS) {
-/* if reg == SS, inhibit interrupts/trace. */
-/* If several instructions disable interrupts, only the
-   _first_ does it */
-gen_set_hflag(s, HF_INHIBIT_IRQ_MASK);
-s->tf = 0;
-}
+/* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
 if (s->is_jmp) {
 gen_jmp_im(s->pc - s->cs_base);
-gen_eob(s);
+if (reg == R_SS) {
+s->tf = 0;
+gen_eob_inhibit_irq(s, true);
+} else {
+gen_eob(s);
+}
 }
 break;
 case 0x1a1: /* pop fs */
@@ -5260,16 +5272,15 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 goto illegal_op;
 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
 gen_movl_seg_T0(s, reg);
-if (reg == R_SS) {
-/* if reg == SS, inhibit interrupts/trace */
-/* If several instructions disable interrupts, only the
-   _first_ does it */
-gen_set_hflag(s, HF_INHIBIT_IRQ_MASK);
-s->tf = 0;
-}
+/* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
 if (s->is_jmp) {
 gen_jmp_im(s->pc - s->cs_base);
-gen_eob(s);
+if (reg == R_SS) {
+s->tf = 0;
+gen_eob_inhibit_irq(s, true);
+} else {
+gen_eob(s);
+}
 }
 break;
 case 0x8c: /* mov Gv, seg */
@@ -6795,26 +6806,13 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 }
 break;
 case 0xfb: /* sti */
-if (!s->vm86) {
-if (s->cpl <= s->iopl) {
-gen_sti:
-gen_helper_sti(cpu_env);
-/* interruptions are enabled only the first insn after sti */
-/* If several instructions disable interrupts, only the
-   _first_ does it */
-gen_set_hflag(s, HF_INHIBIT_IRQ_MASK);
-/* give a chance to handle pending irqs */
-gen_jmp_im(s->pc - s->cs_base);
-gen_eob(s);
-} else {
-gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
-}
+if (s->vm86 ? s->iopl == 3 : s->cpl <= s->iopl) {
+gen_helper_sti(cpu_env);
+/* interruptions are enabled only the first insn after sti */
+gen_jmp_im(s->pc - s->cs_base);
+gen_eob_inhibit_irq(s, true);
 } else {
-if (s->iopl == 3) {
-goto gen_sti;
-} else {
-gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
-}
+gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
 }
 break;
 case 0x62: /* bound */
-- 
2.5.0

[Qemu-devel] [PATCH 2/7] target-i386: fix smsw and lmsw from/to register

2016-03-02 Thread Richard Henderson

From: Paolo Bonzini 

SMSW and LMSW accept register operands, but commit 1906b2a ("target-i386:
Rearrange processing of 0F 01", 2016-02-13) did not account for that.
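
For readers less familiar with the encoding: in a ModRM byte the top two bits (mod) select the addressing form, and mod == 3 means a register operand. A switch that only matches mod 0..2 therefore misses the register forms. The sketch below restates the two macro variants as predicates; the function names are hypothetical and not taken from translate.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ModRM layout: mod in bits 7..6, reg/opcode in bits 5..3, rm in bits 2..0. */
static unsigned modrm_mod(uint8_t modrm) { return modrm >> 6; }
static unsigned modrm_op(uint8_t modrm)  { return (modrm >> 3) & 7; }

/* mod == 3 selects a register operand; 0..2 select memory forms. */
static bool modrm_is_memory(uint8_t modrm)
{
    return modrm_mod(modrm) != 3;
}

/* Equivalent of a "memory operands only" case range for a given op. */
static bool matches_mem_op(uint8_t modrm, unsigned op)
{
    return modrm_op(modrm) == op && modrm_is_memory(modrm);
}

/* Equivalent of an "any operand" case range for a given op. */
static bool matches_any_op(uint8_t modrm, unsigned op)
{
    return modrm_op(modrm) == op;
}
```

For example, ModRM byte 0xE0 (mod 3, op 4) is the register form of SMSW. It matches the "any operand" predicate but not the memory-only one.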

Fixes: 1906b2af7c2345037d9b2fdf484b457b5acd09d1
Cc: r...@twiddle.net
Reported-by: Hervé Poussineau 
Signed-off-by: Paolo Bonzini 
Message-Id: <1456845134-18812-1-git-send-email-pbonz...@redhat.com>
Signed-off-by: Richard Henderson 
---
 target-i386/translate.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index cd214a6..10cc2fa 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -57,11 +57,17 @@
 #endif
 
 /* For a switch indexed by MODRM, match all memory operands for a given OP.  */
-#define CASE_MEM_OP(OP) \
+#define CASE_MODRM_MEM_OP(OP) \
 case (0 << 6) | (OP << 3) | 0 ... (0 << 6) | (OP << 3) | 7: \
 case (1 << 6) | (OP << 3) | 0 ... (1 << 6) | (OP << 3) | 7: \
 case (2 << 6) | (OP << 3) | 0 ... (2 << 6) | (OP << 3) | 7
 
+#define CASE_MODRM_OP(OP) \
+case (0 << 6) | (OP << 3) | 0 ... (0 << 6) | (OP << 3) | 7: \
+case (1 << 6) | (OP << 3) | 0 ... (1 << 6) | (OP << 3) | 7: \
+case (2 << 6) | (OP << 3) | 0 ... (2 << 6) | (OP << 3) | 7: \
+case (3 << 6) | (OP << 3) | 0 ... (3 << 6) | (OP << 3) | 7
+
 //#define MACRO_TEST   1
 
 /* global register indexes */
@@ -7038,7 +7044,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 case 0x101:
 modrm = cpu_ldub_code(env, s->pc++);
 switch (modrm) {
-CASE_MEM_OP(0): /* sgdt */
+CASE_MODRM_MEM_OP(0): /* sgdt */
 gen_svm_check_intercept(s, pc_start, SVM_EXIT_GDTR_READ);
 gen_lea_modrm(env, s, modrm);
 tcg_gen_ld32u_tl(cpu_T0,
@@ -7094,7 +7100,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 gen_eob(s);
 break;
 
-CASE_MEM_OP(1): /* sidt */
+CASE_MODRM_MEM_OP(1): /* sidt */
 gen_svm_check_intercept(s, pc_start, SVM_EXIT_IDTR_READ);
 gen_lea_modrm(env, s, modrm);
 tcg_gen_ld32u_tl(cpu_T0, cpu_env, offsetof(CPUX86State, 
idt.limit));
@@ -7240,7 +7246,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 gen_helper_invlpga(cpu_env, tcg_const_i32(s->aflag - 1));
 break;
 
-CASE_MEM_OP(2): /* lgdt */
+CASE_MODRM_MEM_OP(2): /* lgdt */
 if (s->cpl != 0) {
 gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
 break;
@@ -7257,7 +7263,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 tcg_gen_st32_tl(cpu_T1, cpu_env, offsetof(CPUX86State, gdt.limit));
 break;
 
-CASE_MEM_OP(3): /* lidt */
+CASE_MODRM_MEM_OP(3): /* lidt */
 if (s->cpl != 0) {
 gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
 break;
@@ -7274,7 +7280,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 tcg_gen_st32_tl(cpu_T1, cpu_env, offsetof(CPUX86State, idt.limit));
 break;
 
-CASE_MEM_OP(4): /* smsw */
+CASE_MODRM_OP(4): /* smsw */
 gen_svm_check_intercept(s, pc_start, SVM_EXIT_READ_CR0);
 #if defined TARGET_X86_64 && defined HOST_WORDS_BIGENDIAN
 tcg_gen_ld32u_tl(cpu_T0, cpu_env, offsetof(CPUX86State, cr[0]) + 
4);
@@ -7284,7 +7290,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 1);
 break;
 
-CASE_MEM_OP(6): /* lmsw */
+CASE_MODRM_OP(6): /* lmsw */
 if (s->cpl != 0) {
 gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
 break;
@@ -7296,7 +7302,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 gen_eob(s);
 break;
 
-CASE_MEM_OP(7): /* invlpg */
+CASE_MODRM_MEM_OP(7): /* invlpg */
 if (s->cpl != 0) {
 gen_exception(s, EXCP0D_GPF, pc_start - s->cs_base);
 break;
@@ -7778,7 +7784,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 case 0x1ae:
 modrm = cpu_ldub_code(env, s->pc++);
 switch (modrm) {
-CASE_MEM_OP(0): /* fxsave */
+CASE_MODRM_MEM_OP(0): /* fxsave */
 if (!(s->cpuid_features & CPUID_FXSR)
 || (prefixes & PREFIX_LOCK)) {
 goto illegal_op;
@@ -7791,7 +7797,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 gen_helper_fxsave(cpu_env, cpu_A0);
 break;
 
-CASE_MEM_OP(1): /* fxrstor */
+CASE_MODRM_MEM_OP(1): /* fxrstor */
 if (!(s->cpuid_features & CPUID_FXSR)
 ||
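For readers following the macro change above, here is a minimal sketch (not QEMU code; all helper names are made up for illustration) of how a ModRM byte splits into its mod, op and rm fields, and which values each macro's case ranges are meant to cover:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not part of QEMU: split a ModRM byte into fields. */
static inline unsigned modrm_mod(uint8_t modrm) { return (modrm >> 6) & 3; }
static inline unsigned modrm_op(uint8_t modrm)  { return (modrm >> 3) & 7; }
static inline unsigned modrm_rm(uint8_t modrm)  { return modrm & 7; }

/* Same coverage as CASE_MODRM_MEM_OP(op): mod is 0..2, a memory operand. */
static inline bool matches_modrm_mem_op(uint8_t modrm, unsigned op)
{
    return modrm_op(modrm) == op && modrm_mod(modrm) != 3;
}

/* Same coverage as CASE_MODRM_OP(op): any mod, register operands included. */
static inline bool matches_modrm_op(uint8_t modrm, unsigned op)
{
    return modrm_op(modrm) == op;
}
```

With this split, a register-form smsw (ModRM 0xe0: mod 3, op 4, rm 0) matches only the second predicate, which is why smsw and lmsw switch to CASE_MODRM_OP in the patch.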

[Qemu-devel] [PATCH 3/7] target-i386: Fix SMSW for 64-bit mode

2016-03-02 Thread Richard Henderson

In non-64-bit modes, the instruction always stores 16 bits.
But in 64-bit mode, when the destination is a register, the
instruction can write 32 or 64 bits.

Signed-off-by: Richard Henderson 
---
 target-i386/translate.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 10cc2fa..b73c237 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7282,12 +7282,14 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 
 CASE_MODRM_OP(4): /* smsw */
 gen_svm_check_intercept(s, pc_start, SVM_EXIT_READ_CR0);
-#if defined TARGET_X86_64 && defined HOST_WORDS_BIGENDIAN
-tcg_gen_ld32u_tl(cpu_T0, cpu_env, offsetof(CPUX86State, cr[0]) + 
4);
-#else
-tcg_gen_ld32u_tl(cpu_T0, cpu_env, offsetof(CPUX86State, cr[0]));
-#endif
-gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 1);
+tcg_gen_ld_tl(cpu_T0, cpu_env, offsetof(CPUX86State, cr[0]));
+if (CODE64(s)) {
+mod = (modrm >> 6) & 3;
+ot = (mod != 3 ? MO_16 : s->dflag);
+} else {
+ot = MO_16;
+}
+gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 1);
 break;
 
 CASE_MODRM_OP(6): /* lmsw */
-- 
2.5.0
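The size selection in this patch can be sketched on its own, outside of TCG. The enum and function below are hypothetical stand-ins (QEMU uses TCGMemOp values and the DisasContext fields directly); only the decision logic follows the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the operand size codes used in the patch. */
enum op_size { SIZE_16 = 1, SIZE_32 = 2, SIZE_64 = 3 };

/*
 * Pick the store size for SMSW.
 *   code64 - true when translating 64-bit code (CODE64(s) in the patch)
 *   modrm  - the ModRM byte
 *   dflag  - the current operand size (s->dflag in the patch)
 */
static enum op_size smsw_store_size(bool code64, uint8_t modrm,
                                    enum op_size dflag)
{
    unsigned mod = (modrm >> 6) & 3;

    if (code64 && mod == 3) {
        return dflag;          /* register destination in 64-bit mode */
    }
    return SIZE_16;            /* memory destination, or not 64-bit mode */
}
```

Only the combination of 64-bit code and a register destination (mod == 3) yields anything other than a 16-bit store.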

[Qemu-devel] [PATCH 1/7] target-i386: avoid repeated calls to the bnd_jmp helper

2016-03-02 Thread Richard Henderson

From: Paolo Bonzini 

Two flags were tested the wrong way.

Signed-off-by: Paolo Bonzini 
Message-Id: <1456845145-18891-1-git-send-email-pbonz...@redhat.com>
Signed-off-by: Richard Henderson 
[rth: Fixed enable test as well.]
---
 target-i386/translate.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 53dee79..cd214a6 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -2409,12 +2409,12 @@ static void gen_reset_hflag(DisasContext *s, uint32_t 
mask)
 /* Clear BND registers during legacy branches.  */
 static void gen_bnd_jmp(DisasContext *s)
 {
-/* Do nothing if BND prefix present, MPX is disabled, or if the
-   BNDREGs are known to be in INIT state already.  The helper
-   itself will check BNDPRESERVE at runtime.  */
+/* Clear the registers only if BND prefix is missing, MPX is enabled,
+   and if the BNDREGs are known to be in use (non-zero) already.
+   The helper itself will check BNDPRESERVE at runtime.  */
 if ((s->prefix & PREFIX_REPNZ) == 0
-&& (s->flags & HF_MPX_EN_MASK) == 0
-&& (s->flags & HF_MPX_IU_MASK) == 0) {
+&& (s->flags & HF_MPX_EN_MASK) != 0
+&& (s->flags & HF_MPX_IU_MASK) != 0) {
 gen_helper_bnd_jmp(cpu_env);
 }
 }
-- 
2.5.0
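As a standalone illustration of the corrected test, the predicate below mirrors the new condition. The bit values are invented for the example; the real PREFIX_REPNZ, HF_MPX_EN_MASK and HF_MPX_IU_MASK definitions live in QEMU's headers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only, not the QEMU definitions. */
#define DEMO_PREFIX_REPNZ  (1u << 0)   /* doubles as the BND prefix */
#define DEMO_MPX_EN        (1u << 1)   /* MPX enabled */
#define DEMO_MPX_IU        (1u << 2)   /* BNDREGs in use (non-zero) */

/* True when the helper that clears the BND registers should be emitted. */
static bool should_call_bnd_jmp(uint32_t prefix, uint32_t flags)
{
    return (prefix & DEMO_PREFIX_REPNZ) == 0
        && (flags & DEMO_MPX_EN) != 0
        && (flags & DEMO_MPX_IU) != 0;
}
```

Before the fix, the two flag tests compared against zero, so the helper was emitted exactly when MPX was disabled and the registers were idle.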

[Qemu-devel] [PATCH 5/7] target-i386: fix addr16 prefix

2016-03-02 Thread Richard Henderson

From: Paolo Bonzini 

While ADDSEG will only be false in 16-bit mode for LEA, it can be
false even in other cases when 16-bit addresses are obtained via
the 67h prefix in 32-bit mode.  In this case, gen_lea_v_seg forgets
to add a nonzero FS or GS base if CS/DS/ES/SS are all zero.  This
case is pretty rare but happens when booting Windows 95/98, and
this patch fixes it.

The bug is visible since commit d6a291498, but it was introduced
together with gen_lea_v_seg and it probably could be reproduced
with a "addr16 gs movsb" instruction as early as in commit
ca2f29f555805d07fb0b9ebfbbfc4e3656530977.

Reported-by: Hervé Poussineau 
Signed-off-by: Paolo Bonzini 
Message-Id: <1456931078-21635-1-git-send-email-pbonz...@redhat.com>
Signed-off-by: Richard Henderson 
---
 target-i386/translate.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index aa423cb..ed54381 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -467,15 +467,15 @@ static void gen_lea_v_seg(DisasContext *s, TCGMemOp 
aflag, TCGv a0,
 break;
 case MO_16:
 /* 16 bit address */
-if (ovr_seg < 0) {
-ovr_seg = def_seg;
-}
 tcg_gen_ext16u_tl(cpu_A0, a0);
-/* ADDSEG will only be false in 16-bit mode for LEA.  */
-if (!s->addseg) {
-return;
-}
 a0 = cpu_A0;
+if (ovr_seg < 0) {
+if (s->addseg) {
+ovr_seg = def_seg;
+} else {
+return;
+}
+}
 break;
 default:
 tcg_abort();
-- 
2.5.0
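The reordered logic for the 16-bit case can be read as a small decision function. This is a sketch with hypothetical names (NO_SEG and seg_for_addr16 do not exist in QEMU); it only captures which segment base, if any, ends up being added:

```c
#include <stdbool.h>

#define NO_SEG (-1)   /* hypothetical marker: do not add any segment base */

/*
 * Decide which segment base, if any, to add to a 16-bit address.
 *   ovr_seg - segment override from a prefix, or negative if none
 *   def_seg - default segment for the instruction
 *   addseg  - whether the default segment base has to be added
 */
static int seg_for_addr16(int ovr_seg, int def_seg, bool addseg)
{
    if (ovr_seg >= 0) {
        return ovr_seg;     /* an explicit override always applies */
    }
    return addseg ? def_seg : NO_SEG;
}
```

The old code returned early whenever addseg was false, which also discarded an explicit FS or GS override; in the sketch, that corresponds to the first branch being skipped.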

[Qemu-devel] [PATCH 0/7] target-i386 fixes

2016-03-02 Thread Richard Henderson

These are primarily patches fixing Windows boot regressions
that I introduced.  Many thanks to Herve for reporting them
and Paolo for fixing two of them.

Changes from patches previously seen on-list:
  * Tweak the bnd_jmp patch to test MPX enabled properly.
  * Dump illegal opcode data with -d unimp.
  * Use gen_nop_modrm for prefetch.
  * Fix Windows XP booting.

The illegal opcode dumping was useful for debugging.  The prefetch
patch I noticed while reviewing the addr16 patch wrt lea.


r~


Paolo Bonzini (3):
  target-i386: avoid repeated calls to the bnd_jmp helper
  target-i386: fix smsw and lmsw from/to register
  target-i386: fix addr16 prefix

Richard Henderson (4):
  target-i386: Fix SMSW for 64-bit mode
  target-i386: Dump illegal opcodes with -d unimp
  target-i386: Use gen_nop_modrm for prefetch instructions
  target-i386: Fix inhibit irq mask handling

 target-i386/translate.c | 179 +++-
 1 file changed, 100 insertions(+), 79 deletions(-)

-- 
2.5.0

[Qemu-devel] [PATCH 6/7] target-i386: Use gen_nop_modrm for prefetch instructions

2016-03-02 Thread Richard Henderson

Signed-off-by: Richard Henderson 
---
 target-i386/translate.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index ed54381..7455a18 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7491,7 +7491,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 case 3: /* prefetchnt0 */
 if (mod == 3)
 goto illegal_op;
-gen_lea_modrm(env, s, modrm);
+gen_nop_modrm(env, s, modrm);
 /* nothing more to do */
 break;
 default: /* nop (multi byte) */
@@ -7989,8 +7989,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 mod = (modrm >> 6) & 3;
 if (mod == 3)
 goto illegal_op;
-gen_lea_modrm(env, s, modrm);
-/* ignore for now */
+gen_nop_modrm(env, s, modrm);
 break;
 case 0x1aa: /* rsm */
 gen_svm_check_intercept(s, pc_start, SVM_EXIT_RSM);
-- 
2.5.0

Re: [Qemu-devel] [Qemu-arm] [PATCH v2 2/3] hw/intc: Add (new) ASPEED AST2400 AVIC device model

2016-03-02 Thread Andrew Jeffery

On Thu, 2016-02-25 at 16:29 +, Peter Maydell wrote:
> > +case 0x20: /* Interrupt Enable */
> > +s->int_enable |= data;
> 
> Are you sure this only ORs in new 1 bits?

As in, am I sure I only want to take the newly set bits? If so, yes, as
the following register serves to clear the field's set bits:

> 
> > +break;
> > +case 0x28: /* Interrupt Enable Clear */
> > +s->int_enable &= ~data;
> > +break;

The 'int_enable', 'int_trigger' and 'edge_status' fields all use the
pattern of separate set and clear registers (the remaining registers may
benefit from the extract64/deposit64 helpers, I'll think about that
further). I'll add some comments to help clear this up.

Otherwise, can you rephrase the question? At face value it seems like
you're implying that I'm doing more than ORing in the new 1 bits?

Andrew
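
To make the set/clear register pattern concrete, here is a minimal sketch. The struct and function are hypothetical and heavily trimmed; only the two offsets and the |= / &= ~ operations come from the quoted patch:

```c
#include <stdint.h>

/* Hypothetical, trimmed-down device state for illustration only. */
struct demo_vic {
    uint32_t int_enable;
};

/* Offsets taken from the quoted patch. */
#define REG_INT_ENABLE        0x20
#define REG_INT_ENABLE_CLEAR  0x28

static void demo_vic_write(struct demo_vic *s, uint32_t offset, uint32_t data)
{
    switch (offset) {
    case REG_INT_ENABLE:        /* writing 1 sets a bit, writing 0 is a no-op */
        s->int_enable |= data;
        break;
    case REG_INT_ENABLE_CLEAR:  /* writing 1 clears a bit, writing 0 is a no-op */
        s->int_enable &= ~data;
        break;
    default:
        break;
    }
}
```

Because a zero bit is a no-op in both registers, the guest can enable or disable one interrupt without a read-modify-write of the whole field.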


Re: [Qemu-devel] [PATCH] rng-random: implement request queue

2016-03-02 Thread Amit Shah

On (Thu) 04 Feb 2016 [13:07:35], Ladi Prosek wrote:
> - Original Message -
> > - Original Message -
> > > 
> > > 
> > > On 03/02/2016 13:36, Amit Shah wrote:
> > > > ... and this can lead to breaking migration (the queue of requests on
> > > > the host needs to be migrated, else the new host will have no idea of
> > > > the queue).
> > > 
> > > It is already migrated as part of virtio_rng_save's call to virtio_save.
> > >  On the loading side, virtio_rng_process condenses all requests into one
> > > and chr_read fills in as many virtqueue buffers as possible from the
> > > single request.
> > 
> > Thanks! So this looks broken. The contract between virtio-rng and the rng
> > backend is "one chr_read callback per one rng_backend_request_entropy",
> > regardless of the request size. It guarantees that chr_read will get at
> > least one byte, nothing else. If the one condensed request is bigger than
> > what the backend is able to give at the moment, there may be unsatisfied
> > virtio buffers left in the queue.
> > 
> > One way of fixing this would be to have chr_read call virtio_rng_process
> > if it ran out of backend data but not out of virtio buffers. Otherwise
> > those buffers will be slowly drained by the rate limit timer, which is
> > not optimal (especially in the absence of rate limit :)
> 
> Basically, we could just revert this commit:
> 
> commit 4621c1768ef5d12171cca2aa1473595ecb9f1c9e
> Author: Amit Shah 
> Date:   Wed Nov 21 11:21:19 2012 +0530
> 
> virtio-rng: remove extra request for entropy
> 
> If we got fewer bytes from the backend than requested, don't poke the
> backend for more bytes; the guest will ask for more (or if the guest has
> already asked for more, the backend knows about it via handle_input()

OK, agreed.

It makes sense to revert this so the live migration scenario is sane.
Without live migration reverting this commit doesn't get us any
benefit, and that's why I had removed it.

Do you want to post a patch reverting this too?

Thanks,

Amit

Re: [Qemu-devel] [Qemu-ppc] [PATCH] target-ppc: fix sync of SPR_SDR1 with KVM

2016-03-02 Thread David Gibson

On Wed, Mar 02, 2016 at 11:06:19AM +1100, David Gibson wrote:
> On Tue, Mar 01, 2016 at 07:03:10PM +0100, Greg Kurz wrote:
> > The gdbstub can't access guest memory with current master. This is what you
> > get in gdb:
> > 
> > 0x19b8 in main (argc= > memory
> > at address 0x3fffce4d3620>, argv= > memory
> > at address 0x3fffce4d3628>) at fp.c:11
> > 
> > Bisect leads to the following commit:
> > 
> > commit fa48b4328c39b2532e47efcfcba6d4031512f514
> > Author: David Gibson 
> > Date:   Tue Feb 9 09:30:21 2016 +1000
> > 
> > target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM
> > 
> > Looking at the env->external_htab users, I've spotted a behaviour change in
> > kvm_arch_get_registers(), which now always calls ppc_store_sdr1().
> > 
> > Checking kvmppc_kern_htab, like it is done in the MMU helpers, fixes the
> > issue.
> > 
> > Signed-off-by: Greg Kurz 
> 
> Mea culpa.  Good catch, applied to ppc-for-2.6, thanks.

Ah.. wait.. this patch breaks the build for the ppc32 target.  Can you
fix this, please?

> > ---
> >  target-ppc/kvm.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> > index d67c169ba324..dbc37f25af2b 100644
> > --- a/target-ppc/kvm.c
> > +++ b/target-ppc/kvm.c
> > @@ -1190,7 +1190,7 @@ int kvm_arch_get_registers(CPUState *cs)
> >  return ret;
> >  }
> >  
> > -if (!env->external_htab) {
> > +if (!kvmppc_kern_htab && !env->external_htab) {
> >  ppc_store_sdr1(env, sregs.u.s.sdr1);
> >  }
> >  
> > 
> 



-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 1/3] arm: gic: add GICType

2016-03-02 Thread Peter Xu

On Wed, Mar 02, 2016 at 02:59:57PM +0100, Markus Armbruster wrote:
> Peter Xu  writes:
> > What's "query-schema"? Is that a QMP command?
> 
> Yes.
> 
> More than you ever wanted to know:
> http://events.linuxfoundation.org/sites/events/files/slides/armbru-qemu-introspection.pdf
> https://www.youtube.com/watch?v=IEa8Ao8_B9o&list=PLW3ep1uCIRfyLNSu708gWG7uvqlolk0ep&index=28

Thanks for the pointers and cool slides! It's a good thing to pick
up. :-)

It'll be cool if we treat data as code, and code as data.

I see that qapi-introspect branch is not there now. Is it merged to
some other branch already? When will it be there in QEMU master
(still not in, right?)? Just curious about it.

> 
> > What I meant is that, we can define the following (for example):
> >
> > { 'struct': 'GICCapInfo',
> >   'data': {
> > 'version': 'int',
> > 'emulated': 'bool',
> > 'kernel': 'bool' } }
> >
> > And:
> >
> > { 'command': 'query-gic-capability',
> >   'returns': ['GICCapInfo'] }
> >
> > So we can keep this schema as it is when new versions arrive. We
> > can just push another element in.
> 
> To answer questions of the sort "can this QEMU version do X?", it's
> often useful to tie X to a schema change that is visible in the result
> of query-schema.

Now I understand. For this case, I guess both ways work, right?
If "query-schema" is still not there, I'd still
prefer the "array" solution. At least it keeps the schema
several lines shorter (as you have mentioned already, it's *big*
enough :). Also, even if we had "query-schema", I would still
prefer not to change the schema unless necessary. What do you think?

Thanks!
Peter

Re: [Qemu-devel] [PATCH v3 1/1] block/sheepdog: fix argument passed to qemu_strtoul()

2016-03-02 Thread Hitoshi Mitake

On Thu, Mar 3, 2016 at 1:36 AM, Jeff Cody  wrote:

> On Wed, Mar 02, 2016 at 05:32:11PM +0100, Max Reitz wrote:
> > On 02.03.2016 17:24, Jeff Cody wrote:
> > > The function qemu_strtoul() reads 'unsigned long' sized data,
> > > which is larger than uint32_t on 64-bit machines.
> > >
> > > Even though the snap_id field in the header is 32-bits, we must
> > > accommodate the full size in qemu_strtoul().
> > >
> > > This patch also adds more meaningful error handling to the
> > > qemu_strtoul() call, and subsequent results.
> > >
> > > Reported-by: Paolo Bonzini 
> > > Signed-off-by: Jeff Cody 
> > > ---
> > >  block/sheepdog.c | 11 +++
> > >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > Another problem with this function is that it doesn't always set errp on
> > error. Actually, this patch introduces the first instance where it does.
> >
> > qemu-img will not print an error if errp is not set; it actually ignores
> > bdrv_snapshot_delete_by_id_or_name()'s return value. So this is a real
> > issue that should be fixed as well.
> >
>
> I (or perhaps one of the sheepdog maintainers?) will handle that in
> subsequent patch(es).
>

Thanks for pointing this out. I didn't notice the problem when I posted the
patch. I'll fix it later.

Thanks,
Hitoshi

Re: [Qemu-devel] [PATCH v3 1/1] block/sheepdog: fix argument passed to qemu_strtoul()

2016-03-02 Thread Hitoshi Mitake

On Thu, Mar 3, 2016 at 1:24 AM, Jeff Cody  wrote:

> The function qemu_strtoul() reads 'unsigned long' sized data,
> which is larger than uint32_t on 64-bit machines.
>
> Even though the snap_id field in the header is 32-bits, we must
> accommodate the full size in qemu_strtoul().
>
> This patch also adds more meaningful error handling to the
> qemu_strtoul() call, and subsequent results.
>
> Reported-by: Paolo Bonzini 
> Signed-off-by: Jeff Cody 
> ---
>  block/sheepdog.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
>

Thanks a lot for your fix!

Reviewed-by: Hitoshi Mitake 

Thanks,
Hitoshi


>
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 8739acc..87f0027 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -2543,7 +2543,7 @@ static int sd_snapshot_delete(BlockDriverState *bs,
>const char *name,
>Error **errp)
>  {
> -uint32_t snap_id = 0;
> +unsigned long snap_id = 0;
>  char snap_tag[SD_MAX_VDI_TAG_LEN];
>  Error *local_err = NULL;
>  int fd, ret;
> @@ -2565,12 +2565,15 @@ static int sd_snapshot_delete(BlockDriverState *bs,
>  memset(buf, 0, sizeof(buf));
>  memset(snap_tag, 0, sizeof(snap_tag));
>  pstrcpy(buf, SD_MAX_VDI_LEN, s->name);
> -if (qemu_strtoul(snapshot_id, NULL, 10, (unsigned long *)&snap_id)) {
> -return -1;
> +ret = qemu_strtoul(snapshot_id, NULL, 10, &snap_id);
> +if (ret || snap_id > UINT32_MAX) {
> +error_setg(errp, "Invalid snapshot ID: %s",
> + snapshot_id ? snapshot_id : "");
> +return -EINVAL;
>  }
>
>  if (snap_id) {
> -hdr.snapid = snap_id;
> +hdr.snapid = (uint32_t) snap_id;
>  } else {
>  pstrcpy(snap_tag, sizeof(snap_tag), snapshot_id);
>  pstrcpy(buf + SD_MAX_VDI_LEN, SD_MAX_VDI_TAG_LEN, snap_tag);
> --
> 1.9.3
>
>
>
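
The idea behind the fix (parse into a full unsigned long, then range-check before narrowing to 32 bits) can be sketched with the standard library. This uses plain strtoul() as a stand-in for qemu_strtoul(), the function name parse_snap_id is made up, and the fallback to a snapshot tag is deliberately left out:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a decimal snapshot ID into a uint32_t.
 * Returns 0 on success, -EINVAL on malformed or out-of-range input.
 * Uses plain strtoul() as a stand-in for QEMU's qemu_strtoul().
 */
static int parse_snap_id(const char *str, uint32_t *out)
{
    char *end;
    unsigned long val;

    if (!str || *str == '\0') {
        return -EINVAL;
    }
    errno = 0;
    val = strtoul(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0' || val > UINT32_MAX) {
        return -EINVAL;
    }
    *out = (uint32_t)val;
    return 0;
}
```

Writing the result through an unsigned long first avoids the original problem, where a pointer to a uint32_t was cast to unsigned long * and the callee could store more bytes than the variable holds.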

Re: [Qemu-devel] [PATCH RFC v2 1/2] Add param Error** to msi_init() & modify the callers

2016-03-02 Thread Cao jin


Hi, Markus
Thanks for still remembering this patch, and for such a detailed response :)
I will give an appropriate reply after I have read and understood it
all (so I am not cc'ing the others here).


On 03/02/2016 05:13 PM, Markus Armbruster wrote:

This got lost over the Christmas break, sorry.

Cc'ing Marcel for additional PCI expertise.

Cao jin  writes:


msi_init() is a supporting function in PCI device initialization,
in order to convert .init() to .realize(), it should be modified first.


"Supporting function" doesn't imply "should use Error to report
errors".  HACKING explains:

 Use the simplest suitable method to communicate success / failure to
 callers.  Stick to common methods: non-negative on success / -1 on
 error, non-negative / -errno, non-null / null, or Error objects.

 Example: when a function returns a non-null pointer on success, and it
 can fail only in one way (as far as the caller is concerned), returning
 null on failure is just fine, and certainly simpler and a lot easier on
 the eyes than propagating an Error object through an Error ** parameter.

 Example: when a function's callers need to report details on failure
 only the function really knows, use Error **, and set suitable errors.

 Do not report an error to the user when you're also returning an error
 for somebody else to handle.  Leave the reporting to the place that
 consumes the error returned.

As we'll see in your patch of msi.c, msi_init() can fail in multiple
ways, and uses -errno to communicate them.  That can be okay even in
realize().  It also reports to the user.  That's what makes it
unsuitable for realize().


Also modify the callers


Actually, you're *fixing* callers!  But the bugs aren't 100% clear, yet,
see below for details.  Once we know what the bugs are, we'll want to
reword the commit message to describe the bugs and their impact.

I recommend skipping ahead to msi.c, then coming back to the device models.


Bonus: add more comment for msi_init().
Signed-off-by: Cao jin 
---
  hw/audio/intel-hda.c   | 10 -
  hw/ide/ich.c   |  2 +-
  hw/net/vmxnet3.c   | 13 +++---
  hw/pci-bridge/ioh3420.c|  7 +++-
  hw/pci-bridge/pci_bridge_dev.c |  8 +++-
  hw/pci-bridge/xio3130_downstream.c |  8 +++-
  hw/pci-bridge/xio3130_upstream.c   |  8 +++-
  hw/pci/msi.c   | 18 +++--
  hw/scsi/megasas.c  | 15 +--
  hw/scsi/vmw_pvscsi.c   | 13 --
  hw/usb/hcd-xhci.c  | 81 +-
  hw/vfio/pci.c  | 20 +-
  include/hw/pci/msi.h   |  4 +-
  13 files changed, 135 insertions(+), 72 deletions(-)

diff --git a/hw/audio/intel-hda.c b/hw/audio/intel-hda.c
index 433463e..0d770131 100644
--- a/hw/audio/intel-hda.c
+++ b/hw/audio/intel-hda.c
@@ -1130,6 +1130,7 @@ static void intel_hda_realize(PCIDevice *pci, Error 
**errp)
  {
  IntelHDAState *d = INTEL_HDA(pci);
  uint8_t *conf = d->pci.config;
+int ret;

  d->name = object_get_typename(OBJECT(d));

@@ -1142,11 +1143,18 @@ static void intel_hda_realize(PCIDevice *pci, Error 
**errp)
"intel-hda", 0x4000);
  pci_register_bar(&d->pci, 0, 0, &d->mmio);
  if (d->msi) {
-msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true, false);
+ret = msi_init(&d->pci, d->old_msi_addr ? 0x50 : 0x60, 1, true,
+false, errp);
+if (ret < 0) {
+goto cleanup_on_msi_fail;
+}
  }

  hda_codec_bus_init(DEVICE(pci), &d->codecs, sizeof(d->codecs),
 intel_hda_response, intel_hda_xfer);
+
+cleanup_on_msi_fail:
+object_unref(OBJECT(&d->mmio));
  }



This is a bug fix.  Two bugs, actually.

Bug#1: we ignore msi_init() failure.  If it fails because the board
doesn't support MSI, the failure is ignored silently.  If it fails
because the PCI config space offset is already occupied, we report the
error to the user, but still ignore it.  The latter feels like a
programming error to me.

Your patch fixes realize() to fail when msi_init() fails.

Bug#2: we report errors with error_report_err() instead of through errp.
This is because we use pci_add_capability() instead of
pci_add_capability2().

I don't have the time to review the other devices you patch.  Both bugs
should be easy to spot in the patch: if the value of msi_init() is
ignored before the patch, the device got bug#1, and if the patched
device uses realize(), it got bug#2.

The patch could be split up into parts that fix just one thing.  Not
sure that's worth the bother.


  static void intel_hda_exit(PCIDevice *pci)
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 16925fa..94b1809 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -145,7 +145,7 @@ static void pci_ich9_ahci_realize(PCIDevice *dev, Error 
**errp)
  /* Although the AHCI 1.3

[Qemu-devel] Question regarding a special case VM communication

2016-03-02 Thread Jidong Xiao

Hello, All,

I am facing this special case of VM communication.

I have a Linux host machine, on which two Qemu/KVM virtual machines
are launched. Let's say they are VM1 and VM2, and their corresponding
processes on the host are Qemu1 and Qemu2. Inside VM2, I need nested
virtualization, i.e., a QEMU process is started inside VM2, let's say
it's Qemu3, and a virtual machine VM3 is created accordingly.

What I want to achieve is: communication between Qemu1 and Qemu3. For
example, if I need to migrate VM1 from Qemu1 to Qemu3, how do I
achieve that? One naïve solution is to just go through the TCP stack.
However, since they are on the same host machine, is there a faster or
more efficient way to implement it, i.e., something similar to a Unix
domain socket? I know a Unix domain socket is not applicable to a case
like this, but I am looking for something similar. Does anyone have
any suggestions or ideas? Thanks.

-Jidong

[Qemu-devel] [PATCH] ui/cocoa.m: Allow console selection using keypad number keys

2016-03-02 Thread Programmingkid

This patch allows the user to select a console with the keypad number
keys.

Signed-off-by: John Arbuckle 

---
This patch depends on this patch: http://patchwork.ozlabs.org/patch/591221/

 ui/cocoa.m | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 65301ff..d3310bf 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -562,13 +562,10 @@ QemuCocoaView *cocoaView;
 
 // handle control + alt Key Combos (ctrl+alt is reserved for QEMU)
 if (([event modifierFlags] & NSControlKeyMask) && ([event 
modifierFlags] & NSAlternateKeyMask)) {
-switch (keycode) {
-
-// enable graphic console
-case Q_KEY_CODE_1 ... Q_KEY_CODE_9: // '1' to '9' keys
-console_select(keycode - 11);
-break;
-}
+int selected_console = atoi([[event characters]
+   cStringUsingEncoding:
+   NSASCIIStringEncoding]);
+console_select(selected_console - 1);
 
 // handle keys for graphic console
 } else if (qemu_console_is_graphic(NULL)) {
-- 
2.7.2
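
One thing worth noting about the atoi() approach: atoi() returns 0 for non-numeric input, so any other ctrl+alt combination would end up calling console_select(-1). A possible guard, sketched in plain C with a made-up helper name (this is not part of the patch, and the Objective-C string conversion is left out):

```c
#include <stddef.h>

/*
 * Map the typed character string to a zero-based console index.
 * Returns -1 when the string is not a single digit '1'..'9'.
 */
static int console_index_from_chars(const char *chars)
{
    if (chars == NULL || chars[0] < '1' || chars[0] > '9' || chars[1] != '\0') {
        return -1;
    }
    return chars[0] - '1';
}
```

A caller would then select a console only when the returned index is non-negative, and fall through to the normal key handling otherwise.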

[Qemu-devel] [PATCH v2] ui/cocoa.m: Replace pc/xt keyboard keycode array with QKeyCode

2016-03-02 Thread Programmingkid

The old pc/xt keyboard keycode array is replaced with QEMU's own QKeyCode
layout.

Signed-off-by: John Arbuckle 

---
Removed dependency on MacKeys.h. 
Included Carbon.h.
Replaced MacKeys' constants with Carbon's constants.
Replaced numbers in handleEvent: with QKeyCode constants.

Maintainer note:
Please apply these patches before testing: 
- http://patchwork.ozlabs.org/patch/590829/  [v2] qapi-schema.json: Add 
kp_equals and power keys
*Sorry but git send-email refuses to work on my system.
You could apply this patch also if you want:
http://patchwork.ozlabs.org/patch/591028/ input-keymap.c: Add keypad equal and 
power keys

 ui/cocoa.m | 319 +++--
 1 file changed, 143 insertions(+), 176 deletions(-)

diff --git a/ui/cocoa.m b/ui/cocoa.m
index 3ee5549..65301ff 100644
--- a/ui/cocoa.m
+++ b/ui/cocoa.m
@@ -33,6 +33,7 @@
 #include "sysemu/sysemu.h"
 #include "qmp-commands.h"
 #include "sysemu/blockdev.h"
+#include <Carbon/Carbon.h>
 
 #ifndef MAC_OS_X_VERSION_10_5
 #define MAC_OS_X_VERSION_10_5 1050
@@ -72,178 +73,141 @@ bool stretch_video;
 NSTextField *pauseLabel;
 NSArray * supportedImageFileTypes;
 
-// keymap conversion
-int keymap[] =
-{
-//  SdlImacImacHSdlH104xtH  104xtC  sdl
-30, //  0   0x000x1eA   QZ_a
-31, //  1   0x010x1fS   QZ_s
-32, //  2   0x020x20D   QZ_d
-33, //  3   0x030x21F   QZ_f
-35, //  4   0x040x23H   QZ_h
-34, //  5   0x050x22G   QZ_g
-44, //  6   0x060x2cZ   QZ_z
-45, //  7   0x070x2dX   QZ_x
-46, //  8   0x080x2eC   QZ_c
-47, //  9   0x090x2fV   QZ_v
-0,  //  10  0x0AUndefined
-48, //  11  0x0B0x30B   QZ_b
-16, //  12  0x0C0x10Q   QZ_q
-17, //  13  0x0D0x11W   QZ_w
-18, //  14  0x0E0x12E   QZ_e
-19, //  15  0x0F0x13R   QZ_r
-21, //  16  0x100x15Y   QZ_y
-20, //  17  0x110x14T   QZ_t
-2,  //  18  0x120x021   QZ_1
-3,  //  19  0x130x032   QZ_2
-4,  //  20  0x140x043   QZ_3
-5,  //  21  0x150x054   QZ_4
-7,  //  22  0x160x076   QZ_6
-6,  //  23  0x170x065   QZ_5
-13, //  24  0x180x0d=   QZ_EQUALS
-10, //  25  0x190x0a9   QZ_9
-8,  //  26  0x1A0x087   QZ_7
-12, //  27  0x1B0x0c-   QZ_MINUS
-9,  //  28  0x1C0x098   QZ_8
-11, //  29  0x1D0x0b0   QZ_0
-27, //  30  0x1E0x1b]   QZ_RIGHTBRACKET
-24, //  31  0x1F0x18O   QZ_o
-22, //  32  0x200x16U   QZ_u
-26, //  33  0x210x1a[   QZ_LEFTBRACKET
-23, //  34  0x220x17I   QZ_i
-25, //  35  0x230x19P   QZ_p
-28, //  36  0x240x1cENTER   QZ_RETURN
-38, //  37  0x250x26L   QZ_l
-36, //  38  0x260x24J   QZ_j
-40, //  39  0x270x28'   QZ_QUOTE
-37, //  40  0x280x25K   QZ_k
-39, //  41  0x290x27;   QZ_SEMICOLON
-43, //  42  0x2A0x2b\   QZ_BACKSLASH
-51, //  43  0x2B0x33,   QZ_COMMA
-53, //  44  0x2C0x35/   QZ_SLASH
-49, //  45  0x2D0x31N   QZ_n
-50, //  46  0x2E0x32M   QZ_m
-52, //  47  0x2F0x34.   QZ_PERIOD
-15, //  48  0x300x0fTAB QZ_TAB
-57, //  49  0x310x39SPACE   QZ_SPACE
-41, //  50  0x320x29`   QZ_BACKQUOTE
-14, //  51  0x330x0eBKSPQZ_BACKSPACE
-0,  //  52  0x34Undefined
-1,  //  53  0x350x01ESC QZ_ESCAPE
-220, // 54  0x360xdcE0,5C   R GUI   QZ_RMETA
-219, // 55  0x370xdbE0,5B   L GUI   QZ_LMETA
-42, //  56  0x380x2aL SHFT  QZ_LSHIFT
-58, //  57  0x390x3aCAPSQZ_CAPSLOCK
-56, //  58  0x3A0x38L ALT   QZ_LALT
-29, //  59  0x3B0x1dL CTRL  QZ_LCTRL
-54, //  60  0x3C0x36R SHFT  QZ_RSHIFT
-184,//  61  0x3D0xb8E0,38   R ALT   QZ_RALT
-157,//  62  0x3E

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 04/16] spapr_iommu: Introduce "enabled" state for TCE table

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:29PM +1100, Alexey Kardashevskiy wrote:
> Currently TCE tables are created once at start and their sizes never
> change. We are going to change that by introducing a Dynamic DMA windows
> support where DMA configuration may change during the guest execution.
> 
> This changes spapr_tce_new_table() to create an empty zero-size IOMMU
> memory region (IOMMU MR). Only LIOBN is assigned by the time of creation.
> It still will be called once at the owner object (VIO or PHB) creation.
> 
> This introduces an "enabled" state for TCE table objects with two
> helper functions - spapr_tce_table_enable()/spapr_tce_table_disable().
> - spapr_tce_table_enable() receives TCE table parameters, allocates
> a guest view of the TCE table (in the user space or KVM) and
> sets the correct size on the IOMMU MR.
> - spapr_tce_table_disable() disposes the table and resets the IOMMU MR
> size.
> 
> This changes the PHB reset handler to do the default DMA initialization
> instead of spapr_phb_realize(). This does not make a difference now but
> later with more than just one DMA window, we will have to remove them all
> and create the default one on a system reset.
> 
> No visible change in behaviour is expected except the actual table
> will be reallocated every reset. We might optimize this later.
> 
> The other way to implement this would be dynamically create/remove
> the TCE table QOM objects but this would make migration impossible
> as the migration code expects all QOM objects to exist at the receiver
> so we have to have TCE table objects created when migration begins.
> 
> spapr_tce_table_do_enable() is separated from spapr_tce_table_enable()
> as later it will be called at the sPAPRTCETable post-migration stage when
> it already has all the properties set after the migration.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

Although there's one nit that could be improved:


> ---
>  hw/ppc/spapr_iommu.c   | 80 
> +++---
>  hw/ppc/spapr_pci.c | 21 +
>  hw/ppc/spapr_vio.c |  9 +++---
>  include/hw/ppc/spapr.h | 10 +++
>  4 files changed, 80 insertions(+), 40 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 8132f64..e66e128 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -174,15 +174,8 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
>  
>  tcet->fd = -1;
> -tcet->table = spapr_tce_alloc_table(tcet->liobn,
> -tcet->page_shift,
> -tcet->nb_table,
> -&tcet->fd,
> -tcet->need_vfio);
> -
>  memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops,
> - "iommu-spapr",
> - (uint64_t)tcet->nb_table << tcet->page_shift);
> + "iommu-spapr", 0);
>  
>  QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>  
> @@ -224,14 +217,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool 
> need_vfio)
>  tcet->table = newtable;
>  }
>  
> -sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
> -   uint64_t bus_offset,
> -   uint32_t page_shift,
> -   uint32_t nb_table,
> -   bool need_vfio)
> +sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)
>  {
>  sPAPRTCETable *tcet;
> -char tmp[64];
> +char tmp[32];
>  
>  if (spapr_tce_find_by_liobn(liobn)) {
>  fprintf(stderr, "Attempted to create TCE table with duplicate"
> @@ -239,16 +228,8 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
>  return NULL;
>  }
>  
> -if (!nb_table) {
> -return NULL;
> -}
> -
>  tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>  tcet->liobn = liobn;
> -tcet->bus_offset = bus_offset;
> -tcet->page_shift = page_shift;
> -tcet->nb_table = nb_table;
> -tcet->need_vfio = need_vfio;
>  
>  snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
>  object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
> @@ -258,14 +239,65 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
>  return tcet;
>  }
>  
> +static void spapr_tce_table_do_enable(sPAPRTCETable *tcet)
> +{
> +if (!tcet->nb_table) {
> +return;
> +}
> +
> +tcet->table = spapr_tce_alloc_table(tcet->liobn,
> +tcet->page_shift,
> +tcet->nb_table,
> +&tcet->fd,
> +tcet->need_vfio);
> +
> +

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 06/16] spapr_pci: Reset DMA config on PHB reset

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:31PM +1100, Alexey Kardashevskiy wrote:
> LoPAPR dictates that during system reset all DMA windows must be removed
> and the default DMA32 window must be created, and this patch does so.
> 
> At the moment there is just one window supported so no change in
> behaviour is expected.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/spapr_iommu.c   |  2 +-
>  hw/ppc/spapr_pci.c | 29 +++--
>  include/hw/ppc/spapr.h |  1 +
>  3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index ba9ddbb..8a88a74 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -279,7 +279,7 @@ void spapr_tce_table_enable(sPAPRTCETable *tcet,
>  spapr_tce_table_do_enable(tcet);
>  }
>  
> -static void spapr_tce_table_disable(sPAPRTCETable *tcet)
> +void spapr_tce_table_disable(sPAPRTCETable *tcet)
>  {
>  if (!tcet->enabled) {
>  return;
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 7b40687..ee0fecf 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -825,6 +825,19 @@ static int spapr_phb_dma_window_enable(sPAPRPHBState 
> *sphb,
>  return 0;
>  }
>  
> +static int spapr_phb_dma_window_disable(sPAPRPHBState *sphb, uint32_t liobn)
> +{
> +sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> +
> +if (!tcet) {
> +return -1;
> +}
> +
> +spapr_tce_table_disable(tcet);
> +
> +return 0;
> +}
> +
>  /* Macros to operate with address in OF binding to PCI */
>  #define b_x(x, p, l)(((x) & ((1<<(l))-1)) << (p))
>  #define b_n(x)  b_x((x), 31, 1) /* 0 if relocatable */
> @@ -1412,12 +1425,6 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  memory_region_add_subregion(&sphb->iommu_root, 0,
>  spapr_tce_get_iommu(tcet));
>  
> -/* Register default 32bit DMA window */
> -spapr_phb_dma_window_enable(sphb, sphb->dma_liobn,
> -SPAPR_TCE_PAGE_SHIFT,
> -sphb->dma_win_addr,
> -sphb->dma_win_size);
> -
>  sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, 
> g_free);
>  }
>  
> @@ -1434,6 +1441,16 @@ static int spapr_phb_children_reset(Object *child, 
> void *opaque)
>  
>  static void spapr_phb_reset(DeviceState *qdev)
>  {
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(qdev);
> +
> +spapr_phb_dma_window_disable(sphb, sphb->dma_liobn);
> +
> +/* Register default 32bit DMA window */
> +spapr_phb_dma_window_enable(sphb, sphb->dma_liobn,
> +SPAPR_TCE_PAGE_SHIFT,
> +sphb->dma_win_addr,
> +sphb->dma_win_size);
> +
>  /* Reset the IOMMU state */
>  object_child_foreach(OBJECT(qdev), spapr_phb_children_reset, NULL);
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index bdf27ec..8aa0c45 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -571,6 +571,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn);
>  void spapr_tce_table_enable(sPAPRTCETable *tcet,
>  uint32_t page_shift, uint64_t bus_offset,
>  uint32_t nb_table, bool vfio_accel);
> +void spapr_tce_table_disable(sPAPRTCETable *tcet);
>  void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool need_vfio);
>  
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet);
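
The reset ordering in the patch above (disable whatever is there, then recreate the default window) can be sketched in isolation. This is only an illustration: DmaWindow, window_disable(), window_enable() and phb_reset() are hypothetical stand-ins, not the real sPAPRTCETable code, and the page shift of 12 is an assumption for SPAPR_TCE_PAGE_SHIFT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA window; not the real sPAPRTCETable. */
typedef struct {
    bool enabled;
    uint32_t page_shift;
    uint64_t bus_offset;
    uint32_t nb_table;
} DmaWindow;

static void window_disable(DmaWindow *w)
{
    if (!w->enabled) {
        return;                 /* already disabled: nothing to tear down */
    }
    w->enabled = false;
    w->page_shift = 0;
    w->bus_offset = 0;
    w->nb_table = 0;
}

static void window_enable(DmaWindow *w, uint32_t page_shift,
                          uint64_t bus_offset, uint64_t size)
{
    w->page_shift = page_shift;
    w->bus_offset = bus_offset;
    w->nb_table = (uint32_t)(size >> page_shift);
    w->enabled = true;
}

/* Reset: drop whatever the guest configured, then recreate the default. */
static void phb_reset(DmaWindow *w, uint64_t default_addr,
                      uint64_t default_size)
{
    window_disable(w);
    window_enable(w, 12 /* assumed 4K TCE pages */, default_addr,
                  default_size);
}
```

Doing the enable in reset rather than in realize means the window is in a known state after every system reset, whatever the guest did to it beforehand.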

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 2/2] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize

2016-03-02 Thread Jevon Qiao


Any further question/comment on this patch?

Thanks,
Jevon
On 24/2/16 15:04, Jevon Qiao wrote:

[Removing ceph-devel alias]

Hi Aneesh,

Any further comment on my reply below?

Thanks,
Jevon
On 19/2/16 16:56, Jevon Qiao wrote:

Hi Aneesh,

I am not sure I understand the details correctly. iounit is the size
that we use in client_read to determine the size in which
we should request I/O from the client. But we still can't do I/O in size
larger than s->msize. If you look at the client side (kernel 9p fs), you
will find

rsize = fid->iounit;
if (!rsize || rsize > clnt->msize-P9_IOHDRSZ)
rsize = clnt->msize - P9_IOHDRSZ;

Yes, I know this.

if your iounit calculation ends up zero, that should be handled
correctly by

 if (!iounit) {
 iounit = s->msize - P9_IOHDRSZ;
 }
 return iounit;


So what is the issue here?

This will result in an alignment issue while mapping the I/O requested by
the client into pages in the function p9_nr_pages().

   int p9_nr_pages(char *data, int len)
   {
unsigned long start_page, end_page;
start_page =  (unsigned long)data >> PAGE_SHIFT;
end_page = ((unsigned long)data + len + PAGE_SIZE - 1) >>
   PAGE_SHIFT;
return end_page - start_page;
   }

Please see the following experiment I did without the fix.

1) Start qemu with cephfs,

   $ qemu-system-x86_64 /root/CentOS---6.6-64bit---2015-03-06-a.qcow2
   -smp 4 -m 4096 -fsdev
   cephfs,security_model=passthrough,id=fsdev0,path=/ -device
   virtio-9p-pci,id=fs0,fsdev=fsdev0,mount_tag=cephfs --enable-kvm
   -nographic -net nic -net tap,ifname=tap0,script=no,downscript=no


2) Mount the fs in the guest.

   [root@localhost ~]# mount -t 9p -o trans=virtio,version=9p2000.L
   cephfs /mnt
   [root@localhost ~]# ls -lah /mnt/8kfile
   -rw-r--r-- 1 root root 8.0K 2016-02-19 09:37 /mnt/8kfile

In this case, I used the default msize, which is 8192 bytes. Since cephfs
uses 4M as the f_bsize, the iounit will be 8168, as P9_IOHDRSZ is
equal to 24.

3) Run the following systemtap script to trace the paging result,

   [root@localhost ~]# cat p9_read.stp
   probe kernel.function("p9_virtio_zc_request").call
   {
printf("p9_virtio_zc_request: inlen size is %d\n", int_arg(5));
   }

   probe kernel.function("p9_nr_pages").call
   {
printf("p9_nr_pages: start_page = %ld\n", int_arg(1) >> 12);
printf("p9_nr_pages: end_age = %ld\n", (int_arg(1) + 8168 +
   4096 -1) >> 12);
   }

4) The output I got when I copied the file /mnt/8kfile to the /tmp/ directory,


   p9_virtio_zc_request: inlen size is 8168
   p9_nr_pages: start_page = 34293757815
   p9_nr_pages: end_age = 34293757818

Per the output above (start_page = 34293757815, end_page = 34293757818),
the 8k of data is mapped into three pages. This could hurt performance.
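
To make the page arithmetic easy to check, here is a minimal standalone sketch that mirrors p9_nr_pages() for 4K pages (PAGE_SHIFT of 12, as in the systemtap script). The name nr_pages() and the sample addresses are made up for illustration.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Number of pages touched by a buffer of len bytes starting at data. */
static unsigned long nr_pages(unsigned long data, unsigned long len)
{
    unsigned long start_page = data >> PAGE_SHIFT;
    unsigned long end_page = (data + len + PAGE_SIZE - 1) >> PAGE_SHIFT;

    return end_page - start_page;
}
```

With this, a 4096-byte request on a page boundary touches one page. An 8168-byte request touches two pages only if it starts within the first 24 bytes of a page; at any larger offset into the page it touches three.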

I also enabled the cephfs debug output that I added, to see how the
data is distributed in this case. The result is as follows,

   CEPHFS_DEBUG: cephfs_preadv iov_len=4096
   CEPHFS_DEBUG: cephfs_preadv iov_len=4072
   CEPHFS_DEBUG: cephfs_preadv iov_len=24

This patch aims to fix this. The result shows that it works quite well:
all the data is well aligned.

   p9_virtio_zc_request: inlen size is 4096
   p9_nr_pages: start_page = 34203171814
   p9_nr_pages: end_age = 34203171815
   p9_virtio_zc_request: inlen size is 4096
   p9_nr_pages: start_page = 34203171815
   p9_nr_pages: end_age = 34203171816

   CEPHFS_DEBUG: cephfs_preadv iov_len=4096
   CEPHFS_DEBUG: cephfs_preadv iov_len=4096
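
For reference, the iounit of 8168 quoted in step 2 can be reproduced with a small sketch. It assumes the calculation takes the largest multiple of f_bsize that fits into msize - P9_IOHDRSZ and falls back to msize - P9_IOHDRSZ when that is zero; only the fallback is shown in this thread, so the first half is my assumption, and iounit_for() is a made-up name.

```c
#include <stdint.h>

#define P9_IOHDRSZ 24

/* Assumed shape of the iounit calculation, without the proposed fix. */
static uint32_t iounit_for(uint32_t msize, uint64_t f_bsize)
{
    uint64_t payload = msize - P9_IOHDRSZ;
    uint64_t iounit = 0;

    if (f_bsize) {
        /* largest multiple of the block size that fits in one message */
        iounit = f_bsize * (payload / f_bsize);
    }
    if (!iounit) {
        /* block size larger than the payload: fall back to the payload */
        iounit = payload;
    }
    return (uint32_t)iounit;
}
```

With a 4M block size and the default msize the first step yields zero, so the fallback of 8168 is used, which is not a multiple of the page size.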

Thanks,
Jevon

-aneesh

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 01/16] memory: Fix IOMMU replay base address

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:26PM +1100, Alexey Kardashevskiy wrote:
> Since a788f227 "memory: Allow replay of IOMMU mapping notifications"
> when new VFIO listener is added, all existing IOMMU mappings are
> replayed. However there is a problem that the base address of
> an IOMMU memory region (IOMMU MR) is ignored which is not a problem
> for the existing user (which is pseries) with its default 32bit DMA
> window starting at 0 but it is if there is another DMA window.
> 
> This stores the IOMMU's offset_within_address_space and adjusts
> the IOVA before calling vfio_dma_map/vfio_dma_unmap.
> 
> As the IOMMU notifier expects IOVA offset rather than the absolute
> address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before
> calling notifier(s).
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

It's not worth reworking just for this, but it might be slightly
preferable for merge purposes to split out the fix to put_tce_emu
(spapr code) away from the other changes (vfio code).

> ---
>  hw/ppc/spapr_iommu.c  |  2 +-
>  hw/vfio/common.c  | 14 --
>  include/hw/vfio/vfio-common.h |  1 +
>  3 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 7dd4588..277f289 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -277,7 +277,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, 
> target_ulong ioba,
>  tcet->table[index] = tce;
>  
>  entry.target_as = _space_memory,
> -entry.iova = ioba & page_mask;
> +entry.iova = (ioba - tcet->bus_offset) & page_mask;
>  entry.translated_addr = tce & page_mask;
>  entry.addr_mask = ~page_mask;
>  entry.perm = spapr_tce_iommu_access_flags(tce);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 55e87d3..9bf4c3b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -257,14 +257,14 @@ static void vfio_iommu_map_notify(Notifier *n, void 
> *data)
>  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>  VFIOContainer *container = giommu->container;
>  IOMMUTLBEntry *iotlb = data;
> +hwaddr iova = iotlb->iova + giommu->offset_within_address_space;
>  MemoryRegion *mr;
>  hwaddr xlat;
>  hwaddr len = iotlb->addr_mask + 1;
>  void *vaddr;
>  int ret;
>  
> -trace_vfio_iommu_map_notify(iotlb->iova,
> -iotlb->iova + iotlb->addr_mask);
> +trace_vfio_iommu_map_notify(iova, iova + iotlb->addr_mask);
>  
>  /*
>   * The IOMMU TLB entry we have just covers translation through
> @@ -291,21 +291,21 @@ static void vfio_iommu_map_notify(Notifier *n, void 
> *data)
>  
>  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>  vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -ret = vfio_dma_map(container, iotlb->iova,
> +ret = vfio_dma_map(container, iova,
> iotlb->addr_mask + 1, vaddr,
> !(iotlb->perm & IOMMU_WO) || mr->readonly);
>  if (ret) {
>  error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx", %p) = %d (%m)",
> - container, iotlb->iova,
> + container, iova,
>   iotlb->addr_mask + 1, vaddr, ret);
>  }
>  } else {
> -ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> +ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
>  if (ret) {
>  error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx") = %d (%m)",
> - container, iotlb->iova,
> + container, iova,
>   iotlb->addr_mask + 1, ret);
>  }
>  }
> @@ -377,6 +377,8 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
>   */
>  giommu = g_malloc0(sizeof(*giommu));
>  giommu->iommu = section->mr;
> +giommu->offset_within_address_space =
> +section->offset_within_address_space;
>  giommu->container = container;
>  giommu->n.notify = vfio_iommu_map_notify;
>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index f037f3c..9ffa681 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -80,6 +80,7 @@ typedef struct VFIOContainer {
>  typedef struct VFIOGuestIOMMU {
>  VFIOContainer *container;
>  MemoryRegion *iommu;
> +hwaddr offset_within_address_space;
>  Notifier n;
>  QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>  } VFIOGuestIOMMU;
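
The address adjustment in both directions can be sketched as plain arithmetic. This is an illustration only: notifier_iova() and absolute_iova() are hypothetical helpers, and the page_mask definition is my assumption about how it is derived from page_shift.

```c
#include <stdint.h>

/* Offset-relative IOVA, as handed to the notifier (put_tce_emu side). */
static uint64_t notifier_iova(uint64_t ioba, uint64_t bus_offset,
                              uint32_t page_shift)
{
    uint64_t page_mask = ~((1ULL << page_shift) - 1);

    return (ioba - bus_offset) & page_mask;
}

/* Absolute IOVA, as used for the actual map/unmap (listener side). */
static uint64_t absolute_iova(uint64_t iova,
                              uint64_t offset_within_address_space)
{
    return iova + offset_within_address_space;
}
```

For a window starting at 0 both functions are no-ops apart from the page masking, which is why the existing single-window setup never noticed the missing offset.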

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_

Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu v13 02/16] spapr_pci: Move DMA window enablement to a helper

2016-03-02 Thread David Gibson

On Tue, Mar 01, 2016 at 08:10:27PM +1100, Alexey Kardashevskiy wrote:
> We are going to have multiple DMA windows soon so let's start preparing.
> 
> This adds a new helper to create a DMA window and makes use of it in
> sPAPRPHBState::realize().
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  hw/ppc/spapr_pci.c | 40 +++-
>  1 file changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 3d1145e..248f20a 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -803,6 +803,29 @@ static char *spapr_phb_get_loc_code(sPAPRPHBState *sphb, 
> PCIDevice *pdev)
>  return buf;
>  }
>  
> +static int spapr_phb_dma_window_enable(sPAPRPHBState *sphb,
> +   uint32_t liobn, uint32_t page_shift,
> +   uint64_t window_addr,
> +   uint64_t window_size)
> +{
> +sPAPRTCETable *tcet;
> +uint32_t nb_table = window_size >> page_shift;
> +
> +if (!nb_table) {
> +return -1;
> +}

The caller shouldn't do this, so this probably makes more sense as an
assert() than an error return.

> +
> +tcet = spapr_tce_new_table(DEVICE(sphb), liobn, window_addr,
> +   page_shift, nb_table, false);
> +if (!tcet) {
> +return -1;
> +}

Since you're adding error reporting, you might as well make it via the
error API instead of a return code.  That way if we want to add more
detailed error API reporting to spapr_tce_new_table() in future,
there's less to change.

> +
> +memory_region_add_subregion(&sphb->iommu_root, tcet->bus_offset,
> +spapr_tce_get_iommu(tcet));
> +return 0;
> +}
> +
>  /* Macros to operate with address in OF binding to PCI */
>  #define b_x(x, p, l)(((x) & ((1<<(l))-1)) << (p))
>  #define b_n(x)  b_x((x), 31, 1) /* 0 if relocatable */
> @@ -1228,8 +1251,6 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  int i;
>  PCIBus *bus;
>  uint64_t msi_window_size = 4096;
> -sPAPRTCETable *tcet;
> -uint32_t nb_table;
>  
>  if (sphb->index != (uint32_t)-1) {
>  hwaddr windows_base;
> @@ -1381,18 +1402,11 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  }
>  }
>  
> -nb_table = sphb->dma_win_size >> SPAPR_TCE_PAGE_SHIFT;
> -tcet = spapr_tce_new_table(DEVICE(sphb), sphb->dma_liobn,
> -   0, SPAPR_TCE_PAGE_SHIFT, nb_table, false);
> -if (!tcet) {
> -error_setg(errp, "Unable to create TCE table for %s",
> -   sphb->dtbusname);
> -return;
> -}
> -
>  /* Register default 32bit DMA window */
> -memory_region_add_subregion(&sphb->iommu_root, sphb->dma_win_addr,
> -spapr_tce_get_iommu(tcet));
> +if (spapr_phb_dma_window_enable(sphb, sphb->dma_liobn, 
> SPAPR_TCE_PAGE_SHIFT,
> +sphb->dma_win_addr, sphb->dma_win_size)) 
> {
> +error_setg(errp, "Unable to create TCE table for %s", 
> sphb->dtbusname);
> +}
>  
>  sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, 
> g_free);
>  }
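
The two suggestions above (assert on a caller bug, report real failures through an error object) might look roughly like this. It is a standalone sketch, not QEMU code: SketchError is a hypothetical stand-in for an error out-parameter, and table_create() with its "duplicate" rule is invented purely so there is something to fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error out-parameter; QEMU's real Error API is not used. */
typedef struct {
    const char *msg;            /* NULL means no error */
} SketchError;

/* Invented lower layer: pretend liobn 0 is already taken. */
static void table_create(uint32_t liobn, uint32_t nb_table, SketchError *err)
{
    (void)nb_table;
    if (liobn == 0) {
        err->msg = "duplicate LIOBN";
    }
}

static void dma_window_enable(uint32_t liobn, uint32_t page_shift,
                              uint64_t window_size, SketchError *err)
{
    uint32_t nb_table = (uint32_t)(window_size >> page_shift);

    /* A zero-sized window is a caller bug, not a runtime condition. */
    assert(nb_table);

    /* Failures from the lower layer travel through err, not a return code. */
    table_create(liobn, nb_table, err);
}
```

The caller then only has to check err once, and more detailed messages can be added in the lower layer later without touching the signature.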

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH qemu] spapr-pci: Make MMIO spacing a machine property and increase it

2016-03-02 Thread Alexey Kardashevskiy

The pseries machine supports multiple PHBs. Each PHB's MMIO/IO space is
mapped to the CPU address space starting at SPAPR_PCI_WINDOW_BASE plus
an offset calculated from the PHB's index and
SPAPR_PCI_WINDOW_SPACING, which is currently defined as 64GB.

Since the default 32bit DMA window uses the first 2GB of MMIO space,
the amount of MMIO which the PCI devices can actually use is reduced
to 62GB. This is a problem if the user wants to use devices with
huge BARs.

For example, 2 PCI functions of an NVIDIA K80 adapter being passed through
will exceed this limit as they have 16M + 16G + 32M BARs which
(when aligned) will need 64GB.

This converts SPAPR_PCI_WINDOW_BASE and SPAPR_PCI_WINDOW_SPACING to
sPAPRMachineState properties. This uses the old values for pseries machines
before 2.6 and increases the spacing to 128GB, so the MMIO space becomes 126GB.

This changes the default value of sPAPRPHBState::mem_win_size to -1 for
pseries-2.6 and adds setup to spapr_phb_realize.
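
The 62GB and 126GB figures follow directly from the spacing minus the 2GB taken by the default DMA window. A minimal sketch of that arithmetic, with hypothetical helper names and a made-up base address in the usage, since the real base value is not spelled out here:

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/* 2GB at the start of each PHB's range, per the description above. */
#define MEM_WIN_BUS_OFFSET (2 * GiB)

/* Start of the MMIO/IO range for the PHB with the given index. */
static uint64_t phb_windows_base(uint64_t mmio_base, uint32_t index,
                                 uint64_t mmio_spacing)
{
    return mmio_base + (uint64_t)index * mmio_spacing;
}

/* MMIO actually usable by PCI devices behind one PHB. */
static uint64_t phb_mem_win_size(uint64_t mmio_spacing)
{
    return mmio_spacing - MEM_WIN_BUS_OFFSET;
}
```

Since each PHB's range starts at base + index * spacing, changing the spacing moves every PHB after the first, which is presumably why the older machine types keep the old values.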

Signed-off-by: Alexey Kardashevskiy 
---
 hw/ppc/spapr.c  | 43 ++-
 hw/ppc/spapr_pci.c  | 14 ++
 include/hw/pci-host/spapr.h |  4 +---
 include/hw/ppc/spapr.h  |  1 +
 4 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index e9d4abf..d21ad8a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -40,6 +40,7 @@
 #include "migration/migration.h"
 #include "mmu-hash64.h"
 #include "qom/cpu.h"
+#include "qapi/visitor.h"
 
 #include "hw/boards.h"
 #include "hw/ppc/ppc.h"
@@ -2100,6 +2101,29 @@ static void spapr_set_kvm_type(Object *obj, const char 
*value, Error **errp)
 spapr->kvm_type = g_strdup(value);
 }
 
+static void spapr_prop_get_uint64(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint64_t value = *(uint64_t *)opaque;
+visit_type_uint64(v, name, &value, errp);
+}
+
+static void spapr_prop_set_uint64(Object *obj, Visitor *v, const char *name,
+  void *opaque, Error **errp)
+{
+uint64_t value = -1;
+visit_type_uint64(v, name, &value, errp);
+*(uint64_t *)opaque = value;
+}
+
+static void spapr_prop_add_uint64(Object *obj, const char *name,
+  uint64_t *pval, const char *desc)
+{
+object_property_add(obj, name, "uint64", spapr_prop_get_uint64,
+spapr_prop_set_uint64, NULL, pval, NULL);
+object_property_set_description(obj, name, desc, NULL);
+}
+
 static void spapr_machine_initfn(Object *obj)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -2110,6 +2134,10 @@ static void spapr_machine_initfn(Object *obj)
 object_property_set_description(obj, "kvm-type",
 "Specifies the KVM virtualization mode 
(HV, PR)",
 NULL);
+spapr_prop_add_uint64(obj, "phb-mmio-base", &spapr->phb_mmio_base,
+  "Base address for PCI host bridge MMIO");
+spapr_prop_add_uint64(obj, "phb-mmio-spacing", &spapr->phb_mmio_spacing,
+  "Amount of MMIO space per PCI host bridge");
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -2357,6 +2385,10 @@ static const TypeInfo spapr_machine_info = {
  */
 static void spapr_machine_2_6_instance_options(MachineState *machine)
 {
+sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+
+spapr->phb_mmio_base = SPAPR_PCI_WINDOW_BASE;
+spapr->phb_mmio_spacing = SPAPR_PCI_WINDOW_SPACING;
 }
 
 static void spapr_machine_2_6_class_options(MachineClass *mc)
@@ -2370,10 +2402,19 @@ DEFINE_SPAPR_MACHINE(2_6, "2.6", true);
  * pseries-2.5
  */
 #define SPAPR_COMPAT_2_5 \
-HW_COMPAT_2_5
+HW_COMPAT_2_5 \
+{\
+.driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,\
+.property = "mem_win_size",\
+.value= "0x10",\
+},
 
 static void spapr_machine_2_5_instance_options(MachineState *machine)
 {
+sPAPRMachineState *spapr = SPAPR_MACHINE(machine);
+
+spapr->phb_mmio_base = 0x100ULL;
+spapr->phb_mmio_spacing = 0x10ULL;
 }
 
 static void spapr_machine_2_5_class_options(MachineClass *mc)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e8edad3..bae01dd 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1260,9 +1260,11 @@ static void spapr_phb_realize(DeviceState *dev, Error 
**errp)
 sphb->buid = SPAPR_PCI_BASE_BUID + sphb->index;
 sphb->dma_liobn = SPAPR_PCI_LIOBN(sphb->index, 0);
 
-windows_base = SPAPR_PCI_WINDOW_BASE
-+ sphb->index * SPAPR_PCI_WINDOW_SPACING;
+windows_base = spapr->phb_mmio_base
++ sphb->index * spapr->phb_mmio_spacing;
 sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
+sphb->mem_win_size = spapr->phb_mmio_spacing -
+SPAPR_PCI_MEM_WIN_BUS_OFFSET;
 sphb->io_win_addr = windows_base +

Re: [Qemu-devel] [PATCH v4 04/26] crypto: add support for anti-forensic split algorithm

2016-03-02 Thread Eric Blake

On 02/29/2016 05:00 AM, Daniel P. Berrange wrote:
> The LUKS format specifies an anti-forensic split algorithm which
> is used to artificially expand the size of the key material on
> disk. This is an implementation of that algorithm.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
> +static void qcrypto_afsplit_xor(size_t blocklen,
> +const uint8_t *in1,
> +const uint8_t *in2,
> +uint8_t *out)
> +{
> +size_t i;
> +for (i = 0; i < blocklen; i++) {
> +out[i] = in1[i] ^ in2[i];
> +}
> +}

I hope the compiler can optimize this into vectored operation.  But no
need to change your code.
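
As a side note on why a plain byte-wise XOR is enough here: XOR is its own inverse, so applying the same mask twice returns the original block. A tiny standalone sketch of that property follows; xor_block() is just a renamed copy of the loop quoted above, and nothing here is taken from the rest of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise XOR of two equally sized blocks into out. */
static void xor_block(size_t blocklen, const uint8_t *in1,
                      const uint8_t *in2, uint8_t *out)
{
    size_t i;

    for (i = 0; i < blocklen; i++) {
        out[i] = in1[i] ^ in2[i];
    }
}
```

A simple counted loop over byte arrays like this is the kind of code compilers commonly auto-vectorize at higher optimization levels, although whether that happens depends on the compiler and flags.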

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH qemu] target-ppc: Add PVR for POWER8NVL processor

2016-03-02 Thread David Gibson

On Thu, Mar 03, 2016 at 11:08:19AM +1100, Alexey Kardashevskiy wrote:
> This adds a PVR for the new POWER8+NVLink CPU, whose core is identical to
> POWER8 but which has a different PVR. The only machine available now reports
> PVR 004c 0100, so this defines the "POWER8NVL" alias as v1.0.
> 
> The corresponding kernel commit is
> https://github.com/torvalds/linux/commit/ddee09c099c3
> "powerpc: Add PVR for POWER8NVL processor"
> 
> Signed-off-by: Alexey Kardashevskiy 

Applied, thanks.

> ---
>  target-ppc/cpu-models.c | 3 +++
>  target-ppc/cpu-models.h | 2 ++
>  target-ppc/translate_init.c | 3 +++
>  3 files changed, 8 insertions(+)
> 
> diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
> index ed005d7..5209e63 100644
> --- a/target-ppc/cpu-models.c
> +++ b/target-ppc/cpu-models.c
> @@ -1143,6 +1143,8 @@
>  "POWER8E v2.1")
>  POWERPC_DEF("POWER8_v2.0",   CPU_POWERPC_POWER8_v20, POWER8,
>  "POWER8 v2.0")
> +POWERPC_DEF("POWER8NVL_v1.0",CPU_POWERPC_POWER8NVL_v10,  POWER8,
> +"POWER8NVL v1.0")
>  POWERPC_DEF("970_v2.2",  CPU_POWERPC_970_v22,970,
>  "PowerPC 970 v2.2")
>  POWERPC_DEF("970fx_v1.0",CPU_POWERPC_970FX_v10,  970,
> @@ -1392,6 +1394,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
>  { "POWER7+", "POWER7+_v2.1" },
>  { "POWER8E", "POWER8E_v2.1" },
>  { "POWER8", "POWER8_v2.0" },
> +{ "POWER8NVL", "POWER8NVL_v1.0" },
>  { "970", "970_v2.2" },
>  { "970fx", "970fx_v3.1" },
>  { "970mp", "970mp_v1.1" },
> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
> index 2992427..f21a44c 100644
> --- a/target-ppc/cpu-models.h
> +++ b/target-ppc/cpu-models.h
> @@ -560,6 +560,8 @@ enum {
>  CPU_POWERPC_POWER8E_v21= 0x004B0201,
>  CPU_POWERPC_POWER8_BASE= 0x004D,
>  CPU_POWERPC_POWER8_v20 = 0x004D0200,
> +CPU_POWERPC_POWER8NVL_BASE = 0x004C,
> +CPU_POWERPC_POWER8NVL_v10  = 0x004C0100,
>  CPU_POWERPC_970_v22= 0x00390202,
>  CPU_POWERPC_970FX_v10  = 0x00391100,
>  CPU_POWERPC_970FX_v20  = 0x003C0200,
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index bd0cffc..927bd24 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -8219,6 +8219,9 @@ static void init_proc_POWER8(CPUPPCState *env)
>  
>  static bool ppc_pvr_match_power8(PowerPCCPUClass *pcc, uint32_t pvr)
>  {
> +if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8NVL_BASE) 
> {
> +return true;
> +}
>  if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8E_BASE) {
>  return true;
>  }
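
The match added above boils down to comparing the family part of the PVR and ignoring the revision. A standalone sketch, assuming the family sits in the upper 16 bits and the revision in the lower 16; PVR_FAMILY_MASK and PVR_POWER8NVL are illustrative names and values, not the real macros:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: family in the upper 16 bits, revision in the lower 16. */
#define PVR_FAMILY_MASK 0xFFFF0000u
#define PVR_POWER8NVL   0x004C0000u

static bool pvr_is_power8nvl(uint32_t pvr)
{
    return (pvr & PVR_FAMILY_MASK) == PVR_POWER8NVL;
}
```

With this layout the machine mentioned in the commit message (PVR 004c 0100) matches, and so would any later revision of the same family, without a new table entry for each.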

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v4 03/26] crypto: add support for generating initialization vectors

2016-03-02 Thread Eric Blake

On 02/29/2016 05:00 AM, Daniel P. Berrange wrote:
> There are a number of different algorithms that can be used
> to generate initialization vectors for disk encryption. This
> introduces a simple internal QCryptoBlockIV object to provide
> a consistent internal API to the different algorithms. The
> initially implemented algorithms are 'plain', 'plain64' and
> 'essiv', each matching the same named algorithm provided
> by the Linux kernel dm-crypt driver.
> 
> Signed-off-by: Daniel P. Berrange 
> ---

> +++ b/crypto/ivgen-essiv.c

> +static int qcrypto_ivgen_essiv_init(QCryptoIVGen *ivgen,
> +const uint8_t *key, size_t nkey,
> +Error **errp)
> +{
> +uint8_t *salt;
> +size_t nhash;
> +size_t nsalt;
> +QCryptoIVGenESSIV *essiv = g_new0(QCryptoIVGenESSIV, 1);
> +
> +/* Not neccessarily the same as nkey */

s/neccessarily/necessarily/

> +++ b/include/crypto/ivgen.h

> + *
> + * while (ndata) {
> + * if (qcrypto_ivgen_calculate(ivgen, sector, iv, niv, errp) < 0) {
> + * goto error;
> + * }
> + * if (qcrypto_cipher_setiv(cipher, iv, niv, errp) < 0) {
> + * goto error;
> + * }
> + * if (qcrypto_cipher_encrypt(cipher,
> + *data + (sector * 512),
> + *data + (sector * 512),
> + *512, errp) < 0) {

Don't you reuse a single in/out buffer later in the series? If so, don't
forget to update the comment at that time (the compiler will only catch
code changes).


> + *
> + * - QCRYPTO_IVGEN_ALG_PLAIN
> + *
> + * The IVs are generated by the 32-bit truncated sector
> + * number. This should never be used for block devices
> + * that are larger than 2^32 sectors in size

s/$/./

> + * All the other parameters are unused.
> + *

> +++ b/qapi/crypto.json
> @@ -78,3 +78,22 @@
>  { 'enum': 'QCryptoCipherMode',
>'prefix': 'QCRYPTO_CIPHER_MODE',
>'data': ['ecb', 'cbc']}
> +
> +
> +##
> +# QCryptoIVGenAlgorithm:
> +#
> +# The supported algorithms for generating initialization
> +# vectors for full disk encryption. The 'plain' generator
> +# should not be used for disks with sector numbers larger
> +# than 2^32, except where compatibility with pre-existing
> +# Linux dm-crypt volumes is required.
> +#
> +# @plain: 64-bit sector number truncated to 32-bits
> +# @plain64: 64-bit sector number
> +# @essiv: 64-bit sector number encrypted with a hash of the encryption key
> +# Since: 2.6

Worth warning that 'plain' and 'plain64' expose the encrypted disk to
some weaknesses when compared to 'essiv'?
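
On the 2^32 sector limit for 'plain': a small sketch shows the IV collision that truncation causes. It assumes the IV is the sector number stored little-endian and zero-padded, which is how I understand the dm-crypt generators; ivgen_plain() and ivgen_plain64() are hypothetical names, not the functions from this patch.

```c
#include <stdint.h>
#include <string.h>

/* 'plain': low 32 bits of the sector, little-endian, rest zeroed. */
static void ivgen_plain(uint64_t sector, uint8_t *iv, size_t niv)
{
    uint32_t low = (uint32_t)sector;
    size_t i;

    memset(iv, 0, niv);
    for (i = 0; i < 4 && i < niv; i++) {
        iv[i] = (uint8_t)(low >> (8 * i));
    }
}

/* 'plain64': all 64 bits of the sector, little-endian, rest zeroed. */
static void ivgen_plain64(uint64_t sector, uint8_t *iv, size_t niv)
{
    size_t i;

    memset(iv, 0, niv);
    for (i = 0; i < 8 && i < niv; i++) {
        iv[i] = (uint8_t)(sector >> (8 * i));
    }
}
```

Under 'plain', sector 1 and sector 2^32 + 1 get the same IV, which is exactly the reuse the doc comment warns about; 'plain64' keeps them distinct.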

Fixes are minor, so I'm okay if you add:
Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH v15] block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host

2016-03-02 Thread Programmingkid

Mac OS X can be picky when it comes to allowing the user
to use physical devices in QEMU. Most mounted volumes
appear to be off limits to QEMU. If an issue is detected,
a message is displayed showing the user how to unmount a
volume. Now QEMU uses both CD and DVD media.

Signed-off-by: John Arbuckle 

---

Replaced one "return -1" with "return -ENOENT".
Replaced another "return -1" with "return ret".
Removed trailing whitespace.
This patch is also attached as an email attachment. git send-email does not
work on my system :(
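
For reviewers, the partition probing in setup_cdrom() builds candidate names by appending "s<index>" to the BSD path. A standalone sketch of just that step; partition_path() is a hypothetical helper, and the truncation check is an addition for illustration that the patch itself does not have.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Builds "<bsd_path>s<index>", e.g. "/dev/rdisk1" -> "/dev/rdisk1s0".
 * Returns nonzero if the result fit into out without truncation.
 */
static int partition_path(char *out, size_t outlen,
                          const char *bsd_path, int index)
{
    int n = snprintf(out, outlen, "%ss%d", bsd_path, index);

    return n >= 0 && (size_t)n < outlen;
}
```

The patch then tries to open each candidate read-only and keeps the first one that opens.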

 block/raw-posix.c | 165 +-
 1 file changed, 126 insertions(+), 39 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 8866121..eb50c02 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 //#include 
+#include 
 #include 
 #endif
 
@@ -1965,33 +1966,47 @@ BlockDriver bdrv_file = {
 /* host device */
 
 #if defined(__APPLE__) && defined(__MACH__)
-static kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator );
 static kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
 CFIndex maxPathSize, int flags);
-kern_return_t FindEjectableCDMedia( io_iterator_t *mediaIterator )
+static char *FindEjectableOpticalMedia(io_iterator_t *mediaIterator)
 {
-kern_return_t   kernResult;
+kern_return_t kernResult = KERN_FAILURE;
 mach_port_t masterPort;
 CFMutableDictionaryRef  classesToMatch;
+const char *matching_array[] = {kIODVDMediaClass, kIOCDMediaClass};
+char *mediaType = NULL;
 
 kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort );
 if ( KERN_SUCCESS != kernResult ) {
 printf( "IOMasterPort returned %d\n", kernResult );
 }
 
-classesToMatch = IOServiceMatching( kIOCDMediaClass );
-if ( classesToMatch == NULL ) {
-printf( "IOServiceMatching returned a NULL dictionary.\n" );
-} else {
-CFDictionarySetValue( classesToMatch, CFSTR( kIOMediaEjectableKey ), 
kCFBooleanTrue );
-}
-kernResult = IOServiceGetMatchingServices( masterPort, classesToMatch, 
mediaIterator );
-if ( KERN_SUCCESS != kernResult )
-{
-printf( "IOServiceGetMatchingServices returned %d\n", kernResult );
-}
+int index;
+for (index = 0; index < ARRAY_SIZE(matching_array); index++) {
+classesToMatch = IOServiceMatching(matching_array[index]);
+if (classesToMatch == NULL) {
+error_report("IOServiceMatching returned NULL for %s",
+ matching_array[index]);
+continue;
+}
+CFDictionarySetValue(classesToMatch, CFSTR(kIOMediaEjectableKey),
+ kCFBooleanTrue);
+kernResult = IOServiceGetMatchingServices(masterPort, classesToMatch,
+  mediaIterator);
+if (kernResult != KERN_SUCCESS) {
+error_report("Note: IOServiceGetMatchingServices returned %d",
+ kernResult);
+continue;
+}
 
-return kernResult;
+/* If a match was found, leave the loop */
+if (*mediaIterator != 0) {
+DPRINTF("Matching using %s\n", matching_array[index]);
+mediaType = g_strdup(matching_array[index]);
+break;
+}
+}
+return mediaType;
 }
 
 kern_return_t GetBSDPath(io_iterator_t mediaIterator, char *bsdPath,
@@ -2023,7 +2038,46 @@ kern_return_t GetBSDPath(io_iterator_t mediaIterator, 
char *bsdPath,
 return kernResult;
 }
 
-#endif
+/* Sets up a real cdrom for use in QEMU */
+static bool setup_cdrom(char *bsd_path, Error **errp)
+{
+int index, num_of_test_partitions = 2, fd;
+char test_partition[MAXPATHLEN];
+bool partition_found = false;
+
+/* look for a working partition */
+for (index = 0; index < num_of_test_partitions; index++) {
+snprintf(test_partition, sizeof(test_partition), "%ss%d", bsd_path,
+ index);
+fd = qemu_open(test_partition, O_RDONLY | O_BINARY | O_LARGEFILE);
+if (fd >= 0) {
+partition_found = true;
+qemu_close(fd);
+break;
+}
+}
+
+/* if a working partition on the device was not found */
+if (partition_found == false) {
+error_setg(errp, "Failed to find a working partition on disc");
+} else {
+DPRINTF("Using %s as optical disc\n", test_partition);
+pstrcpy(bsd_path, MAXPATHLEN, test_partition);
+}
+return partition_found;
+}
+
+/* Prints directions on mounting and unmounting a device */
+static void print_unmounting_directions(const char *file_name)
+{
+error_report("If device %s is mounted on the desktop, unmount"
+ " it first before using it in QEMU", file_name);
+error_report("Command to unmount device: diskutil unmountDisk %s",
+

Re: [Qemu-devel] [PATCH v4 02/26] crypto: add support for PBKDF2 algorithm

2016-03-02 Thread Eric Blake

On 02/29/2016 05:00 AM, Daniel P. Berrange wrote:
> The LUKS data format includes use of PBKDF2 (Password-Based
> Key Derivation Function). The Nettle library can provide
> an implementation of this, but we don't want code directly
> depending on a specific crypto library backend. Introduce
> a new include/crypto/pbkdf.h header which defines a QEMU
> API for invoking PBKDF2. The initial implementations are
> backed by nettle & gcrypt, which are commonly available
> with distros shipping GNUTLS.
> 
> The test suite data is taken from the cryptsetup codebase
> under the LGPLv2.1+ license. This merely aims to verify
> that whatever backend we provide for this function in QEMU
> will comply with the spec.
> 
> Signed-off-by: Daniel P. Berrange 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 0/3] ppc: Define some more SPRs of POWER8 in QEMU to fix migration

2016-03-02 Thread David Gibson

On Wed, Mar 02, 2016 at 09:19:19PM +0100, Thomas Huth wrote:
> While tinkering with the new kvm-unit-tests framework for Power,
> I discovered that a couple of SPRs are destroyed during migration.
> We've got to define them in QEMU and make sure that they are
> synchronized with the kernel to make sure that the register
> contents are not lost.
> The first patch introduces the new PSPB register from POWER8,
> the second patch fixes the definition of the TAR register, and
> the third patch (which has been taken from Ben's "Add native
> POWER8 platform" patch series) introduces some missing
> performance monitor registers.

Nice catches.  Applied to ppc-for-2.6.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH qemu] target-ppc: Add PVR for POWER8NVL processor

2016-03-02 Thread Alexey Kardashevskiy

This adds a PVR for the new POWER8+NVLink CPU, whose core is identical to
POWER8 but which reports a different PVR. The only machine available now has
PVR 004c 0100, so this defines the "POWER8NVL" alias as v1.0.

The corresponding kernel commit is
https://github.com/torvalds/linux/commit/ddee09c099c3
"powerpc: Add PVR for POWER8NVL processor"

Signed-off-by: Alexey Kardashevskiy 
---
 target-ppc/cpu-models.c | 3 +++
 target-ppc/cpu-models.h | 2 ++
 target-ppc/translate_init.c | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index ed005d7..5209e63 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1143,6 +1143,8 @@
 "POWER8E v2.1")
 POWERPC_DEF("POWER8_v2.0",   CPU_POWERPC_POWER8_v20, POWER8,
 "POWER8 v2.0")
+POWERPC_DEF("POWER8NVL_v1.0",CPU_POWERPC_POWER8NVL_v10,  POWER8,
+"POWER8NVL v1.0")
 POWERPC_DEF("970_v2.2",  CPU_POWERPC_970_v22,970,
 "PowerPC 970 v2.2")
 POWERPC_DEF("970fx_v1.0",CPU_POWERPC_970FX_v10,  970,
@@ -1392,6 +1394,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
 { "POWER7+", "POWER7+_v2.1" },
 { "POWER8E", "POWER8E_v2.1" },
 { "POWER8", "POWER8_v2.0" },
+{ "POWER8NVL", "POWER8NVL_v1.0" },
 { "970", "970_v2.2" },
 { "970fx", "970fx_v3.1" },
 { "970mp", "970mp_v1.1" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index 2992427..f21a44c 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -560,6 +560,8 @@ enum {
 CPU_POWERPC_POWER8E_v21= 0x004B0201,
 CPU_POWERPC_POWER8_BASE= 0x004D0000,
 CPU_POWERPC_POWER8_v20 = 0x004D0200,
+CPU_POWERPC_POWER8NVL_BASE = 0x004C0000,
+CPU_POWERPC_POWER8NVL_v10  = 0x004C0100,
 CPU_POWERPC_970_v22= 0x00390202,
 CPU_POWERPC_970FX_v10  = 0x00391100,
 CPU_POWERPC_970FX_v20  = 0x003C0200,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index bd0cffc..927bd24 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8219,6 +8219,9 @@ static void init_proc_POWER8(CPUPPCState *env)
 
 static bool ppc_pvr_match_power8(PowerPCCPUClass *pcc, uint32_t pvr)
 {
+if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8NVL_BASE) {
+return true;
+}
 if ((pvr & CPU_POWERPC_POWER_SERVER_MASK) == CPU_POWERPC_POWER8E_BASE) {
 return true;
 }
-- 
2.5.0.rc3
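
The PVR check added in the patch above can be illustrated with a small
standalone sketch. It is not the QEMU code: the DEMO_* names are made up for
illustration, and the mask value 0xFFFF0000 (keep the upper 16 bits, drop
the revision) is an assumption about CPU_POWERPC_POWER_SERVER_MASK. The
*_BASE values are taken to be the family number in the upper halfword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the server mask keeps the upper 16 bits (processor family)
 * and discards the lower 16 bits (revision). */
#define DEMO_POWER_SERVER_MASK  0xFFFF0000u

/* Hypothetical stand-ins for the *_BASE enum values in cpu-models.h. */
#define DEMO_POWER8E_BASE       0x004B0000u
#define DEMO_POWER8NVL_BASE     0x004C0000u
#define DEMO_POWER8_BASE        0x004D0000u

/* Returns true if the PVR belongs to any POWER8 family member,
 * regardless of its revision bits. */
static bool demo_pvr_match_power8(uint32_t pvr)
{
    uint32_t family = pvr & DEMO_POWER_SERVER_MASK;

    return family == DEMO_POWER8NVL_BASE ||
           family == DEMO_POWER8E_BASE ||
           family == DEMO_POWER8_BASE;
}
```

With these assumptions, the PVR 0x004C0100 from the commit message masks
down to the POWER8NVL base and is accepted as a POWER8, while a 970 PVR
such as 0x00390202 is not.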

Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()

2016-03-02 Thread Marc-André Lureau

Hi

On Wed, Mar 2, 2016 at 8:35 PM, Markus Armbruster  wrote:
> You know, I'd prefer that, too, and I've argued for it unsuccessfully.
> As it is, we fairly consistently return void when the function returns
> errors through Error ** and has no non-error value.

Good to know we are on the same side.

>>>  {
>>>  PCIDevice *pdev = PCI_DEVICE(s);
>>>  MSIMessage msg = msix_get_message(pdev, vector);
>>> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, 
>>> int vector)
>>>
>>>  ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>>>  if (ret < 0) {
>>> -error_report("ivshmem: kvm_irqchip_add_msi_route failed");
>>> -return -1;
>>> +error_setg(errp, "kvm_irqchip_add_msi_route failed");
>>> +return;
>>>  }
>>>
>>>  s->msi_vectors[vector].virq = ret;
>>>  s->msi_vectors[vector].pdev = pdev;
>>> -
>>> -return 0;
>>>  }
>>>
>>> -static void setup_interrupt(IVShmemState *s, int vector)
>>> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>>>  {
>>>  EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>>>  bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>>>  ivshmem_has_feature(s, IVSHMEM_MSI);
>>>  PCIDevice *pdev = PCI_DEVICE(s);
>>> +Error *err = NULL;
>>>
>>>  IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>>>
>>> @@ -546,13 +541,16 @@ static void setup_interrupt(IVShmemState *s, int 
>>> vector)
>>>  watch_vector_notifier(s, n, vector);
>>>  } else if (msix_enabled(pdev)) {
>>>  IVSHMEM_DPRINTF("with irqfd\n");
>>> -if (ivshmem_add_kvm_msi_virq(s, vector) < 0) {
>>> +ivshmem_add_kvm_msi_virq(s, vector, &err);
>>> +if (err) {
>>> +error_propagate(errp, err);
>>>  return;
>>
>> That would make this simpler, avoiding the local err variable and the
>> propagation step. But you seem to prefer that form. I don't know if
>> there are any conventions (I am used to glib conventions, where a bool
>> success is usually returned, even if the function takes a GError)
>
> Does GLib spell out this convention somewhere?

For glib, there is a paragraph about the bool return/GError convention
(which is usually adapted to other return types):
https://developer.gnome.org/glib/unstable/glib-Error-Reporting.html

>
> I can perhaps try to cook up a patch to demonstrate the advantages of
> returning a success/failure value even with Error **, and try to get our
> convention changed.
>
> Until then, we better stick to the existing convention, whether we like
> it or not.

ok
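
The two conventions discussed in this thread can be compared side by side in
a minimal standalone sketch. Everything here is hypothetical: DemoError and
the demo_* helpers are simplified stand-ins written for this example, not
QEMU's Error API or GLib's GError, and the "invalid vector" failure is
invented so the error path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified error object (not QEMU's Error). */
typedef struct DemoError {
    char msg[64];
} DemoError;

/* Store a new error in *errp, if the caller asked for one. */
static void demo_error_set(DemoError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    DemoError *e = malloc(sizeof(*e));
    if (!e) {
        abort();
    }
    snprintf(e->msg, sizeof(e->msg), "%s", msg);
    *errp = e;
}

/* Hand a locally collected error to the caller, or drop it. */
static void demo_error_propagate(DemoError **dst, DemoError *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/* Style A: return void, report failure only through errp. */
static void demo_add_route_void(int vector, DemoError **errp)
{
    if (vector < 0) {
        demo_error_set(errp, "invalid vector");
    }
}

static void demo_setup_void(int vector, DemoError **errp)
{
    DemoError *err = NULL;

    /* The caller needs a local error to learn whether the call failed. */
    demo_add_route_void(vector, &err);
    if (err) {
        demo_error_propagate(errp, err);
        return;
    }
}

/* Style B: also return a success flag, as is common with GError. */
static bool demo_add_route_bool(int vector, DemoError **errp)
{
    if (vector < 0) {
        demo_error_set(errp, "invalid vector");
        return false;
    }
    return true;
}

static bool demo_setup_bool(int vector, DemoError **errp)
{
    /* No local error and no propagate call: pass errp straight through. */
    return demo_add_route_bool(vector, errp);
}
```

Both styles report the same error to the caller; style B simply saves the
local variable and the propagate call at each level.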




-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect

2016-03-02 Thread Marc-André Lureau

On Wed, Mar 2, 2016 at 8:19 PM, Markus Armbruster  wrote:
> When called from process_msg_disconnect(): invalid as long as
> ivshmem-spec.txt doesn't assign a sane meaning to it.  Let's make it an
> error there, okay?


Sounds fine to me too

thanks

-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 11/19] qapi: Add type.is_empty() helper

2016-03-02 Thread Eric Blake

On 03/02/2016 12:04 PM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> And use it in qapi-types and qapi-event.  Down the road, we may
>> want to lift our artificial restriction of no variants at the
>> top level of an event, at which point, inlining our check for
>> whether members is empty will no longer be sufficient, but
>> adding a check for variants adds verbosity; in the meantime,
>> add some asserts in places where we don't handle variants.
> 
> Perhaps I'm just running out of steam for today, but I've read this
> twice, and still don't get why adding these assertions goes in the same
> patch as adding the helper, or what it has to do with events.

Okay, will split this into two patches for v3.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v7 5/6] s390x/cpu: Add error handling to cpu creation

2016-03-02 Thread Matthew Rosato

On 03/02/2016 02:59 AM, David Hildenbrand wrote:
>> Check for and propagate errors during s390 cpu creation.
>>
>> Signed-off-by: Matthew Rosato 
>> ---
>>  hw/s390x/s390-virtio-ccw.c | 30 +
>>  hw/s390x/s390-virtio.c |  2 +-
>>  hw/s390x/s390-virtio.h |  1 +
>>  target-s390x/cpu-qom.h |  3 +++
>>  target-s390x/cpu.c | 65 
>> --
>>  target-s390x/cpu.h |  1 +
>>  target-s390x/helper.c  | 31 --
>>  7 files changed, 128 insertions(+), 5 deletions(-)
>>
>> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
>> index 3090e76..4886dbf 100644
>> --- a/hw/s390x/s390-virtio-ccw.c
>> +++ b/hw/s390x/s390-virtio-ccw.c
>> @@ -112,6 +112,36 @@ void s390_memory_init(ram_addr_t mem_size)
>>  s390_skeys_init();
>>  }
>>
>> +S390CPU *s390_new_cpu(MachineState *machine, int64_t id, Error **errp)
> 
> Just a thought, if not passing machine but the model string, we could make
> 
> cpu_s390x_init() call s390_new_cpu().
> 
> But then s390_new_cpu() would have to be moved again. Not sure if this is
> really worth it.
> 

Additionally, next_cpu_id would have to be moved somewhere other than
the S390CPUClass, as we need that value for input to s390_new_cpu()
here, but won't have a cpu object to acquire it until after
s390_new_cpu() returns.

Going to leave cpu_s390x_init() calling cpu_s390x_create() unless
someone voices a strong opinion.

Matt

Re: [Qemu-devel] [V6 4/4] hw/pci-host: Emulate AMD IOMMU

2016-03-02 Thread David Kiarie

On Thu, Mar 3, 2016 at 12:17 AM, Michael S. Tsirkin  wrote:
> On Thu, Mar 03, 2016 at 12:09:28AM +0300, David Kiarie wrote:
>>
>>
>> On 22/02/16 14:22, Marcel Apfelbaum wrote:
>> >On 02/21/2016 08:11 PM, David Kiarie wrote:
>> >>Add AMD IOMMU emulation support to q35 chipset
>> >>
>> >>Signed-off-by: David Kiarie 
>> >>---
>> >>  hw/pci-host/piix.c|  1 +
>> >>  hw/pci-host/q35.c | 14 --
>> >>  include/hw/i386/intel_iommu.h |  1 +
>> >>  3 files changed, 14 insertions(+), 2 deletions(-)
>> >>
>> >>diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
>> >>index 41aa66f..ab2e24a 100644
>> >>--- a/hw/pci-host/piix.c
>> >>+++ b/hw/pci-host/piix.c
>> >>@@ -36,6 +36,7 @@
>> >>  #include "hw/i386/ioapic.h"
>> >>  #include "qapi/visitor.h"
>> >>  #include "qemu/error-report.h"
>> >>+#include "hw/i386/amd_iommu.h"
>> >
>> >Hi,
>> >
>> >I think you don't need this include anymore.
>> >
>> >>
>> >>  /*
>> >>   * I440FX chipset data sheet.
>> >>diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
>> >>index 115fb8c..355fb32 100644
>> >>--- a/hw/pci-host/q35.c
>> >>+++ b/hw/pci-host/q35.c
>> >>@@ -31,6 +31,7 @@
>> >>  #include "hw/hw.h"
>> >>  #include "hw/pci-host/q35.h"
>> >>  #include "qapi/visitor.h"
>> >>+#include "hw/i386/amd_iommu.h"
>> >>
>> >>/
>> >>   * Q35 host
>> >>@@ -505,9 +506,18 @@ static void mch_realize(PCIDevice *d, Error **errp)
>> >>   mch->pci_address_space, &mch->pam_regions[i+1],
>> >>   PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
>> >>  }
>> >>-/* Intel IOMMU (VT-d) */
>> >>-if (object_property_get_bool(qdev_get_machine(), "iommu", NULL)) {
>> >>+
>> >>+if (g_strcmp0(MACHINE(qdev_get_machine())->iommu, INTEL_IOMMU_STR)
>> >>== 0) {
>> >>+/* Intel IOMMU (VT-d) */
>> >>  mch_init_dmar(mch);
>> >>+} else if (g_strcmp0(MACHINE(qdev_get_machine())->iommu,
>> >>AMD_IOMMU_STR)
>> >>+   == 0) {
>> >>+AMDIOMMUState *iommu_state;
>> >>+PCIDevice *iommu;
>> >>+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(mch)));
>> >>+iommu = pci_create_simple(bus, 0x20, TYPE_AMD_IOMMU_DEVICE);
>
> Pls don't hardcode paths like this. Set addr property instead.
>
>> >>+iommu_state = AMD_IOMMU_DEVICE(iommu);
>> >>+pci_setup_iommu(bus, bridge_host_amd_iommu, iommu_state);
>> >
>> >pci_setup_iommu third parameter is void*, so you don't need to cast to
>> >AMDIOMMUState
>> >before passing it.
>>
>> This include is necessary for the definition of "AMD_IOMMU_STR" either way
>> so I am leaving this as is.
>
> This option parsing is just too ugly.
>
> Looks like it was a mistake to support the iommu
> machine property, but I see no reason to add to the
> existing mess.
>
> Can't users create iommu with -device amd-iommu ?

Do you mean getting rid of the above code and starting the device with
'-device amd-iommu'? That way I am not able to set up IOMMU regions for
devices correctly. IIRC, 'pci_setup_iommu', when called from the IOMMU code,
sets up an IOMMU region for the IOMMU only, whereas calling it from the bus
sets up IOMMU regions for all devices.

I need to set up IOMMU regions for all devices from the bus, and to be able
to do that I need to have created the IOMMU device first.

>
> It's necessary if we are to support multiple IOMMUs, anyway.
>
>> >
>> >Thanks,
>> >Marcel
>> >
>> >>  }
>> >>  }
>> >>
>> >>diff --git a/include/hw/i386/intel_iommu.h
>> >>b/include/hw/i386/intel_iommu.h
>> >>index b024ffa..539530c 100644
>> >>--- a/include/hw/i386/intel_iommu.h
>> >>+++ b/include/hw/i386/intel_iommu.h
>> >>@@ -27,6 +27,7 @@
>> >>  #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
>> >>  #define INTEL_IOMMU_DEVICE(obj) \
>> >>   OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
>> >>+#define INTEL_IOMMU_STR "intel"
>> >>
>> >>  /* DMAR Hardware Unit Definition address (IOMMU unit) */
>> >>  #define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
>> >>
>> >

Re: [Qemu-devel] [PATCH] target-arm: Fix translation level on early translation faults

2016-03-02 Thread Sergey Fedorov


On 02.03.2016 21:04, Sergey Sorokin wrote:

> QEMU reports a translation fault at level 1 instead of level 0 for
> AArch64 address translation if the translation table walk is disabled
> or the address is in the gap between the two regions.
>
> Signed-off-by: Sergey Sorokin 
> ---
>   target-arm/helper.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 18c8296..09f920c 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -7238,6 +7238,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> target_ulong address,
>    * support for those page table walks.
>    */
>   if (arm_el_is_aa64(env, el)) {
> +level = 0;
>   va_size = 64;
>   if (el > 1) {
>   if (mmu_idx != ARMMMUIdx_S2NS) {


I think we'd better set the level variable to 1 for AArch32 in the else 
clause explicitly and drop its initialization in the beginning of the 
function. Otherwise it looks like AArch64 is a kind of special case.
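
A minimal sketch of the shape suggested here, assuming the only inputs are
the two starting levels named in this thread (0 for AArch64, 1 for AArch32).
The function name is hypothetical and the real get_phys_addr_lpae() does
much more; the point is only that neither branch relies on an initializer
at the top of the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the level reported for an early translation
 * fault, set explicitly in both branches. */
static int demo_initial_fault_level(bool is_aa64)
{
    int level;

    if (is_aa64) {
        level = 0;   /* AArch64: early faults are reported at level 0 */
    } else {
        level = 1;   /* AArch32 LPAE: early faults are reported at level 1 */
    }
    return level;
}
```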


Best regards,
Sergey

Re: [Qemu-devel] [V6 0/4] AMD IOMMU

2016-03-02 Thread Michael S. Tsirkin

On Thu, Mar 03, 2016 at 12:17:27AM +0300, David Kiarie wrote:
> 
> 
> On 01/03/16 16:48, Jan Kiszka wrote:
> >On 2016-03-01 14:07, Michael S. Tsirkin wrote:
> >>On Sun, Feb 21, 2016 at 09:10:56PM +0300, David Kiarie wrote:
> >>>Hello there,
> >>>
> >>>Repost, AMD IOMMU patches version 6.
> >>>
> >>>Changes since version 5
> >>>  -Fixed macro formatting issues
> >>>  -changed occurrences of IO MMU to IOMMU for consistency
> >>>  -Fixed capability registers duplication
> >>>  -Rebased to current master
> >>>
> >>>David Kiarie (4):
> >>>   hw/i386: Introduce AMD IOMMU
> >>>   hw/core: Add AMD IOMMU to machine properties
> >>>   hw/i386: ACPI table for AMD IOMMU
> >>>   hw/pci-host: Emulate AMD IOMMU
> >>I went over AMD IOMMU spec.
> >>I'm concerned that it appears that there's no chance for it to
> >>work correctly if host caches invalid PTE entries.
> >>
> >>The spec vaguely discusses write-protecting such PTEs but
> >>that would be very complex if it can be made to work at all.
> >>
> >>This means that this can't work with e.g. VFIO.
> >>It can only work with emulated devices.
> >You mean it can't work if we program a real IOMMU (for VFIO) with
> >translated data from the emulated one but cannot track any updates of
> >the related page tables because the guest is not required to issue
> >traceable flush requests? Hmm, too bad.
> >
> >>OTOH VTD can easily support PTE shadowing by setting a flag.
> >Do you mean RWBF=1 in the CAP register? Given that "Newer hardware
> >implementations are expected to NOT require explicit software flushing
> >of write buffers and report RWBF=0 in the Capability register", we may
> >eventually run into guests that no longer check that flag if we expose
> >something that looks like a "newer" implementation.
> >
> >However, this flag is not set right now in our VT-d model.
> >
> >>I'd like us to find some way to avoid possibility
> >>of user error creating a configuration mixing e.g.
> >>vfio with the amd iommu.
> >>
> >>I'm not sure how to do this.
> >>
> >>Any idea?
> >There is likely no way around write-protecting the IOMMU page tables (in
> >KVM mode) once we evaluated and cached them somewhere. For now, I would
> >simply deny vfio while an IOMMU is active on x86.
> Should I implement this, in the meantime ?

Why not :)

> >
> >Jan
> >

Re: [Qemu-devel] [V6 0/4] AMD IOMMU

2016-03-02 Thread David Kiarie




On 01/03/16 16:48, Jan Kiszka wrote:

> On 2016-03-01 14:07, Michael S. Tsirkin wrote:
>> On Sun, Feb 21, 2016 at 09:10:56PM +0300, David Kiarie wrote:
>>> Hello there,
>>>
>>> Repost, AMD IOMMU patches version 6.
>>>
>>> Changes since version 5
>>>   -Fixed macro formatting issues
>>>   -changed occurrences of IO MMU to IOMMU for consistency
>>>   -Fixed capability registers duplication
>>>   -Rebased to current master
>>>
>>> David Kiarie (4):
>>>    hw/i386: Introduce AMD IOMMU
>>>    hw/core: Add AMD IOMMU to machine properties
>>>    hw/i386: ACPI table for AMD IOMMU
>>>    hw/pci-host: Emulate AMD IOMMU
>>
>> I went over AMD IOMMU spec.
>> I'm concerned that it appears that there's no chance for it to
>> work correctly if host caches invalid PTE entries.
>>
>> The spec vaguely discusses write-protecting such PTEs but
>> that would be very complex if it can be made to work at all.
>>
>> This means that this can't work with e.g. VFIO.
>> It can only work with emulated devices.
>
> You mean it can't work if we program a real IOMMU (for VFIO) with
> translated data from the emulated one but cannot track any updates of
> the related page tables because the guest is not required to issue
> traceable flush requests? Hmm, too bad.
>
>> OTOH VTD can easily support PTE shadowing by setting a flag.
>
> Do you mean RWBF=1 in the CAP register? Given that "Newer hardware
> implementations are expected to NOT require explicit software flushing
> of write buffers and report RWBF=0 in the Capability register", we may
> eventually run into guests that no longer check that flag if we expose
> something that looks like a "newer" implementation.
>
> However, this flag is not set right now in our VT-d model.
>
>> I'd like us to find some way to avoid possibility
>> of user error creating a configuration mixing e.g.
>> vfio with the amd iommu.
>>
>> I'm not sure how to do this.
>>
>> Any idea?
>
> There is likely no way around write-protecting the IOMMU page tables (in
> KVM mode) once we evaluated and cached them somewhere. For now, I would
> simply deny vfio while an IOMMU is active on x86.

Should I implement this in the meantime?

> Jan

Re: [Qemu-devel] [V6 4/4] hw/pci-host: Emulate AMD IOMMU

2016-03-02 Thread Michael S. Tsirkin

On Thu, Mar 03, 2016 at 12:09:28AM +0300, David Kiarie wrote:
> 
> 
> On 22/02/16 14:22, Marcel Apfelbaum wrote:
> >On 02/21/2016 08:11 PM, David Kiarie wrote:
> >>Add AMD IOMMU emulation support to q35 chipset
> >>
> >>Signed-off-by: David Kiarie 
> >>---
> >>  hw/pci-host/piix.c|  1 +
> >>  hw/pci-host/q35.c | 14 --
> >>  include/hw/i386/intel_iommu.h |  1 +
> >>  3 files changed, 14 insertions(+), 2 deletions(-)
> >>
> >>diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> >>index 41aa66f..ab2e24a 100644
> >>--- a/hw/pci-host/piix.c
> >>+++ b/hw/pci-host/piix.c
> >>@@ -36,6 +36,7 @@
> >>  #include "hw/i386/ioapic.h"
> >>  #include "qapi/visitor.h"
> >>  #include "qemu/error-report.h"
> >>+#include "hw/i386/amd_iommu.h"
> >
> >Hi,
> >
> >I think you don't need this include anymore.
> >
> >>
> >>  /*
> >>   * I440FX chipset data sheet.
> >>diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
> >>index 115fb8c..355fb32 100644
> >>--- a/hw/pci-host/q35.c
> >>+++ b/hw/pci-host/q35.c
> >>@@ -31,6 +31,7 @@
> >>  #include "hw/hw.h"
> >>  #include "hw/pci-host/q35.h"
> >>  #include "qapi/visitor.h"
> >>+#include "hw/i386/amd_iommu.h"
> >>
> >>/
> >>   * Q35 host
> >>@@ -505,9 +506,18 @@ static void mch_realize(PCIDevice *d, Error **errp)
> >>   mch->pci_address_space, &mch->pam_regions[i+1],
> >>   PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE, PAM_EXPAN_SIZE);
> >>  }
> >>-/* Intel IOMMU (VT-d) */
> >>-if (object_property_get_bool(qdev_get_machine(), "iommu", NULL)) {
> >>+
> >>+if (g_strcmp0(MACHINE(qdev_get_machine())->iommu, INTEL_IOMMU_STR)
> >>== 0) {
> >>+/* Intel IOMMU (VT-d) */
> >>  mch_init_dmar(mch);
> >>+} else if (g_strcmp0(MACHINE(qdev_get_machine())->iommu,
> >>AMD_IOMMU_STR)
> >>+   == 0) {
> >>+AMDIOMMUState *iommu_state;
> >>+PCIDevice *iommu;
> >>+PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(mch)));
> >>+iommu = pci_create_simple(bus, 0x20, TYPE_AMD_IOMMU_DEVICE);

Pls don't hardcode paths like this. Set addr property instead.

> >>+iommu_state = AMD_IOMMU_DEVICE(iommu);
> >>+pci_setup_iommu(bus, bridge_host_amd_iommu, iommu_state);
> >
> >pci_setup_iommu third parameter is void*, so you don't need to cast to
> >AMDIOMMUState
> >before passing it.
> 
> This include is necessary for the definition of "AMD_IOMMU_STR" either way
> so I am leaving this as is.

This option parsing is just too ugly.

Looks like it was a mistake to support the iommu
machine property, but I see no reason to add to the
existing mess.

Can't users create iommu with -device amd-iommu ?

It's necessary if we are to support multiple IOMMUs, anyway.

> >
> >Thanks,
> >Marcel
> >
> >>  }
> >>  }
> >>
> >>diff --git a/include/hw/i386/intel_iommu.h
> >>b/include/hw/i386/intel_iommu.h
> >>index b024ffa..539530c 100644
> >>--- a/include/hw/i386/intel_iommu.h
> >>+++ b/include/hw/i386/intel_iommu.h
> >>@@ -27,6 +27,7 @@
> >>  #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
> >>  #define INTEL_IOMMU_DEVICE(obj) \
> >>   OBJECT_CHECK(IntelIOMMUState, (obj), TYPE_INTEL_IOMMU_DEVICE)
> >>+#define INTEL_IOMMU_STR "intel"
> >>
> >>  /* DMAR Hardware Unit Definition address (IOMMU unit) */
> >>  #define Q35_HOST_BRIDGE_IOMMU_ADDR  0xfed90000ULL
> >>
> >

Re: [Qemu-devel] [V6 2/4] hw/core: Add AMD IOMMU to machine properties

2016-03-02 Thread David Kiarie




On 21/02/16 23:09, Jan Kiszka wrote:

> On 2016-02-21 19:10, David Kiarie wrote:
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 2f0465e..dad160f 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -38,7 +38,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>>   "kvm_shadow_mem=size of KVM shadow MMU\n"
>>   "dump-guest-core=on|off include guest memory in a core dump (default=on)\n"
>>   "mem-merge=on|off controls memory merge support (default: on)\n"
>> -"iommu=on|off controls emulated Intel IOMMU (VT-d) support (default=off)\n"
>> +"iommu=amd|intel enables and selects the emulated IOMMU (default: off)\n"
>
> We should also support "iommu=off" or "none" to explicitly disable it.
> That is consistent with other switches, and maybe there will once be a
> machine type (chipset) with IOMMU enabled by default.


We could have such but this will not be referenced anywhere in the code 
as the IOMMU is 'off' by default. Most of the other such switches relate 
to properties that are 'on' by default.
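
To make the suggestion concrete, here is a small standalone sketch of how
such a string-valued option could be mapped to a selection, including an
explicit "off"/"none". The enum and function names are hypothetical and are
not part of the patch; treating an unset (NULL) value as "off" mirrors the
stated default and is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of parsing the machine's iommu= option. */
typedef enum DemoIommuKind {
    DEMO_IOMMU_NONE,
    DEMO_IOMMU_INTEL,
    DEMO_IOMMU_AMD,
    DEMO_IOMMU_INVALID
} DemoIommuKind;

/* Map the option string to a kind. An unset option (NULL) behaves like
 * "off", so the explicit spellings cost nothing extra at the call site. */
static DemoIommuKind demo_parse_iommu_option(const char *value)
{
    if (value == NULL ||
        strcmp(value, "off") == 0 ||
        strcmp(value, "none") == 0) {
        return DEMO_IOMMU_NONE;
    }
    if (strcmp(value, "intel") == 0) {
        return DEMO_IOMMU_INTEL;
    }
    if (strcmp(value, "amd") == 0) {
        return DEMO_IOMMU_AMD;
    }
    return DEMO_IOMMU_INVALID;
}
```

Parsing once into an enum also lets callers reject unknown values instead
of silently treating them as "no IOMMU".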




> Jan

Re: [Qemu-devel] [PATCH 72/77] ppc: A couple more dummy POWER8 Book4 regs

2016-03-02 Thread Thomas Huth

On 11.11.2015 01:28, Benjamin Herrenschmidt wrote:
> WORT and PID this time
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  target-ppc/cpu.h|  2 ++
>  target-ppc/translate_init.c | 16 
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index aa328a7..6179fbc 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -1363,6 +1363,7 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
> ifetch)
>  #define SPR_AMR   (0x01D)
>  #define SPR_ACOP  (0x01F)
>  #define SPR_BOOKE_PID (0x030)
> +#define SPR_BOOKS_PID (0x030)
>  #define SPR_BOOKE_DECAR   (0x036)
>  #define SPR_BOOKE_CSRR0   (0x03A)
>  #define SPR_BOOKE_CSRR1   (0x03B)
> @@ -1716,6 +1717,7 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
> ifetch)
>  #define SPR_POWER_SPMC1   (0x37C)
>  #define SPR_POWER_SPMC2   (0x37D)
>  #define SPR_POWER_MMCRS   (0x37E)
> +#define SPR_WORT  (0x37F)
>  #define SPR_PPR   (0x380)
>  #define SPR_750_GQR0  (0x390)
>  #define SPR_440_DNV0  (0x390)
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index 4ec532c..bfdf028 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -8226,10 +8226,18 @@ static void gen_spr_power8_book4(CPUPPCState *env)
>  &spr_read_generic, SPR_NOACCESS,
>  &spr_read_generic, &spr_write_generic,
>  0);
> -spr_register(env, SPR_ACOP, "ACOP",
> - SPR_NOACCESS, SPR_NOACCESS,
> - &spr_read_generic, &spr_write_generic,
> - 0);
> +spr_register_kvm(env, SPR_ACOP, "ACOP",
> + SPR_NOACCESS, SPR_NOACCESS,
> + &spr_read_generic, &spr_write_generic,
> + KVM_REG_PPC_ACOP, 0);
> +spr_register_kvm(env, SPR_BOOKS_PID, "PID",
> + SPR_NOACCESS, SPR_NOACCESS,
> + &spr_read_generic, &spr_write_generic,
> + KVM_REG_PPC_PID, 0);
> +spr_register_kvm(env, SPR_WORT, "WORT",
> + SPR_NOACCESS, SPR_NOACCESS,
> + &spr_read_generic, &spr_write_generic,
> + KVM_REG_PPC_WORT, 0);
>  #endif
>  }

AFAICT all patches where you define new SPRs with spr_register_kvm[_hv]
are also important independently of the rest of your patch series -
otherwise these registers are currently lost during migration since they
are not sync'ed with the KVM part in the kernel right now.

So if you've got some spare time, could you maybe extract all those
patches that define new SPRs with spr_register_kvm[_hv] and send them as
a separate patch series? That could help to fix future migration issues,
and also would decrease the size of your really huge "Add native POWER8
platform" patch series a little bit!
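
Why a register without a KVM id is lost can be shown with a toy model. This
is only a sketch under stated assumptions: DemoSpr and
demo_sync_from_kernel() are hypothetical names, the "kernel" is just an
array, and an id of 0 stands for an SPR that was registered without a KVM
register id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one SPR table entry. */
typedef struct DemoSpr {
    const char *name;
    uint64_t kvm_reg_id;  /* 0: no kernel register id was supplied */
    uint64_t value;       /* the copy held on the QEMU side */
} DemoSpr;

/* Copy values from the simulated kernel state into the table, skipping
 * every SPR that has no id. Returns how many registers were copied. */
static size_t demo_sync_from_kernel(DemoSpr *sprs,
                                    const uint64_t *kernel_values,
                                    size_t count)
{
    size_t synced = 0;

    for (size_t i = 0; i < count; i++) {
        if (sprs[i].kvm_reg_id == 0) {
            continue;  /* never fetched, so migration sends a stale value */
        }
        sprs[i].value = kernel_values[i];
        synced++;
    }
    return synced;
}
```

In this model, an entry with id 0 keeps whatever value it had before the
sync, which is the kind of loss described above for SPRs that are not
registered with a KVM id.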

 Thanks,
  Thomas

Re: [Qemu-devel] [PATCH 71/77] ppc: Add dummy ACOP SPR

2016-03-02 Thread Thomas Huth

On 11.11.2015 01:28, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  target-ppc/cpu.h| 1 +
>  target-ppc/translate_init.c | 4 
>  2 files changed, 5 insertions(+)
> 
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index bf8892a..aa328a7 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -1361,6 +1361,7 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
> ifetch)
>  #define SPR_SRR1  (0x01B)
>  #define SPR_CFAR  (0x01C)
>  #define SPR_AMR   (0x01D)
> +#define SPR_ACOP  (0x01F)
>  #define SPR_BOOKE_PID (0x030)
>  #define SPR_BOOKE_DECAR   (0x036)
>  #define SPR_BOOKE_CSRR0   (0x03A)
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index b5fd076..4ec532c 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -8226,6 +8226,10 @@ static void gen_spr_power8_book4(CPUPPCState *env)
>  &spr_read_generic, SPR_NOACCESS,
>  &spr_read_generic, &spr_write_generic,
>  0);
> +spr_register(env, SPR_ACOP, "ACOP",
> + SPR_NOACCESS, SPR_NOACCESS,
> + &spr_read_generic, &spr_write_generic,
> + 0);
>  #endif
>  }

I think this patch should be merged with the next one (where you change
the ACOP hunk in translate_init.c again).

 Thomas

[Qemu-devel] [PATCH 3/3] ppc: Add a few more P8 PMU SPRs

2016-03-02 Thread Thomas Huth

From: Benjamin Herrenschmidt 

Signed-off-by: Benjamin Herrenschmidt 
---
 target-ppc/cpu.h|  7 +++
 target-ppc/translate_init.c | 28 
 2 files changed, 35 insertions(+)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 8fc0fb4..8d90d86 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1564,6 +1564,7 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
ifetch)
 #define SPR_PERF0 (0x300)
 #define SPR_RCPU_MI_RBA0  (0x300)
 #define SPR_MPC_MI_CTR(0x300)
+#define SPR_POWER_USIER   (0x300)
 #define SPR_PERF1 (0x301)
 #define SPR_RCPU_MI_RBA1  (0x301)
 #define SPR_POWER_UMMCR2  (0x301)
@@ -1613,6 +1614,7 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
ifetch)
 #define SPR_PERFF (0x30F)
 #define SPR_MPC_MD_TW (0x30F)
 #define SPR_UPERF0(0x310)
+#define SPR_POWER_SIER(0x310)
 #define SPR_UPERF1(0x311)
 #define SPR_POWER_MMCR2   (0x311)
 #define SPR_UPERF2(0x312)
@@ -1674,7 +1676,12 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
ifetch)
 #define SPR_440_ITV2  (0x376)
 #define SPR_440_ITV3  (0x377)
 #define SPR_440_CCR1  (0x378)
+#define SPR_TACR  (0x378)
+#define SPR_TCSCR (0x379)
+#define SPR_CSIGR (0x37a)
 #define SPR_DCRIPR(0x37B)
+#define SPR_POWER_SPMC1   (0x37C)
+#define SPR_POWER_SPMC2   (0x37D)
 #define SPR_POWER_MMCRS   (0x37E)
 #define SPR_PPR   (0x380)
 #define SPR_750_GQR0  (0x390)
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 48a1635..06b008de 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7603,6 +7603,30 @@ static void gen_spr_power8_pmu_sup(CPUPPCState *env)
  SPR_NOACCESS, SPR_NOACCESS,
  &spr_read_generic, &spr_write_generic,
  KVM_REG_PPC_MMCRS, 0x00000000);
+spr_register_kvm(env, SPR_POWER_SIER, "SIER",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_SIER, 0x00000000);
+spr_register_kvm(env, SPR_POWER_SPMC1, "SPMC1",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_SPMC1, 0x00000000);
+spr_register_kvm(env, SPR_POWER_SPMC2, "SPMC2",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_SPMC2, 0x00000000);
+spr_register_kvm(env, SPR_TACR, "TACR",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_TACR, 0x00000000);
+spr_register_kvm(env, SPR_TCSCR, "TCSCR",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_TCSCR, 0x00000000);
+spr_register_kvm(env, SPR_CSIGR, "CSIGR",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_CSIGR, 0x00000000);
 }
 
 static void gen_spr_power8_pmu_user(CPUPPCState *env)
@@ -7611,6 +7635,10 @@ static void gen_spr_power8_pmu_user(CPUPPCState *env)
  &spr_read_ureg, SPR_NOACCESS,
  &spr_read_ureg, &spr_write_ureg,
  0x00000000);
+spr_register(env, SPR_POWER_USIER, "USIER",
+ &spr_read_generic, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic,
+ 0x00000000);
 }
 
 static void gen_spr_power5p_ear(CPUPPCState *env)
-- 
1.8.3.1

[Qemu-devel] [PATCH 2/3] ppc: Fix migration of the TAR SPR

2016-03-02 Thread Thomas Huth

The TAR special purpose register currently does not get migrated
under KVM because it does not get synchronized with the kernel.
Use spr_register_kvm() instead of spr_register() to fix this issue.

Signed-off-by: Thomas Huth 
---
 target-ppc/translate_init.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index f72148c..48a1635 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7714,10 +7714,10 @@ static void spr_write_tar(DisasContext *ctx, int sprn, 
int gprn)
 
 static void gen_spr_power8_tce_address_control(CPUPPCState *env)
 {
-spr_register(env, SPR_TAR, "TAR",
- &spr_read_tar, &spr_write_tar,
- &spr_read_generic, &spr_write_generic,
- 0x00000000);
+spr_register_kvm(env, SPR_TAR, "TAR",
+ &spr_read_tar, &spr_write_tar,
+ &spr_read_generic, &spr_write_generic,
+ KVM_REG_PPC_TAR, 0x00000000);
 }
 
 static void spr_read_tm(DisasContext *ctx, int gprn, int sprn)
-- 
1.8.3.1

[Qemu-devel] [PATCH 0/3] ppc: Define some more SPRs of POWER8 in QEMU to fix migration

2016-03-02 Thread Thomas Huth

While tinkering with the new kvm-unit-tests framework for Power,
I discovered that a couple of SPRs are destroyed during migration.
We've got to define them in QEMU and make sure that they are
synchronized with the kernel to make sure that the register
contents are not lost.
The first patch introduces the new PSPB register from POWER8,
the second patch fixes the definition of the TAR register, and
the third patch (which has been taken from Ben's "Add native
POWER8 platform" patch series) introduces some missing
performance monitor registers.

Benjamin Herrenschmidt (1):
  ppc: Add a few more P8 PMU SPRs

Thomas Huth (2):
  ppc: Define the PSPB register on POWER8
  ppc: Fix migration of the TAR SPR

 target-ppc/cpu.h|  8 
 target-ppc/translate_init.c | 45 +
 2 files changed, 49 insertions(+), 4 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH 1/3] ppc: Define the PSPB register on POWER8

2016-03-02 Thread Thomas Huth

POWER8 / PowerISA 2.07 has a new special purpose register called PSPB
("Problem State Priority Boost Register"). The contents of this register
are currently lost during migration. To be able to migrate this register,
too, we've got to define this SPR along with the other SPRs of POWER8.

Signed-off-by: Thomas Huth 
---
 target-ppc/cpu.h| 1 +
 target-ppc/translate_init.c | 9 +
 2 files changed, 10 insertions(+)

diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 2b10597..8fc0fb4 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1380,6 +1380,7 @@ static inline int cpu_mmu_index (CPUPPCState *env, bool 
ifetch)
 #define SPR_UAMOR (0x09D)
 #define SPR_MPC_ICTRL (0x09E)
 #define SPR_MPC_BAR   (0x09F)
+#define SPR_PSPB  (0x09F)
 #define SPR_VRSAVE(0x100)
 #define SPR_USPRG0(0x100)
 #define SPR_USPRG1(0x101)
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index bd0cffc..f72148c 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7842,6 +7842,14 @@ static void gen_spr_power8_fscr(CPUPPCState *env)
  KVM_REG_PPC_FSCR, initval);
 }
 
+static void gen_spr_power8_pspb(CPUPPCState *env)
+{
+spr_register_kvm(env, SPR_PSPB, "PSPB",
+ SPR_NOACCESS, SPR_NOACCESS,
+ &spr_read_generic, &spr_write_generic32,
+ KVM_REG_PPC_PSPB, 0);
+}
+
 static void init_proc_book3s_64(CPUPPCState *env, int version)
 {
 gen_spr_ne_601(env);
@@ -7892,6 +7900,7 @@ static void init_proc_book3s_64(CPUPPCState *env, int 
version)
 gen_spr_power8_pmu_sup(env);
 gen_spr_power8_pmu_user(env);
 gen_spr_power8_tm(env);
+gen_spr_power8_pspb(env);
 gen_spr_vtb(env);
 }
 if (version < BOOK3S_CPU_POWER8) {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 11/19] qapi: Add type.is_empty() helper

2016-03-02 Thread Eric Blake

On 03/02/2016 12:04 PM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> And use it in qapi-types and qapi-event.  Down the road, we may
>> want to lift our artificial restriction of no variants at the
>> top level of an event, at which point, inlining our check for
>> whether members is empty will no longer be sufficient, but
>> adding a check for variants adds verbosity; in the meantime,
>> add some asserts in places where we don't handle variants.
> 
> Perhaps I'm just running out of steam for today, but I've read this
> twice, and still don't get why adding these assertions goes in the same
> patch as adding the helper, or what it has to do with events.

And yet it was the review on the earlier posting that caused me to add
asserts; maybe re-reading that thread will help refresh memory, and spur
an idea for how to better express it in the commit message:
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04726.html

> 
>> More immediately, the new .is_empty() helper will help fix a bug
>> in qapi-visit in the next patch, where the generator did not
>> handle an explicit empty type in the same was as a missing type.
> 
> same way

[Ever wonder if I intentionally stick in a typo, just to see who will
notice? Or maybe it really was a slip of the finger...]

>> +++ b/scripts/qapi-event.py
>> @@ -39,7 +39,7 @@ def gen_event_send(name, arg_type):
>>  ''',
>>  proto=gen_event_send_proto(name, arg_type))
>>
>> -if arg_type and arg_type.members:
>> +if arg_type and not arg_type.is_empty():
>>  ret += mcgen('''
>>  QmpOutputVisitor *qov;
>>  Visitor *v;
> 
> Oh, you don't just add a helper, you actually *change* the condition!
> Perhaps the commit message would be easier to understand if it explained
> that first.

The old condition:
arg_type and arg_type.members

New condition:
arg_type and (arg_type.members or arg_type.variants)

But we know there are no variants, since unions cannot (yet) be passed
as event 'data', so the condition has the same effect for now, and it
future-proofs the code for a later patch in which I do allow unions in events.
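
To make the equivalence concrete, here is a minimal Python sketch. The
ObjectType class is a hypothetical stand-in, not the generator's real schema
class; only the names members, variants and is_empty() come from the patch,
and is_empty() is assumed to mean "no members and no variants" as described
above.

```python
class ObjectType:
    """Hypothetical stand-in for a QAPI object type (illustration only)."""

    def __init__(self, members=None, variants=None):
        self.members = members or []   # list of member names
        self.variants = variants       # None when the type has no variants

    def is_empty(self):
        # Assumed semantics: empty means neither members nor variants.
        return not self.members and not self.variants


def old_condition(arg_type):
    # Condition before the patch: only looks at members.
    return bool(arg_type and arg_type.members)


def new_condition(arg_type):
    # Condition after the patch: also covers variants via is_empty().
    return bool(arg_type and not arg_type.is_empty())
```

As long as no event argument type carries variants, both conditions agree;
they differ only for a type that has variants but no members.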

>> +++ b/scripts/qapi-types.py
>> @@ -90,7 +90,7 @@ struct %(c_name)s {
>>  # potential issues with attempting to malloc space for zero-length
>>  # structs in C, and also incompatibility with C++ (where an empty
>>  # struct is size 1).
>> -if not (base and base.members) and not members and not variants:
>> +if (not base or base.is_empty()) and not members and not variants:
>>  ret += mcgen('''
>>  char qapi_dummy_for_empty_struct;
>>  ''')
> 
> I figure the case for the helper based on this patch alone is making the
> code a bit more future-proof.  Suggest you try to explain that in your
> commit message, including against what future change exactly you're
> proofing the code.

And here, bases cannot (yet) have variants, but that's also on my plate
of things I'd like to support in the future.

> 
> Haven't reviewed for completeness.
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH v2 1/2] trace: include CPU index in trace_memory_region_*()

2016-03-02 Thread Hollis Blanchard

Knowing which CPU performed an action is essential for understanding SMP guest
behavior.

However, cpu_physical_memory_rw() may be executed by a machine init function
before any VCPUs are running, when 'current_cpu' is NULL. In this case, store
-1 in the trace record as the CPU index. Trace analysis tools may need to be
aware of this special case.

Signed-off-by: Hollis Blanchard 
---
v2: use get_cpu_index() helper function
---
 memory.c | 32 
 trace-events |  8 
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/memory.c b/memory.c
index 013c2ed..89395e6 100644
--- a/memory.c
+++ b/memory.c
@@ -386,6 +386,14 @@ static hwaddr memory_region_to_absolute_addr(MemoryRegion 
*mr, hwaddr offset)
 return abs_addr;
 }
 
+static int get_cpu_index(void)
+{
+if (current_cpu) {
+return current_cpu->cpu_index;
+}
+return -1;
+}
+
 static MemTxResult memory_region_oldmmio_read_accessor(MemoryRegion *mr,
hwaddr addr,
uint64_t *value,
@@ -398,10 +406,10 @@ static MemTxResult 
memory_region_oldmmio_read_accessor(MemoryRegion *mr,
 
 tmp = mr->ops->old_mmio.read[ctz32(size)](mr->opaque, addr);
 if (mr->subpage) {
-trace_memory_region_subpage_read(mr, addr, tmp, size);
+trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
-trace_memory_region_ops_read(mr, abs_addr, tmp, size);
+trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
 }
 *value |= (tmp & mask) << shift;
 return MEMTX_OK;
@@ -419,10 +427,10 @@ static MemTxResult  
memory_region_read_accessor(MemoryRegion *mr,
 
 tmp = mr->ops->read(mr->opaque, addr, size);
 if (mr->subpage) {
-trace_memory_region_subpage_read(mr, addr, tmp, size);
+trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
-trace_memory_region_ops_read(mr, abs_addr, tmp, size);
+trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
 }
 *value |= (tmp & mask) << shift;
 return MEMTX_OK;
@@ -441,10 +449,10 @@ static MemTxResult 
memory_region_read_with_attrs_accessor(MemoryRegion *mr,
 
 r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
 if (mr->subpage) {
-trace_memory_region_subpage_read(mr, addr, tmp, size);
+trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
-trace_memory_region_ops_read(mr, abs_addr, tmp, size);
+trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
 }
 *value |= (tmp & mask) << shift;
 return r;
@@ -462,10 +470,10 @@ static MemTxResult 
memory_region_oldmmio_write_accessor(MemoryRegion *mr,
 
 tmp = (*value >> shift) & mask;
 if (mr->subpage) {
-trace_memory_region_subpage_write(mr, addr, tmp, size);
+trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, 
size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
-trace_memory_region_ops_write(mr, abs_addr, tmp, size);
+trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, 
size);
 }
 mr->ops->old_mmio.write[ctz32(size)](mr->opaque, addr, tmp);
 return MEMTX_OK;
@@ -483,10 +491,10 @@ static MemTxResult 
memory_region_write_accessor(MemoryRegion *mr,
 
 tmp = (*value >> shift) & mask;
 if (mr->subpage) {
-trace_memory_region_subpage_write(mr, addr, tmp, size);
+trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, 
size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
-trace_memory_region_ops_write(mr, abs_addr, tmp, size);
+trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, 
size);
 }
 mr->ops->write(mr->opaque, addr, tmp, size);
 return MEMTX_OK;
@@ -504,10 +512,10 @@ static MemTxResult 
memory_region_write_with_attrs_accessor(MemoryRegion *mr,
 
 tmp = (*value >> shift) & mask;
 if (mr->subpage) {
-trace_memory_region_subpage_write(mr, addr, tmp, size);
+trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, 
size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
-trace_memory_region_ops_write(mr, abs_addr, tmp,

[Qemu-devel] [PATCH v2 2/2] trace: separate MMIO tracepoints from TB-access tracepoints

2016-03-02 Thread Hollis Blanchard

Memory accesses to code which has previously been translated into a TB show up
in the MMIO path, so that they may invalidate the TB. It's extremely confusing
to mix those in with device MMIOs, so split them into their own tracepoint.

Signed-off-by: Hollis Blanchard 
Reviewed-by: Stefan Hajnoczi 
---
It took many hours to figure out why some RAM accesses were coming through the
MMIO path instead of being handled inline in the TBs.

On IRC, Paolo expressed some concern about performance, but ultimately agreed
that adding one conditional to an already heavy codepath wouldn't have much
impact.

v2: rename trace_memory_region_ops_tb_read/write to
trace_memory_region_tb_read/write
---
 memory.c | 30 ++
 trace-events |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/memory.c b/memory.c
index 89395e6..84347fa 100644
--- a/memory.c
+++ b/memory.c
@@ -407,6 +407,11 @@ static MemTxResult 
memory_region_oldmmio_read_accessor(MemoryRegion *mr,
 tmp = mr->ops->old_mmio.read[ctz32(size)](mr->opaque, addr);
 if (mr->subpage) {
 trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
+} else if (mr == &io_mem_notdirty) {
+/* Accesses to code which has previously been translated into a TB show
+ * up in the MMIO path, as accesses to the io_mem_notdirty
+ * MemoryRegion. */
+trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -428,6 +433,11 @@ static MemTxResult  
memory_region_read_accessor(MemoryRegion *mr,
 tmp = mr->ops->read(mr->opaque, addr, size);
 if (mr->subpage) {
 trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
+} else if (mr == &io_mem_notdirty) {
+/* Accesses to code which has previously been translated into a TB show
+ * up in the MMIO path, as accesses to the io_mem_notdirty
+ * MemoryRegion. */
+trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -450,6 +460,11 @@ static MemTxResult 
memory_region_read_with_attrs_accessor(MemoryRegion *mr,
 r = mr->ops->read_with_attrs(mr->opaque, addr, &tmp, size, attrs);
 if (mr->subpage) {
 trace_memory_region_subpage_read(get_cpu_index(), mr, addr, tmp, size);
+} else if (mr == &io_mem_notdirty) {
+/* Accesses to code which has previously been translated into a TB show
+ * up in the MMIO path, as accesses to the io_mem_notdirty
+ * MemoryRegion. */
+trace_memory_region_tb_read(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_READ_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_read(get_cpu_index(), mr, abs_addr, tmp, size);
@@ -471,6 +486,11 @@ static MemTxResult 
memory_region_oldmmio_write_accessor(MemoryRegion *mr,
 tmp = (*value >> shift) & mask;
 if (mr->subpage) {
 trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, 
size);
+} else if (mr == &io_mem_notdirty) {
+/* Accesses to code which has previously been translated into a TB show
+ * up in the MMIO path, as accesses to the io_mem_notdirty
+ * MemoryRegion. */
+trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, 
size);
@@ -492,6 +512,11 @@ static MemTxResult 
memory_region_write_accessor(MemoryRegion *mr,
 tmp = (*value >> shift) & mask;
 if (mr->subpage) {
 trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, 
size);
+} else if (mr == &io_mem_notdirty) {
+/* Accesses to code which has previously been translated into a TB show
+ * up in the MMIO path, as accesses to the io_mem_notdirty
+ * MemoryRegion. */
+trace_memory_region_tb_write(get_cpu_index(), addr, tmp, size);
 } else if (TRACE_MEMORY_REGION_OPS_WRITE_ENABLED) {
 hwaddr abs_addr = memory_region_to_absolute_addr(mr, addr);
 trace_memory_region_ops_write(get_cpu_index(), mr, abs_addr, tmp, 
size);
@@ -513,6 +538,11 @@ static MemTxResult 
memory_region_write_with_attrs_accessor(MemoryRegion *mr,
 tmp = (*value >> shift) & mask;
 if (mr->subpage) {
 trace_memory_region_subpage_write(get_cpu_index(), mr, addr, tmp, 
size);
+} else if (mr == &io_mem_notdirty) {
+/* Accesses to code which has

Re: [Qemu-devel] [PATCH v2 01/19] qapi: Rename 'fields' to 'members' in internal interface

2016-03-02 Thread Eric Blake

On 03/02/2016 10:15 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> C types and JSON objects don't have fields, but members.  We
>> shouldn't gratuitously invent terminology.  This patch is a
>> strict renaming of generator code and static generated functions,
>> plus the naming of the dummy filler member for empty structs,
>> before the next patch exposes some of that naming to the rest of
>> the code base.
>>
>> Suggested-by: Markus Armbruster 
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v2: new patch
> 
> Patch looks good.  You could split it into Python renames (no change to
> generated code) and C renames.  Only if you like the idea.

Sure, not too much work.

> 
> If you want to be *really* thorough: there's a "field" left in
> tests/qapi-schema/qapi-schema-test.json, and a few in
> docs/qapi-code-gen.txt.

I'll save docs/ for 3/19, but get the .json file fixed along with the
Python renames.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH v7 5/6] s390x/cpu: Add error handling to cpu creation

2016-03-02 Thread Matthew Rosato

>> +static void s390_cpu_get_id(Object *obj, Visitor *v, const char *name,
>> +void *opaque, Error **errp)
>> +{
>> +S390CPU *cpu = S390_CPU(obj);
>> +int64_t value = cpu->id;
>> +
>> +visit_type_int(v, name, &value, errp);
>> +}
>> +
>> +static void s390_cpu_set_id(Object *obj, Visitor *v, const char *name,
>> +void *opaque, Error **errp)
>> +{
>> +S390CPU *cpu = S390_CPU(obj);
>> +DeviceState *dev = DEVICE(obj);
>> +const int64_t min = 0;
>> +const int64_t max = UINT32_MAX;
>> +Error *local_err = NULL;
>> +int64_t value;
>> +
>> +if (dev->realized) {
>> +error_setg(errp, "Attempt to set property '%s' on '%s' after "
>> +   "it was realized", name, object_get_typename(obj));
>> +return;
>> +}
>> +
>> +visit_type_int(v, name, &value, &local_err);
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +return;
>> +}
>> +if (value < min || value > max) {
>> +error_setg(errp, "Property %s.%s doesn't take value %" PRId64
>> +   " (minimum: %" PRId64 ", maximum: %" PRId64 ")" ,
>> +   object_get_typename(obj), name, value, min, max);
>> +return;
>> +}
>> +if ((value != cpu->id) && cpu_exists(value)) {
>> +error_setg(errp, "CPU with ID %" PRIi64 " exists", value);
>> +return;
>> +}
>> +cpu->id = value;
>> +}
> 
> Just curious, what about using a simple
> 
> object_property_set_int() and doing all the checks in realize() ?
> 
> Then we could live without manual getter/setter (and without the realize 
> check).
> 

I think we still need at least a manual setter, even if you want to move
the checks to realize.

See something like object_property_add_uint64_ptr() -- It sets a
boilerplate get routine, and no set routine -- I think this presumes you
set your property upfront (at add time), never change it for the life of
the object, but want to read it later.
By comparison, S390CPU.id is set sometime after instance_init, based on
input.

So, we call object_property_set_int() to update it --  This just passes
the provided int value to the setter routine associated with the
property.  If one doesn't exist, you get:
qemu: Insufficient permission to perform this operation

I think this is also why we want to check for dev->realized in the
setter routine, to make sure the property is not being changed "too
late" -- Once the cpu is realized, the ID is baked and can't be changed.

Or did I misunderstand your idea here?

Matt
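
The setter behaviour discussed above can be sketched as a small Python model.
This is not QEMU code: CpuModel and taken_ids are hypothetical names, and the
exceptions stand in for the error_setg() calls in the quoted patch. It only
illustrates the three checks (not yet realized, value in range, ID not already
in use).

```python
UINT32_MAX = 0xFFFFFFFF


class CpuModel:
    """Hypothetical model of a CPU object with a settable 'id' property."""

    def __init__(self, taken_ids):
        self.id = 0
        self.realized = False
        self._taken_ids = taken_ids  # shared set of IDs of realized CPUs

    def set_id(self, value):
        # Once realized, the ID is baked in and must not change.
        if self.realized:
            raise RuntimeError("attempt to set 'id' after realize")
        # Range check mirroring min = 0, max = UINT32_MAX in the patch.
        if not 0 <= value <= UINT32_MAX:
            raise ValueError("id out of range")
        # Reject an ID that another CPU already uses.
        if value != self.id and value in self._taken_ids:
            raise ValueError("CPU with this ID exists")
        self.id = value

    def realize(self):
        self._taken_ids.add(self.id)
        self.realized = True
```

Moving the range and duplicate checks into realize() would still leave the
"already realized" check in the setter, which is the point made above.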

Re: [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
>> In preparation of making it a qdev property.
>>
>> Signed-off-by: Markus Armbruster 
>> --
>>  hw/misc/ivshmem.c | 31 +++
>>  1 file changed, 19 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 785ed1c..b39ea27 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -43,9 +43,6 @@
>>  #define IVSHMEM_IOEVENTFD   0
>>  #define IVSHMEM_MSI 1
>>
>> -#define IVSHMEM_PEER0
>> -#define IVSHMEM_MASTER  1
>> -
>>  #define IVSHMEM_REG_BAR_SIZE 0x100
>>
>>  #define IVSHMEM_DEBUG 0
>> @@ -96,12 +93,12 @@ typedef struct IVShmemState {
>>  uint64_t msg_buf;   /* buffer for receiving server messages */
>>  int msg_buffered_bytes; /* #bytes in @msg_buf */
>>
>> +OnOffAuto master;
>>  Error *migration_blocker;
>>
>>  char * shmobj;
>>  char * sizearg;
>>  char * role;
>> -int role_val;   /* scalar to avoid multiple string comparisons */
>>  } IVShmemState;
>>
>>  /* registers for the Inter-VM shared memory device */
>> @@ -117,6 +114,12 @@ static inline uint32_t ivshmem_has_feature(IVShmemState 
>> *ivs,
>>  return (ivs->features & (1 << feature));
>>  }
>>
>> +static inline bool ivshmem_is_master(IVShmemState *s)
>> +{
>> +assert(s->master != ON_OFF_AUTO_AUTO);
>> +return s->master == ON_OFF_AUTO_ON;
>> +}
>> +
>>  static void ivshmem_update_irq(IVShmemState *s)
>>  {
>>  PCIDevice *d = PCI_DEVICE(s);
>> @@ -861,15 +864,15 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
>> **errp)
>>  /* check that role is reasonable */
>>  if (s->role) {
>>  if (strncmp(s->role, "peer", 5) == 0) {
>> -s->role_val = IVSHMEM_PEER;
>> +s->master = ON_OFF_AUTO_OFF;
>>  } else if (strncmp(s->role, "master", 7) == 0) {
>> -s->role_val = IVSHMEM_MASTER;
>> +s->master = ON_OFF_AUTO_ON;
>>  } else {
>>  error_setg(errp, "'role' must be 'peer' or 'master'");
>>  return;
>>  }
>>  } else {
>> -s->role_val = IVSHMEM_MASTER; /* default */
>> +s->master = ON_OFF_AUTO_AUTO;
>>  }
>>
>>  pci_conf = dev->config;
>> @@ -931,7 +934,11 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
>> **errp)
>>  vmstate_register_ram(s->ivshmem_bar2, DEVICE(s));
>>  pci_register_bar(PCI_DEVICE(s), 2, attr, s->ivshmem_bar2);
>>
>> -if (s->role_val == IVSHMEM_PEER) {
>> +if (s->master == ON_OFF_AUTO_AUTO) {
>> +s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
>> +}
>> +
>> +if (ivshmem_is_master(s)) {
>
> !ivshmem_is_master() instead, or ivshmem_is_peer().

Another stupid mistake...
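
For reference, the intended logic can be modelled in a few lines of Python.
This is only a sketch: the enum and function names below are hypothetical,
and it assumes, as in the quoted hunk, that AUTO resolves to "on" exactly when
vm_id is 0 and that the migration blocker applies to peers, i.e. to
non-masters.

```python
from enum import Enum


class OnOffAuto(Enum):
    AUTO = 0
    ON = 1
    OFF = 2


def resolve_master(master, vm_id):
    # AUTO is resolved once the device knows its ID: ID 0 becomes master.
    if master is OnOffAuto.AUTO:
        return OnOffAuto.ON if vm_id == 0 else OnOffAuto.OFF
    return master


def is_master(master):
    # Mirrors the assertion in the patch: AUTO must be resolved first.
    assert master is not OnOffAuto.AUTO
    return master is OnOffAuto.ON


def migration_blocked(master, vm_id):
    # Peers are not migratable, so the test must be "not master".
    return not is_master(resolve_master(master, vm_id))
```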

>>  error_setg(&s->migration_blocker,
>> "Migration is disabled when using feature 'peer mode' in 
>> device 'ivshmem'");

Note to self: improve this message while there.

>>  migrate_add_blocker(s->migration_blocker);
>> @@ -993,7 +1000,7 @@ static int ivshmem_pre_load(void *opaque)
>>  {
>>  IVShmemState *s = opaque;
>>
>> -if (s->role_val == IVSHMEM_PEER) {
>> +if (ivshmem_is_master(s)) {
>
> same here

Yup.  Thanks!

>>  error_report("'peer' devices are not migratable");
>>  return -EINVAL;
>>  }
>> @@ -1019,9 +1026,9 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, 
>> int version_id)
>>  return -EINVAL;
>>  }
>>
>> -if (s->role_val == IVSHMEM_PEER) {
>> -error_report("'peer' devices are not migratable");
>> -return -EINVAL;
>> +ret = ivshmem_pre_load(s);
>> +if (ret) {
>> +return ret;
>>  }
>>
>>  ret = pci_device_load(pdev, f);
>> --
>> 2.4.3
>>
>>

Re: [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
>> Short reads from a UNIX domain socket are exceedingly unlikely when
>> the other side always sends eight bytes and we always read eight
>> bytes.  We cope with them anyway.  However, the code doing that is
>> rather convoluted.  Dumb it down radically.
>>
>> Replace the convoluted code
>
> agreed
>
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>  hw/misc/ivshmem.c | 76 
>> ---
>>  1 file changed, 16 insertions(+), 60 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index e578b8a..fb8a4f7 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -26,7 +26,6 @@
>>  #include "migration/migration.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/event_notifier.h"
>> -#include "qemu/fifo8.h"
>>  #include "sysemu/char.h"
>>  #include "sysemu/hostmem.h"
>>  #include "qapi/visitor.h"
>> @@ -80,7 +79,6 @@ typedef struct IVShmemState {
>>  uint32_t intrstatus;
>>
>>  CharDriverState *server_chr;
>> -Fifo8 incoming_fifo;
>>  MemoryRegion ivshmem_mmio;
>>
>>  /* We might need to register the BAR before we actually have the memory.
>> @@ -99,6 +97,8 @@ typedef struct IVShmemState {
>>  uint32_t vectors;
>>  uint32_t features;
>>  MSIVector *msi_vectors;
>> +uint64_t msg_buf;   /* buffer for receiving server messages */
>> +int msg_buffered_bytes; /* #bytes in @msg_buf */
>>
>>  Error *migration_blocker;
>>
>> @@ -255,11 +255,6 @@ static const MemoryRegionOps ivshmem_mmio_ops = {
>>  },
>>  };
>>
>> -static int ivshmem_can_receive(void * opaque)
>> -{
>> -return sizeof(int64_t);
>> -}
>> -
>>  static void ivshmem_vector_notify(void *opaque)
>>  {
>>  MSIVector *entry = opaque;
>> @@ -459,53 +454,6 @@ static void resize_peers(IVShmemState *s, int nb_peers)
>>  }
>>  }
>>
>> -static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int 
>> size,
>> -void *data, size_t len)
>> -{
>> -const uint8_t *p;
>> -uint32_t num;
>> -
>> -assert(len <= sizeof(int64_t)); /* limitation of the fifo */
>> -if (fifo8_is_empty(&s->incoming_fifo) && size == len) {
>> -memcpy(data, buf, size);
>> -return true;
>> -}
>> -
>> -IVSHMEM_DPRINTF("short read of %d bytes\n", size);
>> -
>> -num = MIN(size, sizeof(int64_t) - fifo8_num_used(&s->incoming_fifo));
>> -fifo8_push_all(&s->incoming_fifo, buf, num);
>> -
>> -if (fifo8_num_used(&s->incoming_fifo) < len) {
>> -assert(num == 0);
>> -return false;
>> -}
>> -
>> -size -= num;
>> -buf += num;
>> -p = fifo8_pop_buf(&s->incoming_fifo, len, &num);
>> -assert(num == len);
>> -
>> -memcpy(data, p, len);
>> -
>> -if (size > 0) {
>> -fifo8_push_all(&s->incoming_fifo, buf, size);
>> -}
>> -
>> -return true;
>> -}
>> -
>> -static bool fifo_update_and_get_i64(IVShmemState *s,
>> -const uint8_t *buf, int size, int64_t 
>> *i64)
>> -{
>> -if (fifo_update_and_get(s, buf, size, i64, sizeof(*i64))) {
>> -*i64 = GINT64_FROM_LE(*i64);
>> -return true;
>> -}
>> -
>> -return false;
>> -}
>> -
>>  static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
>>   Error **errp)
>>  {
>> @@ -658,6 +606,14 @@ static void process_msg(IVShmemState *s, int64_t msg, 
>> int fd, Error **errp)
>>  }
>>  }
>>
>> +static int ivshmem_can_receive(void *opaque)
>> +{
>> +IVShmemState *s = opaque;
>> +
>> +assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
>> +return sizeof(s->msg_buf) - s->msg_buffered_bytes;
>> +}
>> +
>>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>>  {
>>  IVShmemState *s = opaque;
>> @@ -665,8 +621,12 @@ static void ivshmem_read(void *opaque, const uint8_t 
>> *buf, int size)
>>  int incoming_fd;
>>  int64_t incoming_posn;
>>
>> -if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
>> -return;
>> +assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
>> +memcpy((unsigned char *)&s->msg_buf + s->msg_buffered_bytes, buf, size);
>> +s->msg_buffered_bytes += size;
>> +if (s->msg_buffered_bytes == sizeof(s->msg_buf)) {
>> +incoming_posn = le64_to_cpu(s->msg_buf);
>> +s->msg_buffered_bytes = 0;
>>  }
>>
>
> missing "else return" though.

Indeed.  Glad you caught my screwup.
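
A small Python model of the simplified buffering, including the early return
that was missing, might look like this. It is a sketch under stated
assumptions, not the device code: MsgBuffer and feed() are hypothetical names,
and the message is assumed to be a signed 64-bit little-endian integer, as the
le64_to_cpu() call in the hunk suggests.

```python
MSG_SIZE = 8  # the server always sends eight-byte messages


class MsgBuffer:
    """Hypothetical model of msg_buf / msg_buffered_bytes."""

    def __init__(self):
        self._buf = bytearray()

    def can_receive(self):
        # Never accept more than what completes the current message.
        return MSG_SIZE - len(self._buf)

    def feed(self, chunk):
        assert len(self._buf) + len(chunk) <= MSG_SIZE
        self._buf += chunk
        if len(self._buf) < MSG_SIZE:
            return None  # short read: wait for the rest ("else return")
        msg = int.from_bytes(self._buf, "little", signed=True)
        self._buf.clear()
        return msg
```

Because can_receive() caps each read at the remaining bytes, a message can
never straddle two buffers, which is what lets the FIFO go away.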

>>  incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
>> @@ -1019,8 +979,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
>> **errp)
>>  }
>>  }
>>
>> -fifo8_create(&s->incoming_fifo, sizeof(int64_t));
>> -
>>  if (s->role_val == IVSHMEM_PEER) {
>>  error_setg(&s->migration_blocker,
>> "Migration is disabled

Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
>> This kills off the funny state described in the previous commit.
>>
>> Simplify ivshmem_io_read() accordingly, and update documentation.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>  docs/specs/ivshmem-spec.txt |  20 
>>  hw/misc/ivshmem.c   | 121 
>> +++-
>>  qemu-doc.texi   |   9 +---
>>  3 files changed, 87 insertions(+), 63 deletions(-)
>>
>> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
>> index 4fc6f37..3eb8c97 100644
>> --- a/docs/specs/ivshmem-spec.txt
>> +++ b/docs/specs/ivshmem-spec.txt
>> @@ -62,11 +62,11 @@ There are two ways to use this device:
>>likely want to write a kernel driver to handle interrupts.  Requires
>>the device to be configured for interrupts, obviously.
>>
>> -If the device is configured for interrupts, BAR2 is initially invalid.
>> -It becomes safely accessible only after the ivshmem server provided
>> -the shared memory.  Guest software should wait for the IVPosition
>> -register (described below) to become non-negative before accessing
>> -BAR2.
>> +Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
>> +configured for interrupts.  It becomes safely accessible only after
>> +the ivshmem server provided the shared memory.  Guest software should
>> +wait for the IVPosition register (described below) to become
>> +non-negative before accessing BAR2.
>>
>>  The device is not capable to tell guest software whether it is
>>  configured for interrupts.
>> @@ -82,7 +82,7 @@ BAR 0 contains the following registers:
>>  4 4   read/write0   Interrupt Status
>>  bit 0: peer interrupt
>>  bit 1..31: reserved
>> -8 4   read-only   0 or -1   IVPosition
>> +8 4   read-only   0 or ID   IVPosition
>> 12 4   write-only  N/A   Doorbell
>>  bit 0..15: vector
>>  bit 16..31: peer ID
>> @@ -100,12 +100,14 @@ when an interrupt request from a peer is received.  
>> Reading the
>>  register clears it.
>>
>>  IVPosition Register: if the device is not configured for interrupts,
>> -this is zero.  Else, it's -1 for a short while after reset, then
>> -changes to the device's ID (between 0 and 65535).
>> +this is zero.  Else, it is the device's ID (between 0 and 65535).
>> +
>> +Before QEMU 2.6.0, the register may read -1 for a short while after
>> +reset.
>>
>>  There is no good way for software to find out whether the device is
>>  configured for interrupts.  A positive IVPosition means interrupts,
>> -but zero could be either.  The initial -1 cannot be reliably observed.
>> +but zero could be either.
>>
>>  Doorbell Register: writing this register requests to interrupt a peer.
>>  The written value's high 16 bits are the ID of the peer to interrupt,
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index 352937f..831da53 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -234,12 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, hwaddr 
>> addr,
>>  break;
>>
>>  case IVPOSITION:
>> -/* return my VM ID if the memory is mapped */
>> -if (memory_region_is_mapped(&s->ivshmem)) {
>> -ret = s->vm_id;
>> -} else {
>> -ret = -1;
>> -}
>> +ret = s->vm_id;
>>  break;
>>
>>  default:
>> @@ -511,7 +506,8 @@ static bool fifo_update_and_get_i64(IVShmemState *s,
>>  return false;
>>  }
>>
>> -static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
>> +static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
>> + Error **errp)
>
> I prefer to return -1 in case of error, even if Error** is also returned.

You know, I'd prefer that, too, and I've argued for it unsuccessfully.
As it is, we fairly consistently return void when the function returns
errors through Error ** and has no non-error value.

>>  {
>>  PCIDevice *pdev = PCI_DEVICE(s);
>>  MSIMessage msg = msix_get_message(pdev, vector);
>> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, 
>> int vector)
>>
>>  ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>>  if (ret < 0) {
>> -error_report("ivshmem: kvm_irqchip_add_msi_route failed");
>> -return -1;
>> +error_setg(errp, "kvm_irqchip_add_msi_route failed");
>> +return;
>>  }
>>
>>  s->msi_vectors[vector].virq = ret;
>>  s->msi_vectors[vector].pdev = pdev;
>> -
>> -return 0;
>>  }
>>
>> -static void setup_interrupt(IVShmemState *s, int vector)
>> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)

Re: [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize()

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
>> When configured for interrupts (property "chardev" given), we receive
>> the shared memory from an ivshmem server.  We do so asynchronously
>> after realize() completes, by setting up callbacks with
>> qemu_chr_add_handlers().
>>
>> Keeping server I/O out of realize() that way avoids delays due to a
>> slow server.  This is probably relevant only for hot plug.
>>
>> However, this funny "no shared memory, yet" state of the device also
>> causes a raft of issues that are hard or impossible to work around:
>>
>> * The guest is exposed to this state: when we enter and leave it its
>>   shared memory contents is abruptly replaced, and device register
>>   IVPosition changes.
>>
>>   This is a known issue.  We document that guests should not access
>>   the shared memory after device initialization until the IVPosition
>>   register becomes non-negative.
>>
>>   For cold plug, the funny state is unlikely to be visible in
>>   practice, because we normally receive the shared memory long before
>>   the guest gets around to mess with the device.
>>
>>   For hot plug, the timing is tighter, but the relative slowness of
>>   PCI device configuration has a good chance to hide the funny state.
>>
>>   In either case, guests complying with the documented procedure are
>>   safe.
>>
>> * Migration becomes racy.
>>
>>   If migration completes before the shared memory setup completes on
>>   the source, shared memory contents is silently lost.  Fortunately,
>>   migration is rather unlikely to win this race.
>>
>>   If the shared memory's ramblock arrives at the destination before
>>   shared memory setup completes, migration fails.
>>
>>   There is no known way for a management application to wait for
>>   shared memory setup to complete.
>>
>>   All you can do is retry failed migration.  You can improve your
>>   chances by leaving more time between running the destination QEMU
>>   and the migrate command.
>>
>>   To mitigate silent memory loss, you need to ensure the server
>>   initializes shared memory exactly the same on source and
>>   destination.
>>
>>   These issues are entirely undocumented so far.
>>
>> I'd expect the server to be almost always fast enough to hide these
>> issues.  But then rare catastrophic races are in a way the worst kind.
>>
>> This is way more trouble than I'm willing to take from any device.
>> Kill the funny state by receiving shared memory synchronously in
>> realize().  If your hot plug hangs, go kill your ivshmem server.
>>
>> For easier review, this commit only makes the receive synchronous, it
>> doesn't add the necessary error propagation.  Without that, the funny
>> state persists.  The next commit will do that, and kill it off for
>> real.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>  hw/misc/ivshmem.c| 70 
>> +---
>>  tests/ivshmem-test.c | 26 ++-
>>  2 files changed, 57 insertions(+), 39 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index c366087..352937f 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t 
>> *buf, int size)
>>  process_msg(s, incoming_posn, incoming_fd);
>>  }
>>
>> -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int 
>> size)
>> +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
>>  {
>> -IVShmemState *s = opaque;
>> -int tmp;
>> -int64_t version;
>> +int64_t msg;
>> +int n, ret;
>>
>> -if (!fifo_update_and_get_i64(s, buf, size, &version)) {
>> -return;
>> -}
>> +n = 0;
>> +do {
>> +ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
>> + sizeof(msg) - n);
>> +if (ret < 0 && ret != -EINTR) {
>> +/* TODO error handling */
>> +return INT64_MIN;
>> +}
>> +n += ret;
>> +} while (n < sizeof(msg));
>>
>> -tmp = qemu_chr_fe_get_msgfd(s->server_chr);
>> -if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
>> +*pfd = qemu_chr_fe_get_msgfd(s->server_chr);
>> +return msg;
>> +}
>> +
>> +static void ivshmem_recv_setup(IVShmemState *s)
>> +{
>> +int64_t msg;
>> +int fd;
>> +
>> +msg = ivshmem_recv_msg(s, &fd);
>> +if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
>>  fprintf(stderr, "incompatible version, you are connecting to a 
>> ivshmem-"
>>  "server using a different protocol please check your 
>> setup\n");
>> -qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
>>  return;
>>  }
>>
>> -IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
>> -qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
>> -

Re: [Qemu-devel] [PATCH 22/38] ivshmem: Plug leaks on unplug, fix peer disconnect

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> Hi
>
> On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
>> close_peer_eventfds() cleans up three things: ioeventfd triggers if
>> they exist, eventfds, and the array to store them.
>>
>> Commit 98609cd (v1.2.0) fixed it not to clean up ioeventfd triggers
>> when they don't exist (property ioeventfd=off, which is the default).
>> Unfortunately, the fix also made it skip cleanup of the eventfds and
>> the array then.  This is a memory and file descriptor leak on unplug.
>>
>> Additionally, the reset of nb_eventfds is skipped.  Doesn't matter on
>> unplug.  On peer disconnect, however, this permanently wedges the
>> interrupt vectors used for that peer's ID.  The eventfds stay behind,
>> but aren't connected to a peer anymore.  When the ID gets recycled for
>> a new peer, the new peer's eventfds get assigned to vectors after the
>> old ones.  Commonly, the device's number of vectors matches the
>> server's, so the new ones get dropped with a "Too many eventfd
>> received" message.  Interrupts either don't work (common case) or go
>> to the wrong vector.
>>
>> Fix by narrowing the conditional to just the ioeventfd trigger
>> cleanup.
>>
>> While there, move the "invalid" peer check to the only caller where it
>> can actually happen.
>>
>> Cc: Paolo Bonzini 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  hw/misc/ivshmem.c | 24 
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index fc4..c366087 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -428,21 +428,17 @@ static void close_peer_eventfds(IVShmemState *s, int 
>> posn)
>>  {
>>  int i, n;
>>
>> -if (!ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
>> -return;
>> -}
>> -if (posn < 0 || posn >= s->nb_peers) {
>> -error_report("invalid peer %d", posn);
>> -return;
>> -}
>> -
>> +assert(posn >= 0 && posn < s->nb_peers);
>>  n = s->peers[posn].nb_eventfds;
>>
>> -memory_region_transaction_begin();
>> -for (i = 0; i < n; i++) {
>> -ivshmem_del_eventfd(s, posn, i);
>> +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) {
>> +memory_region_transaction_begin();
>> +for (i = 0; i < n; i++) {
>> +ivshmem_del_eventfd(s, posn, i);
>> +}
>> +memory_region_transaction_commit();
>>  }
>> -memory_region_transaction_commit();
>> +
>>  for (i = 0; i < n; i++) {
>>  event_notifier_cleanup(&s->peers[posn].eventfds[i]);
>>  }
>
> Looks good, that makes me wonder, what would happen if posn == vm_id?
> I think this should be made an invalid condition or it should revert
> setup_interrupt().

When called from pci_ivshmem_exit(): perfectly fine.

When called from process_msg_disconnect(): invalid as long as
ivshmem-spec.txt doesn't assign a sane meaning to it.  Let's make it an
error there, okay?
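
To make the proposal concrete, here is a minimal sketch of the check I have in mind for the disconnect path. It is not the actual patch: ModelState and disconnect_is_valid() are hypothetical stand-ins for IVShmemState and the test at the top of process_msg_disconnect(), assuming we treat both an out-of-range ID and our own ID as errors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for IVShmemState. */
typedef struct {
    int vm_id;      /* our own peer ID */
    int nb_peers;   /* number of entries in the peers array */
} ModelState;

/* Return true if a disconnect message for @posn should be acted on. */
static bool disconnect_is_valid(const ModelState *s, uint16_t posn)
{
    if (posn >= s->nb_peers) {
        return false;   /* no such peer */
    }
    if (posn == s->vm_id) {
        return false;   /* the server claims we ourselves went away */
    }
    return true;
}
```

The unplug path would not go through this check, so cleaning up our own eventfds there stays possible.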

>> @@ -598,6 +594,10 @@ static void process_msg_shmem(IVShmemState *s, int fd)
>>  static void process_msg_disconnect(IVShmemState *s, uint16_t posn)
>>  {
>>  IVSHMEM_DPRINTF("posn %d has gone away\n", posn);
>> +if (posn >= s->nb_peers) {
>> +error_report("invalid peer %d", posn);
>> +return;
>> +}
>>  close_peer_eventfds(s, posn);
>>  }
>>
>> --
>> 2.4.3
>>
>>

Re: [Qemu-devel] [PATCH] target-arm: Fix translation level on early translation faults

2016-03-02 Thread Sergey Fedorov

On 02.03.2016 21:04, Sergey Sorokin wrote:
> Qemu reports translation fault on 1st level instead of 0th level in case of
> AArch64 address translation if the translation table walk is disabled or
> the address is in the gap between the two regions.

It's probably not a very clear description in the commit message. IIUC,
level 0 fault is reported in case of any fault from TTBR in AArch64 state.

Best regards,
Sergey

>
> Signed-off-by: Sergey Sorokin 
> ---
>  target-arm/helper.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/target-arm/helper.c b/target-arm/helper.c
> index 18c8296..09f920c 100644
> --- a/target-arm/helper.c
> +++ b/target-arm/helper.c
> @@ -7238,6 +7238,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
> target_ulong address,
>   * support for those page table walks.
>   */
>  if (arm_el_is_aa64(env, el)) {
> +level = 0;
>  va_size = 64;
>  if (el > 1) {
>  if (mmu_idx != ARMMMUIdx_S2NS) {

Re: [Qemu-devel] [PATCH 21/38] ivshmem: Disentangle ivshmem_read()

2016-03-02 Thread Markus Armbruster

Marc-André Lureau  writes:

> On Wed, Mar 2, 2016 at 4:53 PM, Markus Armbruster  wrote:
>>>> +if (msg == -1) {
>>>> +process_msg_shmem(s, fd);
>>>
>>> the previous code used to close fd if any, it's worth to keep that imho
>>
>> I'm blind.  Where?
>
> Sorry, wrong place I looked at, seems you got them all.
>
> if (msg < -1 || msg > IVSHMEM_MAX_PEERS) {
> error_report("server sent invalid message %" PRId64, msg);
> close(fd);
> return;
> }
>
>
> However, why not keep the if fd != -1 here (not a great idea to call
> close otherwise)

We refuse to make the code more verbose to avoid free(NULL), and I very
much agree with that.

close(-1) is like free(NULL) in that it is perfectly safe.  Where they
differ is performance: free() checks for null right away, but close()
checks only after switching to supervisor mode.  Doesn't matter on an
error path.
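
For illustration, a small sketch of what "perfectly safe" means here. The helper name is made up for this example; the only assumption is POSIX close(), which fails with EBADF for an invalid descriptor and has no other effect.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Call close(-1) and return the errno it leaves behind,
 * or 0 if the call unexpectedly succeeds.
 */
static int close_invalid_fd_errno(void)
{
    errno = 0;
    if (close(-1) == -1) {
        return errno;
    }
    return 0;
}
```

So an unconditional close(fd) on the error path costs one failed system call when fd is -1, and nothing else.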

Re: [Qemu-devel] [V6 1/4] hw/i386: Introduce AMD IOMMU

2016-03-02 Thread David Kiarie




On 25/02/16 18:43, Marcel Apfelbaum wrote:

On 02/21/2016 08:10 PM, David Kiarie wrote:

Add AMD IOMMU emulation to Qemu in addition to Intel IOMMU.
The IOMMU does basic translation, error checking and has a
minimal IOTLB implementation.


Hi,



Signed-off-by: David Kiarie 
---
  hw/i386/Makefile.objs |1 +
  hw/i386/amd_iommu.c   | 1432 
+

  hw/i386/amd_iommu.h   |  395 ++
  include/hw/pci/pci.h  |2 +
  4 files changed, 1830 insertions(+)
  create mode 100644 hw/i386/amd_iommu.c
  create mode 100644 hw/i386/amd_iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..2f1a265 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -3,6 +3,7 @@ obj-y += multiboot.o
  obj-y += pc.o pc_piix.o pc_q35.o
  obj-y += pc_sysfw.o
  obj-y += intel_iommu.o
+obj-y += amd_iommu.o
  obj-$(CONFIG_XEN) += ../xenpv/ xen/

  obj-y += kvmvapic.o
diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
new file mode 100644
index 000..3dac043
--- /dev/null
+++ b/hw/i386/amd_iommu.c
@@ -0,0 +1,1432 @@
+/*
+ * QEMU emulation of AMD IOMMU (AMD-Vi)
+ *
+ * Copyright (C) 2011 Eduard - Gabriel Munteanu
+ * Copyright (C) 2015 David Kiarie, 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ *
+ * Cache implementation inspired by hw/i386/intel_iommu.c
+ *
+ */
+#include "hw/i386/amd_iommu.h"
+
+/*#define DEBUG_AMD_IOMMU*/
+#ifdef DEBUG_AMD_IOMMU
+enum {
+DEBUG_GENERAL, DEBUG_CAPAB, DEBUG_MMIO, DEBUG_ELOG,
+DEBUG_CACHE, DEBUG_COMMAND, DEBUG_MMU
+};
+
+#define IOMMU_DBGBIT(x)   (1 << DEBUG_##x)
+static int iommu_dbgflags = IOMMU_DBGBIT(MMIO);
+
+#define IOMMU_DPRINTF(what, fmt, ...) do { \
+if (iommu_dbgflags & IOMMU_DBGBIT(what)) { \
+fprintf(stderr, "(amd-iommu)%s: " fmt "\n", __func__, \
+## __VA_ARGS__); } \
+} while (0)
+#else
+#define IOMMU_DPRINTF(what, fmt, ...) do {} while (0)
+#endif
+
+typedef struct AMDIOMMUAddressSpace {
+uint8_t bus_num;            /* bus number */
+uint8_t devfn;              /* device function */
+AMDIOMMUState *iommu_state; /* IOMMU - one per machine */
+MemoryRegion iommu;         /* Device's iommu region */
+AddressSpace as;            /* device's corresponding address space */
+} AMDIOMMUAddressSpace;
+
+/* IOMMU cache entry */
+typedef struct IOMMUIOTLBEntry {
+uint64_t gfn;
+uint16_t domid;
+uint64_t devid;
+uint64_t perms;
+uint64_t translated_addr;
+} IOMMUIOTLBEntry;
+
+/* configure MMIO registers at startup/reset */
+static void amd_iommu_set_quad(AMDIOMMUState *s, hwaddr addr, uint64_t val,
+   uint64_t romask, uint64_t w1cmask)
+{
+stq_le_p(&s->mmior[addr], val);
+stq_le_p(&s->romask[addr], romask);
+stq_le_p(&s->w1cmask[addr], w1cmask);
+}
+
+static uint16_t amd_iommu_readw(AMDIOMMUState *s, hwaddr addr)
+{
+return lduw_le_p(&s->mmior[addr]);
+}
+
+static uint32_t amd_iommu_readl(AMDIOMMUState *s, hwaddr addr)
+{
+return ldl_le_p(&s->mmior[addr]);
+}
+
+static uint64_t amd_iommu_readq(AMDIOMMUState *s, hwaddr addr)
+{
+return ldq_le_p(&s->mmior[addr]);
+}
+
+/* internal write */
+static void amd_iommu_writeq_raw(AMDIOMMUState *s, uint64_t val, hwaddr addr)
+{
+stq_le_p(&s->mmior[addr], val);
+}
+
+/* external write */
+static void amd_iommu_writew(AMDIOMMUState *s, hwaddr addr, uint16_t val)
+{
+uint16_t romask = lduw_le_p(&s->romask[addr]);
+uint16_t w1cmask = lduw_le_p(&s->w1cmask[addr]);
+uint16_t oldval = lduw_le_p(&s->mmior[addr]);
+stw_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_writel(AMDIOMMUState *s, hwaddr addr, uint32_t val)
+{
+uint32_t romask = ldl_le_p(&s->romask[addr]);
+uint32_t w1cmask = ldl_le_p(&s->w1cmask[addr]);
+uint32_t oldval = ldl_le_p(&s->mmior[addr]);
+stl_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void amd_iommu_writeq(AMDIOMMUState *s, hwaddr addr, uint64_t val)
+{
+uint64_t romask = ldq_le_p(&s->romask[addr]);
+uint64_t w1cmask = ldq_le_p(&s->w1cmask[addr]);
+uint32_t oldval = ldq_le_p(&s->mmior[addr]);
+stq_le_p(&s->mmior[addr], (val & ~(val & w1cmask)) | (romask & oldval));
+}
+
+static void

Re: [Qemu-devel] [PATCH v2 11/19] qapi: Add type.is_empty() helper

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> And use it in qapi-types and qapi-event.  Down the road, we may
> want to lift our artificial restriction of no variants at the
> top level of an event, at which point, inlining our check for
> whether members is empty will no longer be sufficient, but
> adding a check for variants adds verbosity; in the meantime,
> add some asserts in places where we don't handle variants.

Perhaps I'm just running out of steam for today, but I've read this
twice, and still don't get why adding these assertions goes in the same
patch as adding the helper, or what it has to do with events.

> More immediately, the new .is_empty() helper will help fix a bug
> in qapi-visit in the next patch, where the generator did not
> handle an explicit empty type in the same was as a missing type.

same way

>
> No change to generated code.
>
> Signed-off-by: Eric Blake 
>
> ---
> v2: no change
> v1: add some asserts
> Previously posted as part of qapi cleanup subset E:
> v9: improve commit message
> v8: no change
> v7: rebase to context change
> v6: new patch
> ---
>  scripts/qapi.py   | 5 +
>  scripts/qapi-event.py | 7 ---
>  scripts/qapi-types.py | 2 +-
>  3 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 6c52fe5..83080b3 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -962,6 +962,7 @@ class QAPISchemaObjectType(QAPISchemaType):
>  assert isinstance(self.base, QAPISchemaObjectType)
>  self.base.check(schema)
>  self.base.check_clash(schema, self.info, seen)
> +assert not self.base.variants
>  for m in self.local_members:
>  m.check(schema)
>  m.check_clash(self.info, seen)

This is the "some asserts" ;)

> @@ -983,6 +984,10 @@ class QAPISchemaObjectType(QAPISchemaType):
>  # See QAPISchema._make_implicit_object_type()
>  return self.name[0] == ':'
>
> +def is_empty(self):
> +assert self.members is not None
> +return not self.members and not self.variants
> +
>  def c_name(self):
>  assert not self.is_implicit()
>  return QAPISchemaType.c_name(self)
> diff --git a/scripts/qapi-event.py b/scripts/qapi-event.py
> index fb579dd..808ed80 100644
> --- a/scripts/qapi-event.py
> +++ b/scripts/qapi-event.py
> @@ -39,7 +39,7 @@ def gen_event_send(name, arg_type):
>  ''',
>  proto=gen_event_send_proto(name, arg_type))
>
> -if arg_type and arg_type.members:
> +if arg_type and not arg_type.is_empty():
>  ret += mcgen('''
>  QmpOutputVisitor *qov;
>  Visitor *v;

Oh, you don't just add a helper, you actually *change* the condition!
Perhaps the commit message would be easier to understand if it explained
that first.

> @@ -58,7 +58,8 @@ def gen_event_send(name, arg_type):
>  ''',
>   name=name)
>
> -if arg_type and arg_type.members:
> +if arg_type and not arg_type.is_empty():
> +assert not arg_type.variants
>  ret += mcgen('''
>  qov = qmp_output_visitor_new();
>  v = qmp_output_get_visitor(qov);
> @@ -88,7 +89,7 @@ out_obj:
>  ''',
>   c_enum=c_enum_const(event_enum_name, name))
>
> -if arg_type and arg_type.members:
> +if arg_type and not arg_type.is_empty():
>  ret += mcgen('''
>  out:
>  qmp_output_visitor_cleanup(qov);
> diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
> index 0306a88..6c1923d 100644
> --- a/scripts/qapi-types.py
> +++ b/scripts/qapi-types.py
> @@ -90,7 +90,7 @@ struct %(c_name)s {
>  # potential issues with attempting to malloc space for zero-length
>  # structs in C, and also incompatibility with C++ (where an empty
>  # struct is size 1).
> -if not (base and base.members) and not members and not variants:
> +if (not base or base.is_empty()) and not members and not variants:
>  ret += mcgen('''
>  char qapi_dummy_for_empty_struct;
>  ''')

I figure the case for the helper based on this patch alone is making the
code a bit more future-proof.  Suggest you try to explain that in your
commit message, including against what future change exactly you're
proofing the code.

Haven't reviewed for completeness.

Re: [Qemu-devel] [PATCH 32/38] qdev: New DEFINE_PROP_ON_OFF_AUTO

2016-03-02 Thread Marc-André Lureau

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Marc-André Lureau 


>  hw/core/qdev-properties.c| 10 ++
>  include/hw/qdev-properties.h |  3 +++
>  2 files changed, 13 insertions(+)
>
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index bc89800..d2f5a08 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -516,6 +516,16 @@ PropertyInfo qdev_prop_macaddr = {
>  .set   = set_mac,
>  };
>
> +/* --- on/off/auto --- */
> +
> +PropertyInfo qdev_prop_on_off_auto = {
> +.name = "OnOffAuto",
> +.description = "on/off/auto",
> +.enum_table = OnOffAuto_lookup,
> +.get = get_enum,
> +.set = set_enum,
> +};
> +
>  /* --- lost tick policy --- */
>
>  QEMU_BUILD_BUG_ON(sizeof(LostTickPolicy) != sizeof(int));
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index 03a1b91..0586cac 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -18,6 +18,7 @@ extern PropertyInfo qdev_prop_string;
>  extern PropertyInfo qdev_prop_chr;
>  extern PropertyInfo qdev_prop_ptr;
>  extern PropertyInfo qdev_prop_macaddr;
> +extern PropertyInfo qdev_prop_on_off_auto;
>  extern PropertyInfo qdev_prop_losttickpolicy;
>  extern PropertyInfo qdev_prop_bios_chs_trans;
>  extern PropertyInfo qdev_prop_fdc_drive_type;
> @@ -155,6 +156,8 @@ extern PropertyInfo qdev_prop_arraylen;
>  DEFINE_PROP(_n, _s, _f, qdev_prop_drive, BlockBackend *)
>  #define DEFINE_PROP_MACADDR(_n, _s, _f) \
>  DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
> +#define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
> +DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
>  DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
>  LostTickPolicy)
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 33/38] ivshmem: Replace int role_val by OnOffAuto master

2016-03-02 Thread Marc-André Lureau

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> In preparation of making it a qdev property.
>
> Signed-off-by: Markus Armbruster 
> --
>  hw/misc/ivshmem.c | 31 +++
>  1 file changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 785ed1c..b39ea27 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -43,9 +43,6 @@
>  #define IVSHMEM_IOEVENTFD   0
>  #define IVSHMEM_MSI 1
>
> -#define IVSHMEM_PEER    0
> -#define IVSHMEM_MASTER  1
> -
>  #define IVSHMEM_REG_BAR_SIZE 0x100
>
>  #define IVSHMEM_DEBUG 0
> @@ -96,12 +93,12 @@ typedef struct IVShmemState {
>  uint64_t msg_buf;   /* buffer for receiving server messages */
>  int msg_buffered_bytes; /* #bytes in @msg_buf */
>
> +OnOffAuto master;
>  Error *migration_blocker;
>
>  char * shmobj;
>  char * sizearg;
>  char * role;
> -int role_val;   /* scalar to avoid multiple string comparisons */
>  } IVShmemState;
>
>  /* registers for the Inter-VM shared memory device */
> @@ -117,6 +114,12 @@ static inline uint32_t ivshmem_has_feature(IVShmemState 
> *ivs,
>  return (ivs->features & (1 << feature));
>  }
>
> +static inline bool ivshmem_is_master(IVShmemState *s)
> +{
> +assert(s->master != ON_OFF_AUTO_AUTO);
> +return s->master == ON_OFF_AUTO_ON;
> +}
> +
>  static void ivshmem_update_irq(IVShmemState *s)
>  {
>  PCIDevice *d = PCI_DEVICE(s);
> @@ -861,15 +864,15 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  /* check that role is reasonable */
>  if (s->role) {
>  if (strncmp(s->role, "peer", 5) == 0) {
> -s->role_val = IVSHMEM_PEER;
> +s->master = ON_OFF_AUTO_OFF;
>  } else if (strncmp(s->role, "master", 7) == 0) {
> -s->role_val = IVSHMEM_MASTER;
> +s->master = ON_OFF_AUTO_ON;
>  } else {
>  error_setg(errp, "'role' must be 'peer' or 'master'");
>  return;
>  }
>  } else {
> -s->role_val = IVSHMEM_MASTER; /* default */
> +s->master = ON_OFF_AUTO_AUTO;
>  }
>
>  pci_conf = dev->config;
> @@ -931,7 +934,11 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  vmstate_register_ram(s->ivshmem_bar2, DEVICE(s));
>  pci_register_bar(PCI_DEVICE(s), 2, attr, s->ivshmem_bar2);
>
> -if (s->role_val == IVSHMEM_PEER) {
> +if (s->master == ON_OFF_AUTO_AUTO) {
> +s->master = s->vm_id == 0 ? ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
> +}
> +
> +if (ivshmem_is_master(s)) {

!ivshmem_is_master() instead, or ivshmem_is_peer().
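
A minimal sketch of the condition I would expect, assuming the intent stays "peers block migration, the master does not". The enum and both helpers are local stand-ins for this mail, not QEMU's OnOffAuto or the code in the patch.

```c
#include <stdbool.h>

/* Local stand-in for the on/off/auto tri-state. */
typedef enum { MODEL_AUTO, MODEL_ON, MODEL_OFF } ModelOnOffAuto;

/* Resolve "auto" the way the hunk above does: ID 0 becomes master. */
static ModelOnOffAuto resolve_master(ModelOnOffAuto master, int vm_id)
{
    if (master == MODEL_AUTO) {
        return vm_id == 0 ? MODEL_ON : MODEL_OFF;
    }
    return master;
}

/* Peers are the non-migratable ones, so the test has to be negated. */
static bool needs_migration_blocker(ModelOnOffAuto master, int vm_id)
{
    return resolve_master(master, vm_id) != MODEL_ON;
}
```

With the condition as posted, the blocker would be installed for the master and skipped for peers, which is the opposite of what the error message says.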

>  error_setg(&s->migration_blocker,
> "Migration is disabled when using feature 'peer mode' in 
> device 'ivshmem'");
>  migrate_add_blocker(s->migration_blocker);
> @@ -993,7 +1000,7 @@ static int ivshmem_pre_load(void *opaque)
>  {
>  IVShmemState *s = opaque;
>
> -if (s->role_val == IVSHMEM_PEER) {
> +if (ivshmem_is_master(s)) {

same here

>  error_report("'peer' devices are not migratable");
>  return -EINVAL;
>  }
> @@ -1019,9 +1026,9 @@ static int ivshmem_load_old(QEMUFile *f, void *opaque, 
> int version_id)
>  return -EINVAL;
>  }
>
> -if (s->role_val == IVSHMEM_PEER) {
> -error_report("'peer' devices are not migratable");
> -return -EINVAL;
> +ret = ivshmem_pre_load(s);
> +if (ret) {
> +return ret;
>  }
>
>  ret = pci_device_load(pdev, f);
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 10/19] qapi-visit: Factor out gen_visit_members_call()

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> Upcoming patches will be adding several contexts where we want
> to handle the visit of an implicit type (an anonymous base type,
> or an anonymous branch of a flat union) by directly inlining
> the visit of each member of the implicit type. The work is made
> easier by factoring out a new helper, gen_visit_members_call(),
> so that the caller doesn't need to care whether the type it is
> visiting is implicit or normal.
>
> For now, the only implicit type we encounter are the branches
> of a simple union; the initial implementation of the helper
> method is hard-coded to that usage, but it gets us one step
> closer to completely dropping the hack of simple_union_type().
>
> Generated output is unchanged.
>
> Signed-off-by: Eric Blake 
>
> ---
> v2: retitle, rebase to s/fields/members/ changes
> ---
>  scripts/qapi-visit.py | 42 --
>  1 file changed, 24 insertions(+), 18 deletions(-)
>
> diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
> index a712e9a..dbae00c 100644
> --- a/scripts/qapi-visit.py
> +++ b/scripts/qapi-visit.py
> @@ -34,6 +34,25 @@ void visit_type_%(c_name)s_members(Visitor *v, %(c_name)s 
> *obj, Error **errp);
>   c_name=c_name(name))
>
>
> +def gen_visit_members_call(typ, c_name):

The actual arguments of c_name are C expressions, not names.

> +ret = ''
> +assert isinstance(typ, QAPISchemaObjectType)
> +if typ.is_implicit():
> +# TODO ugly special case for simple union
> +assert len(typ.members) == 1
> +assert not typ.variants

This is an inlined and simplified version of simple_union_type().
Violation of DRY, acceptable if temporary, but could use a comment.

> +ret += mcgen('''
> +visit_type_%(c_type)s(v, "data", %(c_name)s, &err);
> +''',
> + c_type=typ.members[0].type.c_name(), c_name=c_name)
> +else:
> +ret += mcgen('''
> +visit_type_%(c_type)s_members(v, %(c_name)s, &err);
> +''',
> + c_type=typ.c_name(), c_name=c_name)
> +return ret
> +
> +
>  def gen_visit_object_members(name, base, members, variants):
>  ret = mcgen('''
>
> @@ -45,10 +64,7 @@ void visit_type_%(c_name)s_members(Visitor *v, %(c_name)s 
> *obj, Error **errp)
>  c_name=c_name(name))
>
>  if base:
> -ret += mcgen('''
> -visit_type_%(c_type)s_members(v, (%(c_type)s *)obj, &err);
> -''',
> - c_type=base.c_name())
> +ret += gen_visit_members_call(base, '(%s *)obj' % base.c_name())
>  ret += gen_err_check()
>
>  ret += gen_visit_members(members, prefix='obj->')
> @@ -60,26 +76,16 @@ void visit_type_%(c_name)s_members(Visitor *v, %(c_name)s 
> *obj, Error **errp)
>   c_name=c_name(variants.tag_member.name))
>
>  for var in variants.variants:
> -# TODO ugly special case for simple union
> -simple_union_type = var.simple_union_type()
>  ret += mcgen('''
>  case %(case)s:
>  ''',
>   case=c_enum_const(variants.tag_member.type.name,
> var.name,
> variants.tag_member.type.prefix))
> -if simple_union_type:
> -ret += mcgen('''
> -visit_type_%(c_type)s(v, "data", &obj->u.%(c_name)s, &err);
> -''',
> - c_type=simple_union_type.c_name(),
> - c_name=c_name(var.name))
> -else:
> -ret += mcgen('''
> -visit_type_%(c_type)s_members(v, &obj->u.%(c_name)s, &err);
> -''',
> - c_type=var.type.c_name(),
> - c_name=c_name(var.name))
> +push_indent()
> +ret += gen_visit_members_call(var.type,
> +  '&obj->u.' + c_name(var.name))
> +pop_indent()
>  ret += mcgen('''
>  break;
>  ''')

Not an improvement on its own.  Need to review more patches before I can
say more.

Re: [Qemu-devel] [PATCH 31/38] ivshmem: Inline check_shm_size() into its only caller

2016-03-02 Thread Marc-André Lureau

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> Improve the error messages while there.
>
> Signed-off-by: Markus Armbruster 
> ---

I am not convinced this improves readability much; I would clean up the
function a bit, but keep it.

>  hw/misc/ivshmem.c | 37 +++--
>  1 file changed, 11 insertions(+), 26 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 0440bca..785ed1c 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -342,29 +342,6 @@ static void watch_vector_notifier(IVShmemState *s, 
> EventNotifier *n,
>  NULL, &s->msi_vectors[vector]);
>  }
>
> -static int check_shm_size(IVShmemState *s, int fd, Error **errp)
> -{
> -/* check that the guest isn't going to try and map more memory than the
> - * the object has allocated return -1 to indicate error */
> -
> -struct stat buf;
> -
> -if (fstat(fd, &buf) < 0) {
> -error_setg(errp, "exiting: fstat on fd %d failed: %s",
> -   fd, strerror(errno));
> -return -1;
> -}
> -
> -if (s->ivshmem_size > buf.st_size) {
> -error_setg(errp, "Requested memory size greater"
> -   " than shared object size (%zu > %" PRIu64")",
> -   s->ivshmem_size, (uint64_t)buf.st_size);
> -return -1;
> -} else {
> -return 0;
> -}
> -}
> -
>  static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i)
>  {
>  memory_region_add_eventfd(&s->ivshmem_mmio,
> @@ -479,7 +456,7 @@ static void setup_interrupt(IVShmemState *s, int vector, 
> Error **errp)
>
>  static void process_msg_shmem(IVShmemState *s, int fd, Error **errp)
>  {
> -Error *err = NULL;
> +struct stat buf;
>  void *ptr;
>
>  if (s->ivshmem_bar2) {
> @@ -488,8 +465,16 @@ static void process_msg_shmem(IVShmemState *s, int fd, 
> Error **errp)
>  return;
>  }
>
> -if (check_shm_size(s, fd, &err) == -1) {
> -error_propagate(errp, err);
> +if (fstat(fd, &buf) < 0) {
> +error_setg_errno(errp, errno,
> +"can't determine size of shared memory sent by server");
> +close(fd);
> +return;
> +}
> +
> +if (s->ivshmem_size > buf.st_size) {
> +error_setg(errp, "server sent only %zd bytes of shared memory",
> +   (size_t)buf.st_size);
>  close(fd);
>  return;
>  }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 28/38] ivshmem: Tighten check of property "size"

2016-03-02 Thread Marc-André Lureau

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> If size_t is narrower than 64 bits, passing uint64_t ivshmem_size to
> mmap() truncates.  Reject such sizes.
>
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Marc-André Lureau 
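
For the record, the tightened check as I understand it, written as a standalone sketch. The helper names are made up for this mail; the logic is meant to mirror the conditions in the hunk below minus the trailing-character test: non-negative, representable as size_t, and a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if exactly one bit is set; zero is not a power of two. */
static bool is_power_of_2_u64(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Accept only sizes that survive being passed to mmap() as a size_t. */
static bool size_is_acceptable(int64_t size)
{
    if (size < 0) {
        return false;
    }
    if ((uint64_t)size > SIZE_MAX) {
        return false;   /* would truncate where size_t is under 64 bits */
    }
    return is_power_of_2_u64((uint64_t)size);
}
```

On a 64-bit host the SIZE_MAX comparison can never fail, so the new condition only changes behaviour on 32-bit hosts.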


>  hw/misc/ivshmem.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index fb8a4f7..8d54fa9 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -87,7 +87,7 @@ typedef struct IVShmemState {
>   */
>  MemoryRegion bar;
>  MemoryRegion ivshmem;
> -uint64_t ivshmem_size; /* size of shared memory region */
> +size_t ivshmem_size; /* size of shared memory region */
>  uint32_t ivshmem_64bit;
>
>  Peer *peers;
> @@ -361,7 +361,7 @@ static int check_shm_size(IVShmemState *s, int fd, Error 
> **errp)
>
>  if (s->ivshmem_size > buf.st_size) {
>  error_setg(errp, "Requested memory size greater"
> -   " than shared object size (%" PRIu64 " > %" PRIu64")",
> +   " than shared object size (%zu > %" PRIu64")",
> s->ivshmem_size, (uint64_t)buf.st_size);
>  return -1;
>  } else {
> @@ -861,7 +861,8 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  } else {
>  char *end;
>  int64_t size = qemu_strtosz(s->sizearg, &end);
> -if (size < 0 || *end != '\0' || !is_power_of_2(size)) {
> +if (size < 0 || (size_t)size != size || *end != '\0'
> +|| !is_power_of_2(size)) {
>  error_setg(errp, "Invalid size %s", s->sizearg);
>  return;
>  }
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 27/38] ivshmem: Simplify how we cope with short reads from server

2016-03-02 Thread Marc-André Lureau

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> Short reads from a UNIX domain socket are exceedingly unlikely when
> the other side always sends eight bytes and we always read eight
> bytes.  We cope with them anyway.  However, the code doing that is
> rather convoluted.  Dumb it down radically.
>
> Replace the convoluted code

agreed

>
> Signed-off-by: Markus Armbruster 
> ---
>  hw/misc/ivshmem.c | 76 
> ---
>  1 file changed, 16 insertions(+), 60 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index e578b8a..fb8a4f7 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -26,7 +26,6 @@
>  #include "migration/migration.h"
>  #include "qemu/error-report.h"
>  #include "qemu/event_notifier.h"
> -#include "qemu/fifo8.h"
>  #include "sysemu/char.h"
>  #include "sysemu/hostmem.h"
>  #include "qapi/visitor.h"
> @@ -80,7 +79,6 @@ typedef struct IVShmemState {
>  uint32_t intrstatus;
>
>  CharDriverState *server_chr;
> -Fifo8 incoming_fifo;
>  MemoryRegion ivshmem_mmio;
>
>  /* We might need to register the BAR before we actually have the memory.
> @@ -99,6 +97,8 @@ typedef struct IVShmemState {
>  uint32_t vectors;
>  uint32_t features;
>  MSIVector *msi_vectors;
> +uint64_t msg_buf;   /* buffer for receiving server messages */
> +int msg_buffered_bytes; /* #bytes in @msg_buf */
>
>  Error *migration_blocker;
>
> @@ -255,11 +255,6 @@ static const MemoryRegionOps ivshmem_mmio_ops = {
>  },
>  };
>
> -static int ivshmem_can_receive(void * opaque)
> -{
> -return sizeof(int64_t);
> -}
> -
>  static void ivshmem_vector_notify(void *opaque)
>  {
>  MSIVector *entry = opaque;
> @@ -459,53 +454,6 @@ static void resize_peers(IVShmemState *s, int nb_peers)
>  }
>  }
>
> -static bool fifo_update_and_get(IVShmemState *s, const uint8_t *buf, int 
> size,
> -void *data, size_t len)
> -{
> -const uint8_t *p;
> -uint32_t num;
> -
> -assert(len <= sizeof(int64_t)); /* limitation of the fifo */
> -if (fifo8_is_empty(&s->incoming_fifo) && size == len) {
> -memcpy(data, buf, size);
> -return true;
> -}
> -
> -IVSHMEM_DPRINTF("short read of %d bytes\n", size);
> -
> -num = MIN(size, sizeof(int64_t) - fifo8_num_used(&s->incoming_fifo));
> -fifo8_push_all(&s->incoming_fifo, buf, num);
> -
> -if (fifo8_num_used(&s->incoming_fifo) < len) {
> -assert(num == 0);
> -return false;
> -}
> -
> -size -= num;
> -buf += num;
> -p = fifo8_pop_buf(&s->incoming_fifo, len, &num);
> -assert(num == len);
> -
> -memcpy(data, p, len);
> -
> -if (size > 0) {
> -fifo8_push_all(&s->incoming_fifo, buf, size);
> -}
> -
> -return true;
> -}
> -
> -static bool fifo_update_and_get_i64(IVShmemState *s,
> -const uint8_t *buf, int size, int64_t 
> *i64)
> -{
> -if (fifo_update_and_get(s, buf, size, i64, sizeof(*i64))) {
> -*i64 = GINT64_FROM_LE(*i64);
> -return true;
> -}
> -
> -return false;
> -}
> -
>  static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
>   Error **errp)
>  {
> @@ -658,6 +606,14 @@ static void process_msg(IVShmemState *s, int64_t msg, 
> int fd, Error **errp)
>  }
>  }
>
> +static int ivshmem_can_receive(void *opaque)
> +{
> +IVShmemState *s = opaque;
> +
> +assert(s->msg_buffered_bytes < sizeof(s->msg_buf));
> +return sizeof(s->msg_buf) - s->msg_buffered_bytes;
> +}
> +
>  static void ivshmem_read(void *opaque, const uint8_t *buf, int size)
>  {
>  IVShmemState *s = opaque;
> @@ -665,8 +621,12 @@ static void ivshmem_read(void *opaque, const uint8_t 
> *buf, int size)
>  int incoming_fd;
>  int64_t incoming_posn;
>
> -if (!fifo_update_and_get_i64(s, buf, size, &incoming_posn)) {
> -return;
> +assert(size >= 0 && s->msg_buffered_bytes + size <= sizeof(s->msg_buf));
> +memcpy((unsigned char *)&s->msg_buf + s->msg_buffered_bytes, buf, size);
> +s->msg_buffered_bytes += size;
> +if (s->msg_buffered_bytes == sizeof(s->msg_buf)) {
> +incoming_posn = le64_to_cpu(s->msg_buf);
> +s->msg_buffered_bytes = 0;
>  }
>

missing "else return" though.
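
To illustrate, here is a minimal standalone sketch of the buffering with the early return in place. It is not the ivshmem code: MsgBuf, msgbuf_can_receive() and msgbuf_feed() are hypothetical names, and the byte-wise little-endian decode stands in for le64_to_cpu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two fields the patch adds to IVShmemState. */
typedef struct MsgBuf {
    unsigned char bytes[sizeof(int64_t)]; /* partially received message */
    size_t buffered;                      /* number of valid bytes in @bytes */
} MsgBuf;

/* How many more bytes the buffer can take, like ivshmem_can_receive(). */
static size_t msgbuf_can_receive(const MsgBuf *b)
{
    return sizeof(b->bytes) - b->buffered;
}

/*
 * Append @size bytes to the buffer.
 * Returns true and stores the decoded little-endian value in @msg once a
 * full message has arrived; returns false while it is still incomplete.
 */
static bool msgbuf_feed(MsgBuf *b, const uint8_t *buf, size_t size,
                        int64_t *msg)
{
    uint64_t v = 0;
    size_t i;

    assert(size <= msgbuf_can_receive(b));
    memcpy(b->bytes + b->buffered, buf, size);
    b->buffered += size;
    if (b->buffered < sizeof(b->bytes)) {
        return false;   /* the early return: do not process a partial message */
    }

    for (i = 0; i < sizeof(b->bytes); i++) {
        v |= (uint64_t)b->bytes[i] << (8 * i);
    }
    *msg = (int64_t)v;
    b->buffered = 0;
    return true;
}
```

Without the early return, the caller would go on to process a message value that was never assigned whenever fewer than eight bytes have arrived.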

>  incoming_fd = qemu_chr_fe_get_msgfd(s->server_chr);
> @@ -1019,8 +979,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  }
>  }
>
> -fifo8_create(&s->incoming_fifo, sizeof(int64_t));
> -
>  if (s->role_val == IVSHMEM_PEER) {
>  error_setg(&s->migration_blocker,
> "Migration is disabled when using feature 'peer mode' in 
> device 'ivshmem'");
> @@ -1033,8 +991,6 @@ static void pci_ivshmem_exit(PCIDevice *dev)
>  IVShmemState *s = IVSHMEM(dev);
>  int i;
>
> -fifo8_destroy(&s->incoming_fifo);
> -
>  if (s->migration_blocker) {
>

Re: [Qemu-devel] [PATCH 26/38] ivshmem: Drop the hackish test for UNIX domain chardev

2016-03-02 Thread Marc-André Lureau

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> The chardev must be capable of transmitting SCM_RIGHTS ancillary
> messages.  We check it by comparing CharDriverState member filename to
> "unix:".  That's almost as brittle as it is disgusting.
>
> When the actual transmission all happened asynchronously, this check
> was all we could do in realize(), and thus better than nothing.  But
> now we receive at least one SCM_RIGHTS synchronously in realize(),
> it's not worth its keep anymore.  Drop it.
>
> Signed-off-by: Markus Armbruster 
> ---

Didn't look that horrible to me, and could be actually more useful
than a later error. But I don't think this is an issue, so why not
drop a few lines..

Reviewed-by: Marc-André Lureau 


>  hw/misc/ivshmem.c | 9 -
>  1 file changed, 9 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 8f976ca..e578b8a 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -961,15 +961,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>  memory_region_add_subregion(&s->bar, 0, mr);
>  pci_register_bar(PCI_DEVICE(s), 2, attr, &s->bar);
>  } else if (s->server_chr != NULL) {
> -/* FIXME do not rely on what chr drivers put into filename */
> -if (strncmp(s->server_chr->filename, "unix:", 5)) {
> -error_setg(errp, "chardev is not a unix client socket");
> -return;
> -}
> -
> -/* if we get a UNIX socket as the parameter we will talk
> - * to the ivshmem server to receive the memory region */
> -
>  IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n",
>  s->server_chr->filename);
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH 25/38] ivshmem: Rely on server sending the ID right after the version

2016-03-02 Thread Marc-André Lureau

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> The protocol specification (ivshmem-spec.txt, formerly
> ivshmem_device_spec.txt) has always required the ID message to be sent
> right at the beginning, and ivshmem-server has always complied.  The
> device, however, accepts it out of order.  If an interrupt setup
> arrived before it, though, it would be misinterpreted as connect
> notification.  Fix the latent bug by relying on the spec and
> ivshmem-server's actual behavior.
>
> Signed-off-by: Markus Armbruster 
> ---

Reviewed-by: Marc-André Lureau 


>  hw/misc/ivshmem.c | 27 ---
>  1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 831da53..8f976ca 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -653,8 +653,6 @@ static void process_msg(IVShmemState *s, int64_t msg, int 
> fd, Error **errp)
>
>  if (fd >= 0) {
>  process_msg_connect(s, msg, fd, errp);
> -} else if (s->vm_id == -1) {
> -s->vm_id = msg;
>  } else {
>  process_msg_disconnect(s, msg, errp);
>  }
> @@ -723,6 +721,30 @@ static void ivshmem_recv_setup(IVShmemState *s, Error 
> **errp)
>  }
>
>  /*
> + * ivshmem-server sends the remaining initial messages in a fixed
> + * order, but the device has always accepted them in any order.
> + * Stay as compatible as practical, just in case people use
> + * servers that behave differently.
> + */
> +
> +/*
> + * ivshmem_device_spec.txt has always required the ID message
> + * right here, and ivshmem-server has always complied.  However,
> + * older versions of the device accepted it out of order, but
> + * broke when an interrupt setup message arrived before it.
> + */
> +msg = ivshmem_recv_msg(s, &fd, &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +if (fd != -1 || msg < 0 || msg > IVSHMEM_MAX_PEERS) {
> +error_setg(errp, "server sent invalid ID message");
> +return;
> +}
> +s->vm_id = msg;
> +
> +/*
>   * Receive more messages until we got shared memory.
>   */
>  do {
> @@ -953,7 +975,6 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error 
> **errp)
>
>  /* we allocate enough space for 16 peers and grow as needed */
>  resize_peers(s, 16);
> -s->vm_id = -1;
>
>  pci_register_bar(dev, 2, attr, &s->bar);
>
> --
> 2.4.3
>
>



-- 
Marc-André Lureau

Re: [Qemu-devel] [PATCH v2 09/19] qapi: Drop useless 'data' member of unions

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> Now that we no longer have any clients of the 'void *data'
> member injected into unions, we can drop it.  Update the
> testsuite to drop the negative test union-clash-data, and
> replace it with a positive test in qapi-schema-test that
> proves that we no longer have a name collision.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Daniel P. Berrange 
>
> ---
> v2: add R-b
> v1: drop patch that forced :empty as base to all structs
> Previously posted as part of qapi cleanup subset F:
> v6: rebase to earlier changes
> ---
>  scripts/qapi-types.py   | 9 -
>  tests/Makefile  | 1 -
>  tests/qapi-schema/qapi-schema-test.json | 2 +-
>  tests/qapi-schema/qapi-schema-test.out  | 4 ++--
>  tests/qapi-schema/union-clash-data.err  | 0
>  tests/qapi-schema/union-clash-data.exit | 1 -
>  tests/qapi-schema/union-clash-data.json | 7 ---
>  tests/qapi-schema/union-clash-data.out  | 9 -
>  8 files changed, 3 insertions(+), 30 deletions(-)
>  delete mode 100644 tests/qapi-schema/union-clash-data.err
>  delete mode 100644 tests/qapi-schema/union-clash-data.exit
>  delete mode 100644 tests/qapi-schema/union-clash-data.json
>  delete mode 100644 tests/qapi-schema/union-clash-data.out
>
> diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
> index 19d1fff..0306a88 100644
> --- a/scripts/qapi-types.py
> +++ b/scripts/qapi-types.py
> @@ -116,17 +116,8 @@ static inline %(base)s *qapi_%(c_name)s_base(const 
> %(c_name)s *obj)
>
>
>  def gen_variants(variants):
> -# FIXME: What purpose does data serve, besides preventing a union that
> -# has a branch named 'data'? We use it in qapi-visit.py to decide
> -# whether to bypass the switch statement if visiting the discriminator
> -# failed; but since we 0-initialize structs, and cannot tell what
> -# branch of the union is in use if the discriminator is invalid, there
> -# should not be any data leaks even without a data pointer.  Or, if
> -# 'data' is merely added to guarantee we don't have an empty union,
> -# shouldn't we enforce that at .json parse time?

I figure this comment became stale in commit 544a373.  Mention in commit
message?

>  ret = mcgen('''
>  union { /* union tag is @%(c_name)s */
> -void *data;
>  ''',
>  c_name=c_name(variants.tag_member.name))
>
> diff --git a/tests/Makefile b/tests/Makefile
> index 04e34b5..cd4bbd4 100644
> --- a/tests/Makefile
> +++ b/tests/Makefile
> @@ -358,7 +358,6 @@ qapi-schema += unicode-str.json
>  qapi-schema += union-base-no-discriminator.json
>  qapi-schema += union-branch-case.json
>  qapi-schema += union-clash-branches.json
> -qapi-schema += union-clash-data.json
>  qapi-schema += union-empty.json
>  qapi-schema += union-invalid-base.json
>  qapi-schema += union-optional-branch.json
> diff --git a/tests/qapi-schema/qapi-schema-test.json 
> b/tests/qapi-schema/qapi-schema-test.json
> index 632964a..b5d0c53 100644
> --- a/tests/qapi-schema/qapi-schema-test.json
> +++ b/tests/qapi-schema/qapi-schema-test.json
> @@ -115,7 +115,7 @@
>  'number': ['number'],
>  'boolean': ['bool'],
>  'string': ['str'],
> -'sizes': ['size'],
> +'data': ['size'],
>  'any': ['any'] } }
>

Replaces the natural name for the size array by an arbitrary one just to
show the patch works.  Next to no value going forward, since we're no
more likely to introduce a 'data' clash than one for any number of other
names.

In short, I wouldn't bother :)

>  # testing commands
> diff --git a/tests/qapi-schema/qapi-schema-test.out 
> b/tests/qapi-schema/qapi-schema-test.out
> index f5e2a73..225e2db 100644
> --- a/tests/qapi-schema/qapi-schema-test.out
> +++ b/tests/qapi-schema/qapi-schema-test.out
> @@ -139,9 +139,9 @@ object UserDefNativeListUnion
>  case number: :obj-numberList-wrapper
>  case boolean: :obj-boolList-wrapper
>  case string: :obj-strList-wrapper
> -case sizes: :obj-sizeList-wrapper
> +case data: :obj-sizeList-wrapper
>  case any: :obj-anyList-wrapper
> -enum UserDefNativeListUnionKind ['integer', 's8', 's16', 's32', 's64', 'u8', 
> 'u16', 'u32', 'u64', 'number', 'boolean', 'string', 'sizes', 'any']
> +enum UserDefNativeListUnionKind ['integer', 's8', 's16', 's32', 's64', 'u8', 
> 'u16', 'u32', 'u64', 'number', 'boolean', 'string', 'data', 'any']
>  object UserDefOne
>  base UserDefZero
>  member string: str optional=False
> diff --git a/tests/qapi-schema/union-clash-data.err 
> b/tests/qapi-schema/union-clash-data.err
> deleted file mode 100644
> index e69de29..000
> diff --git a/tests/qapi-schema/union-clash-data.exit 
> b/tests/qapi-schema/union-clash-data.exit
> deleted file mode 100644
> index 573541a..000
> --- a/tests/qapi-schema/union-clash-data.exit
> +++ /dev/null
> @@ -1 +0,0 @@
> -0
> diff --git

Re: [Qemu-devel] [PATCH] target-i386: fix addr16 prefix

2016-03-02 Thread Richard Henderson

On 03/02/2016 07:04 AM, Paolo Bonzini wrote:
> While ADDSEG will only be false in 16-bit mode for LEA, it can be
> false even in other cases when 16-bit addresses are obtained via
> the 67h prefix in 32-bit mode.  In this case, gen_lea_v_seg forgets
> to add a nonzero FS or GS base if CS/DS/ES/SS are all zero.  This
> case is pretty rare but happens when booting Windows 95/98, and
> this patch fixes it.
> 
> The bug is visible since commit d6a291498, but it was introduced
> together with gen_lea_v_seg and it probably could be reproduced
> with a "addr16 gs movsb" instruction as early as in commit
> ca2f29f555805d07fb0b9ebfbbfc4e3656530977.
> 
> Cc: r...@twiddle.net
> Reported-by: Hervé Poussineau 
> Signed-off-by: Paolo Bonzini 
> ---
>  target-i386/translate.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: Richard Henderson 

It doesn't even seem to be uncommon inside the win98 kernel, once you start
looking for that addr16 gs pattern.

Thanks,


r~

Re: [Qemu-devel] [PATCH 24/38] ivshmem: Propagate errors through ivshmem_recv_setup()

2016-03-02 Thread Marc-André Lureau

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> This kills off the funny state described in the previous commit.
>
> Simplify ivshmem_io_read() accordingly, and update documentation.
>
> Signed-off-by: Markus Armbruster 
> ---
>  docs/specs/ivshmem-spec.txt |  20 
>  hw/misc/ivshmem.c   | 121 
> +++-
>  qemu-doc.texi   |   9 +---
>  3 files changed, 87 insertions(+), 63 deletions(-)
>
> diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt
> index 4fc6f37..3eb8c97 100644
> --- a/docs/specs/ivshmem-spec.txt
> +++ b/docs/specs/ivshmem-spec.txt
> @@ -62,11 +62,11 @@ There are two ways to use this device:
>likely want to write a kernel driver to handle interrupts.  Requires
>the device to be configured for interrupts, obviously.
>
> -If the device is configured for interrupts, BAR2 is initially invalid.
> -It becomes safely accessible only after the ivshmem server provided
> -the shared memory.  Guest software should wait for the IVPosition
> -register (described below) to become non-negative before accessing
> -BAR2.
> +Before QEMU 2.6.0, BAR2 can initially be invalid if the device is
> +configured for interrupts.  It becomes safely accessible only after
> +the ivshmem server provided the shared memory.  Guest software should
> +wait for the IVPosition register (described below) to become
> +non-negative before accessing BAR2.
>
>  The device is not capable to tell guest software whether it is
>  configured for interrupts.
> @@ -82,7 +82,7 @@ BAR 0 contains the following registers:
>  4 4   read/write0   Interrupt Status
>  bit 0: peer interrupt
>  bit 1..31: reserved
> -8 4   read-only   0 or -1   IVPosition
> +8 4   read-only   0 or ID   IVPosition
> 12 4   write-only  N/A   Doorbell
>  bit 0..15: vector
>  bit 16..31: peer ID
> @@ -100,12 +100,14 @@ when an interrupt request from a peer is received.  
> Reading the
>  register clears it.
>
>  IVPosition Register: if the device is not configured for interrupts,
> -this is zero.  Else, it's -1 for a short while after reset, then
> -changes to the device's ID (between 0 and 65535).
> +this is zero.  Else, it is the device's ID (between 0 and 65535).
> +
> +Before QEMU 2.6.0, the register may read -1 for a short while after
> +reset.
>
>  There is no good way for software to find out whether the device is
>  configured for interrupts.  A positive IVPosition means interrupts,
> -but zero could be either.  The initial -1 cannot be reliably observed.
> +but zero could be either.
>
>  Doorbell Register: writing this register requests to interrupt a peer.
>  The written value's high 16 bits are the ID of the peer to interrupt,
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index 352937f..831da53 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -234,12 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, hwaddr 
> addr,
>  break;
>
>  case IVPOSITION:
> -/* return my VM ID if the memory is mapped */
> -if (memory_region_is_mapped(&s->ivshmem)) {
> -ret = s->vm_id;
> -} else {
> -ret = -1;
> -}
> +ret = s->vm_id;
>  break;
>
>  default:
> @@ -511,7 +506,8 @@ static bool fifo_update_and_get_i64(IVShmemState *s,
>  return false;
>  }
>
> -static int ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector)
> +static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
> + Error **errp)

I prefer to return -1 in case of error, even if Error** is also returned.
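
A minimal sketch of what I mean, using a toy stand-in for QEMU's Error API so that it compiles on its own. Error, toy_error_setg(), add_route() and setup_interrupt_sketch() are hypothetical names for this example, not the real functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for QEMU's Error object (hypothetical, for illustration). */
typedef struct Error {
    char msg[64];
} Error;

/* Toy counterpart of error_setg(): store a message if the caller wants it. */
static void toy_error_setg(Error **errp, const char *msg)
{
    if (errp) {
        *errp = calloc(1, sizeof(**errp));
        if (*errp) {
            snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
        }
    }
}

/* Returns 0 on success, -1 on failure with *errp set. */
static int add_route(int vector, Error **errp)
{
    if (vector < 0) {
        toy_error_setg(errp, "invalid vector");
        return -1;
    }
    return 0;
}

/* Caller: tests the return value, no local Error or propagate step needed. */
static int setup_interrupt_sketch(int vector, Error **errp)
{
    if (add_route(vector, errp) < 0) {
        return -1;
    }
    /* ... continue setting up the vector ... */
    return 0;
}
```

With a void function, the caller has to declare a local Error, pass its address, check it, and propagate it; the return value makes that bookkeeping unnecessary, and it still works when the caller passes NULL for errp.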

>  {
>  PCIDevice *pdev = PCI_DEVICE(s);
>  MSIMessage msg = msix_get_message(pdev, vector);
> @@ -522,22 +518,21 @@ static int ivshmem_add_kvm_msi_virq(IVShmemState *s, 
> int vector)
>
>  ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
>  if (ret < 0) {
> -error_report("ivshmem: kvm_irqchip_add_msi_route failed");
> -return -1;
> +error_setg(errp, "kvm_irqchip_add_msi_route failed");
> +return;
>  }
>
>  s->msi_vectors[vector].virq = ret;
>  s->msi_vectors[vector].pdev = pdev;
> -
> -return 0;
>  }
>
> -static void setup_interrupt(IVShmemState *s, int vector)
> +static void setup_interrupt(IVShmemState *s, int vector, Error **errp)
>  {
>  EventNotifier *n = &s->peers[s->vm_id].eventfds[vector];
>  bool with_irqfd = kvm_msi_via_irqfd_enabled() &&
>  ivshmem_has_feature(s, IVSHMEM_MSI);
>  PCIDevice *pdev = PCI_DEVICE(s);
> +Error *err = NULL;
>
>  IVSHMEM_DPRINTF("setting up interrupt for vector: %d\n", vector);
>
> @@ -546,13 +541,16 @@ static void

Re: [Qemu-devel] [PATCH v2 07/19] qapi: Avoid use of 'data' member of qapi unions

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> qapi code generators currently create a 'void *data' member as

QAPI

> part of the anonymous union embedded in the C struct corresponding
> to a qapi union.  However, directly assigning to this member of

QAPI

> the union feels a bit fishy, when we can directly use the rest

Suggest to drop "directly", or perhaps say "when we can assign to
another member".

> of the struct instead.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Daniel P. Berrange 
> ---
> v2: add R-b
> v1: no change
> Previously posted as part of qapi cleanup series F:
> v6: rebase to latest
> ---
>  blockdev.c | 31 +--
>  ui/input.c |  2 +-
>  2 files changed, 18 insertions(+), 15 deletions(-)
>
> diff --git a/blockdev.c b/blockdev.c
> index d4bc435..0f20c65 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1202,15 +1202,11 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
>  }
>  }
>
> -static void blockdev_do_action(TransactionActionKind type, void *data,
> -   Error **errp)
> +static void blockdev_do_action(TransactionAction *action, Error **errp)
>  {
> -TransactionAction action;
>  TransactionActionList list;
>
> -action.type = type;
> -action.u.data = data;
> -list.value = &action;
> +list.value = action;
>  list.next = NULL;
>  qmp_transaction(&list, false, NULL, errp);
>  }

Here, you avoid use of data by assigning the whole struct instead of its
members.  

> @@ -1236,8 +1232,11 @@ void qmp_blockdev_snapshot_sync(bool has_device, const 
> char *device,
>  .has_mode = has_mode,
>  .mode = mode,
>  };
> -blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC,
> -   , errp);
> +TransactionAction action = {
> +.type = TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_SYNC,
> +.u.blockdev_snapshot_sync = ,
> +};
> +blockdev_do_action(&action, errp);
>  }
>

However, the call sites become wordier.  I guess avoiding type-punning
is worth a bit of verbosity.
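
For illustration, a standalone sketch of the typed style with a toy tagged union. Action, SnapSync, Snap and the helper functions are made up for this example and are not the generated QAPI types.

```c
#include <stddef.h>

typedef struct SnapSync { const char *device; } SnapSync;
typedef struct Snap { const char *node; } Snap;

typedef enum ActionKind { ACTION_KIND_SNAP_SYNC, ACTION_KIND_SNAP } ActionKind;

typedef struct Action {
    ActionKind type;
    union {
        SnapSync *snap_sync;
        Snap *snap;
    } u;
} Action;

/* Typed style: the compiler checks that the pointer matches the member. */
static Action make_snap_sync(SnapSync *data)
{
    Action action = {
        .type = ACTION_KIND_SNAP_SYNC,
        .u.snap_sync = data,
    };
    return action;
}

/* The consumer dispatches on the tag and reads the matching member. */
static const char *action_target(const Action *action)
{
    switch (action->type) {
    case ACTION_KIND_SNAP_SYNC:
        return action->u.snap_sync->device;
    case ACTION_KIND_SNAP:
        return action->u.snap->node;
    }
    return NULL;
}
```

With a void * member, handing in a pointer to the wrong struct compiles silently. With the typed member, assigning a Snap * to .u.snap_sync draws an incompatible-pointer diagnostic.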

>  void qmp_blockdev_snapshot(const char *node, const char *overlay,
> @@ -1247,9 +1246,11 @@ void qmp_blockdev_snapshot(const char *node, const 
> char *overlay,
>  .node = (char *) node,
>  .overlay = (char *) overlay
>  };
> -
> -blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT,
> -   _data, errp);
> +TransactionAction action = {
> +.type = TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT,
> +.u.blockdev_snapshot = _data,
> +};
> +blockdev_do_action(&action, errp);
>  }
>
>  void qmp_blockdev_snapshot_internal_sync(const char *device,
> @@ -1260,9 +1261,11 @@ void qmp_blockdev_snapshot_internal_sync(const char 
> *device,
>  .device = (char *) device,
>  .name = (char *) name
>  };
> -
> -
> blockdev_do_action(TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC,
> -   , errp);
> +TransactionAction action = {
> +.type = TRANSACTION_ACTION_KIND_BLOCKDEV_SNAPSHOT_INTERNAL_SYNC,
> +.u.blockdev_snapshot_internal_sync = ,
> +};
> +blockdev_do_action(&action, errp);
>  }
>
>  SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device,
> diff --git a/ui/input.c b/ui/input.c
> index e15c618..1e81c25 100644
> --- a/ui/input.c
> +++ b/ui/input.c
> @@ -472,7 +472,7 @@ InputEvent *qemu_input_event_new_move(InputEventKind kind,
>  InputMoveEvent *move = g_new0(InputMoveEvent, 1);
>
>  evt->type = kind;
> -evt->u.data = move;
> +evt->u.rel = move; /* also would work as evt->u.abs */
>  move->axis = axis;
>  move->value = value;
>  return evt;

Suggest to say /* evt->u.rel is the same as evt->u.abs */

Can't think of a way to build-assert that.
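
One tentative possibility, not checked against QEMU's supported compilers: C11 _Generic combined with the GNU __typeof__ extension can compare the types of two union members at build time. The sketch below uses toy types; MoveEvt, Evt and SAME_MEMBER_TYPE() are hypothetical names, not anything in the tree.

```c
typedef struct MoveEvt { int axis; long value; } MoveEvt;

typedef struct Evt {
    int type;
    union {
        MoveEvt *rel;
        MoveEvt *abs;
    } u;
} Evt;

/*
 * Evaluates to 1 if members @a and @b of union u have compatible types,
 * else 0.  The operand of _Generic is not evaluated, so the null pointer
 * is never dereferenced.  __typeof__ is a GNU extension (GCC, Clang).
 */
#define SAME_MEMBER_TYPE(type, a, b) \
    _Generic(((type *)0)->u.a, __typeof__(((type *)0)->u.b): 1, default: 0)

/* Fails the build if someone changes the type of only one of the members. */
_Static_assert(SAME_MEMBER_TYPE(Evt, rel, abs),
               "Evt.u.rel and Evt.u.abs must have the same type");

/* The same result as a constant, handy for inspection in a test. */
enum { EVT_REL_ABS_SAME_TYPE = SAME_MEMBER_TYPE(Evt, rel, abs) };
```

This only checks that the two members have the same type; that they share storage already follows from their being members of the same union.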

Re: [Qemu-devel] [PATCH 23/38] ivshmem: Receive shared memory synchronously in realize()

2016-03-02 Thread Marc-André Lureau

Hi

On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster  wrote:
> When configured for interrupts (property "chardev" given), we receive
> the shared memory from an ivshmem server.  We do so asynchronously
> after realize() completes, by setting up callbacks with
> qemu_chr_add_handlers().
>
> Keeping server I/O out of realize() that way avoids delays due to a
> slow server.  This is probably relevant only for hot plug.
>
> However, this funny "no shared memory, yet" state of the device also
> causes a raft of issues that are hard or impossible to work around:
>
> * The guest is exposed to this state: when we enter and leave it its
>   shared memory contents is abruptly replaced, and device register
>   IVPosition changes.
>
>   This is a known issue.  We document that guests should not access
>   the shared memory after device initialization until the IVPosition
>   register becomes non-negative.
>
>   For cold plug, the funny state is unlikely to be visible in
>   practice, because we normally receive the shared memory long before
>   the guest gets around to mess with the device.
>
>   For hot plug, the timing is tighter, but the relative slowness of
>   PCI device configuration has a good chance to hide the funny state.
>
>   In either case, guests complying with the documented procedure are
>   safe.
>
> * Migration becomes racy.
>
>   If migration completes before the shared memory setup completes on
>   the source, shared memory contents is silently lost.  Fortunately,
>   migration is rather unlikely to win this race.
>
>   If the shared memory's ramblock arrives at the destination before
>   shared memory setup completes, migration fails.
>
>   There is no known way for a management application to wait for
>   shared memory setup to complete.
>
>   All you can do is retry failed migration.  You can improve your
>   chances by leaving more time between running the destination QEMU
>   and the migrate command.
>
>   To mitigate silent memory loss, you need to ensure the server
>   initializes shared memory exactly the same on source and
>   destination.
>
>   These issues are entirely undocumented so far.
>
> I'd expect the server to be almost always fast enough to hide these
> issues.  But then rare catastrophic races are in a way the worst kind.
>
> This is way more trouble than I'm willing to take from any device.
> Kill the funny state by receiving shared memory synchronously in
> realize().  If your hot plug hangs, go kill your ivshmem server.
>
> For easier review, this commit only makes the receive synchronous, it
> doesn't add the necessary error propagation.  Without that, the funny
> state persists.  The next commit will do that, and kill it off for
> real.
>
> Signed-off-by: Markus Armbruster 
> ---
>  hw/misc/ivshmem.c| 70 
> +---
>  tests/ivshmem-test.c | 26 ++-
>  2 files changed, 57 insertions(+), 39 deletions(-)
>
> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
> index c366087..352937f 100644
> --- a/hw/misc/ivshmem.c
> +++ b/hw/misc/ivshmem.c
> @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t 
> *buf, int size)
>  process_msg(s, incoming_posn, incoming_fd);
>  }
>
> -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int 
> size)
> +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd)
>  {
> -IVShmemState *s = opaque;
> -int tmp;
> -int64_t version;
> +int64_t msg;
> +int n, ret;
>
> -if (!fifo_update_and_get_i64(s, buf, size, &version)) {
> -return;
> -}
> +n = 0;
> +do {
> +ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n,
> + sizeof(msg) - n);
> +if (ret < 0 && ret != -EINTR) {
> +/* TODO error handling */
> +return INT64_MIN;
> +}
> +n += ret;
> +} while (n < sizeof(msg));
>
> -tmp = qemu_chr_fe_get_msgfd(s->server_chr);
> -if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) {
> +*pfd = qemu_chr_fe_get_msgfd(s->server_chr);
> +return msg;
> +}
> +
> +static void ivshmem_recv_setup(IVShmemState *s)
> +{
> +int64_t msg;
> +int fd;
> +
> +msg = ivshmem_recv_msg(s, &fd);
> +if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) {
>  fprintf(stderr, "incompatible version, you are connecting to a 
> ivshmem-"
>  "server using a different protocol please check your 
> setup\n");
> -qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s);
>  return;
>  }
>
> -IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n");
> -qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
> -  NULL, s);
> +/*
> + * Receive more messages until we got shared memory.
> + */
> +do {
> +msg = ivshmem_recv_msg(s, &fd);
> +process_msg(s, msg, fd);
> +

Re: [Qemu-devel] [PATCH v2 02/10] ipmi: replace IPMI_ADD_RSP_DATA() macro with inline helpers

2016-03-02 Thread Cédric Le Goater

On 03/02/2016 07:02 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 02, 2016 at 06:53:08PM +0100, Cédric Le Goater wrote:
>>> typedef struct RspBuffer RspBuffer;
>>
>> OK. So that's the rule for structs in qemu. It is not that clear
>> when you look at the code around. I will change. np.
> 
> Did you look at CODING_STYLE? Pls do.

This is clear. Thanks.

C.

[Qemu-devel] [PATCH] target-arm: Fix translation level on early translation faults

2016-03-02 Thread Sergey Sorokin

Qemu reports translation fault on 1st level instead of 0th level in case of
AArch64 address translation if the translation table walk is disabled or
the address is in the gap between the two regions.

Signed-off-by: Sergey Sorokin 
---
 target-arm/helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 18c8296..09f920c 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -7238,6 +7238,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
target_ulong address,
  * support for those page table walks.
  */
 if (arm_el_is_aa64(env, el)) {
+level = 0;
 va_size = 64;
 if (el > 1) {
 if (mmu_idx != ARMMMUIdx_S2NS) {
-- 
1.9.3

Re: [Qemu-devel] [PATCH v2 05/19] util: Shorten references into SocketAddress

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> An upcoming patch will alter how simple unions, like SocketAddress,
> are laid out, which will impact all lines of the form 'addr->u.XXX'.
> To minimize the impact of that patch, use C99 initialization or a
> temporary variable to reduce the number of lines needing modification
> when an internal reference within SocketAddress changes layout.
>
> Signed-off-by: Eric Blake 
> Reviewed-by: Daniel P. Berrange 

If you improve the previous commit's message to address my remarks,
you'll probably want to update this one, too.
>
> ---
> v2: add R-b
> ---
>  block/nbd.c| 14 --
>  qemu-char.c| 43 
> --
>  qemu-nbd.c |  9 +
>  tests/test-io-channel-socket.c | 26 -
>  ui/vnc.c   | 39 +++---
>  util/qemu-sockets.c| 11 ++-
>  6 files changed, 81 insertions(+), 61 deletions(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index db57b49..9f333c9 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -204,18 +204,20 @@ static SocketAddress *nbd_config(BDRVNBDState *s, QDict 
> *options, char **export,
>  saddr = g_new0(SocketAddress, 1);
>
>  if (qdict_haskey(options, "path")) {
> +UnixSocketAddress *q_unix;
>  saddr->type = SOCKET_ADDRESS_KIND_UNIX;
> -saddr->u.q_unix = g_new0(UnixSocketAddress, 1);
> -saddr->u.q_unix->path = g_strdup(qdict_get_str(options, "path"));
> +q_unix = saddr->u.q_unix = g_new0(UnixSocketAddress, 1);
> +q_unix->path = g_strdup(qdict_get_str(options, "path"));
>  qdict_del(options, "path");
>  } else {
> +InetSocketAddress *inet;
>  saddr->type = SOCKET_ADDRESS_KIND_INET;
> -saddr->u.inet = g_new0(InetSocketAddress, 1);
> -saddr->u.inet->host = g_strdup(qdict_get_str(options, "host"));
> +inet = saddr->u.inet = g_new0(InetSocketAddress, 1);
> +inet->host = g_strdup(qdict_get_str(options, "host"));
>  if (!qdict_get_try_str(options, "port")) {
> -saddr->u.inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
> +inet->port = g_strdup_printf("%d", NBD_DEFAULT_PORT);
>  } else {
> -saddr->u.inet->port = g_strdup(qdict_get_str(options, "port"));
> +inet->port = g_strdup(qdict_get_str(options, "port"));
>  }
>  qdict_del(options, "host");
>  qdict_del(options, "port");
> diff --git a/qemu-char.c b/qemu-char.c
> index 5ea1d34..cfc82bc 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -3659,20 +3659,23 @@ static void qemu_chr_parse_socket(QemuOpts *opts, 
> ChardevBackend *backend,
>
>  addr = g_new0(SocketAddress, 1);
>  if (path) {
> +UnixSocketAddress *q_unix;
>  addr->type = SOCKET_ADDRESS_KIND_UNIX;
> -addr->u.q_unix = g_new0(UnixSocketAddress, 1);
> -addr->u.q_unix->path = g_strdup(path);
> +q_unix = addr->u.q_unix = g_new0(UnixSocketAddress, 1);
> +q_unix->path = g_strdup(path);
>  } else {
>  addr->type = SOCKET_ADDRESS_KIND_INET;
>  addr->u.inet = g_new0(InetSocketAddress, 1);
> -addr->u.inet->host = g_strdup(host);
> -addr->u.inet->port = g_strdup(port);
> -addr->u.inet->has_to = qemu_opt_get(opts, "to");
> -addr->u.inet->to = qemu_opt_get_number(opts, "to", 0);
> -addr->u.inet->has_ipv4 = qemu_opt_get(opts, "ipv4");
> -addr->u.inet->ipv4 = qemu_opt_get_bool(opts, "ipv4", 0);
> -addr->u.inet->has_ipv6 = qemu_opt_get(opts, "ipv6");
> -addr->u.inet->ipv6 = qemu_opt_get_bool(opts, "ipv6", 0);
> +*addr->u.inet = (InetSocketAddress) {
> +.host = g_strdup(host),
> +.port = g_strdup(port),
> +.has_to = qemu_opt_get(opts, "to"),
> +.to = qemu_opt_get_number(opts, "to", 0),
> +.has_ipv4 = qemu_opt_get(opts, "ipv4"),
> +.ipv4 = qemu_opt_get_bool(opts, "ipv4", 0),
> +.has_ipv6 = qemu_opt_get(opts, "ipv6"),
> +.ipv6 = qemu_opt_get_bool(opts, "ipv6", 0),
> +};

Do you still need g_new0(), or would g_new() do?
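
My reasoning, as a standalone sketch: assigning a compound literal overwrites the whole struct, and members not named in the initializer are set to zero, so pre-zeroing the allocation looks redundant (padding bytes aside). Addr and addr_new() are toy names for this example, and plain malloc() stands in for g_new().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Addr {
    const char *host;
    const char *port;
    bool has_to;
    int to;
    bool has_ipv4;
    bool ipv4;
} Addr;

/* Allocate without zeroing, then initialize by compound-literal assignment. */
static Addr *addr_new(const char *host, const char *port)
{
    Addr *addr = malloc(sizeof(*addr));

    if (!addr) {
        return NULL;
    }
    memset(addr, 0xff, sizeof(*addr));  /* simulate uninitialized garbage */
    *addr = (Addr) {
        .host = host,
        .port = port,
        /* has_to, to, has_ipv4, ipv4 are not named: they become zero */
    };
    return addr;
}
```

The members left out of the initializer read back as false and 0 even though the memory was filled with 0xff beforehand.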

>  }
>  sock->addr = addr;
>  }
[More of the same snipped...]

Re: [Qemu-devel] [PATCH v2 02/10] ipmi: replace IPMI_ADD_RSP_DATA() macro with inline helpers

2016-03-02 Thread Michael S. Tsirkin

On Wed, Mar 02, 2016 at 06:53:08PM +0100, Cédric Le Goater wrote:
> > typedef struct RspBuffer RspBuffer;
> 
> OK. So that's the rule for structs in qemu. It is not that clear
> when you look at the code around. I will change. np.

Did you look at CODING_STYLE? Pls do.

Re: [Qemu-devel] [PATCH v2 04/19] chardev: Shorten references into ChardevBackend

2016-03-02 Thread Markus Armbruster

Eric Blake  writes:

> An upcoming patch will alter how simple unions, like ChardevBackend,
> are laid out, which will impact all lines of the form 'backend->u.XXX'.
> To minimize the impact of that patch, use a temporary variable to
> reduce the number of lines needing modification when an internal
> reference within ChardevBackend changes layout.
>
> Signed-off-by: Eric Blake 
> Reviewed-By: Daniel P. Berrange 
>
> ---
> v2: add R-b
> ---
>  qemu-char.c | 122 
> 
>  1 file changed, 66 insertions(+), 56 deletions(-)
>
> diff --git a/qemu-char.c b/qemu-char.c
> index fc8ffda..5ea1d34 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -724,7 +724,7 @@ static CharDriverState *qemu_chr_open_mux(const char *id,
>  ChardevMux *mux = backend->u.mux;
>  CharDriverState *chr, *drv;
>  MuxDriver *d;
> -ChardevCommon *common = qapi_ChardevMux_base(backend->u.mux);
> +ChardevCommon *common = qapi_ChardevMux_base(mux);
>
>  drv = qemu_chr_find(mux->chardev);
>  if (drv == NULL) {

The commit message sounds like you *add* a temporary variable to reduce
churn.  You're using an existing one here.

> @@ -1043,7 +1043,7 @@ static CharDriverState *qemu_chr_open_pipe(const char 
> *id,
>  char *filename_in;
>  char *filename_out;
>  const char *filename = opts->device;
> -ChardevCommon *common = qapi_ChardevHostdev_base(backend->u.pipe);
> +ChardevCommon *common = qapi_ChardevHostdev_base(opts);
>
>
>  filename_in = g_strdup_printf("%s.in", filename);
> @@ -1123,7 +1123,7 @@ static CharDriverState *qemu_chr_open_stdio(const char 
> *id,
>  ChardevStdio *opts = backend->u.stdio;
>  CharDriverState *chr;
>  struct sigaction act;
> -ChardevCommon *common = qapi_ChardevStdio_base(backend->u.stdio);
> +ChardevCommon *common = qapi_ChardevStdio_base(opts);
>
>  if (is_daemonized()) {
>  error_setg(errp, "cannot use stdio with -daemonize");
> @@ -2141,7 +2141,7 @@ static CharDriverState *qemu_chr_open_pipe(const char 
> *id,
>  const char *filename = opts->device;
>  CharDriverState *chr;
>  WinCharState *s;
> -ChardevCommon *common = qapi_ChardevHostdev_base(backend->u.pipe);
> +ChardevCommon *common = qapi_ChardevHostdev_base(opts);
>
>  chr = qemu_chr_alloc(common, errp);
>  if (!chr) {
> @@ -3216,7 +3216,7 @@ static CharDriverState *qemu_chr_open_ringbuf(const 
> char *id,
>Error **errp)
>  {
>  ChardevRingbuf *opts = backend->u.ringbuf;
> -ChardevCommon *common = qapi_ChardevRingbuf_base(backend->u.ringbuf);
> +ChardevCommon *common = qapi_ChardevRingbuf_base(opts);
>  CharDriverState *chr;
>  RingBufCharDriver *d;
>
> @@ -3506,26 +3506,29 @@ static void qemu_chr_parse_file_out(QemuOpts *opts, 
> ChardevBackend *backend,
>  Error **errp)
>  {
>  const char *path = qemu_opt_get(opts, "path");
> +ChardevFile *file;

Ah, you do add a temporary in some places.

>
>  if (path == NULL) {
>  error_setg(errp, "chardev: file: no filename given");
>  return;
>  }
> -backend->u.file = g_new0(ChardevFile, 1);
> -qemu_chr_parse_common(opts, qapi_ChardevFile_base(backend->u.file));
> -backend->u.file->out = g_strdup(path);
> +file = backend->u.file = g_new0(ChardevFile, 1);
> +qemu_chr_parse_common(opts, qapi_ChardevFile_base(file));
> +file->out = g_strdup(path);
>
> -backend->u.file->has_append = true;
> -backend->u.file->append = qemu_opt_get_bool(opts, "append", false);
> +file->has_append = true;
> +file->append = qemu_opt_get_bool(opts, "append", false);
>  }


Whether you touch every line now or later is a wash as far as churn is
concerned.  I'd be willing to accept an argument that this change is
simpler than the one it avoids, or that it makes the code more
consistent, or it makes the code easier to read.  Preferably in the
commit message.

[More of the same snipped...]
