date:20190225

Re: [Qemu-devel] [PATCH] riscv: Add proper alignment check and pending 'C' extension upon misa writes

2019-02-25 Thread Amed Magdy

> > It seems to me that the C extension can be enabled at any point, since if C is
> > off, you know that the next insn is aligned modulo 4.
> >
>

 OK, this is mostly right. When the C extension is enabled, 32-bit base
instructions can be aligned on 2-byte boundaries instead of only 4-byte
boundaries. So repeatedly enabling and disabling the C bit in different code
areas might, in theory, also require this check when C is enabled. I'm not
really sure, though; this may not be a practical use scenario.

> > It is only if the C extension is enabled, and you want to disable it, that is
> > when we must check to see if the next insn is aligned mod 4.  It is trivial to
> > arrange for a particular instruction to be aligned, via assembler directives.
> > So it seems silly to make the definition of the csr write to misa any more
> > complicated than it is.
>

 I completely agree with you that disabling the C extension should have an
alignment check.

>
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 960d2b0aa9..8726ef802e 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -370,10 +370,11 @@ static int write_misa(CPURISCVState *env, int csrno, target_ulong val)
> >  val &= ~RVD;
> >}
>
> > -/* Suppress 'C' if next instruction is not aligned
> > - * TODO: this should check next_pc
> > +/*
> > + * Suppress 'C' if next instruction is not aligned.
> > + * We updated env->pc to the next insn in the translator.
> >  */
> > -if ((val & RVC) && (GETPC() & ~3) != 0) {
> > +if ((val & RVC) && (env->pc & ~3) != 0) {
> > val &= ~RVC;
> > }


 Just a hint: shouldn't it be (env->pc & 3) rather than (env->pc & ~3)?
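The two expressions behave very differently: `pc & ~3` is nonzero for every address of 4 or more, while `pc & 3` is nonzero only when the address is not a multiple of 4. A standalone sketch, assuming RV64 (`target_ulong` as `uint64_t`) and QEMU's convention that misa bit ('C' - 'A') marks the C extension; `misa_write_keep_c()` is a hypothetical helper applying the disable-time check from the quoted reply, not the actual csr.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t target_ulong;               /* assumption: RV64 */
#define RVC ((target_ulong)1 << ('C' - 'A')) /* misa bit for the C extension */

/* Correct test: true when the address is not a multiple of 4. */
static bool insn_misaligned(target_ulong pc)
{
    return (pc & 3) != 0;
}

/* The expression from the patch: true for every address >= 4,
 * regardless of alignment, so it cannot detect misalignment. */
static bool patch_expr(target_ulong pc)
{
    return (pc & ~(target_ulong)3) != 0;
}

/* Hypothetical helper: refuse to clear C while the next instruction is
 * only 2-byte aligned; enabling C is always allowed. */
static target_ulong misa_write_keep_c(target_ulong old_misa, target_ulong val,
                                      target_ulong next_pc)
{
    if ((old_misa & RVC) && !(val & RVC) && insn_misaligned(next_pc)) {
        val |= RVC;
    }
    return val;
}
```

With this shape the check only ever runs on the disable path, which keeps the misa write as simple as the quoted reply asks.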

>

 Thanks,
 Ahmed

Re: [Qemu-devel] [PATCH] qmp: add query-qemu-capabilities

2019-02-25 Thread Peter Krempa

On Mon, Feb 25, 2019 at 17:40:01 +, Stefan Hajnoczi wrote:
> On Mon, Feb 25, 2019 at 10:28:46AM +0100, Peter Krempa wrote:
> > On Mon, Feb 25, 2019 at 09:50:26 +0100, Markus Armbruster wrote:

[...]

> > I'm slightly worried about misuse of the possibility to change the behavior
> > at runtime. In libvirt we cache the capabilities to prevent a "chicken
> > and egg" problem where we need to know what qemu is able to do when
> > generating the command line which is obviously prior to starting qemu.
> > This means that we will want to cache even information determined by
> > interpreting results of this API.
> > 
> > If any further addition is not as simple as this one it may be
> > challenging to be able to interpret the result correctly and be able to
> > cache it.
> > 
> > Libvirt's use of capabilities is split into pre-startup steps and runtime
> > usage. For the pre-startup case [A] we obviously want to use the cached
> > capabilities, which are obtained by running the same qemu with a different
> > configuration than the one that will be used later. After qemu is started we
> > use QMP to interact [B] with it, also depending on the capabilities
> > determined from the cache. Scenario [B] also allows us to re-probe some
> > things, but they need to be strictly usable only with QMP afterwards.
> > 
> > The proposed API with the 'runtime' behaviour allows for these 3
> > scenarios for a capability:
> > 1) Static capability (as this patch shows)
> > This is easy to cache and also supports both [A] and [B]
> > 
> > 2) Capability depending on configuration
> > [A] is out of the window, but if QMP-only use is necessary we can
> > adapt.
> 
> Does "configuration" mean "QEMU command-line"?  The result of the query
> command should not depend on command-line arguments.

Yes, exactly. There is probably a possibility that something would be
detectable only after a shared library load, which in turn would depend
on the command line...

> 
> > 3) Capability depending on host state.
> > The same command line might not result in the same behaviour. This
> > obviously can't be cached at all, but libvirt would not do that anyway.
> > [B] is possible, but it's unpleasant.
> 
> Say the kernel or a library dependency is updated, and this enables a
> feature that QEMU was capable of but couldn't advertise before.  I guess
> this might happen and this is why I noted that the features could be
> selected at runtime.

Yeah, such a scenario breaks our caching even now. I don't want to say
it's bad; it may even be necessary in some scenarios. Both this and the
above scenario may eventually be necessary. Libvirt can certainly make
use of the detection for QMP use if there is such a thing.

> > I propose that the docs for the function filling the result (and perhaps
> > also the documentation for the QMP interface) clarify and/or guide devs
> > to avoid situations 2) and 3) if possible and motivate them to document
> > the limitations on when the capability is detectable.
> > 
> > Additionally a new field could be added that e.g. pledges that the given
> > capability is of type 1) as described above and thus can be easily
> > cached. That way we can make sure programmatically that we pre-cache only
> > the 'good' capabilities.
> > 
> > Other than the above, this is a welcome improvement, as I've personally
> > run into scenarios where a feature in qemu was fixed but it was
> > impossible to detect whether a given qemu version contains the fix, as it
> > did not have any influence on the QMP schema.
> 
> I'd like to make things as simple as possible, but no simpler :).
> 
> The simplest option is that the advertised features are set in stone at
> build time (e.g. selected with #ifdef if necessary).  But then we have
> no way of selecting features at runtime (e.g. based on kernel features).
> 
> What do you think?

I really don't want to limit the possibilities for the API. My goal is
only that it's obvious both from the docs and preferably also from the
returned data (so that we can filter what to cache to prevent mistakes)
that a given capability bit is static or dynamic.

I think adding a field to the returned data, set according to how the
capability was detected, should be simple enough, but I will be okay
with just documenting any caveats along with the individual
capabilities. In that case a comment should encourage those coming
after you to document them properly.



[Qemu-devel] [PATCH] fw_cfg: use __ATTR_RO_MODE to define rev sysfs

2019-02-25 Thread Wei Yang

Leverage __ATTR_RO_MODE to define the rev sysfs attribute instead of
open-coding it.
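The rename of the show callback follows from how the macro builds the attribute: `__ATTR_RO_MODE(_name, _mode)` in include/linux/sysfs.h uses `__stringify(_name)` for `.attr.name` and `_name##_show` for `.show`. The userspace sketch below mimics that shape with hypothetical stand-ins (`STRINGIFY`, `ATTR_RO_MODE`, simplified structs, no `VERIFY_OCTAL_PERMISSIONS()` check), assuming the real macro matches it. It also makes one consequence easy to check: the attribute is named "fw_cfg_rev", not the "rev" of the open-coded definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel types (assumption). */
struct attribute {
    const char *name;
    unsigned short mode;
};

struct kobject;

struct kobj_attribute {
    struct attribute attr;
    long (*show)(struct kobject *k, struct kobj_attribute *a, char *buf);
};

#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Same shape as the kernel's __ATTR_RO_MODE(). */
#define ATTR_RO_MODE(_name, _mode) {                          \
    .attr = { .name = STRINGIFY(_name), .mode = (_mode) },    \
    .show = _name##_show,                                     \
}

static unsigned int fw_cfg_rev = 3; /* illustrative value */

/* Must be called fw_cfg_rev_show: the macro pastes "_show" onto its argument. */
static long fw_cfg_rev_show(struct kobject *k, struct kobj_attribute *a, char *buf)
{
    (void)k;
    (void)a;
    return sprintf(buf, "%u\n", fw_cfg_rev);
}

static const struct kobj_attribute fw_cfg_rev_attr =
    ATTR_RO_MODE(fw_cfg_rev, 0400);
```

If the original "rev" file name is part of the userspace ABI, this is worth checking against the real macro before merging.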

Signed-off-by: Wei Yang 
---
 drivers/firmware/qemu_fw_cfg.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/qemu_fw_cfg.c b/drivers/firmware/qemu_fw_cfg.c
index 039e0f91dba8..a1293cbd7adb 100644
--- a/drivers/firmware/qemu_fw_cfg.c
+++ b/drivers/firmware/qemu_fw_cfg.c
@@ -296,18 +296,13 @@ static int fw_cfg_do_platform_probe(struct platform_device *pdev)
return 0;
 }
 
-static ssize_t fw_cfg_showrev(struct kobject *k, struct attribute *a, char *buf)
+static ssize_t fw_cfg_rev_show(struct kobject *k, struct kobj_attribute *a,
+  char *buf)
 {
return sprintf(buf, "%u\n", fw_cfg_rev);
 }
-
-static const struct {
-   struct attribute attr;
-   ssize_t (*show)(struct kobject *k, struct attribute *a, char *buf);
-} fw_cfg_rev_attr = {
-   .attr = { .name = "rev", .mode = S_IRUSR },
-   .show = fw_cfg_showrev,
-};
+static const struct kobj_attribute fw_cfg_rev_attr =
+   __ATTR_RO_MODE(fw_cfg_rev, 0400);
 
 /* fw_cfg_sysfs_entry type */
 struct fw_cfg_sysfs_entry {
-- 
2.19.1

Re: [Qemu-devel] [PATCH] qmp: add query-qemu-capabilities

2019-02-25 Thread Markus Armbruster

Stefan Hajnoczi  writes:

> On Mon, Feb 25, 2019 at 09:50:26AM +0100, Markus Armbruster wrote:
>> Stefan Hajnoczi  writes:
>> 
>> > QMP clients can usually detect the presence of features via schema
>> > introspection.  There are rare features that do not involve schema
>> > changes and are therefore impossible to detect with schema
>> > introspection.
>> >
>> > This patch adds the query-qemu-capabilities command.  It returns a list
>> > of capabilities that this QEMU supports.
>> 
>> The name "capabilities" could be confusing, because we already have QMP
>> capabilities, complete with command qmp_capabilities.  Would "features"
>> work?
>
> Sure, will fix.
>
>> > The decision to make this a command rather than something statically
>> > defined in the schema is intentional.  It allows QEMU to decide which
>> > capabilities are available at runtime, if necessary.
>> >
>> > This new interface is necessary so that QMP clients can discover that
>> > migrating disk image files is safe with cache.direct=off on Linux.
>> > There is no other way to detect whether or not QEMU supports this.
>> 
>> I think what's unsaid here is that we don't want to make a completely
>> arbitrary schema change just to carry this bit of information.  We
>> could, but we don't want to.  Correct?
>
> Yes, exactly.

Then let's rephrase a little:

 QMP clients can usually detect the presence of features via schema
 introspection.  There are rare features that do not involve schema
 changes.  To make them detectable with schema introspection, we'd
 have to make some arbitrary schema change.  Annoying.

 The new query-qemu-features command lets us avoid that.  It returns
 a list of features supported by this QEMU.

 The decision to make this a command rather than something statically
 defined in the schema is intentional.  It allows QEMU to decide which
 features are available at runtime, if necessary.

 Use the new command to declare that migrating disk image files is safe
 with cache.direct=off on Linux.

[...]

Re: [Qemu-devel] Maintainers, please git-am -m

2019-02-25 Thread Markus Armbruster

Eric Blake  writes:

> On 2/8/19 1:30 AM, Markus Armbruster wrote:
>> Short story: please add
>> 
>> [am]
>> messageid = true
>> 
>> to your .gitconfig.
>> 
>> Long story.  git-am can add a Message-id: tag.  Looks like this:
>> 
>
>> 
>> Signed-off-by: Thomas Huth 
>> Reviewed-by: Daniel P. Berrangé 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Tested-by: Philippe Mathieu-Daudé 
>> Acked-by: Alex Bennée 
>> Message-id: 1549268743-18502-1-git-send-email-th...@redhat.com
>> Signed-off-by: Peter Maydell 
>> 
>> The Message-id identifies the patch e-mail.  It makes finding the review
>> thread easier and more reliable.  It's also a valid key on Patchew[*].
>
> I find the tag valuable enough in later git searches that I don't mind
> feeding my own patches back through the mailing list to add it (patchew
> helps with that, of course).  But for it to become mandatory, we'd need
> to enhance scripts/checkpatch.pl to enforce it.

I'm afraid checkpatch is the wrong place.  When you submit v1 patches
for review, there is no Message-id.  Even for respins, we don't want
one.  It should be added when patches get applied for real, so the
commit carries exactly one Message-id, and it refers back to the final
version on the list.

Two ideas:

* Have Patchew flag pull requests lacking Message-id

* Admittedly vague: some kind of git pre-merge hook magic to make
  git-merge flag missing Message-id
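The rule above (a merged commit carries exactly one Message-id) is easy to check mechanically, whichever tool ends up doing it. A minimal sketch; `count_message_ids()` is hypothetical and is not part of Patchew or git. It matches the tag case-insensitively, since the capitalization of the tag can vary (the example above uses `Message-id:`).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive "does s start with prefix?" */
static int starts_with_ci(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix)) {
            return 0;
        }
    }
    return 1;
}

/* Count commit-message lines that begin with a Message-id tag. */
static int count_message_ids(const char *msg)
{
    int n = 0;

    for (const char *line = msg; line != NULL && *line != '\0'; ) {
        const char *nl = strchr(line, '\n');

        if (starts_with_ci(line, "Message-id:")) {
            n++;
        }
        line = nl ? nl + 1 : NULL;
    }
    return n;
}
```

A hook or CI check would then flag any commit in a pull request for which the count is not exactly 1.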

Re: [Qemu-devel] [PATCH v5 01/14] qapi: qapi for audio backends

2019-02-25 Thread Markus Armbruster

"Kővágó, Zoltán"  writes:

> This patch adds structures into qapi to replace the existing
> configuration structures used by audio backends currently. This qapi
> will be the base of the -audiodev command line parameter (that replaces
> the old environment variables based config).
>
> This is not a 1:1 translation of the old options; I've tried to make
> them much more consistent (e.g. almost every backend had an option to
> specify buffer size, but the name was different for every backend, and
> some backends required usecs, while others required frames, samples
> or bytes). I also tried to reduce the number of abbreviations used by the
> config keys.
>
> Some of the more important changes:
> * use `in` and `out` instead of `ADC` and `DAC`, as the former is more
>   user friendly imho
> * moved buffer settings into the global setting area (so it's the same
>   for all backends that support it. Backends that can't change buffer
>   size will simply ignore them). Also using usecs, as it's probably more
>   user friendly than samples or bytes.
> * try-poll is now an alsa backend specific option (as all other backends
>   currently ignore it)
>
> Signed-off-by: Kővágó, Zoltán 
> ---
>
> Notes:
> Changes from v4:
> 
> * documentation fixes
> * renamed pa's source/sink to pa-in/pa-out
> * per-direction options changed per Markus Armbruster's comments
> 
> Changes from v2:
> 
> * update copyright, version numbers
> * remove #optional
> * per-direction options are now optional (needed for qobject_object_visitor_new_str)
> * removed unnecessary AudiodevNoOptions
> * changed integers to unsigned
>
>  qapi/audio.json   | 304 ++
>  qapi/qapi-schema.json |   1 +
>  qapi/Makefile.objs|   6 +-
>  3 files changed, 308 insertions(+), 3 deletions(-)
>  create mode 100644 qapi/audio.json
>
> diff --git a/qapi/audio.json b/qapi/audio.json
> new file mode 100644
> index 00..2f203462c7
> --- /dev/null
> +++ b/qapi/audio.json
> @@ -0,0 +1,304 @@
> +# -*- mode: python -*-
> +#
> +# Copyright (C) 2015-2019 Zoltán Kővágó 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or later.
> +# See the COPYING file in the top-level directory.
> +
> +##
> +# @AudiodevPerDirectionOptions:
> +#
> +# General audio backend options that are used for both playback and
> +# recording.
> +#
> +# @fixed-settings: use fixed settings for host input/output. When off,
> +#  frequency, channels and format must not be
> +#  specified (default true)
> +#
> +# @frequency: frequency to use when using fixed settings
> +# (default 44100)
> +#
> +# @channels: number of channels when using fixed settings (default 2)
> +#
> +# @voices: number of voices to use (default 1)
> +#
> +# @format: sample format to use when using fixed settings
> +#  (default s16)
> +#
> +# @buffer-len: the buffer size in microseconds

The name says "len" (for length), the comment says "size" and by its
unit implies "duration".

In the QMP schema, we traditionally prefer longhand like "length" over
abbreviations like "len".

What about calling it @buffer-capacity?
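Whatever the final name, the value is a duration, so a backend that sizes its buffer in frames or bytes has to convert it using the stream parameters. A sketch of that conversion; the helper names are hypothetical and not part of the audio code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of frames that fit in a buffer of the given duration. */
static uint64_t usecs_to_frames(uint32_t usecs, uint32_t frequency)
{
    return (uint64_t)usecs * frequency / 1000000;
}

/* Bytes for the same buffer: one sample per channel per frame. */
static uint64_t usecs_to_bytes(uint32_t usecs, uint32_t frequency,
                               uint32_t channels, uint32_t bytes_per_sample)
{
    return usecs_to_frames(usecs, frequency) * channels * bytes_per_sample;
}
```

With the schema defaults (44100 Hz, 2 channels, s16 at 2 bytes per sample), a 10 ms buffer is 441 frames or 1764 bytes.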

> +#
> +# Since: 4.0
> +##
> +{ 'struct': 'AudiodevPerDirectionOptions',
> +  'data': {
> +'*fixed-settings': 'bool',
> +'*frequency':  'uint32',
> +'*channels':   'uint32',
> +'*voices': 'uint32',
> +'*format': 'AudioFormat',
> +'*buffer-len': 'uint32' } }
> +
> +##
> +# @AudiodevGenericOptions:
> +#
> +# Generic driver-specific options.
> +#
> +# @in: options of the capture stream
> +#
> +# @out: options of the playback stream
> +#
> +# Since: 4.0
> +##
> +{ 'struct': 'AudiodevGenericOptions',
> +  'data': {
> +'*in':  'AudiodevPerDirectionOptions',
> +'*out': 'AudiodevPerDirectionOptions' } }
> +
> +##
> +# @AudiodevAlsaPerDirectionOptions:
> +#
> +# Options of the alsa backend that are used for both playback and
> +# recording.

Permit me to indulge in a little nitpicking...  If you want to refer to
the Advanced Linux Sound Architecture, that's spelled ALSA.  If you want
to refer to the enumeration value, consider putting it in double quotes:
"alsa".

Here, 'the ALSA backend', 'the "alsa" backend', and 'the alsa backend'
would all work for me.  Your choice.

More occurrences of 'the FOO backend' and 'the FOO audio backend' below.

> +#
> +# @dev: the name of the alsa device to use (default 'default')

But I'd definitely use ALSA here.

> +#
> +# @period-len: the period length in microseconds

period-length

> +#
> +# @try-poll: attempt to use poll mode, falling back to non-polling
> +#access on failure (default true)
> +#
> +# Since: 4.0
> +##
> +{ 'struct': 'AudiodevAlsaPerDirectionOptions',
> +  'base': 'AudiodevPerDirectionOptions',
> +  'data': {
> +'*dev':'str',
> +'*period-len': 'uint32',
> +'*try-poll':   'bool' } }
> +
> +##
> +#

[Qemu-devel] [PATCH v3 3/5] Add helper functions for CPUID xsave area size calculation.

2019-02-25 Thread Yang Weijiang

These functions are called when returning CPUID xsave area
size information.
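The difference between the standard and compacted size calculations can be tried outside QEMU. A sketch assuming a hypothetical four-entry table (legacy area plus header for bits 0 and 1, an AVX-like component at offset 0x240, and a supervisor-style component with offset 0); sizes are illustrative, and the compacted format's optional 64-byte alignment is ignored, as in the patch's helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

typedef struct {
    uint32_t offset; /* standard-format offset; 0 for supervisor components */
    uint32_t size;
} ExtSaveArea;

/* Hypothetical table, sizes illustrative only. */
static const ExtSaveArea areas[] = {
    { 0,     576 },  /* bit 0: legacy area + XSAVE header */
    { 0,     576 },  /* bit 1: same legacy area */
    { 0x240, 256 },  /* bit 2: AVX-like component */
    { 0,      16 },  /* bit 3: supervisor-style component */
};

/* Standard format: components sit at fixed offsets;
 * supervisor components (offset 0 beyond bit 1) are skipped. */
static uint32_t size_standard(uint64_t mask)
{
    uint64_t ret = 0;

    for (size_t i = 0; i < ARRAY_SIZE(areas); i++) {
        if (!((mask >> i) & 1)) {
            continue;
        }
        if (i >= 2 && areas[i].offset == 0) {
            continue;
        }
        ret = MAX(ret, (uint64_t)areas[i].offset + areas[i].size);
    }
    return (uint32_t)ret;
}

/* Compacted format, same logic as xsave_area_size_compacted(): from bit 2
 * on, each enabled component starts where the previous one ended. */
static uint32_t size_compacted(uint64_t mask)
{
    uint64_t ret = 0;

    for (size_t i = 0; i < ARRAY_SIZE(areas); i++) {
        uint64_t offset = i >= 2 ? ret : areas[i].offset;

        if ((mask >> i) & 1) {
            ret = MAX(ret, offset + areas[i].size);
        }
    }
    return (uint32_t)ret;
}
```

With bits 0, 1 and 3 set, the supervisor component lands right after the legacy area (576 + 16 = 592) in compacted form, while the standard size ignores it.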

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f6c7bdf6fe..d8c36e0f2f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1284,12 +1284,34 @@ static inline bool accel_uses_host_cpuid(void)
 return kvm_enabled() || hvf_enabled();
 }
 
+static uint32_t xsave_area_size_compacted(uint64_t mask)
+{
+int i;
+uint64_t ret = 0;
+uint32_t offset;
+
+for (i = 0; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
+const ExtSaveArea *esa = &x86_ext_save_areas[i];
+offset = i >= 2 ? ret : esa->offset;
+if ((mask >> i) & 1) {
+ret = MAX(ret, offset + esa->size);
+}
+}
+return ret;
+}
+
 static inline uint64_t x86_cpu_xsave_components(X86CPU *cpu)
 {
 return ((uint64_t)cpu->env.features[FEAT_XSAVE_COMP_HI]) << 32 |
cpu->env.features[FEAT_XSAVE_COMP_LO];
 }
 
+static inline uint64_t x86_cpu_xsave_sv_components(X86CPU *cpu)
+{
+return ((uint64_t)cpu->env.features[FEAT_XSAVE_SV_HI]) << 32 |
+   cpu->env.features[FEAT_XSAVE_SV_LO];
+}
+
 const char *get_register_name_32(unsigned int reg)
 {
 if (reg >= CPU_NB_REGS32) {
@@ -4919,8 +4941,10 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 }
 }
 
-env->features[FEAT_XSAVE_COMP_LO] = mask;
+env->features[FEAT_XSAVE_COMP_LO] = mask & CPUID_XSTATE_USER_MASK;
 env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
+env->features[FEAT_XSAVE_SV_LO] = mask & CPUID_XSTATE_KERNEL_MASK;
+env->features[FEAT_XSAVE_SV_HI] = mask >> 32;
 }
 
 /* Steps involved on loading and filtering CPUID data
-- 
2.17.1

Re: [Qemu-devel] Questions about EDID

2019-02-25 Thread Gerd Hoffmann

On Mon, Feb 25, 2019 at 09:49:22PM -0500, Programmingkid wrote:
> 
> > On Feb 25, 2019, at 10:26 AM, Gerd Hoffmann  wrote:
> > 
> > On Mon, Feb 25, 2019 at 09:05:30AM -0500, G 3 wrote:
> >> Hi Gerd, I was wondering if you have made any documentation for your EDID
> >> patches. If you have could you provide a link please?
> > 
> > No docs.
> > 
> >> Also could a feature be added that allows the user to specify resolutions
> >> to be made available to the guest?
> >> 
> >> Maybe it could work like this: -device VGA,edid=on,res=1366x768,7680x4320
> > 
> > A single resolution works (via xres + yres properties).
> 
> Could you send an example of the xres and yres properties please?
> I tried this but it didn't work: -device VGA,edid=on,xres=999,yres=888

That is correct.  But you also need a guest driver with EDID support.

I think the macOS driver got support for that; for Linux, support landed
in the 5.0 development cycle.

cheers,
  Gerd

[Qemu-devel] [PATCH v3 5/5] Add CET MSR save/restore support for migration

2019-02-25 Thread Yang Weijiang

To support features such as live migration, the CET runtime MSRs
need to be saved on the source machine and restored on the
destination machine. This patch saves and restores the CET_U,
CET_S, PL0_SSP, PL3_SSP and SSP_TABL_ADDR MSRs.
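The capability check that decides whether these MSRs are transferred reads more clearly with named bit positions, which also makes it easy to compare with the feature-name tables in patch 2/5 ("shstk" at ECX bit 7 and "ibt" at EDX bit 20 of CPUID.(EAX=7,ECX=0)). A standalone sketch; `has_cet_cap()` and the macro names are hypothetical stand-ins for the patch's `HAS_CET_CAP()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0), per the Intel CET specification. */
#define CPUID_7_0_ECX_SHSTK (1U << 7)   /* shadow stack */
#define CPUID_7_0_EDX_IBT   (1U << 20)  /* indirect branch tracking */

/* True if the guest is offered either CET sub-feature, i.e. when the
 * CET MSRs are worth saving and restoring. */
static bool has_cet_cap(uint32_t ecx_7_0, uint32_t edx_7_0)
{
    return (ecx_7_0 & CPUID_7_0_ECX_SHSTK) != 0 ||
           (edx_7_0 & CPUID_7_0_EDX_IBT) != 0;
}
```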

Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.h |  12 +
 target/i386/kvm.c |  33 ++
 target/i386/machine.c | 100 ++
 3 files changed, 145 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f3f724d8e6..f350684895 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -460,6 +460,12 @@ typedef enum X86Seg {
 #define MSR_IA32_BNDCFGS0x0d90
 #define MSR_IA32_XSS0x0da0
 
+#define MSR_IA32_U_CET  0x6a0
+#define MSR_IA32_S_CET  0x6a2
+#define MSR_IA32_PL0_SSP0x6a4
+#define MSR_IA32_PL3_SSP0x6a7
+#define MSR_IA32_INTR_SSP_TABL  0x6a8
+
 #define XSTATE_FP_BIT   0
 #define XSTATE_SSE_BIT  1
 #define XSTATE_YMM_BIT  2
@@ -1325,6 +1331,12 @@ typedef struct CPUX86State {
 
 uintptr_t retaddr;
 
+uint64_t u_cet;
+uint64_t s_cet;
+uint64_t pl0_ssp;
+uint64_t pl3_ssp;
+uint64_t ssp_tabl_addr;
+
 /* Fields up to this point are cleared by a CPU reset */
 struct {} end_reset_fields;
 
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index f524e7d929..2ab3c977a4 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -63,6 +63,8 @@
 /* A 4096-byte buffer can hold the 8-byte kvm_msrs header, plus
  * 255 kvm_msr_entry structs */
 #define MSR_BUF_SIZE 4096
+#define HAS_CET_CAP(env)  ((env)->features[FEAT_7_0_ECX] & 0x80 || \
+   (env)->features[FEAT_7_0_EDX] & (1U << 20))
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_INFO(SET_TSS_ADDR),
@@ -2197,6 +2199,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 }
 }
 
+if (HAS_CET_CAP(env)) {
+kvm_msr_entry_add(cpu, MSR_IA32_U_CET, env->u_cet);
+kvm_msr_entry_add(cpu, MSR_IA32_S_CET, env->s_cet);
+kvm_msr_entry_add(cpu, MSR_IA32_PL0_SSP, env->pl0_ssp);
+kvm_msr_entry_add(cpu, MSR_IA32_PL3_SSP, env->pl3_ssp);
+kvm_msr_entry_add(cpu, MSR_IA32_INTR_SSP_TABL, env->ssp_tabl_addr);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -2516,6 +2526,14 @@ static int kvm_get_msrs(X86CPU *cpu)
 }
 }
 
+if (HAS_CET_CAP(env)) {
+kvm_msr_entry_add(cpu, MSR_IA32_U_CET, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_S_CET, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_PL0_SSP, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_PL3_SSP, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_INTR_SSP_TABL, 0);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -2789,6 +2807,21 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 env->msr_rtit_addrs[index - MSR_IA32_RTIT_ADDR0_A] = msrs[i].data;
 break;
+case MSR_IA32_U_CET:
+env->u_cet = msrs[i].data;
+break;
+case MSR_IA32_S_CET:
+env->s_cet = msrs[i].data;
+break;
+case MSR_IA32_PL0_SSP:
+env->pl0_ssp = msrs[i].data;
+break;
+case MSR_IA32_PL3_SSP:
+env->pl3_ssp = msrs[i].data;
+break;
+case MSR_IA32_INTR_SSP_TABL:
+env->ssp_tabl_addr = msrs[i].data;
+break;
 }
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 225b5d433b..5f8a12ca30 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -810,6 +810,101 @@ static const VMStateDescription vmstate_xss = {
 }
 };
 
+static bool u_cet_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->u_cet != 0;
+}
+
+static const VMStateDescription vmstate_u_cet = {
+.name = "cpu/u_cet",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = u_cet_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.u_cet, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool s_cet_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->s_cet != 0;
+}
+
+static const VMStateDescription vmstate_s_cet = {
+.name = "cpu/s_cet",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = s_cet_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.s_cet, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool pl0_ssp_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->pl0_ssp != 0;
+}
+
+static const VMStateDescription vmstate_pl0_ssp = {
+.name = "cpu/pl0_ssp",
+.version_id = 1,
+

[Qemu-devel] RESEND: [PATCH v3 0/5] This patch-set is to enable Guest

2019-02-25 Thread Yang Weijiang

Control-flow Enforcement Technology (CET) provides protection against
return/jump-oriented programming (ROP/JOP) attacks. This patch set is
required for a KVM guest OS to use the capability. It enables CET-related
CPUID reporting, XSAVES/XRSTORS support, live migration, etc. in QEMU.

Changelog:

 v3:
 - Add CET MSR save/restore support for live-migration.

 v2:
 - In CPUID.(EAX=d, ECX=1), set return ECX[n] = 0 if bit n corresponds
   to a bit in MSR_IA32_XSS.
 - In CPUID.(EAX=d, ECX=n), set return ECX = 1 if bit n corresponds
   to a bit in MSR_IA32_XSS.
 - Skip supervisor mode xsave components when calculating the user mode
   xsave component size in xsave_area_size() and x86_cpu_reset().

Yang Weijiang (5):
  Add CET xsaves/xrstors related macros and structures.
  Add CET SHSTK and IBT CPUID feature-word definitions.
  Add helper functions for CPUID xsave area size calculation.
  Report CPUID xsave area support for CET.
  Add CET MSR save/restore support for migration

 target/i386/cpu.c |  73 --
 target/i386/cpu.h |  48 +++-
 target/i386/kvm.c |  27 
 target/i386/machine.c | 100 ++
 4 files changed, 244 insertions(+), 4 deletions(-)

-- 
2.17.1

[Qemu-devel] [PATCH v3 2/5] Add CET SHSTK and IBT CPUID feature-word definitions.

2019-02-25 Thread Yang Weijiang

XSS[bit 11] and XSS[bit 12] correspond to CET
user mode area and supervisor mode area respectively.

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f81d35e1f9..f6c7bdf6fe 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1018,7 +1018,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 NULL, "avx512vbmi", "umip", "pku",
-NULL /* ospke */, NULL, "avx512vbmi2", NULL,
+NULL /* ospke */, NULL, "avx512vbmi2", "shstk",
 "gfni", "vaes", "vpclmulqdq", "avx512vnni",
 "avx512bitalg", NULL, "avx512-vpopcntdq", NULL,
 "la57", NULL, NULL, NULL,
@@ -1041,7 +1041,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, "pconfig", NULL,
-NULL, NULL, NULL, NULL,
+"ibt", NULL, NULL, NULL,
 NULL, NULL, "spec-ctrl", NULL,
 NULL, "arch-capabilities", NULL, "ssbd",
 },
@@ -1162,6 +1162,25 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 }
 },
 },
+/* Below are CET supervisor xsave features */
+[FEAT_XSAVE_SV_LO] = {
+.type = CPUID_FEATURE_WORD,
+.cpuid = {
+.eax = 0xD,
+.needs_ecx = true,
+.ecx = 1,
+.reg = R_ECX,
+},
+},
+[FEAT_XSAVE_SV_HI] = {
+.type = CPUID_FEATURE_WORD,
+.cpuid = {
+.eax = 0xD,
+.needs_ecx = true,
+.ecx = 1,
+.reg = R_EDX
+},
+}
 };
 
 typedef struct X86RegisterInfo32 {
@@ -1233,6 +1252,14 @@ static const ExtSaveArea x86_ext_save_areas[] = {
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .offset = offsetof(X86XSaveArea, pkru_state),
 .size = sizeof(XSavePKRU) },
+[XSTATE_CET_U_BIT] = {
+.feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_CET_SHSTK,
+.offset = 0 /*supervisor mode component, offset = 0 */,
+.size = sizeof(XSaveCETU) },
+[XSTATE_CET_S_BIT] = {
+.feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_CET_SHSTK,
+.offset = 0 /*supervisor mode component, offset = 0 */,
+.size = sizeof(XSaveCETS) },
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
@@ -1243,6 +1270,9 @@ static uint32_t xsave_area_size(uint64_t mask)
 for (i = 0; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
 const ExtSaveArea *esa = &x86_ext_save_areas[i];
 if ((mask >> i) & 1) {
+if (i >= 2 && !esa->offset) {
+continue;
+}
 ret = MAX(ret, esa->offset + esa->size);
 }
 }
@@ -4657,6 +4687,9 @@ static void x86_cpu_reset(CPUState *s)
 }
 for (i = 2; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
 const ExtSaveArea *esa = &x86_ext_save_areas[i];
+if (!esa->offset) {
+continue;
+}
 if (env->features[esa->feature] & esa->bits) {
 xcr0 |= 1ull << i;
 }
-- 
2.17.1

[Qemu-devel] [PATCH v3 4/5] Report CPUID xsave area support for CET.

2019-02-25 Thread Yang Weijiang

CPUID bit definition as below:
CPUID.(EAX=d, ECX=1):ECX.CET_U(bit 11): user mode state
CPUID.(EAX=d, ECX=1):ECX.CET_S(bit 12): kernel mode state
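The per-subleaf reporting this patch adds for CPUID.(EAX=0xD, ECX=n), n >= 2, can be sketched as a pure function: a user (XCR0) component reports its size in EAX and its standard-format offset in EBX, while a supervisor (XSS) component reports its size, EBX = 0 and ECX bit 0 set. The types, function name and table values below are hypothetical; the CET sizes mirror XSaveCETU and XSaveCETS from patch 1/5.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t eax, ebx, ecx, edx;
} CpuidRegs;

/* Hypothetical per-component description. */
typedef struct {
    uint32_t size;
    uint32_t offset; /* standard-format offset, unused for supervisor state */
} Component;

/* CPUID.(EAX=0xD, ECX=n) for n >= 2, given the XCR0 and XSS component masks. */
static CpuidRegs cpuid_0xd_subleaf(unsigned n, uint64_t xcr0_mask,
                                   uint64_t xss_mask, const Component *c)
{
    CpuidRegs r = { 0, 0, 0, 0 };

    assert(n >= 2 && n < 64);
    if ((xcr0_mask >> n) & 1) {        /* user state component */
        r.eax = c[n].size;
        r.ebx = c[n].offset;
    }
    if ((xss_mask >> n) & 1) {         /* supervisor state component */
        r.eax = c[n].size;
        r.ebx = 0;                     /* no fixed offset */
        r.ecx = 1;                     /* ECX[0]: managed through IA32_XSS */
    }
    return r;
}
```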

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d8c36e0f2f..15e2d5e009 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4399,12 +4399,22 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *ebx = xsave_area_size(env->xcr0);
 } else if (count == 1) {
 *eax = env->features[FEAT_XSAVE];
+*ecx = env->features[FEAT_XSAVE_SV_LO];
+*edx = env->features[FEAT_XSAVE_SV_HI];
+*ebx = xsave_area_size_compacted(x86_cpu_xsave_components(cpu) |
+x86_cpu_xsave_sv_components(cpu));
 } else if (count < ARRAY_SIZE(x86_ext_save_areas)) {
 if ((x86_cpu_xsave_components(cpu) >> count) & 1) {
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
 }
+if ((x86_cpu_xsave_sv_components(cpu) >> count) & 1) {
+const ExtSaveArea *esa_sv = &x86_ext_save_areas[count];
+*eax = esa_sv->size;
+*ebx = 0;
+*ecx = 1;
+}
 }
 break;
 }
-- 
2.17.1

[Qemu-devel] [PATCH v3 1/5] Add CET xsaves/xrstors related macros and structures.

2019-02-25 Thread Yang Weijiang

CET protection in user mode and kernel mode relies on
specific MSRs, whose contents are automatically
saved/restored by the XSAVES/XRSTORS instructions.
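How one supported-components mask is split between XCR0 (user state) and IA32_XSS (supervisor state) with masks like these, as patch 3/5 does when filling the low feature words, can be sketched as follows. `USER_MASK`, `KERNEL_MASK` and `split_xsave_mask()` are hypothetical stand-ins using the same bit numbers as the XSTATE_*_BIT definitions in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XBIT(n) (1ULL << (n))

/* Stand-ins for CPUID_XSTATE_USER_MASK / CPUID_XSTATE_KERNEL_MASK:
 * FP, SSE, YMM, BNDREGS, BNDCSR, OPMASK, ZMM_Hi256, Hi16_ZMM (bits 0-7)
 * and PKRU (bit 9) live in XCR0; CET_U and CET_S (bits 11, 12) in XSS. */
#define USER_MASK   (XBIT(0) | XBIT(1) | XBIT(2) | XBIT(3) | XBIT(4) | \
                     XBIT(5) | XBIT(6) | XBIT(7) | XBIT(9))
#define KERNEL_MASK (XBIT(11) | XBIT(12))

/* Split a supported-components mask into its XCR0 and XSS parts.
 * Bits in neither mask (e.g. reserved bit 10) are dropped. */
static void split_xsave_mask(uint64_t mask, uint64_t *xcr0, uint64_t *xss)
{
    assert(xcr0 != NULL && xss != NULL);
    *xcr0 = mask & USER_MASK;
    *xss = mask & KERNEL_MASK;
}
```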

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.h | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9c52d0cbeb..f3f724d8e6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -469,6 +469,9 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT6
 #define XSTATE_Hi16_ZMM_BIT 7
 #define XSTATE_PKRU_BIT 9
+#define XSTATE_RESERVED_BIT 10
+#define XSTATE_CET_U_BIT11
+#define XSTATE_CET_S_BIT12
 
 #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
@@ -479,6 +482,19 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_RESERVED_MASK(1ULL << XSTATE_RESERVED_BIT)
+#define XSTATE_CET_U_MASK   (1ULL << XSTATE_CET_U_BIT)
+#define XSTATE_CET_S_MASK   (1ULL << XSTATE_CET_S_BIT)
+
+/* CPUID feature bits available in XCR0 */
+#define CPUID_XSTATE_USER_MASK  (XSTATE_FP_MASK | XSTATE_SSE_MASK \
+| XSTATE_YMM_MASK | XSTATE_BNDREGS_MASK \
+| XSTATE_BNDCSR_MASK | XSTATE_OPMASK_MASK \
+| XSTATE_ZMM_Hi256_MASK \
+| XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK)
+
+/* CPUID feature bits available in XSS */
+#define CPUID_XSTATE_KERNEL_MASK(XSTATE_CET_U_MASK | XSTATE_CET_S_MASK)
 
 /* CPUID feature words */
 typedef enum FeatureWord {
@@ -503,6 +519,8 @@ typedef enum FeatureWord {
 FEAT_XSAVE_COMP_LO, /* CPUID[EAX=0xd,ECX=0].EAX */
 FEAT_XSAVE_COMP_HI, /* CPUID[EAX=0xd,ECX=0].EDX */
 FEAT_ARCH_CAPABILITIES,
+FEAT_XSAVE_SV_LO,   /* CPUID[EAX=0xd,ECX=1].ECX */
+FEAT_XSAVE_SV_HI,   /* CPUID[EAX=0xd,ECX=1].EDX */
 FEATURE_WORDS,
 } FeatureWord;
 
@@ -687,7 +705,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_ECX_LA57 (1U << 16)
 #define CPUID_7_0_ECX_RDPID(1U << 22)
 #define CPUID_7_0_ECX_CLDEMOTE (1U << 25)  /* CLDEMOTE Instruction */
-
+#define CPUID_7_0_ECX_CET_SHSTK (1U << 7)  /* CET SHSTK feature bit */
 #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural Network Instructions */
 #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
 #define CPUID_7_0_EDX_PCONFIG (1U << 18)   /* Platform Configuration */
@@ -1021,6 +1039,19 @@ typedef struct XSavePKRU {
 uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 11: User mode CET state */
+typedef struct XSaveCETU {
+uint64_t u_cet;
+uint64_t user_ssp;
+} XSaveCETU;
+
+/* Ext. save area 12: Supervisor mode CET state */
+typedef struct XSaveCETS {
+uint64_t kernel_ssp;
+uint64_t pl1_ssp;
+uint64_t pl2_ssp;
+} XSaveCETS;
+
 typedef struct X86XSaveArea {
 X86LegacyXSaveArea legacy;
 X86XSaveHeader header;
@@ -1039,6 +1070,9 @@ typedef struct X86XSaveArea {
 XSaveHi16_ZMM hi16_zmm_state;
 /* PKRU State: */
 XSavePKRU pkru_state;
+/* CET State: */
+XSaveCETU cet_u;
+XSaveCETS cet_s;
 } X86XSaveArea;
 
 QEMU_BUILD_BUG_ON(offsetof(X86XSaveArea, avx_state) != 0x240);
-- 
2.17.1

[Qemu-devel] [PATCH v3 4/5] Report CPUID xsave area support for CET.

2019-02-25 Thread Yang Weijiang

CPUID bit definition as below:
CPUID.(EAX=d, ECX=1):ECX.CET_U(bit 11): user mode state
CPUID.(EAX=d, ECX=1):ECX.CET_S(bit 12): kernel mode state

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index d8c36e0f2f..15e2d5e009 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -4399,12 +4399,22 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
 *ebx = xsave_area_size(env->xcr0);
 } else if (count == 1) {
 *eax = env->features[FEAT_XSAVE];
+*ecx = env->features[FEAT_XSAVE_SV_LO];
+*edx = env->features[FEAT_XSAVE_SV_HI];
+*ebx = xsave_area_size_compacted(x86_cpu_xsave_components(cpu) |
+x86_cpu_xsave_sv_components(cpu));
 } else if (count < ARRAY_SIZE(x86_ext_save_areas)) {
 if ((x86_cpu_xsave_components(cpu) >> count) & 1) {
 const ExtSaveArea *esa = &x86_ext_save_areas[count];
 *eax = esa->size;
 *ebx = esa->offset;
 }
+if ((x86_cpu_xsave_sv_components(cpu) >> count) & 1) {
+const ExtSaveArea *esa_sv = &x86_ext_save_areas[count];
+*eax = esa_sv->size;
+*ebx = 0;
+*ecx = 1;
+}
 }
 break;
 }
-- 
2.17.1
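A minimal sketch of how the bits described in this commit message could be decoded. It works on raw register values instead of executing CPUID. The names `XssCetSupport`, `decode_xss_cet()` and `xsave_subleaf_is_supervisor()` are hypothetical.

Per the Intel SDM, bit 0 of ECX from CPUID.(EAX=0xD, ECX=n) reports whether component n is managed through IA32_XSS. That is what the `*ecx = 1` assignment in the hunk above advertises for the CET components.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the commit message: CPUID.(EAX=0xD,ECX=1):ECX */
#define XSS_CET_U_BIT 11  /* user-mode CET state */
#define XSS_CET_S_BIT 12  /* supervisor-mode CET state */

typedef struct {
    bool cet_u;
    bool cet_s;
} XssCetSupport;

/* Decode the ECX value returned by CPUID leaf 0xD, sub-leaf 1. */
static XssCetSupport decode_xss_cet(uint32_t ecx)
{
    XssCetSupport s;

    s.cet_u = (ecx >> XSS_CET_U_BIT) & 1u;
    s.cet_s = (ecx >> XSS_CET_S_BIT) & 1u;
    return s;
}

/* For sub-leaf n >= 2, ECX bit 0 set means component n is an XSS
 * (supervisor) component; its EBX offset is then reported as 0. */
static bool xsave_subleaf_is_supervisor(uint32_t ecx_subleaf_n)
{
    return ecx_subleaf_n & 1u;
}
```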

[Qemu-devel] [PATCH v3 2/5] Add CET SHSTK and IBT CPUID feature-word definitions.

2019-02-25 Thread Yang Weijiang

XSS[bit 11] and XSS[bit 12] correspond to CET
user mode area and supervisor mode area respectively.

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 37 +++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f81d35e1f9..f6c7bdf6fe 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1018,7 +1018,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 .type = CPUID_FEATURE_WORD,
 .feat_names = {
 NULL, "avx512vbmi", "umip", "pku",
-NULL /* ospke */, NULL, "avx512vbmi2", NULL,
+NULL /* ospke */, NULL, "avx512vbmi2", "shstk",
 "gfni", "vaes", "vpclmulqdq", "avx512vnni",
 "avx512bitalg", NULL, "avx512-vpopcntdq", NULL,
 "la57", NULL, NULL, NULL,
@@ -1041,7 +1041,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, "pconfig", NULL,
-NULL, NULL, NULL, NULL,
+"ibt", NULL, NULL, NULL,
 NULL, NULL, "spec-ctrl", NULL,
 NULL, "arch-capabilities", NULL, "ssbd",
 },
@@ -1162,6 +1162,25 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
 }
 },
 },
+/* Below are CET supervisor xsave features */
+[FEAT_XSAVE_SV_LO] = {
+.type = CPUID_FEATURE_WORD,
+.cpuid = {
+.eax = 0xD,
+.needs_ecx = true,
+.ecx = 1,
+.reg = R_ECX,
+},
+},
+[FEAT_XSAVE_SV_HI] = {
+.type = CPUID_FEATURE_WORD,
+.cpuid = {
+.eax = 0xD,
+.needs_ecx = true,
+.ecx = 1,
+.reg = R_EDX
+},
+}
 };
 
 typedef struct X86RegisterInfo32 {
@@ -1233,6 +1252,14 @@ static const ExtSaveArea x86_ext_save_areas[] = {
   { .feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_PKU,
 .offset = offsetof(X86XSaveArea, pkru_state),
 .size = sizeof(XSavePKRU) },
+[XSTATE_CET_U_BIT] = {
+.feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_CET_SHSTK,
+.offset = 0 /*supervisor mode component, offset = 0 */,
+.size = sizeof(XSaveCETU) },
+[XSTATE_CET_S_BIT] = {
+.feature = FEAT_7_0_ECX, .bits = CPUID_7_0_ECX_CET_SHSTK,
+.offset = 0 /*supervisor mode component, offset = 0 */,
+.size = sizeof(XSaveCETS) },
 };
 
 static uint32_t xsave_area_size(uint64_t mask)
@@ -1243,6 +1270,9 @@ static uint32_t xsave_area_size(uint64_t mask)
 for (i = 0; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
 const ExtSaveArea *esa = &x86_ext_save_areas[i];
 if ((mask >> i) & 1) {
+if (i >= 2 && !esa->offset) {
+continue;
+}
 ret = MAX(ret, esa->offset + esa->size);
 }
 }
@@ -4657,6 +4687,9 @@ static void x86_cpu_reset(CPUState *s)
 }
 for (i = 2; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
 const ExtSaveArea *esa = &x86_ext_save_areas[i];
+if (!esa->offset) {
+continue;
+}
 if (env->features[esa->feature] & esa->bits) {
 xcr0 |= 1ull << i;
 }
-- 
2.17.1
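The two offset checks added above rely on one convention. An `x86_ext_save_areas[]` entry at index 2 or higher with offset 0 is a supervisor (XSS) component that has no place in the standard XSAVE layout.

The sketch below reproduces that convention on a hypothetical, trimmed-down table. It uses 576 bytes for the legacy region plus header, 256 for AVX, 16 for CET_U and 24 for CET_S. It is an illustration, not QEMU code, and `initial_xcr0()` simplifies the per-feature check in `x86_cpu_reset()` to a plain bit mask.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef struct {
    uint32_t offset;  /* standard-format offset; 0 marks an XSS component */
    uint32_t size;
} SaveArea;

/* Hypothetical subset of the table; unlisted entries stay {0, 0}. */
static const SaveArea areas[13] = {
    [0]  = { 0,   576 },  /* x87 legacy region + XSAVE header */
    [1]  = { 0,   576 },  /* SSE, shares the legacy region */
    [2]  = { 576, 256 },  /* AVX upper halves of YMM */
    [11] = { 0,   16 },   /* CET_U (XSaveCETU) */
    [12] = { 0,   24 },   /* CET_S (XSaveCETS) */
};

/* Like the patched xsave_area_size(): XSS components are skipped. */
static uint32_t standard_area_size(uint64_t mask)
{
    uint32_t ret = 0;

    for (unsigned i = 0; i < ARRAY_SIZE(areas); i++) {
        if (!((mask >> i) & 1)) {
            continue;
        }
        if (i >= 2 && areas[i].offset == 0) {
            continue;
        }
        if (areas[i].offset + areas[i].size > ret) {
            ret = areas[i].offset + areas[i].size;
        }
    }
    return ret;
}

/* Simplified form of the patched loop in x86_cpu_reset(): XSS components
 * never enter XCR0. Bit 0 (x87) is always set, as XCR0 requires. */
static uint64_t initial_xcr0(uint64_t enabled)
{
    uint64_t xcr0 = 1;

    for (unsigned i = 2; i < ARRAY_SIZE(areas); i++) {
        if (areas[i].offset == 0) {
            continue;
        }
        if ((enabled >> i) & 1) {
            xcr0 |= 1ULL << i;
        }
    }
    return xcr0;
}
```

With FP, SSE and AVX enabled the standard size is 576 + 256 = 832 bytes. Adding the CET bits to the mask leaves it unchanged.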

Re: [Qemu-devel] [QEMU-PPC] [PATCH 1/4] target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER

2019-02-25 Thread Suraj Jitindar Singh

On Tue, 2019-02-26 at 14:39 +1100, David Gibson wrote:
> On Tue, Feb 26, 2019 at 02:05:28PM +1100, Suraj Jitindar Singh wrote:
> > Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
> > availability and size of the large decrementer made available to
> > the
> > guest.
> > 
> > Signed-off-by: Suraj Jitindar Singh 
> > ---
> >  hw/ppc/spapr.c |  2 ++
> >  hw/ppc/spapr_caps.c| 45 +
> >  include/hw/ppc/spapr.h |  5 -
> >  3 files changed, 51 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index b6a571b6f1..acf62a2b9f 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2077,6 +2077,7 @@ static const VMStateDescription vmstate_spapr = {
> >  &vmstate_spapr_irq_map,
> >  &vmstate_spapr_cap_nested_kvm_hv,
> >  &vmstate_spapr_dtb,
> > +&vmstate_spapr_cap_large_decr,
> >  NULL
> >  }
> >  };
> > @@ -4288,6 +4289,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
> >  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
> >  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
> >  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> > +smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 0;
> 
> This looks basically fine, but the name kind of suggests it's a
> boolean, whereas it's actually a number of bits.  I wonder if just
> calling it "decrementer bits" would be clearer, with it defaulting to
> 32.

Yes, except there's a difference between a decrementer with 32 bits and
a large decrementer with 32 bits...

SPAPR_CAP_LARGE_DECR_NR_BITS?

> 
> >  spapr_caps_add_properties(smc, &error_abort);
> >  smc->irq = &spapr_irq_xics;
> >  smc->dr_phb_enabled = true;
> > diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> > index 64f98ae68d..1545a02729 100644
> > --- a/hw/ppc/spapr_caps.c
> > +++ b/hw/ppc/spapr_caps.c
> > @@ -182,6 +182,34 @@ static void spapr_cap_set_pagesize(Object *obj, Visitor *v, const char *name,
> >  spapr->eff.caps[cap->index] = val;
> >  }
> >  
> > +static void spapr_cap_get_uint8(Object *obj, Visitor *v, const char *name,
> > +void *opaque, Error **errp)
> > +{
> > +sPAPRCapabilityInfo *cap = opaque;
> > +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > +uint8_t val = spapr_get_cap(spapr, cap->index);
> > +
> > +visit_type_uint8(v, name, &val, errp);
> > +}
> > +
> > +static void spapr_cap_set_uint8(Object *obj, Visitor *v, const char *name,
> > +void *opaque, Error **errp)
> > +{
> > +sPAPRCapabilityInfo *cap = opaque;
> > +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> > +Error *local_err = NULL;
> > +uint8_t val;
> > +
> > +visit_type_uint8(v, name, &val, &local_err);
> > +if (local_err) {
> > +error_propagate(errp, local_err);
> > +return;
> > +}
> > +
> > +spapr->cmd_line_caps[cap->index] = true;
> > +spapr->eff.caps[cap->index] = val;
> > +}
> > +
> >  static void cap_htm_apply(sPAPRMachineState *spapr, uint8_t val, Error **errp)
> >  {
> >  if (!val) {
> > @@ -390,6 +418,13 @@ static void cap_nested_kvm_hv_apply(sPAPRMachineState *spapr,
> >  }
> >  }
> >  
> > +static void cap_large_decr_apply(sPAPRMachineState *spapr,
> > + uint8_t val, Error **errp)
> > +{
> > +if (val)
> > +error_setg(errp, "No large decrementer support, try cap-large-decr=0");
> > +}
> > +
> >  sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >  [SPAPR_CAP_HTM] = {
> >  .name = "htm",
> > @@ -468,6 +503,15 @@ sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
> >  .type = "bool",
> >  .apply = cap_nested_kvm_hv_apply,
> >  },
> > +[SPAPR_CAP_LARGE_DECREMENTER] = {
> > +.name = "large-decr",
> > +.description = "Size of Large Decrementer for the Guest (bits) 0=disabled",
> > +.index = SPAPR_CAP_LARGE_DECREMENTER,
> > +.get = spapr_cap_get_uint8,
> > +.set = spapr_cap_set_uint8,
> > +.type = "int",
> > +.apply = cap_large_decr_apply,
> > +},
> >  };
> >  
> >  static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
> > @@ -596,6 +640,7 @@ SPAPR_CAP_MIG_STATE(cfpc, SPAPR_CAP_CFPC);
> >  SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
> >  SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
> >  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> > +SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
> >  
> >  void spapr_caps_init(sPAPRMachineState *spapr)
> >  {
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 59073a7579..8efc5e0779 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -74,8 +74,10 @@ typedef enum {
> >  #define SPAPR_CAP_HPT_MAXPAGESIZE   0x06
> >  /* Nested KVM-HV */
> >

[Qemu-devel] [PATCH v3 3/5] Add helper functions for CPUID xsave area size calculation.

2019-02-25 Thread Yang Weijiang

These functions are called when returning CPUID xsave area
size information.

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index f6c7bdf6fe..d8c36e0f2f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1284,12 +1284,34 @@ static inline bool accel_uses_host_cpuid(void)
 return kvm_enabled() || hvf_enabled();
 }
 
+static uint32_t xsave_area_size_compacted(uint64_t mask)
+{
+int i;
+uint64_t ret = 0;
+uint32_t offset;
+
+for (i = 0; i < ARRAY_SIZE(x86_ext_save_areas); i++) {
+const ExtSaveArea *esa = &x86_ext_save_areas[i];
+offset = i >= 2 ? ret : esa->offset;
+if ((mask >> i) & 1) {
+ret = MAX(ret, offset + esa->size);
+}
+}
+return ret;
+}
+
 static inline uint64_t x86_cpu_xsave_components(X86CPU *cpu)
 {
 return ((uint64_t)cpu->env.features[FEAT_XSAVE_COMP_HI]) << 32 |
cpu->env.features[FEAT_XSAVE_COMP_LO];
 }
 
+static inline uint64_t x86_cpu_xsave_sv_components(X86CPU *cpu)
+{
+return ((uint64_t)cpu->env.features[FEAT_XSAVE_SV_HI]) << 32 |
+   cpu->env.features[FEAT_XSAVE_SV_LO];
+}
+
 const char *get_register_name_32(unsigned int reg)
 {
 if (reg >= CPU_NB_REGS32) {
@@ -4919,8 +4941,10 @@ static void x86_cpu_enable_xsave_components(X86CPU *cpu)
 }
 }
 
-env->features[FEAT_XSAVE_COMP_LO] = mask;
+env->features[FEAT_XSAVE_COMP_LO] = mask & CPUID_XSTATE_USER_MASK;
 env->features[FEAT_XSAVE_COMP_HI] = mask >> 32;
+env->features[FEAT_XSAVE_SV_LO] = mask & CPUID_XSTATE_KERNEL_MASK;
+env->features[FEAT_XSAVE_SV_HI] = mask >> 32;
 }
 
 /* Steps involved on loading and filtering CPUID data
-- 
2.17.1
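The compacted size computed by `xsave_area_size_compacted()` can be worked through by hand. With FP, SSE, AVX, CET_U and CET_S enabled it comes to 576 + 256 + 16 + 24 = 872 bytes.

A sketch of that accumulation, plus the XCR0/XSS split done in `x86_cpu_enable_xsave_components()`, is below. The component sizes are a hypothetical subset. The mask values are the expansions of `CPUID_XSTATE_USER_MASK` and `CPUID_XSTATE_KERNEL_MASK` from patch 1/5.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Expansions of CPUID_XSTATE_USER_MASK / CPUID_XSTATE_KERNEL_MASK */
#define USER_MASK   0x2ffULL   /* bits 0-7 and 9 (PKRU) */
#define KERNEL_MASK 0x1800ULL  /* bits 11 (CET_U) and 12 (CET_S) */

/* Hypothetical component sizes, indexed by XSAVE feature bit. */
static const uint32_t comp_size[13] = {
    [0] = 576, [1] = 576,  /* legacy region + XSAVE header, at offset 0 */
    [2] = 256,             /* AVX */
    [11] = 16,             /* CET_U */
    [12] = 24,             /* CET_S */
};

/* Same accumulation as xsave_area_size_compacted(): the first two
 * components sit at offset 0, every later one is appended at the
 * current end. Like the patch, this ignores the 64-byte alignment
 * that CPUID.(EAX=0xD,ECX=i):ECX[1] can request. */
static uint32_t compacted_size(uint64_t mask)
{
    uint32_t ret = 0;

    for (unsigned i = 0; i < ARRAY_SIZE(comp_size); i++) {
        if (!((mask >> i) & 1)) {
            continue;
        }
        uint32_t offset = i >= 2 ? ret : 0;
        if (offset + comp_size[i] > ret) {
            ret = offset + comp_size[i];
        }
    }
    return ret;
}

/* Split the low 32 bits of a component mask into the XCR0 part and
 * the XSS part, as the patched x86_cpu_enable_xsave_components() does. */
static void split_mask(uint64_t mask, uint32_t *xcr0_lo, uint32_t *xss_lo)
{
    *xcr0_lo = (uint32_t)(mask & USER_MASK);
    *xss_lo = (uint32_t)(mask & KERNEL_MASK);
}
```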

[Qemu-devel] [PATCH v3 5/5] Add CET MSR save/restore support for migration

2019-02-25 Thread Yang Weijiang

To support features such as live migration, the CET runtime MSRs
need to be saved on the source machine and restored on the
destination machine. This patch saves and restores the CET_U,
CET_S, PL0_SSP, PL3_SSP and SSP_TABL_ADDR MSRs.

Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.h |  12 +
 target/i386/kvm.c |  33 ++
 target/i386/machine.c | 100 ++
 3 files changed, 145 insertions(+)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f3f724d8e6..f350684895 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -460,6 +460,12 @@ typedef enum X86Seg {
 #define MSR_IA32_BNDCFGS0x0d90
 #define MSR_IA32_XSS0x0da0
 
+#define MSR_IA32_U_CET  0x6a0
+#define MSR_IA32_S_CET  0x6a2
+#define MSR_IA32_PL0_SSP0x6a4
+#define MSR_IA32_PL3_SSP0x6a7
+#define MSR_IA32_INTR_SSP_TABL  0x6a8
+
 #define XSTATE_FP_BIT   0
 #define XSTATE_SSE_BIT  1
 #define XSTATE_YMM_BIT  2
@@ -1325,6 +1331,12 @@ typedef struct CPUX86State {
 
 uintptr_t retaddr;
 
+uint64_t u_cet;
+uint64_t s_cet;
+uint64_t pl0_ssp;
+uint64_t pl3_ssp;
+uint64_t ssp_tabl_addr;
+
 /* Fields up to this point are cleared by a CPU reset */
 struct {} end_reset_fields;
 
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index f524e7d929..2ab3c977a4 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -63,6 +63,8 @@
 /* A 4096-byte buffer can hold the 8-byte kvm_msrs header, plus
  * 255 kvm_msr_entry structs */
 #define MSR_BUF_SIZE 4096
+#define HAS_CET_CAP(env)  ((env)->features[FEAT_7_0_ECX] & CPUID_7_0_ECX_CET_SHSTK || \
+   (env)->features[FEAT_7_0_EDX] & (1U << 20))
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_INFO(SET_TSS_ADDR),
@@ -2197,6 +2199,14 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 }
 }
 
+if (HAS_CET_CAP(env)) {
+kvm_msr_entry_add(cpu, MSR_IA32_U_CET, env->u_cet);
+kvm_msr_entry_add(cpu, MSR_IA32_S_CET, env->s_cet);
+kvm_msr_entry_add(cpu, MSR_IA32_PL0_SSP, env->pl0_ssp);
+kvm_msr_entry_add(cpu, MSR_IA32_PL3_SSP, env->pl3_ssp);
+kvm_msr_entry_add(cpu, MSR_IA32_INTR_SSP_TABL, env->ssp_tabl_addr);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -2516,6 +2526,14 @@ static int kvm_get_msrs(X86CPU *cpu)
 }
 }
 
+if (HAS_CET_CAP(env)) {
+kvm_msr_entry_add(cpu, MSR_IA32_U_CET, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_S_CET, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_PL0_SSP, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_PL3_SSP, 0);
+kvm_msr_entry_add(cpu, MSR_IA32_INTR_SSP_TABL, 0);
+}
+
 ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MSRS, cpu->kvm_msr_buf);
 if (ret < 0) {
 return ret;
@@ -2789,6 +2807,21 @@ static int kvm_get_msrs(X86CPU *cpu)
 case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 env->msr_rtit_addrs[index - MSR_IA32_RTIT_ADDR0_A] = msrs[i].data;
 break;
+case MSR_IA32_U_CET:
+env->u_cet = msrs[i].data;
+break;
+case MSR_IA32_S_CET:
+env->s_cet = msrs[i].data;
+break;
+case MSR_IA32_PL0_SSP:
+env->pl0_ssp = msrs[i].data;
+break;
+case MSR_IA32_PL3_SSP:
+env->pl3_ssp = msrs[i].data;
+break;
+case MSR_IA32_INTR_SSP_TABL:
+env->ssp_tabl_addr = msrs[i].data;
+break;
 }
 }
 
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 225b5d433b..5f8a12ca30 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -810,6 +810,101 @@ static const VMStateDescription vmstate_xss = {
 }
 };
 
+static bool u_cet_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->u_cet != 0;
+}
+
+static const VMStateDescription vmstate_u_cet = {
+.name = "cpu/u_cet",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = u_cet_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.u_cet, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool s_cet_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->s_cet != 0;
+}
+
+static const VMStateDescription vmstate_s_cet = {
+.name = "cpu/s_cet",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = s_cet_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT64(env.s_cet, X86CPU),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static bool pl0_ssp_needed(void *opaque)
+{
+X86CPU *cpu = opaque;
+CPUX86State *env = &cpu->env;
+
+return env->pl0_ssp != 0;
+}
+
+static const VMStateDescription vmstate_pl0_ssp = {
+.name = "cpu/pl0_ssp",
+.version_id = 1,
+

[Qemu-devel] [PATCH v3 1/5] Add CET xsaves/xrstors related macros and structures.

2019-02-25 Thread Yang Weijiang

CET protection in user mode and kernel mode relies on
specific MSRs, whose contents are saved/restored
automatically by the xsaves/xrstors instructions.

Signed-off-by: Zhang Yi 
Signed-off-by: Yang Weijiang 
---
 target/i386/cpu.h | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 9c52d0cbeb..f3f724d8e6 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -469,6 +469,9 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_BIT6
 #define XSTATE_Hi16_ZMM_BIT 7
 #define XSTATE_PKRU_BIT 9
+#define XSTATE_RESERVED_BIT 10
+#define XSTATE_CET_U_BIT11
+#define XSTATE_CET_S_BIT12
 
 #define XSTATE_FP_MASK  (1ULL << XSTATE_FP_BIT)
 #define XSTATE_SSE_MASK (1ULL << XSTATE_SSE_BIT)
@@ -479,6 +482,19 @@ typedef enum X86Seg {
 #define XSTATE_ZMM_Hi256_MASK   (1ULL << XSTATE_ZMM_Hi256_BIT)
 #define XSTATE_Hi16_ZMM_MASK(1ULL << XSTATE_Hi16_ZMM_BIT)
 #define XSTATE_PKRU_MASK(1ULL << XSTATE_PKRU_BIT)
+#define XSTATE_RESERVED_MASK(1ULL << XSTATE_RESERVED_BIT)
+#define XSTATE_CET_U_MASK   (1ULL << XSTATE_CET_U_BIT)
+#define XSTATE_CET_S_MASK   (1ULL << XSTATE_CET_S_BIT)
+
+/* CPUID feature bits available in XCR0 */
+#define CPUID_XSTATE_USER_MASK  (XSTATE_FP_MASK | XSTATE_SSE_MASK \
+| XSTATE_YMM_MASK | XSTATE_BNDREGS_MASK \
+| XSTATE_BNDCSR_MASK | XSTATE_OPMASK_MASK \
+| XSTATE_ZMM_Hi256_MASK \
+| XSTATE_Hi16_ZMM_MASK | XSTATE_PKRU_MASK)
+
+/* CPUID feature bits available in XSS */
+#define CPUID_XSTATE_KERNEL_MASK(XSTATE_CET_U_MASK | XSTATE_CET_S_MASK)
 
 /* CPUID feature words */
 typedef enum FeatureWord {
@@ -503,6 +519,8 @@ typedef enum FeatureWord {
 FEAT_XSAVE_COMP_LO, /* CPUID[EAX=0xd,ECX=0].EAX */
 FEAT_XSAVE_COMP_HI, /* CPUID[EAX=0xd,ECX=0].EDX */
 FEAT_ARCH_CAPABILITIES,
+FEAT_XSAVE_SV_LO,   /* CPUID[EAX=0xd,ECX=1].ECX */
+FEAT_XSAVE_SV_HI,   /* CPUID[EAX=0xd,ECX=1].EDX */
 FEATURE_WORDS,
 } FeatureWord;
 
@@ -687,7 +705,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_ECX_LA57 (1U << 16)
 #define CPUID_7_0_ECX_RDPID(1U << 22)
 #define CPUID_7_0_ECX_CLDEMOTE (1U << 25)  /* CLDEMOTE Instruction */
-
+#define CPUID_7_0_ECX_CET_SHSTK (1U << 7)  /* CET SHSTK feature bit */
 #define CPUID_7_0_EDX_AVX512_4VNNIW (1U << 2) /* AVX512 Neural Network Instructions */
 #define CPUID_7_0_EDX_AVX512_4FMAPS (1U << 3) /* AVX512 Multiply Accumulation Single Precision */
 #define CPUID_7_0_EDX_PCONFIG (1U << 18)   /* Platform Configuration */
@@ -1021,6 +1039,19 @@ typedef struct XSavePKRU {
 uint32_t padding;
 } XSavePKRU;
 
+/* Ext. save area 11: User mode CET state */
+typedef struct XSaveCETU {
+uint64_t u_cet;
+uint64_t user_ssp;
+} XSaveCETU;
+
+/* Ext. save area 12: Supervisor mode CET state */
+typedef struct XSaveCETS {
+uint64_t kernel_ssp;
+uint64_t pl1_ssp;
+uint64_t pl2_ssp;
+} XSaveCETS;
+
 typedef struct X86XSaveArea {
 X86LegacyXSaveArea legacy;
 X86XSaveHeader header;
@@ -1039,6 +1070,9 @@ typedef struct X86XSaveArea {
 XSaveHi16_ZMM hi16_zmm_state;
 /* PKRU State: */
 XSavePKRU pkru_state;
+/* CET State: */
+XSaveCETU cet_u;
+XSaveCETS cet_s;
 } X86XSaveArea;
 
 QEMU_BUILD_BUG_ON(offsetof(X86XSaveArea, avx_state) != 0x240);
-- 
2.17.1
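The sizes of the two new save-area structures, and the requirement that the XCR0 and XSS component masks never overlap, can be pinned down at compile time.

A minimal C11 sketch follows. It copies the structures from the patch and writes `CPUID_XSTATE_USER_MASK` out as a literal for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the structures added above. */
typedef struct XSaveCETU {
    uint64_t u_cet;
    uint64_t user_ssp;
} XSaveCETU;

typedef struct XSaveCETS {
    uint64_t kernel_ssp;
    uint64_t pl1_ssp;
    uint64_t pl2_ssp;
} XSaveCETS;

/* Compile-time checks on the component sizes. */
_Static_assert(sizeof(XSaveCETU) == 16, "CET_U area is two 64-bit fields");
_Static_assert(sizeof(XSaveCETS) == 24, "CET_S area is three 64-bit fields");

#define XSTATE_CET_U_MASK (1ULL << 11)
#define XSTATE_CET_S_MASK (1ULL << 12)
#define CPUID_XSTATE_KERNEL_MASK (XSTATE_CET_U_MASK | XSTATE_CET_S_MASK)

/* CPUID_XSTATE_USER_MASK written out: FP..Hi16_ZMM (bits 0-7) plus PKRU (bit 9). */
#define CPUID_XSTATE_USER_MASK (0xffULL | (1ULL << 9))

/* Each component is either a user (XCR0) or a supervisor (XSS) one. */
_Static_assert((CPUID_XSTATE_USER_MASK & CPUID_XSTATE_KERNEL_MASK) == 0,
               "user and supervisor masks must be disjoint");
```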

[Qemu-devel] (no subject)

2019-02-25 Thread Yang Weijiang

Subject: [Qemu-devel][PATCH v3 0/5] This patch-set is to enable Guest CET support.

Control-flow Enforcement Technology (CET) provides protection against
return/jump-oriented programming (ROP/JOP) attacks. This patch-set is
required to expose the capability to KVM guest OSes: it enables CET-related
CPUID reporting, xsaves/xrstors support and live migration in QEMU.

Changelog:

 v3:
 - Add CET MSR save/restore support for live-migration.

 v2:
 - In CPUID.(EAX=d, ECX=1), set return ECX[n] = 0 if bit n corresponds
   to a bit in MSR_IA32_XSS.
 - In CPUID.(EAX=d, ECX=n), set return ECX = 1 if bit n corresponds
   to a bit in MSR_IA32_XSS.
 - Skip the Supervisor mode xsave component when calculating the User mode
   xsave component size in xsave_area_size() and x86_cpu_reset().

Yang Weijiang (5):
  Add CET xsaves/xrstors related macros and structures.
  Add CET SHSTK and IBT CPUID feature-word definitions.
  Add helper functions for CPUID xsave area size calculation.
  Report CPUID xsave area support for CET.
  Add CET MSR save/restore support for migration

 target/i386/cpu.c |  73 --
 target/i386/cpu.h |  48 +++-
 target/i386/kvm.c |  27 
 target/i386/machine.c | 100 ++
 4 files changed, 244 insertions(+), 4 deletions(-)

-- 
2.17.1

Re: [Qemu-devel] [PULL 08/11] authz: add QAuthZList object type for an access control list

2019-02-25 Thread Markus Armbruster

Eric Blake  writes:

> I missed reviewing this before the pull request, so comments here are
> best for a followup patch:

I procrastinated, same result.  My apologies.

Followup or quick respin is up to you.  I'd respin as long as the
changes are trivial.

> On 2/25/19 6:31 AM, Daniel P. Berrangé wrote:
>> From: "Daniel P. Berrange" 
>> 
>> Add a QAuthZList object type that implements the QAuthZ interface. This
>> built-in implementation maintains a trivial access control list with a
>> sequence of match rules and a final default policy. This replicates the
>> functionality currently provided by the qemu_acl module.
>> 
>
>> Reviewed-by: Marc-André Lureau 
>> Reviewed-by: Philippe Mathieu-Daudé 
>> Tested-by: Philippe Mathieu-Daudé 
>> Signed-off-by: Daniel P. Berrange 
>> ---
>
>> +++ b/qapi/Makefile.objs
>> @@ -7,7 +7,7 @@ util-obj-y += qapi-util.o
>>  
>>  QAPI_COMMON_MODULES = block-core block char common crypto introspect
>>  QAPI_COMMON_MODULES += job migration misc net rdma rocker run-state
>> -QAPI_COMMON_MODULES += sockets tpm trace transaction ui
>> +QAPI_COMMON_MODULES += sockets tpm trace transaction ui authz
>
> Let's keep this list alphabetically sorted (authz before block-core).

Yes, please.

>> +++ b/qapi/authz.json
>> @@ -0,0 +1,58 @@
>> +# -*- Mode: Python -*-
>> +#
>> +# QAPI authz definitions
>> +
>> +##
>> +# @QAuthZListPolicy:
>> +#
>> +# The authorization policy result
>> +#
>> +# @deny: deny access
>> +# @allow: allow access
>> +#
>> +# Since: 4.0
>> +##
>> +{ 'enum': 'QAuthZListPolicy',
>> +  'prefix': 'QAUTHZ_LIST_POLICY',
>> +  'data': ['deny', 'allow']}
>> +
>> +##
>> +# @QAuthZListFormat:
>> +#
>> +# The authorization policy result

Pasto?

>> +#
>> +# @exact: an exact string match
>> +# @glob: string with ? and * shell wildcard support
>
> Does it actually use glob() (in which case it also has [] glob support?)
>
>> +#
>> +# Since: 4.0
>> +##
>> +{ 'enum': 'QAuthZListFormat',
>> +  'prefix': 'QAUTHZ_LIST_FORMAT',
>> +  'data': ['exact', 'glob']}
>> +
>> +##
>> +# @QAuthZListRule:
>> +#
>> +# A single authorization rule.
>> +#
>> +# @match: a glob to match against a user identity
>
> Should this read 'a string or glob to match...' since...
>
>> +# @policy: the result to return if @match evaluates to true
>> +# @format: (optional) the format of the @match rule (default 'exact')
>
> ...format controls which of the two styles it is interpreted as?  The
> use of '(optional)' is not required in the current QAPI doc generator,
> and in fact results in redundant output.

Please drop (optional).

>
>> +#
>> +# Since: 4.0
>> +##
>> +{ 'struct': 'QAuthZListRule',
>> +  'data': {'match': 'str',
>> +   'policy': 'QAuthZListPolicy',
>> +   '*format': 'QAuthZListFormat'}}
>> +
>> +##
>> +# @QAuthZListRuleListHack:
>> +#
>> +# Not exposed via QMP; hack to generate QAuthZListRuleList
>> +# for use internally by the code.
>> +#
>> +# Since: 4.0
>> +##
>> +{ 'struct': 'QAuthZListRuleListHack',
>> +  'data': { 'unused': ['QAuthZListRule'] } }
>
> We keep on encountering these hacks; someday it would be nice to teach
> the QAPI generator a nicer way to do this. But not your problem.
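To make the semantics under discussion concrete, here is a sketch of evaluating a rule list like `QAuthZListRule` in plain C. Two points are assumptions, not facts taken from the patch:

- Rules are checked in order and the first match wins, as in the qemu_acl module this replaces.
- 'glob' is implemented with POSIX `fnmatch()`. Relevant to Eric's question, `fnmatch()` also honours `[...]` bracket expressions, not just `?` and `*`.

The types and `authz_list_check()` are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C mirror of the QAPI types in qapi/authz.json. */
typedef enum { POLICY_DENY, POLICY_ALLOW } Policy;
typedef enum { FORMAT_EXACT, FORMAT_GLOB } Format;

typedef struct {
    const char *match;
    Policy policy;
    Format format;  /* FORMAT_EXACT when the optional member is absent */
} Rule;

/* Return the policy of the first matching rule, else the default. */
static Policy authz_list_check(const Rule *rules, size_t n,
                               Policy default_policy, const char *identity)
{
    for (size_t i = 0; i < n; i++) {
        int matched = rules[i].format == FORMAT_GLOB
            ? fnmatch(rules[i].match, identity, 0) == 0
            : strcmp(rules[i].match, identity) == 0;
        if (matched) {
            return rules[i].policy;
        }
    }
    return default_policy;
}
```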

[Qemu-devel] [PATCH 0/3] Migration/colo.c: Fix upstream bugs when occur failover

2019-02-25 Thread Zhang Chen

From: Zhang Chen 

Fix three bugs after COLO failover.


Zhang Chen (3):
  Migration/colo.c: Fix double close bug when occur COLO failover
  Migration/colo.c: Fix COLO failover status error
  Migration/colo.c: Make COLO node running after failover

 migration/colo.c  | 9 +
 migration/migration.c | 3 +++
 2 files changed, 8 insertions(+), 4 deletions(-)

-- 
2.17.GIT

[Qemu-devel] [PATCH 1/3] Migration/colo.c: Fix double close bug when occur COLO failover

2019-02-25 Thread Zhang Chen

From: Zhang Chen 

migration_incoming_state_destroy() also closes mis->to_src_file if it is
set, so on COLO failover the file was closed twice. Set the pointer to
NULL after closing it.

Signed-off-by: Zhang Chen 
---
 migration/colo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/colo.c b/migration/colo.c
index 398b239d1c..a916dc178c 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -872,6 +872,7 @@ out:
 /* Must be called after failover BH is completed */
 if (mis->to_src_file) {
 qemu_fclose(mis->to_src_file);
+mis->to_src_file = NULL;
 }
 migration_incoming_disable_colo();
 
-- 
2.17.GIT
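The fix is an instance of the close-and-clear idiom. Whoever closes a shared handle also resets the pointer, so any later cleanup path that tests the same pointer, as `migration_incoming_state_destroy()` does, becomes a no-op.

A stand-alone sketch with stdio, where `close_and_clear()` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Close *fp if it is open and reset the pointer, so a second call
 * (or another cleanup path checking the pointer) does nothing. */
static void close_and_clear(FILE **fp)
{
    if (*fp) {
        fclose(*fp);
        *fp = NULL;
    }
}
```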

[Qemu-devel] [PATCH 2/3] Migration/colo.c: Fix COLO failover status error

2019-02-25 Thread Zhang Chen

From: Zhang Chen 

When COLO failover finishes, the status is FAILOVER_STATUS_COMPLETED.
The original code wrongly checked for FAILOVER_STATUS_REQUIRE instead.

Signed-off-by: Zhang Chen 
---
 migration/colo.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index a916dc178c..a13acac192 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -121,6 +121,7 @@ static void secondary_vm_do_failover(void)
 }
 /* Notify COLO incoming thread that failover work is finished */
 qemu_sem_post(&mis->colo_incoming_sem);
+
 /* For Secondary VM, jump to incoming co */
 if (mis->migration_incoming_co) {
 qemu_coroutine_enter(mis->migration_incoming_co);
@@ -262,7 +263,7 @@ COLOStatus *qmp_query_colo_status(Error **errp)
 case FAILOVER_STATUS_NONE:
 s->reason = COLO_EXIT_REASON_NONE;
 break;
-case FAILOVER_STATUS_REQUIRE:
+case FAILOVER_STATUS_COMPLETED:
 s->reason = COLO_EXIT_REASON_REQUEST;
 break;
 default:
@@ -582,7 +583,7 @@ out:
 qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
   COLO_EXIT_REASON_ERROR);
 break;
-case FAILOVER_STATUS_REQUIRE:
+case FAILOVER_STATUS_COMPLETED:
 qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
   COLO_EXIT_REASON_REQUEST);
 break;
@@ -854,7 +855,7 @@ out:
 qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
   COLO_EXIT_REASON_ERROR);
 break;
-case FAILOVER_STATUS_REQUIRE:
+case FAILOVER_STATUS_COMPLETED:
 qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
   COLO_EXIT_REASON_REQUEST);
 break;
-- 
2.17.GIT
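A compact way to read the corrected mapping in `qmp_query_colo_status()`: only a failover that has actually finished counts as a user-requested exit.

The enums below are hypothetical stand-ins that reuse the QEMU names; their members and values are not taken from QEMU. The sketch also assumes that the `default:` branch, whose body is not shown in the hunk, reports an error.

```c
#include <assert.h>

/* Hypothetical stand-ins; the names follow QEMU, the values do not matter. */
typedef enum {
    FAILOVER_STATUS_NONE,
    FAILOVER_STATUS_REQUIRE,
    FAILOVER_STATUS_ACTIVE,
    FAILOVER_STATUS_COMPLETED,
} FailoverStatus;

typedef enum {
    COLO_EXIT_REASON_NONE,
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR,
} ColoExitReason;

/* Shape of the patched switch in qmp_query_colo_status(). */
static ColoExitReason colo_exit_reason(FailoverStatus st)
{
    switch (st) {
    case FAILOVER_STATUS_NONE:
        return COLO_EXIT_REASON_NONE;
    case FAILOVER_STATUS_COMPLETED:  /* previously FAILOVER_STATUS_REQUIRE */
        return COLO_EXIT_REASON_REQUEST;
    default:                         /* assumed: any other state is an error */
        return COLO_EXIT_REASON_ERROR;
    }
}
```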

[Qemu-devel] [PULL 47/50] hw/ppc: Use object_initialize_child for correct reference counting

2019-02-25 Thread David Gibson

From: Thomas Huth 

Both object_initialize() and object_property_add_child() increase
the reference counter of the new object, so one of the references has to be
dropped afterwards to get the reference counting right. Otherwise the child
object will not be properly cleaned up when the parent gets destroyed.
Use object_initialize_child() instead, which gets the reference counting
right.

Suggested-by: Eduardo Habkost 
Signed-off-by: Thomas Huth 
Message-Id: <1550748288-30598-1-git-send-email-th...@redhat.com>
Reviewed-by: Cédric Le Goater 
Signed-off-by: David Gibson 
---
 hw/intc/spapr_xive.c | 11 +--
 hw/ppc/pnv.c | 12 ++--
 hw/ppc/pnv_psi.c |  4 ++--
 hw/ppc/spapr.c   |  6 +++---
 4 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 06e3c9fdbf..e0e5cb5d8e 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -244,13 +244,12 @@ static void spapr_xive_instance_init(Object *obj)
 {
 sPAPRXive *xive = SPAPR_XIVE(obj);
 
-object_initialize(&xive->source, sizeof(xive->source), TYPE_XIVE_SOURCE);
-object_property_add_child(obj, "source", OBJECT(&xive->source), NULL);
+object_initialize_child(obj, "source", &xive->source, sizeof(xive->source),
+TYPE_XIVE_SOURCE, &error_abort, NULL);
 
-object_initialize(&xive->end_source, sizeof(xive->end_source),
-  TYPE_XIVE_END_SOURCE);
-object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
-  NULL);
+object_initialize_child(obj, "end_source", &xive->end_source,
+sizeof(xive->end_source), TYPE_XIVE_END_SOURCE,
+&error_abort, NULL);
 }
 
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index da540860a2..9e03e9c336 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -736,18 +736,18 @@ static void pnv_chip_power8_instance_init(Object *obj)
 {
 Pnv8Chip *chip8 = PNV8_CHIP(obj);
 
-object_initialize(&chip8->psi, sizeof(chip8->psi), TYPE_PNV_PSI);
-object_property_add_child(obj, "psi", OBJECT(&chip8->psi), NULL);
+object_initialize_child(obj, "psi",  &chip8->psi, sizeof(chip8->psi),
+TYPE_PNV_PSI, &error_abort, NULL);
 object_property_add_const_link(OBJECT(&chip8->psi), "xics",
OBJECT(qdev_get_machine()), &error_abort);
 
-object_initialize(&chip8->lpc, sizeof(chip8->lpc), TYPE_PNV_LPC);
-object_property_add_child(obj, "lpc", OBJECT(&chip8->lpc), NULL);
+object_initialize_child(obj, "lpc",  &chip8->lpc, sizeof(chip8->lpc),
+TYPE_PNV_LPC, &error_abort, NULL);
 object_property_add_const_link(OBJECT(&chip8->lpc), "psi",
OBJECT(&chip8->psi), &error_abort);
 
-object_initialize(&chip8->occ, sizeof(chip8->occ), TYPE_PNV_OCC);
-object_property_add_child(obj, "occ", OBJECT(&chip8->occ), NULL);
+object_initialize_child(obj, "occ",  &chip8->occ, sizeof(chip8->occ),
+TYPE_PNV_OCC, &error_abort, NULL);
 object_property_add_const_link(OBJECT(&chip8->occ), "psi",
OBJECT(&chip8->psi), &error_abort);
 }
diff --git a/hw/ppc/pnv_psi.c b/hw/ppc/pnv_psi.c
index 8ced095063..44bc0cbf58 100644
--- a/hw/ppc/pnv_psi.c
+++ b/hw/ppc/pnv_psi.c
@@ -444,8 +444,8 @@ static void pnv_psi_init(Object *obj)
 {
 PnvPsi *psi = PNV_PSI(obj);
 
-object_initialize(&psi->ics, sizeof(psi->ics), TYPE_ICS_SIMPLE);
-object_property_add_child(obj, "ics-psi", OBJECT(&psi->ics), NULL);
+object_initialize_child(obj, "ics-psi",  &psi->ics, sizeof(psi->ics),
+TYPE_ICS_SIMPLE, &error_abort, NULL);
 }
 
 static const uint8_t irq_to_xivr[] = {
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5d8b8c9e53..b6a571b6f1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1776,9 +1776,9 @@ static void spapr_create_nvram(sPAPRMachineState *spapr)
 
 static void spapr_rtc_create(sPAPRMachineState *spapr)
 {
-object_initialize(&spapr->rtc, sizeof(spapr->rtc), TYPE_SPAPR_RTC);
-object_property_add_child(OBJECT(spapr), "rtc", OBJECT(&spapr->rtc),
-  &error_fatal);
+object_initialize_child(OBJECT(spapr), "rtc",
+&spapr->rtc, sizeof(spapr->rtc), TYPE_SPAPR_RTC,
+&error_fatal, NULL);
 object_property_set_bool(OBJECT(&spapr->rtc), true, "realized",
   &error_fatal);
 object_property_add_alias(OBJECT(spapr), "rtc-time", OBJECT(&spapr->rtc),
-- 
2.20.1
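The reference-count arithmetic in the commit message can be modelled with a toy counter. This is not QOM: `ToyObject` and the `toy_*` functions are hypothetical.

Only the counting mirrors what the message describes:

- initialisation takes one reference;
- adding a child property takes another;
- the combined helper drops the extra one.

```c
#include <assert.h>

/* A toy reference-counted object; QOM's Object is far richer. */
typedef struct {
    int ref;
} ToyObject;

static void toy_initialize(ToyObject *o) { o->ref = 1; }  /* initial reference */
static void toy_add_child(ToyObject *c)  { c->ref++; }    /* parent's reference */
static void toy_unref(ToyObject *o)      { o->ref--; }

/* Combined helper: after linking the child to its parent, drop the
 * initial reference so the parent owns the only one. */
static void toy_initialize_child(ToyObject *c)
{
    toy_initialize(c);
    toy_add_child(c);
    toy_unref(c);
}
```

With the old two-call pattern, destroying the parent leaves the count at 1 and the child leaks. With the combined helper it drops to 0.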

[Qemu-devel] [PATCH 3/3] Migration/colo.c: Make COLO node running after failover

2019-02-25 Thread Zhang Chen

From: Zhang Chen 

Delay disabling COLO until process_incoming_migration_bh(), so that the
VM is started automatically after failover.

Signed-off-by: Zhang Chen 
---
 migration/colo.c  | 1 -
 migration/migration.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/colo.c b/migration/colo.c
index a13acac192..89325952c7 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -875,7 +875,6 @@ out:
 qemu_fclose(mis->to_src_file);
 mis->to_src_file = NULL;
 }
-migration_incoming_disable_colo();
 
 rcu_unregister_thread();
 return NULL;
diff --git a/migration/migration.c b/migration/migration.c
index 37e06b76dc..cec5f529c3 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -383,6 +383,9 @@ static void process_incoming_migration_bh(void *opaque)
 } else {
 runstate_set(RUN_STATE_PAUSED);
 }
+} else if (migration_incoming_colo_enabled()) {
+migration_incoming_disable_colo();
+vm_start();
 } else {
 runstate_set(global_state_get_runstate());
 }
-- 
2.17.GIT

[Qemu-devel] [PULL 41/50] spapr_pci: provide node start offset via spapr_populate_pci_dt()

2019-02-25 Thread David Gibson

From: Michael Roth 

PHB hotplug re-uses PHB device tree generation code and passes
it to a guest via RTAS. Doing this requires knowledge of where
exactly in the device tree the node describing the PHB begins.

Provide this via a new optional pointer that can be used to
store the PHB node's start offset.

Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Greg Kurz 
Message-Id: 
<155059671912.1466090.10891589403973703473.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  | 2 +-
 hw/ppc/spapr_pci.c  | 5 -
 include/hw/pci-host/spapr.h | 2 +-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fcda177090..76b3c15d59 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1312,7 +1312,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
 
 QLIST_FOREACH(phb, &spapr->phbs, list) {
 ret = spapr_populate_pci_dt(phb, PHANDLE_INTC, fdt,
-spapr->irq->nr_msis);
+spapr->irq->nr_msis, NULL);
 if (ret < 0) {
 error_report("couldn't setup PCI devices in fdt");
 exit(1);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index ede928b0bf..a0e1769439 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -2153,7 +2153,7 @@ static void spapr_phb_pci_enumerate(sPAPRPHBState *phb)
 }
 
 int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t intc_phandle, void *fdt,
-  uint32_t nr_msis)
+  uint32_t nr_msis, int *node_offset)
 {
 int bus_off, i, j, ret;
 gchar *nodename;
@@ -2208,6 +2208,9 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t intc_phandle, void *fdt,
 nodename = g_strdup_printf("pci@%" PRIx64, phb->buid);
 _FDT(bus_off = fdt_add_subnode(fdt, 0, nodename));
 g_free(nodename);
+if (node_offset) {
+*node_offset = bus_off;
+}
 
 /* Write PHB properties */
 _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci"));
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 4b0443f4cf..ab0e3a0a6f 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -113,7 +113,7 @@ static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin)
 }
 
 int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t intc_phandle, void *fdt,
-  uint32_t nr_msis);
+  uint32_t nr_msis, int *node_offset);
 
 void spapr_pci_rtas_init(void);
 
-- 
2.20.1
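The new `node_offset` argument follows the common C idiom for an optional out-parameter: callers that do not need the value pass NULL (as spapr_build_fdt() does in the hunk above), and the callee writes through the pointer only when it is non-NULL. A minimal sketch of that idiom, using a hypothetical `populate_node()` whose fake offset stands in for the value libfdt's fdt_add_subnode() would return:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for spapr_populate_pci_dt(): it "creates" a child
 * node under parent_offset and optionally reports where that node starts.
 * The offset arithmetic is fake; real code gets it from fdt_add_subnode().
 */
static int populate_node(int parent_offset, int *node_offset)
{
    int bus_off;

    assert(parent_offset >= 0);   /* sketch assumes a valid parent offset */
    bus_off = parent_offset + 1;  /* pretend the new node follows its parent */

    /* ... properties of the new node would be written at bus_off ... */

    /* Report the start offset only to callers that asked for it. */
    if (node_offset) {
        *node_offset = bus_off;
    }
    return 0;
}
```

The boot-time path keeps passing NULL, while a hotplug path can pass the address of a local and then keep adding properties at the returned offset.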

[Qemu-devel] [PULL 45/50] tests/device-plug: Add PHB unplug request test for spapr

2019-02-25 Thread David Gibson

From: Greg Kurz 

This can be tested the same way as PCI unplug. PHB unplug is not supported
on s390x or with x86 ACPI.

Signed-off-by: Greg Kurz 
Message-Id: 
<155059673939.1466090.14354001937819612724.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 tests/device-plug-test.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/tests/device-plug-test.c b/tests/device-plug-test.c
index 87593d9ecf..318e422d51 100644
--- a/tests/device-plug-test.c
+++ b/tests/device-plug-test.c
@@ -132,6 +132,20 @@ static void test_spapr_memory_unplug_request(void)
 qtest_quit(qtest);
 }
 
+static void test_spapr_phb_unplug_request(void)
+{
+QTestState *qtest;
+
+qtest = qtest_initf("-device spapr-pci-host-bridge,index=1,id=dev0");
+
+/* similar to test_pci_unplug_request */
+device_del_request(qtest, "dev0");
+system_reset(qtest);
+wait_device_deleted_event(qtest, "dev0");
+
+qtest_quit(qtest);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -156,6 +170,8 @@ int main(int argc, char **argv)
test_spapr_cpu_unplug_request);
 qtest_add_func("/device-plug/spapr-memory-unplug-request",
test_spapr_memory_unplug_request);
+qtest_add_func("/device-plug/spapr-phb-unplug-request",
+   test_spapr_phb_unplug_request);
 }
 
 return g_test_run();
-- 
2.20.1

[Qemu-devel] [PULL 34/50] xics: Write source state to KVM at claim time

2019-02-25 Thread David Gibson

From: Greg Kurz 

The pseries machine only uses LSIs to support legacy PCI devices. Every
PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
LSIs, later on at machine reset.

In order to support PHB hotplug, we need a way to tell KVM about the LSIs
that doesn't require a machine reset. An easy way to do that is to always
inform KVM when an interrupt is claimed, which really isn't a performance
path.

Signed-off-by: Greg Kurz 
Message-Id: 
<155059668360.1466090.5969630516627776426.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Reviewed-by: Cédric Le Goater 
Signed-off-by: David Gibson 
---
 hw/intc/xics.c|  4 +++
 hw/intc/xics_kvm.c| 74 +--
 include/hw/ppc/xics.h |  1 +
 3 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 767fdeb829..af7dc709ab 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -758,6 +758,10 @@ void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
 
 ics->irqs[srcno].flags |=
 lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
+
+if (kvm_irqchip_in_kernel()) {
+ics_set_kvm_state_one(ics, srcno);
+}
 }
 
 static void xics_register_types(void)
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index a00d0a7962..c6e1b630a4 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -213,45 +213,57 @@ void ics_synchronize_state(ICSState *ics)
 ics_get_kvm_state(ics);
 }
 
-int ics_set_kvm_state(ICSState *ics)
+int ics_set_kvm_state_one(ICSState *ics, int srcno)
 {
 uint64_t state;
-int i;
 Error *local_err = NULL;
+ICSIRQState *irq = &ics->irqs[srcno];
+int ret;
 
-for (i = 0; i < ics->nr_irqs; i++) {
-ICSIRQState *irq = &ics->irqs[i];
-int ret;
-
-state = irq->server;
-state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
-<< KVM_XICS_PRIORITY_SHIFT;
-if (irq->priority != irq->saved_priority) {
-assert(irq->priority == 0xff);
-state |= KVM_XICS_MASKED;
-}
+state = irq->server;
+state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
+<< KVM_XICS_PRIORITY_SHIFT;
+if (irq->priority != irq->saved_priority) {
+assert(irq->priority == 0xff);
+state |= KVM_XICS_MASKED;
+}
 
-if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
-state |= KVM_XICS_LEVEL_SENSITIVE;
-if (irq->status & XICS_STATUS_ASSERTED) {
-state |= KVM_XICS_PENDING;
-}
-} else {
-if (irq->status & XICS_STATUS_MASKED_PENDING) {
-state |= KVM_XICS_PENDING;
-}
+if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+state |= KVM_XICS_LEVEL_SENSITIVE;
+if (irq->status & XICS_STATUS_ASSERTED) {
+state |= KVM_XICS_PENDING;
 }
-if (irq->status & XICS_STATUS_PRESENTED) {
-state |= KVM_XICS_PRESENTED;
-}
-if (irq->status & XICS_STATUS_QUEUED) {
-state |= KVM_XICS_QUEUED;
+} else {
+if (irq->status & XICS_STATUS_MASKED_PENDING) {
+state |= KVM_XICS_PENDING;
 }
+}
+if (irq->status & XICS_STATUS_PRESENTED) {
+state |= KVM_XICS_PRESENTED;
+}
+if (irq->status & XICS_STATUS_QUEUED) {
+state |= KVM_XICS_QUEUED;
+}
+
+ret = kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
+srcno + ics->offset, &state, true, &local_err);
+if (local_err) {
+error_report_err(local_err);
+return ret;
+}
+
+return 0;
+}
+
+int ics_set_kvm_state(ICSState *ics)
+{
+int i;
+
+for (i = 0; i < ics->nr_irqs; i++) {
+int ret;
 
-ret = kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
-i + ics->offset, &state, true, &local_err);
-if (local_err) {
-error_report_err(local_err);
+ret = ics_set_kvm_state_one(ics, i);
+if (ret) {
 return ret;
 }
 }
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index d36bbe11ee..eb65ad7e43 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -195,6 +195,7 @@ void icp_synchronize_state(ICPState *icp);
 void icp_kvm_realize(DeviceState *dev, Error **errp);
 
 void ics_get_kvm_state(ICSState *ics);
+int ics_set_kvm_state_one(ICSState *ics, int srcno);
 int ics_set_kvm_state(ICSState *ics);
 void ics_synchronize_state(ICSState *ics);
 void ics_kvm_set_irq(ICSState *ics, int srcno, int val);
-- 
2.20.1
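The refactoring splits the per-source encoding out of the loop so that a single source can be pushed to KVM as soon as it is claimed. A simplified sketch of that encoding follows; `struct xics_src` is hypothetical, and the KVM_XICS_* bit positions are written from memory of Linux's uapi `<asm/kvm.h>`, so treat them as assumptions to verify against the real header:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 64-bit per-source state word (check <asm/kvm.h>). */
#define KVM_XICS_PRIORITY_SHIFT  32
#define KVM_XICS_PRIORITY_MASK   0xff
#define KVM_XICS_LEVEL_SENSITIVE (1ULL << 40)
#define KVM_XICS_MASKED          (1ULL << 41)
#define KVM_XICS_PENDING         (1ULL << 42)
#define KVM_XICS_PRESENTED       (1ULL << 43)
#define KVM_XICS_QUEUED          (1ULL << 44)

/* Hypothetical, flattened view of one interrupt source. */
struct xics_src {
    uint32_t server;
    uint8_t priority;        /* current priority, 0xff while masked */
    uint8_t saved_priority;  /* priority restored on unmask */
    bool lsi;                /* level-sensitive source? */
    bool asserted;           /* LSI line currently asserted */
    bool masked_pending;     /* MSI fired while masked */
    bool presented;
    bool queued;
};

/* Encode one source, mirroring the per-source helper in the patch. */
static uint64_t xics_encode_one(const struct xics_src *s)
{
    uint64_t state = s->server;

    state |= (uint64_t)(s->saved_priority & KVM_XICS_PRIORITY_MASK)
             << KVM_XICS_PRIORITY_SHIFT;
    if (s->priority != s->saved_priority) {
        assert(s->priority == 0xff);   /* only masking changes priority */
        state |= KVM_XICS_MASKED;
    }
    if (s->lsi) {
        state |= KVM_XICS_LEVEL_SENSITIVE;
        if (s->asserted) {
            state |= KVM_XICS_PENDING;
        }
    } else if (s->masked_pending) {
        state |= KVM_XICS_PENDING;
    }
    if (s->presented) {
        state |= KVM_XICS_PRESENTED;
    }
    if (s->queued) {
        state |= KVM_XICS_QUEUED;
    }
    return state;
}

/* The "all sources" variant becomes a thin loop over the helper. */
static void xics_encode_all(const struct xics_src *srcs, int n, uint64_t *out)
{
    for (int i = 0; i < n; i++) {
        out[i] = xics_encode_one(&srcs[i]);
    }
}
```

Keeping the loop as a wrapper means machine reset still synchronizes every source, while the claim path only touches the one it just configured.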

[Qemu-devel] [PULL 49/50] ppc/pnv: add INITRD_MAX_SIZE constant

2019-02-25 Thread David Gibson

From: Murilo Opsfelder Araujo 

The current 0x10000000 value is actually 256MiB, not 128MB as the comment
suggests. Move it to a constant and fix the comment (no change in the size
value).

Signed-off-by: Murilo Opsfelder Araujo 
Message-Id: <20190225170155.1972-3-muri...@linux.ibm.com>
Reviewed-by: Cédric Le Goater 
Signed-off-by: David Gibson 
---
 hw/ppc/pnv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 4144976aec..0cd6af4669 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -56,6 +56,7 @@
 #define KERNEL_LOAD_ADDR        0x20000000
 #define KERNEL_MAX_SIZE         (256 * MiB)
 #define INITRD_LOAD_ADDR        0x60000000
+#define INITRD_MAX_SIZE         (256 * MiB)
 
 static const char *pnv_chip_core_typename(const PnvChip *o)
 {
@@ -601,7 +602,7 @@ static void pnv_init(MachineState *machine)
 if (machine->initrd_filename) {
 pnv->initrd_base = INITRD_LOAD_ADDR;
 pnv->initrd_size = load_image_targphys(machine->initrd_filename,
-  pnv->initrd_base, 0x10000000); /* 128MB max */
+  pnv->initrd_base, INITRD_MAX_SIZE);
 if (pnv->initrd_size < 0) {
 error_report("Could not load initial ram disk '%s'",
  machine->initrd_filename);
-- 
2.20.1
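A quick arithmetic check of the claim in the commit message, written as C11 compile-time assertions. The `MiB` definition mirrors the style of qemu/units.h, and `OLD_INITRD_LIMIT` is only a local name for the literal the patch replaces:

```c
#include <assert.h>
#include <stdint.h>

/* Binary prefix in the style of QEMU's qemu/units.h. */
#define MiB (INT64_C(1) << 20)

/* Literal limit passed to load_image_targphys() before the patch. */
#define OLD_INITRD_LIMIT 0x10000000

/* Named constant introduced by the patch. */
#define INITRD_MAX_SIZE (256 * MiB)

/* 0x10000000 = 2^28 bytes = 2^8 * 2^20 bytes = 256 MiB: no size change. */
static_assert(OLD_INITRD_LIMIT == INITRD_MAX_SIZE, "initrd limit unchanged");

/* The old "128MB max" comment understated the limit by a factor of two. */
static_assert(OLD_INITRD_LIMIT == 2 * (128 * MiB), "comment was off by 2x");
```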

[Qemu-devel] [PULL 46/50] ppc/xive: xive does not have a POWER7 interrupt model

2019-02-25 Thread David Gibson

From: Cédric Le Goater 

Patch "target/ppc: Add POWER9 external interrupt model" should have
removed the section covering PPC_FLAGS_INPUT_POWER7.

Signed-off-by: Cédric Le Goater 
Message-Id: <20190219142530.17807-1-...@kaod.org>
Signed-off-by: David Gibson 
---
 hw/intc/xive.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 425aa97ef9..daa7badc84 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -481,9 +481,6 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
 
 env = &cpu->env;
 switch (PPC_INPUT(env)) {
-case PPC_FLAGS_INPUT_POWER7:
-tctx->output = env->irq_inputs[POWER7_INPUT_INT];
-break;
 case PPC_FLAGS_INPUT_POWER9:
 tctx->output = env->irq_inputs[POWER9_INPUT_INT];
 break;
-- 
2.20.1

[Qemu-devel] [PULL 43/50] spapr: add hotplug hooks for PHB hotplug

2019-02-25 Thread David Gibson

From: Greg Kurz 

Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register the spapr machine as the hotplug handler
for the main system bus.

Provide the usual pre-plug, plug and unplug-request handlers.

Move the checking of the PHB index to the pre-plug handler. It is okay
to do that and assert in the realize function because the pre-plug
handler is always called, even for the oldest machine types we support.

Signed-off-by: Michael Roth 
(Fixed interrupt controller phandle in "interrupt-map" and
 TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
Signed-off-by: Greg Kurz 
Message-Id: 
<155059672926.1466090.13612804072190051439.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 128 -
 hw/ppc/spapr_drc.c |   2 +
 hw/ppc/spapr_pci.c |  16 +-
 include/hw/ppc/spapr.h |   3 +
 4 files changed, 133 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 76b3c15d59..7422c05254 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3009,6 +3009,9 @@ static void spapr_machine_init(MachineState *machine)
 register_savevm_live(NULL, "spapr/htab", -1, 1,
  &savevm_htab_handlers, spapr);
 
+qbus_set_hotplug_handler(sysbus_get_default(), OBJECT(machine),
+ &error_fatal);
+
 qemu_register_boot_set(spapr_boot_set, spapr);
 
 if (kvm_enabled()) {
@@ -3850,6 +3853,115 @@ out:
 error_propagate(errp, local_err);
 }
 
+int spapr_phb_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+  void *fdt, int *fdt_start_offset, Error **errp)
+{
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(drc->dev);
+int intc_phandle;
+
+intc_phandle = spapr_irq_get_phandle(spapr, spapr->fdt_blob, errp);
+if (intc_phandle <= 0) {
+return -1;
+}
+
+if (spapr_populate_pci_dt(sphb, intc_phandle, fdt, spapr->irq->nr_msis,
+  fdt_start_offset)) {
+error_setg(errp, "unable to create FDT node for PHB %d", sphb->index);
+return -1;
+}
+
+/* generally SLOF creates these, for hotplug it's up to QEMU */
+_FDT(fdt_setprop_string(fdt, *fdt_start_offset, "name", "pci"));
+
+return 0;
+}
+
+static void spapr_phb_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+   Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+const unsigned windows_supported = spapr_phb_windows_supported(sphb);
+
+if (dev->hotplugged && !smc->dr_phb_enabled) {
+error_setg(errp, "PHB hotplug not supported for this machine");
+return;
+}
+
+if (sphb->index == (uint32_t)-1) {
+error_setg(errp, "\"index\" for PAPR PHB is mandatory");
+return;
+}
+
+/*
+ * This will check that sphb->index doesn't exceed the maximum number of
+ * PHBs for the current machine type.
+ */
+smc->phb_placement(spapr, sphb->index,
+   &sphb->buid, &sphb->io_win_addr,
+   &sphb->mem_win_addr, &sphb->mem64_win_addr,
+   windows_supported, sphb->dma_liobn, errp);
+}
+
+static void spapr_phb_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+   Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
+sPAPRDRConnector *drc;
+bool hotplugged = spapr_drc_hotplugged(dev);
+Error *local_err = NULL;
+
+if (!smc->dr_phb_enabled) {
+return;
+}
+
+drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, sphb->index);
+/* hotplug hooks should check it's enabled before getting this far */
+assert(drc);
+
+spapr_drc_attach(drc, DEVICE(dev), &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+if (hotplugged) {
+spapr_hotplug_req_add_by_index(drc);
+} else {
+spapr_drc_reset(drc);
+}
+}
+
+void spapr_phb_release(DeviceState *dev)
+{
+HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
+
+hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
+}
+
+static void spapr_phb_unplug(HotplugHandler *hotplug_dev, DeviceState *dev)
+{
+object_unparent(OBJECT(dev));
+}
+
+static void spapr_phb_unplug_request(HotplugHandler *hotplug_dev,
+ DeviceState *dev, Error **errp)
+{
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
+sPAPRDRConnector *drc;
+
+drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, sphb->index);
+assert(drc);
+
+if (!spapr_drc_unplug_requested(drc)) {
+spapr_drc_detach(drc);
+spapr_hotplug_req_remove_by_index(drc);
+}
+}
+
 static void
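The pre-plug handler is where the index checks now live, and because it always runs before realize, realize can simply assert the result. A minimal sketch of that split, with a hypothetical `struct phb_req` and an arbitrary `SKETCH_MAX_PHBS` standing in for the limit enforced by the machine's phb_placement hook:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary stand-in; QEMU derives the real bound in phb_placement. */
#define SKETCH_MAX_PHBS 31

/* Hypothetical, trimmed-down view of what the pre-plug handler inspects. */
struct phb_req {
    uint32_t index;     /* (uint32_t)-1 means "not set by the user" */
    bool hotplugged;    /* added at runtime rather than at startup */
};

/* Returns NULL if the PHB may be plugged, otherwise an error message. */
static const char *phb_pre_plug_check(const struct phb_req *req,
                                      bool dr_phb_enabled)
{
    if (req->hotplugged && !dr_phb_enabled) {
        return "PHB hotplug not supported for this machine";
    }
    if (req->index == (uint32_t)-1) {
        return "\"index\" for PAPR PHB is mandatory";
    }
    if (req->index >= SKETCH_MAX_PHBS) {
        return "PHB index out of range";
    }
    return NULL;
}

/* Pre-plug always runs first, so realize only asserts the invariant. */
static void phb_realize(const struct phb_req *req)
{
    assert(req->index != (uint32_t)-1 && req->index < SKETCH_MAX_PHBS);
    /* ... set up windows, child DRCs, etc. ... */
}
```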

[Qemu-devel] [PULL 50/50] ppc/pnv: use IEC binary prefixes to represent sizes

2019-02-25 Thread David Gibson

From: Murilo Opsfelder Araujo 

Using IEC binary prefixes from qemu/units.h makes the size constants more
human-readable.

Suggested-by: Eric Blake 
Signed-off-by: Murilo Opsfelder Araujo 
Message-Id: <20190225170155.1972-4-muri...@linux.ibm.com>
Reviewed-by: Cédric Le Goater 
Signed-off-by: David Gibson 
---
 hw/ppc/pnv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 0cd6af4669..3d5dfef220 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -47,11 +47,11 @@
 
 #include 
 
-#define FDT_MAX_SIZE            0x00100000
+#define FDT_MAX_SIZE            (1 * MiB)
 
 #define FW_FILE_NAME            "skiboot.lid"
 #define FW_LOAD_ADDR            0x0
-#define FW_MAX_SIZE             0x00400000
+#define FW_MAX_SIZE             (4 * MiB)
 
 #define KERNEL_LOAD_ADDR        0x20000000
 #define KERNEL_MAX_SIZE         (256 * MiB)
-- 
2.20.1
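Since the conversion is meant to be value-preserving, it can be checked at compile time. A small C11 sketch; `MiB` follows the qemu/units.h style and the `OLD_*` names are local labels for the hex literals being replaced:

```c
#include <assert.h>
#include <stdint.h>

/* Binary prefix in the style of QEMU's qemu/units.h. */
#define MiB (INT64_C(1) << 20)

/* Hex literals removed by the patch. */
#define OLD_FDT_MAX_SIZE 0x00100000
#define OLD_FW_MAX_SIZE  0x00400000

/* Their IEC replacements denote exactly the same byte counts. */
static_assert(OLD_FDT_MAX_SIZE == 1 * MiB, "0x00100000 is 2^20 bytes");
static_assert(OLD_FW_MAX_SIZE == 4 * MiB, "0x00400000 is 2^22 bytes");
```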

[Qemu-devel] [PULL 35/50] spapr: Expose the name of the interrupt controller node

2019-02-25 Thread David Gibson

From: Greg Kurz 

This will be needed by PHB hotplug in order to access the "phandle"
property of the interrupt controller node.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Greg Kurz 
Reviewed-by: David Gibson 
Message-Id: 
<155059668867.1466090.6339199751719123386.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/intc/spapr_xive.c|  9 -
 hw/intc/xics_spapr.c|  2 +-
 hw/ppc/spapr_irq.c  | 21 -
 include/hw/ppc/spapr_irq.h  |  1 +
 include/hw/ppc/spapr_xive.h |  3 +++
 include/hw/ppc/xics_spapr.h |  2 ++
 6 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 290a290e43..06e3c9fdbf 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -317,6 +317,9 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
 /* Map all regions */
 spapr_xive_map_mmio(xive);
 
+xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
+   xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+
 qemu_register_reset(spapr_xive_reset, dev);
 }
 
@@ -1448,7 +1451,6 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
 cpu_to_be32(7),/* start */
 cpu_to_be32(0xf8), /* count */
 };
-gchar *nodename;
 
 /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
 timas[0] = cpu_to_be64(xive->tm_base +
@@ -1458,10 +1460,7 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
 timas[3] = cpu_to_be64(1ull << TM_SHIFT);
 
-nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
-   xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
-_FDT(node = fdt_add_subnode(fdt, 0, nodename));
-g_free(nodename);
+_FDT(node = fdt_add_subnode(fdt, 0, xive->nodename));
 
 _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
 _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index e2d8b38183..53bda6661b 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -254,7 +254,7 @@ void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
 };
 int node;
 
-_FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
+_FDT(node = fdt_add_subnode(fdt, 0, XICS_NODENAME));
 
 _FDT(fdt_setprop_string(fdt, node, "device_type",
 "PowerPC-External-Interrupt-Presentation"));
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 4297eed600..359761494c 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -230,6 +230,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 /* TODO: create the KVM XICS device */
 }
 
+static const char *spapr_irq_get_nodename_xics(sPAPRMachineState *spapr)
+{
+return XICS_NODENAME;
+}
+
 #define SPAPR_IRQ_XICS_NR_IRQS 0x1000
 #define SPAPR_IRQ_XICS_NR_MSIS \
 (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
@@ -249,6 +254,7 @@ sPAPRIrq spapr_irq_xics = {
 .post_load   = spapr_irq_post_load_xics,
 .reset   = spapr_irq_reset_xics,
 .set_irq = spapr_irq_set_irq_xics,
+.get_nodename = spapr_irq_get_nodename_xics,
 };
 
 /*
@@ -384,6 +390,11 @@ static void spapr_irq_set_irq_xive(void *opaque, int srcno, int val)
 xive_source_set_irq(&spapr->xive->source, srcno, val);
 }
 
+static const char *spapr_irq_get_nodename_xive(sPAPRMachineState *spapr)
+{
+return spapr->xive->nodename;
+}
+
 /*
  * XIVE uses the full IRQ number space. Set it to 8K to be compatible
  * with XICS.
@@ -407,6 +418,7 @@ sPAPRIrq spapr_irq_xive = {
 .post_load   = spapr_irq_post_load_xive,
 .reset   = spapr_irq_reset_xive,
 .set_irq = spapr_irq_set_irq_xive,
+.get_nodename = spapr_irq_get_nodename_xive,
 };
 
 /*
@@ -541,6 +553,11 @@ static void spapr_irq_set_irq_dual(void *opaque, int srcno, int val)
 spapr_irq_current(spapr)->set_irq(spapr, srcno, val);
 }
 
+static const char *spapr_irq_get_nodename_dual(sPAPRMachineState *spapr)
+{
+return spapr_irq_current(spapr)->get_nodename(spapr);
+}
+
 /*
  * Define values in sync with the XIVE and XICS backend
  */
@@ -561,7 +578,8 @@ sPAPRIrq spapr_irq_dual = {
 .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
 .post_load   = spapr_irq_post_load_dual,
 .reset   = spapr_irq_reset_dual,
-.set_irq = spapr_irq_set_irq_dual
+.set_irq = spapr_irq_set_irq_dual,
+.get_nodename = spapr_irq_get_nodename_dual,
 };
 
 /*
@@ -691,4 +709,5 @@ sPAPRIrq spapr_irq_xics_legacy = {
 .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
 .post_load   = spapr_irq_post_load_xics,
 .set_irq = spapr_irq_set_irq_xics,
+.get_nodename = spapr_irq_get_nodename_xics,
 };
diff --git
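The new get_nodename hook is a per-backend entry in the sPAPRIrq ops table: XICS returns a fixed name, XIVE returns a name computed once at realize time, and the dual backend forwards to whichever one is active. A self-contained sketch of that dispatch, with hypothetical `struct machine` and `struct irq_backend` types and a made-up TIMA address in the usage:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct machine;

/* Per-backend ops table, loosely modelled on sPAPRIrq's get_nodename. */
struct irq_backend {
    const char *(*get_nodename)(struct machine *m);
};

/* Hypothetical, heavily simplified machine state. */
struct machine {
    const struct irq_backend *current;  /* backend currently in use */
    char xive_nodename[64];             /* computed once at realize time */
};

static const char *get_nodename_xics(struct machine *m)
{
    (void)m;
    return "interrupt-controller";      /* fixed name, no unit address */
}

static const char *get_nodename_xive(struct machine *m)
{
    return m->xive_nodename;            /* name carries the TIMA address */
}

static const struct irq_backend backend_xics = {
    .get_nodename = get_nodename_xics,
};
static const struct irq_backend backend_xive = {
    .get_nodename = get_nodename_xive,
};

/* Dual mode simply forwards to whichever backend is active. */
static const char *get_nodename_dual(struct machine *m)
{
    assert(m->current != NULL);
    return m->current->get_nodename(m);
}

/* Build the XIVE node name once instead of on every FDT build. */
static void xive_set_nodename(struct machine *m, uint64_t tima_user_addr)
{
    snprintf(m->xive_nodename, sizeof(m->xive_nodename),
             "interrupt-controller@%" PRIx64, tima_user_addr);
}
```

Storing the XIVE name at realize time is what lets later callers, such as PHB hotplug, look up the interrupt controller node without rebuilding its name.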

[Qemu-devel] [PULL 48/50] ppc/pnv: increase kernel size limit to 256MiB

2019-02-25 Thread David Gibson

From: Murilo Opsfelder Araujo 

Building the kernel with CONFIG_DEBUG_INFO_REDUCED can generate a ~90MB image,
and building with CONFIG_DEBUG_INFO can generate a ~225MB one; both exceed the
current limit of 32MiB.

Increasing the kernel size limit to 256MiB should suffice for now.

Signed-off-by: Murilo Opsfelder Araujo 
Message-Id: <20190225170155.1972-2-muri...@linux.ibm.com>
Reviewed-by: Cédric Le Goater 
Signed-off-by: David Gibson 
---
 hw/ppc/pnv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 9e03e9c336..4144976aec 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -54,6 +54,7 @@
 #define FW_MAX_SIZE             0x00400000
 
 #define KERNEL_LOAD_ADDR        0x20000000
+#define KERNEL_MAX_SIZE         (256 * MiB)
 #define INITRD_LOAD_ADDR        0x60000000
 
 static const char *pnv_chip_core_typename(const PnvChip *o)
@@ -588,7 +589,7 @@ static void pnv_init(MachineState *machine)
 long kernel_size;
 
 kernel_size = load_image_targphys(machine->kernel_filename,
-  KERNEL_LOAD_ADDR, 0x2000000);
+  KERNEL_LOAD_ADDR, KERNEL_MAX_SIZE);
 if (kernel_size < 0) {
 error_report("Could not load kernel '%s'",
  machine->kernel_filename);
-- 
2.20.1
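Assuming, as the commit message implies, that load_image_targphys() rejects an image larger than the limit it is given, the effect of the change can be modelled as a plain size comparison. `image_fits()` is a hypothetical helper, and the image sizes are the approximate figures from the commit message, read as decimal megabytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (INT64_C(1) << 20)       /* binary, as in qemu/units.h */
#define MB  (INT64_C(1000) * 1000)   /* decimal, for the quoted image sizes */

#define OLD_KERNEL_MAX_SIZE 0x2000000        /* 32 MiB */
#define KERNEL_MAX_SIZE     (256 * MiB)

/* Hypothetical model of the loader's "too big" check. */
static bool image_fits(int64_t image_size, int64_t max_size)
{
    assert(image_size >= 0 && max_size >= 0);
    return image_size <= max_size;
}
```

Reading ~225M as MiB instead (about 236 million bytes) still fits under the new 256 MiB limit.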

[Qemu-devel] [PULL 38/50] spapr: create DR connectors for PHBs

2019-02-25 Thread David Gibson

From: Michael Roth 

Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Greg Kurz 
Message-Id: 
<155059670389.1466090.10015601248906623076.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 13 +
 hw/ppc/spapr_drc.c | 17 +
 include/hw/ppc/spapr.h |  1 +
 include/hw/ppc/spapr_drc.h |  8 
 4 files changed, 39 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9364d07364..96bea7580a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2875,6 +2875,19 @@ static void spapr_machine_init(MachineState *machine)
 /* We always have at least the nvram device on VIO */
 spapr_create_nvram(spapr);
 
+/*
+ * Set up hotplug / dynamic-reconfiguration connectors. Top-level
+ * connectors (described in the root DT node's "ibm,drc-types" property)
+ * are pre-initialized here. Additional child connectors (such as
+ * connectors for a PHB's PCI slots) are added as needed during their
+ * parent's realization.
+ */
+if (smc->dr_phb_enabled) {
+for (i = 0; i < SPAPR_MAX_PHBS; i++) {
+spapr_dr_connector_new(OBJECT(machine), TYPE_SPAPR_DRC_PHB, i);
+}
+}
+
 /* Set up PCI */
 spapr_pci_rtas_init();
 
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 87ca7d9735..fd6380adb3 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -696,6 +696,15 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
 drck->dt_populate = spapr_lmb_dt_populate;
 }
 
+static void spapr_drc_phb_class_init(ObjectClass *k, void *data)
+{
+sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
+
+drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB;
+drck->typename = "PHB";
+drck->drc_name_prefix = "PHB ";
+}
+
 static const TypeInfo spapr_dr_connector_info = {
 .name  = TYPE_SPAPR_DR_CONNECTOR,
 .parent= TYPE_DEVICE,
@@ -739,6 +748,13 @@ static const TypeInfo spapr_drc_lmb_info = {
 .class_init= spapr_drc_lmb_class_init,
 };
 
+static const TypeInfo spapr_drc_phb_info = {
+.name  = TYPE_SPAPR_DRC_PHB,
+.parent= TYPE_SPAPR_DRC_LOGICAL,
+.instance_size = sizeof(sPAPRDRConnector),
+.class_init= spapr_drc_phb_class_init,
+};
+
 /* helper functions for external users */
 
 sPAPRDRConnector *spapr_drc_by_index(uint32_t index)
@@ -1207,6 +1223,7 @@ static void spapr_drc_register_types(void)
 type_register_static(&spapr_drc_cpu_info);
 type_register_static(&spapr_drc_pci_info);
 type_register_static(&spapr_drc_lmb_info);
+type_register_static(&spapr_drc_phb_info);
 
 spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
 rtas_set_indicator);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5e3c760725..b173fd7149 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -104,6 +104,7 @@ struct sPAPRMachineClass {
 
 /*< public >*/
 bool dr_lmb_enabled;   /* enable dynamic-reconfig/hotplug of LMBs */
+bool dr_phb_enabled;   /* enable dynamic-reconfig/hotplug of PHBs */
 bool update_dt_enabled;/* enable KVMPPC_H_UPDATE_DT */
 bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
 bool pre_2_10_has_unused_icps;
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index f32758ec84..46b0f6216d 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -71,6 +71,14 @@
 #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
 TYPE_SPAPR_DRC_LMB)
 
+#define TYPE_SPAPR_DRC_PHB "spapr-drc-phb"
+#define SPAPR_DRC_PHB_GET_CLASS(obj) \
+OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PHB)
+#define SPAPR_DRC_PHB_CLASS(klass) \
+OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PHB)
+#define SPAPR_DRC_PHB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \
+TYPE_SPAPR_DRC_PHB)
+
 /*
  * Various hotplug types managed by sPAPRDRConnector
  *
-- 
2.20.1
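Each PHB connector created in spapr_machine_init() needs a globally unique DRC index. A sketch of the encoding as I understand spapr_drc_index() to compute it: the connector type goes in the top bits and the per-type id (here, the PHB index) in the low 28 bits. The shift and the PHB type value are assumptions recalled from spapr_drc.{c,h}, not taken from this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: type in the top 4 bits, 28-bit id below. */
#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    ((UINT32_C(1) << DRC_INDEX_TYPE_SHIFT) - 1)
#define DRC_TYPESHIFT_PHB    2   /* assumed SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB */

static uint32_t drc_index(uint32_t typeshift, uint32_t id)
{
    assert(typeshift < 16 && id <= DRC_INDEX_ID_MASK);
    return (typeshift << DRC_INDEX_TYPE_SHIFT) | (id & DRC_INDEX_ID_MASK);
}

static uint32_t drc_index_to_id(uint32_t index)
{
    return index & DRC_INDEX_ID_MASK;
}

static uint32_t drc_index_to_typeshift(uint32_t index)
{
    return index >> DRC_INDEX_TYPE_SHIFT;
}
```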

[Qemu-devel] [PULL 30/50] spapr: Generate FDT fragment for LMBs at configure connector time

2019-02-25 Thread David Gibson

From: Greg Kurz 

Signed-off-by: Greg Kurz 
Message-Id: 
<155059666331.1466090.676654076629713.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 33 ++---
 hw/ppc/spapr_drc.c |  1 +
 include/hw/ppc/spapr.h |  4 +++-
 3 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 00eb3b643c..b92deee771 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -,14 +,26 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
 }
 }
 
+int spapr_lmb_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+  void *fdt, int *fdt_start_offset, Error **errp)
+{
+uint64_t addr;
+uint32_t node;
+
+addr = spapr_drc_index(drc) * SPAPR_MEMORY_BLOCK_SIZE;
+node = object_property_get_uint(OBJECT(drc->dev), PC_DIMM_NODE_PROP,
+&error_abort);
+*fdt_start_offset = spapr_populate_memory_node(fdt, node, addr,
+   SPAPR_MEMORY_BLOCK_SIZE);
+return 0;
+}
+
 static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
-   uint32_t node, bool dedicated_hp_event_source,
-   Error **errp)
+   bool dedicated_hp_event_source, Error **errp)
 {
 sPAPRDRConnector *drc;
 uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
-int i, fdt_offset, fdt_size;
-void *fdt;
+int i;
 uint64_t addr = addr_start;
 bool hotplugged = spapr_drc_hotplugged(dev);
 Error *local_err = NULL;
@@ -3350,11 +3362,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
   addr / SPAPR_MEMORY_BLOCK_SIZE);
 g_assert(drc);
 
-fdt = create_device_tree(&fdt_size);
-fdt_offset = spapr_populate_memory_node(fdt, node, addr,
-SPAPR_MEMORY_BLOCK_SIZE);
-
-spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
+spapr_drc_attach(drc, dev, NULL, 0, &local_err);
 if (local_err) {
 while (addr > addr_start) {
 addr -= SPAPR_MEMORY_BLOCK_SIZE;
@@ -3362,7 +3370,6 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
   addr / SPAPR_MEMORY_BLOCK_SIZE);
 spapr_drc_detach(drc);
 }
-g_free(fdt);
 error_propagate(errp, local_err);
 return;
 }
@@ -3395,7 +3402,6 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
 PCDIMMDevice *dimm = PC_DIMM(dev);
 uint64_t size, addr;
-uint32_t node;
 
 size = memory_device_get_region_size(MEMORY_DEVICE(dev), &error_abort);
 
@@ -3410,10 +3416,7 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 goto out_unplug;
 }
 
-node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP,
-&error_abort);
-spapr_add_lmbs(dev, addr, size, node,
-   spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
+spapr_add_lmbs(dev, addr, size, spapr_ovec_test(ms->ov5_cas, OV5_HP_EVT),
 &local_err);
 if (local_err) {
 goto out_unplug;
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 66b965a0a7..634c28695a 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -700,6 +700,7 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
 drck->typename = "MEM";
 drck->drc_name_prefix = "LMB ";
 drck->release = spapr_lmb_release;
+drck->dt_populate = spapr_lmb_dt_populate;
 }
 
 static const TypeInfo spapr_dr_connector_info = {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 659204ea93..0ec309da49 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -764,9 +764,11 @@ void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
 void spapr_clear_pending_events(sPAPRMachineState *spapr);
 int spapr_max_server_number(sPAPRMachineState *spapr);
 
-/* CPU and LMB DRC release callbacks. */
+/* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
 void spapr_lmb_release(DeviceState *dev);
+int spapr_lmb_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+  void *fdt, int *fdt_start_offset, Error **errp);
 
 void spapr_rtc_read(sPAPRRTCState *rtc, struct tm *tm, uint32_t *ns);
 int spapr_rtc_import_offset(sPAPRRTCState *rtc, int64_t legacy_offset);
-- 
2.20.1

[Qemu-devel] [PULL 44/50] spapr: enable PHB hotplug for default pseries machine type

2019-02-25 Thread David Gibson

From: Michael Roth 

The 'dr_phb_enabled' field of sPAPRMachineClass can be set as part of
machine-specific init code. It will be used to conditionally
enable creation of DRC objects and device-tree description to
facilitate hotplug of PHBs.

Since we can't migrate this state to older machine types,
default the option to true and disable it for older machine
types.

Signed-off-by: Michael Roth 
Signed-off-by: Greg Kurz 
Reviewed-by: David Gibson 
Message-Id: 
<155059673433.1466090.6188091133769611501.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7422c05254..5d8b8c9e53 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -4290,6 +4290,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
 spapr_caps_add_properties(smc, &error_abort);
 smc->irq = &spapr_irq_xics;
+smc->dr_phb_enabled = true;
 }
 
 static const TypeInfo spapr_machine_info = {
@@ -4361,6 +4362,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
 
 mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
 smc->update_dt_enabled = false;
+smc->dr_phb_enabled = false;
 }
 
 DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
-- 
2.20.1
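This follows the usual compat pattern for versioned machine types: the base class turns the feature on, and each older version's options function first applies the next newer version's settings and then reverts what it cannot support. A simplified model with hypothetical names; real QEMU applies these settings to MachineClass/sPAPRMachineClass objects rather than a plain struct:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of per-machine-class flags. */
struct machine_opts {
    bool dr_phb_enabled;
};

/* Base class init: new features default to on. */
static void machine_class_init(struct machine_opts *o)
{
    o->dr_phb_enabled = true;
}

/* Newest versioned type: nothing to undo. */
static void machine_latest_options(struct machine_opts *o)
{
    machine_class_init(o);
}

/*
 * Older type: start from the next newer one, then switch the feature off,
 * because the new state cannot be migrated to a QEMU that lacks it.
 */
static void machine_3_1_options(struct machine_opts *o)
{
    machine_latest_options(o);
    o->dr_phb_enabled = false;
}
```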

[Qemu-devel] [PULL 27/50] target/ppc: Support for POWER9 native hash

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

(Might need more patch splitting)

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-12-...@kaod.org>
[dwg: Hack to fix compile with some earlier include tweaks of mine]
Signed-off-by: David Gibson 
---
 target/ppc/mmu-book3s-v3.c | 18 +++
 target/ppc/mmu-book3s-v3.h | 47 ++
 target/ppc/mmu-hash64.c|  6 +++--
 target/ppc/mmu-hash64.h| 19 +--
 4 files changed, 70 insertions(+), 20 deletions(-)

diff --git a/target/ppc/mmu-book3s-v3.c b/target/ppc/mmu-book3s-v3.c
index a174e7efc5..32b8c166b5 100644
--- a/target/ppc/mmu-book3s-v3.c
+++ b/target/ppc/mmu-book3s-v3.c
@@ -41,3 +41,21 @@ hwaddr ppc64_v3_get_phys_page_debug(PowerPCCPU *cpu, vaddr eaddr)
 return ppc_hash64_get_phys_page_debug(cpu, eaddr);
 }
 }
+
+bool ppc64_v3_get_pate(PowerPCCPU *cpu, target_ulong lpid, ppc_v3_pate_t *entry)
+{
+uint64_t patb = cpu->env.spr[SPR_PTCR] & PTCR_PATB;
+uint64_t pats = cpu->env.spr[SPR_PTCR] & PTCR_PATS;
+
+/* Calculate number of entries */
+pats = 1ull << (pats + 12 - 4);
+if (pats <= lpid) {
+return false;
+}
+
+/* Grab entry */
+patb += 16 * lpid;
+entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
+entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
+return true;
+}
diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
index d63ca6b1c7..ee8288e32d 100644
--- a/target/ppc/mmu-book3s-v3.h
+++ b/target/ppc/mmu-book3s-v3.h
@@ -20,6 +20,8 @@
 #ifndef MMU_BOOOK3S_V3_H
 #define MMU_BOOOK3S_V3_H
 
+#include "mmu-hash64.h"
+
 #ifndef CONFIG_USER_ONLY
 
 /*
@@ -52,6 +54,9 @@ static inline bool ppc64_use_proc_tbl(PowerPCCPU *cpu)
 return !!(cpu->env.spr[SPR_LPCR] & LPCR_UPRT);
 }
 
+bool ppc64_v3_get_pate(PowerPCCPU *cpu, target_ulong lpid,
+   ppc_v3_pate_t *entry);
+
 /*
  * The LPCR:HR bit is a shortcut that avoids having to
  * dig out the partition table in the fast path. This is
@@ -67,6 +72,48 @@ hwaddr ppc64_v3_get_phys_page_debug(PowerPCCPU *cpu, vaddr eaddr);
 int ppc64_v3_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
   int mmu_idx);
 
+static inline hwaddr ppc_hash64_hpt_base(PowerPCCPU *cpu)
+{
+uint64_t base;
+
+if (cpu->vhyp) {
+return 0;
+}
+if (cpu->env.mmu_model == POWERPC_MMU_3_00) {
+ppc_v3_pate_t pate;
+
+if (!ppc64_v3_get_pate(cpu, cpu->env.spr[SPR_LPIDR], &pate)) {
+return 0;
+}
+base = pate.dw0;
+} else {
+base = cpu->env.spr[SPR_SDR1];
+}
+return base & SDR_64_HTABORG;
+}
+
+static inline hwaddr ppc_hash64_hpt_mask(PowerPCCPU *cpu)
+{
+uint64_t base;
+
+if (cpu->vhyp) {
+PPCVirtualHypervisorClass *vhc =
+PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+return vhc->hpt_mask(cpu->vhyp);
+}
+if (cpu->env.mmu_model == POWERPC_MMU_3_00) {
+ppc_v3_pate_t pate;
+
+if (!ppc64_v3_get_pate(cpu, cpu->env.spr[SPR_LPIDR], &pate)) {
+return 0;
+}
+base = pate.dw0;
+} else {
+base = cpu->env.spr[SPR_SDR1];
+}
+return (1ULL << ((base & SDR_64_HTABSIZE) + 18 - 7)) - 1;
+}
+
 #endif /* TARGET_PPC64 */
 
 #endif /* CONFIG_USER_ONLY */
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 3c057a8c70..c431303eff 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -417,7 +417,7 @@ const ppc_hash_pte64_t *ppc_hash64_map_hptes(PowerPCCPU *cpu,
  hwaddr ptex, int n)
 {
 hwaddr pte_offset = ptex * HASH_PTE_SIZE_64;
-hwaddr base = ppc_hash64_hpt_base(cpu);
+hwaddr base;
 hwaddr plen = n * HASH_PTE_SIZE_64;
 const ppc_hash_pte64_t *hptes;
 
@@ -426,6 +426,7 @@ const ppc_hash_pte64_t *ppc_hash64_map_hptes(PowerPCCPU *cpu,
 PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
 return vhc->map_hptes(cpu->vhyp, ptex, n);
 }
+base = ppc_hash64_hpt_base(cpu);
 
 if (!base) {
 return NULL;
@@ -941,7 +942,7 @@ hwaddr ppc_hash64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong addr)
 void ppc_hash64_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
uint64_t pte0, uint64_t pte1)
 {
-hwaddr base = ppc_hash64_hpt_base(cpu);
+hwaddr base;
 hwaddr offset = ptex * HASH_PTE_SIZE_64;
 
 if (cpu->vhyp) {
@@ -950,6 +951,7 @@ void ppc_hash64_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
 vhc->store_hpte(cpu->vhyp, ptex, pte0, pte1);
 return;
 }
+base = ppc_hash64_hpt_base(cpu);
 
 stq_phys(CPU(cpu)->as, base + offset, pte0);
 stq_phys(CPU(cpu)->as, base + offset + HASH_PTE_SIZE_64 / 2, pte1);
diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index 016d6b44ee..6b555b7220 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -63,6 +63,7 @@ void

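The entry-count and mask arithmetic in ppc64_v3_get_pate() and ppc_hash64_hpt_mask() above can be checked in isolation. The following is a minimal standalone sketch. It assumes the field encodings implied by that code: PTCR:PATS describes a table of 2^(PATS+12) bytes made of 16-byte entries, and HTABSIZE describes a hash table of 2^(HTABSIZE+18) bytes made of 128-byte PTEGs. The demo_* names are hypothetical and are not part of QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Each PATE is two doublewords (dw0, dw1): 16 bytes, i.e. 2^4. */
#define DEMO_PATE_SIZE_SHIFT 4

/* Assumed encoding: PATS gives a 2^(PATS + 12)-byte table, so it holds
 * 2^(PATS + 12 - 4) entries (the "pats + 12 - 4" in the patch). */
static uint64_t demo_pate_count(uint64_t pats)
{
    return 1ull << (pats + 12 - DEMO_PATE_SIZE_SHIFT);
}

/* Bounds-check lpid, then locate the entry at patb + 16 * lpid. */
static bool demo_pate_addr(uint64_t patb, uint64_t pats, uint64_t lpid,
                           uint64_t *addr)
{
    if (demo_pate_count(pats) <= lpid) {
        return false;
    }
    *addr = patb + 16 * lpid;
    return true;
}

/* Assumed encoding: HTABSIZE gives a 2^(HTABSIZE + 18)-byte HPT, and each
 * PTEG is 128 = 2^7 bytes, so the mask is (number of PTEGs - 1). */
static uint64_t demo_hpt_mask(uint64_t htabsize)
{
    return (1ull << (htabsize + 18 - 7)) - 1;
}
```

These helpers take raw field values. The real functions first extract those fields from PTCR, the PATE or SDR1.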
[Qemu-devel] [PULL 29/50] spapr_drc: Allow FDT fragment to be added later

2019-02-25 Thread David Gibson

From: Greg Kurz 

The current logic is to provide the FDT fragment when attaching a device
to a DRC. This works perfectly fine for our current hotplug support, but
soon we will add support for PHB hotplug, which has some constraints
that CPU, PCI and LMB devices don't seem to have.

The first constraint is that the "ibm,dma-window" property of the PHB
node requires the IOMMU to be configured, i.e., spapr_tce_table_enable()
must have been called, which happens during PHB reset. This is fine in
the case of hotplug, since the device is reset before the hotplug handler
is called. With coldplug, on the contrary, the hotplug handler is called
first and the device is only reset during the initial system reset. Trying
to create the FDT fragment on the hotplug path in this case would
result in something like this:

ibm,dma-window = < 0x8000 0x00 0x00 0x00 0x00 >;

This will cause Linux in the guest to panic when the PHB is simply
removed and re-added using the drmgr command:

page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
if (!page)
panic("iommu_init_table: Can't allocate %ld bytes\n", sz);

The second and maybe more problematic constraint is that the
"interrupt-map" property needs to reference the interrupt controller
node using the very same phandle that SLOF has already exposed to the
guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
at some point to know about this phandle. With the latest QEMU and SLOF,
this happens when SLOF gets quiesced. This means that if the PHB gets
hotplugged after CAS but before SLOF quiesce, then we're sure that the
phandle is not known when the hotplug handler is called.

The FDT is actually only needed when the guest first invokes RTAS to
configure the connector, long after SLOF quiesce. Let's postpone the
creation of FDT fragments for PHBs to rtas_ibm_configure_connector().

Since we only need this for PHBs, introduce a new method in the base
DRC class for that. DRC subtypes will be converted to use it in
subsequent patches.

Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
is available. When all DRC subtypes have been converted, the fdt argument
will eventually disappear.
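The deferred-population scheme described above can be sketched outside QEMU. Attach may receive a NULL fragment only if the class provides a hook, and the fragment is built on first use at configure-connector time. This is a minimal sketch under simplifying assumptions. DemoDrc, DemoPopulateFn and the demo_* functions are hypothetical stand-ins, not the real sPAPRDRConnector API, and calloc() stands in for create_device_tree().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a DRC. */
typedef struct DemoDrc DemoDrc;
typedef int (*DemoPopulateFn)(DemoDrc *drc, void *fdt, int *start_offset);

struct DemoDrc {
    void *fdt;                  /* NULL until a fragment exists */
    int fdt_start_offset;
    DemoPopulateFn dt_populate; /* optional per-class hook */
};

/* Like the relaxed g_assert(fdt || drck->dt_populate): a NULL fragment
 * is only accepted when the class can build one later. */
static bool demo_drc_attach(DemoDrc *drc, void *fdt, int start_offset)
{
    if (!fdt && !drc->dt_populate) {
        return false;
    }
    if (fdt) {
        drc->fdt = fdt;
        drc->fdt_start_offset = start_offset;
    }
    return true;
}

/* At configure-connector time, build the fragment on first use. */
static int demo_drc_configure(DemoDrc *drc)
{
    if (!drc->fdt) {
        void *fdt = calloc(1, 64); /* stand-in for create_device_tree() */
        if (!fdt) {
            return -1;
        }
        if (drc->dt_populate(drc, fdt, &drc->fdt_start_offset) != 0) {
            free(fdt);             /* nothing is kept on failure */
            return -1;
        }
        drc->fdt = fdt;
    }
    return 0;
}

/* Example hook: pretend the node was written at offset 8. */
static int demo_populate_ok(DemoDrc *drc, void *fdt, int *start_offset)
{
    (void)drc;
    (void)fdt;
    *start_offset = 8;
    return 0;
}
```

Because the hook runs only when the guest configures the connector, any state it depends on has time to settle first. In the PHB case that state is the IOMMU setup and the interrupt controller phandle.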

Signed-off-by: Greg Kurz 
Message-Id: 
<155059665823.1466090.18358845122627355537.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_drc.c | 36 +++-
 include/hw/ppc/spapr_drc.h |  6 ++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 2edb7d1e9c..66b965a0a7 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -22,6 +22,7 @@
 #include "qemu/error-report.h"
 #include "hw/ppc/spapr.h" /* for RTAS return codes */
 #include "hw/pci-host/spapr.h" /* spapr_phb_remove_pci_device_cb callback */
+#include "sysemu/device_tree.h"
 #include "trace.h"
 
 #define DRC_CONTAINER_PATH "/dr-connector"
@@ -376,6 +377,8 @@ static void prop_get_fdt(Object *obj, Visitor *v, const char *name,
 void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
   int fdt_start_offset, Error **errp)
 {
+sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
 trace_spapr_drc_attach(spapr_drc_index(drc));
 
 if (drc->dev) {
@@ -384,11 +387,14 @@ void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
 }
 g_assert((drc->state == SPAPR_DRC_STATE_LOGICAL_UNUSABLE)
  || (drc->state == SPAPR_DRC_STATE_PHYSICAL_POWERON));
-g_assert(fdt);
+g_assert(fdt || drck->dt_populate);
 
 drc->dev = d;
-drc->fdt = fdt;
-drc->fdt_start_offset = fdt_start_offset;
+
+if (fdt) {
+drc->fdt = fdt;
+drc->fdt_start_offset = fdt_start_offset;
+}
 
 object_property_add_link(OBJECT(drc), "device",
  object_get_typename(OBJECT(drc->dev)),
@@ -1102,10 +1108,30 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 goto out;
 }
 
-g_assert(drc->fdt);
-
 drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
+g_assert(drc->fdt || drck->dt_populate);
+
+if (!drc->fdt) {
+Error *local_err = NULL;
+void *fdt;
+int fdt_size;
+
+fdt = create_device_tree(&fdt_size);
+
+if (drck->dt_populate(drc, spapr, fdt, &drc->fdt_start_offset,
+  &local_err)) {
+g_free(fdt);
+error_free(local_err);
+rc = SPAPR_DR_CC_RESPONSE_ERROR;
+goto out;
+}
+
+drc->fdt = fdt;
+drc->ccs_offset = drc->fdt_start_offset;
+drc->ccs_depth = 0;
+}
+
 do {
 uint32_t tag;
 const char *name;
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index f6ff32e7e2..2aa919f0cf 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -18,6 +18,7 @@
 #include "qom/object.h"
 #include "sysemu/sysemu.h"
 #include "hw/qdev.h"

[Qemu-devel] [PULL 40/50] spapr_events: add support for phb hotplug events

2019-02-25 Thread David Gibson

From: Michael Roth 

Extend the existing EPOW event format we use for PCI
devices to emit PHB plug/unplug events.

Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Greg Kurz 
Message-Id: 
<155059671405.1466090.535964535260503283.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_events.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index b9c7ecb9e9..ab9a1f0063 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -526,6 +526,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
 case SPAPR_DR_CONNECTOR_TYPE_CPU:
 hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
 break;
+case SPAPR_DR_CONNECTOR_TYPE_PHB:
+hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
+break;
 default:
 /* we shouldn't be signaling hotplug events for resources
  * that don't support them
-- 
2.20.1

[Qemu-devel] [PULL 39/50] spapr: populate PHB DRC entries for root DT node

2019-02-25 Thread David Gibson

From: Nathan Fontenot 

This adds entries to the root OF node to advertise our PHBs as being
DR-capable in accordance with the PAPR specification.

Signed-off-by: Nathan Fontenot 
Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Greg Kurz 
Message-Id: 
<155059670897.1466090.10843921337591637414.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 96bea7580a..fcda177090 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1365,6 +1365,14 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
 exit(1);
 }
 
+if (smc->dr_phb_enabled) {
+ret = spapr_drc_populate_dt(fdt, 0, NULL, SPAPR_DR_CONNECTOR_TYPE_PHB);
+if (ret < 0) {
+error_report("Couldn't set up PHB DR device tree properties");
+exit(1);
+}
+}
+
 return fdt;
 }
 
-- 
2.20.1

[Qemu-devel] [PULL 18/50] target/ppc/spapr: Set LPCR:HR when using Radix mode

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

The HW relies on LPCR:HR along with the PATE to determine whether
to use Radix or Hash mode. In fact it uses LPCR:HR more commonly
than the PATE.

For us, it's also more efficient to do so, especially since unlike
the HW we do not maintain a cache of the current PATE and HV PATE
in a generic place.

Prepare the ground for that by ensuring that LPCR:HR is set
properly on SPAPR machines.

Another option would have been to use a callback to get the PATE,
but this gets messy when implementing bare metal support; it's
much simpler (and faster) to use LPCR.

Since existing migration streams may not have it, fix it up in
spapr_post_load() as well based on the pseudo-PATE entry that
we keep.
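The helper that this patch moves into spapr.c performs a masked read-modify-write on every vCPU's LPCR: it clears the bits in mask, then ORs in value. Below is a minimal standalone sketch of that update. A plain array stands in for CPU_FOREACH()/run_on_cpu(), and the HR/UPRT bit positions are illustrative assumptions based on IBM bit numbering, not copied from QEMU headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative positions (IBM numbering: bit n is 1 << (63 - n)).
 * Treat these as assumptions, not the authoritative definitions. */
#define DEMO_LPCR_UPRT (1ull << (63 - 41))
#define DEMO_LPCR_HR   (1ull << (63 - 43))

/* Same update as do_lpcr_sync(): clear mask, then OR in value. */
static uint64_t demo_lpcr_update(uint64_t lpcr, uint64_t value, uint64_t mask)
{
    lpcr &= ~mask;
    lpcr |= value;
    return lpcr;
}

/* Stand-in for CPU_FOREACH + run_on_cpu(): update every vCPU's copy. */
static void demo_set_all_lpcrs(uint64_t *lpcrs, size_t ncpus,
                               uint64_t value, uint64_t mask)
{
    for (size_t i = 0; i < ncpus; i++) {
        lpcrs[i] = demo_lpcr_update(lpcrs[i], value, mask);
    }
}
```

The patch's radix path passes (LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT) and its hash path passes (0, LPCR_HR | LPCR_UPRT). Bits outside the mask are left untouched in both cases.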

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-2-...@kaod.org>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  | 41 +++-
 hw/ppc/spapr_hcall.c| 46 +++--
 hw/ppc/spapr_rtas.c |  6 +++---
 include/hw/ppc/spapr.h  |  1 +
 target/ppc/cpu.h|  1 +
 target/ppc/mmu-hash64.c |  2 +-
 6 files changed, 54 insertions(+), 43 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b3631e22c4..84f6e9d9a8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1389,6 +1389,37 @@ static void emulate_spapr_hypercall(PPCVirtualHypervisor *vhyp,
 }
 }
 
+struct LPCRSyncState {
+target_ulong value;
+target_ulong mask;
+};
+
+static void do_lpcr_sync(CPUState *cs, run_on_cpu_data arg)
+{
+struct LPCRSyncState *s = arg.host_ptr;
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+CPUPPCState *env = &cpu->env;
+target_ulong lpcr;
+
+cpu_synchronize_state(cs);
+lpcr = env->spr[SPR_LPCR];
+lpcr &= ~s->mask;
+lpcr |= s->value;
+ppc_store_lpcr(cpu, lpcr);
+}
+
+void spapr_set_all_lpcrs(target_ulong value, target_ulong mask)
+{
+CPUState *cs;
+struct LPCRSyncState s = {
+.value = value,
+.mask = mask
+};
+CPU_FOREACH(cs) {
+run_on_cpu(cs, do_lpcr_sync, RUN_ON_CPU_HOST_PTR(&s));
+}
+}
+
 static uint64_t spapr_get_patbe(PPCVirtualHypervisor *vhyp)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(vhyp);
@@ -1565,7 +1596,7 @@ void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift,
 }
 }
 /* We're setting up a hash table, so that means we're not radix */
-spapr->patb_entry = 0;
+spapr_set_all_lpcrs(0, LPCR_HR | LPCR_UPRT);
 }
 
 void spapr_setup_hpt_and_vrma(sPAPRMachineState *spapr)
@@ -1623,6 +1654,7 @@ static void spapr_machine_reset(void)
  * without a HPT because KVM will start them in radix mode.
  * Set the GR bit in PATB so that we know there is no HPT. */
 spapr->patb_entry = PATBE1_GR;
+spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
 } else {
 spapr_setup_hpt_and_vrma(spapr);
 }
@@ -1781,6 +1813,13 @@ static int spapr_post_load(void *opaque, int version_id)
 bool radix = !!(spapr->patb_entry & PATBE1_GR);
 bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE);
 
+/*
+ * Update LPCR:HR and UPRT as they may not be set properly in
+ * the stream
+ */
+spapr_set_all_lpcrs(radix ? (LPCR_HR | LPCR_UPRT) : 0,
+LPCR_HR | LPCR_UPRT);
+
 err = kvmppc_configure_v3_mmu(cpu, radix, gtse, spapr->patb_entry);
 if (err) {
 error_report("Process table config unsupported by the host");
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 17bcaa3822..b47241ace6 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -17,37 +17,6 @@
 #include "mmu-book3s-v3.h"
 #include "hw/mem/memory-device.h"
 
-struct LPCRSyncState {
-target_ulong value;
-target_ulong mask;
-};
-
-static void do_lpcr_sync(CPUState *cs, run_on_cpu_data arg)
-{
-struct LPCRSyncState *s = arg.host_ptr;
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-CPUPPCState *env = &cpu->env;
-target_ulong lpcr;
-
-cpu_synchronize_state(cs);
-lpcr = env->spr[SPR_LPCR];
-lpcr &= ~s->mask;
-lpcr |= s->value;
-ppc_store_lpcr(cpu, lpcr);
-}
-
-static void set_all_lpcrs(target_ulong value, target_ulong mask)
-{
-CPUState *cs;
-struct LPCRSyncState s = {
-.value = value,
-.mask = mask
-};
-CPU_FOREACH(cs) {
-run_on_cpu(cs, do_lpcr_sync, RUN_ON_CPU_HOST_PTR(&s));
-}
-}
-
 static bool has_spr(PowerPCCPU *cpu, int spr)
 {
 /* We can test whether the SPR is defined by checking for a valid name */
@@ -1255,12 +1224,12 @@ static target_ulong h_set_mode_resource_le(PowerPCCPU *cpu,
 
 switch (mflags) {
 case H_SET_MODE_ENDIAN_BIG:
-set_all_lpcrs(0, LPCR_ILE);
+spapr_set_all_lpcrs(0, LPCR_ILE);
 spapr_pci_switch_vga(true);
 return H_SUCCESS;
 
 case H_SET_MODE_ENDIAN_LITTLE:
-set_all_lpcrs(LPCR_ILE, LPCR_ILE);
+

[Qemu-devel] [PULL 37/50] spapr_pci: add PHB unrealize

2019-02-25 Thread David Gibson

From: Greg Kurz 

To support PHB hotplug we need to clean up lingering references,
memory, child properties, etc. prior to the PHB object being
finalized. Generally this will be called as a result of calling
object_unparent() on the PHB object, which in turn would normally
be called as the result of an unplug() operation.

When the PHB is finalized, child objects will be unparented in
turn, and finalized if the PHB was the only reference holder, so
we don't bother to explicitly unparent child objects of the PHB,
with the notable exception of DRCs. This is needed to avoid a QEMU
crash when unplugging a PHB and resetting the machine before the
guest could handle the event. The DRCs are removed from the QOM tree
by pci_unregister_root_bus() and we must make sure we're not leaving
stale aliases under the global /dr-connector path.

The formula that gives the number of DMA windows is moved to an
inline function in the hw/pci-host/spapr.h header because it
will have other users.

The unrealize function is able to cope with partially realized PHBs.
It is hence used to implement proper rollback on the realize error
path.
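Using the unrealize function as the realize error path works only if teardown tolerates resources that were never acquired. The sketch below shows that general rollback pattern on a hypothetical three-step object. DemoPhb and the demo_* functions are illustrative and do not correspond to QEMU's sPAPRPHBState. Each teardown step is safe on a never-initialized field, and steps run in reverse order of acquisition, like the loops in spapr_phb_unrealize().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object whose resources are acquired in three steps. */
typedef struct {
    int *msi;        /* step 1 */
    int *lsi;        /* step 2 */
    bool registered; /* step 3 */
} DemoPhb;

/* Safe after a partial realize: free(NULL) is a no-op and clearing an
 * unset flag is harmless. Releases happen in reverse order. */
static void demo_phb_unrealize(DemoPhb *phb)
{
    phb->registered = false;   /* stand-in for unregistering */
    free(phb->lsi);
    phb->lsi = NULL;
    free(phb->msi);
    phb->msi = NULL;
}

/* fail_at simulates an error before step 2 or 3 (0 means no error). */
static bool demo_phb_realize(DemoPhb *phb, int fail_at)
{
    phb->msi = calloc(16, sizeof(*phb->msi));
    if (!phb->msi || fail_at == 2) {
        goto rollback;
    }
    phb->lsi = calloc(4, sizeof(*phb->lsi));
    if (!phb->lsi || fail_at == 3) {
        goto rollback;
    }
    phb->registered = true;
    return true;

rollback:
    demo_phb_unrealize(phb);
    return false;
}
```

This assumes the object starts zero-initialized, just as QOM objects are zeroed before realize runs.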

Signed-off-by: Michael Roth 
Signed-off-by: Greg Kurz 
Reviewed-by: David Gibson 
Message-Id: 
<155059669881.1466090.13515030705986041517.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_pci.c  | 86 +++--
 include/hw/pci-host/spapr.h |  5 +++
 2 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e2bc9fec82..ede928b0bf 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1570,6 +1570,75 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
 }
 }
 
+static void spapr_phb_finalizefn(Object *obj)
+{
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(obj);
+
+g_free(sphb->dtbusname);
+sphb->dtbusname = NULL;
+}
+
+static void spapr_phb_unrealize(DeviceState *dev, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
+SysBusDevice *s = SYS_BUS_DEVICE(dev);
+PCIHostState *phb = PCI_HOST_BRIDGE(s);
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(phb);
+sPAPRTCETable *tcet;
+int i;
+const unsigned windows_supported = spapr_phb_windows_supported(sphb);
+
+if (sphb->msi) {
+g_hash_table_unref(sphb->msi);
+sphb->msi = NULL;
+}
+
+/*
+ * Remove IO/MMIO subregions and aliases, rest should get cleaned
+ * via PHB's unrealize->object_finalize
+ */
+for (i = windows_supported - 1; i >= 0; i--) {
+tcet = spapr_tce_find_by_liobn(sphb->dma_liobn[i]);
+if (tcet) {
+memory_region_del_subregion(&sphb->iommu_root,
+spapr_tce_get_iommu(tcet));
+}
+}
+
+if (sphb->dr_enabled) {
+for (i = PCI_SLOT_MAX * 8 - 1; i >= 0; i--) {
+sPAPRDRConnector *drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PCI,
+(sphb->index << 16) | i);
+
+if (drc) {
+object_unparent(OBJECT(drc));
+}
+}
+}
+
+for (i = PCI_NUM_PINS - 1; i >= 0; i--) {
+if (sphb->lsi_table[i].irq) {
+spapr_irq_free(spapr, sphb->lsi_table[i].irq, 1);
+sphb->lsi_table[i].irq = 0;
+}
+}
+
+QLIST_REMOVE(sphb, list);
+
+memory_region_del_subregion(&sphb->iommu_root, &sphb->msiwindow);
+
+address_space_destroy(&sphb->iommu_as);
+
+qbus_set_hotplug_handler(BUS(phb->bus), NULL, &error_abort);
+pci_unregister_root_bus(phb->bus);
+
+memory_region_del_subregion(get_system_memory(), &sphb->iowindow);
+if (sphb->mem64_win_pciaddr != (hwaddr)-1) {
+memory_region_del_subregion(get_system_memory(), &sphb->mem64window);
+}
+memory_region_del_subregion(get_system_memory(), &sphb->mem32window);
+}
+
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
 {
 /* We don't use SPAPR_MACHINE() in order to exit gracefully if the user
@@ -1587,8 +1656,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 PCIBus *bus;
 uint64_t msi_window_size = 4096;
 sPAPRTCETable *tcet;
-const unsigned windows_supported =
-sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
+const unsigned windows_supported = spapr_phb_windows_supported(sphb);
 
 if (!spapr) {
 error_setg(errp, TYPE_SPAPR_PCI_HOST_BRIDGE " needs a pseries machine");
@@ -1745,6 +1813,10 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 if (local_err) {
 error_propagate_prepend(errp, local_err,
 "can't allocate LSIs: ");
+/*
+ * Older machines will never support PHB hotplug, ie, this is an
+ * init only path and QEMU will terminate. No need to rollback.
+ */
 return;
 }
 }
@@ -1752,7 +1824,7 @@ static

[Qemu-devel] [PULL 20/50] target/ppc: Re-enable RMLS on POWER9 for virtual hypervisors

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Historically the 64-bit server MMU has supported two ways of configuring
the guest "real mode" mapping:

 - The "RMA", which is a single chunk of physically contiguous
memory remapped as guest real, and controlled by the RMLS
field in the LPCR register and the RMOR register.

 - The "VRMA" which uses special PTEs inserted in the partition
hash table by the hypervisor.

POWER9 deprecates the former, which is reflected by the filtering
done in ppc_store_lpcr() which effectively prevents setting of
the RMLS field.

However, when using fully emulated SPAPR machines, our QEMU code
currently only knows how to define the guest real mode memory using
RMLS.

As a result, a SPAPR machine currently cannot be run with a POWER9
CPU model.

This works around it with a quirk in ppc_store_lpcr() to continue
allowing the RMLS field to be set when using a virtual hypervisor.

Ultimately we will want to implement configuring a VRMA instead,
which will also be necessary if we want to migrate a SPAPR guest
between TCG and KVM, but this is a lot more work.
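The quirk amounts to a whitelist filter with one conditional exception. POWER9 keeps only an allowed set of LPCR bits, and RMLS is added back when a virtual hypervisor is present. The minimal sketch below uses made-up masks (DEMO_P9_ALLOWED, DEMO_RMLS) purely for illustration. They are not the real LPCR layout or QEMU's constant values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks for illustration only; not the real LPCR layout. */
#define DEMO_P9_ALLOWED 0x00F0ull  /* bits the POWER9 filter keeps */
#define DEMO_RMLS       0x0F00ull  /* field deprecated on POWER9 */

/* Whitelist filter, with a virtual-hypervisor exception for RMLS. */
static uint64_t demo_store_lpcr_p9(uint64_t val, bool has_vhyp)
{
    uint64_t lpcr = val & DEMO_P9_ALLOWED;

    if (has_vhyp) {
        lpcr |= val & DEMO_RMLS;   /* emulated sPAPR still needs RMLS */
    }
    return lpcr;
}
```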

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-4-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/mmu-hash64.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index f1c7729332..1175b991d9 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -1088,6 +1088,14 @@ void ppc_store_lpcr(PowerPCCPU *cpu, target_ulong val)
   (LPCR_PECE_L_MASK & (LPCR_PDEE | LPCR_HDEE | LPCR_EEE |
   LPCR_DEE | LPCR_OEE)) | LPCR_MER | LPCR_GTSE | LPCR_TC |
   LPCR_HEIC | LPCR_LPES0 | LPCR_HVICE | LPCR_HDICE);
+/*
+ * If we have a virtual hypervisor, we need to bring back RMLS. It
+ * doesn't exist on an actual P9 but that's all we know how to
+ * configure with softmmu at the moment
+ */
+if (cpu->vhyp) {
+lpcr |= (val & LPCR_RMLS);
+}
 break;
 default:
 ;
-- 
2.20.1

[Qemu-devel] [PULL 22/50] target/ppc: Fix ordering of hash MMU accesses

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

With mttcg, we can have MMU lookups happening at the same time
as the guest is modifying the page tables.

Since the HPTEs of the hash table MMU contain two words (or
doublewords on 64-bit), we need to make sure we read them
in the right order, with the correct memory barrier.

Additionally, when using emulated SPAPR mode, the hypercalls
writing to the hash table must also perform the updates in
the right order.

Note: This part is still not entirely correct
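The ordering the patch enforces can be expressed with portable C11 atomics. The writer publishes pte1 before setting the valid bit in pte0. The reader loads pte0 first, issues a barrier, and only then loads pte1. The sketch below is a minimal illustration, not QEMU code. It uses a release fence where the patch uses smp_wmb() and an acquire fence where it uses smp_rmb(), and those are roughly corresponding but not identical primitives. DEMO_V_VALID and the demo_* names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_V_VALID 0x1ull  /* hypothetical valid bit in pte0 */

typedef struct {
    _Atomic uint64_t pte0;   /* holds the valid bit */
    _Atomic uint64_t pte1;
} DemoHpte;

/* Writer: when validating, store pte1 first; when invalidating,
 * store pte0 first (mirrors the order in spapr_store_hpte()). */
static void demo_hpte_store(DemoHpte *h, uint64_t pte0, uint64_t pte1)
{
    if (pte0 & DEMO_V_VALID) {
        atomic_store_explicit(&h->pte1, pte1, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);   /* ~ smp_wmb() */
        atomic_store_explicit(&h->pte0, pte0, memory_order_relaxed);
    } else {
        atomic_store_explicit(&h->pte0, pte0, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);   /* ~ smp_wmb() */
        atomic_store_explicit(&h->pte1, pte1, memory_order_relaxed);
    }
}

/* Reader: load pte0 (and its valid bit) before pte1, with a barrier
 * in between (mirrors the pteg_search() loops). */
static bool demo_hpte_load(DemoHpte *h, uint64_t *pte0, uint64_t *pte1)
{
    *pte0 = atomic_load_explicit(&h->pte0, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);       /* ~ smp_rmb() */
    *pte1 = atomic_load_explicit(&h->pte1, memory_order_relaxed);
    return (*pte0 & DEMO_V_VALID) != 0;
}
```

The fence pair guarantees one thing: a reader that observes the newly valid pte0 also observes the pte1 stored before it. As the note above says, this does not make every concurrent update race-free.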

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-7-...@kaod.org>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c  | 21 +++--
 target/ppc/mmu-hash32.c |  6 ++
 target/ppc/mmu-hash64.c |  6 ++
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 84f6e9d9a8..d2520bc662 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1524,8 +1524,25 @@ static void spapr_store_hpte(PPCVirtualHypervisor *vhyp, hwaddr ptex,
 if (!spapr->htab) {
 kvmppc_write_hpte(ptex, pte0, pte1);
 } else {
-stq_p(spapr->htab + offset, pte0);
-stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+if (pte0 & HPTE64_V_VALID) {
+stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+/*
+ * When setting valid, we write PTE1 first. This ensures
+ * proper synchronization with the reading code in
+ * ppc_hash64_pteg_search()
+ */
+smp_wmb();
+stq_p(spapr->htab + offset, pte0);
+} else {
+stq_p(spapr->htab + offset, pte0);
+/*
+ * When clearing it we set PTE0 first. This ensures proper
+ * synchronization with the reading code in
+ * ppc_hash64_pteg_search()
+ */
+smp_wmb();
+stq_p(spapr->htab + offset + HASH_PTE_SIZE_64 / 2, pte1);
+}
 }
 }
 
diff --git a/target/ppc/mmu-hash32.c b/target/ppc/mmu-hash32.c
index 03ae3c1279..e8562a7c87 100644
--- a/target/ppc/mmu-hash32.c
+++ b/target/ppc/mmu-hash32.c
@@ -319,6 +319,12 @@ static hwaddr ppc_hash32_pteg_search(PowerPCCPU *cpu, hwaddr pteg_off,
 
 for (i = 0; i < HPTES_PER_GROUP; i++) {
 pte0 = ppc_hash32_load_hpte0(cpu, pte_offset);
+/*
+ * pte0 contains the valid bit and must be read before pte1,
+ * otherwise we might see an old pte1 with a new valid bit and
+ * thus an inconsistent hpte value
+ */
+smp_rmb();
 pte1 = ppc_hash32_load_hpte1(cpu, pte_offset);
 
 if ((pte0 & HPTE32_V_VALID)
diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index 1175b991d9..fbefe5b5aa 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -507,6 +507,12 @@ static hwaddr ppc_hash64_pteg_search(PowerPCCPU *cpu, hwaddr hash,
 }
 for (i = 0; i < HPTES_PER_GROUP; i++) {
 pte0 = ppc_hash64_hpte0(cpu, pteg, i);
+/*
+ * pte0 contains the valid bit and must be read before pte1,
+ * otherwise we might see an old pte1 with a new valid bit and
+ * thus an inconsistent hpte value
+ */
+smp_rmb();
 pte1 = ppc_hash64_hpte1(cpu, pteg, i);
 
 /* This compares V, B, H (secondary) and the AVPN */
-- 
2.20.1

[Qemu-devel] [PULL 32/50] spapr/pci: Generate FDT fragment at configure connector time

2019-02-25 Thread David Gibson

From: Greg Kurz 

Signed-off-by: Greg Kurz 
Message-Id: 
<155059667346.1466090.326696113231137772.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_drc.c  |  1 +
 hw/ppc/spapr_pci.c  | 19 ---
 include/hw/pci-host/spapr.h |  4 +++-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index aa26aa40be..248eb8a93d 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -691,6 +691,7 @@ static void spapr_drc_pci_class_init(ObjectClass *k, void *data)
 drck->typename = "28";
 drck->drc_name_prefix = "C";
 drck->release = spapr_phb_remove_pci_device_cb;
+drck->dt_populate = spapr_pci_dt_populate;
 }
 
 static void spapr_drc_lmb_class_init(ObjectClass *k, void *data)
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 60777b2355..b22c9f57b2 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1408,6 +1408,17 @@ static uint32_t spapr_phb_get_pci_drc_index(sPAPRPHBState *phb,
 return spapr_drc_index(drc);
 }
 
+int spapr_pci_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+  void *fdt, int *fdt_start_offset, Error **errp)
+{
+HotplugHandler *plug_handler = qdev_get_hotplug_handler(drc->dev);
+sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(plug_handler);
+PCIDevice *pdev = PCI_DEVICE(drc->dev);
+
+*fdt_start_offset = spapr_create_pci_child_dt(sphb, pdev, fdt, 0);
+return 0;
+}
+
 static void spapr_pci_plug(HotplugHandler *plug_handler,
DeviceState *plugged_dev, Error **errp)
 {
@@ -1417,8 +1428,6 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
 Error *local_err = NULL;
 PCIBus *bus = PCI_BUS(qdev_get_parent_bus(DEVICE(pdev)));
 uint32_t slotnr = PCI_SLOT(pdev->devfn);
-void *fdt = NULL;
-int fdt_start_offset, fdt_size;
 
 /* if DR is disabled we don't need to do anything in the case of
  * hotplug or coldplug callbacks
@@ -1448,10 +1457,7 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
 goto out;
 }
 
-fdt = create_device_tree(&fdt_size);
-fdt_start_offset = spapr_create_pci_child_dt(phb, pdev, fdt, 0);
-
-spapr_drc_attach(drc, DEVICE(pdev), fdt, fdt_start_offset, &local_err);
+spapr_drc_attach(drc, DEVICE(pdev), NULL, 0, &local_err);
 if (local_err) {
 goto out;
 }
@@ -1483,7 +1489,6 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
 out:
 if (local_err) {
 error_propagate(errp, local_err);
-g_free(fdt);
 }
 }
 
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 51d81c4b7c..f6e43f48fe 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -121,8 +121,10 @@ sPAPRPHBState *spapr_pci_find_phb(sPAPRMachineState *spapr, uint64_t buid);
 PCIDevice *spapr_pci_find_dev(sPAPRMachineState *spapr, uint64_t buid,
   uint32_t config_addr);
 
-/* PCI release callback. */
+/* DRC callbacks */
 void spapr_phb_remove_pci_device_cb(DeviceState *dev);
+int spapr_pci_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+  void *fdt, int *fdt_start_offset, Error **errp);
 
 /* VFIO EEH hooks */
 #ifdef CONFIG_LINUX
-- 
2.20.1

[Qemu-devel] [PULL 15/50] tests/device-plug: Add CCW unplug test for s390x

2019-02-25 Thread David Gibson

From: David Hildenbrand 

As CCW unplugs are surprise removals without asking the guest first,
we can test this without any guest interaction.

Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Thomas Huth 
Signed-off-by: David Hildenbrand 
Message-Id: <20190218092202.26683-5-da...@redhat.com>
Acked-by: Cornelia Huck 
Signed-off-by: David Gibson 
---
 tests/device-plug-test.c | 41 +++-
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/tests/device-plug-test.c b/tests/device-plug-test.c
index cd9ada539d..d1a6c94af2 100644
--- a/tests/device-plug-test.c
+++ b/tests/device-plug-test.c
@@ -15,17 +15,26 @@
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
 
-static void device_del_request(QTestState *qtest, const char *id)
+static void device_del_start(QTestState *qtest, const char *id)
 {
-QDict *resp;
+qtest_qmp_send(qtest,
+   "{'execute': 'device_del', 'arguments': { 'id': %s } }", id);
+}
+
+static void device_del_finish(QTestState *qtest)
+{
+QDict *resp = qtest_qmp_receive(qtest);
 
-resp = qtest_qmp(qtest,
- "{'execute': 'device_del', 'arguments': { 'id': %s } }",
- id);
 g_assert(qdict_haskey(resp, "return"));
 qobject_unref(resp);
 }
 
+static void device_del_request(QTestState *qtest, const char *id)
+{
+device_del_start(qtest, id);
+device_del_finish(qtest);
+}
+
 static void system_reset(QTestState *qtest)
 {
 QDict *resp;
@@ -77,8 +86,25 @@ static void test_pci_unplug_request(void)
 qtest_quit(qtest);
 }
 
+static void test_ccw_unplug(void)
+{
+QTestState *qtest = qtest_initf("-device virtio-balloon-ccw,id=dev0");
+
+/*
+ * The DEVICE_DELETED events will be sent before the command
+ * completes.
+ */
+device_del_start(qtest, "dev0");
+wait_device_deleted_event(qtest, "dev0");
+device_del_finish(qtest);
+
+qtest_quit(qtest);
+}
+
 int main(int argc, char **argv)
 {
+const char *arch = qtest_get_arch();
+
 g_test_init(&argc, &argv, NULL);
 
 /*
@@ -89,5 +115,10 @@ int main(int argc, char **argv)
 qtest_add_func("/device-plug/pci-unplug-request",
test_pci_unplug_request);
 
+if (!strcmp(arch, "s390x")) {
+qtest_add_func("/device-plug/ccw-unplug",
+   test_ccw_unplug);
+}
+
 return g_test_run();
 }
-- 
2.20.1

[Qemu-devel] [PULL 26/50] target/ppc: Rename PATB/PATBE -> PATE

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

That "b" means "base address" and thus shouldn't be in the name
of actual entries and related constants.

This patch keeps the synthetic patb_entry field of the spapr
virtual hypervisor unchanged until I figure out if that has
an impact on the migration stream.
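The new spapr_get_pate() relies on the comment "Copy PATE1:GR into PATE0:HR". Masking patb_entry with PATE0_HR carries the GR flag over only if both flags sit at the same bit position of their respective doublewords. The sketch below spells out that assumption, using IBM bit numbering (bit 0 is the most significant bit) and placing both flags at bit 0. The DEMO_* names and DemoPate are hypothetical, and the bit placement is an assumption to verify against target/ppc/cpu.h.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit doubleword. */
#define DEMO_PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Assumption: both flags are bit 0 of their doubleword. */
#define DEMO_PATE0_HR DEMO_PPC_BIT(0)
#define DEMO_PATE1_GR DEMO_PPC_BIT(0)

typedef struct {
    uint64_t dw0;
    uint64_t dw1;
} DemoPate;

/* Build a synthetic PATE from the single stored doubleword. The GR bit
 * lands on HR because both flags share the same position. */
static DemoPate demo_synth_pate(uint64_t patb_entry)
{
    DemoPate e;

    e.dw0 = patb_entry & DEMO_PATE0_HR;
    e.dw1 = patb_entry;
    return e;
}
```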

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-11-...@kaod.org>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 24 +++-
 hw/ppc/spapr_hcall.c   | 22 --
 target/ppc/cpu.h   |  6 +-
 target/ppc/mmu-book3s-v3.h | 11 ++-
 target/ppc/mmu-radix64.c   | 18 ++
 target/ppc/mmu-radix64.h   |  4 ++--
 6 files changed, 54 insertions(+), 31 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d2520bc662..00eb3b643c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1420,11 +1420,13 @@ void spapr_set_all_lpcrs(target_ulong value, target_ulong mask)
 }
 }
 
-static uint64_t spapr_get_patbe(PPCVirtualHypervisor *vhyp)
+static void spapr_get_pate(PPCVirtualHypervisor *vhyp, ppc_v3_pate_t *entry)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(vhyp);
 
-return spapr->patb_entry;
+/* Copy PATE1:GR into PATE0:HR */
+entry->dw0 = spapr->patb_entry & PATE0_HR;
+entry->dw1 = spapr->patb_entry;
 }
 
 #define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
@@ -1667,17 +1669,21 @@ static void spapr_machine_reset(void)
 if (kvm_enabled() && kvmppc_has_cap_mmu_radix() &&
 ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
   spapr->max_compat_pvr)) {
-/* If using KVM with radix mode available, VCPUs can be started
+/*
+ * If using KVM with radix mode available, VCPUs can be started
  * without a HPT because KVM will start them in radix mode.
- * Set the GR bit in PATB so that we know there is no HPT. */
-spapr->patb_entry = PATBE1_GR;
+ * Set the GR bit in PATE so that we know there is no HPT.
+ */
+spapr->patb_entry = PATE1_GR;
 spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
 } else {
 spapr_setup_hpt_and_vrma(spapr);
 }
 
-/* if this reset wasn't generated by CAS, we should reset our
- * negotiated options and start from scratch */
+/*
+ * If this reset wasn't generated by CAS, we should reset our
+ * negotiated options and start from scratch
+ */
 if (!spapr->cas_reboot) {
 spapr_ovec_cleanup(spapr->ov5_cas);
 spapr->ov5_cas = spapr_ovec_new();
@@ -1827,7 +1833,7 @@ static int spapr_post_load(void *opaque, int version_id)
 
 if (kvm_enabled() && spapr->patb_entry) {
 PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
-bool radix = !!(spapr->patb_entry & PATBE1_GR);
+bool radix = !!(spapr->patb_entry & PATE1_GR);
 bool gtse = !!(cpu->env.spr[SPR_LPCR] & LPCR_GTSE);
 
 /*
@@ -4118,7 +4124,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 vhc->map_hptes = spapr_map_hptes;
 vhc->unmap_hptes = spapr_unmap_hptes;
 vhc->store_hpte = spapr_store_hpte;
-vhc->get_patbe = spapr_get_patbe;
+vhc->get_pate = spapr_get_pate;
 vhc->encode_hpt_for_kvm_pr = spapr_encode_hpt_for_kvm_pr;
 xic->ics_get = spapr_ics_get;
 xic->ics_resend = spapr_ics_resend;
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index b47241ace6..476bad6271 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1311,12 +1311,12 @@ static void spapr_check_setup_free_hpt(sPAPRMachineState *spapr,
  *   later and so assumed radix and now it's called H_REG_PROC_TBL
  */
 
-if ((patbe_old & PATBE1_GR) == (patbe_new & PATBE1_GR)) {
+if ((patbe_old & PATE1_GR) == (patbe_new & PATE1_GR)) {
 /* We assume RADIX, so this catches all the "Do Nothing" cases */
-} else if (!(patbe_old & PATBE1_GR)) {
+} else if (!(patbe_old & PATE1_GR)) {
 /* HASH->RADIX : Free HPT */
 spapr_free_hpt(spapr);
-} else if (!(patbe_new & PATBE1_GR)) {
+} else if (!(patbe_new & PATE1_GR)) {
 /* RADIX->HASH || NOTHING->HASH : Allocate HPT */
 spapr_setup_hpt_and_vrma(spapr);
 }
@@ -1354,7 +1354,7 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 } else if (table_size > 24) {
 return H_P4;
 }
-cproc = PATBE1_GR | proc_tbl | table_size;
+cproc = PATE1_GR | proc_tbl | table_size;
 } else { /* Register new HPT process table */
 if (flags & FLAG_HASH_PROC_TBL) { /* Hash with Segment Tables */
 /* TODO - Not Supported */
@@ -1373,13 +1373,15 @@ static target_ulong h_register_process_table(PowerPCCPU *cpu,
 }
 
 } else { /* Deregister current process table */
-

[Qemu-devel] [PULL 42/50] spapr_pci: add ibm,my-drc-index property for PHB hotplug

2019-02-25 Thread David Gibson

From: Michael Roth 

This is needed to denote a boot-time PHB as being hot-pluggable.
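Device-tree property cells are big-endian, which is why the hunk below passes `cpu_to_be32(spapr_drc_index(drc))` to `fdt_setprop()` instead of the raw host integer. A minimal sketch of that encoding and its inverse in plain C; `encode_fdt_cell()` and `decode_fdt_cell()` are hypothetical helpers, not QEMU or libfdt APIs:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: serialize a 32-bit value as one device-tree cell,
 * most significant byte first, regardless of host endianness. This is the
 * byte layout cpu_to_be32() + fdt_setprop() produce in the patch.
 */
static void encode_fdt_cell(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Hypothetical inverse, as a consumer of "ibm,my-drc-index" would decode it. */
static uint32_t decode_fdt_cell(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

The value used in the check below is an arbitrary example, not a real PHB DRC index.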

Signed-off-by: Michael Roth 
Reviewed-by: David Gibson 
Signed-off-by: Greg Kurz 
Message-Id: <155059672420.1466090.15147504040270659866.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_pci.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index a0e1769439..03fc26985a 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -2203,6 +2203,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t intc_phandle, void *fdt,
 sPAPRTCETable *tcet;
 PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
 sPAPRFDT s_fdt;
+sPAPRDRConnector *drc;
 
 /* Start populating the FDT */
 nodename = g_strdup_printf("pci@%" PRIx64, phb->buid);
@@ -2269,6 +2270,14 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t intc_phandle, void *fdt,
  tcet->liobn, tcet->bus_offset,
  tcet->nb_table << tcet->page_shift);
 
+drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, phb->index);
+if (drc) {
+uint32_t drc_index = cpu_to_be32(spapr_drc_index(drc));
+
+_FDT(fdt_setprop(fdt, bus_off, "ibm,my-drc-index", &drc_index,
+ sizeof(drc_index)));
+}
+
 /* Walk the bridges and program the bus numbers*/
 spapr_phb_pci_enumerate(phb);
 _FDT(fdt_setprop_cell(fdt, bus_off, "qemu,phb-enumerated", 0x1));
-- 
2.20.1

[Qemu-devel] [PULL 33/50] spapr/drc: Drop spapr_drc_attach() fdt argument

2019-02-25 Thread David Gibson

From: Greg Kurz 

All DRC subtypes have been converted to generate the FDT fragment at
configure connector time instead of attach time. The fdt and fdt_offset
arguments of spapr_drc_attach() aren't needed anymore. Drop them and
make the implementation of the dt_populate() method mandatory.
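The resulting flow can be sketched with heavily simplified, hypothetical types (`Connector`, `ConnectorClass` and the `connector_*` functions are illustrations, not QEMU's `sPAPRDRConnector` API). Attach only records the device; the FDT fragment is built lazily through the now-mandatory `dt_populate` callback the first time configure-connector runs, in the spirit of the `if (!drc->fdt)` path kept in `rtas_ibm_configure_connector()`:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Connector Connector;

/* Hypothetical class: dt_populate is mandatory, so it is never checked for NULL. */
typedef struct ConnectorClass {
    int (*dt_populate)(Connector *drc, int *fdt_start_offset);
} ConnectorClass;

struct Connector {
    const ConnectorClass *klass;
    void *dev;            /* attached device, NULL when empty */
    int fdt_ready;        /* stands in for "drc->fdt != NULL" */
    int fdt_start_offset;
};

/* attach() no longer receives an FDT: it only records the device. */
static void connector_attach(Connector *drc, void *dev)
{
    assert(drc->dev == NULL);
    drc->dev = dev;
}

/* configure-connector time: build the fragment once, on first request. */
static int connector_configure(Connector *drc)
{
    if (!drc->fdt_ready) {
        int ret = drc->klass->dt_populate(drc, &drc->fdt_start_offset);
        if (ret < 0) {
            return ret;
        }
        drc->fdt_ready = 1;
    }
    return drc->fdt_start_offset;
}

/* Example callback; a real one would add FDT subnodes and properties. */
static int populate_calls;
static int example_dt_populate(Connector *drc, int *fdt_start_offset)
{
    (void)drc;
    populate_calls++;
    *fdt_start_offset = 42; /* arbitrary offset for illustration */
    return 0;
}
```

Generating the fragment on demand means attach-time callers (LMB, CPU, PCI plug paths) no longer need to build and free FDT blobs themselves.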

Signed-off-by: Greg Kurz 
Message-Id: <155059667853.1466090.16527852453054217565.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c |  4 ++--
 hw/ppc/spapr_drc.c | 13 +
 hw/ppc/spapr_pci.c |  2 +-
 include/hw/ppc/spapr_drc.h |  3 +--
 4 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 6cf7a9f5c1..9364d07364 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3362,7 +3362,7 @@ static void spapr_add_lmbs(DeviceState *dev, uint64_t addr_start, uint64_t size,
   addr / SPAPR_MEMORY_BLOCK_SIZE);
 g_assert(drc);
 
-spapr_drc_attach(drc, dev, NULL, 0, &local_err);
+spapr_drc_attach(drc, dev, &local_err);
 if (local_err) {
 while (addr > addr_start) {
 addr -= SPAPR_MEMORY_BLOCK_SIZE;
@@ -3744,7 +3744,7 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 g_assert(drc || !mc->has_hotpluggable_cpus);
 
 if (drc) {
-spapr_drc_attach(drc, dev, NULL, 0, &local_err);
+spapr_drc_attach(drc, dev, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 248eb8a93d..87ca7d9735 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -374,11 +374,8 @@ static void prop_get_fdt(Object *obj, Visitor *v, const char *name,
 } while (fdt_depth != 0);
 }
 
-void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
-  int fdt_start_offset, Error **errp)
+void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, Error **errp)
 {
-sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-
 trace_spapr_drc_attach(spapr_drc_index(drc));
 
 if (drc->dev) {
@@ -387,15 +384,9 @@ void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
 }
 g_assert((drc->state == SPAPR_DRC_STATE_LOGICAL_UNUSABLE)
  || (drc->state == SPAPR_DRC_STATE_PHYSICAL_POWERON));
-g_assert(fdt || drck->dt_populate);
 
 drc->dev = d;
 
-if (fdt) {
-drc->fdt = fdt;
-drc->fdt_start_offset = fdt_start_offset;
-}
-
 object_property_add_link(OBJECT(drc), "device",
  object_get_typename(OBJECT(drc->dev)),
 (Object **)(&drc->dev),
@@ -1113,8 +1104,6 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 
 drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
-g_assert(drc->fdt || drck->dt_populate);
-
 if (!drc->fdt) {
 Error *local_err = NULL;
 void *fdt;
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index b22c9f57b2..e2bc9fec82 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1457,7 +1457,7 @@ static void spapr_pci_plug(HotplugHandler *plug_handler,
 goto out;
 }
 
-spapr_drc_attach(drc, DEVICE(pdev), NULL, 0, &local_err);
+spapr_drc_attach(drc, DEVICE(pdev), &local_err);
 if (local_err) {
 goto out;
 }
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 2aa919f0cf..f32758ec84 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -261,8 +261,7 @@ sPAPRDRConnector *spapr_drc_by_id(const char *type, uint32_t id);
 int spapr_drc_populate_dt(void *fdt, int fdt_offset, Object *owner,
   uint32_t drc_type_mask);
 
-void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
-  int fdt_start_offset, Error **errp);
+void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, Error **errp);
 void spapr_drc_detach(sPAPRDRConnector *drc);
 bool spapr_drc_needed(void *opaque);
 
-- 
2.20.1

[Qemu-devel] [PULL 09/50] target/ppc: Add POWER9 external interrupt model

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Add support for hypervisor-directed interrupts in addition to the
OS-directed ones.
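The per-pin logic of `power9_set_irq()` in this patch can be modelled in isolation. A sketch assuming hypothetical pin numbers and pending bits (`P9_PIN_*`, `PEND_*`, `ToyCPU`) in place of QEMU's `POWER9_INPUT_*` and `PPC_INTERRUPT_*` constants:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pin numbering standing in for POWER9_INPUT_INT/HINT. */
enum { P9_PIN_INT = 0, P9_PIN_HINT = 1, P9_PIN_NB = 2 };

/* Hypothetical pending bits standing in for PPC_INTERRUPT_EXT/HVIRT. */
#define PEND_EXT   (1u << 0)
#define PEND_HVIRT (1u << 1)

typedef struct {
    uint32_t pending;         /* interrupts the CPU would act on */
    uint32_t irq_input_state; /* raw level of each input pin */
} ToyCPU;

static void set_pending(ToyCPU *cpu, uint32_t bit, int level)
{
    if (level) {
        cpu->pending |= bit;
    } else {
        cpu->pending &= ~bit;
    }
}

/*
 * Same shape as power9_set_irq(): both pins are level sensitive and active
 * high; an unknown pin is ignored without touching irq_input_state.
 */
static void toy_power9_set_irq(ToyCPU *cpu, int pin, int level)
{
    switch (pin) {
    case P9_PIN_INT:
        set_pending(cpu, PEND_EXT, level);   /* OS-directed external */
        break;
    case P9_PIN_HINT:
        set_pending(cpu, PEND_HVIRT, level); /* hypervisor-directed */
        break;
    default:
        return;
    }
    if (level) {
        cpu->irq_input_state |= 1u << pin;
    } else {
        cpu->irq_input_state &= ~(1u << pin);
    }
}
```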

Signed-off-by: Benjamin Herrenschmidt 
[clg: - modified icp_realize() and xive_tctx_realize() to explicitly take
into account the POWER9 interrupt model
  - introduced a specific power9_set_irq for POWER9 ]
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215161648.9600-10-...@kaod.org>
Signed-off-by: David Gibson 
---
 hw/intc/xics.c  |  3 +++
 hw/intc/xive.c  |  3 +++
 hw/ppc/ppc.c| 42 +
 include/hw/ppc/ppc.h|  2 ++
 target/ppc/cpu-qom.h|  2 ++
 target/ppc/cpu.h|  7 ++
 target/ppc/translate_init.inc.c |  4 ++--
 7 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 3009fa7472..767fdeb829 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -338,6 +338,9 @@ static void icp_realize(DeviceState *dev, Error **errp)
 case PPC_FLAGS_INPUT_POWER7:
 icp->output = env->irq_inputs[POWER7_INPUT_INT];
 break;
+case PPC_FLAGS_INPUT_POWER9: /* For SPAPR xics emulation */
+icp->output = env->irq_inputs[POWER9_INPUT_INT];
+break;
 
 case PPC_FLAGS_INPUT_970:
 icp->output = env->irq_inputs[PPC970_INPUT_INT];
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 2e9b8efd43..425aa97ef9 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -484,6 +484,9 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
 case PPC_FLAGS_INPUT_POWER7:
 tctx->output = env->irq_inputs[POWER7_INPUT_INT];
 break;
+case PPC_FLAGS_INPUT_POWER9:
+tctx->output = env->irq_inputs[POWER9_INPUT_INT];
+break;
 
 default:
 error_setg(errp, "XIVE interrupt controller does not support "
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 12439dbe5d..d1e3d4cd20 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -306,6 +306,48 @@ void ppcPOWER7_irq_init(PowerPCCPU *cpu)
 env->irq_inputs = (void **)qemu_allocate_irqs(&power7_set_irq, cpu,
   POWER7_INPUT_NB);
 }
+
+/* POWER9 internal IRQ controller */
+static void power9_set_irq(void *opaque, int pin, int level)
+{
+PowerPCCPU *cpu = opaque;
+CPUPPCState *env = &cpu->env;
+
+LOG_IRQ("%s: env %p pin %d level %d\n", __func__,
+env, pin, level);
+
+switch (pin) {
+case POWER9_INPUT_INT:
+/* Level sensitive - active high */
+LOG_IRQ("%s: set the external IRQ state to %d\n",
+__func__, level);
+ppc_set_irq(cpu, PPC_INTERRUPT_EXT, level);
+break;
+case POWER9_INPUT_HINT:
+/* Level sensitive - active high */
+LOG_IRQ("%s: set the hypervisor IRQ state to %d\n",
+__func__, level);
+ppc_set_irq(cpu, PPC_INTERRUPT_HVIRT, level);
+break;
+default:
+/* Unknown pin - do nothing */
+LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
+return;
+}
+if (level) {
+env->irq_input_state |= 1 << pin;
+} else {
+env->irq_input_state &= ~(1 << pin);
+}
+}
+
+void ppcPOWER9_irq_init(PowerPCCPU *cpu)
+{
+CPUPPCState *env = &cpu->env;
+
+env->irq_inputs = (void **)qemu_allocate_irqs(&power9_set_irq, cpu,
+  POWER9_INPUT_NB);
+}
 #endif /* defined(TARGET_PPC64) */
 
 void ppc40x_core_reset(PowerPCCPU *cpu)
diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
index 298ec354a8..746170f635 100644
--- a/include/hw/ppc/ppc.h
+++ b/include/hw/ppc/ppc.h
@@ -73,6 +73,7 @@ static inline void ppc40x_irq_init(PowerPCCPU *cpu) {}
 static inline void ppc6xx_irq_init(PowerPCCPU *cpu) {}
 static inline void ppc970_irq_init(PowerPCCPU *cpu) {}
 static inline void ppcPOWER7_irq_init(PowerPCCPU *cpu) {}
+static inline void ppcPOWER9_irq_init(PowerPCCPU *cpu) {}
 static inline void ppce500_irq_init(PowerPCCPU *cpu) {}
 #else
 void ppc40x_irq_init(PowerPCCPU *cpu);
@@ -80,6 +81,7 @@ void ppce500_irq_init(PowerPCCPU *cpu);
 void ppc6xx_irq_init(PowerPCCPU *cpu);
 void ppc970_irq_init(PowerPCCPU *cpu);
 void ppcPOWER7_irq_init(PowerPCCPU *cpu);
+void ppcPOWER9_irq_init(PowerPCCPU *cpu);
 #endif
 
 /* PPC machines for OpenBIOS */
diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index 904ee694ac..ae51fe754e 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -142,6 +142,8 @@ enum powerpc_input_t {
 PPC_FLAGS_INPUT_970,
 /* PowerPC POWER7 bus   */
 PPC_FLAGS_INPUT_POWER7,
+/* PowerPC POWER9 bus   */
+PPC_FLAGS_INPUT_POWER9,
 /* PowerPC 401 bus  */
 PPC_FLAGS_INPUT_401,
 /* Freescale RCPU bus   */
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 7d37d85ac5..ececad9f1f 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2327,6 +2327,13 @@ enum {
  * them */

[Qemu-devel] [PULL 25/50] target/ppc: Flush the TLB locally when the LPIDR is written

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Our TCG TLB only tags whether an access is HV or guest, so it must
be flushed whenever the LPIDR is changed.
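A toy model of the problem, assuming a direct-mapped TLB whose tag carries only the HV bit and the page number (`ToyCPU` and the `toy_*` helpers are hypothetical, not QEMU's softmmu TLB). Without the flush in the LPIDR store, a guest translation cached for one partition would still hit after switching to another:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_TLB_SIZE 8

/* Entries are tagged with the HV bit only, NOT with the LPID. */
typedef struct {
    bool valid;
    bool hv;
    uint64_t vpn;
    uint64_t rpn;
} ToyTLBEntry;

typedef struct {
    uint64_t lpidr;
    ToyTLBEntry tlb[TOY_TLB_SIZE];
} ToyCPU;

static void toy_tlb_flush(ToyCPU *cpu)
{
    memset(cpu->tlb, 0, sizeof(cpu->tlb));
}

static void toy_tlb_fill(ToyCPU *cpu, bool hv, uint64_t vpn, uint64_t rpn)
{
    ToyTLBEntry *e = &cpu->tlb[vpn % TOY_TLB_SIZE];

    e->valid = true;
    e->hv = hv;
    e->vpn = vpn;
    e->rpn = rpn;
}

static bool toy_tlb_lookup(const ToyCPU *cpu, bool hv, uint64_t vpn,
                           uint64_t *rpn)
{
    const ToyTLBEntry *e = &cpu->tlb[vpn % TOY_TLB_SIZE];

    if (e->valid && e->hv == hv && e->vpn == vpn) {
        *rpn = e->rpn;
        return true;
    }
    return false;
}

/* Analogue of helper_store_lpidr(): the LPID is not in the tag, so flush. */
static void toy_store_lpidr(ToyCPU *cpu, uint64_t val)
{
    cpu->lpidr = val;
    toy_tlb_flush(cpu);
}
```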

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-10-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/helper.h |  1 +
 target/ppc/misc_helper.c| 15 +++
 target/ppc/translate_init.inc.c |  7 ++-
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 18910d18a4..638a6e99c4 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -689,6 +689,7 @@ DEF_HELPER_2(store_ptcr, void, env, tl)
 #endif
 DEF_HELPER_2(store_sdr1, void, env, tl)
 DEF_HELPER_2(store_pidr, void, env, tl)
+DEF_HELPER_2(store_lpidr, void, env, tl)
 DEF_HELPER_FLAGS_2(store_tbl, TCG_CALL_NO_RWG, void, env, tl)
 DEF_HELPER_FLAGS_2(store_tbu, TCG_CALL_NO_RWG, void, env, tl)
 DEF_HELPER_FLAGS_2(store_atbl, TCG_CALL_NO_RWG, void, env, tl)
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index b884930096..c65d1ade15 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -117,6 +117,21 @@ void helper_store_pidr(CPUPPCState *env, target_ulong val)
 tlb_flush(CPU(cpu));
 }
 
+void helper_store_lpidr(CPUPPCState *env, target_ulong val)
+{
+PowerPCCPU *cpu = ppc_env_get_cpu(env);
+
+env->spr[SPR_LPIDR] = val;
+
+/*
+ * We need to flush the TLB on LPID changes as we only tag HV vs
+ * guest in TCG TLB. Also, the quadrants mean the HV will
+ * potentially access and cache entries for the current LPID as
+ * well.
+ */
+tlb_flush(CPU(cpu));
+}
+
 void helper_store_hid0_601(CPUPPCState *env, target_ulong val)
 {
 target_ulong hid0;
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 965c5273a6..58542c0fe0 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -408,6 +408,11 @@ static void spr_write_pidr(DisasContext *ctx, int sprn, int gprn)
 gen_helper_store_pidr(cpu_env, cpu_gpr[gprn]);
 }
 
+static void spr_write_lpidr(DisasContext *ctx, int sprn, int gprn)
+{
+gen_helper_store_lpidr(cpu_env, cpu_gpr[gprn]);
+}
+
 static void spr_read_hior(DisasContext *ctx, int gprn, int sprn)
 {
 tcg_gen_ld_tl(cpu_gpr[gprn], cpu_env, offsetof(CPUPPCState, excp_prefix));
@@ -7885,7 +7890,7 @@ static void gen_spr_book3s_ids(CPUPPCState *env)
 spr_register_hv(env, SPR_LPIDR, "LPIDR",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
- &spr_read_generic, &spr_write_generic,
+ &spr_read_generic, &spr_write_lpidr,
  0x);
 spr_register_hv(env, SPR_HFSCR, "HFSCR",
  SPR_NOACCESS, SPR_NOACCESS,
-- 
2.20.1

[Qemu-devel] [PULL 36/50] spapr_irq: Expose the phandle of the interrupt controller

2019-02-25 Thread David Gibson

From: Greg Kurz 

This will be used by PHB hotplug in order to create the "interrupt-map"
property of the PHB node.
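For context, each `interrupt-map` entry names its interrupt parent by phandle, which is what `spapr_irq_get_phandle()` supplies. A hedged sketch of one entry, assuming 3 address cells and 1 interrupt cell on the PCI side and 0 address cells and 2 interrupt cells on the controller (check the real `#address-cells`/`#interrupt-cells` of both nodes); `build_imap_entry()` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed cell counts:
 *   child unit address      3 cells (phys.hi, phys.mid, phys.lo)
 *   child interrupt spec    1 cell  (PCI pin, INTA..INTD = 1..4)
 *   interrupt parent        1 cell  (phandle)
 *   parent unit address     0 cells
 *   parent interrupt spec   2 cells (irq number, sense)
 */
#define IMAP_ENTRY_CELLS (3 + 1 + 1 + 2)

/*
 * Fill one entry in host byte order; each cell would still need
 * cpu_to_be32() before being passed to fdt_setprop().
 */
static size_t build_imap_entry(uint32_t *cells, uint32_t devfn, uint32_t pin,
                               uint32_t parent_phandle, uint32_t parent_irq,
                               uint32_t sense)
{
    size_t i = 0;

    assert(parent_phandle != 0); /* 0 is not a valid phandle */
    cells[i++] = devfn << 8;     /* phys.hi: bus 0, devfn in bits 15..8 */
    cells[i++] = 0;              /* phys.mid */
    cells[i++] = 0;              /* phys.lo */
    cells[i++] = pin;
    cells[i++] = parent_phandle;
    cells[i++] = parent_irq;
    cells[i++] = sense;
    return i;
}
```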

Signed-off-by: Greg Kurz 
Message-Id: <155059669374.1466090.12943228478046223856.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_irq.c | 21 +
 include/hw/ppc/spapr_irq.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 359761494c..4145079d7f 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -638,6 +638,27 @@ void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp)
 }
 }
 
+int spapr_irq_get_phandle(sPAPRMachineState *spapr, void *fdt, Error **errp)
+{
+const char *nodename = spapr->irq->get_nodename(spapr);
+int offset, phandle;
+
+offset = fdt_subnode_offset(fdt, 0, nodename);
+if (offset < 0) {
+error_setg(errp, "Can't find node \"%s\": %s", nodename,
+   fdt_strerror(offset));
+return -1;
+}
+
+phandle = fdt_get_phandle(fdt, offset);
+if (!phandle) {
+error_setg(errp, "Can't get phandle of node \"%s\"", nodename);
+return -1;
+}
+
+return phandle;
+}
+
 /*
  * XICS legacy routines - to deprecate one day
  */
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 8bf1a72919..ec1ee64fa6 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -61,6 +61,7 @@ void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
 int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
 void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
+int spapr_irq_get_phandle(sPAPRMachineState *spapr, void *fdt, Error **errp);
 
 /*
  * XICS legacy routines
-- 
2.20.1

[Qemu-devel] [PULL 06/50] target/ppc: Add POWER9 exception model

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

And use it to get the correct HILE bit in HID0
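The HILE selection this enables can be sketched as follows. The bit positions are assumptions believed to match `target/ppc/cpu.h` of this period (verify before reuse), and `excp_wants_le()` is a hypothetical helper that mirrors only the little-endian decision in `powerpc_excp()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit register. */
#define PPC_BIT(b) (0x8000000000000000ULL >> (b))

/* Assumed positions; check target/ppc/cpu.h. */
#define HID0_HILE        PPC_BIT(19) /* POWER8 */
#define HID0_POWER9_HILE PPC_BIT(4)  /* POWER9 */
#define LPCR_ILE         PPC_BIT(38)

typedef enum { EXCP_POWER8, EXCP_POWER9 } ExcpModel;

/*
 * Should the interrupt handler run little-endian? Interrupts delivered to
 * the hypervisor take HILE from HID0, at a model-specific position; the
 * others take LPCR[ILE].
 */
static bool excp_wants_le(ExcpModel model, bool to_hv, uint64_t hid0,
                          uint64_t lpcr)
{
    if (to_hv) {
        uint64_t hile = (model == EXCP_POWER9) ? HID0_POWER9_HILE : HID0_HILE;
        return (hid0 & hile) != 0;
    }
    return (lpcr & LPCR_ILE) != 0;
}
```

With a dedicated exception model, each CPU checks only its own HILE bit, instead of POWER8 code accepting either position.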

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-7-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/cpu-qom.h|  2 ++
 target/ppc/excp_helper.c| 17 +
 target/ppc/translate.c  |  3 ++-
 target/ppc/translate_init.inc.c |  2 +-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index e9cb158423..904ee694ac 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -113,6 +113,8 @@ enum powerpc_excp_t {
 POWERPC_EXCP_POWER7,
 /* POWER8 exception model   */
 POWERPC_EXCP_POWER8,
+/* POWER9 exception model   */
+POWERPC_EXCP_POWER9,
 };
 
 /*/
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 7536620a41..37546bb0f0 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -147,7 +147,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
 
 /* Exception targetting modifiers
  *
- * LPES0 is supported on POWER7/8
+ * LPES0 is supported on POWER7/8/9
  * LPES1 is not supported (old iSeries mode)
  *
  * On anything else, we behave as if LPES0 is 1
@@ -158,9 +158,10 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
  */
 #if defined(TARGET_PPC64)
 if (excp_model == POWERPC_EXCP_POWER7 ||
-excp_model == POWERPC_EXCP_POWER8) {
+excp_model == POWERPC_EXCP_POWER8 ||
+excp_model == POWERPC_EXCP_POWER9) {
 lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
-if (excp_model == POWERPC_EXCP_POWER8) {
+if (excp_model != POWERPC_EXCP_POWER7) {
 ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
 } else {
 ail = 0;
@@ -662,7 +663,15 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
 }
 } else if (excp_model == POWERPC_EXCP_POWER8) {
 if (new_msr & MSR_HVB) {
-if (env->spr[SPR_HID0] & (HID0_HILE | HID0_POWER9_HILE)) {
+if (env->spr[SPR_HID0] & HID0_HILE) {
+new_msr |= (target_ulong)1 << MSR_LE;
+}
+} else if (env->spr[SPR_LPCR] & LPCR_ILE) {
+new_msr |= (target_ulong)1 << MSR_LE;
+}
+} else if (excp_model == POWERPC_EXCP_POWER9) {
+if (new_msr & MSR_HVB) {
+if (env->spr[SPR_HID0] & HID0_POWER9_HILE) {
 new_msr |= (target_ulong)1 << MSR_LE;
 }
 } else if (env->spr[SPR_LPCR] & LPCR_ILE) {
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fde7ead7b7..819221f246 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7481,7 +7481,8 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
 
 #if defined(TARGET_PPC64)
 if (env->excp_model == POWERPC_EXCP_POWER7 ||
-env->excp_model == POWERPC_EXCP_POWER8) {
+env->excp_model == POWERPC_EXCP_POWER8 ||
+env->excp_model == POWERPC_EXCP_POWER9) {
 cpu_fprintf(f, "HSRR0 " TARGET_FMT_lx " HSRR1 " TARGET_FMT_lx "\n",
 env->spr[SPR_HSRR0], env->spr[SPR_HSRR1]);
 }
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 8b1d324b3b..9909e58761 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -8905,7 +8905,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
 pcc->hash64_opts = &ppc_hash64_opts_POWER7;
 pcc->radix_page_info = &POWER9_radix_page_info;
 #endif
-pcc->excp_model = POWERPC_EXCP_POWER8;
+pcc->excp_model = POWERPC_EXCP_POWER9;
 pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
 pcc->bfd_mach = bfd_mach_ppc64;
 pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
-- 
2.20.1

[Qemu-devel] [PULL 31/50] spapr: Generate FDT fragment for CPUs at configure connector time

2019-02-25 Thread David Gibson

From: Greg Kurz 

Signed-off-by: Greg Kurz 
Message-Id: <155059666839.1466090.3833376527523126752.st...@bahia.lab.toulouse-stg.fr.ibm.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 52 +++---
 hw/ppc/spapr_drc.c |  1 +
 include/hw/ppc/spapr.h |  2 ++
 3 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b92deee771..6cf7a9f5c1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3636,27 +3636,6 @@ out:
 error_propagate(errp, local_err);
 }
 
-static void *spapr_populate_hotplug_cpu_dt(CPUState *cs, int *fdt_offset,
-   sPAPRMachineState *spapr)
-{
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-DeviceClass *dc = DEVICE_GET_CLASS(cs);
-int id = spapr_get_vcpu_id(cpu);
-void *fdt;
-int offset, fdt_size;
-char *nodename;
-
-fdt = create_device_tree(&fdt_size);
-nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
-offset = fdt_add_subnode(fdt, 0, nodename);
-
-spapr_populate_cpu_dt(cs, fdt, offset, spapr);
-g_free(nodename);
-
-*fdt_offset = offset;
-return fdt;
-}
-
 /* Callback to be called during DRC release. */
 void spapr_core_release(DeviceState *dev)
 {
@@ -3717,6 +3696,27 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
 spapr_hotplug_req_remove_by_index(drc);
 }
 
+int spapr_core_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+   void *fdt, int *fdt_start_offset, Error **errp)
+{
+sPAPRCPUCore *core = SPAPR_CPU_CORE(drc->dev);
+CPUState *cs = CPU(core->threads[0]);
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+DeviceClass *dc = DEVICE_GET_CLASS(cs);
+int id = spapr_get_vcpu_id(cpu);
+char *nodename;
+int offset;
+
+nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
+offset = fdt_add_subnode(fdt, 0, nodename);
+g_free(nodename);
+
+spapr_populate_cpu_dt(cs, fdt, offset, spapr);
+
+*fdt_start_offset = offset;
+return 0;
+}
+
 static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 Error **errp)
 {
@@ -3725,7 +3725,7 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
 sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
 CPUCore *cc = CPU_CORE(dev);
-CPUState *cs = CPU(core->threads[0]);
+CPUState *cs;
 sPAPRDRConnector *drc;
 Error *local_err = NULL;
 CPUArchId *core_slot;
@@ -3744,14 +3744,8 @@ static void spapr_core_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 g_assert(drc || !mc->has_hotpluggable_cpus);
 
 if (drc) {
-void *fdt;
-int fdt_offset;
-
-fdt = spapr_populate_hotplug_cpu_dt(cs, &fdt_offset, spapr);
-
-spapr_drc_attach(drc, dev, fdt, fdt_offset, &local_err);
+spapr_drc_attach(drc, dev, NULL, 0, &local_err);
 if (local_err) {
-g_free(fdt);
 error_propagate(errp, local_err);
 return;
 }
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 634c28695a..aa26aa40be 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -680,6 +680,7 @@ static void spapr_drc_cpu_class_init(ObjectClass *k, void *data)
 drck->typename = "CPU";
 drck->drc_name_prefix = "CPU ";
 drck->release = spapr_core_release;
+drck->dt_populate = spapr_core_dt_populate;
 }
 
 static void spapr_drc_pci_class_init(ObjectClass *k, void *data)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 0ec309da49..5e3c760725 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -766,6 +766,8 @@ int spapr_max_server_number(sPAPRMachineState *spapr);
 
 /* DRC callbacks. */
 void spapr_core_release(DeviceState *dev);
+int spapr_core_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
+   void *fdt, int *fdt_start_offset, Error **errp);
 void spapr_lmb_release(DeviceState *dev);
 int spapr_lmb_dt_populate(sPAPRDRConnector *drc, sPAPRMachineState *spapr,
   void *fdt, int *fdt_start_offset, Error **errp);
-- 
2.20.1

[Qemu-devel] [PULL 24/50] target/ppc: Fix synchronization of mttcg with broadcast TLB flushes

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Let's use the generic helper tlb_flush_all_cpus_synced() instead
of iterating the CPUs ourselves.

We do lose the optimization of clearing the other CPUs' "need flush"
flags, but this shouldn't be a problem in practice.
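The decision logic of the reworked `check_tlb_flush()` can be isolated as a pure function. A sketch with hypothetical flag values and an enum standing in for the actual `tlb_flush()` / `tlb_flush_all_cpus_synced()` calls:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values standing in for TLB_NEED_LOCAL/GLOBAL_FLUSH. */
#define NEED_LOCAL_FLUSH  0x1
#define NEED_GLOBAL_FLUSH 0x2

typedef enum { FLUSH_NONE, FLUSH_LOCAL, FLUSH_ALL_CPUS_SYNCED } FlushAction;

/*
 * A pending global flush (when the caller permits one) subsumes the local
 * flush: both flags are cleared and one synchronized all-CPU flush is done.
 * Otherwise a pending local flush is handled on its own.
 */
static FlushAction toy_check_tlb_flush(int *need_flush, bool global)
{
    if (global && (*need_flush & NEED_GLOBAL_FLUSH)) {
        *need_flush &= ~(NEED_GLOBAL_FLUSH | NEED_LOCAL_FLUSH);
        return FLUSH_ALL_CPUS_SYNCED; /* tlb_flush_all_cpus_synced(cs) */
    }
    if (*need_flush & NEED_LOCAL_FLUSH) {
        *need_flush &= ~NEED_LOCAL_FLUSH;
        return FLUSH_LOCAL;           /* tlb_flush(cs) */
    }
    return FLUSH_NONE;
}
```

Note that a global request stays pending when `global` is false, matching the patch.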

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-9-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/helper_regs.h | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/target/ppc/helper_regs.h b/target/ppc/helper_regs.h
index 5efd18049e..a2205e1044 100644
--- a/target/ppc/helper_regs.h
+++ b/target/ppc/helper_regs.h
@@ -174,26 +174,19 @@ static inline int hreg_store_msr(CPUPPCState *env, target_ulong value,
 static inline void check_tlb_flush(CPUPPCState *env, bool global)
 {
 CPUState *cs = CPU(ppc_env_get_cpu(env));
-if (env->tlb_need_flush & TLB_NEED_LOCAL_FLUSH) {
-tlb_flush(cs);
-env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
-}
 
-/* Propagate TLB invalidations to other CPUs when the guest uses broadcast
- * TLB invalidation instructions.
- */
+/* Handle global flushes first */
 if (global && (env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH)) {
-CPUState *other_cs;
-CPU_FOREACH(other_cs) {
-if (other_cs != cs) {
-PowerPCCPU *cpu = POWERPC_CPU(other_cs);
-CPUPPCState *other_env = &cpu->env;
-
-other_env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
-tlb_flush(other_cs);
-}
-}
 env->tlb_need_flush &= ~TLB_NEED_GLOBAL_FLUSH;
+env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
+tlb_flush_all_cpus_synced(cs);
+return;
+}
+
+/* Then handle local ones */
+if (env->tlb_need_flush & TLB_NEED_LOCAL_FLUSH) {
+env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
+tlb_flush(cs);
 }
 }
 #else
-- 
2.20.1

[Qemu-devel] [PULL 28/50] target/ppc: Basic POWER9 bare-metal radix MMU support

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

No guest support yet.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-13-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/mmu-radix64.c | 81 ++--
 1 file changed, 69 insertions(+), 12 deletions(-)

diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index a07d757063..ca1fb2673f 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -31,10 +31,26 @@
 static bool ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr eaddr,
  uint64_t *lpid, uint64_t *pid)
 {
-/* We don't have HV support yet and shouldn't get here with it set anyway */
-assert(!msr_hv);
-
-if (!msr_hv) { /* !MSR[HV] -> Guest */
+if (msr_hv) { /* MSR[HV] -> Hypervisor/bare metal */
+switch (eaddr & R_EADDR_QUADRANT) {
+case R_EADDR_QUADRANT0:
+*lpid = 0;
+*pid = env->spr[SPR_BOOKS_PID];
+break;
+case R_EADDR_QUADRANT1:
+*lpid = env->spr[SPR_LPIDR];
+*pid = env->spr[SPR_BOOKS_PID];
+break;
+case R_EADDR_QUADRANT2:
+*lpid = env->spr[SPR_LPIDR];
+*pid = 0;
+break;
+case R_EADDR_QUADRANT3:
+*lpid = 0;
+*pid = 0;
+break;
+}
+} else {  /* !MSR[HV] -> Guest */
 switch (eaddr & R_EADDR_QUADRANT) {
 case R_EADDR_QUADRANT0: /* Guest application */
 *lpid = env->spr[SPR_LPIDR];
@@ -186,21 +202,32 @@ static uint64_t ppc_radix64_walk_tree(PowerPCCPU *cpu, vaddr eaddr,
  raddr, psize, fault_cause, pte_addr);
 }
 
+static bool validate_pate(PowerPCCPU *cpu, uint64_t lpid, ppc_v3_pate_t *pate)
+{
+CPUPPCState *env = &cpu->env;
+
+if (!(pate->dw0 & PATE0_HR)) {
+return false;
+}
+if (lpid == 0 && !msr_hv) {
+return false;
+}
+/* More checks ... */
+return true;
+}
+
 int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
  int mmu_idx)
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+PPCVirtualHypervisorClass *vhc;
 hwaddr raddr, pte_addr;
 uint64_t lpid = 0, pid = 0, offset, size, prtbe0, pte;
 int page_size, prot, fault_cause = 0;
 ppc_v3_pate_t pate;
 
 assert((rwx == 0) || (rwx == 1) || (rwx == 2));
-assert(!msr_hv); /* For now there is no Radix PowerNV Support */
-assert(cpu->vhyp);
 assert(ppc64_use_proc_tbl(cpu));
 
 /* Real Mode Access */
@@ -221,7 +248,23 @@ int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
 }
 
 /* Get Process Table */
-vhc->get_pate(cpu->vhyp, &pate);
+if (cpu->vhyp) {
+vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+vhc->get_pate(cpu->vhyp, &pate);
+} else {
+if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
+ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE);
+return 1;
+}
+if (!validate_pate(cpu, lpid, &pate)) {
+ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_R_BADCONFIG);
+}
+/* We don't support guest mode yet */
+if (lpid != 0) {
+error_report("PowerNV guest support Unimplemented");
+exit(1);
+   }
+}
 
 /* Index Process Table by PID to Find Corresponding Process Table Entry */
 offset = pid * sizeof(struct prtb_entry);
@@ -256,8 +299,7 @@ hwaddr ppc_radix64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong eaddr)
 {
 CPUState *cs = CPU(cpu);
 CPUPPCState *env = &cpu->env;
-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+PPCVirtualHypervisorClass *vhc;
 hwaddr raddr, pte_addr;
 uint64_t lpid = 0, pid = 0, offset, size, prtbe0, pte;
 int page_size, fault_cause = 0;
@@ -275,7 +317,22 @@ hwaddr ppc_radix64_get_phys_page_debug(PowerPCCPU *cpu, target_ulong eaddr)
 }
 
 /* Get Process Table */
-vhc->get_pate(cpu->vhyp, &pate);
+if (cpu->vhyp) {
+vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+vhc->get_pate(cpu->vhyp, &pate);
+} else {
+if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
+return -1;
+}
+if (!validate_pate(cpu, lpid, &pate)) {
+return -1;
+}
+/* We don't support guest mode yet */
+if (lpid != 0) {
+error_report("PowerNV guest support Unimplemented");
+exit(1);
+   }
+}
 
 /* Index Process Table by PID to Find Corresponding Process Table Entry */
 offset = pid * sizeof(struct prtb_entry);
-- 
2.20.1
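The hypervisor-mode quadrant decoding added to `ppc_radix64_get_fully_qualified_addr()` above amounts to a lookup on the top two bits of the effective address. A standalone sketch; the quadrant constants are assumed to match QEMU's `R_EADDR_QUADRANT*` values, and `hv_quadrant_ids()` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match R_EADDR_QUADRANT*: the top two EA bits pick the quadrant. */
#define EADDR_QUADRANT_MASK 0xC000000000000000ULL
#define EADDR_QUADRANT0     0x0000000000000000ULL
#define EADDR_QUADRANT1     0x4000000000000000ULL
#define EADDR_QUADRANT2     0x8000000000000000ULL
#define EADDR_QUADRANT3     0xC000000000000000ULL

/*
 * MSR[HV]=1 case: choose the (LPID, PID) pair used for the table walk.
 * lpidr and pidr stand for SPR_LPIDR and SPR_BOOKS_PID.
 */
static void hv_quadrant_ids(uint64_t eaddr, uint64_t lpidr, uint64_t pidr,
                            uint64_t *lpid, uint64_t *pid)
{
    switch (eaddr & EADDR_QUADRANT_MASK) {
    case EADDR_QUADRANT0: /* LPID 0, current PID */
        *lpid = 0;
        *pid = pidr;
        break;
    case EADDR_QUADRANT1: /* current LPID, current PID */
        *lpid = lpidr;
        *pid = pidr;
        break;
    case EADDR_QUADRANT2: /* current LPID, PID 0 */
        *lpid = lpidr;
        *pid = 0;
        break;
    default:              /* EADDR_QUADRANT3: LPID 0, PID 0 */
        *lpid = 0;
        *pid = 0;
        break;
    }
}
```

Quadrants 1 and 2 are how the hypervisor reaches guest translations, which is also why the LPIDR patch flushes the TLB on LPID changes.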

[Qemu-devel] [PULL 19/50] target/ppc/mmu: Use LPCR:HR to choose radix vs. hash translation

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Now that LPCR:HR is set properly for SPAPR, use it to decide
the translation type; this also works for bare metal.
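A minimal sketch of the resulting dispatch. The `LPCR_HR` bit position (43, IBM numbering) is an assumption to check against `target/ppc/cpu.h`, and `pick_translation()` is a hypothetical stand-in for `ppc64_v3_radix()`:

```c
#include <assert.h>
#include <stdint.h>

#define PPC_BIT(b) (0x8000000000000000ULL >> (b))
#define LPCR_HR    PPC_BIT(43) /* assumed position of Host Radix */

typedef enum { XLATE_HASH, XLATE_RADIX } XlateKind;

/*
 * A single SPR test: no partition-table lookup and no dependency on a
 * virtual hypervisor, so the same check serves sPAPR and bare metal.
 */
static XlateKind pick_translation(uint64_t lpcr)
{
    return (lpcr & LPCR_HR) ? XLATE_RADIX : XLATE_HASH;
}
```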

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-3-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/mmu-book3s-v3.c | 11 ++-
 target/ppc/mmu-book3s-v3.h | 14 +-
 target/ppc/mmu_helper.c|  9 ++---
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/target/ppc/mmu-book3s-v3.c b/target/ppc/mmu-book3s-v3.c
index b60df4408f..a174e7efc5 100644
--- a/target/ppc/mmu-book3s-v3.c
+++ b/target/ppc/mmu-book3s-v3.c
@@ -26,9 +26,18 @@
 int ppc64_v3_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
   int mmu_idx)
 {
-if (ppc64_radix_guest(cpu)) { /* Guest uses radix */
+if (ppc64_v3_radix(cpu)) { /* Guest uses radix */
 return ppc_radix64_handle_mmu_fault(cpu, eaddr, rwx, mmu_idx);
 } else { /* Guest uses hash */
 return ppc_hash64_handle_mmu_fault(cpu, eaddr, rwx, mmu_idx);
 }
 }
+
+hwaddr ppc64_v3_get_phys_page_debug(PowerPCCPU *cpu, vaddr eaddr)
+{
+if (ppc64_v3_radix(cpu)) {
+return ppc_radix64_get_phys_page_debug(cpu, eaddr);
+} else {
+return ppc_hash64_get_phys_page_debug(cpu, eaddr);
+}
+}
diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
index fdf80987d7..41b7715862 100644
--- a/target/ppc/mmu-book3s-v3.h
+++ b/target/ppc/mmu-book3s-v3.h
@@ -43,14 +43,18 @@ static inline bool ppc64_use_proc_tbl(PowerPCCPU *cpu)
 return !!(cpu->env.spr[SPR_LPCR] & LPCR_UPRT);
 }
 
-static inline bool ppc64_radix_guest(PowerPCCPU *cpu)
+/*
+ * The LPCR:HR bit is a shortcut that avoids having to
+ * dig out the partition table in the fast path. This is
+ * also how the HW uses it.
+ */
+static inline bool ppc64_v3_radix(PowerPCCPU *cpu)
 {
-PPCVirtualHypervisorClass *vhc =
-PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
-
-return !!(vhc->get_patbe(cpu->vhyp) & PATBE1_GR);
+return !!(cpu->env.spr[SPR_LPCR] & LPCR_HR);
 }
 
+hwaddr ppc64_v3_get_phys_page_debug(PowerPCCPU *cpu, vaddr eaddr);
+
 int ppc64_v3_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
   int mmu_idx);
 
diff --git a/target/ppc/mmu_helper.c b/target/ppc/mmu_helper.c
index bcf19da61d..4a6be4d63b 100644
--- a/target/ppc/mmu_helper.c
+++ b/target/ppc/mmu_helper.c
@@ -1342,7 +1342,7 @@ void dump_mmu(FILE *f, fprintf_function cpu_fprintf, CPUPPCState *env)
 dump_slb(f, cpu_fprintf, ppc_env_get_cpu(env));
 break;
 case POWERPC_MMU_3_00:
-if (ppc64_radix_guest(ppc_env_get_cpu(env))) {
+if (ppc64_v3_radix(ppc_env_get_cpu(env))) {
 /* TODO - Unsupported */
 } else {
 dump_slb(f, cpu_fprintf, ppc_env_get_cpu(env));
@@ -1489,12 +1489,7 @@ hwaddr ppc_cpu_get_phys_page_debug(CPUState *cs, vaddr addr)
 case POWERPC_MMU_2_07:
 return ppc_hash64_get_phys_page_debug(cpu, addr);
 case POWERPC_MMU_3_00:
-if (ppc64_radix_guest(ppc_env_get_cpu(env))) {
-return ppc_radix64_get_phys_page_debug(cpu, addr);
-} else {
-return ppc_hash64_get_phys_page_debug(cpu, addr);
-}
-break;
+return ppc64_v3_get_phys_page_debug(cpu, addr);
 #endif
 
 case POWERPC_MMU_32B:
-- 
2.20.1

[Qemu-devel] [PULL 23/50] target/ppc: Add basic support for "new format" HPTE as found on POWER9

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

POWER9 (arch v3) slightly changes the HPTE format. The B bits move
from the first to the second half of the HPTE, and the AVPN/ARPN
are slightly shorter.

However, under SPAPR, the hypercalls still take the old format
(and probably will for the foreseeable future).

The simplest way to support this is thus to convert the HPTEs from
new to old format when reading them if the MMU model is v3 and there
is no virtual hypervisor, leaving the rest of the code unchanged.
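The conversion can be exercised on its own. A sketch that reproduces the shape of `ppc64_v3_new_to_old_hpte()` with assumed constant values; in particular, the 52-bit common-bits mask and the old-format B shift of 62 are assumptions, not values copied from this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; QEMU keeps its own in mmu-hash64.h. */
#define V_SSIZE_SHIFT     62                    /* B in old-format pte0 */
#define V_COMMON_BITS     0x000fffffffffffffULL /* pte0 bits kept as-is */
#define R_3_0_SSIZE_SHIFT 58                    /* B in v3.0-format pte1 */
#define R_3_0_SSIZE_MASK  (3ULL << R_3_0_SSIZE_SHIFT)

/* Move the 2-bit segment size (B) from pte1 back into pte0. */
static void new_to_old_hpte(uint64_t *pte0, uint64_t *pte1)
{
    *pte0 = (*pte0 & V_COMMON_BITS) |
            ((*pte1 & R_3_0_SSIZE_MASK) << (V_SSIZE_SHIFT - R_3_0_SSIZE_SHIFT));
    *pte1 &= ~R_3_0_SSIZE_MASK;
}
```

Converting at read time keeps `HPTE64_V_COMPARE()` and the rest of the hash code working on a single (old) layout.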

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-8-...@kaod.org>
[dwg: Moved function to .c since there was no real need for it in the .h]
Signed-off-by: David Gibson 
---
 target/ppc/mmu-hash64.c | 17 +
 target/ppc/mmu-hash64.h |  5 +
 2 files changed, 22 insertions(+)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index fbefe5b5aa..3c057a8c70 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -490,6 +490,18 @@ static unsigned hpte_page_shift(const PPCHash64SegmentPageSizes *sps,
 return 0; /* Bad page size encoding */
 }
 
+static void ppc64_v3_new_to_old_hpte(target_ulong *pte0, target_ulong *pte1)
+{
+/* Insert B into pte0 */
+*pte0 = (*pte0 & HPTE64_V_COMMON_BITS) |
+((*pte1 & HPTE64_R_3_0_SSIZE_MASK) <<
+ (HPTE64_V_SSIZE_SHIFT - HPTE64_R_3_0_SSIZE_SHIFT));
+
+/* Remove B from pte1 */
+*pte1 = *pte1 & ~HPTE64_R_3_0_SSIZE_MASK;
+}
+
+
 static hwaddr ppc_hash64_pteg_search(PowerPCCPU *cpu, hwaddr hash,
  const PPCHash64SegmentPageSizes *sps,
  target_ulong ptem,
@@ -515,6 +527,11 @@ static hwaddr ppc_hash64_pteg_search(PowerPCCPU *cpu, hwaddr hash,
 smp_rmb();
 pte1 = ppc_hash64_hpte1(cpu, pteg, i);
 
+/* Convert format if necessary */
+if (cpu->env.mmu_model == POWERPC_MMU_3_00 && !cpu->vhyp) {
+ppc64_v3_new_to_old_hpte(, );
+}
+
 /* This compares V, B, H (secondary) and the AVPN */
 if (HPTE64_V_COMPARE(pte0, ptem)) {
 *pshift = hpte_page_shift(sps, pte0, pte1);
diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index f11efc9cbc..016d6b44ee 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -102,6 +102,11 @@ void ppc_hash64_filter_pagesizes(PowerPCCPU *cpu,
 #define HPTE64_V_1TB_SEG0x4000ULL
 #define HPTE64_V_VRMA_MASK  0x4001ff00ULL
 
+/* Format changes for ARCH v3 */
+#define HPTE64_V_COMMON_BITS0x000fULL
+#define HPTE64_R_3_0_SSIZE_SHIFT 58
+#define HPTE64_R_3_0_SSIZE_MASK (3ULL << HPTE64_R_3_0_SSIZE_SHIFT)
+
 static inline hwaddr ppc_hash64_hpt_base(PowerPCCPU *cpu)
 {
 if (cpu->vhyp) {
-- 
2.20.1

[Qemu-devel] [PULL 17/50] tests/device-plug: Add memory unplug request test for spapr

2019-02-25 Thread David Gibson

From: David Hildenbrand 

We can easily test this, just like PCI. On x86 ACPI, we need guest
interaction to make it work, so it is not that easy to test. We might
add tests for that later on.

Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Greg Kurz 
Reviewed-by: Thomas Huth 
Signed-off-by: David Hildenbrand 
Message-Id: <20190218092202.26683-7-da...@redhat.com>
Signed-off-by: David Gibson 
---
 tests/device-plug-test.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/tests/device-plug-test.c b/tests/device-plug-test.c
index 0262ad6be6..87593d9ecf 100644
--- a/tests/device-plug-test.c
+++ b/tests/device-plug-test.c
@@ -116,6 +116,22 @@ static void test_spapr_cpu_unplug_request(void)
 qtest_quit(qtest);
 }
 
+static void test_spapr_memory_unplug_request(void)
+{
+QTestState *qtest;
+
+qtest = qtest_initf("-m 256M,slots=1,maxmem=768M "
+"-object memory-backend-ram,id=mem0,size=512M "
+"-device pc-dimm,id=dev0,memdev=mem0");
+
+/* similar to test_pci_unplug_request */
+device_del_request(qtest, "dev0");
+system_reset(qtest);
+wait_device_deleted_event(qtest, "dev0");
+
+qtest_quit(qtest);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -138,6 +154,8 @@ int main(int argc, char **argv)
 if (!strcmp(arch, "ppc64")) {
 qtest_add_func("/device-plug/spapr-cpu-unplug-request",
test_spapr_cpu_unplug_request);
+qtest_add_func("/device-plug/spapr-memory-unplug-request",
+   test_spapr_memory_unplug_request);
 }
 
 return g_test_run();
-- 
2.20.1

[Qemu-devel] [PULL 08/50] target/ppc: Add Hypervisor Virtualization Interrupt on POWER9

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

This adds support for delivering the Hypervisor Virtualization Interrupt
(HVIRT, vector 0xEA0) on POWER9.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-9-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/cpu.h|  5 -
 target/ppc/excp_helper.c| 17 -
 target/ppc/translate_init.inc.c | 16 +++-
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index d2364564a0..7d37d85ac5 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -160,8 +160,10 @@ enum {
 /* Server doorbell variants */
 POWERPC_EXCP_SDOOR= 99,
 POWERPC_EXCP_SDOOR_HV = 100,
+/* ISA 3.00 additions */
+POWERPC_EXCP_HVIRT= 101,
 /* EOL   */
-POWERPC_EXCP_NB   = 101,
+POWERPC_EXCP_NB   = 102,
 /* QEMU exceptions: used internally during code translation  */
 POWERPC_EXCP_STOP = 0x200, /* stop translation   */
 POWERPC_EXCP_BRANCH   = 0x201, /* branch instruction */
@@ -2349,6 +2351,7 @@ enum {
 PPC_INTERRUPT_PERFM,  /* Performance monitor interrupt*/
 PPC_INTERRUPT_HMI,/* Hypervisor Maintainance interrupt*/
 PPC_INTERRUPT_HDOORBELL,  /* Hypervisor Doorbell interrupt*/
+PPC_INTERRUPT_HVIRT,  /* Hypervisor virtualization interrupt  */
 };
 
 /* Processor Compatibility mask (PCR) */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 1a2f469a5f..d171a5eb62 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -97,6 +97,9 @@ static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState *env, int excp,
 case POWERPC_EXCP_HV_MAINT:
 *msr |= 0xaull << (63 - 45);
 break;
+case POWERPC_EXCP_HVIRT:
+*msr |= 0x9ull << (63 - 45);
+break;
 default:
 cpu_abort(cs, "Unsupported exception %d in Power Save mode\n",
   excp);
@@ -427,6 +430,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
 case POWERPC_EXCP_HISEG: /* Hypervisor instruction segment exception */
 case POWERPC_EXCP_SDOOR_HV:  /* Hypervisor Doorbell interrupt*/
 case POWERPC_EXCP_HV_EMU:
+case POWERPC_EXCP_HVIRT: /* Hypervisor virtualization*/
 srr0 = SPR_HSRR0;
 srr1 = SPR_HSRR1;
 new_msr |= (target_ulong)MSR_HVB;
@@ -809,7 +813,18 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 return;
 }
 }
-/* Extermal interrupt can ignore MSR:EE under some circumstances */
+
+/* Hypervisor virtualization interrupt */
+if (env->pending_interrupts & (1 << PPC_INTERRUPT_HVIRT)) {
+/* LPCR will be clear when not supported so this will work */
+bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
+if ((async_deliver || msr_hv == 0) && hvice) {
+powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_HVIRT);
+return;
+}
+}
+
+/* External interrupt can ignore MSR:EE under some circumstances */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_EXT)) {
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
 if (async_deliver || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 9909e58761..6062163d85 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -3313,6 +3313,15 @@ static void init_excp_POWER8(CPUPPCState *env)
 #endif
 }
 
+static void init_excp_POWER9(CPUPPCState *env)
+{
+init_excp_POWER8(env);
+
+#if !defined(CONFIG_USER_ONLY)
+env->excp_vectors[POWERPC_EXCP_HVIRT]= 0x0EA0;
+#endif
+}
+
 #endif
 
 /*/
@@ -8783,7 +8792,7 @@ static void init_proc_POWER9(CPUPPCState *env)
 env->icache_line_size = 128;
 
 /* Allocate hardware IRQ controller */
-init_excp_POWER8(env);
+init_excp_POWER9(env);
 ppcPOWER7_irq_init(ppc_env_get_cpu(env));
 }
 
@@ -8836,6 +8845,11 @@ static bool cpu_has_work_POWER9(CPUState *cs)
 (env->spr[SPR_LPCR] & LPCR_HDEE)) {
 return true;
 }
+/* Hypervisor virtualization exception */
+if ((env->pending_interrupts & (1u << PPC_INTERRUPT_HVIRT)) &&
+(env->spr[SPR_LPCR] & LPCR_HVEE)) {
+return true;
+}
 if (env->pending_interrupts & (1u << PPC_INTERRUPT_RESET)) {
 return true;
 }
-- 
2.20.1

[Qemu-devel] [PULL 16/50] tests/device-plug: Add CPU core unplug request test for spapr

2019-02-25 Thread David Gibson

From: David Hildenbrand 

We can easily test this, just like PCI. On s390x, cpu unplug is not
supported. On x86 ACPI, cpu unplug requires guest interaction to work, so
it can't be tested that easily. We might add tests for ACPI later.

Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Greg Kurz 
Reviewed-by: Thomas Huth 
Signed-off-by: David Hildenbrand 
Message-Id: <20190218092202.26683-6-da...@redhat.com>
Signed-off-by: David Gibson 
---
 tests/device-plug-test.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/tests/device-plug-test.c b/tests/device-plug-test.c
index d1a6c94af2..0262ad6be6 100644
--- a/tests/device-plug-test.c
+++ b/tests/device-plug-test.c
@@ -101,6 +101,21 @@ static void test_ccw_unplug(void)
 qtest_quit(qtest);
 }
 
+static void test_spapr_cpu_unplug_request(void)
+{
+QTestState *qtest;
+
+qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 "
+"-device 
power9_v2.0-spapr-cpu-core,core-id=1,id=dev0");
+
+/* similar to test_pci_unplug_request */
+device_del_request(qtest, "dev0");
+system_reset(qtest);
+wait_device_deleted_event(qtest, "dev0");
+
+qtest_quit(qtest);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -120,5 +135,10 @@ int main(int argc, char **argv)
test_ccw_unplug);
 }
 
+if (!strcmp(arch, "ppc64")) {
+qtest_add_func("/device-plug/spapr-cpu-unplug-request",
+   test_spapr_cpu_unplug_request);
+}
+
 return g_test_run();
 }
-- 
2.20.1

[Qemu-devel] [PULL 21/50] target/ppc: Fix #include guard in mmu-book3s-v3.h

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215170029.15641-5-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/mmu-book3s-v3.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/ppc/mmu-book3s-v3.h b/target/ppc/mmu-book3s-v3.h
index 41b7715862..12ec0054c2 100644
--- a/target/ppc/mmu-book3s-v3.h
+++ b/target/ppc/mmu-book3s-v3.h
@@ -17,8 +17,8 @@
  * License along with this library; if not, see .
  */
 
-#ifndef MMU_H
-#define MMU_H
+#ifndef MMU_BOOOK3S_V3_H
+#define MMU_BOOOK3S_V3_H
 
 #ifndef CONFIG_USER_ONLY
 
@@ -62,4 +62,4 @@ int ppc64_v3_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
 
 #endif /* CONFIG_USER_ONLY */
 
-#endif /* MMU_H */
+#endif /* MMU_BOOOK3S_V3_H */
-- 
2.20.1

[Qemu-devel] [PULL 14/50] tests/device-plug: Add a simple PCI unplug request test

2019-02-25 Thread David Gibson

From: David Hildenbrand 

The issue with testing asynchronous unplug requests is that they usually
require a running guest to handle the request. However, to test if
unplug of PCI devices works, we can apply a nice little trick on some
architectures:

On system reset, x86 ACPI, s390x and spapr will perform the unplug,
resulting in the device of interest being deleted and a DEVICE_DELETED
event being sent.

On s390x, we still get a warning:
qemu-system-s390x: -device virtio-mouse-pci,id=dev0:
warning: Plugging a PCI/zPCI device without the 'zpci' CPU feature
enabled; the guest will not be able to see/use this device

This will be fixed soon, when we enable the zpci CPU feature always
(Conny already has a patch for this queued).

Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Greg Kurz 
Reviewed-by: Thomas Huth 
Reviewed-by: Collin Walling 
Reviewed-by: David Gibson 
Signed-off-by: David Hildenbrand 
Message-Id: <20190218092202.26683-4-da...@redhat.com>
Acked-by: Cornelia Huck 
Signed-off-by: David Gibson 
---
 tests/Makefile.include   |  4 ++
 tests/device-plug-test.c | 93 
 2 files changed, 97 insertions(+)
 create mode 100644 tests/device-plug-test.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 3741f8f6dd..b62d82beb4 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -192,6 +192,7 @@ check-qtest-i386-$(CONFIG_ISA_IPMI_KCS) += tests/ipmi-kcs-test$(EXESUF)
 # check-qtest-i386-$(CONFIG_ISA_IPMI_BT) += tests/ipmi-bt-test$(EXESUF)
 check-qtest-i386-y += tests/i440fx-test$(EXESUF)
 check-qtest-i386-y += tests/fw_cfg-test$(EXESUF)
+check-qtest-i386-y += tests/device-plug-test$(EXESUF)
 check-qtest-i386-y += tests/drive_del-test$(EXESUF)
 check-qtest-i386-$(CONFIG_WDT_IB700) += tests/wdt_ib700-test$(EXESUF)
 check-qtest-i386-y += tests/tco-test$(EXESUF)
@@ -256,6 +257,7 @@ check-qtest-ppc-$(CONFIG_M48T59) += tests/m48t59-test$(EXESUF)
 
 check-qtest-ppc64-y += $(check-qtest-ppc-y)
 check-qtest-ppc64-$(CONFIG_PSERIES) += tests/spapr-phb-test$(EXESUF)
+check-qtest-ppc64-$(CONFIG_PSERIES) += tests/device-plug-test$(EXESUF)
 check-qtest-ppc64-$(CONFIG_POWERNV) += tests/pnv-xscom-test$(EXESUF)
 check-qtest-ppc64-y += tests/migration-test$(EXESUF)
 check-qtest-ppc64-$(CONFIG_PSERIES) += tests/rtas-test$(EXESUF)
@@ -310,6 +312,7 @@ check-qtest-s390x-$(CONFIG_SLIRP) += tests/test-netfilter$(EXESUF)
 check-qtest-s390x-$(CONFIG_POSIX) += tests/test-filter-mirror$(EXESUF)
 check-qtest-s390x-$(CONFIG_POSIX) += tests/test-filter-redirector$(EXESUF)
 check-qtest-s390x-y += tests/drive_del-test$(EXESUF)
+check-qtest-s390x-y += tests/device-plug-test$(EXESUF)
 check-qtest-s390x-y += tests/virtio-ccw-test$(EXESUF)
 check-qtest-s390x-y += tests/cpu-plug-test$(EXESUF)
 check-qtest-s390x-y += tests/migration-test$(EXESUF)
@@ -750,6 +753,7 @@ tests/ipoctal232-test$(EXESUF): tests/ipoctal232-test.o
 tests/qom-test$(EXESUF): tests/qom-test.o
 tests/test-hmp$(EXESUF): tests/test-hmp.o
 tests/machine-none-test$(EXESUF): tests/machine-none-test.o
+tests/device-plug-test$(EXESUF): tests/device-plug-test.o
 tests/drive_del-test$(EXESUF): tests/drive_del-test.o $(libqos-virtio-obj-y)
 tests/nvme-test$(EXESUF): tests/nvme-test.o $(libqos-pc-obj-y)
 tests/pvpanic-test$(EXESUF): tests/pvpanic-test.o
diff --git a/tests/device-plug-test.c b/tests/device-plug-test.c
new file mode 100644
index 00..cd9ada539d
--- /dev/null
+++ b/tests/device-plug-test.c
@@ -0,0 +1,93 @@
+/*
+ * QEMU device plug/unplug handling
+ *
+ * Copyright (C) 2019 Red Hat Inc.
+ *
+ * Authors:
+ *  David Hildenbrand 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+
+static void device_del_request(QTestState *qtest, const char *id)
+{
+QDict *resp;
+
+resp = qtest_qmp(qtest,
+ "{'execute': 'device_del', 'arguments': { 'id': %s } }",
+ id);
+g_assert(qdict_haskey(resp, "return"));
+qobject_unref(resp);
+}
+
+static void system_reset(QTestState *qtest)
+{
+QDict *resp;
+
+resp = qtest_qmp(qtest, "{'execute': 'system_reset'}");
+g_assert(qdict_haskey(resp, "return"));
+qobject_unref(resp);
+}
+
+static void wait_device_deleted_event(QTestState *qtest, const char *id)
+{
+QDict *resp, *data;
+QString *qstr;
+
+/*
+ * Other devices might get removed along with the removed device. Skip
+ * these. The device of interest will be the last one.
+ */
+for (;;) {
+resp = qtest_qmp_eventwait_ref(qtest, "DEVICE_DELETED");
+data = qdict_get_qdict(resp, "data");
+if (!data || !qdict_get(data, "device")) {
+qobject_unref(resp);
+continue;
+}
+qstr = qobject_to(QString, qdict_get(data, "device"));
+g_assert(qstr);
+

[Qemu-devel] [PULL 05/50] target/ppc: Rename "in_pm_state" to "resume_as_sreset"

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Rename the flag to better reflect what it does: it is specific to some
of the P7/P8/P9 PM states, not a generic "in PM state" indicator.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-6-...@kaod.org>
Signed-off-by: David Gibson 
---
 hw/ppc/ppc.c | 2 +-
 target/ppc/cpu.h | 6 +++---
 target/ppc/excp_helper.c | 8 
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index cffdc3914a..12439dbe5d 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -776,7 +776,7 @@ static inline void cpu_ppc_hdecr_excp(PowerPCCPU *cpu)
  * interrupts in a PM state. Not only they don't cause a
  * wakeup but they also get effectively discarded.
  */
-if (!env->in_pm_state) {
+if (!env->resume_as_sreset) {
 ppc_set_irq(cpu, PPC_INTERRUPT_HDECR, 1);
 }
 }
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 5b1899bfc9..d2364564a0 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1117,10 +1117,10 @@ struct CPUPPCState {
 
 /*
  * On P7/P8/P9, set when in PM state, we need to handle resume in
- * a special way (such as routing some resume causes to 0x100), so
- * flag this here.
+ * a special way (such as routing some resume causes to 0x100, ie,
+ * sreset), so flag this here.
  */
-bool in_pm_state;
+bool resume_as_sreset;
 #endif
 
 /* Those resources are used only during code translation */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 489a54f51b..7536620a41 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -69,7 +69,7 @@ static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState *env, int excp,
 target_ulong *msr)
 {
 /* We no longer are in a PM state */
-env->in_pm_state = false;
+env->resume_as_sreset = false;
 
 /* Pretend to be returning from doze always as we don't lose state */
 *msr |= (0x1ull << (63 - 47));
@@ -141,7 +141,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
  * check for special resume at 0x100 from doze/nap/sleep/winkle on
  * P7/P8/P9
  */
-if (env->in_pm_state) {
+if (env->resume_as_sreset) {
 excp = powerpc_reset_wakeup(cs, env, excp, );
 }
 
@@ -787,7 +787,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
  * clear when coming out of some power management states (in order
  * for them to become a 0x100).
  */
-async_deliver = (msr_ee != 0) || env->in_pm_state;
+async_deliver = (msr_ee != 0) || env->resume_as_sreset;
 
 /* Hypervisor decrementer exception */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_HDECR)) {
@@ -970,7 +970,7 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn)
 env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
 
 /* Condition for waking up at 0x100 */
-env->in_pm_state = (insn != PPC_PM_STOP) ||
+env->resume_as_sreset = (insn != PPC_PM_STOP) ||
 (env->spr[SPR_PSSCR] & PSSCR_EC);
 }
 #endif /* defined(TARGET_PPC64) */
-- 
2.20.1

[Qemu-devel] [PULL 12/50] cpus: Properly release the iothread lock when killing a dummy VCPU

2019-02-25 Thread David Gibson

From: David Hildenbrand 

This enables CPU unplug under qtest.

Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Greg Kurz 
Reviewed-by: Thomas Huth 
Reviewed-by: David Gibson 
Signed-off-by: David Hildenbrand 
Message-Id: <20190218092202.26683-2-da...@redhat.com>
Signed-off-by: David Gibson 
---
 cpus.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/cpus.c b/cpus.c
index 154daf57dc..e83f72b48b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1333,6 +1333,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 qemu_wait_io_event(cpu);
 } while (!cpu->unplug);
 
+qemu_mutex_unlock_iothread();
 rcu_unregister_thread();
 return NULL;
 #endif
-- 
2.20.1

[Qemu-devel] [PULL 11/50] ppc: add host-serial and host-model machine attributes (CVE-2019-8934)

2019-02-25 Thread David Gibson

From: Prasad J Pandit 

On ppc hosts, the hypervisor shares the following system attributes

  - /proc/device-tree/system-id
  - /proc/device-tree/model

with a guest. This could lead to information leakage and misuse.[*]
Add machine attributes to control such system information exposure
to a guest.

[*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028

Reported-by: Daniel P. Berrangé 
Fix-suggested-by: Daniel P. Berrangé 
Signed-off-by: Prasad J Pandit 
Message-Id: <20190218181349.23885-1-ppan...@redhat.com>
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Greg Kurz 
Signed-off-by: David Gibson 
---
 hw/ppc/spapr.c | 76 ++
 include/hw/ppc/spapr.h |  2 ++
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index abf9ebce59..b3631e22c4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1247,13 +1247,30 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr)
  * Add info to guest to indentify which host is it being run on
  * and what is the uuid of the guest
  */
-if (kvmppc_get_host_model()) {
-_FDT(fdt_setprop_string(fdt, 0, "host-model", buf));
-g_free(buf);
+if (spapr->host_model && !g_str_equal(spapr->host_model, "none")) {
+if (g_str_equal(spapr->host_model, "passthrough")) {
+/* -M host-model=passthrough */
+if (kvmppc_get_host_model()) {
+_FDT(fdt_setprop_string(fdt, 0, "host-model", buf));
+g_free(buf);
+}
+} else {
+/* -M host-model= */
+_FDT(fdt_setprop_string(fdt, 0, "host-model", spapr->host_model));
+}
 }
-if (kvmppc_get_host_serial()) {
-_FDT(fdt_setprop_string(fdt, 0, "host-serial", buf));
-g_free(buf);
+
+if (spapr->host_serial && !g_str_equal(spapr->host_serial, "none")) {
+if (g_str_equal(spapr->host_serial, "passthrough")) {
+/* -M host-serial=passthrough */
+if (kvmppc_get_host_serial()) {
+_FDT(fdt_setprop_string(fdt, 0, "host-serial", buf));
+g_free(buf);
+}
+} else {
+/* -M host-serial= */
_FDT(fdt_setprop_string(fdt, 0, "host-serial", spapr->host_serial));
+}
 }
 
 buf = qemu_uuid_unparse_strdup(_uuid);
@@ -3144,6 +3161,36 @@ static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
 }
 }
 
+static char *spapr_get_host_model(Object *obj, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+return g_strdup(spapr->host_model);
+}
+
+static void spapr_set_host_model(Object *obj, const char *value, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+g_free(spapr->host_model);
+spapr->host_model = g_strdup(value);
+}
+
+static char *spapr_get_host_serial(Object *obj, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+return g_strdup(spapr->host_serial);
+}
+
+static void spapr_set_host_serial(Object *obj, const char *value, Error **errp)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+g_free(spapr->host_serial);
+spapr->host_serial = g_strdup(value);
+}
+
 static void spapr_instance_init(Object *obj)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
@@ -3189,6 +3236,17 @@ static void spapr_instance_init(Object *obj)
 object_property_set_description(obj, "ic-mode",
  "Specifies the interrupt controller mode (xics, xive, dual)",
  NULL);
+
+object_property_add_str(obj, "host-model",
+spapr_get_host_model, spapr_set_host_model,
+_abort);
+object_property_set_description(obj, "host-model",
+"Set host's model-id to use - none|passthrough|string", _abort);
+object_property_add_str(obj, "host-serial",
+spapr_get_host_serial, spapr_set_host_serial,
+_abort);
+object_property_set_description(obj, "host-serial",
+"Set host's system-id to use - none|passthrough|string", _abort);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -4086,9 +4144,15 @@ DEFINE_SPAPR_MACHINE(4_0, "4.0", true);
 static void spapr_machine_3_1_class_options(MachineClass *mc)
 {
 sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+static GlobalProperty compat[] = {
+{ TYPE_SPAPR_MACHINE, "host-model", "passthrough" },
+{ TYPE_SPAPR_MACHINE, "host-serial", "passthrough" },
+};
 
 spapr_machine_4_0_class_options(mc);
 compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
+compat_props_add(mc->compat_props, compat, G_N_ELEMENTS(compat));
+
 mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
 smc->update_dt_enabled = false;
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 631fc5103b..fec0f26f49 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -177,6 +177,8 @@ struct sPAPRMachineState {
 
 /*< public >*/

[Qemu-devel] [PULL 13/50] spapr: support memory unplug for qtest

2019-02-25 Thread David Gibson

From: David Hildenbrand 

Fake availability of OV5_HP_EVT, so we can test memory unplug in qtest.

Reviewed-by: Michael S. Tsirkin 
Reviewed-by: Greg Kurz 
Acked-by: David Gibson 
Signed-off-by: David Hildenbrand 
Message-Id: <20190218092202.26683-3-da...@redhat.com>
Signed-off-by: David Gibson 
---
 hw/ppc/spapr_ovec.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
index 318bf33de4..12510b236a 100644
--- a/hw/ppc/spapr_ovec.c
+++ b/hw/ppc/spapr_ovec.c
@@ -16,6 +16,7 @@
 #include "qemu/bitmap.h"
 #include "exec/address-spaces.h"
 #include "qemu/error-report.h"
+#include "sysemu/qtest.h"
 #include "trace.h"
 #include 
 
@@ -131,6 +132,11 @@ bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr)
 g_assert(ov);
 g_assert(bitnr < OV_MAXBITS);
 
+/* support memory unplug for qtest */
+if (qtest_enabled() && bitnr == OV5_HP_EVT) {
+return true;
+}
+
 return test_bit(bitnr, ov->bitmap) ? true : false;
 }
 
-- 
2.20.1

[Qemu-devel] [PULL 07/50] target/ppc: Detect erroneous condition in interrupt delivery

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

It's very easy for the CPU-specific has_work() implementation
and the logic in ppc_hw_interrupt() to be subtly out of sync.

This can occasionally allow a CPU to wake up from a PM state
and resume executing past the PM instruction when it should
resume at the 0x100 vector.

This patch detects when that happens and aborts, making such bugs
much easier to catch during testing instead of having to chase
obscure guest misbehaviour.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-8-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/excp_helper.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 37546bb0f0..1a2f469a5f 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -878,6 +878,22 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 return;
 }
 }
+
+if (env->resume_as_sreset) {
+/*
+ * This is a bug ! It means that has_work took us out of halt without
+ * anything to deliver while in a PM state that requires getting
+ * out via a 0x100
+ *
+ * This means we will incorrectly execute past the power management
+ * instruction instead of triggering a reset.
+ *
+ * It generally means a discrepancy between the wakup conditions in the
+ * processor has_work implementation and the logic in this function.
+ */
+cpu_abort(CPU(ppc_env_get_cpu(env)),
+  "Wakeup from PM state but interrupt Undelivered");
+}
 }
 
 void ppc_cpu_do_system_reset(CPUState *cs)
-- 
2.20.1

[Qemu-devel] [PULL 10/50] target/ppc: Add support for LPCR:HEIC on POWER9

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

This controls whether the External Interrupt (0x500) can be
delivered to the hypervisor or not.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-11-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/excp_helper.c| 5 -
 target/ppc/translate_init.inc.c | 5 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index d171a5eb62..39bedbb11d 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -827,7 +827,10 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 /* External interrupt can ignore MSR:EE under some circumstances */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_EXT)) {
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
-if (async_deliver || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
+bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+/* HEIC blocks delivery to the hypervisor */
+if ((async_deliver && !(heic && msr_hv && !msr_pr)) ||
+(env->has_hv_mode && msr_hv == 0 && !lpes0)) {
 powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_EXTERNAL);
 return;
 }
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 9d84164915..965c5273a6 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -8823,7 +8823,10 @@ static bool cpu_has_work_POWER9(CPUState *cs)
 /* External Exception */
 if ((env->pending_interrupts & (1u << PPC_INTERRUPT_EXT)) &&
 (env->spr[SPR_LPCR] & LPCR_EEE)) {
-return true;
+bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
+if (heic == 0 || !msr_hv || msr_pr) {
+return true;
+}
 }
 /* Decrementer Exception */
 if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
-- 
2.20.1

[Qemu-devel] [PULL 04/50] target/ppc: Move "wakeup reset" code to a separate function

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

This moves the code that handles waking up at the 0x100 vector
from powerpc_excp() to a separate function, as the former is
already far too big.

No functional change.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-5-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/excp_helper.c | 75 ++--
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 97503193ef..489a54f51b 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -65,6 +65,46 @@ static inline void dump_syscall(CPUPPCState *env)
   ppc_dump_gpr(env, 6), env->nip);
 }
 
+static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState *env, int excp,
+target_ulong *msr)
+{
+/* We no longer are in a PM state */
+env->in_pm_state = false;
+
+/* Pretend to be returning from doze always as we don't lose state */
+*msr |= (0x1ull << (63 - 47));
+
+/* Machine checks are sent normally */
+if (excp == POWERPC_EXCP_MCHECK) {
+return excp;
+}
+switch (excp) {
+case POWERPC_EXCP_RESET:
+*msr |= 0x4ull << (63 - 45);
+break;
+case POWERPC_EXCP_EXTERNAL:
+*msr |= 0x8ull << (63 - 45);
+break;
+case POWERPC_EXCP_DECR:
+*msr |= 0x6ull << (63 - 45);
+break;
+case POWERPC_EXCP_SDOOR:
+*msr |= 0x5ull << (63 - 45);
+break;
+case POWERPC_EXCP_SDOOR_HV:
+*msr |= 0x3ull << (63 - 45);
+break;
+case POWERPC_EXCP_HV_MAINT:
+*msr |= 0xaull << (63 - 45);
+break;
+default:
+cpu_abort(cs, "Unsupported exception %d in Power Save mode\n",
+  excp);
+}
+return POWERPC_EXCP_RESET;
+}
+
+
 /* Note that this function should be greatly optimized
  * when called with a constant excp, from ppc_hw_interrupt
  */
@@ -102,40 +142,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
  * P7/P8/P9
  */
 if (env->in_pm_state) {
-env->in_pm_state = false;
-
-/* Pretend to be returning from doze always as we don't lose state */
-msr |= (0x1ull << (63 - 47));
-
-/* Non-machine check are routed to 0x100 with a wakeup cause
- * encoded in SRR1
- */
-if (excp != POWERPC_EXCP_MCHECK) {
-switch (excp) {
-case POWERPC_EXCP_RESET:
-msr |= 0x4ull << (63 - 45);
-break;
-case POWERPC_EXCP_EXTERNAL:
-msr |= 0x8ull << (63 - 45);
-break;
-case POWERPC_EXCP_DECR:
-msr |= 0x6ull << (63 - 45);
-break;
-case POWERPC_EXCP_SDOOR:
-msr |= 0x5ull << (63 - 45);
-break;
-case POWERPC_EXCP_SDOOR_HV:
-msr |= 0x3ull << (63 - 45);
-break;
-case POWERPC_EXCP_HV_MAINT:
-msr |= 0xaull << (63 - 45);
-break;
-default:
-cpu_abort(cs, "Unsupported exception %d in Power Save mode\n",
-  excp);
-}
-excp = POWERPC_EXCP_RESET;
-}
+excp = powerpc_reset_wakeup(cs, env, excp, );
 }
 
 /* Exception targetting modifiers
-- 
2.20.1

[Qemu-devel] [PULL 01/50] target/ppc: Fix nip on power management instructions

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

Those instructions currently raise an exception from within
the helper. This tends to result in a bogus nip value in
the env context (typically the beginning of the TB). Such
a helper needs a gen_update_nip() first.

This patch takes a different approach: the exception is thrown
from translate.c rather than from the helper, using
gen_exception_nip(), which sets nip correctly. EXCP_HLT is also
used instead of POWERPC_EXCP_STOP so that execution actually
exits the CPU execution loop.

Signed-off-by: Benjamin Herrenschmidt 
[clg : modified the commit log to comment the use of EXCP_HLT instead
   of POWERPC_EXCP_STOP]
Signed-off-by: Cédric Le Goater 
Message-Id: <20190215161648.9600-2-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/excp_helper.c |  1 -
 target/ppc/translate.c   | 12 
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 751d759fcc..8407e0ade9 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -958,7 +958,6 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn)
  * but this doesn't seem to be a problem.
  */
 env->msr |= (1ull << MSR_EE);
-raise_exception(env, EXCP_HLT);
 }
 #endif /* defined(TARGET_PPC64) */
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f4d70e725a..bffdbd9687 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3566,7 +3566,8 @@ static void gen_doze(DisasContext *ctx)
 t = tcg_const_i32(PPC_PM_DOZE);
 gen_helper_pminsn(cpu_env, t);
 tcg_temp_free_i32(t);
-gen_stop_exception(ctx);
+/* Stop translation, as the CPU is supposed to sleep from now */
+gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
 #endif /* defined(CONFIG_USER_ONLY) */
 }
 
@@ -3581,7 +3582,8 @@ static void gen_nap(DisasContext *ctx)
 t = tcg_const_i32(PPC_PM_NAP);
 gen_helper_pminsn(cpu_env, t);
 tcg_temp_free_i32(t);
-gen_stop_exception(ctx);
+/* Stop translation, as the CPU is supposed to sleep from now */
+gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
 #endif /* defined(CONFIG_USER_ONLY) */
 }
 
@@ -3601,7 +3603,8 @@ static void gen_sleep(DisasContext *ctx)
 t = tcg_const_i32(PPC_PM_SLEEP);
 gen_helper_pminsn(cpu_env, t);
 tcg_temp_free_i32(t);
-gen_stop_exception(ctx);
+/* Stop translation, as the CPU is supposed to sleep from now */
+gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
 #endif /* defined(CONFIG_USER_ONLY) */
 }
 
@@ -3616,7 +3619,8 @@ static void gen_rvwinkle(DisasContext *ctx)
 t = tcg_const_i32(PPC_PM_RVWINKLE);
 gen_helper_pminsn(cpu_env, t);
 tcg_temp_free_i32(t);
-gen_stop_exception(ctx);
+/* Stop translation, as the CPU is supposed to sleep from now */
+gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
 #endif /* defined(CONFIG_USER_ONLY) */
 }
 #endif /* #if defined(TARGET_PPC64) */
-- 
2.20.1

[Qemu-devel] [PULL 02/50] target/ppc: Don't clobber MSR:EE on PM instructions

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

When issuing a power management instruction, we set MSR:EE
to force ppc_hw_interrupt() into calling powerpc_excp()
to deal with the fact that on P7 and P8, the system reset
caused by the wakeup needs to be generated regardless of
the MSR:EE value (using LPCR only).

This, however, means that the OS will see a bogus SRR1:EE
value, which is a problem. It also prevents properly
implementing P9 STOP "light".

So fix this by instead putting some logic in ppc_hw_interrupt()
to decide whether to deliver or not by taking into account the
fact that we are waking up from sleep.

The LPCR isn't checked as this is done in the has_work() test.

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-3-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/excp_helper.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 8407e0ade9..7c7c8d1b9d 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -748,6 +748,7 @@ void ppc_cpu_do_interrupt(CPUState *cs)
 static void ppc_hw_interrupt(CPUPPCState *env)
 {
 PowerPCCPU *cpu = ppc_env_get_cpu(env);
+bool async_deliver;
 
 /* External reset */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_RESET)) {
@@ -769,11 +770,20 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 return;
 }
 #endif
+
+/*
+ * For interrupts that gate on MSR:EE, we need to do something a
+ * bit more subtle, as we need to let them through even when EE is
+ * clear when coming out of some power management states (in order
+ * for them to become a 0x100).
+ */
+async_deliver = (msr_ee != 0) || env->in_pm_state;
+
 /* Hypervisor decrementer exception */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_HDECR)) {
 /* LPCR will be clear when not supported so this will work */
 bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
-if ((msr_ee != 0 || msr_hv == 0) && hdice) {
+if ((async_deliver || msr_hv == 0) && hdice) {
 /* HDEC clears on delivery */
 env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
 powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_HDECR);
@@ -783,7 +793,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 /* Extermal interrupt can ignore MSR:EE under some circumstances */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_EXT)) {
 bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
-if (msr_ee != 0 || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
+if (async_deliver || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
 powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_EXTERNAL);
 return;
 }
@@ -795,7 +805,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
 return;
 }
 }
-if (msr_ee != 0) {
+if (async_deliver != 0) {
 /* Watchdog timer on embedded PowerPC */
 if (env->pending_interrupts & (1 << PPC_INTERRUPT_WDT)) {
 env->pending_interrupts &= ~(1 << PPC_INTERRUPT_WDT);
@@ -943,21 +953,14 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn)
 
 cs = CPU(ppc_env_get_cpu(env));
 cs->halted = 1;
-env->in_pm_state = true;
 
 /* The architecture specifies that HDEC interrupts are
  * discarded in PM states
  */
 env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
 
-/* Technically, nap doesn't set EE, but if we don't set it
- * then ppc_hw_interrupt() won't deliver. We could add some
- * other tests there based on LPCR but it's simpler to just
- * whack EE in. It will be cleared by the 0x100 at wakeup
- * anyway. It will still be observable by the guest in SRR1
- * but this doesn't seem to be a problem.
- */
-env->msr |= (1ull << MSR_EE);
+/* Condition for waking up at 0x100 */
+env->in_pm_state = true;
 }
 #endif /* defined(TARGET_PPC64) */
 
-- 
2.20.1
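A minimal, self-contained model of the delivery rule this patch introduces may help when reviewing it. It assumes only the inputs named in the patch (MSR:EE, MSR:HV, has_hv_mode, LPCR:LPES0 and the in_pm_state flag); the struct and function names are hypothetical, and the LPCR wakeup enables are deliberately left out because the commit message says they are checked in has_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state ppc_hw_interrupt() consults. */
struct toy_cpu {
    bool msr_ee;       /* MSR:EE */
    bool msr_hv;       /* MSR:HV */
    bool has_hv_mode;  /* CPU implements hypervisor mode */
    bool lpes0;        /* LPCR:LPES0 */
    bool in_pm_state;  /* set by the PM helper before halting */
};

/*
 * EE-gated interrupts may be delivered when EE is set, or when the CPU
 * is waking from a power management state. MSR:EE itself is only read.
 */
static bool toy_async_deliver(const struct toy_cpu *c)
{
    return c->msr_ee || c->in_pm_state;
}

/* External interrupt condition, mirroring the patched test. */
static bool toy_deliver_external(const struct toy_cpu *c)
{
    return toy_async_deliver(c) ||
           (c->has_hv_mode && !c->msr_hv && !c->lpes0);
}
```

The point of the sketch is that MSR:EE is never written, so the SRR1:EE value the OS sees after wakeup is no longer clobbered.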

[Qemu-devel] [PULL 00/50] ppc-for-4.0 queue 20190226

2019-02-25 Thread David Gibson

The following changes since commit ef80b99ce7ffbd66b3efd493f4ca99f8abf59e79:

  Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-next-220219-1' into staging (2019-02-25 14:04:20 +)

are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/ppc-for-4.0-20190226

for you to fetch changes up to b268a6162da8ef9daa6384f24d4b95a0385081eb:

  ppc/pnv: use IEC binary prefixes to represent sizes (2019-02-26 14:20:30 +1100)


ppc patch queue 2019-02-26

Next set of patches for ppc and spapr.  There's a lot in this one:
 * Support "STOP light" states on POWER9
 * Add support for HVI interrupts on POWER9 (powernv machine)
 * CVE-2019-8934: Don't leak host model and serial information to the guest
 * Tests and cleanups for various hot unplug options
 * Hash and radix MMU implementation on POWER9 for powernv machine
 * PCI Host Bridge hotplug support for pseries machine
 * Allow larger kernels and initrds for powernv machine

Plus a handful of miscellaneous fixes and cleanups.

The cpu hotplug tests and cleanups from David Hildenbrand aren't
solely power related.  However the consensus amongst Michael Tsirkin,
David Hildenbrand, Cornelia Huck and myself was that it made most
sense to come in via my tree.


Benjamin Herrenschmidt (21):
  target/ppc: Fix nip on power management instructions
  target/ppc: Don't clobber MSR:EE on PM instructions
  target/ppc: Fix support for "STOP light" states on POWER9
  target/ppc: Move "wakeup reset" code to a separate function
  target/ppc: Rename "in_pm_state" to "resume_as_sreset"
  target/ppc: Add POWER9 exception model
  target/ppc: Detect erroneous condition in interrupt delivery
  target/ppc: Add Hypervisor Virtualization Interrupt on POWER9
  target/ppc: Add POWER9 external interrupt model
  target/ppc: Add support for LPCR:HEIC on POWER9
  target/ppc/spapr: Set LPCR:HR when using Radix mode
  target/ppc/mmu: Use LPCR:HR to chose radix vs. hash translation
  target/ppc: Re-enable RMLS on POWER9 for virtual hypervisors
  target/ppc: Fix #include guard in mmu-book3s-v3.h
  target/ppc: Fix ordering of hash MMU accesses
  target/ppc: Add basic support for "new format" HPTE as found on POWER9
  target/ppc: Fix synchronization of mttcg with broadcast TLB flushes
  target/ppc: Flush the TLB locally when the LPIDR is written
  target/ppc: Rename PATB/PATBE -> PATE
  target/ppc: Support for POWER9 native hash
  target/ppc: Basic POWER9 bare-metal radix MMU support

Cédric Le Goater (1):
  ppc/xive: xive does not have a POWER7 interrupt model

David Hildenbrand (6):
  cpus: Properly release the iothread lock when killing a dummy VCPU
  spapr: support memory unplug for qtest
  tests/device-plug: Add a simple PCI unplug request test
  tests/device-plug: Add CCW unplug test for s390x
  tests/device-plug: Add CPU core unplug request test for spapr
  tests/device-plug: Add memory unplug request test for spapr

Greg Kurz (11):
  spapr_drc: Allow FDT fragment to be added later
  spapr: Generate FDT fragment for LMBs at configure connector time
  spapr: Generate FDT fragment for CPUs at configure connector time
  spapr/pci: Generate FDT fragment at configure connector time
  spapr/drc: Drop spapr_drc_attach() fdt argument
  xics: Write source state to KVM at claim time
  spapr: Expose the name of the interrupt controller node
  spapr_irq: Expose the phandle of the interrupt controller
  spapr_pci: add PHB unrealize
  spapr: add hotplug hooks for PHB hotplug
  tests/device-plug: Add PHB unplug request test for spapr

Michael Roth (5):
  spapr: create DR connectors for PHBs
  spapr_events: add support for phb hotplug events
  spapr_pci: provide node start offset via spapr_populate_pci_dt()
  spapr_pci: add ibm, my-drc-index property for PHB hotplug
  spapr: enable PHB hotplug for default pseries machine type

Murilo Opsfelder Araujo (3):
  ppc/pnv: increase kernel size limit to 256MiB
  ppc/pnv: add INITRD_MAX_SIZE constant
  ppc/pnv: use IEC binary prefixes to represent sizes

Nathan Fontenot (1):
  spapr: populate PHB DRC entries for root DT node

Prasad J Pandit (1):
  ppc: add host-serial and host-model machine attributes (CVE-2019-8934)

Thomas Huth (1):
  hw/ppc: Use object_initialize_child for correct reference counting

 cpus.c  |   1 +
 hw/intc/spapr_xive.c|  20 +-
 hw/intc/xics.c  |   7 +
 hw/intc/xics_kvm.c  |  74 +---
 hw/intc/xics_spapr.c|   2 +-
 hw/intc/xive.c  |   4 +-
 hw/ppc/pnv.c|  22 ++-
 hw/ppc/pnv_psi.c|   4 +-
 hw/ppc/ppc.c|  44 -
 hw/ppc/spapr.c

[Qemu-devel] [PULL 03/50] target/ppc: Fix support for "STOP light" states on POWER9

2019-02-25 Thread David Gibson

From: Benjamin Herrenschmidt 

STOP must act differently based on PSSCR:EC on POWER9. When EC is set,
it acts like the P7/P8 power management instructions and wakes up at
0x100 based on the wakeup conditions in LPCR.

When PSSCR:EC is clear, however, it wakes up at the next instruction
after STOP (if EE is clear) or takes the corresponding interrupt (if
EE is set).

Signed-off-by: Benjamin Herrenschmidt 
Signed-off-by: Cédric Le Goater 
Reviewed-by: David Gibson 
Message-Id: <20190215161648.9600-4-...@kaod.org>
Signed-off-by: David Gibson 
---
 target/ppc/cpu-qom.h|  1 +
 target/ppc/cpu.h| 12 +---
 target/ppc/excp_helper.c|  8 ++--
 target/ppc/translate.c  | 13 -
 target/ppc/translate_init.inc.c |  7 +++
 5 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
index 3130802304..e9cb158423 100644
--- a/target/ppc/cpu-qom.h
+++ b/target/ppc/cpu-qom.h
@@ -122,6 +122,7 @@ typedef enum {
 PPC_PM_NAP,
 PPC_PM_SLEEP,
 PPC_PM_RVWINKLE,
+PPC_PM_STOP,
 } powerpc_pm_insn_t;
 
 /*/
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 325ebbeb98..5b1899bfc9 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -414,6 +414,10 @@ struct ppc_slb_t {
 #define LPCR_HVICEPPC_BIT(62) /* HV Virtualisation Int Enable */
 #define LPCR_HDICEPPC_BIT(63)
 
+/* PSSCR bits */
+#define PSSCR_ESL PPC_BIT(42) /* Enable State Loss */
+#define PSSCR_EC  PPC_BIT(43) /* Exit Criterion */
+
 #define msr_sf   ((env->msr >> MSR_SF)   & 1)
 #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
 #define msr_shv  ((env->msr >> MSR_SHV)  & 1)
@@ -1110,9 +1114,11 @@ struct CPUPPCState {
  * instructions and SPRs are diallowed if MSR:HV is 0
  */
 bool has_hv_mode;
-/* On P7/P8, set when in PM state, we need to handle resume
- * in a special way (such as routing some resume causes to
- * 0x100), so flag this here.
+
+/*
+ * On P7/P8/P9, set when in PM state, we need to handle resume in
+ * a special way (such as routing some resume causes to 0x100), so
+ * flag this here.
  */
 bool in_pm_state;
 #endif
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 7c7c8d1b9d..97503193ef 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -97,7 +97,10 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
 asrr0 = -1;
 asrr1 = -1;
 
-/* check for special resume at 0x100 from doze/nap/sleep/winkle on P7/P8 */
+/*
+ * check for special resume at 0x100 from doze/nap/sleep/winkle on
+ * P7/P8/P9
+ */
 if (env->in_pm_state) {
 env->in_pm_state = false;
 
@@ -960,7 +963,8 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t insn)
 env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
 
 /* Condition for waking up at 0x100 */
-env->in_pm_state = true;
+env->in_pm_state = (insn != PPC_PM_STOP) ||
+(env->spr[SPR_PSSCR] & PSSCR_EC);
 }
 #endif /* defined(TARGET_PPC64) */
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index bffdbd9687..fde7ead7b7 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3589,7 +3589,18 @@ static void gen_nap(DisasContext *ctx)
 
 static void gen_stop(DisasContext *ctx)
 {
-gen_nap(ctx);
+#if defined(CONFIG_USER_ONLY)
+GEN_PRIV;
+#else
+TCGv_i32 t;
+
+CHK_HV;
+t = tcg_const_i32(PPC_PM_STOP);
+gen_helper_pminsn(cpu_env, t);
+tcg_temp_free_i32(t);
+/* Stop translation, as the CPU is supposed to sleep from now */
+gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
+#endif /* defined(CONFIG_USER_ONLY) */
 }
 
 static void gen_sleep(DisasContext *ctx)
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index d884906004..8b1d324b3b 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -8801,9 +8801,16 @@ static bool cpu_has_work_POWER9(CPUState *cs)
 CPUPPCState *env = &cpu->env;
 
 if (cs->halted) {
+uint64_t psscr = env->spr[SPR_PSSCR];
+
 if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
 return false;
 }
+
+/* If EC is clear, just return true on any pending interrupt */
+if (!(psscr & PSSCR_EC)) {
+return true;
+}
 /* External Exception */
 if ((env->pending_interrupts & (1u << PPC_INTERRUPT_EXT)) &&
 (env->spr[SPR_LPCR] & LPCR_EEE)) {
-- 
2.20.1
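As a rough sketch (not QEMU code), the wakeup decision after this patch reduces to two small predicates. PPC_BIT() is assumed to follow IBM bit numbering (bit 0 is the most significant bit of the 64-bit register), matching how PSSCR_EC = PPC_BIT(43) is defined above; the enum and function names are hypothetical, and the LPCR wakeup enables are collapsed into a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit register. */
#define PPC_BIT(bit)  (0x8000000000000000ULL >> (bit))
#define PSSCR_EC      PPC_BIT(43)   /* Exit Criterion */

enum toy_pm_insn {
    TOY_PM_DOZE, TOY_PM_NAP, TOY_PM_SLEEP, TOY_PM_RVWINKLE, TOY_PM_STOP
};

/* True if the wakeup is routed through the 0x100 (system reset) vector. */
static bool toy_resumes_at_0x100(enum toy_pm_insn insn, uint64_t psscr)
{
    return insn != TOY_PM_STOP || (psscr & PSSCR_EC) != 0;
}

/*
 * Halted CPU: with EC clear, any pending hard interrupt wakes it; with
 * EC set, the LPCR wakeup enables (modeled as one flag) decide.
 */
static bool toy_stop_wakes(uint64_t psscr, bool irq_pending, bool lpcr_allows)
{
    if (!irq_pending) {
        return false;
    }
    if (!(psscr & PSSCR_EC)) {
        return true;
    }
    return lpcr_allows;
}
```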

Re: [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:22PM +0100, Cédric Le Goater wrote:
> Instead of switching off the sources, set their state to PENDING to
> possibly catch a hotplug event occurring while the VM is stopped. At
> resume, check the previous state and if an interrupt was queued,
> generate a trigger.

First, I think it would be better to fold this fix into the patch
introducing the state change handlers.

Second, IIUC this would handle any instance of an irq being triggered
while the VM is stopped.  Hotplug interrupts are one obvious case of
that, but I'm not sure it's the only one.  VFIO devices could interrupt
while the VM is stopped, I think.  Maybe even emulated devices could,
depending on how their synchronization with the CPU run state works.
There might be other cases.  Does that sound right to you?
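To illustrate the point, here is a hypothetical user-space model of a single source's PQ bits across a stop/resume cycle. The encodings (P = 0x2, Q = 0x1) are assumed to match the XIVE_ESB_* constants the patch uses, and toy_trigger() stands in for any event source. Nothing in the model depends on where the trigger came from, which suggests the fix would cover VFIO or emulated devices as well as hotplug.

```c
#include <assert.h>
#include <stdint.h>

/* PQ encodings, assumed to match the patch (P = 0x2, Q = 0x1). */
#define XIVE_ESB_RESET    0x0
#define XIVE_ESB_OFF      0x1
#define XIVE_ESB_PENDING  0x2
#define XIVE_ESB_QUEUED   0x3

/* Hypothetical model of one interrupt source. */
struct toy_source {
    uint8_t pq;
    int forwarded;  /* notifications actually sent to the presenter */
};

/* A trigger: forward if P was clear, otherwise latch it in Q. */
static void toy_trigger(struct toy_source *s)
{
    switch (s->pq) {
    case XIVE_ESB_RESET:
        s->pq = XIVE_ESB_PENDING;
        s->forwarded++;
        break;
    case XIVE_ESB_PENDING:
        s->pq = XIVE_ESB_QUEUED;
        break;
    default:  /* QUEUED stays QUEUED, OFF drops the event */
        break;
    }
}

/* VM stop: save the real state, park non-OFF sources at PENDING. */
static uint8_t toy_vm_stop(struct toy_source *s)
{
    uint8_t saved = s->pq;

    if (saved != XIVE_ESB_OFF) {
        s->pq = XIVE_ESB_PENDING;
    }
    return saved;
}

/* VM resume: restore, then replay anything latched while stopped. */
static void toy_vm_resume(struct toy_source *s, uint8_t saved)
{
    uint8_t old = s->pq;

    s->pq = saved;
    if (saved == XIVE_ESB_RESET && old == XIVE_ESB_QUEUED) {
        toy_trigger(s);
    }
}
```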

> Signed-off-by: Cédric Le Goater 
> ---
>  hw/intc/spapr_xive_kvm.c | 22 +++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 99a829fb3f60..64d160babb26 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -500,8 +500,16 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>  if (running) {
>  for (i = 0; i < xsrc->nr_irqs; i++) {
>  uint8_t pq = xive_source_esb_get(xsrc, i);
> -if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
> -error_report("XIVE: IRQ %d has an invalid state", i);
> +uint8_t old_pq;
> +
> +old_pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8));
> +
> +/*
> + * If an interrupt was queued (hotplug event) while VM was
> + * stopped, generate a trigger.
> + */
> +if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
> +xive_esb_trigger(xsrc, i);
>  }
>  }
>  
> @@ -515,7 +523,15 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>   * migration is in progress.
>   */
>  for (i = 0; i < xsrc->nr_irqs; i++) {
> -uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
> +uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> +
> +/*
> + * PQ is set to PENDING to possibly catch a hotplug event
> + * occuring while the VM is stopped.
> + */
> +if (pq != XIVE_ESB_OFF) {
> +pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_10);
> +}
>  xive_source_esb_set(xsrc, i, pq);
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [QEMU-PPC] [PATCH 3/4] target/ppc: Implement large decrementer support for KVM

2019-02-25 Thread David Gibson

On Tue, Feb 26, 2019 at 02:05:30PM +1100, Suraj Jitindar Singh wrote:
> Implement support to allow KVM guests to take advantage of the large
> decrementer introduced on POWER9 cpus.
> 
> To determine if the host can support the requested large decrementer
> size, we check that it matches the one specified in the ibm,dec-bits
> device-tree property. We also need to enable it in KVM by setting the
> LPCR_LD bit in the LPCR. Note that to do this we need to try to set the
> bit and then read it back to check whether the host allowed us to set
> it. If so, we can use it; if we were unable to set it, the host cannot
> support it and we must not use the large decrementer.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

Although changes might be necessary to match it to things I've
suggested in the earlier patches in the series.

Is the KVM side support for this already merged?  If so, as of when?
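The try-to-set-then-read-back check described in the commit message can be illustrated with a small stand-alone sketch. The register below is a hypothetical stand-in for the host's handling of KVM_REG_PPC_LPCR_64 that silently drops writes to bits it does not support; TOY_BIT is an arbitrary bit, not the real LPCR_LD position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BIT (1ULL << 17)  /* arbitrary bit chosen for the sketch */

/* Hypothetical register: writes to non-writable bits are ignored. */
struct toy_reg {
    uint64_t value;
    uint64_t writable;  /* bits the "host" lets the guest change */
};

static uint64_t toy_get(const struct toy_reg *r)
{
    return r->value;
}

static void toy_set(struct toy_reg *r, uint64_t v)
{
    r->value = (r->value & ~r->writable) | (v & r->writable);
}

/* Returns 0 if the bit ends up matching 'enable', -1 if it was refused. */
static int toy_enable_bit(struct toy_reg *r, uint64_t bit, bool enable)
{
    uint64_t v = toy_get(r);

    if (!!(v & bit) == enable) {
        return 0;                 /* already in the requested state */
    }
    v = enable ? (v | bit) : (v & ~bit);
    toy_set(r, v);
    v = toy_get(r);               /* read back what actually stuck */
    return (!!(v & bit) == enable) ? 0 : -1;
}
```

Reading back after the write is what lets the caller distinguish "host accepted the mode" from "host silently kept the old value".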

> ---
>  hw/ppc/spapr_caps.c  | 17 +++--
>  target/ppc/kvm.c | 39 +++
>  target/ppc/kvm_ppc.h | 12 
>  3 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 44542fdbb2..e07568fb94 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -440,8 +440,16 @@ static void cap_large_decr_apply(sPAPRMachineState *spapr,
>  pcc->hdecr_bits);
>  return;
>  }
> -} else {
> -error_setg(errp, "No large decrementer support, try cap-large-decr=0");
> +} else if (kvm_enabled()) {
> +int kvm_nr_bits = kvmppc_get_cap_large_decr();
> +
> +if (!kvm_nr_bits) {
> +error_setg(errp, "No large decrementer support, try cap-large-decr=0");
> +} else if (val != kvm_nr_bits) {
> +error_setg(errp,
> +"Large decrementer size unsupported, try -cap-large-decr=%d",
> +kvm_nr_bits);
> +}
>  }
>  }
>  
> @@ -452,6 +460,11 @@ static void cap_large_decr_cpu_apply(sPAPRMachineState *spapr,
>  CPUPPCState *env = &cpu->env;
>  target_ulong lpcr = env->spr[SPR_LPCR];
>  
> +if (kvm_enabled()) {
> +if (kvmppc_enable_cap_large_decr(cpu, !!val))
> +error_setg(errp, "No large decrementer support, try cap-large-decr=0");
> +}
> +
>  if (val)
>  lpcr |= LPCR_LD;
>  else
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index d01852fe31..3f650c8fc4 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -91,6 +91,7 @@ static int cap_ppc_safe_cache;
>  static int cap_ppc_safe_bounds_check;
>  static int cap_ppc_safe_indirect_branch;
>  static int cap_ppc_nested_kvm_hv;
> +static int cap_large_decr;
>  
>  static uint32_t debug_inst_opcode;
>  
> @@ -124,6 +125,7 @@ static bool kvmppc_is_pr(KVMState *ks)
>  
>  static int kvm_ppc_register_host_cpu_type(MachineState *ms);
>  static void kvmppc_get_cpu_characteristics(KVMState *s);
> +static int kvmppc_get_dec_bits(void);
>  
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> @@ -151,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
>  kvmppc_get_cpu_characteristics(s);
>  cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
> +cap_large_decr = kvmppc_get_dec_bits();
>  /*
>   * Note: setting it to false because there is not such capability
>   * in KVM at this moment.
> @@ -1927,6 +1930,15 @@ uint64_t kvmppc_get_clockfreq(void)
>  return kvmppc_read_int_cpu_dt("clock-frequency");
>  }
>  
> +static int kvmppc_get_dec_bits(void)
> +{
> +int nr_bits = kvmppc_read_int_cpu_dt("ibm,dec-bits");
> +
> +if (nr_bits > 0)
> +return nr_bits;
> +return 0;
> +}
> +
>  static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
>   {
>   PowerPCCPU *cpu = ppc_env_get_cpu(env);
> @@ -2442,6 +2454,33 @@ bool kvmppc_has_cap_spapr_vfio(void)
>  return cap_spapr_vfio;
>  }
>  
> +int kvmppc_get_cap_large_decr(void)
> +{
> +return cap_large_decr;
> +}
> +
> +int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
> +{
> +CPUState *cs = CPU(cpu);
> +uint64_t lpcr;
> +
> +kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +/* Do we need to modify the LPCR? */
> +if (!!(lpcr & LPCR_LD) != !!enable) {
> +if (enable)
> +lpcr |= LPCR_LD;
> +else
> +lpcr &= ~LPCR_LD;
> +kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
> +
> +if (!!(lpcr & LPCR_LD) != !!enable)
> +return -1;
> +}
> +
> +return 0;
> +}
> +
>  PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
>  {
>  uint32_t host_pvr = mfpvr();
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index bdfaa4e70a..a79835bd14 100644
> --- a/target/ppc/kvm_ppc.h
>

Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] spapr-rtas: add ibm, get-vpd RTAS interface

2019-02-25 Thread David Gibson

On Mon, Feb 25, 2019 at 08:20:09PM -0300, Murilo Opsfelder Araujo wrote:
> Hi, Maxiwell.
> 
> On Mon, Feb 25, 2019 at 01:23:25PM -0300, Maxiwell S. Garcia wrote:
> > This adds a handler for ibm,get-vpd RTAS calls, allowing pseries
> > guests to collect host information. It is disabled by default to
> > avoid unwanted information leakage. To enable it, use:
> > ‘-M pseries,vpd-export=on’
> 
> The patch for setting host-serial and host-model has already landed in
> Gibson's ppc-for-4.0 branch:
> 
>   commit 9e584f45868f6945c1282c938278038cba0e4af2
>   Author: Prasad J Pandit 
>   Date:   Mon Feb 18 23:43:49 2019 +0530
> 
>   ppc: add host-serial and host-model machine attributes (CVE-2019-8934)
> 
> 
> QEMU should only return host-serial and host-model from the host if the
> following combination of parameters is provided:
> 
>   -M host-serial=passthrough,host-model=passthrough,vpd-export=on
> 
> If host-serial or host-model are set with a user-string, ibm,get-vpd should
> honor these values and return them, not exposing host information by accident.
> 
> I'm not even sure we need the vpd-export= setting. Its logic could be
> derived from the presence of the host-serial=passthrough and
> host-model=passthrough options.
> 
> What do you think?

That's an excellent point - I hadn't thought through the fact that
this is the same information exposed by those properties.  I do indeed
think that exposing the same information set in those properties - and
thereby avoiding the new machine option - would be a better plan.
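One way to derive the ibm,get-vpd answer from the existing properties alone, without a vpd-export option, is sketched below. The property semantics are assumptions for illustration (unset or "none" reports nothing, "passthrough" copies the host value, anything else is a user-supplied string), and vpd_serial() is a hypothetical helper; host-model would be handled the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: value an ibm,get-vpd handler could report for the serial
 * number, decided only by the host-serial property. NULL means "report
 * nothing", so host data is never exposed unless passthrough was chosen.
 */
static const char *vpd_serial(const char *host_serial_prop,
                              const char *host_value)
{
    if (host_serial_prop == NULL || strcmp(host_serial_prop, "none") == 0) {
        return NULL;
    }
    if (strcmp(host_serial_prop, "passthrough") == 0) {
        return host_value;
    }
    return host_serial_prop;  /* user-supplied string */
}
```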


Re: [Qemu-devel] [QEMU-PPC] [PATCH 2/4] target/ppc: Implement large decrementer support for TCG

2019-02-25 Thread David Gibson

On Tue, Feb 26, 2019 at 02:05:29PM +1100, Suraj Jitindar Singh wrote:
> Prior to POWER9 the decrementer was a 32-bit register which decremented
> with each tick of the timebase. From POWER9 onwards the decrementer can
> be set to operate in a mode called large decrementer where it acts as an
> n-bit decrementing register which is visible as a 64-bit register; that
> is, the value of the decrementer is sign extended to 64 bits (where n is
> implementation dependent).
> 
> The mode in which the decrementer operates is controlled by the LPCR_LD
> bit in the logical partition control register (LPCR).
> 
> From POWER9 onwards the HDEC (hypervisor decrementer) was enlarged to
> h bits, also sign extended to 64 bits (where h is implementation
> dependent). Note this isn't configurable and is always enabled.
> 
> For TCG we allow the user to configure a custom large decrementer size,
> so long as it's at least 32 and less than the size of the HDEC (the
> restrictions imposed by the ISA).
> 
> Signed-off-by: Suraj Jitindar Singh 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/ppc.c| 78 -
>  hw/ppc/spapr.c  |  8 +
>  hw/ppc/spapr_caps.c | 38 +++-
>  target/ppc/cpu-qom.h|  1 +
>  target/ppc/cpu.h| 11 +++---
>  target/ppc/mmu-hash64.c |  2 +-
>  target/ppc/translate.c  |  2 +-
>  target/ppc/translate_init.inc.c |  1 +
>  8 files changed, 109 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index d1e3d4cd20..853afeed6a 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -744,10 +744,10 @@ bool ppc_decr_clear_on_delivery(CPUPPCState *env)
>  return ((tb_env->flags & flags) == PPC_DECR_UNDERFLOW_TRIGGERED);
>  }
>  
> -static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
> +static inline uint64_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)

Hrm.  Given how we use this, and how muldiv64 works, wouldn't it make
more sense to have it return int64_t (i.e. signed)?

>  {
>  ppc_tb_t *tb_env = env->tb_env;
> -uint32_t decr;
> +uint64_t decr;
>  int64_t diff;
>  
>  diff = next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> @@ -758,27 +758,42 @@ static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
>  }  else {
>  decr = -muldiv64(-diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
>  }
> -LOG_TB("%s: %08" PRIx32 "\n", __func__, decr);
> +LOG_TB("%s: %016" PRIx64 "\n", __func__, decr);
>  
>  return decr;
>  }
>  
> -uint32_t cpu_ppc_load_decr (CPUPPCState *env)
> +target_ulong cpu_ppc_load_decr (CPUPPCState *env)
>  {
>  ppc_tb_t *tb_env = env->tb_env;
> +uint64_t decr;
>  
>  if (kvm_enabled()) {
>  return env->spr[SPR_DECR];
>  }
>  
> -return _cpu_ppc_load_decr(env, tb_env->decr_next);
> +decr = _cpu_ppc_load_decr(env, tb_env->decr_next);
> +
> +/*
> + * If large decrementer is enabled then the decrementer is signed extened
> + * to 64 bits, otherwise it is a 32 bit value.
> + */
> +if (env->spr[SPR_LPCR] & LPCR_LD)
> +return decr;
> +return (uint32_t) decr;
>  }
>  
> -uint32_t cpu_ppc_load_hdecr (CPUPPCState *env)
> +target_ulong cpu_ppc_load_hdecr (CPUPPCState *env)
>  {
>  ppc_tb_t *tb_env = env->tb_env;
> +uint64_t decr;
>  
> -return _cpu_ppc_load_decr(env, tb_env->hdecr_next);
> +decr =  _cpu_ppc_load_decr(env, tb_env->hdecr_next);
> +
> +/* If POWER9 or later then hdecr is sign extended to 64 bits otherwise 32 */
> +if (env->mmu_model & POWERPC_MMU_3_00)

You already have a pcc->hdecr_bits.  Wouldn't it make more sense to
check against that than the cpu model directly?
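A hypothetical sketch of that suggestion: pick the 32-bit or 64-bit view from the configured width (a plain parameter standing in for pcc->hdecr_bits) rather than from the MMU model. It assumes the raw value is a 64-bit two's-complement count, such as the negated muldiv64() result computed in _cpu_ppc_load_decr().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: return the architected view of a decrementer value.
 * 'nr_bits' stands in for something like pcc->hdecr_bits.
 */
static uint64_t toy_load_view(uint64_t raw, int nr_bits)
{
    assert(nr_bits >= 32 && nr_bits <= 64);
    if (nr_bits > 32) {
        return raw;            /* wide decrementer: 64-bit signed view */
    }
    return (uint32_t)raw;      /* legacy 32-bit view */
}
```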

> +return decr;
> +return (uint32_t) decr;
>  }
>  
>  uint64_t cpu_ppc_load_purr (CPUPPCState *env)
> @@ -832,13 +847,21 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
>   QEMUTimer *timer,
>   void (*raise_excp)(void *),
>   void (*lower_excp)(PowerPCCPU *),
> - uint32_t decr, uint32_t value)
> + target_ulong decr, target_ulong value,
> + int nr_bits)
>  {
>  CPUPPCState *env = &cpu->env;
>  ppc_tb_t *tb_env = env->tb_env;
>  uint64_t now, next;
> +bool negative;
> +
> +/* Truncate value to decr_width and sign extend for simplicity */
> +value &= ((1ULL << nr_bits) - 1);
> +negative = !!(value & (1ULL << (nr_bits - 1)));

Could you simplify this by just using
negative = (int64_t)decr < 0;
before you mask?  Or would that be wrong in some edge case or other?
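For reference, here is a stand-alone sketch of truncating to nr_bits bits and sign-extending to 64, using only unsigned arithmetic. It special-cases nr_bits == 64 because shifting a 64-bit value by 64 is undefined in C. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep the low nr_bits of 'value' and sign-extend them to 64 bits.
 * Assumes 1 <= nr_bits <= 64.
 */
static uint64_t toy_sext(uint64_t value, int nr_bits)
{
    uint64_t sign;

    assert(nr_bits >= 1 && nr_bits <= 64);
    if (nr_bits == 64) {
        return value;                  /* avoid an undefined 64-bit shift */
    }
    value &= (1ULL << nr_bits) - 1;    /* truncate to nr_bits */
    sign = 1ULL << (nr_bits - 1);
    return (value ^ sign) - sign;      /* branch-free sign extension */
}
```

On the edge-case question: testing bit 63 before masking only agrees with testing bit nr_bits - 1 if the input is already sign-extended from nr_bits bits. For example, with nr_bits = 48, 0x800000000000 has bit 47 set, so toy_sext() returns 0xffff800000000000, while (int64_t)0x800000000000 is positive.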

> +if (negative)
> +value |= (0xULL << nr_bits);
>  
> -LOG_TB("%s: %08" PRIx32 " => %08" PRIx32 "\n", __func__,
> +LOG_TB("%s: "

Re: [Qemu-devel] [QEMU-PPC] [PATCH 4/4] target/ppc/spapr: Enable the large decrementer by default on POWER9

2019-02-25 Thread David Gibson

On Tue, Feb 26, 2019 at 02:05:31PM +1100, Suraj Jitindar Singh wrote:
> Enable the large decrementer by default on POWER9 cpu models. The
> default value applied is that provided in the cpu class.
> 
> Signed-off-by: Suraj Jitindar Singh 
> ---
>  hw/ppc/spapr_caps.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index e07568fb94..f48aa367e3 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -566,11 +566,18 @@ sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
> const char *cputype)
>  {
> +PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(object_class_by_name(cputype));
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  sPAPRCapabilities caps;
>  
>  caps = smc->default_caps;
>  
> +caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = pcc->hdecr_bits;

So, the rule with default_caps_with_cpu() is that it can reduce the
value from the machine-global default, but never increase it (because
that could change guest-visible behaviour for existing machine
versions).

I think the line above will increase it.
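The rule can be written as a clamp. This is a hypothetical sketch of the intended behaviour, not an existing QEMU helper. For the large decrementer capability, where 0 means disabled, a machine-wide default of 0 would stay 0 regardless of the CPU class's hdecr_bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: a CPU-specific limit may lower the machine-wide default
 * for a capability, but must never raise it.
 */
static uint8_t toy_cap_default_with_cpu(uint8_t machine_default,
                                        uint8_t cpu_limit)
{
    return cpu_limit < machine_default ? cpu_limit : machine_default;
}
```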

> +if (!ppc_type_check_compat(cputype, CPU_POWERPC_LOGICAL_3_00,
> +   0, spapr->max_compat_pvr)) {
> +caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 0;

And this part I think is redundant, because hdecr_bits won't be large
for anything pre-POWER9.

> +}
> +
>  if (!ppc_type_check_compat(cputype, CPU_POWERPC_LOGICAL_2_07,
> 0, spapr->max_compat_pvr)) {
>  caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;


Re: [Qemu-devel] [QEMU-PPC] [PATCH 1/4] target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER

2019-02-25 Thread David Gibson

On Tue, Feb 26, 2019 at 02:05:28PM +1100, Suraj Jitindar Singh wrote:
> Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
> availability and size of the large decrementer made available to the
> guest.
> 
> Signed-off-by: Suraj Jitindar Singh 
> ---
>  hw/ppc/spapr.c |  2 ++
>  hw/ppc/spapr_caps.c| 45 +
>  include/hw/ppc/spapr.h |  5 -
>  3 files changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b6a571b6f1..acf62a2b9f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2077,6 +2077,7 @@ static const VMStateDescription vmstate_spapr = {
>  &vmstate_spapr_irq_map,
>  &vmstate_spapr_cap_nested_kvm_hv,
>  &vmstate_spapr_dtb,
> +&vmstate_spapr_cap_large_decr,
>  NULL
>  }
>  };
> @@ -4288,6 +4289,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>  smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
>  smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
>  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
> +smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 0;

This looks basically fine, but the name kind of suggests it's a
boolean, whereas it's actually a number of bits.  I wonder if just
calling it "decrementer bits" would be clearer, with it defaulting to
32.
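A sketch of how a "decrementer bits" value defaulting to 32 might be validated. The names are hypothetical; 32 is treated as the legacy decrementer with LPCR:LD off, and the upper bound uses the "less than the size of the HDEC" constraint stated in the TCG patch of this series, so it should be checked against the ISA before relying on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: is 'bits' an acceptable decrementer width? */
static bool toy_decr_bits_valid(unsigned bits, unsigned hdecr_bits)
{
    if (bits == 32) {
        return true;                        /* legacy decrementer */
    }
    return bits > 32 && bits < hdecr_bits;  /* large decrementer range */
}

/* Widths above 32 require LPCR:LD to be set. */
static bool toy_decr_needs_ld(unsigned bits)
{
    return bits > 32;
}
```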

>  spapr_caps_add_properties(smc, &error_abort);
>  smc->irq = &spapr_irq_xics;
>  smc->dr_phb_enabled = true;
> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
> index 64f98ae68d..1545a02729 100644
> --- a/hw/ppc/spapr_caps.c
> +++ b/hw/ppc/spapr_caps.c
> @@ -182,6 +182,34 @@ static void spapr_cap_set_pagesize(Object *obj, Visitor *v, const char *name,
>  spapr->eff.caps[cap->index] = val;
>  }
>  
> +static void spapr_cap_get_uint8(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +sPAPRCapabilityInfo *cap = opaque;
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +uint8_t val = spapr_get_cap(spapr, cap->index);
> +
> +visit_type_uint8(v, name, &val, errp);
> +}
> +
> +static void spapr_cap_set_uint8(Object *obj, Visitor *v, const char *name,
> +void *opaque, Error **errp)
> +{
> +sPAPRCapabilityInfo *cap = opaque;
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +Error *local_err = NULL;
> +uint8_t val;
> +
> +visit_type_uint8(v, name, &val, &local_err);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return;
> +}
> +
> +spapr->cmd_line_caps[cap->index] = true;
> +spapr->eff.caps[cap->index] = val;
> +}
> +
>  static void cap_htm_apply(sPAPRMachineState *spapr, uint8_t val, Error **errp)
>  {
>  if (!val) {
> @@ -390,6 +418,13 @@ static void cap_nested_kvm_hv_apply(sPAPRMachineState *spapr,
>  }
>  }
>  
> +static void cap_large_decr_apply(sPAPRMachineState *spapr,
> + uint8_t val, Error **errp)
> +{
> +if (val)
> +error_setg(errp, "No large decrementer support, try cap-large-decr=0");
> +}
> +
>  sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  [SPAPR_CAP_HTM] = {
>  .name = "htm",
> @@ -468,6 +503,15 @@ sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
>  .type = "bool",
>  .apply = cap_nested_kvm_hv_apply,
>  },
> +[SPAPR_CAP_LARGE_DECREMENTER] = {
> +.name = "large-decr",
> +.description = "Size of Large Decrementer for the Guest (bits) 0=disabled",
> +.index = SPAPR_CAP_LARGE_DECREMENTER,
> +.get = spapr_cap_get_uint8,
> +.set = spapr_cap_set_uint8,
> +.type = "int",
> +.apply = cap_large_decr_apply,
> +},
>  };
>  
>  static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
> @@ -596,6 +640,7 @@ SPAPR_CAP_MIG_STATE(cfpc, SPAPR_CAP_CFPC);
>  SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
>  SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
>  SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
> +SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
>  
>  void spapr_caps_init(sPAPRMachineState *spapr)
>  {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 59073a7579..8efc5e0779 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -74,8 +74,10 @@ typedef enum {
>  #define SPAPR_CAP_HPT_MAXPAGESIZE   0x06
>  /* Nested KVM-HV */
>  #define SPAPR_CAP_NESTED_KVM_HV 0x07
> +/* Large Decrementer */
> +#define SPAPR_CAP_LARGE_DECREMENTER 0x08
>  /* Num Caps */
> -#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_KVM_HV + 1)
> +#define SPAPR_CAP_NUM   (SPAPR_CAP_LARGE_DECREMENTER + 1)
>  
>  /*
>   * Capability Values
> @@ -828,6 +830,7 @@ extern const VMStateDescription vmstate_spapr_cap_cfpc;
>  extern const VMStateDescription vmstate_spapr_cap_sbbc;
>  extern const

Re: [Qemu-devel] [PATCH v5] i386, acpi: check acpi_memory_hotplug capacity in pre_plug

2019-02-25 Thread Wei Yang

On Mon, Feb 25, 2019 at 09:15:34AM +0800, Wei Yang wrote:
>On Mon, Feb 25, 2019 at 09:07:08AM +0800, Wei Yang wrote:
>>Currently we do device realization like below:
>>
>>   hotplug_handler_pre_plug()
>>   dc->realize()
>>   hotplug_handler_plug()
>>
>>Before we do device realization and plug, we should allocate the necessary
>>resources and check whether the memory-hotplug-support property is enabled.
>>
>>In piix4 and ich9, the memory-hotplug-support property is checked at the
>>plug stage. This means that, by the time the ACPI plug handler is called,
>>the device has already been realized and mapped into the guest address
>>space by 'pc_dimm_plug()'. The handler might then fail and crash QEMU by
>>reaching g_assert_not_reached() (piix4) or error_abort (ich9).
>>
>>Fix it by checking whether memory hotplug is enabled at the pre_plug stage,
>>where we can gracefully abort the hotplug request.
>>
>>Signed-off-by: Wei Yang 
>>CC: Igor Mammedov 
>>CC: Eric Blake 
>>
>>---
>>v5:
>>   * rebase on latest upstream
>>   * remove a comment before hotplug_handler_pre_plug
>>   * fix alignment for ich9_pm_device_pre_plug_cb
>>v4:
>>   * fix code alignment of piix4_device_pre_plug_cb
>>v3:
>>   * replace acpi_memory_hotplug with memory-hotplug-support in changelog
>>   * fix code alignment of ich9_pm_device_pre_plug_cb
>>   * print which device type memory-hotplug-support is disabled in
>> ich9_pm_device_pre_plug_cb and piix4_device_pre_plug_cb
>>v2:
>>   * (imamm...@redhat.com)
>> - Almost the whole third paragraph
>>   * apply this change to ich9
>>   * use hotplug_handler_pre_plug() instead of open-coding check
>>---
>> hw/acpi/ich9.c | 15 +--
>> hw/acpi/piix4.c| 13 ++---
>> hw/i386/pc.c   |  2 ++
>> hw/isa/lpc_ich9.c  |  1 +
>> include/hw/acpi/ich9.h |  2 ++
>> 5 files changed, 28 insertions(+), 5 deletions(-)
>>
>>diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
>>index c5d8646abc..e53dfe1ee3 100644
>>--- a/hw/acpi/ich9.c
>>+++ b/hw/acpi/ich9.c
>>@@ -483,13 +483,24 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
>>  NULL);
>> }
>> 
>>+void ich9_pm_device_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>>+Error **errp)
>>+{
>>+ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
>>+
>>+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
>>+!lpc->pm.acpi_memory_hotplug.is_enabled)
>>+error_setg(errp,
>>+   "memory hotplug is not enabled: %s.memory-hotplug-support "
>>+   "is not set", object_get_typename(OBJECT(lpc)));
>>+}
>>+
>> void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>> Error **errp)
>> {
>> ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
>> 
>>-if (lpc->pm.acpi_memory_hotplug.is_enabled &&
>>-object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>>+if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
>> if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
>> nvdimm_acpi_plug_cb(hotplug_dev, dev);
>> } else {
>>diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
>>index df8c0db909..8b9654e437 100644
>>--- a/hw/acpi/piix4.c
>>+++ b/hw/acpi/piix4.c
>>@@ -374,9 +374,16 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque)
>> static void piix4_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>> DeviceState *dev, Error **errp)
>

Where will we invoke this check for PCI devices?

pc_machine_device_pre_plug_cb() seems to have no connection with this
function. So how does this pre_plug handler take effect?

Am I missing something?
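The concern can be illustrated with a simplified, hypothetical model of the pre_plug -> realize -> plug sequence quoted above. None of these types or functions are QEMU's; the point is only that a check living in an ACPI-level pre_plug callback can reject a device before realize only if the machine-level pre_plug dispatcher actually forwards the request to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device kinds and ACPI state; not QEMU types. */
typedef enum { DEV_PC_DIMM, DEV_PCI } DevType;

typedef struct {
    bool memory_hotplug_enabled;
} AcpiDev;

/* ACPI-level pre_plug check: reject DIMMs when memory hotplug is off. */
static bool acpi_pre_plug(const AcpiDev *acpi, DevType type)
{
    if (type == DEV_PC_DIMM && !acpi->memory_hotplug_enabled) {
        fprintf(stderr, "memory hotplug is not enabled\n");
        return false;
    }
    return true;
}

/*
 * Machine-level pre_plug: the ACPI check runs only if the request is
 * forwarded. If it never is for some device type, the check is dead.
 */
static bool machine_pre_plug(const AcpiDev *acpi, DevType type,
                             bool forward_to_acpi)
{
    return forward_to_acpi ? acpi_pre_plug(acpi, type) : true;
}

/* Returns true if the device would go on to realize() and plug(). */
static bool device_hotplug(const AcpiDev *acpi, DevType type,
                           bool forward_to_acpi)
{
    if (!machine_pre_plug(acpi, type, forward_to_acpi)) {
        return false;   /* aborted gracefully, nothing realized yet */
    }
    /* realize() and plug() would run here */
    return true;
}
```

In this model, with forwarding disabled the DIMM is accepted even though memory hotplug is off, which is exactly the gap the question is about.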


-- 
Wei Yang
Help you, Help me

[Qemu-devel] [Bug 1816052] Re: qemu system emulator fails to start if no sound card is present on host

2019-02-25 Thread Like Xu

I would work around this issue by running "export QEMU_AUDIO_DRV=none"
in the shell before running the QEMU command.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1816052

Title:
  qemu system emulator fails to start if no sound card is present on
  host

Status in QEMU:
  New

Bug description:
  A plain build from git master at
  81dbcfa9e1d8bab3f7c4cc923c0b40cd666f374f on Fedora 29 x86_64 host,
  with no options passed to configure.

  Trying to launch QEMU on a host with no audio card present:

  # ls /dev/snd/
  seq  timer

  It will fail to initialize alsa and abort startup:

  # qemu-system-x86_64 -cdrom Fedora-Workstation-Live-x86_64-29-1.2.iso -m 4000 -vnc 0.0.0.0:1
  ALSA lib confmisc.c:767:(parse_card) cannot find card '0'
  ALSA lib conf.c:4555:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
  ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
  ALSA lib conf.c:4555:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
  ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name
  ALSA lib conf.c:4555:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
  ALSA lib conf.c:5034:(snd_config_expand) Evaluate error: No such file or directory
  ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM default
  alsa: Could not initialize DAC
  alsa: Failed to open `default':
  alsa: Reason: No such file or directory
  ALSA lib confmisc.c:767:(parse_card) cannot find card '0'
  ALSA lib conf.c:4555:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
  ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
  ALSA lib conf.c:4555:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
  ALSA lib confmisc.c:1246:(snd_func_refer) error evaluating name
  ALSA lib conf.c:4555:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
  ALSA lib conf.c:5034:(snd_config_expand) Evaluate error: No such file or directory
  ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM default
  alsa: Could not initialize DAC
  alsa: Failed to open `default':
  alsa: Reason: No such file or directory
  init fail
  audio: Failed to create voice `pcspk'
  qemu-system-x86_64: Initialization of device isa-pcspk failed: Initializing audio voice failed

  
  git bisect blames this change:

  
commit 6a48541873f14b597630283f8f5397674ad82ea9 (HEAD, refs/bisect/bad)
Author: Gerd Hoffmann 
Date:   Thu Jan 24 12:20:55 2019 +0100

  audio: probe audio drivers by default
  
  Add the drivers listed in audio_possible_drivers to audio_drv_list,
  using the try-* variants.  That way the probable drivers are compiled by
  default if possible.
  
  Additional tweaks:
linux: reorder to: pa alsa sdl oss.
*bsd: drop pa.
  
  Signed-off-by: Gerd Hoffmann 
  Message-id: 20190124112055.547-7-kra...@redhat.com

  
  This changed our probe order:

 Linux)
-  audio_drv_list="oss"
+  audio_drv_list="try-pa try-alsa try-sdl oss"

  After some debugging I can see that 'audio_init' successfully
  initializes the alsa driver.

  When the pcspk device goes to AUD_open_out, though, the alsa driver
  fails, spewing the above text to stderr, and thus causes QEMU to fail.

  This looks very much like the ALSA driver in QEMU is broken:
  audio_init() should not have succeeded unless the ALSA driver knew it
  could later successfully honour AUD_open_out.
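A minimal sketch of the behaviour the report asks for: probe drivers in order, but accept one only if it can also open an output voice, and fall back to the next driver otherwise. The AudioDriver type, audio_probe() and the stub drivers (including a "none" entry standing in for QEMU_AUDIO_DRV=none) are hypothetical and do not reflect QEMU's actual audio code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver descriptor; not QEMU's audio driver struct. */
typedef struct {
    const char *name;
    bool (*init)(void);      /* global driver initialization */
    bool (*open_out)(void);  /* try to open a default output voice */
} AudioDriver;

/*
 * Probe drivers in order and accept the first one whose init succeeds
 * AND which can open an output voice, so that init never "succeeds"
 * for a driver whose later open is bound to fail.
 */
static const AudioDriver *audio_probe(const AudioDriver *drv, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (drv[i].init() && drv[i].open_out()) {
            return &drv[i];
        }
    }
    return NULL;
}

/* Stubs simulating a host with no sound card. */
static bool stub_ok(void)   { return true; }
static bool stub_fail(void) { return false; }

static const AudioDriver no_card_host[] = {
    { "pa",   stub_fail, stub_fail },  /* no PulseAudio server */
    { "alsa", stub_ok,   stub_fail },  /* init works, "default" PCM missing */
    { "none", stub_ok,   stub_ok   },  /* hypothetical null output */
};
```

Here the ALSA entry reproduces the reported failure mode (init succeeds, open fails), and the probe moves on instead of aborting startup.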

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1816052/+subscriptions

Re: [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:15PM +0100, Cédric Le Goater wrote:
> When the VM is stopped, the VM state handler stabilizes the XIVE IC
> and marks the EQ pages dirty. These are then transferred to the
> destination before the transfer of the device vmstates starts.
> 
> The sPAPRXive interrupt controller model captures the XIVE internal
> tables, EAT and ENDT, and the XiveTCTX model does the same for the
> thread interrupt context registers.
> 
> At restart, the sPAPRXive 'post_load' method restores all the XIVE
> states. It is called by the sPAPR machine 'post_load' method, when all
> XIVE states have been transferred and loaded.
> 
> Finally, the source states are restored in the VM change state handler
> when the machine reaches the running state.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  include/hw/ppc/spapr_xive.h |  3 ++
>  include/hw/ppc/xive.h   |  1 +
>  hw/intc/spapr_xive.c| 24 ++
>  hw/intc/spapr_xive_kvm.c| 93 -
>  hw/intc/xive.c  | 17 +++
>  hw/ppc/spapr_irq.c  |  2 +-
>  6 files changed, 138 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 298d204d54ef..22d70650b51f 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -55,6 +55,7 @@ typedef struct sPAPRXive {
>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
>  bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +int spapr_xive_post_load(sPAPRXive *xive, int version_id);
>  
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> @@ -83,5 +84,7 @@ void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>   uint32_t end_idx, XiveEND *end,
>   Error **errp);
>  void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
> +int kvmppc_xive_pre_save(sPAPRXive *xive);
> +int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index f3766fd881a2..3b1baa783975 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -432,5 +432,6 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
>  void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
> +void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp);
>  
>  #endif /* PPC_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 9f07567f4d78..21fe5e1aa39f 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -469,10 +469,34 @@ static const VMStateDescription vmstate_spapr_xive_eas = {
>  },
>  };
>  
> +static int vmstate_spapr_xive_pre_save(void *opaque)
> +{
> +if (kvm_irqchip_in_kernel()) {
> +return kvmppc_xive_pre_save(SPAPR_XIVE(opaque));
> +}
> +
> +return 0;
> +}
> +
> +/*
> + * Called by the sPAPR IRQ backend 'post_load' method at the machine
> + * level.
> + */
> +int spapr_xive_post_load(sPAPRXive *xive, int version_id)
> +{
> +if (kvm_irqchip_in_kernel()) {
> +return kvmppc_xive_post_load(xive, version_id);
> +}
> +
> +return 0;
> +}
> +
>  static const VMStateDescription vmstate_spapr_xive = {
>  .name = TYPE_SPAPR_XIVE,
>  .version_id = 1,
>  .minimum_version_id = 1,
> +.pre_save = vmstate_spapr_xive_pre_save,
> +.post_load = NULL, /* handled at the machine level */
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>  VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 44d80175b1b5..119fd59fc9ae 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -15,6 +15,7 @@
>  #include "sysemu/cpus.h"
>  #include "sysemu/kvm.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/ppc/spapr_xive.h"
>  #include "hw/ppc/xive.h"
>  #include "kvm_ppc.h"
> @@ -60,7 +61,30 @@ static void kvm_cpu_enable(CPUState *cs)
>  /*
>   * XIVE Thread Interrupt Management context (KVM)
>   */
> -static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
> +
> +static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
> +{
> +uint64_t state[4];
> +int ret;
> +
> +/* word0 and word1 of the OS ring. */
> +state[0] = *((uint64_t *) &tctx->regs[TM_QW1_OS]);
> +
> +/*
> + * OS CAM line. Used by KVM to print out the VP identifier. This
> + * is for debug only.
> + */
> +state[1] = *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]);
> +
> +ret

Re: [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:13PM +0100, Cédric Le Goater wrote:
> This extends the KVM XIVE device backend with 'synchronize_state'
> methods used to retrieve the state from KVM. The HW state of the
> sources, the KVM device and the thread interrupt contexts is
> collected for monitor usage and also for migration.
> 
> These get operations rely on their KVM counterpart in the host kernel
> which acts as a proxy for OPAL, the host firmware. The set operations
> will be added for migration support later.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/ppc/spapr_xive.h |  8 
>  include/hw/ppc/xive.h   |  1 +
>  hw/intc/spapr_xive.c| 17 ---
>  hw/intc/spapr_xive_kvm.c| 89 +
>  hw/intc/xive.c  | 10 +
>  5 files changed, 118 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 749c6cbc2c56..ebd65e7fe36b 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -44,6 +44,13 @@ typedef struct sPAPRXive {
>  void  *tm_mmap;
>  } sPAPRXive;
>  
> +/*
> + * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
> + * to the controller block id value. It can nevertheless be changed
> + * for testing purpose.
> + */
> +#define SPAPR_XIVE_BLOCK_ID 0x0
> +
>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
>  bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> @@ -74,5 +81,6 @@ void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
>  void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>   uint32_t end_idx, XiveEND *end,
>   Error **errp);
> +void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 061d43fea24d..f3766fd881a2 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -431,5 +431,6 @@ void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
>  void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
> +void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
>  
>  #endif /* PPC_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 3db24391e31c..9f07567f4d78 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -40,13 +40,6 @@
>  
>  #define SPAPR_XIVE_NVT_BASE 0x400
>  
> -/*
> - * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
> - * to the controller block id value. It can nevertheless be changed
> - * for testing purpose.
> - */
> -#define SPAPR_XIVE_BLOCK_ID 0x0
> -
>  /*
>   * sPAPR NVT and END indexing helpers
>   */
> @@ -153,6 +146,16 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>  XiveSource *xsrc = &xive->source;
>  int i;
>  
> +if (kvm_irqchip_in_kernel()) {
> +Error *local_err = NULL;
> +
> +kvmppc_xive_synchronize_state(xive, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return;
> +}
> +}
> +
>  monitor_printf(mon, "  LSIN PQEISN CPU/PRIO EQ\n");
>  
>  for (i = 0; i < xive->nr_irqs; i++) {
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 6b50451b4f85..4b1ffb9835f9 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -60,6 +60,57 @@ static void kvm_cpu_enable(CPUState *cs)
>  /*
>   * XIVE Thread Interrupt Management context (KVM)
>   */
> +static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
> +{
> +uint64_t state[4] = { 0 };
> +int ret;
> +
> +ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
> +if (ret != 0) {
> +error_setg_errno(errp, errno,
> + "XIVE: could not capture KVM state of CPU %ld",
> + kvm_arch_vcpu_id(tctx->cs));
> +return;
> +}
> +
> +/* word0 and word1 of the OS ring. */
> +*((uint64_t *) &tctx->regs[TM_QW1_OS]) = state[0];
> +
> +/*
> + * KVM also returns word2 containing the OS CAM line which is
> + * interesting to print out in the QEMU monitor.
> + */
> +*((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]) = state[1];

As mentioned elsewhere, it is interesting for debugging, but doesn't
seem to really match the guest visible CAM state, so I'm not convinced
it's a good idea to put it into the regs[] structure.

> +}
> +
> +typedef struct {
> +XiveTCTX *tctx;
> +Error *err;
> +} XiveCpuGetState;
> +
> +static void kvmppc_xive_cpu_do_synchronize_state(CPUState *cpu,
> +

Re: [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:14PM +0100, Cédric Le Goater wrote:
> This handler is in charge of stabilizing the flow of event notifications
> in the XIVE controller before migrating a guest. This is a requirement
> before transferring the guest EQ pages to a destination.
> 
> When the VM is stopped, the handler masks the sources (PQ=01) to stop
> the flow of events and saves their previous state. The XIVE controller
> is then synced through KVM to flush any in-flight event notifications
> and to stabilize the EQs. At this stage, the EQ pages are marked dirty
> to make sure the EQ pages are transferred if a migration sequence is
> in progress.
> 
> The previous configuration of the sources is restored when the VM
> resumes, after a migration or a stop.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 
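The mask/restore sequence in the handler can be modelled in isolation. A load from one of the SET_PQ_xx ESB pages returns the previous PQ bits and installs new ones, so masking is a load from SET_PQ_01 (saving what it returns) and restoring is a load from SET_PQ_00 + (pq << 8), which should return 01. This toy model replaces the real MMIO with a struct; ToySource and the toy_* functions are hypothetical, and the 0xc00/0xd00 offsets are assumed to match QEMU's include/hw/ppc/xive.h.

```c
#include <assert.h>
#include <stdint.h>

/* ESB special-load offsets, assumed to match QEMU's xive.h. */
#define XIVE_ESB_SET_PQ_00 0xc00
#define XIVE_ESB_SET_PQ_01 0xd00

/* PQ=01 means the source is masked ("off"). */
#define XIVE_ESB_OFF 0x1

/* Toy source holding only its two PQ bits. */
typedef struct {
    uint8_t pq;
} ToySource;

/*
 * Model of a load from a SET_PQ_xx page: return the previous PQ and
 * install the new PQ encoded in bits 8-9 of the offset.
 */
static uint8_t toy_esb_load(ToySource *s, uint32_t offset)
{
    uint8_t old = s->pq;

    s->pq = (offset >> 8) & 0x3;
    return old;
}

/* VM stop: mask the source and return the PQ to save. */
static uint8_t toy_mask(ToySource *s)
{
    return toy_esb_load(s, XIVE_ESB_SET_PQ_01);
}

/* VM resume: restore the saved PQ; the source should still be masked. */
static int toy_restore(ToySource *s, uint8_t saved_pq)
{
    uint8_t prev = toy_esb_load(s, XIVE_ESB_SET_PQ_00 + (saved_pq << 8));

    return prev == XIVE_ESB_OFF ? 0 : -1;
}
```

The -1 path corresponds to the "IRQ %d has an invalid state" report in the handler: something changed the PQ bits while the VM was stopped.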

> ---
>  include/hw/ppc/spapr_xive.h |  1 +
>  hw/intc/spapr_xive_kvm.c| 77 -
>  2 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index ebd65e7fe36b..298d204d54ef 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -42,6 +42,7 @@ typedef struct sPAPRXive {
>  /* KVM support */
>  int   fd;
>  void  *tm_mmap;
> +VMChangeStateEntry *change;
>  } sPAPRXive;
>  
>  /*
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 4b1ffb9835f9..44d80175b1b5 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -419,9 +419,81 @@ static void kvmppc_xive_get_queues(sPAPRXive *xive, Error **errp)
>  }
>  }
>  
> +/*
> + * The primary goal of the XIVE VM change handler is to mark the EQ
> + * pages dirty when all XIVE event notifications have stopped.
> + *
> + * Whenever the VM is stopped, the VM change handler masks the sources
> + * (PQ=01) to stop the flow of events and saves the previous state in
> + * anticipation of a migration. The XIVE controller is then synced
> + * through KVM to flush any in-flight event notification and stabilize
> + * the EQs.
> + *
> + * At this stage, we can mark the EQ page dirty and let a migration
> + * sequence transfer the EQ pages to the destination, which is done
> + * just after the stop state.
> + *
> + * The previous configuration of the sources is restored when the VM
> + * runs again.
> + */
> +static void kvmppc_xive_change_state_handler(void *opaque, int running,
> + RunState state)
> +{
> +sPAPRXive *xive = opaque;
> +XiveSource *xsrc = &xive->source;
> +Error *local_err = NULL;
> +int i;
> +
> +/*
> + * Restore the sources to their initial state. This is called when
> + * the VM resumes after a stop or a migration.
> + */
> +if (running) {
> +for (i = 0; i < xsrc->nr_irqs; i++) {
> +uint8_t pq = xive_source_esb_get(xsrc, i);
> +if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
> +error_report("XIVE: IRQ %d has an invalid state", i);
> +}
> +}
> +
> +return;
> +}
> +
> +/*
> + * Mask the sources, to stop the flow of event notifications, and
> + * save the PQs locally in the XiveSource object. The XiveSource
> + * state will be collected later on by its vmstate handler if a
> + * migration is in progress.
> + */
> +for (i = 0; i < xsrc->nr_irqs; i++) {
> +uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
> +xive_source_esb_set(xsrc, i, pq);
> +}
> +
> +/*
> + * Sync the XIVE controller in KVM, to flush in-flight event
> + * notification that should be enqueued in the EQs and mark the
> + * XIVE EQ pages dirty to collect all updates.
> + */
> +kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL,
> +  KVM_DEV_XIVE_EQ_SYNC, NULL, true, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return;
> +}
> +}
> +
>  void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
>  {
> -kvmppc_xive_source_get_state(&xive->source);
> +/*
> + * When the VM is stopped, the sources are masked and the previous
> + * state is saved in anticipation of a migration. We should not
> + * synchronize the source state in that case else we will override
> + * the saved state.
> + */
> +if (runstate_is_running()) {
> +kvmppc_xive_source_get_state(&xive->source);
> +}
>  
>  /* EAT: there is no extra state to query from KVM */
>  
> @@ -501,6 +573,9 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>"xive.tima", tima_len, xive->tm_mmap);
>  sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
>  
> +xive->change = qemu_add_vm_change_state_handler(
> +kvmppc_xive_change_state_handler, xive);
> +
>  kvm_kernel_irqchip = true;
>  kvm_msi_via_irqfd_allowed

Re: [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:19PM +0100, Cédric Le Goater wrote:
> If a new interrupt mode is chosen by CAS, the machine generates a
> reset to reconfigure. At this point, the connection with the previous
> KVM device needs to be closed and a new connection needs to be opened
> with the KVM device operating the chosen interrupt mode.
> 
> New routines are introduced to destroy the XICS and the XIVE KVM
> devices. They make use of a new KVM device ioctl which destroys the
> device and also disconnects the IRQ presenters from the vCPUs.
> 
> Signed-off-by: Cédric Le Goater 

Ugly, but necessary

Reviewed-by: David Gibson 
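The new kvm_cpu_disable_all() and kvm_disable_icps() helpers free each entry while walking the list, which is why they use QLIST_FOREACH_SAFE: the next pointer has to be read before the current entry is freed. The same pattern in plain C, with a hypothetical Enabled list standing in for KVMEnabledCPU/KVMEnabledICP:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a KVMEnabledCPU / KVMEnabledICP entry. */
typedef struct Enabled {
    unsigned long vcpu_id;
    struct Enabled *next;
} Enabled;

static Enabled *enabled_head;

static void enabled_add(unsigned long vcpu_id)
{
    Enabled *e = malloc(sizeof(*e));

    if (!e) {
        abort();
    }
    e->vcpu_id = vcpu_id;
    e->next = enabled_head;
    enabled_head = e;
}

/*
 * Free every entry. 'next' is saved before free(); reading e->next
 * after free(e), as a non-SAFE walk would, is a use-after-free.
 */
static void enabled_disable_all(void)
{
    Enabled *e = enabled_head, *next;

    while (e) {
        next = e->next;
        free(e);
        e = next;
    }
    enabled_head = NULL;
}

static int enabled_count(void)
{
    int n = 0;

    for (Enabled *e = enabled_head; e; e = e->next) {
        n++;
    }
    return n;
}
```

Resetting the head also lets a later reconnect (after the next CAS-driven reset) start from an empty list.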

> ---
>  include/hw/ppc/spapr_xive.h |  1 +
>  include/hw/ppc/xics_spapr.h |  1 +
>  hw/intc/spapr_xive_kvm.c| 60 +
>  hw/intc/xics_kvm.c  | 56 ++
>  4 files changed, 118 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 22d70650b51f..a7c4c275a747 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -71,6 +71,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
>   * KVM XIVE device helpers
>   */
>  void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
> +void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp);
>  void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
>  void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
> Error **errp);
> diff --git a/include/hw/ppc/xics_spapr.h b/include/hw/ppc/xics_spapr.h
> index b8d924baf437..bddf09821cb0 100644
> --- a/include/hw/ppc/xics_spapr.h
> +++ b/include/hw/ppc/xics_spapr.h
> @@ -34,6 +34,7 @@
>  void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> uint32_t phandle);
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
> +int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp);
>  void xics_spapr_init(sPAPRMachineState *spapr);
>  
>  #endif /* XICS_SPAPR_H */
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 119fd59fc9ae..e31035c90260 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -58,6 +58,16 @@ static void kvm_cpu_enable(CPUState *cs)
>  QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
>  }
>  
> +static void kvm_cpu_disable_all(void)
> +{
> +KVMEnabledCPU *enabled_cpu, *next;
> +
> +QLIST_FOREACH_SAFE(enabled_cpu, &kvm_enabled_cpus, node, next) {
> +QLIST_REMOVE(enabled_cpu, node);
> +g_free(enabled_cpu);
> +}
> +}
> +
>  /*
>   * XIVE Thread Interrupt Management context (KVM)
>   */
> @@ -674,3 +684,53 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>  /* Map all regions */
>  spapr_xive_map_mmio(xive);
>  }
> +
> +void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp)
> +{
> +XiveSource *xsrc;
> +struct kvm_destroy_device xive_destroy_device;
> +size_t esb_len;
> +int rc;
> +
> +/* The KVM XIVE device is not in use */
> +if (!xive || xive->fd == -1) {
> +return;
> +}
> +
> +if (!kvmppc_has_cap_xive()) {
> +error_setg(errp, "IRQ_XIVE capability must be present for KVM");
> +return;
> +}
> +
> +/* Clear the KVM mapping */
> +xsrc = &xive->source;
> +esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
> +
> +sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 0);
> +munmap(xsrc->esb_mmap, esb_len);
> +
> +sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 1);
> +
> +sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 2);
> +munmap(xive->tm_mmap, 4ull << TM_SHIFT);
> +
> +/* Destroy the KVM device. This also clears the VCPU presenters */
> +xive_destroy_device.fd = xive->fd;
> +xive_destroy_device.flags = 0;
> +rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xive_destroy_device);
> +if (rc < 0) {
> +error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XIVE");
> +}
> +close(xive->fd);
> +xive->fd = -1;
> +
> +kvm_kernel_irqchip = false;
> +kvm_msi_via_irqfd_allowed = false;
> +kvm_gsi_direct_mapping = false;
> +
> +/* Clear the local list of presenter (hotplug) */
> +kvm_cpu_disable_all();
> +
> +/* VM Change state handler is not needed anymore */
> +qemu_del_vm_change_state_handler(xive->change);
> +}
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index c6e1b630a404..373de3155f6b 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -51,6 +51,16 @@ typedef struct KVMEnabledICP {
>  static QLIST_HEAD(, KVMEnabledICP)
>  kvm_enabled_icps = QLIST_HEAD_INITIALIZER(&kvm_enabled_icps);
>  
> +static void kvm_disable_icps(void)
> +{
> +KVMEnabledICP *enabled_icp, *next;
> +
> +QLIST_FOREACH_SAFE(enabled_icp, &kvm_enabled_icps, node, next) {
> +QLIST_REMOVE(enabled_icp, node);
> +g_free(enabled_icp);
> +}
> +}
> +
>  /*
>   * ICP-KVM
>   */
> @@ -360,3 +370,49

Re: [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:11PM +0100, Cédric Le Goater wrote:
> XIVE hcalls are all redirected to QEMU as none are on a fast path.
> When necessary, QEMU invokes KVM through specific ioctls to perform
> host operations. QEMU should have done the necessary checks before
> calling KVM and, in case of failure, H_HARDWARE is simply returned.
> 
> H_INT_ESB is a special case that could have been handled under KVM,
> but the performance impact of handling it in QEMU was low. Here are
> some figures:
> 
> kernel irqchip      OFF     ON      ON
> H_INT_ESB                   KVM     QEMU
> 
> rtl8139 (LSI)       1.19    1.24    1.23    Gbits/sec
> virtio              31.80   42.30   --      Gbits/sec
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/ppc/spapr_xive.h |  15 +++
>  hw/intc/spapr_xive.c|  87 +++--
>  hw/intc/spapr_xive_kvm.c| 184 
>  3 files changed, 278 insertions(+), 8 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index ab6732b14a02..749c6cbc2c56 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -55,9 +55,24 @@ void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
>  void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
>  void spapr_xive_map_mmio(sPAPRXive *xive);
>  
> +int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
> + uint32_t *out_server, uint8_t *out_prio);
> +
>  /*
>   * KVM XIVE device helpers
>   */
>  void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
> +void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
> +void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
> +   Error **errp);
> +void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp);
> +uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
> +uint64_t data, bool write);
> +void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
> + uint32_t end_idx, XiveEND *end,
> + Error **errp);
> +void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
> + uint32_t end_idx, XiveEND *end,
> + Error **errp);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index c24d649e3668..3db24391e31c 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -86,6 +86,19 @@ static int spapr_xive_target_to_nvt(uint32_t target,
>   * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
>   * priorities per CPU
>   */
> +int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
> + uint32_t *out_server, uint8_t *out_prio)
> +{

Since you don't support irq blocks as yet, should this error out
rather than ignoring if end_blk != 0?
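A minimal sketch of the mapping described in the comment above (END index = vcpu_id * 8 + priority), with the block check suggested here. The function names and the -1 error return are hypothetical and not QEMU API; a real version would presumably report through an Error ** or an H_* return code.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not QEMU API. sPAPR END indexing packs
 * 8 priorities per CPU: end_idx = (server << 3) | prio.
 */
uint32_t sketch_target_to_end(uint32_t server, uint8_t prio)
{
    return (server << 3) | (prio & 0x7);
}

/* Returns 0 on success, -1 if the END block is not block 0. */
int sketch_end_to_target(uint8_t end_blk, uint32_t end_idx,
                         uint32_t *out_server, uint8_t *out_prio)
{
    if (end_blk != 0) {
        return -1;              /* IRQ blocks are not supported yet */
    }
    if (out_server) {
        *out_server = end_idx >> 3;
    }
    if (out_prio) {
        *out_prio = end_idx & 0x7;
    }
    return 0;
}
```

For example, server 5 at priority 6 maps to END index 46, and decoding 46 in block 0 gives back server 5 and priority 6, while any non-zero block is rejected.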

> +if (out_server) {
> +*out_server = end_idx >> 3;
> +}
> +
> +if (out_prio) {
> +*out_prio = end_idx & 0x7;
> +}
> +return 0;
> +}
> +
>  static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
>uint8_t *out_end_blk, uint32_t *out_end_idx)
>  {
> @@ -792,6 +805,16 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>  new_eas.w = xive_set_field64(EAS_END_DATA, new_eas.w, eisn);
>  }
>  
> +if (kvm_irqchip_in_kernel()) {
> +Error *local_err = NULL;
> +
> +kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return H_HARDWARE;
> +}
> +}
> +
>  out:
>  xive->eat[lisn] = new_eas;
>  return H_SUCCESS;
> @@ -1097,6 +1120,16 @@ static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
>   */
>  
>  out:
> +if (kvm_irqchip_in_kernel()) {
> +Error *local_err = NULL;
> +
> +kvmppc_xive_set_queue_config(xive, end_blk, end_idx, &end, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return H_HARDWARE;
> +}
> +}
> +
>  /* Update END */
>  memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
>  return H_SUCCESS;
> @@ -1189,6 +1222,16 @@ static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
>  args[2] = 0;
>  }
>  
> +if (kvm_irqchip_in_kernel()) {
> +Error *local_err = NULL;
> +
> +kvmppc_xive_get_queue_config(xive, end_blk, end_idx, end, &local_err);
> +if (local_err) {
> +error_report_err(local_err);
> +return H_HARDWARE;
> +}
> +}
> +
>  /* TODO: do we need any locking on the END ? */
>  if (flags & SPAPR_XIVE_END_DEBUG) {
>  /* Load the event queue generation number into the return flags */
> @@ -1341,15 +1384,20 @@ static target_ulong

Re: [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:16PM +0100, Cédric Le Goater wrote:
> When the thread interrupt management state is retrieved from the KVM
> VCPU, word2 is saved under the QEMU XIVE thread context to print out
> the OS CAM line under the QEMU monitor.
> 
> This breaks the migration of a TCG guest (and with KVM when
> kernel_irqchip=off) because the matching algorithm of the presenter
> relies on the OS CAM value. Fix with an extra reset of the thread
> contexts to restore the expected value.
> 
> Signed-off-by: Cédric Le Goater 

As noted elsewhere, I'm not sure this is the right approach to fixing
this.  In any case this can be folded into the previous patch.

> ---
>  hw/ppc/spapr_irq.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 12ecca6264f3..3176098b9f7c 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -356,7 +356,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>  
>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  {
> -return spapr_xive_post_load(spapr->xive, version_id);
> +CPUState *cs;
> +int ret;
> +
> +ret = spapr_xive_post_load(spapr->xive, version_id);
> +if (ret) {
> +return ret;
> +}
> +
> +/*
> + * When the states are collected from the KVM XIVE device, word2
> + * of the XiveTCTX is set to print out the OS CAM line under the
> + * QEMU monitor.
> + *
> + * This breaks the migration on a TCG guest (or on KVM with
> + * kernel_irqchip=off) because the matching algorithm of the
> + * presenter relies on the OS CAM value. Fix with an extra reset
> + * of the thread contexts to restore the expected value.
> + */
> +CPU_FOREACH(cs) {
> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +/* (TCG) Set the OS CAM line of the thread interrupt context. */
> +spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
> +}
> +return 0;
>  }
>  
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you.  NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of the KVM IRQ device

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:20PM +0100, Cédric Le Goater wrote:
> The activation of the KVM IRQ device depends on the interrupt mode
> chosen at CAS time by the machine and some methods used at reset or by
> the migration need to be protected.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 
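The pattern in this patch can be sketched in isolation: state accessors that may run at reset or migration time, whatever interrupt mode CAS selected, return early when no KVM device fd is open, while the IRQ injection path, which should only be reached in KVM mode, asserts. XiveSketch and the sketch_* functions are hypothetical stand-ins for sPAPRXive and the kvmppc_xive_* helpers, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for sPAPRXive: only the KVM device fd matters here. */
typedef struct {
    int fd;            /* -1 while the KVM XIVE device is not in use */
    int saved_state;   /* placeholder for state fetched from KVM */
    int last_srcno;    /* placeholder for the last injected source */
    int last_level;
} XiveSketch;

static bool sketch_kvm_in_use(const XiveSketch *x)
{
    return x->fd != -1;
}

/* Migration/reset path: a no-op when the emulated controller is active. */
int sketch_pre_save(XiveSketch *x)
{
    if (!sketch_kvm_in_use(x)) {
        return 0;
    }
    x->saved_state = 42;   /* would be KVM ioctls in the real code */
    return 0;
}

/* Injection path: reaching it without the KVM device is a bug. */
void sketch_set_irq(XiveSketch *x, int srcno, int val)
{
    assert(sketch_kvm_in_use(x));
    x->last_srcno = srcno;
    x->last_level = val;
}
```

The split mirrors the patch: an early return where being called in emulated mode is legitimate, an assert where it would indicate a logic error.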

> ---
>  hw/intc/spapr_xive_kvm.c | 28 
>  hw/intc/xics_kvm.c   | 26 +-
>  2 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index e31035c90260..cd81cdb23a5e 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -96,9 +96,15 @@ static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
>  
>  void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
>  {
> +sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
>  uint64_t state[4] = { 0 };
>  int ret;
>  
> +/* The KVM XIVE device is not in use */
> +if (xive->fd == -1) {
> +return;
> +}
> +
>  ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
>  if (ret != 0) {
>  error_setg_errno(errp, errno,
> @@ -152,6 +158,11 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>  unsigned long vcpu_id;
>  int ret;
>  
> +/* The KVM XIVE device is not in use */
> +if (xive->fd == -1) {
> +return;
> +}
> +
>  /* Check if CPU was hot unplugged and replugged. */
>  if (kvm_cpu_is_enabled(tctx->cs)) {
>  return;
> @@ -330,9 +341,13 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>  {
>  XiveSource *xsrc = opaque;
> +sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
>  struct kvm_irq_level args;
>  int rc;
>  
> +/* The KVM XIVE device should be in use */
> +assert(xive->fd != -1);
> +
>  args.irq = srcno;
>  if (!xive_source_irq_is_lsi(xsrc, srcno)) {
>  if (!val) {
> @@ -519,6 +534,11 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>  
>  void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
>  {
> +/* The KVM XIVE device is not in use */
> +if (xive->fd == -1) {
> +return;
> +}
> +
>  /*
>   * When the VM is stopped, the sources are masked and the previous
>   * state is saved in anticipation of a migration. We should not
> @@ -544,6 +564,11 @@ int kvmppc_xive_pre_save(sPAPRXive *xive)
>  {
>  Error *local_err = NULL;
>  
> +/* The KVM XIVE device is not in use */
> +if (xive->fd == -1) {
> +return 0;
> +}
> +
>  /* EAT: there is no extra state to query from KVM */
>  
>  /* ENDT */
> @@ -568,6 +593,9 @@ int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
>  CPUState *cs;
>  int i;
>  
> +/* The KVM XIVE device should be in use */
> +assert(xive->fd != -1);
> +
>  /* Restore the ENDT first. The targetting depends on it. */
>  for (i = 0; i < xive->nr_ends; i++) {
>  kvmppc_xive_set_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 373de3155f6b..9855316e4831 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -69,6 +69,11 @@ void icp_get_kvm_state(ICPState *icp)
>  uint64_t state;
>  int ret;
>  
> +/* The KVM XICS device is not in use */
> +if (kernel_xics_fd == -1) {
> +return;
> +}
> +
>  /* ICP for this CPU thread is not in use, exiting */
>  if (!icp->cs) {
>  return;
> @@ -105,6 +110,11 @@ int icp_set_kvm_state(ICPState *icp)
>  uint64_t state;
>  int ret;
>  
> +/* The KVM XICS device is not in use */
> +if (kernel_xics_fd == -1) {
> +return 0;
> +}
> +
>  /* ICP for this CPU thread is not in use, exiting */
>  if (!icp->cs) {
>  return 0;
> @@ -133,8 +143,9 @@ void icp_kvm_realize(DeviceState *dev, Error **errp)
>  unsigned long vcpu_id;
>  int ret;
>  
> +/* The KVM XICS device is not in use */
>  if (kernel_xics_fd == -1) {
> -abort();
> +return;
>  }
>  
>  cs = icp->cs;
> @@ -170,6 +181,11 @@ void ics_get_kvm_state(ICSState *ics)
>  uint64_t state;
>  int i;
>  
> +/* The KVM XICS device is not in use */
> +if (kernel_xics_fd == -1) {
> +return;
> +}
> +
>  for (i = 0; i < ics->nr_irqs; i++) {
>  ICSIRQState *irq = &ics->irqs[i];
>  
> @@ -269,6 +285,11 @@ int ics_set_kvm_state(ICSState *ics)
>  {
>  int i;
>  
> +/* The KVM XICS device is not in use */
> +if (kernel_xics_fd == -1) {
> +return 0;
> +}
> +
>  for (i = 0; i < ics->nr_irqs; i++) {
>  int ret;
>  
> @@ -286,6 +307,9 @@ void ics_kvm_set_irq(ICSState *ics, int srcno, int val)
>  struct kvm_irq_level args;
>  int rc;
>  
> +/* The KVM XICS device should be in use */
> +

Re: [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support

2019-02-25 Thread David Gibson

On Tue, Feb 26, 2019 at 10:49:27AM +1100, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:12PM +0100, Cédric Le Goater wrote:
> > All is in place for KVM now. State synchronization and migration will
> > come next.
> 
> As with the kernel side capability, this should be moved later in the
> series to avoid breaking bisections.

Apart from that,

Reviewed-by: David Gibson 

> 
> > 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  hw/ppc/spapr_irq.c | 9 -
> >  1 file changed, 9 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > index 6e1c36dc62ca..1ad57582a403 100644
> > --- a/hw/ppc/spapr_irq.c
> > +++ b/hw/ppc/spapr_irq.c
> > @@ -263,19 +263,10 @@ sPAPRIrq spapr_irq_xics = {
> >  static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
> >  Error **errp)
> >  {
> > -MachineState *machine = MACHINE(spapr);
> >  uint32_t nr_servers = spapr_max_server_number(spapr);
> >  DeviceState *dev;
> >  int i;
> >  
> > -/* KVM XIVE device not yet available */
> > -if (kvm_enabled()) {
> > -if (machine_kernel_irqchip_required(machine)) {
> > -error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
> > -return;
> > -}
> > -}
> > -
> >  dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
> >  qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
> >  /*
> 




Re: [Qemu-devel] [PATCH v2] spapr-rtas: add ibm, get-vpd RTAS interface

2019-02-25 Thread David Gibson

On Mon, Feb 25, 2019 at 01:23:25PM -0300, Maxiwell S. Garcia wrote:
> This adds a handler for ibm,get-vpd RTAS calls, allowing pseries
> guest to collect host information. It is disabled by default to
> avoid unwanted information leakage. To enable it, use:
> ‘-M pseries,vpd-export=on’
> 
> Only the SE and TM keywords are returned at the moment:
> SE for Machine or Cabinet Serial Number and
> TM for Machine Type and Model.
> 
> Powerpc-utils tools can dispatch RTAS calls to retrieve host
> information using this ibm,get-vpd interface. The 'host-serial'
> and 'host-model' nodes of device-tree hold the same information but
> in a static manner, which is useless after a migration operation.
> 
> Signed-off-by: Maxiwell S. Garcia 

Applied, thanks.

> ---
>  hw/ppc/spapr.c | 21 ++
>  hw/ppc/spapr_rtas.c| 95 ++
>  include/hw/ppc/spapr.h | 17 +++-
>  3 files changed, 132 insertions(+), 1 deletion(-)
> 
> Update v2:
> - rtas_ibm_get_vpd(): fix initialization values
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index abf9ebce59..09fd9e2ebb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3026,6 +3026,20 @@ static char *spapr_get_fw_dev_path(FWPathProvider *p, BusState *bus,
>  return NULL;
>  }
>  
> +static bool spapr_get_vpd_export(Object *obj, Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +return spapr->vpd_export;
> +}
> +
> +static void spapr_set_vpd_export(Object *obj, bool value, Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> +
> +spapr->vpd_export = value;
> +}
> +
>  static char *spapr_get_kvm_type(Object *obj, Error **errp)
>  {
>  sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
> @@ -3150,6 +3164,7 @@ static void spapr_instance_init(Object *obj)
>  sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
>  
>  spapr->htab_fd = -1;
> +spapr->vpd_export = false;
>  spapr->use_hotplug_event_source = true;
>  object_property_add_str(obj, "kvm-type",
>  spapr_get_kvm_type, spapr_set_kvm_type, NULL);
> @@ -3182,6 +3197,12 @@ static void spapr_instance_init(Object *obj)
>  object_property_add_bool(obj, "vfio-no-msix-emulation",
>   spapr_get_msix_emulation, NULL, NULL);
>  
> +object_property_add_bool(obj, "vpd-export", spapr_get_vpd_export,
> + spapr_set_vpd_export, NULL);
> +object_property_set_description(obj, "vpd-export",
> +"Export Host's VPD information to guest",
> +&error_abort);
> +
>  /* The machine class defines the default interrupt controller mode */
>  spapr->irq = smc->irq;
>  object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index d6a0952154..fbf589c502 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -287,6 +287,99 @@ static void rtas_ibm_set_system_parameter(PowerPCCPU *cpu,
>  rtas_st(rets, 0, ret);
>  }
>  
> +static inline int vpd_st(target_ulong addr, target_ulong len,
> + const void *val, uint16_t val_len)
> +{
> +hwaddr phys = ppc64_phys_to_real(addr);
> +if (len < val_len) {
> +return RTAS_OUT_PARAM_ERROR;
> +}
> +cpu_physical_memory_write(phys, val, val_len);
> +return RTAS_OUT_SUCCESS;
> +}
> +
> +static inline void vpd_ret(target_ulong rets, const int status,
> +   const int next_seq_number, const int bytes_returned)
> +{
> +rtas_st(rets, 0, status);
> +rtas_st(rets, 1, next_seq_number);
> +rtas_st(rets, 2, bytes_returned);
> +}
> +
> +static void rtas_ibm_get_vpd(PowerPCCPU *cpu,
> + sPAPRMachineState *spapr,
> + uint32_t token, uint32_t nargs,
> + target_ulong args,
> + uint32_t nret, target_ulong rets)
> +{
> +sPAPRMachineState *sm = SPAPR_MACHINE(spapr);
> +target_ulong loc_code_addr;
> +target_ulong work_area_addr;
> +target_ulong work_area_size;
> +target_ulong seq_number;
> +unsigned char loc_code = 0;
> +unsigned int next_seq_number = 1;
> +int status = RTAS_IBM_GET_VPD_PARAMETER_ERROR;
> +int ret = RTAS_OUT_PARAM_ERROR;
> +char *vpd_field = NULL;
> +unsigned int vpd_field_len = 0;
> +
> +if (!sm->vpd_export) {
> +vpd_ret(rets, RTAS_OUT_NOT_AUTHORIZED, 1, 0);
> +return;
> +}
> +
> +/* Specific Location Code is not supported */
> +loc_code_addr = rtas_ld(args, 0);
> +cpu_physical_memory_read(loc_code_addr, &loc_code, 1);
> +if (loc_code != 0) {
> +vpd_ret(rets, RTAS_IBM_GET_VPD_PARAMETER_ERROR, 1, 0);
> +return;
> +}
> +
> +work_area_addr = rtas_ld(args, 1);
> +work_area_size = rtas_ld(args, 2);
> +seq_number = rtas_ld(args,

Re: [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support

2019-02-25 Thread David Gibson

On Fri, Feb 22, 2019 at 02:13:12PM +0100, Cédric Le Goater wrote:
> All is in place for KVM now. State synchronization and migration will
> come next.

As with the kernel side capability, this should be moved later in the
series to avoid breaking bisections.

> 
> Signed-off-by: Cédric Le Goater 
> ---
>  hw/ppc/spapr_irq.c | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 6e1c36dc62ca..1ad57582a403 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -263,19 +263,10 @@ sPAPRIrq spapr_irq_xics = {
>  static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
>  Error **errp)
>  {
> -MachineState *machine = MACHINE(spapr);
>  uint32_t nr_servers = spapr_max_server_number(spapr);
>  DeviceState *dev;
>  int i;
>  
> -/* KVM XIVE device not yet available */
> -if (kvm_enabled()) {
> -if (machine_kernel_irqchip_required(machine)) {
> -error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
> -return;
> -}
> -}
> -
>  dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
>  qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
>  /*


[Qemu-devel] [QEMU-PPC] [PATCH 2/4] target/ppc: Implement large decrementer support for TCG

2019-02-25 Thread Suraj Jitindar Singh

Prior to POWER9 the decrementer was a 32-bit register which decremented
with each tick of the timebase. From POWER9 onwards the decrementer can
be set to operate in a mode called large decrementer, where it acts as an
n-bit decrementing register which is visible as a 64-bit register; that
is, the value of the decrementer is sign extended to 64 bits (where n is
implementation dependent).
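As a worked illustration of the sign extension described above, here is a small sketch that keeps the low nr_bits of a value and replicates bit nr_bits - 1 into the upper bits. It performs the same truncate-then-extend step as __cpu_ppc_store_decr() in this patch, but builds the upper-bit mask as ~mask rather than from a shifted constant; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: keep the low nr_bits of value and copy bit
 * (nr_bits - 1) into every higher bit of the 64-bit result.
 * Precondition: 1 <= nr_bits.
 */
uint64_t sketch_decr_sign_extend(uint64_t value, unsigned nr_bits)
{
    if (nr_bits >= 64) {
        return value;                     /* already full width */
    }

    uint64_t mask = (UINT64_C(1) << nr_bits) - 1;
    uint64_t sign = UINT64_C(1) << (nr_bits - 1);

    value &= mask;                        /* truncate to nr_bits */
    if (value & sign) {
        value |= ~mask;                   /* negative: set all upper bits */
    }
    return value;
}
```

For example, with nr_bits = 32 the value 0x80000000 becomes 0xFFFFFFFF80000000, while 0x7FFFFFFF is returned unchanged.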

The mode in which the decrementer operates is controlled by the LPCR_LD
bit in the logical partition control register (LPCR).

From POWER9 onwards the HDEC (hypervisor decrementer) was enlarged to
h bits, also sign extended to 64 bits (where h is implementation
dependent). Note that this isn't configurable and is always enabled.
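The DEC and HDEC read paths differ only in what selects the visible width: LPCR_LD for DEC, the CPU generation for HDEC. A minimal sketch, assuming the 64-bit counter value has already been sign extended; the boolean parameters are hypothetical stand-ins for testing env->spr[SPR_LPCR] & LPCR_LD and for the POWER9 check that cpu_ppc_load_hdecr() performs in this patch (which returns target_ulong rather than uint64_t).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers. decr/hdecr are the counter values already
 * sign extended to 64 bits; the flags stand in for "LPCR_LD is set"
 * and "CPU is POWER9 or later".
 */
uint64_t sketch_read_dec(uint64_t decr, bool lpcr_ld)
{
    /* Large decrementer mode exposes the full sign-extended value. */
    if (lpcr_ld) {
        return decr;
    }
    /* Legacy mode: only the low 32 bits are visible. */
    return (uint32_t)decr;
}

uint64_t sketch_read_hdec(uint64_t hdecr, bool power9_or_later)
{
    /* HDEC width depends on the CPU generation, not on an LPCR bit. */
    return power9_or_later ? hdecr : (uint32_t)hdecr;
}
```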

For TCG we allow the user to configure a custom large decrementer size,
so long as it is at least 32 bits and no larger than the HDEC (the
restrictions imposed by the ISA).
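A sketch of the TCG size check, assuming 0 means "disabled" (as the cap description in patch 1/4 states) and that a width equal to the HDEC is accepted, since patch 4/4 uses pcc->hdecr_bits itself as the POWER9 default. The function name is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical check of a requested large decrementer width under TCG.
 * req_bits == 0 means the feature is disabled; hdecr_bits is the CPU's
 * HDEC width (pcc->hdecr_bits in these patches).
 */
bool sketch_large_decr_size_ok(unsigned req_bits, unsigned hdecr_bits)
{
    if (req_bits == 0) {
        return true;                 /* disabled: nothing to validate */
    }
    /* At least 32 bits, and never wider than the HDEC. */
    return req_bits >= 32 && req_bits <= hdecr_bits;
}
```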

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
---
 hw/ppc/ppc.c| 78 -
 hw/ppc/spapr.c  |  8 +
 hw/ppc/spapr_caps.c | 38 +++-
 target/ppc/cpu-qom.h|  1 +
 target/ppc/cpu.h| 11 +++---
 target/ppc/mmu-hash64.c |  2 +-
 target/ppc/translate.c  |  2 +-
 target/ppc/translate_init.inc.c |  1 +
 8 files changed, 109 insertions(+), 32 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index d1e3d4cd20..853afeed6a 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -744,10 +744,10 @@ bool ppc_decr_clear_on_delivery(CPUPPCState *env)
 return ((tb_env->flags & flags) == PPC_DECR_UNDERFLOW_TRIGGERED);
 }
 
-static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
+static inline uint64_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
 {
 ppc_tb_t *tb_env = env->tb_env;
-uint32_t decr;
+uint64_t decr;
 int64_t diff;
 
 diff = next - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
@@ -758,27 +758,42 @@ static inline uint32_t _cpu_ppc_load_decr(CPUPPCState *env, uint64_t next)
 }  else {
 decr = -muldiv64(-diff, tb_env->decr_freq, NANOSECONDS_PER_SECOND);
 }
-LOG_TB("%s: %08" PRIx32 "\n", __func__, decr);
+LOG_TB("%s: %016" PRIx64 "\n", __func__, decr);
 
 return decr;
 }
 
-uint32_t cpu_ppc_load_decr (CPUPPCState *env)
+target_ulong cpu_ppc_load_decr (CPUPPCState *env)
 {
 ppc_tb_t *tb_env = env->tb_env;
+uint64_t decr;
 
 if (kvm_enabled()) {
 return env->spr[SPR_DECR];
 }
 
-return _cpu_ppc_load_decr(env, tb_env->decr_next);
+decr = _cpu_ppc_load_decr(env, tb_env->decr_next);
+
+/*
+ * If large decrementer is enabled then the decrementer is sign extended
+ * to 64 bits, otherwise it is a 32 bit value.
+ */
+if (env->spr[SPR_LPCR] & LPCR_LD)
+return decr;
+return (uint32_t) decr;
 }
 
-uint32_t cpu_ppc_load_hdecr (CPUPPCState *env)
+target_ulong cpu_ppc_load_hdecr (CPUPPCState *env)
 {
 ppc_tb_t *tb_env = env->tb_env;
+uint64_t decr;
 
-return _cpu_ppc_load_decr(env, tb_env->hdecr_next);
+decr =  _cpu_ppc_load_decr(env, tb_env->hdecr_next);
+
+/* If POWER9 or later then hdecr is sign extended to 64 bits otherwise 32 */
+if (env->mmu_model & POWERPC_MMU_3_00)
+return decr;
+return (uint32_t) decr;
 }
 
 uint64_t cpu_ppc_load_purr (CPUPPCState *env)
@@ -832,13 +847,21 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
  QEMUTimer *timer,
  void (*raise_excp)(void *),
  void (*lower_excp)(PowerPCCPU *),
- uint32_t decr, uint32_t value)
+ target_ulong decr, target_ulong value,
+ int nr_bits)
 {
 CPUPPCState *env = &cpu->env;
 ppc_tb_t *tb_env = env->tb_env;
 uint64_t now, next;
+bool negative;
+
+/* Truncate value to decr_width and sign extend for simplicity */
+value &= ((1ULL << nr_bits) - 1);
+negative = !!(value & (1ULL << (nr_bits - 1)));
+if (negative)
+value |= (0xULL << nr_bits);
 
-LOG_TB("%s: %08" PRIx32 " => %08" PRIx32 "\n", __func__,
+LOG_TB("%s: " TARGET_FMT_lx " => " TARGET_FMT_lx "\n", __func__,
 decr, value);
 
 if (kvm_enabled()) {
@@ -860,15 +883,15 @@ static void __cpu_ppc_store_decr(PowerPCCPU *cpu, 
uint64_t *nextp,
  * an edge interrupt, so raise it here too.
  */
 if ((value < 3) ||
-((tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL) && (value & 0x80000000)) ||
-((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) && (value & 0x80000000)
-  && !(decr & 0x80000000))) {
+((tb_env->flags & PPC_DECR_UNDERFLOW_LEVEL) && negative) ||
+((tb_env->flags & PPC_DECR_UNDERFLOW_TRIGGERED) && negative
+  && !(decr & (1ULL << (nr_bits - 1))))) {
 (*raise_excp)(cpu);

[Qemu-devel] [QEMU-PPC] [PATCH 3/4] target/ppc: Implement large decrementer support for KVM

2019-02-25 Thread Suraj Jitindar Singh

Implement support to allow KVM guests to take advantage of the large
decrementer introduced on POWER9 cpus.

To determine if the host can support the requested large decrementer
size, we check that it matches the size specified in the ibm,dec-bits
device-tree property. We also need to enable it in KVM by setting the
LPCR_LD bit in the LPCR. To do this we try to set the bit and then read
it back to check whether the host allowed it. If the bit stuck, we can
use the large decrementer; if not, the host cannot support it and we
must not use the large decrementer.
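The set-then-read-back check can be sketched against a simulated register. SketchVcpu and its host_supports_ld flag are hypothetical: they model a host that silently drops the LD bit on write, which is the case the read-back is meant to catch. The LD bit is passed in as a parameter rather than hard-coding the value of LPCR_LD, and the 0 / -1 return convention follows kvmppc_enable_cap_large_decr() in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical simulated vCPU: a host without large decrementer
 * support silently clears the LD bit whenever LPCR is written.
 */
typedef struct {
    uint64_t lpcr;
    bool host_supports_ld;
} SketchVcpu;

static void sketch_write_lpcr(SketchVcpu *v, uint64_t val, uint64_t ld_bit)
{
    v->lpcr = v->host_supports_ld ? val : (val & ~ld_bit);
}

/* Returns 0 if LD ends up in the requested state, -1 if the host refused. */
int sketch_enable_large_decr(SketchVcpu *v, bool enable, uint64_t ld_bit)
{
    uint64_t lpcr = v->lpcr;                    /* read current LPCR */

    if (((lpcr & ld_bit) != 0) == enable) {
        return 0;                               /* nothing to change */
    }
    lpcr = enable ? (lpcr | ld_bit) : (lpcr & ~ld_bit);
    sketch_write_lpcr(v, lpcr, ld_bit);         /* try to update it */

    lpcr = v->lpcr;                             /* read it back */
    return (((lpcr & ld_bit) != 0) == enable) ? 0 : -1;
}
```

Comparing against the read-back value, rather than trusting the write, is what lets a refusing host be detected without any dedicated capability query.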

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
---
 hw/ppc/spapr_caps.c  | 17 +++--
 target/ppc/kvm.c | 39 +++
 target/ppc/kvm_ppc.h | 12 
 3 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 44542fdbb2..e07568fb94 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -440,8 +440,16 @@ static void cap_large_decr_apply(sPAPRMachineState *spapr,
 pcc->hdecr_bits);
 return;
 }
-} else {
-error_setg(errp, "No large decrementer support, try cap-large-decr=0");
+} else if (kvm_enabled()) {
+int kvm_nr_bits = kvmppc_get_cap_large_decr();
+
+if (!kvm_nr_bits) {
+error_setg(errp, "No large decrementer support, try cap-large-decr=0");
+} else if (val != kvm_nr_bits) {
+error_setg(errp,
+"Large decrementer size unsupported, try -cap-large-decr=%d",
+kvm_nr_bits);
+}
 }
 }
 
@@ -452,6 +460,11 @@ static void cap_large_decr_cpu_apply(sPAPRMachineState *spapr,
 CPUPPCState *env = &cpu->env;
 target_ulong lpcr = env->spr[SPR_LPCR];
 
+if (kvm_enabled()) {
+if (kvmppc_enable_cap_large_decr(cpu, !!val))
+error_setg(errp, "No large decrementer support, try cap-large-decr=0");
+}
+
 if (val)
 lpcr |= LPCR_LD;
 else
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index d01852fe31..3f650c8fc4 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -91,6 +91,7 @@ static int cap_ppc_safe_cache;
 static int cap_ppc_safe_bounds_check;
 static int cap_ppc_safe_indirect_branch;
 static int cap_ppc_nested_kvm_hv;
+static int cap_large_decr;
 
 static uint32_t debug_inst_opcode;
 
@@ -124,6 +125,7 @@ static bool kvmppc_is_pr(KVMState *ks)
 
 static int kvm_ppc_register_host_cpu_type(MachineState *ms);
 static void kvmppc_get_cpu_characteristics(KVMState *s);
+static int kvmppc_get_dec_bits(void);
 
 int kvm_arch_init(MachineState *ms, KVMState *s)
 {
@@ -151,6 +153,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
 kvmppc_get_cpu_characteristics(s);
 cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
+cap_large_decr = kvmppc_get_dec_bits();
 /*
  * Note: setting it to false because there is not such capability
  * in KVM at this moment.
@@ -1927,6 +1930,15 @@ uint64_t kvmppc_get_clockfreq(void)
 return kvmppc_read_int_cpu_dt("clock-frequency");
 }
 
+static int kvmppc_get_dec_bits(void)
+{
+int nr_bits = kvmppc_read_int_cpu_dt("ibm,dec-bits");
+
+if (nr_bits > 0)
+return nr_bits;
+return 0;
+}
+
 static int kvmppc_get_pvinfo(CPUPPCState *env, struct kvm_ppc_pvinfo *pvinfo)
  {
  PowerPCCPU *cpu = ppc_env_get_cpu(env);
@@ -2442,6 +2454,33 @@ bool kvmppc_has_cap_spapr_vfio(void)
 return cap_spapr_vfio;
 }
 
+int kvmppc_get_cap_large_decr(void)
+{
+return cap_large_decr;
+}
+
+int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable)
+{
+CPUState *cs = CPU(cpu);
+uint64_t lpcr;
+
+kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+/* Do we need to modify the LPCR? */
+if (!!(lpcr & LPCR_LD) != !!enable) {
+if (enable)
+lpcr |= LPCR_LD;
+else
+lpcr &= ~LPCR_LD;
+kvm_set_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr);
+
+if (!!(lpcr & LPCR_LD) != !!enable)
+return -1;
+}
+
+return 0;
+}
+
 PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void)
 {
 uint32_t host_pvr = mfpvr();
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index bdfaa4e70a..a79835bd14 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -64,6 +64,8 @@ int kvmppc_get_cap_safe_bounds_check(void);
 int kvmppc_get_cap_safe_indirect_branch(void);
 bool kvmppc_has_cap_nested_kvm_hv(void);
 int kvmppc_set_cap_nested_kvm_hv(int enable);
+int kvmppc_get_cap_large_decr(void);
+int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable);
 int kvmppc_enable_hwrng(void);
 int kvmppc_put_books_sregs(PowerPCCPU *cpu);
 PowerPCCPUClass *kvm_ppc_get_host_cpu_class(void);
@@ -332,6 +334,16 @@ static inline int kvmppc_set_cap_nested_kvm_hv(int enable)
 return -1;
 }
 
+static

[Qemu-devel] [QEMU-PPC] [PATCH 1/4] target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTER

2019-02-25 Thread Suraj Jitindar Singh

Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the
availability and size of the large decrementer made available to the
guest.

Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr.c |  2 ++
 hw/ppc/spapr_caps.c| 45 +
 include/hw/ppc/spapr.h |  5 -
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b6a571b6f1..acf62a2b9f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2077,6 +2077,7 @@ static const VMStateDescription vmstate_spapr = {
 &vmstate_spapr_irq_map,
 &vmstate_spapr_cap_nested_kvm_hv,
 &vmstate_spapr_dtb,
+&vmstate_spapr_cap_large_decr,
 NULL
 }
 };
@@ -4288,6 +4289,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 smc->default_caps.caps[SPAPR_CAP_IBS] = SPAPR_CAP_BROKEN;
 smc->default_caps.caps[SPAPR_CAP_HPT_MAXPAGESIZE] = 16; /* 64kiB */
 smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
+smc->default_caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 0;
 spapr_caps_add_properties(smc, &error_abort);
 smc->irq = &spapr_irq_xics;
 smc->dr_phb_enabled = true;
diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index 64f98ae68d..1545a02729 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -182,6 +182,34 @@ static void spapr_cap_set_pagesize(Object *obj, Visitor *v, const char *name,
 spapr->eff.caps[cap->index] = val;
 }
 
+static void spapr_cap_get_uint8(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+sPAPRCapabilityInfo *cap = opaque;
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+uint8_t val = spapr_get_cap(spapr, cap->index);
+
+visit_type_uint8(v, name, &val, errp);
+}
+
+static void spapr_cap_set_uint8(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+sPAPRCapabilityInfo *cap = opaque;
+sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+Error *local_err = NULL;
+uint8_t val;
+
+visit_type_uint8(v, name, &val, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+
+spapr->cmd_line_caps[cap->index] = true;
+spapr->eff.caps[cap->index] = val;
+}
+
 static void cap_htm_apply(sPAPRMachineState *spapr, uint8_t val, Error **errp)
 {
 if (!val) {
@@ -390,6 +418,13 @@ static void cap_nested_kvm_hv_apply(sPAPRMachineState *spapr,
 }
 }
 
+static void cap_large_decr_apply(sPAPRMachineState *spapr,
+ uint8_t val, Error **errp)
+{
+if (val)
+error_setg(errp, "No large decrementer support, try cap-large-decr=0");
+}
+
 sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 [SPAPR_CAP_HTM] = {
 .name = "htm",
@@ -468,6 +503,15 @@ sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 .type = "bool",
 .apply = cap_nested_kvm_hv_apply,
 },
+[SPAPR_CAP_LARGE_DECREMENTER] = {
+.name = "large-decr",
+.description = "Size of Large Decrementer for the Guest (bits) 0=disabled",
+.index = SPAPR_CAP_LARGE_DECREMENTER,
+.get = spapr_cap_get_uint8,
+.set = spapr_cap_set_uint8,
+.type = "int",
+.apply = cap_large_decr_apply,
+},
 };
 
 static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
@@ -596,6 +640,7 @@ SPAPR_CAP_MIG_STATE(cfpc, SPAPR_CAP_CFPC);
 SPAPR_CAP_MIG_STATE(sbbc, SPAPR_CAP_SBBC);
 SPAPR_CAP_MIG_STATE(ibs, SPAPR_CAP_IBS);
 SPAPR_CAP_MIG_STATE(nested_kvm_hv, SPAPR_CAP_NESTED_KVM_HV);
+SPAPR_CAP_MIG_STATE(large_decr, SPAPR_CAP_LARGE_DECREMENTER);
 
 void spapr_caps_init(sPAPRMachineState *spapr)
 {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 59073a7579..8efc5e0779 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -74,8 +74,10 @@ typedef enum {
 #define SPAPR_CAP_HPT_MAXPAGESIZE   0x06
 /* Nested KVM-HV */
 #define SPAPR_CAP_NESTED_KVM_HV 0x07
+/* Large Decrementer */
+#define SPAPR_CAP_LARGE_DECREMENTER 0x08
 /* Num Caps */
-#define SPAPR_CAP_NUM   (SPAPR_CAP_NESTED_KVM_HV + 1)
+#define SPAPR_CAP_NUM   (SPAPR_CAP_LARGE_DECREMENTER + 1)
 
 /*
  * Capability Values
@@ -828,6 +830,7 @@ extern const VMStateDescription vmstate_spapr_cap_cfpc;
 extern const VMStateDescription vmstate_spapr_cap_sbbc;
 extern const VMStateDescription vmstate_spapr_cap_ibs;
 extern const VMStateDescription vmstate_spapr_cap_nested_kvm_hv;
+extern const VMStateDescription vmstate_spapr_cap_large_decr;
 
 static inline uint8_t spapr_get_cap(sPAPRMachineState *spapr, int cap)
 {
-- 
2.13.6

[Qemu-devel] [QEMU-PPC] [PATCH 4/4] target/ppc/spapr: Enable the large decrementer by default on POWER9

2019-02-25 Thread Suraj Jitindar Singh

Enable the large decrementer by default on POWER9 cpu models. The
default value applied is that provided in the cpu class.

Signed-off-by: Suraj Jitindar Singh 
---
 hw/ppc/spapr_caps.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
index e07568fb94..f48aa367e3 100644
--- a/hw/ppc/spapr_caps.c
+++ b/hw/ppc/spapr_caps.c
@@ -566,11 +566,18 @@ sPAPRCapabilityInfo capability_table[SPAPR_CAP_NUM] = {
 static sPAPRCapabilities default_caps_with_cpu(sPAPRMachineState *spapr,
const char *cputype)
 {
+PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(object_class_by_name(cputype));
 sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 sPAPRCapabilities caps;
 
 caps = smc->default_caps;
 
+caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = pcc->hdecr_bits;
+if (!ppc_type_check_compat(cputype, CPU_POWERPC_LOGICAL_3_00,
+   0, spapr->max_compat_pvr)) {
+caps.caps[SPAPR_CAP_LARGE_DECREMENTER] = 0;
+}
+
 if (!ppc_type_check_compat(cputype, CPU_POWERPC_LOGICAL_2_07,
0, spapr->max_compat_pvr)) {
 caps.caps[SPAPR_CAP_HTM] = SPAPR_CAP_OFF;
-- 
2.13.6

Re: [Qemu-devel] Questions about EDID

2019-02-25 Thread Programmingkid



> On Feb 25, 2019, at 10:26 AM, Gerd Hoffmann  wrote:
> 
> On Mon, Feb 25, 2019 at 09:05:30AM -0500, G 3 wrote:
>> Hi Gerd, I was wondering if you have made any documentation for your EDID
>> patches. If you have could you provide a link please?
> 
> No docs.
> 
>> Also could a feature be added that allows the user to specify resolutions
>> to be made available to the guest?
>> 
>> Maybe it could work like this: -device VGA,edid=on,res=1366x768,7680x4320
> 
> A single resolution works (via xres + yres properties).

Could you send an example of the xres and yres properties please?
I tried this but it didn't work: -device VGA,edid=on,xres=999,yres=888

Thank you.

Re: [Qemu-devel] [Qemu-block] [PATCH v2] iotests: handle TypeError for Python3 in test 242

2019-02-25 Thread Nir Soffer

On Mon, Feb 25, 2019 at 10:36 PM Eduardo Habkost wrote:

> On Fri, Feb 22, 2019 at 02:26:13PM +0300, Andrey Shinkevich wrote:
> > The data type for bytes in Python3 differs from the one in Python2.
> > Those cases should be managed separately.
> >
> > v1:
> > In the first version, the TypeError in Python 3 was handled by
> > catching the exception.
> > Discussed in the e-mail thread with the Message ID:
> > <1550519997-253534-1-git-send-email-andrey.shinkev...@virtuozzo.com>
> >
> > Signed-off-by: Andrey Shinkevich 
> > Reported-by: Kevin Wolf 
> > ---
> >  tests/qemu-iotests/242 | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
> > index 16c65ed..446fbf8 100755
> > --- a/tests/qemu-iotests/242
> > +++ b/tests/qemu-iotests/242
> > @@ -20,6 +20,7 @@
> >
> >  import iotests
> >  import json
> > +import sys
> >  from iotests import qemu_img_create, qemu_io, qemu_img_pipe, \
> >      file_path, img_info_log, log, filter_qemu_io
> >
> > @@ -65,9 +66,12 @@ def toggle_flag(offset):
> >      with open(disk, "r+b") as f:
> >          f.seek(offset, 0)
> >          c = f.read(1)
> > -        toggled = chr(ord(c) ^ bitmap_flag_unknown)
> > +        toggled = ord(c) ^ bitmap_flag_unknown
> >          f.seek(-1, 1)
> > -        f.write(toggled)
> > +        if sys.version_info.major >= 3:
> > +            f.write(bytes([toggled]))
> > +        else:
> > +            f.write(chr(toggled))
>
> Pretending we are dealing with text strings and using str/ord is
> a python2-specific quirk.  I think it would be nice to get rid of
> it.
>
> Python 2 has bytearray(), which behaves more similarly to the
> bytes type from Python 3.  If we use it, we can make the code
> more python3-like:
>
> diff --git a/tests/qemu-iotests/242 b/tests/qemu-iotests/242
> index 16c65edcd7..7794fd4a70 100755
> --- a/tests/qemu-iotests/242
> +++ b/tests/qemu-iotests/242
> @@ -64,10 +64,12 @@ def write_to_disk(offset, size):
>  def toggle_flag(offset):
>      with open(disk, "r+b") as f:
>          f.seek(offset, 0)
> -        c = f.read(1)
> -        toggled = chr(ord(c) ^ bitmap_flag_unknown)
> +        # The casts to bytearray() below are only necessary
> +        # for Python 2 compatibility
> +        c = bytearray(f.read(1))[0]
>

This is simpler and makes the intent of the code clearer:

c, = struct.unpack("B", f.read(1))


> +        toggled = c ^ bitmap_flag_unknown
>          f.seek(-1, 1)
> -        f.write(toggled)
> +        f.write(bytearray([toggled]))
>

For consistency, we can use struct.pack here:

f.write(struct.pack("B", toggled))
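Combining the two struct calls, the whole helper could be written along these lines. This is a minimal sketch, not the code that went into test 242: the `path`, `offset` and `mask` parameters are hypothetical stand-ins for the test's module-level `disk` and `bitmap_flag_unknown`, and the scratch-file demo is purely illustrative.

```python
import os
import struct
import tempfile


def toggle_flag(path, offset, mask):
    """XOR the bits in `mask` into the single byte at `offset` in `path`.

    Hypothetical signature for illustration; `mask` must fit in one byte.
    The "B" struct format is one unsigned byte, so unpack yields a plain
    int and pack yields a one-byte string on both Python 2 and Python 3,
    with no chr()/ord() juggling.
    """
    with open(path, "r+b") as f:
        f.seek(offset, 0)
        c, = struct.unpack("B", f.read(1))  # one byte -> int in 0..255
        toggled = c ^ mask
        f.seek(-1, 1)                        # step back over the byte just read
        f.write(struct.pack("B", toggled))   # int -> one byte


if __name__ == "__main__":
    # Demo on a scratch file instead of a real qcow2 image.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00\x05\x00")
    toggle_flag(path, 1, 0x04)
    with open(path, "rb") as f:
        print(repr(f.read()))  # b'\x00\x01\x00' on Python 3
    os.remove(path)
```

Because XOR is its own inverse, calling the helper twice with the same mask restores the original byte.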

Nir


>
>
>  qemu_img_create('-f', iotests.imgfmt, disk, '1M')
>
> --
> Eduardo
>
>
