date:20190212

Re: [Qemu-devel] [RFC v1 1/3] intel_iommu: scalable mode emulation

2019-02-12 Thread Yi Sun

On 19-02-11 18:12:13, Peter Xu wrote:
> On Wed, Jan 30, 2019 at 01:09:11PM +0800, Yi Sun wrote:
> > From: "Liu, Yi L" 
> > 
> > Intel(R) VT-d 3.0 spec introduces scalable mode address translation to
> > replace extended context mode. This patch extends current emulator to
> > support Scalable Mode which includes root table, context table and new
> > pasid table format change. Now intel_iommu emulates both legacy mode
> > and scalable mode (with legacy-equivalent capability set).
> > 
> > The key points are below:
> > 1. Extend root table operations to support both legacy mode and scalable
> >    mode.
> > 2. Extend context table operations to support both legacy mode and
> >    scalable mode.
> > 3. Add pasid table operations to support scalable mode.
> 
> (this patch looks generally good to me, but I've got some trivial
>  comments below...)
> 
Thank you!

> > 
> > [Yi Sun is co-developer to contribute much to refine the whole commit.]
> > Signed-off-by: Yi Sun 
> > Signed-off-by: Liu, Yi L 
> 
> I think you should have your signed-off-by to be the latter one since
> you are the one who processed the patch last (and who posted it).
> 
Got it, thanks!

> > ---
> >  hw/i386/intel_iommu.c  | 528 
> > ++---
> >  hw/i386/intel_iommu_internal.h |  43 +++-
> >  hw/i386/trace-events   |   2 +-
> >  include/hw/i386/intel_iommu.h  |  16 +-
> >  4 files changed, 498 insertions(+), 91 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 8b72735..396ac8e 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -37,6 +37,34 @@
> >  #include "kvm_i386.h"
> >  #include "trace.h"
> >  
> > +#define vtd_devfn_check(devfn) ((devfn & VTD_DEVFN_CHECK_MASK) ? true : 
> > false)
> 
> "vtd_devfn_check(devfn)" is merely as long as "devfn &
> VTD_DEVFN_CHECK_MASK", isn't it? :)
> 
> I would just drop the macro.
> 
This macro is called in two places. Is it worth keeping for that reason?

> > +
> > +/* context entry operations */
> > +#define vtd_get_ce_size(s, ce) \
> > +(((s)->root_scalable) ? \
> > + VTD_CTX_ENTRY_SM_SIZE : VTD_CTX_ENTRY_LECY_SIZE)
> 
> "ce" is not used.  Also, if a macro is only used once, I'd just embed
> it in the function.  This one is only used in
> vtd_get_context_entry_from_root().
> 
Yes, I will drop this.

> > +#define vtd_ce_get_domain_id(ce) VTD_CONTEXT_ENTRY_DID((ce)->val[1])
> 
> Is this correct for scalable mode?  Section 9.4, Figure 9-34, it says
> ce->val[1] has RID_PASID in bits 64-83 rather than domain ID.
> 
This macro is for the legacy context entry, not the scalable-mode context entry.

> > +#define vtd_ce_get_rid2pasid(ce) \
> > +((ce)->val[1] & VTD_SM_CONTEXT_ENTRY_RID2PASID_MASK)
> > +#define vtd_ce_get_pasid_dir_table(ce) \
> > +((ce)->val[0] & VTD_PASID_DIR_BASE_ADDR_MASK)
> > +
> > +/* pasid operations */
> > +#define vtd_pdire_get_pasidt_base(pdire) \
> > +((pdire)->val & VTD_PASID_TABLE_BASE_ADDR_MASK)
> > +#define vtd_get_pasid_dir_entry_size() VTD_PASID_DIR_ENTRY_SIZE
> > +#define vtd_get_pasid_entry_size() VTD_PASID_ENTRY_SIZE
> > +#define vtd_get_pasid_dir_index(pasid) VTD_PASID_DIR_INDEX(pasid)
> > +#define vtd_get_pasid_table_index(pasid) VTD_PASID_TABLE_INDEX(pasid)
> 
> These macros seem useless.  Please use the existing ones, they are
> good enough AFAICT.  Also, please use capital letters for macro
> definitions so that format will be matched with existing codes.  The
> capital issue is there for the whole series, please adjust them
> accordingly.  I'll stop here on commenting anything about macros...
> 
Ok, I will adjust macro names.

> > +
> > +/* pe operations */
> > +#define vtd_pe_get_type(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
> > +#define vtd_pe_get_level(pe) (2 + (((pe)->val[0] >> 2) & 
> > VTD_SM_PASID_ENTRY_AW))
> > +#define vtd_pe_get_agaw(pe) \
> > +(30 + (((pe)->val[0] >> 2) & VTD_SM_PASID_ENTRY_AW) * 9)
> > +#define vtd_pe_get_slpt_base(pe) ((pe)->val[0] & 
> > VTD_SM_PASID_ENTRY_SLPTPTR)
> > +#define vtd_pe_get_domain_id(pe) VTD_SM_PASID_ENTRY_DID((pe)->val[1])
> > +
> >  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
> >  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
> >  
> > @@ -512,9 +540,15 @@ static void 
> > vtd_generate_completion_event(IntelIOMMUState *s)
> >  }
> >  }
> >  
> > -static inline bool vtd_root_entry_present(VTDRootEntry *root)
> > +static inline bool vtd_root_entry_present(IntelIOMMUState *s,
> > +  VTDRootEntry *re,
> > +  uint8_t devfn)
> >  {
> > -return root->val & VTD_ROOT_ENTRY_P;
> > +if (s->root_scalable && vtd_devfn_check(devfn)) {
> > +return re->hi & VTD_ROOT_ENTRY_P;
> > +}
> > +
> > +return re->lo & VTD_ROOT_ENTRY_P;
> >  }
> >  
> >  static int vtd_get_root_entry(IntelIOMMUState *s, uint8_t index,
> > @@ -524,36 +558,64 @@ static int
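
As an illustration of the root-entry check and the address-width macros quoted above, here is a minimal standalone sketch. The type and function names are hypothetical, and the two constants are assumed values (present bit in bit 0 of each half, devfn bit 7 selecting the upper half); the patch's own definitions live in intel_iommu_internal.h and are not shown in the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only, not copied from the patch. */
#define SKETCH_ROOT_ENTRY_P     1ULL  /* present bit of each 64-bit half */
#define SKETCH_DEVFN_CHECK_MASK 0x80  /* devfn >= 128 uses the upper half */

/* Hypothetical stand-in for a 128-bit root entry with lo/hi halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} SketchRootEntry;

/*
 * Mirrors the logic of the quoted vtd_root_entry_present(): in scalable
 * mode the upper half describes devfn 128-255, otherwise only the lower
 * half is consulted.
 */
static bool sketch_root_entry_present(bool scalable,
                                      const SketchRootEntry *re,
                                      uint8_t devfn)
{
    if (scalable && (devfn & SKETCH_DEVFN_CHECK_MASK)) {
        return re->hi & SKETCH_ROOT_ENTRY_P;
    }
    return re->lo & SKETCH_ROOT_ENTRY_P;
}

/* Same arithmetic as the quoted vtd_pe_get_level() / vtd_pe_get_agaw(). */
static unsigned sketch_level_from_aw(unsigned aw)
{
    return 2 + aw;
}

static unsigned sketch_agaw_from_aw(unsigned aw)
{
    return 30 + aw * 9;
}
```

With these formulas an AW field of 1 gives a 3-level table and a 39-bit address width, and an AW field of 2 gives 4 levels and 48 bits.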

[Qemu-devel] [Bug 1815721] [NEW] RISC-V PLIC enable interrupt for multicore

2019-02-12 Thread RTOS Pharos

Public bug reported:

Hello all,

There is a bug in Qemu related to the enabling of external interrupts
for multicores (Virt machine).

After correcting Qemu as described in #1815078
(https://bugs.launchpad.net/qemu/+bug/1815078), when we try to enable
interrupts for core 1 at address 0x0C00_2080 we don't seem to be able to
trigger an external interrupt  (e.g. UART0).

This works perfectly for core 0, but for core 1 it does not work at
all. Since bug #1815078 prevents any external interrupt from being
enabled, I assume this feature has not been tested. I tried to look at
the qemu source code but with no luck so far.

I guess the problem is related to the function parse_hart_config (in
sifive_plic.c), which incorrectly initializes
plic->addr_config[addrid].hartid, which is later read in
sifive_plic_update. But this is a guess.
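
To make the address arithmetic concrete, the following sketch relates an enable-register address to a PLIC context index and then to a hart. It is only an illustration: the base address, enable offset and per-context stride are assumed values that are not stated in this report, the "one letter per mode, one comma-separated group per hart" configuration string is likewise an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed memory layout, for illustration only. */
#define SKETCH_PLIC_BASE     0x0C000000u
#define SKETCH_ENABLE_BASE   0x2000u
#define SKETCH_ENABLE_STRIDE 0x80u

/* Address of the first enable word for context index ctx. */
static uint32_t sketch_enable_addr(unsigned ctx)
{
    return SKETCH_PLIC_BASE + SKETCH_ENABLE_BASE + ctx * SKETCH_ENABLE_STRIDE;
}

/*
 * Map a context index to a hart id for a configuration string such as
 * "M,M" or "MS,MS". Every letter creates one context; every comma moves
 * on to the next hart. Returns -1 if ctx is out of range.
 */
static int sketch_ctx_to_hart(const char *hart_config, unsigned ctx)
{
    int hart = 0;
    unsigned seen = 0;

    for (const char *p = hart_config; *p; p++) {
        if (*p == ',') {
            hart++;
            continue;
        }
        if (seen == ctx) {
            return hart;
        }
        seen++;
    }
    return -1;
}
```

Under these assumptions 0x0C00_2080 is context 1. That context belongs to hart 1 if every hart contributes a single mode ("M,M"), but to the second mode of hart 0 if every hart contributes two modes ("MS,MS"), so it may be worth checking which configuration the machine actually uses.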

Best regards,
Pharos team

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1815721

Title:
  RISC-V PLIC enable interrupt for multicore

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1815721/+subscriptions

Re: [Qemu-devel] [PATCH] Kconfig: add documentation

2019-02-12 Thread Markus Armbruster

Paolo Bonzini  writes:

> On 12/02/19 10:08, Markus Armbruster wrote:
>> Please wrap your lines at column 70 or so.  Humans tend to have trouble
>> following long lines with their eyes (I sure do).  Typographic manuals
>> suggest to limit columns to roughly 60 characters for exactly that
>> reason[*].
>
> Yup, fixed in v2.
>
>>> +Each QEMU target enables a subset of the boards, devices and buses that are included
>>> +in QEMU's source code.  As a result, each QEMU executable only links a small subset
>> 
>> Really?  Hmm...
>> 
>> $ size aarch64-softmmu/qemu-system-aarch64
>>     text     data        bss        dec      hex filename
>> 19183216  7200124     592732   26976072  19b9f48 aarch64-softmmu/qemu-system-aarch64
>> $ size -t `find -name \*.o `| grep TOT
>> 92713559 18652227 1183961440 1295327226 4d351ffa (TOTALS)
>> 
>> Yep, really.
>
> Haha. :)
>
>>> +  Optionally, a condition for applying the default value can be added with
>>> +  ``if``.  A config option can have any number of default values (usually, if more than
>>> +  one default is present, they will have different conditions). If multiple
>>> +  default values satisfy their condition, only the first defined one is active.
>> 
>> Hmm.  Is "multiple default values, first one wins" a healthy state?
>> How obvious is "first defined" to humans?  
>
> It certainly helps that we never have more than one default. :)

True!

> I could also be persuaded to remove "default n", so that multiple
> "default y" clauses are just an OR of the conditions and the ordering
> does not matter.

I'm looking for something that doesn't involve too much global
reasoning.

"All default directives must provide the same value; their conditions
are ORed" feels fine to me.  Order doesn't matter then.  "if "
really means "if", not "if-and-only-if", and that's fine.

Additionally permitting an *unconditional* default with the negated
value would still be okay, I guess.  But I'd do that only when we have a
use.

> Are multiple "default y" clauses useful?  They were there in earlier
> versions of the patches, for now "imply" has removed the need
> everywhere.  However, I'm not sure it's a good idea to remove them
> altogether before we try to extend Kconfig to more features.  (A
> secondary effect of the documentation is to clarify the current scope of
> Kconfig).

My general advice would be YAGNI.  However, keeping currently unused
features around while we grow Kconfig makes sense to me.

>>> +**reverse default** (weak reverse dependency): ``imply  [if 
>>> ]``
>> 
>> If "reverse default" can be regarded as weak reverse dependency, could
>> "default value" be regarded as weak (forward) dependency?
>
> "default n if" could, but we never use it.  This also shows how we use
> the language in a very limited way, according to the very simple rules
> in the second part of the document; but even Linux only has a handful of
> occurrences of "default n if".
>
>>> +
>>> +  This is similar to ``select`` as it applies a lower limit of ``y`` to another
>>> +  symbol.  However, the lower limit is only a default and the "implied" symbol's
>>> +  value may still be set to ``n`` from a ``default-configs/*.mak`` files.  The
>> 
>> I'm afraid I don't get "lower limit".  What's the ordering relation?
>
> False < true, so "lower limit of y" means "tries to force to y".  The
> difference is that a contradiction is ignored by "imply" (and then the
> symbol remains at n), while it causes a build error for "select".

Can we use this explanation to rephrase the documentation in simpler
language?

Let me summarize to see whether I got it.  Please correct
misunderstandings.

* "depends on" forces to false unless condition is met

* "select" forces to true if condition is met

* Contradictions between "depends on" and "select" are rejected

* If neither applies, default-configs/*.mak may supply the value

* If it doesn't, "default" / "imply" supply the value if condition is
  met

* What about contradictions between "default" / "imply"?
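
Read literally, the summary above is a small decision procedure, which can be sketched as follows. This is only one reading of the bullets, not the behaviour of the actual Kconfig parser: all names are hypothetical, each boolean field stands for "at least one such directive has its condition met", and the open question in the last bullet is answered here by simply ORing "default y" and "imply".

```c
#include <stdbool.h>

typedef enum { CFG_UNSET, CFG_N, CFG_Y } SketchCfg;

/* Hypothetical, already-evaluated inputs for a single symbol. */
typedef struct {
    bool depends_ok;   /* every "depends on" condition is met */
    bool selected;     /* some "select" of this symbol applies */
    SketchCfg config;  /* value given by a default-configs .mak file */
    bool default_y;    /* some "default y" applies */
    bool implied;      /* some "imply" of this symbol applies */
} SketchSymbol;

/* Returns 0 and stores the value, or -1 for a rejected contradiction. */
static int sketch_resolve(const SketchSymbol *s, bool *value)
{
    if (!s->depends_ok && s->selected) {
        return -1;                  /* "depends on" vs "select" conflict */
    }
    if (!s->depends_ok) {
        *value = false;             /* forced to false */
        return 0;
    }
    if (s->selected) {
        *value = true;              /* forced to true */
        return 0;
    }
    if (s->config != CFG_UNSET) {
        *value = (s->config == CFG_Y);  /* the .mak file decides */
        return 0;
    }
    *value = s->default_y || s->implied;  /* defaults apply last */
    return 0;
}
```

In this reading a symbol that is implied but set to n in a .mak file stays at n, which matches the description of "imply" as a default that can be overridden.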


PS: Thanks for writing down intended use in "Guidelines for writing
Kconfig files", not just the language specification, makes the document
so much more useful.

Re: [Qemu-devel] [Issues] PCI hotplug does not work well on pc platform?

2019-02-12 Thread Liu, Jing2


Hi Igor,

Thanks for your reply!

On 2/5/2019 11:47 PM, Igor Mammedov wrote:

On Wed, 30 Jan 2019 21:02:10 +0800
"Liu, Jing2"  wrote:


Hi everyone,

I have two questions.
1. PCI hotplug on pci.0 must manually rescan in guest. The ACPI hotplug
handler sends the GPE event to guest but it seems guest doesn't receive
it? I tried to open ACPI debug level/layer to 0x, in order to
see if there is any message after device_add in monitor, but no message
comes out until I manually rescan. Also tried printk in
acpi_ev_gpe_xrupt_handler() and acpi_ev_sci_xrupt_handler(). No output
in dmesg.
(I'm sure that CONFIG_HOTPLUG_PCI_PCIE=y, CONFIG_HOTPLUG_PCI_CPCI=y,
CONFIG_HOTPLUG_PCI=y, CONFIG_HOTPLUG_PCI_ACPI=y)

Whether this is a kind of design or a known issue? Does guest receive
the request, where can I find the

does it work with known to work kernels (RHEL7)?

Also sharing used QEMU version and command line could help.


Is there any key config of kernel in guest, besides those I listed above?

I used guest kernel v4.18 and qemu upstream v3.1.0
Command line:
sudo /home/xxx/qemu/x86_64-softmmu/qemu-system-x86_64  \
-machine pc,accel=kvm,kernel_irqchip -cpu host -m 1G,slots=2,maxmem=10G \
-nographic -no-user-config -nodefaults -vga none \
-drive file=/home/xxx/image/clear-24690-kvm.img,if=virtio,format=raw \
 -smp sockets=1,cpus=4,cores=2,maxcpus=8 \
-device virtio-serial-pci,id=virtio-serial0,disable-modern,addr=0x5 \
-monitor tcp:0.0.0.0:5000,nowait,server \
-chardev stdio,id=charconsole0 -device 
virtconsole,chardev=charconsole0,id=console0  \

-kernel /home/xxx/linux-stable/arch/x86/boot/bzImage \
-append 'root=/dev/vda3 rw rootfstype=ext4 data=ordered 
rcupdate.rcu_expedited=1 pci=lastbus=0 pci=realloc=on tsc=reliable 
no_timer_check reboot=t noapictimer console=hvc0 iommu=off  panic=1 
initcall_debug acpi.debug_layer=0x4 acpi.debug_level=4 ' \

-device pci-bridge,bus=pci.0,id=br1,chassis_nr=1,shpc=on,addr=6  \



2. I want to try hotplugging on pci-bridge on pc platform, using shpc. I
set shpc=on, but when I do device_add, qemu still calls
acpi_pcihp_device_plug_cb? Why it does not call pci_bridge_dev_hotplug_cb?
(CONFIG_HOTPLUG_PCI_SHPC=y)


try to disable ACPI hotplug for bridges
  -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off


I'll try it!

Thanks!
Jing

Re: [Qemu-devel] [PATCH v3 00/17] block: local qiov helper

2019-02-12 Thread Stefan Hajnoczi

On Thu, Feb 07, 2019 at 01:24:28PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is a new simple helper for a very common pattern
> around qemu_iovec_init_external: when we need a simple qiov with only
> one iov, initialized from an external buffer.
> 
> v3:
>   01-02: tiny improvements, described in patch-emails
>   03-17: new patches
> 
>   Note: only hw/scsi/scsi-disk.c is not updated, as its logic around
> the @iov fields of its structures is too tricky. So, it is simpler to
> keep it as is.
> 
> Previous series version was "[PATCH v2 0/2] block: local qiov helper: part I"
> https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01610.html
> 
> Vladimir Sementsov-Ogievskiy (17):
>   block: enhance QEMUIOVector structure
>   block/io: use qemu_iovec_init_buf
>   block/block-backend: use QEMU_IOVEC_INIT_BUF
>   block/backup: use qemu_iovec_init_buf
>   block/commit: use QEMU_IOVEC_INIT_BUF
>   block/stream: use QEMU_IOVEC_INIT_BUF
>   block/parallels: use QEMU_IOVEC_INIT_BUF
>   block/qcow: use qemu_iovec_init_buf
>   block/qcow2: use qemu_iovec_init_buf
>   block/qed: use qemu_iovec_init_buf
>   block/vmdk: use qemu_iovec_init_buf
>   qemu-img: use qemu_iovec_init_buf
>   migration/block: use qemu_iovec_init_buf
>   tests/test-bdrv-drain: use QEMU_IOVEC_INIT_BUF
>   hw/ide: drop iov field from IDEState
>   hw/ide: drop iov field from IDEBufferedRequest
>   hw/ide: drop iov field from IDEDMA
> 
>  include/hw/ide/internal.h |  3 --
>  include/qemu/iov.h| 64 +++-
>  block/backup.c|  5 +--
>  block/block-backend.c | 13 +-
>  block/commit.c|  7 +--
>  block/io.c| 89 +--
>  block/parallels.c | 13 +++---
>  block/qcow.c  | 21 ++---
>  block/qcow2.c | 12 +-
>  block/qed-table.c | 16 ++-
>  block/qed.c   | 31 --
>  block/stream.c|  7 +--
>  block/vmdk.c  |  7 +--
>  hw/ide/atapi.c| 14 +++---
>  hw/ide/core.c | 19 -
>  migration/block.c | 10 ++---
>  qemu-img.c| 10 +
>  tests/test-bdrv-drain.c   | 29 ++---
>  18 files changed, 134 insertions(+), 236 deletions(-)
> 
> -- 
> 2.18.0

I made the changes suggested by Eric in Patch 1.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
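
For readers who have not looked at the series, the single-buffer pattern from the cover letter can be sketched like this. The structure below is a simplified stand-in and not the real QEMUIOVector layout, and all names are hypothetical; the point is only that embedding one iovec in the vector removes the need for a separate iovec variable at each call site.

```c
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

/* Simplified stand-in for an I/O vector; not QEMU's definition. */
typedef struct {
    struct iovec *iov;
    int niov;
    size_t size;
    struct iovec local_iov;   /* storage for the single-buffer case */
} SketchIOVector;

/* Old pattern: the caller owns a separate iovec array. */
static void sketch_init_external(SketchIOVector *q, struct iovec *iov,
                                 int niov)
{
    q->iov = iov;
    q->niov = niov;
    q->size = 0;
    for (int i = 0; i < niov; i++) {
        q->size += iov[i].iov_len;
    }
}

/* Helper: one call, the iovec lives inside the vector itself. */
static void sketch_init_buf(SketchIOVector *q, void *buf, size_t len)
{
    q->local_iov.iov_base = buf;
    q->local_iov.iov_len = len;
    q->iov = &q->local_iov;
    q->niov = 1;
    q->size = len;
}
```

One consequence of this design in the sketch is that the vector must not be copied by value after initialization, because iov points into the structure itself.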



[Qemu-devel] [PATCH v2] hostmem-file: reject invalid pmem file sizes

2019-02-12 Thread Stefan Hajnoczi

Guests started with NVDIMMs larger than the underlying host file produce
confusing errors inside the guest.  This happens because the guest
accesses pages beyond the end of the file.

Check the pmem file size on startup and print a clear error message if
the size is invalid.
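
The core of the check can be sketched in isolation as below. This is a simplified illustration with hypothetical names and return codes: it only handles regular files via stat(), whereas the patch additionally looks up the size of Linux devdax character devices and reports failures through QEMU's Error API.

```c
#include <stdint.h>
#include <sys/stat.h>

/*
 * Compare a requested backend size with the size of the backing file.
 * Returns 0 if the request fits, -1 if the file cannot be examined,
 * and -2 if the request is larger than the file.
 */
static int sketch_check_backing_size(const char *path, uint64_t requested,
                                     uint64_t *actual)
{
    struct stat st;

    if (stat(path, &st) < 0) {
        return -1;
    }

    *actual = (uint64_t)st.st_size;
    return requested > *actual ? -2 : 0;
}
```

Doing this comparison before the guest starts turns a later out-of-range access into an immediate, explicit startup error.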

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1669053
Cc: Haozhong Zhang 
Cc: Zhang Yi 
Cc: Eduardo Habkost 
Cc: Igor Mammedov 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Propagate qemu_get_pmem_size() errors [Igor]

 include/qemu/osdep.h| 13 ++
 backends/hostmem-file.c | 22 +
 util/oslib-posix.c  | 53 +
 util/oslib-win32.c  |  5 
 4 files changed, 93 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 840af09cb0..303d315c5d 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -570,6 +570,19 @@ void qemu_set_tty_echo(int fd, bool echo);
 void os_mem_prealloc(int fd, char *area, size_t sz, int smp_cpus,
  Error **errp);
 
+/**
+ * qemu_get_pmem_size:
+ * @filename: path to a pmem file
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Determine the size of a persistent memory file.  Besides supporting files on
+ * DAX file systems, this function also supports Linux devdax character
+ * devices.
+ *
+ * Returns: the size or 0 on failure
+ */
+uint64_t qemu_get_pmem_size(const char *filename, Error **errp);
+
 /**
  * qemu_get_pid_name:
  * @pid: pid of a process
diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index ba601ce940..d62689179b 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -46,6 +46,28 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error 
**errp)
 gchar *name;
 #endif
 
+/*
+ * Verify pmem file size since starting a guest with an incorrect size
+ * leads to confusing failures inside the guest.
+ */
+if (fb->is_pmem && fb->mem_path) {
+Error *local_err = NULL;
+uint64_t size;
+
+size = qemu_get_pmem_size(fb->mem_path, &local_err);
+if (!size) {
+error_propagate(errp, local_err);
+return;
+}
+
+if (backend->size > size) {
+error_setg(errp, "size property %" PRIu64 " is larger than "
+   "pmem file \"%s\" size %" PRIu64, backend->size,
+   fb->mem_path, size);
+return;
+}
+}
+
 if (!backend->size) {
 error_setg(errp, "can't create backend with size 0");
 return;
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 37c5854b9c..10d90d1783 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -500,6 +500,59 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
int smp_cpus,
 }
 }
 
+uint64_t qemu_get_pmem_size(const char *filename, Error **errp)
+{
+struct stat st;
+
+if (stat(filename, &st) < 0) {
+error_setg(errp, "unable to stat pmem file \"%s\"", filename);
+return 0;
+}
+
+#if defined(__linux__)
+/* Special handling for devdax character devices */
+if (S_ISCHR(st.st_mode)) {
+char *subsystem_path = NULL;
+char *subsystem = NULL;
+char *size_path = NULL;
+char *size_str = NULL;
+uint64_t ret = 0;
+
+subsystem_path = g_strdup_printf("/sys/dev/char/%d:%d/subsystem",
+ major(st.st_rdev), minor(st.st_rdev));
+subsystem = g_file_read_link(subsystem_path, NULL);
+if (!subsystem) {
+error_setg(errp, "unable to read subsystem for pmem file \"%s\"",
+   filename);
+goto devdax_err;
+}
+
+if (!g_str_has_suffix(subsystem, "/dax")) {
+error_setg(errp, "pmem file \"%s\" is not a dax device", filename);
+goto devdax_err;
+}
+
+size_path = g_strdup_printf("/sys/dev/char/%d:%d/size",
+major(st.st_rdev), minor(st.st_rdev));
+if (!g_file_get_contents(size_path, &size_str, NULL, NULL)) {
+error_setg(errp, "unable to read size for pmem file \"%s\"",
+   size_path);
+goto devdax_err;
+}
+
+ret = g_ascii_strtoull(size_str, NULL, 0);
+
+devdax_err:
+g_free(size_str);
+g_free(size_path);
+g_free(subsystem);
+g_free(subsystem_path);
+return ret;
+}
+#endif /* defined(__linux__) */
+
+return st.st_size;
+}
 
 char *qemu_get_pid_name(pid_t pid)
 {
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index b4c17f5dfa..bd633afab6 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -560,6 +560,11 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
int smp_cpus,
 }
 }
 
+uint64_t qemu_get_pmem_size(const char *filename, Error **errp)
+{
+error_setg(errp, "pmem support not available");
+return 0;
+}
 
 char *qemu_get_pid_name(pid_t pid)
 {
-- 
2.20.1

[Qemu-devel] [PATCH v1 2/9] hw/rdma: Introduce locked qlist

2019-02-12 Thread Yuval Shaia

To make the code more readable, move handling of the locked list into
generic functions.
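
The idea of wrapping a list and its mutex behind a few helpers can be sketched outside QEMU as follows. This is not the patch's implementation: it uses a pthread mutex and a fixed-size ring buffer instead of QemuMutex and QList, and every name is hypothetical. It keeps the same convention of returning -ENOENT from pop when the list is empty.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define SKETCH_CAP 64

/* A FIFO of int64_t values guarded by its own mutex. */
typedef struct {
    pthread_mutex_t lock;
    int64_t items[SKETCH_CAP];
    unsigned head;
    unsigned count;
} SketchLockedList;

static void sketch_list_init(SketchLockedList *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->head = 0;
    l->count = 0;
}

static void sketch_list_destroy(SketchLockedList *l)
{
    pthread_mutex_destroy(&l->lock);
}

/* Append at the tail; returns -ENOSPC if the ring buffer is full. */
static int sketch_list_append(SketchLockedList *l, int64_t value)
{
    int ret = 0;

    pthread_mutex_lock(&l->lock);
    if (l->count == SKETCH_CAP) {
        ret = -ENOSPC;
    } else {
        l->items[(l->head + l->count) % SKETCH_CAP] = value;
        l->count++;
    }
    pthread_mutex_unlock(&l->lock);
    return ret;
}

/* Pop from the head; returns -ENOENT if the list is empty. */
static int64_t sketch_list_pop(SketchLockedList *l)
{
    int64_t value = -ENOENT;

    pthread_mutex_lock(&l->lock);
    if (l->count > 0) {
        value = l->items[l->head];
        l->head = (l->head + 1) % SKETCH_CAP;
        l->count--;
    }
    pthread_mutex_unlock(&l->lock);
    return value;
}
```

Using -ENOENT as the "empty" marker means that value itself cannot be stored in the list, which is acceptable when the elements are non-negative context ids.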

Signed-off-by: Yuval Shaia 
---
 hw/rdma/rdma_backend.c  | 20 +--
 hw/rdma/rdma_backend_defs.h |  8 ++--
 hw/rdma/rdma_utils.c| 39 +
 hw/rdma/rdma_utils.h|  9 +
 4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 5f60856d19..2f6372f8f0 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -527,9 +527,7 @@ static unsigned int save_mad_recv_buffer(RdmaBackendDev 
*backend_dev,
 bctx->up_ctx = ctx;
 bctx->sge = *sge;
 
-qemu_mutex_lock(&backend_dev->recv_mads_list.lock);
-qlist_append_int(backend_dev->recv_mads_list.list, bctx_id);
-qemu_mutex_unlock(&backend_dev->recv_mads_list.lock);
+rdma_locked_list_append_int64(&backend_dev->recv_mads_list, bctx_id);
 
 return 0;
 }
@@ -913,23 +911,19 @@ static inline void build_mad_hdr(struct ibv_grh *grh, 
union ibv_gid *sgid,
 static void process_incoming_mad_req(RdmaBackendDev *backend_dev,
  RdmaCmMuxMsg *msg)
 {
-QObject *o_ctx_id;
 unsigned long cqe_ctx_id;
 BackendCtx *bctx;
 char *mad;
 
 trace_mad_message("recv", msg->umad.mad, msg->umad_len);
 
-qemu_mutex_lock(&backend_dev->recv_mads_list.lock);
-o_ctx_id = qlist_pop(backend_dev->recv_mads_list.list);
-qemu_mutex_unlock(&backend_dev->recv_mads_list.lock);
-if (!o_ctx_id) {
+cqe_ctx_id = rdma_locked_list_pop_int64(&backend_dev->recv_mads_list);
+if (cqe_ctx_id == -ENOENT) {
 rdma_warn_report("No more free MADs buffers, waiting for a while");
 sleep(THR_POLL_TO);
 return;
 }
 
-cqe_ctx_id = qnum_get_uint(qobject_to(QNum, o_ctx_id));
 bctx = rdma_rm_get_cqe_ctx(backend_dev->rdma_dev_res, cqe_ctx_id);
 if (unlikely(!bctx)) {
 rdma_error_report("No matching ctx for req %ld", cqe_ctx_id);
@@ -994,8 +988,7 @@ static int mad_init(RdmaBackendDev *backend_dev, 
CharBackend *mad_chr_be)
 return -EIO;
 }
 
-qemu_mutex_init(&backend_dev->recv_mads_list.lock);
-backend_dev->recv_mads_list.list = qlist_new();
+rdma_locked_list_init(&backend_dev->recv_mads_list);
 
 enable_rdmacm_mux_async(backend_dev);
 
@@ -1010,10 +1003,7 @@ static void mad_fini(RdmaBackendDev *backend_dev)
 {
 disable_rdmacm_mux_async(backend_dev);
 qemu_chr_fe_disconnect(backend_dev->rdmacm_mux.chr_be);
-if (backend_dev->recv_mads_list.list) {
-qlist_destroy_obj(QOBJECT(backend_dev->recv_mads_list.list));
-qemu_mutex_destroy(&backend_dev->recv_mads_list.lock);
-}
+rdma_locked_list_destroy(&backend_dev->recv_mads_list);
 }
 
 int rdma_backend_get_gid_index(RdmaBackendDev *backend_dev,
diff --git a/hw/rdma/rdma_backend_defs.h b/hw/rdma/rdma_backend_defs.h
index 15ae8b970e..bec0457f25 100644
--- a/hw/rdma/rdma_backend_defs.h
+++ b/hw/rdma/rdma_backend_defs.h
@@ -20,6 +20,7 @@
 #include "chardev/char-fe.h"
 #include 
 #include "contrib/rdmacm-mux/rdmacm-mux.h"
+#include "rdma_utils.h"
 
 typedef struct RdmaDeviceResources RdmaDeviceResources;
 
@@ -30,11 +31,6 @@ typedef struct RdmaBackendThread {
 bool is_running; /* Set by the thread to report its status */
 } RdmaBackendThread;
 
-typedef struct RecvMadList {
-QemuMutex lock;
-QList *list;
-} RecvMadList;
-
 typedef struct RdmaCmMux {
 CharBackend *chr_be;
 int can_receive;
@@ -48,7 +44,7 @@ typedef struct RdmaBackendDev {
 struct ibv_context *context;
 struct ibv_comp_channel *channel;
 uint8_t port_num;
-RecvMadList recv_mads_list;
+LockedList recv_mads_list;
 RdmaCmMux rdmacm_mux;
 } RdmaBackendDev;
 
diff --git a/hw/rdma/rdma_utils.c b/hw/rdma/rdma_utils.c
index f1c980c6be..a2a4ea2a15 100644
--- a/hw/rdma/rdma_utils.c
+++ b/hw/rdma/rdma_utils.c
@@ -14,6 +14,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnum.h"
 #include "trace.h"
 #include "rdma_utils.h"
 
@@ -55,3 +57,40 @@ void rdma_pci_dma_unmap(PCIDevice *dev, void *buffer, 
dma_addr_t len)
 pci_dma_unmap(dev, buffer, len, DMA_DIRECTION_TO_DEVICE, 0);
 }
 }
+
+void rdma_locked_list_init(LockedList *list)
+{
+qemu_mutex_init(&list->lock);
+list->list = qlist_new();
+}
+
+void rdma_locked_list_destroy(LockedList *list)
+{
+if (list->list) {
+qlist_destroy_obj(QOBJECT(list->list));
+qemu_mutex_destroy(&list->lock);
+list->list = NULL;
+}
+}
+
+void rdma_locked_list_append_int64(LockedList *list, int64_t value)
+{
+qemu_mutex_lock(&list->lock);
+qlist_append_int(list->list, value);
+qemu_mutex_unlock(&list->lock);
+}
+
+int64_t rdma_locked_list_pop_int64(LockedList *list)
+{
+QObject *obj;
+
+qemu_mutex_lock(&list->lock);
+obj = qlist_pop(list->list);
+qemu_mutex_unlock(&list->lock);
+
+if (!obj) {
+return -ENOENT;
+}
+
+return qnum_get_uint(qobject_to(QNum, obj));
+}
diff --git a/hw/rdma/rdma_utils.h b/hw/rdma/rdma_utils.h

[Qemu-devel] [PATCH v2 2/9] hw/rdma: Introduce protected qlist

2019-02-12 Thread Yuval Shaia

To make the code more readable, move handling of the protected list into
rdma_utils.

Signed-off-by: Yuval Shaia 
---
 hw/rdma/rdma_backend.c  | 20 +--
 hw/rdma/rdma_backend_defs.h |  8 ++--
 hw/rdma/rdma_utils.c| 39 +
 hw/rdma/rdma_utils.h|  9 +
 4 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 5f60856d19..6e9c4617da 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -527,9 +527,7 @@ static unsigned int save_mad_recv_buffer(RdmaBackendDev 
*backend_dev,
 bctx->up_ctx = ctx;
 bctx->sge = *sge;
 
-qemu_mutex_lock(&backend_dev->recv_mads_list.lock);
-qlist_append_int(backend_dev->recv_mads_list.list, bctx_id);
-qemu_mutex_unlock(&backend_dev->recv_mads_list.lock);
+rdma_protected_qlist_append_int64(&backend_dev->recv_mads_list, bctx_id);
 
 return 0;
 }
@@ -913,23 +911,19 @@ static inline void build_mad_hdr(struct ibv_grh *grh, 
union ibv_gid *sgid,
 static void process_incoming_mad_req(RdmaBackendDev *backend_dev,
  RdmaCmMuxMsg *msg)
 {
-QObject *o_ctx_id;
 unsigned long cqe_ctx_id;
 BackendCtx *bctx;
 char *mad;
 
 trace_mad_message("recv", msg->umad.mad, msg->umad_len);
 
-qemu_mutex_lock(&backend_dev->recv_mads_list.lock);
-o_ctx_id = qlist_pop(backend_dev->recv_mads_list.list);
-qemu_mutex_unlock(&backend_dev->recv_mads_list.lock);
-if (!o_ctx_id) {
+cqe_ctx_id = rdma_protected_qlist_pop_int64(&backend_dev->recv_mads_list);
+if (cqe_ctx_id == -ENOENT) {
 rdma_warn_report("No more free MADs buffers, waiting for a while");
 sleep(THR_POLL_TO);
 return;
 }
 
-cqe_ctx_id = qnum_get_uint(qobject_to(QNum, o_ctx_id));
 bctx = rdma_rm_get_cqe_ctx(backend_dev->rdma_dev_res, cqe_ctx_id);
 if (unlikely(!bctx)) {
 rdma_error_report("No matching ctx for req %ld", cqe_ctx_id);
@@ -994,8 +988,7 @@ static int mad_init(RdmaBackendDev *backend_dev, 
CharBackend *mad_chr_be)
 return -EIO;
 }
 
-qemu_mutex_init(&backend_dev->recv_mads_list.lock);
-backend_dev->recv_mads_list.list = qlist_new();
+rdma_protected_qlist_init(&backend_dev->recv_mads_list);
 
 enable_rdmacm_mux_async(backend_dev);
 
@@ -1010,10 +1003,7 @@ static void mad_fini(RdmaBackendDev *backend_dev)
 {
 disable_rdmacm_mux_async(backend_dev);
 qemu_chr_fe_disconnect(backend_dev->rdmacm_mux.chr_be);
-if (backend_dev->recv_mads_list.list) {
-qlist_destroy_obj(QOBJECT(backend_dev->recv_mads_list.list));
-qemu_mutex_destroy(&backend_dev->recv_mads_list.lock);
-}
+rdma_protected_qlist_destroy(&backend_dev->recv_mads_list);
 }
 
 int rdma_backend_get_gid_index(RdmaBackendDev *backend_dev,
diff --git a/hw/rdma/rdma_backend_defs.h b/hw/rdma/rdma_backend_defs.h
index 15ae8b970e..a8c15b09ab 100644
--- a/hw/rdma/rdma_backend_defs.h
+++ b/hw/rdma/rdma_backend_defs.h
@@ -20,6 +20,7 @@
 #include "chardev/char-fe.h"
 #include 
 #include "contrib/rdmacm-mux/rdmacm-mux.h"
+#include "rdma_utils.h"
 
 typedef struct RdmaDeviceResources RdmaDeviceResources;
 
@@ -30,11 +31,6 @@ typedef struct RdmaBackendThread {
 bool is_running; /* Set by the thread to report its status */
 } RdmaBackendThread;
 
-typedef struct RecvMadList {
-QemuMutex lock;
-QList *list;
-} RecvMadList;
-
 typedef struct RdmaCmMux {
 CharBackend *chr_be;
 int can_receive;
@@ -48,7 +44,7 @@ typedef struct RdmaBackendDev {
 struct ibv_context *context;
 struct ibv_comp_channel *channel;
 uint8_t port_num;
-RecvMadList recv_mads_list;
+RdmaProtectedQList recv_mads_list;
 RdmaCmMux rdmacm_mux;
 } RdmaBackendDev;
 
diff --git a/hw/rdma/rdma_utils.c b/hw/rdma/rdma_utils.c
index f1c980c6be..672a09079a 100644
--- a/hw/rdma/rdma_utils.c
+++ b/hw/rdma/rdma_utils.c
@@ -14,6 +14,8 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnum.h"
 #include "trace.h"
 #include "rdma_utils.h"
 
@@ -55,3 +57,40 @@ void rdma_pci_dma_unmap(PCIDevice *dev, void *buffer, 
dma_addr_t len)
 pci_dma_unmap(dev, buffer, len, DMA_DIRECTION_TO_DEVICE, 0);
 }
 }
+
+void rdma_protected_qlist_init(RdmaProtectedQList *list)
+{
+qemu_mutex_init(&list->lock);
+list->list = qlist_new();
+}
+
+void rdma_protected_qlist_destroy(RdmaProtectedQList *list)
+{
+if (list->list) {
+qlist_destroy_obj(QOBJECT(list->list));
+qemu_mutex_destroy(&list->lock);
+list->list = NULL;
+}
+}
+
+void rdma_protected_qlist_append_int64(RdmaProtectedQList *list, int64_t value)
+{
+qemu_mutex_lock(&list->lock);
+qlist_append_int(list->list, value);
+qemu_mutex_unlock(&list->lock);
+}
+
+int64_t rdma_protected_qlist_pop_int64(RdmaProtectedQList *list)
+{
+QObject *obj;
+
+qemu_mutex_lock(&list->lock);
+obj = qlist_pop(list->list);
+qemu_mutex_unlock(&list->lock);
+
+if (!obj) {
+return -ENOENT;
+}
+
+return qnum_get_uint(qobject_to(QNum,

[Qemu-devel] [PATCH v2 3/9] hw/rdma: Protect against concurrent execution of poll_cq

2019-02-12 Thread Yuval Shaia

The function rdma_poll_cq is called from two contexts: the completion
handler thread, which senses new completions on the backend channel, and
explicitly, as a result of the guest issuing a poll_cq command.

Add a lock to protect against concurrent executions.
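
The locking pattern amounts to holding one mutex for the whole drain loop, so that two callers can never interleave while consuming completions. A minimal sketch, with a counter standing in for the real completion queue and with hypothetical names throughout (sketch_poll_one() merely plays the role of the real polling call):

```c
#include <pthread.h>

/* Hypothetical stand-in for the device resources. */
typedef struct {
    pthread_mutex_t lock;
    int pending;   /* completions waiting in the fake queue */
    int handled;   /* completions processed so far */
} SketchResources;

/* Takes at most one completion; returns how many were taken. */
static int sketch_poll_one(SketchResources *r)
{
    if (r->pending == 0) {
        return 0;
    }
    r->pending--;
    return 1;
}

/* Drain the queue; the lock covers the entire loop. */
static void sketch_poll_cq(SketchResources *r)
{
    int ne;

    pthread_mutex_lock(&r->lock);
    do {
        ne = sketch_poll_one(r);
        r->handled += ne;
    } while (ne > 0);
    pthread_mutex_unlock(&r->lock);
}
```

Whichever context calls first drains everything that is pending; a second call then finds nothing left, so each completion is handled exactly once.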

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.c | 2 ++
 hw/rdma/rdma_rm.c  | 4 
 hw/rdma/rdma_rm_defs.h | 1 +
 3 files changed, 7 insertions(+)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 6e9c4617da..3a2913facf 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -70,6 +70,7 @@ static void rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, 
struct ibv_cq *ibcq)
 BackendCtx *bctx;
 struct ibv_wc wc[2];
 
+qemu_mutex_lock(&rdma_dev_res->lock);
 do {
 ne = ibv_poll_cq(ibcq, ARRAY_SIZE(wc), wc);
 
@@ -89,6 +90,7 @@ static void rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, 
struct ibv_cq *ibcq)
 g_free(bctx);
 }
 } while (ne > 0);
+qemu_mutex_unlock(&rdma_dev_res->lock);
 
 if (ne < 0) {
 rdma_error_report("ibv_poll_cq fail, rc=%d, errno=%d", ne, errno);
diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index 64c6ea1a4e..7cc597cdc8 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -618,12 +618,16 @@ int rdma_rm_init(RdmaDeviceResources *dev_res, struct 
ibv_device_attr *dev_attr,
 
 init_ports(dev_res);
 
+qemu_mutex_init(&dev_res->lock);
+
 return 0;
 }
 
 void rdma_rm_fini(RdmaDeviceResources *dev_res, RdmaBackendDev *backend_dev,
   const char *ifname)
 {
+qemu_mutex_destroy(&dev_res->lock);
+
 fini_ports(dev_res, backend_dev, ifname);
 
 res_tbl_free(&dev_res->uc_tbl);
diff --git a/hw/rdma/rdma_rm_defs.h b/hw/rdma/rdma_rm_defs.h
index 0ba61d1838..f0ee1f3072 100644
--- a/hw/rdma/rdma_rm_defs.h
+++ b/hw/rdma/rdma_rm_defs.h
@@ -105,6 +105,7 @@ typedef struct RdmaDeviceResources {
 RdmaRmResTbl cq_tbl;
 RdmaRmResTbl cqe_ctx_tbl;
 GHashTable *qp_hash; /* Keeps mapping between real and emulated */
+QemuMutex lock;
 } RdmaDeviceResources;
 
 #endif
-- 
2.17.2

[Qemu-devel] [PATCH] hw/sparc64: Explicitly set default_display = "std"

2019-02-12 Thread Thomas Huth

The sun4uv_init() function expects vga_interface_type to be either
VGA_STD or VGA_NONE and sets up a stdvga device or no vga card
accordingly.
However, the code in vl.c prefers the Cirrus VGA card to stdvga if
it is available and the user and the machine did not specify anything
else.
So far this has not been a problem, since the Cirrus VGA was not linked
into the sparc64 target. But with the upcoming Kconfig build system,
all theoretically possible PCI cards will be enabled by default, so the
Cirrus VGA card might become available on the sparc64 target, too. vl.c
then picks the wrong card, causing sun4uv_init() to abort.
Thus let's make it explicit that we always want stdvga for sparc64 and
so set default_display = "std" for these machines.

Signed-off-by: Thomas Huth 
---
 hw/sparc64/sun4u.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c
index ff24d9b..399f2d7 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -797,6 +797,7 @@ static void sun4u_class_init(ObjectClass *oc, void *data)
 mc->default_boot_order = "c";
 mc->default_cpu_type = SPARC_CPU_TYPE_NAME("TI-UltraSparc-IIi");
 mc->ignore_boot_device_suffixes = true;
+mc->default_display = "std";
 fwc->get_dev_path = sun4u_fw_dev_path;
 }
 
@@ -820,6 +821,7 @@ static void sun4v_class_init(ObjectClass *oc, void *data)
 mc->max_cpus = 1; /* XXX for now */
 mc->default_boot_order = "c";
 mc->default_cpu_type = SPARC_CPU_TYPE_NAME("Sun-UltraSparc-T1");
+mc->default_display = "std";
 }
 
 static const TypeInfo sun4v_type = {
-- 
1.8.3.1
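
The selection problem described in the commit message can be modelled with a small sketch. It does not reproduce the code in vl.c; SketchVga, pick_default_vga and sun4uv_accepts are hypothetical names, and the fallback order (Cirrus first, then std) is taken only from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified set of display choices. */
typedef enum {
    SKETCH_VGA_NONE,
    SKETCH_VGA_STD,
    SKETCH_VGA_CIRRUS
} SketchVga;

/*
 * Pick a display card. An explicit machine default wins; without one,
 * Cirrus is preferred over std whenever it has been built in.
 */
SketchVga pick_default_vga(const char *default_display,
                           bool cirrus_available, bool std_available)
{
    if (default_display != NULL) {
        if (strcmp(default_display, "std") == 0) {
            return SKETCH_VGA_STD;
        }
        if (strcmp(default_display, "cirrus") == 0) {
            return SKETCH_VGA_CIRRUS;
        }
        return SKETCH_VGA_NONE;
    }
    if (cirrus_available) {
        return SKETCH_VGA_CIRRUS;
    }
    if (std_available) {
        return SKETCH_VGA_STD;
    }
    return SKETCH_VGA_NONE;
}

/* The machine init code only copes with std or no card at all. */
bool sun4uv_accepts(SketchVga type)
{
    return type == SKETCH_VGA_STD || type == SKETCH_VGA_NONE;
}
```

With no machine default and Cirrus available, the sketch picks a card the machine does not accept; passing "std" as the default avoids that regardless of which cards are built in.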

[Qemu-devel] [PATCH v2 8/9] hw/pvrdma: Delete pvrdma_exit function

2019-02-12 Thread Yuval Shaia

This hook is not called and was implemented by mistake.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_main.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 90dc9b191b..8b379c6435 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -701,18 +701,12 @@ out:
 }
 }
 
-static void pvrdma_exit(PCIDevice *pdev)
-{
-pvrdma_fini(pdev);
-}
-
 static void pvrdma_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
 k->realize = pvrdma_realize;
-k->exit = pvrdma_exit;
 k->vendor_id = PCI_VENDOR_ID_VMWARE;
 k->device_id = PCI_DEVICE_ID_VMWARE_PVRDMA;
 k->revision = 0x00;
-- 
2.17.2

[Qemu-devel] [PATCH v2 1/9] hw/rdma: Switch to generic error reporting way

2019-02-12 Thread Yuval Shaia

Utilize error_report for all pr_err calls and for some pr_dbg calls that
are considered errors.
For the remaining pr_dbg calls, the important ones were replaced by
trace points while the others were deleted.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
---
 hw/rdma/rdma_backend.c| 336 ++
 hw/rdma/rdma_rm.c | 121 +---
 hw/rdma/rdma_utils.c  |  11 +-
 hw/rdma/rdma_utils.h  |  45 +
 hw/rdma/trace-events  |  32 +++-
 hw/rdma/vmw/pvrdma.h  |   2 +-
 hw/rdma/vmw/pvrdma_cmd.c  | 113 +++-
 hw/rdma/vmw/pvrdma_dev_ring.c |  26 +--
 hw/rdma/vmw/pvrdma_main.c | 132 +
 hw/rdma/vmw/pvrdma_qp_ops.c   |  49 ++---
 hw/rdma/vmw/trace-events  |  16 +-
 11 files changed, 337 insertions(+), 546 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index fd571f21e5..5f60856d19 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -14,7 +14,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qlist.h"
@@ -39,7 +38,6 @@
 
 typedef struct BackendCtx {
 void *up_ctx;
-bool is_tx_req;
 struct ibv_sge sge; /* Used to save MAD recv buffer */
 } BackendCtx;
 
@@ -52,7 +50,7 @@ static void (*comp_handler)(void *ctx, struct ibv_wc *wc);
 
 static void dummy_comp_handler(void *ctx, struct ibv_wc *wc)
 {
-pr_err("No completion handler is registered\n");
+rdma_error_report("No completion handler is registered");
 }
 
 static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
@@ -66,29 +64,24 @@ static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
 comp_handler(ctx, &wc);
 }
 
-static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
+static void rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 {
 int i, ne;
 BackendCtx *bctx;
 struct ibv_wc wc[2];
 
-pr_dbg("Entering poll_cq loop on cq %p\n", ibcq);
 do {
 ne = ibv_poll_cq(ibcq, ARRAY_SIZE(wc), wc);
 
-pr_dbg("Got %d completion(s) from cq %p\n", ne, ibcq);
+trace_rdma_poll_cq(ne, ibcq);
 
 for (i = 0; i < ne; i++) {
-pr_dbg("wr_id=0x%" PRIx64 "\n", wc[i].wr_id);
-pr_dbg("status=%d\n", wc[i].status);
-
 bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, wc[i].wr_id);
 if (unlikely(!bctx)) {
-pr_dbg("Error: Failed to find ctx for req %" PRId64 "\n",
-   wc[i].wr_id);
+rdma_error_report("No matching ctx for req %"PRId64,
+  wc[i].wr_id);
 continue;
 }
-pr_dbg("Processing %s CQE\n", bctx->is_tx_req ? "send" : "recv");
 
 comp_handler(bctx->up_ctx, &wc[i]);
 
@@ -98,7 +91,7 @@ static void poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 } while (ne > 0);
 
 if (ne < 0) {
-pr_dbg("Got error %d from ibv_poll_cq\n", ne);
+rdma_error_report("ibv_poll_cq fail, rc=%d, errno=%d", ne, errno);
 }
 }
 
@@ -115,12 +108,10 @@ static void *comp_handler_thread(void *arg)
 flags = fcntl(backend_dev->channel->fd, F_GETFL);
 rc = fcntl(backend_dev->channel->fd, F_SETFL, flags | O_NONBLOCK);
 if (rc < 0) {
-pr_dbg("Fail to change to non-blocking mode\n");
+rdma_error_report("Failed to change backend channel FD to non-blocking");
 return NULL;
 }
 
-pr_dbg("Starting\n");
-
 pfds[0].fd = backend_dev->channel->fd;
 pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
 
@@ -132,27 +123,25 @@ static void *comp_handler_thread(void *arg)
 } while (!rc && backend_dev->comp_thread.run);
 
 if (backend_dev->comp_thread.run) {
-pr_dbg("Waiting for completion on channel %p\n", backend_dev->channel);
 rc = ibv_get_cq_event(backend_dev->channel, &ev_cq, &ev_ctx);
-pr_dbg("ibv_get_cq_event=%d\n", rc);
 if (unlikely(rc)) {
-pr_dbg("---> ibv_get_cq_event (%d)\n", rc);
+rdma_error_report("ibv_get_cq_event fail, rc=%d, errno=%d", rc,
+  errno);
 continue;
 }
 
 rc = ibv_req_notify_cq(ev_cq, 0);
 if (unlikely(rc)) {
-pr_dbg("Error %d from ibv_req_notify_cq\n", rc);
+rdma_error_report("ibv_req_notify_cq fail, rc=%d, errno=%d", rc,
+  errno);
 }
 
-poll_cq(backend_dev->rdma_dev_res, ev_cq);
+rdma_poll_cq(backend_dev->rdma_dev_res, ev_cq);
 
 ibv_ack_cq_events(ev_cq, 1);
 }
 }
 
-pr_dbg("Going down\n");
-
 /* TODO: Post cqe for all remaining buffs that were posted */
 
 backend_dev->comp_thread.is_running = false;
@@ -177,55 +166,54 @@ static inline int

[Qemu-devel] [PATCH v2 5/9] hw/rdma: Free all MAD receive buffers when device is closed

2019-02-12 Thread Yuval Shaia

When the device is going down, free all saved MAD buffers.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.c| 34 +-
 hw/rdma/vmw/pvrdma_main.c |  2 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 0fb4842970..e8ee205b5f 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -64,6 +64,33 @@ static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
 comp_handler(ctx, &wc);
 }
 
+static void free_cqe_ctx(gpointer data, gpointer user_data)
+{
+BackendCtx *bctx;
+RdmaDeviceResources *rdma_dev_res = user_data;
+unsigned long cqe_ctx_id = GPOINTER_TO_INT(data);
+
+bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, cqe_ctx_id);
+if (bctx) {
+rdma_rm_dealloc_cqe_ctx(rdma_dev_res, cqe_ctx_id);
+}
+g_free(bctx);
+}
+
+static void clean_recv_mads(RdmaBackendDev *backend_dev)
+{
+unsigned long cqe_ctx_id;
+
+do {
+cqe_ctx_id = rdma_protected_qlist_pop_int64(&backend_dev->
+recv_mads_list);
+if (cqe_ctx_id != -ENOENT) {
+free_cqe_ctx(GINT_TO_POINTER(cqe_ctx_id),
+ backend_dev->rdma_dev_res);
+}
+} while (cqe_ctx_id != -ENOENT);
+}
+
 static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 {
 int i, ne, total_ne = 0;
@@ -1037,6 +1064,11 @@ static int mad_init(RdmaBackendDev *backend_dev, CharBackend *mad_chr_be)
 return 0;
 }
 
+static void mad_stop(RdmaBackendDev *backend_dev)
+{
+clean_recv_mads(backend_dev);
+}
+
 static void mad_fini(RdmaBackendDev *backend_dev)
 {
 disable_rdmacm_mux_async(backend_dev);
@@ -1224,12 +1256,12 @@ void rdma_backend_start(RdmaBackendDev *backend_dev)
 
 void rdma_backend_stop(RdmaBackendDev *backend_dev)
 {
+mad_stop(backend_dev);
 stop_backend_thread(&backend_dev->comp_thread);
 }
 
 void rdma_backend_fini(RdmaBackendDev *backend_dev)
 {
-rdma_backend_stop(backend_dev);
 mad_fini(backend_dev);
 g_hash_table_destroy(ah_hash);
 ibv_destroy_comp_channel(backend_dev->channel);
diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 8ffe79ceca..90dc9b191b 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -361,6 +361,8 @@ static void pvrdma_fini(PCIDevice *pdev)
 
 pvrdma_qp_ops_fini();
 
+rdma_backend_stop(&dev->backend_dev);
+
 rdma_rm_fini(&dev->rdma_dev_res, &dev->backend_dev,
  dev->backend_eth_device_name);
 
-- 
2.17.2
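
The drain loop in clean_recv_mads above pops IDs until the list reports that it is empty. A minimal sketch of that pattern, assuming a mutex-protected singly linked list instead of QEMU's protected qlist, is shown below. ProtectedIdList, id_list_pop, id_list_drain and add_id_to_sum are hypothetical names; the sketch keeps the ID signed so the comparison with -ENOENT needs no implicit conversion, and it assumes real IDs are never negative.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct IdNode {
    int64_t id;
    struct IdNode *next;
} IdNode;

/* Hypothetical protected list: every access takes the lock. */
typedef struct ProtectedIdList {
    pthread_mutex_t lock;
    IdNode *head;
} ProtectedIdList;

void id_list_init(ProtectedIdList *list)
{
    pthread_mutex_init(&list->lock, NULL);
    list->head = NULL;
}

void id_list_push(ProtectedIdList *list, int64_t id)
{
    IdNode *node;

    assert(id >= 0); /* negative values are reserved for the sentinel */
    node = malloc(sizeof(*node));
    if (!node) {
        abort();
    }
    node->id = id;

    pthread_mutex_lock(&list->lock);
    node->next = list->head;
    list->head = node;
    pthread_mutex_unlock(&list->lock);
}

/* Returns the most recently pushed ID, or -ENOENT when the list is empty. */
int64_t id_list_pop(ProtectedIdList *list)
{
    IdNode *node;
    int64_t id;

    pthread_mutex_lock(&list->lock);
    node = list->head;
    if (node) {
        list->head = node->next;
    }
    pthread_mutex_unlock(&list->lock);

    if (!node) {
        return -ENOENT;
    }
    id = node->id;
    free(node);
    return id;
}

/* Pop until empty, handing each ID to release(); returns the count. */
int id_list_drain(ProtectedIdList *list,
                  void (*release)(int64_t id, void *opaque), void *opaque)
{
    int64_t id;
    int released = 0;

    while ((id = id_list_pop(list)) != -ENOENT) {
        release(id, opaque);
        released++;
    }
    return released;
}

void id_list_destroy(ProtectedIdList *list)
{
    while (id_list_pop(list) != -ENOENT) {
        /* discard leftovers */
    }
    pthread_mutex_destroy(&list->lock);
}

/* Example release callback: accumulates the IDs it is given. */
void add_id_to_sum(int64_t id, void *opaque)
{
    *(int64_t *)opaque += id;
}
```

In the patch the release step looks up the context for the ID, deallocates it from the resource table and frees it; the callback here only stands in for that step.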

[Qemu-devel] [PATCH v2 4/9] {monitor, hw/pvrdma}: Expose device internals via monitor interface

2019-02-12 Thread Yuval Shaia

Allow interrogating device internals through the HMP interface.
The exposed indicators can be used for troubleshooting by developers or
sysadmins.
There is no need to expose these attributes to a management system (e.g.
libvirt) because (1) most of them are not "device-management" related
info and (2) there is no guarantee the interface is stable.

Signed-off-by: Yuval Shaia 
---
 hmp-commands-info.hx  | 16 
 hw/rdma/rdma_backend.c| 70 ++-
 hw/rdma/rdma_rm.c |  7 
 hw/rdma/rdma_rm_defs.h| 27 +-
 hw/rdma/vmw/pvrdma.h  |  5 +++
 hw/rdma/vmw/pvrdma_hmp.h  | 21 +++
 hw/rdma/vmw/pvrdma_main.c | 77 +++
 monitor.c | 10 +
 8 files changed, 215 insertions(+), 18 deletions(-)
 create mode 100644 hw/rdma/vmw/pvrdma_hmp.h

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index cbee8b944d..9153c33974 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -524,6 +524,22 @@ STEXI
 Show CPU statistics.
 ETEXI
 
+#if defined(CONFIG_PVRDMA)
+{
+.name   = "pvrdmacounters",
+.args_type  = "",
+.params = "",
+.help   = "show pvrdma device counters",
+.cmd= hmp_info_pvrdmacounters,
+},
+
+STEXI
+@item info pvrdmacounters
+@findex info pvrdmacounters
+Show pvrdma device counters.
+ETEXI
+#endif
+
 #if defined(CONFIG_SLIRP)
 {
 .name   = "usernet",
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 3a2913facf..0fb4842970 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -64,9 +64,9 @@ static inline void complete_work(enum ibv_wc_status status, uint32_t vendor_err,
 comp_handler(ctx, &wc);
 }
 
-static void rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
+static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 {
-int i, ne;
+int i, ne, total_ne = 0;
 BackendCtx *bctx;
 struct ibv_wc wc[2];
 
@@ -89,12 +89,18 @@ static void rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
 g_free(bctx);
 }
+total_ne += ne;
 } while (ne > 0);
+atomic_sub(&rdma_dev_res->stats.missing_cqe, total_ne);
 qemu_mutex_unlock(&rdma_dev_res->lock);
 
 if (ne < 0) {
 rdma_error_report("ibv_poll_cq fail, rc=%d, errno=%d", ne, errno);
 }
+
+rdma_dev_res->stats.completions += total_ne;
+
+return total_ne;
 }
 
 static void *comp_handler_thread(void *arg)
@@ -122,6 +128,9 @@ static void *comp_handler_thread(void *arg)
 while (backend_dev->comp_thread.run) {
 do {
 rc = qemu_poll_ns(pfds, 1, THR_POLL_TO * (int64_t)SCALE_MS);
+if (!rc) {
+backend_dev->rdma_dev_res->stats.poll_cq_ppoll_to++;
+}
 } while (!rc && backend_dev->comp_thread.run);
 
 if (backend_dev->comp_thread.run) {
@@ -138,6 +147,7 @@ static void *comp_handler_thread(void *arg)
   errno);
 }
 
+backend_dev->rdma_dev_res->stats.poll_cq_from_bk++;
 rdma_poll_cq(backend_dev->rdma_dev_res, ev_cq);
 
 ibv_ack_cq_events(ev_cq, 1);
@@ -271,7 +281,13 @@ int rdma_backend_query_port(RdmaBackendDev *backend_dev,
 
 void rdma_backend_poll_cq(RdmaDeviceResources *rdma_dev_res, RdmaBackendCQ *cq)
 {
-rdma_poll_cq(rdma_dev_res, cq->ibcq);
+int polled;
+
+rdma_dev_res->stats.poll_cq_from_guest++;
+polled = rdma_poll_cq(rdma_dev_res, cq->ibcq);
+if (!polled) {
+rdma_dev_res->stats.poll_cq_from_guest_empty++;
+}
 }
 
 static GHashTable *ah_hash;
@@ -333,7 +349,7 @@ static void ah_cache_init(void)
 
 static int build_host_sge_array(RdmaDeviceResources *rdma_dev_res,
 struct ibv_sge *dsge, struct ibv_sge *ssge,
-uint8_t num_sge)
+uint8_t num_sge, uint64_t *total_length)
 {
 RdmaRmMR *mr;
 int ssge_idx;
@@ -349,6 +365,8 @@ static int build_host_sge_array(RdmaDeviceResources *rdma_dev_res,
 dsge->length = ssge[ssge_idx].length;
 dsge->lkey = rdma_backend_mr_lkey(&mr->backend_mr);
 
+*total_length += dsge->length;
+
 dsge++;
 }
 
@@ -445,8 +463,10 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 rc = mad_send(backend_dev, sgid_idx, sgid, sge, num_sge);
 if (rc) {
 complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_MAD_SEND, ctx);
+backend_dev->rdma_dev_res->stats.mad_tx_err++;
 } else {
 complete_work(IBV_WC_SUCCESS, 0, ctx);
+backend_dev->rdma_dev_res->stats.mad_tx++;
 }
 }
 return;
@@ -458,20 +478,21 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 rc =

[Qemu-devel] [PATCH v2 0/9] Misc fixes to pvrdma device

2019-02-12 Thread Yuval Shaia

Hi,
Please review the following patch-set, which consists of cosmetic fixes to
the device's user interface (traces, error_report and monitor) and some bug
fixes.

Thanks Markus, Eric, Marcel and David for reviewing v0.
Appreciate your review to this v2.

Still missing r-b for patches 2, 4 and 6.

v0 -> v1:
* Explain why device attributes are exposed only in HMP interface.
* Squash the 3 patches related to HMP interface into one.
* Make monitor dump function simple.
* Make HMP interface available only if pvrdma is included (detected by
  build robot).
* Remove patch 03/10 ("Warn when too many consecutive poll CQ triggered
  on an empty CQ") and add the two counters to patch 0/7 (monitor).
* Add Marcel's R-Bs.
* Add mutex protection to cqe_ctx list.
* Add two new patches.

v1 -> v2:
* Rename locked-lists to protected-lists in patch 2 and patch 6.
* Add Marcel's R-Bs.

Thanks,
Yuval


Yuval Shaia (9):
  hw/rdma: Switch to generic error reporting way
  hw/rdma: Introduce protected qlist
  hw/rdma: Protect against concurrent execution of poll_cq
  {monitor,hw/pvrdma}: Expose device internals via monitor interface
  hw/rdma: Free all MAD receive buffers when device is closed
  hw/rdma: Free all receive buffers when QP is destroyed
  hw/pvrdma: Delete unneeded function argument
  hw/pvrdma: Delete pvrdma_exit function
  hw/pvrdma: Unregister from shutdown notifier when device goes down

 hmp-commands-info.hx  |  16 ++
 hw/rdma/rdma_backend.c| 483 +-
 hw/rdma/rdma_backend.h|   3 +-
 hw/rdma/rdma_backend_defs.h   |  10 +-
 hw/rdma/rdma_rm.c | 134 +-
 hw/rdma/rdma_rm_defs.h|  28 +-
 hw/rdma/rdma_utils.c  |  79 +-
 hw/rdma/rdma_utils.h  |  61 ++---
 hw/rdma/trace-events  |  32 ++-
 hw/rdma/vmw/pvrdma.h  |   7 +-
 hw/rdma/vmw/pvrdma_cmd.c  | 113 ++--
 hw/rdma/vmw/pvrdma_dev_ring.c |  26 +-
 hw/rdma/vmw/pvrdma_hmp.h  |  21 ++
 hw/rdma/vmw/pvrdma_main.c | 217 ---
 hw/rdma/vmw/pvrdma_qp_ops.c   |  52 +---
 hw/rdma/vmw/trace-events  |  16 +-
 monitor.c |  10 +
 17 files changed, 709 insertions(+), 599 deletions(-)
 create mode 100644 hw/rdma/vmw/pvrdma_hmp.h

-- 
2.17.2

Re: [Qemu-devel] [Qemu-ppc] [PATCH] cuda: decrease time delay before raising VIA SR interrupt

2019-02-12 Thread Mark Cave-Ayland

On 13/02/2019 00:21, David Gibson wrote:

> On Tue, Feb 12, 2019 at 08:01:22PM +, Mark Cave-Ayland wrote:
>> On 12/02/2019 18:21, Philippe Mathieu-Daudé wrote:
>>
>>> On 2/12/19 6:50 PM, Mark Cave-Ayland wrote:
 On 12/02/2019 17:21, Philippe Mathieu-Daudé wrote:

>>> If this delay is to prevent a bug which only happens in MacOS then 
>>> that's the hack
>>> not the normal code path to run without the delay that you've just 
>>> removed. So maybe
>>> this should be kept if possible to avoid unnecessary delays for other 
>>> guests.
>>> (Although if this only affects mac99,via=cuda but not mac99,via=pmu 
>>> then I don't care
>>> much as long as pmu works.)
>>
>> Well the reality is that the detection above doesn't actually seem to 
>> work anyway -
>> at least a quick boot test with Linux, MacOS X and MacOS 9 with a 
>> printf() added into
>> the if() shows nothing firing once the kernel takes over. So the slow 
>> path with the
>> delay included was always being taken within the OS anyway.
>>
>> And indeed, the code doesn't affect pmu so you won't see any difference 
>> there.
>>
 As a plus it also prevents a guest OS from accidentally triggering the 
 hack whilst
 programming the VIA port.
>>>
>>> That may be a problem though. What's the issue exactly? Why is the 
>>> delay needed in
>>> the first place?
>>
>> It's some kind of racy polling with OS 9 (I wasn't involved in the 
>> technical details,
>> sorry) which causes OS 9 to hang on boot if the delay isn't present. And 
>> even better
>> the slow path that was previously always being taken has now been 
>> reduced from 300us
>> to 30us so whichever way you look at it, having this patch applied is a 
>> win.
>
> Can you write a paragraph about this, that David can amend to your
> patch? That would stop worrying me about looking at this patch in
> various months...

 H well the existing description already describes the interrupt race 
 in OS 9 so I
 guess the only part missing is the bit about the fast path. How about the 
 revised
 text below for the patch description?

 cuda: decrease time delay before raising VIA SR interrupt and remove 
 fast path

 In order to handle a race condition in the MacOS 9 CUDA driver, a 
 delay was
 introduced when raising the VIA SR interrupt inspired by similar code 
 in
 MacOnLinux.

 During original testing of the MacOS 9 patches it was found that the 
 30us
 delay used in MacOnLinux did not work reliably within QEMU, and a 
 value of
 300us was required to function correctly.

 Recent experiments have shown two things: firstly when booting Linux, 
 MacOS
 9 and MacOS X the fast path which bypasses the delay is never 
 triggered once the
 OS kernel is loaded making it effectively useless. Rather than leave 
 this code
 in place where a guest could potentially enable it by accident and 
 break itself,
 we might as well just remove it.

 Secondly the previous reliability issues are no longer present, and 
 this value
 can be reduced down to 20us with no apparent ill effects. This has the 
 benefit of
 considerably improving the responsiveness of the ADB keyboard and 
 mouse within
 the guest.

 Signed-off-by: Mark Cave-Ayland 

>>>
>>> Thanks!
>>>
>>> Phil.
>>
>> No worries. David, are you able to update the commit message in your 
>> ppc-for-4.0
>> branch accordingly?
> 
> Done.

Great, thanks!

ATB,

Mark.

[Qemu-devel] [PATCH v2 9/9] hw/pvrdma: Unregister from shutdown notifier when device goes down

2019-02-12 Thread Yuval Shaia

This hook was installed to close the device when VM is going down.
After the device is closed there is no need to be informed on VM
shutdown.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum 
---
 hw/rdma/vmw/pvrdma_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/rdma/vmw/pvrdma_main.c b/hw/rdma/vmw/pvrdma_main.c
index 8b379c6435..1177c0822f 100644
--- a/hw/rdma/vmw/pvrdma_main.c
+++ b/hw/rdma/vmw/pvrdma_main.c
@@ -359,6 +359,8 @@ static void pvrdma_fini(PCIDevice *pdev)
 {
 PVRDMADev *dev = PVRDMA_DEV(pdev);
 
+notifier_remove(&dev->shutdown_notifier);
+
 pvrdma_qp_ops_fini();
 
 rdma_backend_stop(&dev->backend_dev);
-- 
2.17.2

Re: [Qemu-devel] [PATCH v3 0/3] Trivial cleanup in hw/acpi

2019-02-12 Thread Wei Yang

On Wed, Feb 13, 2019 at 07:32:23AM +0100, Philippe Mathieu-Daudé wrote:
>Hi Wei,
>
>On 2/13/19 7:26 AM, Wei Yang wrote:
>> On Tue, Feb 12, 2019 at 12:34:31AM -0500, Michael S. Tsirkin wrote:
>>> On Tue, Feb 12, 2019 at 01:22:24PM +0800, Wei Yang wrote:
 On Wed, Jan 30, 2019 at 08:06:50AM +0800, Wei Yang wrote:
> There are several functions/variable which are not used anymore.
>
> This series just removes those without functional change.
>
> v3: add ack and repost in a new thread
> v2: change commit log from "is now used in no place" to "in not used 
> anymore"

 Michael,

 It looks like this series is not merged yet.

 Is there any problem I need to fix?
>>>
>>> Yes pls repost with fixed reviewed-by tags.
>>>
>> 
>> I tried to use git-sendemail to send those patches again.
>> 
>> The character é looks good in patch file, while after git-sendemail this is
>> changed.
>> 
>> My encoding property in mail is :
>> 
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Language: en-US
>> Content-Transfer-Encoding: 8bit
>
>I'm not sure but you can try:
>
>  Content-Transfer-Encoding: base64
>

That does not seem to work.

git send-email detects that the file's encoding is 8bit, so manually changing it
to base64 doesn't work.

>> 
>> Looks the same as Philippe's.
>> 
>> Any hint on how to send out a correct character with git-sendemail?
>
>If it doesn't work, then use my ASCII-reduced lastname, I can live with
>that:
>
>"Reviewed-by: Philippe Mathieu-Daude "
>
>Thanks for trying,

Thanks :-)

Let's wait a while to see whether someone else has an idea on that.

>
>Phil.
>
>> 
>>>
>
> Wei Yang (3):
>  hw/i386/pc.c: remove unused function pc_acpi_init()
>  hw/acpi: remove unused function acpi_table_add_builtin()
>  hw/acpi: remove unnecessary variable acpi_table_builtin
>
> hw/acpi/core.c | 10 +-
> hw/i386/pc.c   | 27 ---
> include/hw/acpi/acpi.h |  1 -
> include/hw/i386/pc.h   |  1 -
> 4 files changed, 1 insertion(+), 38 deletions(-)

-- 
Wei Yang
Help you, Help me

[Qemu-devel] [PATCH v2 7/9] hw/pvrdma: Delete unneeded function argument

2019-02-12 Thread Yuval Shaia

The function argument rdma_dev_res is not needed as it is stored in the
backend_dev object at init.

Signed-off-by: Yuval Shaia 
Reviewed-by: Marcel Apfelbaum
---
 hw/rdma/rdma_backend.c  | 13 ++---
 hw/rdma/rdma_backend.h  |  1 -
 hw/rdma/vmw/pvrdma_qp_ops.c |  3 +--
 3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 83a68057a7..4c994e9a39 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -594,7 +594,6 @@ static unsigned int save_mad_recv_buffer(RdmaBackendDev *backend_dev,
 }
 
 void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
-RdmaDeviceResources *rdma_dev_res,
 RdmaBackendQP *qp, uint8_t qp_type,
 struct ibv_sge *sge, uint32_t num_sge, void *ctx)
 {
@@ -613,9 +612,9 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 rc = save_mad_recv_buffer(backend_dev, sge, num_sge, ctx);
 if (rc) {
 complete_work(IBV_WC_GENERAL_ERR, rc, ctx);
-rdma_dev_res->stats.mad_rx_bufs_err++;
+backend_dev->rdma_dev_res->stats.mad_rx_bufs_err++;
 } else {
-rdma_dev_res->stats.mad_rx_bufs++;
+backend_dev->rdma_dev_res->stats.mad_rx_bufs++;
 }
 }
 return;
@@ -625,7 +624,7 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 bctx->up_ctx = ctx;
 bctx->backend_qp = qp;
 
-rc = rdma_rm_alloc_cqe_ctx(rdma_dev_res, &bctx_id, bctx);
+rc = rdma_rm_alloc_cqe_ctx(backend_dev->rdma_dev_res, &bctx_id, bctx);
 if (unlikely(rc)) {
 complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
 goto err_free_bctx;
@@ -633,7 +632,7 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 
 rdma_protected_gslist_append_int32(&qp->cqe_ctx_list, bctx_id);
 
-rc = build_host_sge_array(rdma_dev_res, new_sge, sge, num_sge,
+rc = build_host_sge_array(backend_dev->rdma_dev_res, new_sge, sge, num_sge,
   &backend_dev->rdma_dev_res->stats.rx_bufs_len);
 if (rc) {
 complete_work(IBV_WC_GENERAL_ERR, rc, ctx);
@@ -652,13 +651,13 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 }
 
 atomic_inc(&backend_dev->rdma_dev_res->stats.missing_cqe);
-rdma_dev_res->stats.rx_bufs++;
+backend_dev->rdma_dev_res->stats.rx_bufs++;
 
 return;
 
 err_dealloc_cqe_ctx:
 backend_dev->rdma_dev_res->stats.rx_bufs_err++;
-rdma_rm_dealloc_cqe_ctx(rdma_dev_res, bctx_id);
+rdma_rm_dealloc_cqe_ctx(backend_dev->rdma_dev_res, bctx_id);
 
 err_free_bctx:
 g_free(bctx);
diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index cb5efa2a3a..5d507a1c41 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -111,7 +111,6 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 union ibv_gid *dgid, uint32_t dqpn, uint32_t dqkey,
 void *ctx);
 void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
-RdmaDeviceResources *rdma_dev_res,
 RdmaBackendQP *qp, uint8_t qp_type,
 struct ibv_sge *sge, uint32_t num_sge, void *ctx);
 
diff --git a/hw/rdma/vmw/pvrdma_qp_ops.c b/hw/rdma/vmw/pvrdma_qp_ops.c
index 16db726dac..508d8fca3c 100644
--- a/hw/rdma/vmw/pvrdma_qp_ops.c
+++ b/hw/rdma/vmw/pvrdma_qp_ops.c
@@ -231,8 +231,7 @@ void pvrdma_qp_recv(PVRDMADev *dev, uint32_t qp_handle)
 continue;
 }
 
-rdma_backend_post_recv(&dev->backend_dev, &dev->rdma_dev_res,
-   &qp->backend_qp, qp->qp_type,
+rdma_backend_post_recv(&dev->backend_dev, &qp->backend_qp, qp->qp_type,
 (struct ibv_sge *)&wqe->sge[0], wqe->hdr.num_sge,
comp_ctx);
 
-- 
2.17.2

Re: [Qemu-devel] [Qemu-block] [RFC PATCH] coroutines: generate wrapper code

2019-02-12 Thread Stefan Hajnoczi

On Tue, Feb 12, 2019 at 12:58:40PM +0100, Kevin Wolf wrote:
> Am 12.02.2019 um 04:22 hat Stefan Hajnoczi geschrieben:
> > On Mon, Feb 11, 2019 at 09:38:37AM +, Vladimir Sementsov-Ogievskiy 
> > wrote:
> > > 11.02.2019 6:42, Stefan Hajnoczi wrote:
> > > > On Fri, Feb 08, 2019 at 05:11:22PM +0300, Vladimir Sementsov-Ogievskiy 
> > > > wrote:
> > > >> Hi all!
> > > >>
> > > >> We have a very frequent pattern of wrapping a coroutine_fn function
> > > >> to be called from non-coroutine context:
> > > >>
> > > >>- create structure to pack parameters
> > > >>- create function to call original function taking parameters from
> > > >>  struct
> > > >>- create wrapper, which in case of non-coroutine context will
> > > >>  create a coroutine, enter it and start poll-loop.
> > > >>
> > > >> Here is a draft of template code + example how it can be used to drop a
> > > >> lot of similar code.
> > > >>
> > > >> Hope someone like it except me)
> > > > 
> > > > My 2 cents.  Cons:
> > > > 
> > > >   * Synchronous poll loops are an anti-pattern.  They block all of QEMU
> > > > with the big mutex held.  Making them easier to write is
> > > > questionable because we should aim to have as few of these as
> > > > possible.
> > > 
> > > Understand. Do we have a concept or a kind of target for a future to get 
> > > rid of
> > > these a lot of poll-loops? What is the right way? At least for 
> > > block-layer?
> > 
> > It's non-trivial.  The nested event loop could be flattened if there was
> > a mechanism to stop further activity on a specific object only (e.g.
> > BlockDriverState).  That way the event loop can continue processing
> > events for other objects and device emulation could continue for other
> > objects.
> 
> The mechanism to stop activity on BlockDriverStates is bdrv_drain(). But
> I don't see how this is related. Nested event loops aren't for stopping
> concurrent activity (events related to async operations started earlier
> are still processed in nested event loops), but for making progress on
> the operation we're waiting for. They happen when synchronous code calls
> into asynchronous code.
> 
> The way to get rid of them is making their callers async. I think we
> would come a long way if we ran QMP command handlers (at least the block
> related ones) and qemu-img operations in coroutines instead of blocking
> while we wait for the result.

A difficult caller is device reset, where we need to drain all requests.
But even converting some sync code paths to async is a win because it
removes places where QEMU can get stuck.

Regarding block QMP handlers, do you mean suspending the monitor when
a command yields?  The monitor will be unresponsive to the outside
world, so this doesn't solve the problem from the QMP client's
perspective.  This is why async QMP and jobs are interesting but it's a
lot of work both inside QEMU and for clients like libvirt.

> 
> > Unfortunately there are interactions between objects like in block jobs
> > that act on multiple BDSes, so it becomes even tricky.
> > 
> > A simple way of imagining this is to make each object an "actor"
> > coroutine.  The coroutine processes a single message (request) at a time
> > and yields when it needs to wait.  Callers send messages and expect
> > asynchronous responses.  This model is bad for efficiency (parallelism
> > is necessary) but at least it offers a sane way of thinking about
> > multiple asynchronous components coordinating together.  (It's another
> > way of saying, let's put everything into coroutines.)
> > 
> > The advantage of a flat event loop is that a hang in one object (e.g.
> > I/O getting stuck in one file) doesn't freeze the entire event loop.
> 
> I think this one is more theoretical because you'll still have
> dependencies between the components. blk_drain_all() isn't hanging
> because the code is designed suboptimally, but because its semantics is
> to wait until all requests have completed. And it's called because this
> semantics is required.

If we try to convert everything to async there will be two cases:
1. Accidental sync code which can be made async.  (Rare nowadays?)
2. Fundamental synchronization points that require waiting.

When you reach a point that hangs there is still the possibility of a
timeout or an explicit cancel.  Today QEMU supports neither, so a
command that gets stuck will hang QEMU for as long as it takes.

If QMP clients want timeouts or cancel then making everything async is
necessary.  If not, then we can leave it as is and simply audit the code
for accidental sync code (there used to be a lot of this but it's rarer
now) and convert it.

Stefan
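
For readers following the thread, the three-part wrapper pattern from the original RFC (parameter structure, entry function, synchronous wrapper with a poll loop) can be sketched without any QEMU code. This is a simulation under stated assumptions: there is no real coroutine here, and FakeOpCo, fake_op_entry and fake_op_sync are hypothetical names. Each call to fake_op_entry stands for one event-loop turn, and the wrapper blocks its caller until the operation reports a result, which is the behaviour discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* 1. Structure that packs the parameters and carries the result back. */
typedef struct FakeOpCo {
    int64_t offset; /* parameter of the wrapped operation */
    int turns_left; /* simulated event-loop turns before completion */
    int ret;        /* -EINPROGRESS until the operation has finished */
} FakeOpCo;

/* 2. Entry function: unpacks the struct and makes one step of progress. */
void fake_op_entry(FakeOpCo *co)
{
    if (co->turns_left > 0) {
        co->turns_left--; /* still waiting, as if the coroutine yielded */
        return;
    }
    co->ret = co->offset >= 0 ? 0 : -EINVAL;
}

/*
 * 3. Synchronous wrapper: packs the arguments, then polls until the
 * operation is done. The caller makes no other progress meanwhile.
 * If polls is not NULL it receives the number of loop iterations.
 */
int fake_op_sync(int64_t offset, int turns, int *polls)
{
    FakeOpCo co = {
        .offset = offset,
        .turns_left = turns,
        .ret = -EINPROGRESS,
    };
    int iterations = 0;

    assert(turns >= 0);
    while (co.ret == -EINPROGRESS) {
        fake_op_entry(&co);
        iterations++;
    }
    if (polls != NULL) {
        *polls = iterations;
    }
    return co.ret;
}
```

The loop in fake_op_sync has no timeout and no cancel path, so an operation that never completes would spin forever.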



[Qemu-devel] [PATCH v2 6/9] hw/rdma: Free all receive buffers when QP is destroyed

2019-02-12 Thread Yuval Shaia

When a QP is destroyed, the backend QP is destroyed as well. This ensures
we clean all receive buffers we posted to it.
However, the contexts of these buffers still remain in the device.
Fix it by maintaining a list of buffer contexts and freeing them when the QP
is destroyed.

Signed-off-by: Yuval Shaia 
---
 hw/rdma/rdma_backend.c  | 26 --
 hw/rdma/rdma_backend.h  |  2 +-
 hw/rdma/rdma_backend_defs.h |  2 +-
 hw/rdma/rdma_rm.c   |  2 +-
 hw/rdma/rdma_utils.c| 29 +
 hw/rdma/rdma_utils.h| 11 +++
 6 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index e8ee205b5f..83a68057a7 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -39,6 +39,7 @@
 typedef struct BackendCtx {
 void *up_ctx;
 struct ibv_sge sge; /* Used to save MAD recv buffer */
+RdmaBackendQP *backend_qp; /* To maintain recv buffers */
 } BackendCtx;
 
 struct backend_umad {
@@ -73,6 +74,7 @@ static void free_cqe_ctx(gpointer data, gpointer user_data)
 bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, cqe_ctx_id);
 if (bctx) {
 rdma_rm_dealloc_cqe_ctx(rdma_dev_res, cqe_ctx_id);
+atomic_dec(&rdma_dev_res->stats.missing_cqe);
 }
 g_free(bctx);
 }
@@ -85,13 +87,15 @@ static void clean_recv_mads(RdmaBackendDev *backend_dev)
 cqe_ctx_id = rdma_protected_qlist_pop_int64(&backend_dev->
 recv_mads_list);
 if (cqe_ctx_id != -ENOENT) {
+atomic_inc(&backend_dev->rdma_dev_res->stats.missing_cqe);
 free_cqe_ctx(GINT_TO_POINTER(cqe_ctx_id),
  backend_dev->rdma_dev_res);
 }
 } while (cqe_ctx_id != -ENOENT);
 }
 
-static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
+static int rdma_poll_cq(RdmaBackendDev *backend_dev,
+RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 {
 int i, ne, total_ne = 0;
 BackendCtx *bctx;
@@ -113,6 +117,8 @@ static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 
 comp_handler(bctx->up_ctx, &wc[i]);
 
+rdma_protected_gslist_remove_int32(&bctx->backend_qp->cqe_ctx_list,
+   wc[i].wr_id);
 rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
 g_free(bctx);
 }
@@ -175,14 +181,12 @@ static void *comp_handler_thread(void *arg)
 }
 
 backend_dev->rdma_dev_res->stats.poll_cq_from_bk++;
-rdma_poll_cq(backend_dev->rdma_dev_res, ev_cq);
+rdma_poll_cq(backend_dev, backend_dev->rdma_dev_res, ev_cq);
 
 ibv_ack_cq_events(ev_cq, 1);
 }
 }
 
-/* TODO: Post cqe for all remaining buffs that were posted */
-
 backend_dev->comp_thread.is_running = false;
 
 qemu_thread_exit(0);
@@ -311,7 +315,7 @@ void rdma_backend_poll_cq(RdmaDeviceResources *rdma_dev_res, RdmaBackendCQ *cq)
 int polled;
 
 rdma_dev_res->stats.poll_cq_from_guest++;
-polled = rdma_poll_cq(rdma_dev_res, cq->ibcq);
+polled = rdma_poll_cq(cq->backend_dev, rdma_dev_res, cq->ibcq);
 if (!polled) {
 rdma_dev_res->stats.poll_cq_from_guest_empty++;
 }
@@ -501,6 +505,7 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 
 bctx = g_malloc0(sizeof(*bctx));
 bctx->up_ctx = ctx;
+bctx->backend_qp = qp;
 
 rc = rdma_rm_alloc_cqe_ctx(backend_dev->rdma_dev_res, &bctx_id, bctx);
 if (unlikely(rc)) {
@@ -508,6 +513,8 @@ void rdma_backend_post_send(RdmaBackendDev *backend_dev,
 goto err_free_bctx;
 }
 
+rdma_protected_gslist_append_int32(&qp->cqe_ctx_list, bctx_id);
+
 rc = build_host_sge_array(backend_dev->rdma_dev_res, new_sge, sge, num_sge,
   &backend_dev->rdma_dev_res->stats.tx_len);
 if (rc) {
@@ -616,6 +623,7 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 
 bctx = g_malloc0(sizeof(*bctx));
 bctx->up_ctx = ctx;
+bctx->backend_qp = qp;
 
 rc = rdma_rm_alloc_cqe_ctx(rdma_dev_res, &bctx_id, bctx);
 if (unlikely(rc)) {
@@ -623,6 +631,8 @@ void rdma_backend_post_recv(RdmaBackendDev *backend_dev,
 goto err_free_bctx;
 }
 
+rdma_protected_gslist_append_int32(&qp->cqe_ctx_list, bctx_id);
+
 rc = build_host_sge_array(rdma_dev_res, new_sge, sge, num_sge,
   &backend_dev->rdma_dev_res->stats.rx_bufs_len);
 if (rc) {
@@ -762,6 +772,8 @@ int rdma_backend_create_qp(RdmaBackendQP *qp, uint8_t qp_type,
 return -EIO;
 }
 
+rdma_protected_gslist_init(&qp->cqe_ctx_list);
+
 qp->ibpd = pd->ibpd;
 
 /* TODO: Query QP to get max_inline_data and save it to be used in send */
@@ -919,11 +931,13 @@ int rdma_backend_query_qp(RdmaBackendQP *qp, struct ibv_qp_attr *attr,
 return ibv_query_qp(qp->ibqp, attr, attr_mask, init_attr);
 }
 
-void rdma_backend_destroy_qp(RdmaBackendQP
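
[Editorial sketch, not part of the patch.] The bookkeeping described in the
commit message (remember each posted buffer's context id per QP, forget it
on completion, release whatever is left when the QP is destroyed) could look
roughly like the following. All names here (SketchQP, sketch_qp_track and so
on) are hypothetical, and the sketch is single-threaded, whereas the patch
uses the mutex-protected rdma_protected_gslist helpers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-QP list of posted context ids. */
typedef struct CtxNode {
    int32_t id;
    struct CtxNode *next;
} CtxNode;

typedef struct SketchQP {
    CtxNode *ctx_list; /* ids of buffers posted but not yet completed */
} SketchQP;

/* Called when a buffer is posted: remember its context id. */
static int sketch_qp_track(SketchQP *qp, int32_t id)
{
    CtxNode *n = malloc(sizeof(*n));
    if (!n) {
        return -1;
    }
    n->id = id;
    n->next = qp->ctx_list;
    qp->ctx_list = n;
    return 0;
}

/* Called on completion: forget the id. Returns 1 if it was tracked. */
static int sketch_qp_untrack(SketchQP *qp, int32_t id)
{
    for (CtxNode **pp = &qp->ctx_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            CtxNode *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}

/*
 * Called on QP destroy: release every context still outstanding.
 * release() is where a real device would free the context itself.
 * Returns the number of contexts that were still outstanding.
 */
static int sketch_qp_drain(SketchQP *qp, void (*release)(int32_t id))
{
    int freed = 0;
    while (qp->ctx_list) {
        CtxNode *n = qp->ctx_list;
        qp->ctx_list = n->next;
        if (release) {
            release(n->id);
        }
        free(n);
        freed++;
    }
    return freed;
}
```

The point of the design is that destroy no longer depends on completions
arriving: anything not untracked by the poll path is still on the list and
gets released in one pass.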

Re: [Qemu-devel] [PATCH] hw/display: Add basic ATI VGA emulation

2019-02-12 Thread Mark Cave-Ayland

On 12/02/2019 23:59, BALATON Zoltan wrote:

> Hello,
> 
> On Tue, 12 Feb 2019, Philippe Mathieu-Daudé wrote:
>> Hi Zoltan,
> 
> Thanks for the quick review and testing. I'll use your suggestions for the
> other (mips) patches in a v2. For this one I'm not convinced.
> 
>> On 2/11/19 4:19 AM, BALATON Zoltan wrote:
> [...]
>>> +
>>> +static void ati_reg_write_offs(uint32_t *reg, int offs,
>>> +   uint64_t data, unsigned int size)
>>> +{
>>> +    int shift, i;
>>> +    uint32_t mask;
>>> +
>>> +    for (i = 0; i < size; i++) {
>>> +    shift = (offs + i) * 8;
>>> +    mask = 0xffUL << shift;
>>> +    *reg &= ~mask;
>>> +    *reg |= (data & 0xff) << shift;
>>> +    data >>= 8;
>>
>> I'd have used a pair of extract32/deposit32 but this is probably easier
>> to singlestep.
> 
> You've told me that before but I have concerns about the asserts in those
> functions which to me seem like unnecessary overhead in such low level
> functions so unless these are removed or *_noassert versions introduced I'll
> stay away from them.
> 
> But I'm also not too happy about these *_offs functions but some registers
> support 8/16/32 bit access and guest code seems to actually do this to update
> bits in the middle of the register at an odd address. Best would be if I could
> just set .impl.min = 4, .impl.max = 4 and .valid.min = 1 .valid.max = 4 for
> the mem region ops but I'm not sure that would work or would it? If that's
> working maybe I should just go with that instead.
> 
> [...]
>>> diff --git a/hw/display/ati_int.h b/hw/display/ati_int.h
>>> new file mode 100644
>>> index 00..85d045517c
>>> --- /dev/null
>>> +++ b/hw/display/ati_int.h
>>> @@ -0,0 +1,67 @@
>>> +/*
>>> + * QEMU ATI SVGA emulation
>>> + *
>>> + * Copyright (c) 2019 BALATON Zoltan
>>> + *
>>> + * This work is licensed under the GNU GPL license version 2 or later.
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "hw/pci/pci.h"
>>> +#include "vga_int.h"
>>> +
>>> +#undef DEBUG_ATI
>>> +
>>> +#ifdef DEBUG_ATI
>>> +#define DPRINTF(fmt, ...) printf("%s: " fmt, __func__, ## __VA_ARGS__)
>>> +#else
>>> +#define DPRINTF(fmt, ...) do {} while (0)
>>
>> Please use tracepoints (you already add some!).
> 
> I won't and here's why: This is not a finished device model and I expect to
> need to add debug logs and change them frequently during further development
> and for such ad-hoc debugging DPRINTF is still easier to use because I don't
> have to define the format string at one file and use them somewhere else.
> With DPRINTF I can just add a debug log at one place and change it easily
> without editing it at two unrelated places so it's easier to work with. Once
> development is finished those that we intend to leave in for later tracing
> can be converted to trace points (for which trace point is better) and at
> that point remove the DPRINTF macro. We still have enough DPRINTFs in QEMU so
> this should be OK. I've already added trace points to two such places but
> even for those I almost considered ditching them when checkpatch insisted I
> have to add 0x prefix to hex numbers (I don't like this because I know these
> are hex and printing e.g. 0x8 instead of 8 is just distracting from the
> actual important value which is what counts when I'm looking at a lot of
> these during debugging. Anything that distracts from actual values and makes
> it harder to read (such as timestamps and pids added by trace) is bad so I've
> considered going back to DPRINTF even for those trace points but will see if
> I can live with these for now.) But those that are still DPRINTFs won't be
> converted to trace but supposed to be removed when no longer needed.
> 
> [...]
>> I don't understand well the display code, but the result works very
>> well, nice work :)
>>
>> Tested-by: Philippe Mathieu-Daudé 
> 
> Thanks, it's a start and currently only targeting Linux console with a lot
> more to do for it to be more useful. But I have limited time for this so
> since it's already useful to get mips_fulong2e working I thought that
> justifies including it now so others have a chance to look at it and maybe
> even help to improve it which can't happen if it's only sitting on my machine.

This looks interesting, however I never received the original via the mailing
list. Did it get held somewhere because of its size?

Also it's probably worth pushing it to a suitable git repo since then it's
easier for people to pull and update as required.


ATB,

Mark.
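
[Editorial sketch, not part of the thread.] For readers comparing the two
approaches discussed above for sub-register writes, here is a minimal,
self-contained illustration. sketch_reg_write_offs mirrors the byte loop of
the quoted ati_reg_write_offs (with an added bounds check that the original
does not have), and sketch_deposit32 is a local stand-in written for this
example; it is not QEMU's deposit32 and carries no asserts.

```c
#include <stdint.h>

/*
 * Byte-wise update, as in the quoted patch: write `size` bytes of `data`
 * into `*reg`, starting at byte offset `offs` (little-endian byte order).
 * The bounds check is an addition made for this sketch.
 */
static void sketch_reg_write_offs(uint32_t *reg, int offs,
                                  uint64_t data, unsigned int size)
{
    if (offs < 0 || (unsigned int)offs + size > 4) {
        return;
    }
    for (unsigned int i = 0; i < size; i++) {
        int shift = (offs + (int)i) * 8;
        uint32_t mask = 0xffu << shift;
        *reg &= ~mask;
        *reg |= (uint32_t)(data & 0xff) << shift;
        data >>= 8;
    }
}

/*
 * Field-wise update in one step: replace `length` bits of `value`
 * starting at bit `start` with the low bits of `field`.
 * Local helper for illustration only.
 */
static uint32_t sketch_deposit32(uint32_t value, int start, int length,
                                 uint32_t field)
{
    uint32_t mask = (length >= 32) ? ~0u
                                   : (((1u << length) - 1u) << start);
    return (value & ~mask) | ((field << start) & mask);
}
```

Writing the two bytes 0xAABB at byte offset 1 of 0x11223344 gives the same
result either way: the loop touches bytes 1 and 2 in turn, while the deposit
form replaces bits 8..23 in a single masked operation.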

Re: [Qemu-devel] [PATCH] hostmem-file: reject invalid pmem file sizes

2019-02-12 Thread Stefan Hajnoczi

On Tue, Feb 12, 2019 at 03:44:46PM +0100, Igor Mammedov wrote:
> > diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> > index ba601ce940..325ab4aad9 100644
> > --- a/backends/hostmem-file.c
> > +++ b/backends/hostmem-file.c
> > @@ -46,6 +46,22 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
> >  gchar *name;
> >  #endif
> >  
> > +/*
> > + * Verify pmem file size since starting a guest with an incorrect size
> > + * leads to confusing failures inside the guest.
> > + */
> > +if (fb->is_pmem && fb->mem_path) {
> > +uint64_t size;
> > +
> > +size = qemu_get_pmem_size(fb->mem_path, NULL);
>
> Did you ignore error intentionally?

Hmm...I think I can now propagate the error.  Originally the function
only handled devdax chardevs so it would fail for a regular file on a
DAX file system and that shouldn't stop QEMU startup.  But now that it
supports regular files too I can't think of inputs that lead to a false
positive.

Will fix in v2.

Stefan
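
[Editorial sketch, not part of the thread.] The change discussed above
(report a failed size query instead of passing NULL and ignoring it) can be
illustrated with a small stand-alone check. Everything here is hypothetical:
the function name, the error buffer convention, and in particular the rule
that the requested size must not exceed the file size, which is an assumption
about what "incorrect size" means and is not confirmed by the quoted hunk.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * query_rc:   result of the (hypothetical) size query, 0 on success
 * file_size:  size reported by that query, valid only if query_rc == 0
 * requested:  size the user asked the backend to map
 *
 * Returns 0 if startup may continue, -1 with a message in err otherwise.
 */
static int sketch_validate_pmem_size(int query_rc, uint64_t file_size,
                                     uint64_t requested,
                                     char *err, size_t errlen)
{
    if (query_rc != 0) {
        /* Propagate the query failure rather than silently ignoring it. */
        snprintf(err, errlen, "cannot determine pmem file size");
        return -1;
    }
    if (requested > file_size) {
        snprintf(err, errlen,
                 "requested size %" PRIu64 " exceeds pmem file size %" PRIu64,
                 requested, file_size);
        return -1;
    }
    return 0;
}
```

Failing early with a clear message is the motivation given in the patch
comment: a guest started with a mismatched size otherwise fails later in
ways that are hard to trace back to the backing file.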



Re: [Qemu-devel] [PATCH v3 0/3] Trivial cleanup in hw/acpi

2019-02-12 Thread Philippe Mathieu-Daudé

Hi Wei,

On 2/13/19 7:26 AM, Wei Yang wrote:
> On Tue, Feb 12, 2019 at 12:34:31AM -0500, Michael S. Tsirkin wrote:
>> On Tue, Feb 12, 2019 at 01:22:24PM +0800, Wei Yang wrote:
>>> On Wed, Jan 30, 2019 at 08:06:50AM +0800, Wei Yang wrote:
>>>> There are several functions/variables which are not used anymore.
>>>>
>>>> This series just removes those without functional change.
>>>>
>>>> v3: add ack and repost in a new thread
>>>> v2: change commit log from "is now used in no place" to "in not used
>>>> anymore"
>>>
>>> Michael,
>>>
>>> Looks like this series is not merged yet.
>>>
>>> Is there any problem I need to fix?
>>
>> Yes pls repost with fixed reviewed-by tags.
>>
> 
> I tried to use git-sendemail to send those patches again.
> 
> The character 茅 looks good in patch file, while after git-sendemail this is
> changed.
> 
> My encoding property in mail is :
> 
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Language: en-US
> Content-Transfer-Encoding: 8bit

I'm not sure but you can try:

  Content-Transfer-Encoding: base64

> 
> Looks the same as Philippe's.
> 
> Any hint on how to send out a correct character with git-sendemail?

If it doesn't work, then use my ASCII-reduced lastname, I can live with
that:

"Reviewed-by: Philippe Mathieu-Daude "

Thanks for trying,

Phil.

> 
>>

>>>> Wei Yang (3):
>>>>   hw/i386/pc.c: remove unused function pc_acpi_init()
>>>>   hw/acpi: remove unused function acpi_table_add_builtin()
>>>>   hw/acpi: remove unnecessary variable acpi_table_builtin
>>>>
>>>>  hw/acpi/core.c | 10 +-
>>>>  hw/i386/pc.c   | 27 ---
>>>>  include/hw/acpi/acpi.h |  1 -
>>>>  include/hw/i386/pc.h   |  1 -
>>>>  4 files changed, 1 insertion(+), 38 deletions(-)

Re: [Qemu-devel] [PATCH v3 0/3] Trivial cleanup in hw/acpi

2019-02-12 Thread Wei Yang

On Tue, Feb 12, 2019 at 12:34:31AM -0500, Michael S. Tsirkin wrote:
>On Tue, Feb 12, 2019 at 01:22:24PM +0800, Wei Yang wrote:
>> On Wed, Jan 30, 2019 at 08:06:50AM +0800, Wei Yang wrote:
>> >There are several functions/variables which are not used anymore.
>> >
>> >This series just removes those without functional change.
>> >
>> >v3: add ack and repost in a new thread
>> >v2: change commit log from "is now used in no place" to "in not used 
>> >anymore"
>> 
>> Michael,
>> 
>> Looks like this series is not merged yet.
>> 
>> Is there any problem I need to fix?
>
>Yes pls repost with fixed reviewed-by tags.
>

I tried to use git-sendemail to send those patches again.

The character 茅 looks good in patch file, while after git-sendemail this is
changed.

My encoding property in mail is :

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit

Looks the same as Philippe's.

Any hint on how to send out a correct character with git-sendemail?

>
>> >
>> >Wei Yang (3):
>> >  hw/i386/pc.c: remove unused function pc_acpi_init()
>> >  hw/acpi: remove unused function acpi_table_add_builtin()
>> >  hw/acpi: remove unnecessary variable acpi_table_builtin
>> >
>> > hw/acpi/core.c | 10 +-
>> > hw/i386/pc.c   | 27 ---
>> > include/hw/acpi/acpi.h |  1 -
>> > include/hw/i386/pc.h   |  1 -
>> > 4 files changed, 1 insertion(+), 38 deletions(-)
>> >
>> >-- 
>> >2.19.1
>> 
>> -- 
>> Wei Yang
>> Help you, Help me

-- 
Wei Yang
Help you, Help me

Re: [Qemu-devel] Combining -loadvm and -snapshot

2019-02-12 Thread Markus Armbruster

Cc'ing the QCOW2 folks.

Drew DeVault  writes:

> I recently ran into an issue where I found I couldn't combine the
> -loadvm and -snapshot flags, nor any conceivable combination of
> alternate approaches like loadvm via the monitor. Independently, both
> options work as expected, but together I get this error:
>
> qemu-system-x86_64: Device 'virtio0' does not have the requested snapshot 'base'
>
> The goal here is to resume the VM state from a snapshot, but to prevent
> the guest from persisting writes to the underlying qcow2.
>
> I started digging into the code to understand this problem more, and I
> was pretty deep in the weeds when I realized what the underlying problem
> probably was and the kind of refactoring necessary to fix it - so I'm
> here to touch base before moving any further.
>
> I believe this happens because -snapshot creates a temporary qcow2
> overlaid on top of the disk you're using, and this overlay does not have
> any snapshots copied, nor does any of the snapshot reading code (e.g.
> qcow2_snapshot_list or qcow2_snapshot_goto) iterate over backing disks
> to load their snapshots.
>
> At first I was going to adjust the qcow2 snapshot loading code (those
> two functions in particular) to read through their backends, but I'm a
> little unfamiliar with this code and the refactoring is not minor so I
> would like to get feedback from some of the wiser folks on this mailing
> list before I sink too much time into this.
>
> Thoughts?
>
> --
> Drew DeVault
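
[Editorial sketch, not part of the thread.] The diagnosis quoted above (the
temporary overlay created by -snapshot has no snapshots of its own, and the
lookup does not descend into backing images) can be modelled in a few lines.
The SketchImage type and both lookup functions are hypothetical and say
nothing about how the real qcow2 code is structured; they only contrast a
top-only lookup with one that walks the backing chain.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an image: its own snapshots plus a backing image. */
typedef struct SketchImage {
    const char *const *snapshots;      /* NULL-terminated names, may be NULL */
    const struct SketchImage *backing; /* NULL for the base image */
} SketchImage;

static int sketch_has_snapshot(const SketchImage *img, const char *name)
{
    if (!img->snapshots) {
        return 0;
    }
    for (size_t i = 0; img->snapshots[i]; i++) {
        if (strcmp(img->snapshots[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Behaviour described in the report: only the top image is consulted. */
static const SketchImage *sketch_find_top_only(const SketchImage *top,
                                               const char *name)
{
    return sketch_has_snapshot(top, name) ? top : NULL;
}

/* Proposed direction: walk the backing chain until the name is found. */
static const SketchImage *sketch_find_in_chain(const SketchImage *top,
                                               const char *name)
{
    for (const SketchImage *img = top; img; img = img->backing) {
        if (sketch_has_snapshot(img, name)) {
            return img;
        }
    }
    return NULL;
}
```

With an empty overlay stacked on a base image that holds snapshot 'base',
the top-only lookup fails exactly as in the error message, while the chain
walk returns the base image. Whether loading VM state from a backing image
is safe is a separate question the sketch does not address.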

Re: [Qemu-devel] Key repeat is no longer working on TTY and grub menu

2019-02-12 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> Yes, this is another regression accidentally introduced by the keyboard
> state tracker.
>
> When GTK does key repeat it omits the Up event for repeated keys.
>
> IOW, you get
>
> Press (a)
> Press (a)
> Press (a)
> Release (a)

This is how keyboards commonly do it, if I remember correctly.

> Not
>
> Press (a)
> Release (a)
> Press (a)
> Release (a)
> Press (a)
> Release (a)
>
> The keyboard state tracker doesn't take this into account, so it is
> suppressing all except the first Press event.
>
> This might affect other frontends too if they use the same trick for
> key repeat

Plausible.
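
[Editorial sketch, not part of the thread.] The regression described above
comes down to how a state tracker treats a Press for a key that is already
down. The two functions below are hypothetical and are not the QEMU keyboard
state tracker; they only contrast "forward on state change" with a variant
that lets auto-repeat presses through.

```c
#include <stdbool.h>

#define SKETCH_NKEYS 256

typedef struct SketchKbd {
    bool down[SKETCH_NKEYS]; /* current pressed state per key code */
} SketchKbd;

/* Strict tracker: forward an event only if it changes the key state. */
static bool sketch_forward_strict(SketchKbd *k, unsigned key, bool press)
{
    if (key >= SKETCH_NKEYS || k->down[key] == press) {
        return false;
    }
    k->down[key] = press;
    return true;
}

/*
 * Repeat-aware tracker: a Press for a key that is already down is treated
 * as auto-repeat and forwarded; only a Release for a key that is not down
 * is dropped.
 */
static bool sketch_forward_repeat(SketchKbd *k, unsigned key, bool press)
{
    if (key >= SKETCH_NKEYS || (!press && !k->down[key])) {
        return false;
    }
    k->down[key] = press;
    return true;
}
```

For the sequence Press, Press, Press, Release on one key, the strict variant
forwards two events (first Press and the Release) while the repeat-aware
variant forwards all four, which is what the guest needs to see key repeat.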

Re: [Qemu-devel] [PATCH qemu] spapr/rtas: Force big endian compile for rtas

2019-02-12 Thread Alexey Kardashevskiy




On 01/02/2019 11:40, Alexey Kardashevskiy wrote:
> At the moment the rtas's Makefile uses generic QEMU rules which means
> that when QEMU is compiled on a little endian system, the spapr-rtas.bin
> is compiled as little endian too which is incorrect as it is always
> executed in big endian mode.
> 
> This enforces -mbig by defining %.o:%.S rule as spapr-rtas.bin is
> a standalone guest binary which should not depend on QEMU flags anyway.

Bad? Good? Useless? :)


> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  pc-bios/spapr-rtas/Makefile | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/pc-bios/spapr-rtas/Makefile b/pc-bios/spapr-rtas/Makefile
> index f26dd42..4b9bb12 100644
> --- a/pc-bios/spapr-rtas/Makefile
> +++ b/pc-bios/spapr-rtas/Makefile
> @@ -14,8 +14,11 @@ $(call set-vpath, $(SRC_PATH)/pc-bios/spapr-rtas)
>  
>  build-all: spapr-rtas.bin
>  
> +%.o: %.S
> + $(call quiet-command,$(CCAS) -mbig -c -o $@ $<,"CCAS","$(TARGET_DIR)$@")
> +
>  %.img: %.o
> - $(call quiet-command,$(CC) -nostdlib -o $@ $<,"Building","$(TARGET_DIR)$@")
> + $(call quiet-command,$(CC) -nostdlib -mbig -o $@ $<,"Building","$(TARGET_DIR)$@")
>  
>  %.bin: %.img
>   $(call quiet-command,$(OBJCOPY) -O binary -j .text $< $@,"Building","$(TARGET_DIR)$@")
> 

-- 
Alexey

Re: [Qemu-devel] [PATCH v9 19/21] replay: add BH oneshot event for block layer

2019-02-12 Thread Pavel Dovgalyuk

> From: Kevin Wolf [mailto:kw...@redhat.com]
> Am 14.01.2019 um 12:10 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:kw...@redhat.com]
> > > Am 09.01.2019 um 13:13 hat Pavel Dovgalyuk geschrieben:
> > > > Replay is capable of recording normal BH events, but sometimes
> > > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > > function. This patch enables recording and replaying such callbacks.
> > > > Block layer uses these events for calling the completion function.
> > > > Replaying these calls makes the execution deterministic.
> > > >
> > > > Signed-off-by: Pavel Dovgalyuk 
> > >
> > > This still doesn't come even close to catching all BHs that need to be
> > > caught. While you managed to show a few BHs that actually don't need to
> > > be considered for recording when I asked for this in v7, most BHs in the
> > > block layer can in some way lead to device callbacks and must therefore
> > > be recorded.
> >
> > Let's have a brief review. I can change all the places, but how
> > should I make a test case to be sure, that all of them are working ok?
> 
> The list is changing all the time. This is why I am so concerned about
> special-casing a few callers instead of having a generic solution. I
> don't know how we could make sure that we call the right function
> everywhere.

I changed all oneshot invocations in the block layer in the new version.
Can you review it and other block-related patches?

Pavel Dovgalyuk

Re: [Qemu-devel] [RFC v1 0/3] intel_iommu: support scalable mode

2019-02-12 Thread Yi Sun

On 19-02-11 18:37:41, Peter Xu wrote:
> On Wed, Jan 30, 2019 at 01:09:10PM +0800, Yi Sun wrote:
> > Intel vt-d rev3.0 [1] introduces a new translation mode called
> > 'scalable mode', which enables PASID-granular translations for
> > first level, second level, nested and pass-through modes. The
> > vt-d scalable mode is the key ingredient to enable Scalable I/O
> > Virtualization (Scalable IOV) [2] [3], which allows sharing a
> > device in minimal possible granularity (ADI - Assignable Device
> > Interface). As a result, previous Extended Context (ECS) mode
> > is deprecated (no production ever implements ECS).
> > 
> > This patch set emulates a minimal capability set of VT-d scalable
> > mode, equivalent to what is available in VT-d legacy mode today:
> > 1. Scalable mode root entry, context entry and PASID table
> > 2. Second level translation under scalable mode
> > 3. Queued invalidation (with 256 bits descriptor)
> > 4. Pass-through mode
> > 
> > Corresponding intel-iommu driver support will be included in
> > kernel 5.0:
> > https://www.spinics.net/lists/kernel/msg2985279.html
> > 
> > We will add emulation of full scalable mode capability along with
> > guest iommu driver progress later, e.g.:
> > 1. First level translation
> > 2. Nested translation
> > 3. Per-PASID invalidation descriptors
> > 4. Page request services for handling recoverable faults
> 
> Hi, YiSun/YiLiu,
> 
> Have you tested against any existing usages of VT-d with this series
> applied?
> 
Thanks for the review!

With kernel/qemu scalable mode enabling patch sets applied, I tested
kernel build/data copy/netperf on guest under both "scalable-mode"
enabled and "scalable-mode" disabled scenarios.

> Thanks,
> 
> -- 
> Peter Xu

Re: [Qemu-devel] [PATCH 14/19] target/ppc: Add POWER9 exception model

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:20AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> And use it to get the correct HILE bit in HID0
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target/ppc/cpu-qom.h|  2 ++
>  target/ppc/excp_helper.c| 17 +
>  target/ppc/translate.c  |  3 ++-
>  target/ppc/translate_init.inc.c |  2 +-
>  4 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index 7c54093a7122..7ff8b2d68632 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -113,6 +113,8 @@ enum powerpc_excp_t {
>  POWERPC_EXCP_POWER7,
>  /* POWER8 exception model   */
>  POWERPC_EXCP_POWER8,
> +/* POWER9 exception model   */
> +POWERPC_EXCP_POWER9,
>  };
>  
>  
> /*/
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 7536620a4133..37546bb0f0fe 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -147,7 +147,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>  
>  /* Exception targetting modifiers
>   *
> - * LPES0 is supported on POWER7/8
> + * LPES0 is supported on POWER7/8/9
>   * LPES1 is not supported (old iSeries mode)
>   *
>   * On anything else, we behave as if LPES0 is 1
> @@ -158,9 +158,10 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>   */
>  #if defined(TARGET_PPC64)
>  if (excp_model == POWERPC_EXCP_POWER7 ||
> -excp_model == POWERPC_EXCP_POWER8) {
> +excp_model == POWERPC_EXCP_POWER8 ||
> +excp_model == POWERPC_EXCP_POWER9) {
>  lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
> -if (excp_model == POWERPC_EXCP_POWER8) {
> +if (excp_model != POWERPC_EXCP_POWER7) {
>  ail = (env->spr[SPR_LPCR] & LPCR_AIL) >> LPCR_AIL_SHIFT;
>  } else {
>  ail = 0;
> @@ -662,7 +663,15 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>  }
>  } else if (excp_model == POWERPC_EXCP_POWER8) {
>  if (new_msr & MSR_HVB) {
> -if (env->spr[SPR_HID0] & (HID0_HILE | HID0_POWER9_HILE)) {
> +if (env->spr[SPR_HID0] & HID0_HILE) {
> +new_msr |= (target_ulong)1 << MSR_LE;
> +}
> +} else if (env->spr[SPR_LPCR] & LPCR_ILE) {
> +new_msr |= (target_ulong)1 << MSR_LE;
> +}
> +} else if (excp_model == POWERPC_EXCP_POWER9) {
> +if (new_msr & MSR_HVB) {
> +if (env->spr[SPR_HID0] & HID0_POWER9_HILE) {
>  new_msr |= (target_ulong)1 << MSR_LE;
>  }
>  } else if (env->spr[SPR_LPCR] & LPCR_ILE) {
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 07bedbb8f1ce..62a9a57e4a65 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -7483,7 +7483,8 @@ void ppc_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
>  
>  #if defined(TARGET_PPC64)
>  if (env->excp_model == POWERPC_EXCP_POWER7 ||
> -env->excp_model == POWERPC_EXCP_POWER8) {
> +env->excp_model == POWERPC_EXCP_POWER8 ||
> +env->excp_model == POWERPC_EXCP_POWER9)  {
>  cpu_fprintf(f, "HSRR0 " TARGET_FMT_lx " HSRR1 " TARGET_FMT_lx "\n",
>  env->spr[SPR_HSRR0], env->spr[SPR_HSRR1]);
>  }
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index f235162a1f6b..c1719c46a383 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -8905,7 +8905,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>  pcc->hash64_opts = _hash64_opts_POWER7;
>  pcc->radix_page_info = _radix_page_info;
>  #endif
> -pcc->excp_model = POWERPC_EXCP_POWER8;
> +pcc->excp_model = POWERPC_EXCP_POWER9;
>  pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
>  pcc->bfd_mach = bfd_mach_ppc64;
>  pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
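
[Editorial sketch, not part of the thread.] The POWER8/POWER9 split in the
quoted excp_helper.c hunk selects which HID0 bit decides the endianness of
hypervisor interrupts. A stand-alone restatement of that decision follows.
The enum and the SKETCH_* bit values are placeholders chosen for this
example; they are not the real register layouts.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_excp_model { SKETCH_POWER8, SKETCH_POWER9 };

/* Placeholder bit positions, NOT the architected ones. */
#define SKETCH_HID0_HILE        (1ull << 0)
#define SKETCH_HID0_POWER9_HILE (1ull << 1)
#define SKETCH_LPCR_ILE         (1ull << 2)

/*
 * Returns true if the interrupt handler should start in little-endian
 * mode. to_hv says whether the interrupt is taken in hypervisor state
 * (the MSR_HVB test in the quoted code).
 */
static bool sketch_interrupt_le(enum sketch_excp_model model, bool to_hv,
                                uint64_t hid0, uint64_t lpcr)
{
    if (to_hv) {
        uint64_t hile = (model == SKETCH_POWER9) ? SKETCH_HID0_POWER9_HILE
                                                 : SKETCH_HID0_HILE;
        return (hid0 & hile) != 0;
    }
    return (lpcr & SKETCH_LPCR_ILE) != 0;
}
```

Before the patch the POWER8 branch tested both HILE bits at once; giving
POWER9 its own model means each CPU consults only its own bit.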



Re: [Qemu-devel] [PATCH 11/19] target/ppc: Move "wakeup reset" code to a separate function

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:17AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> This moves the code to handle waking up from the 0x100 vector
> from powerpc_excp() to a separate function, as the former is
> already way too big as it is.
> 
> No functional change.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target/ppc/excp_helper.c | 75 ++--
>  1 file changed, 41 insertions(+), 34 deletions(-)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 97503193ef43..489a54f51b90 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -65,6 +65,46 @@ static inline void dump_syscall(CPUPPCState *env)
>ppc_dump_gpr(env, 6), env->nip);
>  }
>  
> +static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState *env, int excp,
> +target_ulong *msr)
> +{
> +/* We no longer are in a PM state */
> +env->in_pm_state = false;
> +
> +/* Pretend to be returning from doze always as we don't lose state */
> +*msr |= (0x1ull << (63 - 47));
> +
> +/* Machine checks are sent normally */
> +if (excp == POWERPC_EXCP_MCHECK) {
> +return excp;
> +}
> +switch (excp) {
> +case POWERPC_EXCP_RESET:
> +*msr |= 0x4ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_EXTERNAL:
> +*msr |= 0x8ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_DECR:
> +*msr |= 0x6ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_SDOOR:
> +*msr |= 0x5ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_SDOOR_HV:
> +*msr |= 0x3ull << (63 - 45);
> +break;
> +case POWERPC_EXCP_HV_MAINT:
> +*msr |= 0xaull << (63 - 45);
> +break;
> +default:
> +cpu_abort(cs, "Unsupported exception %d in Power Save mode\n",
> +  excp);
> +}
> +return POWERPC_EXCP_RESET;
> +}
> +
> +
>  /* Note that this function should be greatly optimized
>   * when called with a constant excp, from ppc_hw_interrupt
>   */
> @@ -102,40 +142,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>   * P7/P8/P9
>   */
>  if (env->in_pm_state) {
> -env->in_pm_state = false;
> -
> -/* Pretend to be returning from doze always as we don't lose state */
> -msr |= (0x1ull << (63 - 47));
> -
> -/* Non-machine check are routed to 0x100 with a wakeup cause
> - * encoded in SRR1
> - */
> -if (excp != POWERPC_EXCP_MCHECK) {
> -switch (excp) {
> -case POWERPC_EXCP_RESET:
> -msr |= 0x4ull << (63 - 45);
> -break;
> -case POWERPC_EXCP_EXTERNAL:
> -msr |= 0x8ull << (63 - 45);
> -break;
> -case POWERPC_EXCP_DECR:
> -msr |= 0x6ull << (63 - 45);
> -break;
> -case POWERPC_EXCP_SDOOR:
> -msr |= 0x5ull << (63 - 45);
> -break;
> -case POWERPC_EXCP_SDOOR_HV:
> -msr |= 0x3ull << (63 - 45);
> -break;
> -case POWERPC_EXCP_HV_MAINT:
> -msr |= 0xaull << (63 - 45);
> -break;
> -default:
> -cpu_abort(cs, "Unsupported exception %d in Power Save mode\n",
> -  excp);
> -}
> -excp = POWERPC_EXCP_RESET;
> -}
> +excp = powerpc_reset_wakeup(cs, env, excp, &msr);
>  }
>  
>  /* Exception targetting modifiers

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
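
[Editorial sketch, not part of the thread.] The switch in the quoted
powerpc_reset_wakeup() encodes a wake reason into the MSR image that ends up
in SRR1. Restated as a small table-like function, using the reason codes and
the shift exactly as they appear in the patch; the enum itself is a
hypothetical stand-in for the POWERPC_EXCP_* constants.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the exception numbers used in the patch. */
enum sketch_excp {
    SKETCH_EXCP_RESET,
    SKETCH_EXCP_EXTERNAL,
    SKETCH_EXCP_DECR,
    SKETCH_EXCP_SDOOR,
    SKETCH_EXCP_SDOOR_HV,
    SKETCH_EXCP_HV_MAINT,
    SKETCH_EXCP_OTHER
};

/*
 * Stores the wake-reason bits for excp in *bits and returns 0, or returns
 * -1 for an exception that has no wake reason (where the patch aborts).
 */
static int sketch_wake_reason(enum sketch_excp excp, uint64_t *bits)
{
    uint64_t code;

    switch (excp) {
    case SKETCH_EXCP_RESET:    code = 0x4; break;
    case SKETCH_EXCP_EXTERNAL: code = 0x8; break;
    case SKETCH_EXCP_DECR:     code = 0x6; break;
    case SKETCH_EXCP_SDOOR:    code = 0x5; break;
    case SKETCH_EXCP_SDOOR_HV: code = 0x3; break;
    case SKETCH_EXCP_HV_MAINT: code = 0xa; break;
    default:
        return -1;
    }
    /* Same placement as the patch: code << (63 - 45). */
    *bits = code << (63 - 45);
    return 0;
}
```

The shift by (63 - 45) = 18 is the patch's way of writing an IBM
bit-numbered position, so an external interrupt wakeup contributes
0x8 << 18 = 0x200000.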



Re: [Qemu-devel] [PATCH 15/19] target/ppc: Detect erroneous condition in interrupt delivery

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:21AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> It's very easy for the CPU specific has_work() implementation
> and the logic in ppc_hw_interrupt() to be subtly out of sync.
> 
> This can occasionally allow a CPU to wakeup from a PM state
> and resume executing past the PM instruction when it should
> resume at the 0x100 vector.
> 
> This detects if it happens and aborts, making it a lot easier
> to catch such bugs when testing rather than chasing obscure
> guest misbehaviour.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target/ppc/excp_helper.c | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 37546bb0f0fe..1a2f469a5fa2 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -878,6 +878,22 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  return;
>  }
>  }
> +
> +if (env->resume_as_sreset) {
> +/*
> + * This is a bug ! It means that has_work took us out of halt without
> + * anything to deliver while in a PM state that requires getting
> + * out via a 0x100
> + *
> + * This means we will incorrectly execute past the power management
> + * instruction instead of triggering a reset.
> + *
> + * It generally means a discrepancy between the wakup conditions in the
> + * processor has_work implementation and the logic in this function.
> + */
> +cpu_abort(CPU(ppc_env_get_cpu(env)),
> +  "Wakeup from PM state but interrupt Undelivered");
> +}
>  }
>  
>  void ppc_cpu_do_system_reset(CPUState *cs)

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 12/19] target/ppc: Disable ISA 2.06 PM instructions on POWER9

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:18AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> The ISA 2.06/2.07 Power Management instructions (doze, nap & rvwinkle)
> don't exist on POWER9, don't enable them.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

This looks like a correct fix regardless of the rest of the series,
applied to ppc-for-4.0.

> ---
>  target/ppc/translate_init.inc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 076d94f45755..f235162a1f6b 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -8880,7 +8880,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
>  PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
>  PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
>  PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
> -PPC2_TM | PPC2_PM_ISA206 | PPC2_ISA300 | PPC2_PRCNTL;
> +PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
>  pcc->msr_mask = (1ull << MSR_SF) |
>  (1ull << MSR_TM) |
>  (1ull << MSR_VR) |

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 13/19] target/ppc: Rename "in_pm_state" to "resume_as_sreset"

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:19AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> To better reflect what this does, as it's specific to some of the
> P7/P8/P9 PM states, not generic.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target/ppc/cpu.h | 6 +++---
>  hw/ppc/ppc.c | 2 +-
>  target/ppc/excp_helper.c | 8 
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 7ff65c804b57..b69410ea2541 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1116,10 +1116,10 @@ struct CPUPPCState {
>  
>  /*
>   * On P7/P8/P9, set when in PM state, we need to handle resume in
> - * a special way (such as routing some resume causes to 0x100), so
> - * flag this here.
> + * a special way (such as routing some resume causes to 0x100, ie,
> + * sreset), so flag this here.
>   */
> -bool in_pm_state;
> +bool resume_as_sreset;
>  #endif
>  
>  /* Those resources are used only during code translation */
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 9292f986eba7..608405f6f2ca 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -722,7 +722,7 @@ static inline void cpu_ppc_hdecr_excp(PowerPCCPU *cpu)
>   * interrupts in a PM state. Not only they don't cause a
>   * wakeup but they also get effectively discarded.
>   */
> -if (!env->in_pm_state) {
> +if (!env->resume_as_sreset) {
>  ppc_set_irq(cpu, PPC_INTERRUPT_HDECR, 1);
>  }
>  }
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 489a54f51b90..7536620a4133 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -69,7 +69,7 @@ static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState *env, int excp,
>  target_ulong *msr)
>  {
>  /* We no longer are in a PM state */
> -env->in_pm_state = false;
> +env->resume_as_sreset = false;
>  
>  /* Pretend to be returning from doze always as we don't lose state */
>  *msr |= (0x1ull << (63 - 47));
> @@ -141,7 +141,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
>   * check for special resume at 0x100 from doze/nap/sleep/winkle on
>   * P7/P8/P9
>   */
> -if (env->in_pm_state) {
> +if (env->resume_as_sreset) {
>  excp = powerpc_reset_wakeup(cs, env, excp, &msr);
>  }
>  
> @@ -787,7 +787,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>   * clear when coming out of some power management states (in order
>   * for them to become a 0x100).
>   */
> -async_deliver = (msr_ee != 0) || env->in_pm_state;
> +async_deliver = (msr_ee != 0) || env->resume_as_sreset;
>  
>  /* Hypervisor decrementer exception */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_HDECR)) {
> @@ -970,7 +970,7 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t 
> insn)
>  env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
>  
>  /* Condition for waking up at 0x100 */
> -env->in_pm_state = (insn != PPC_PM_STOP) ||
> +env->resume_as_sreset = (insn != PPC_PM_STOP) ||
>  (env->spr[SPR_PSSCR] & PSSCR_EC);
>  }
>  #endif /* defined(TARGET_PPC64) */

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 18/19] ppc/xive: Make XIVE generate the proper interrupt types

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:24AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> It should be generic Hypervisor Virtualization interrupts for HV
> directed rings and traditional External Interrupts for the OS directed
> ring.
> 
> Don't generate anything for the user ring as it isn't actually
> supported.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  include/hw/ppc/xive.h |  3 ++-
>  hw/intc/xive.c        | 22 +++++++++++++++++++---
>  2 files changed, 21 insertions(+), 4 deletions(-)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 2bad8526221b..82e9aef677c5 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -316,7 +316,8 @@ typedef struct XiveTCTX {
>  DeviceState parent_obj;
>  
>  CPUState *cs;
> -qemu_irq output;
> +qemu_irq hv_output;
> +qemu_irq os_output;
>  
>  uint8_t regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
>  uint32_t hw_cam;
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 119bb02d345d..14b854181ab7 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -61,13 +61,28 @@ static uint8_t exception_mask(uint8_t ring)
>  }
>  }
>  
> +static qemu_irq xive_tctx_output(XiveTCTX *tctx, uint8_t ring)
> +{
> +switch (ring) {
> +case TM_QW0_USER:
> +return 0; /* Not supported */
> +case TM_QW1_OS:
> +return tctx->os_output;
> +case TM_QW2_HV_POOL:
> +case TM_QW3_HV_PHYS:
> +return tctx->hv_output;
> +default:
> +return 0;
> +}
> +}
> +
>  static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring)
>  {
>  uint8_t *regs = &tctx->regs[ring];
>  uint8_t nsr = regs[TM_NSR];
>  uint8_t mask = exception_mask(ring);
>  
> -qemu_irq_lower(tctx->output);
> +qemu_irq_lower(xive_tctx_output(tctx, ring));
>  
>  if (regs[TM_NSR] & mask) {
>  uint8_t cppr = regs[TM_PIPR];
> @@ -100,7 +115,7 @@ static void xive_tctx_notify(XiveTCTX *tctx, uint8_t ring)
>  default:
>  g_assert_not_reached();
>  }
> -qemu_irq_raise(tctx->output);
> +qemu_irq_raise(xive_tctx_output(tctx, ring));
>  }
>  }
>  
> @@ -554,7 +569,8 @@ static void xive_tctx_realize(DeviceState *dev, Error 
> **errp)
>  env = &cpu->env;
>  switch (PPC_INPUT(env)) {
>  case PPC_FLAGS_INPUT_POWER9:
> -tctx->output = env->irq_inputs[POWER7_INPUT_INT];
> +tctx->hv_output = env->irq_inputs[POWER9_INPUT_HINT];
> +tctx->os_output = env->irq_inputs[POWER9_INPUT_INT];
>  break;
>  
>  default:

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 19/19] target/ppc: Add support for LPCR:HEIC on POWER9

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:25AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> This controls whether the External Interrupt (0x500) can be
> delivered to the hypervisor or not.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 
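
For anyone reading along, the new delivery rule can be restated as a
standalone predicate.  This is only a sketch: struct ext_irq_ctx and
ext_irq_deliverable() are made-up names that stand in for the QEMU state
(msr_hv, msr_pr and the LPCR bits) tested in the hunk below, so treat it
as an illustration of the boolean logic rather than as the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the patch looks at; not a QEMU type. */
struct ext_irq_ctx {
    bool async_deliver; /* MSR:EE set, or waking up through 0x100 */
    bool has_hv_mode;   /* CPU implements hypervisor mode */
    bool msr_hv;        /* MSR:HV */
    bool msr_pr;        /* MSR:PR */
    bool lpes0;         /* LPCR:LPES0 */
    bool heic;          /* LPCR:HEIC */
};

/* Same boolean expression as the patched test in ppc_hw_interrupt(). */
static bool ext_irq_deliverable(const struct ext_irq_ctx *c)
{
    assert(c != NULL);

    /* HEIC only matters while running in hypervisor state (HV=1, PR=0). */
    bool blocked_by_heic = c->heic && c->msr_hv && !c->msr_pr;

    return (c->async_deliver && !blocked_by_heic) ||
           (c->has_hv_mode && !c->msr_hv && !c->lpes0);
}
```

With HEIC set, an external interrupt that would otherwise be taken in
hypervisor state stays pending, while problem state (PR=1) is unaffected.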

> ---
>  target/ppc/excp_helper.c        | 5 ++++-
>  target/ppc/translate_init.inc.c | 5 ++++-
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index d171a5eb6236..39bedbb11db0 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -827,7 +827,10 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  /* External interrupt can ignore MSR:EE under some circumstances */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_EXT)) {
>  bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
> -if (async_deliver || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
> +bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
> +/* HEIC blocks delivery to the hypervisor */
> +if ((async_deliver && !(heic && msr_hv && !msr_pr)) ||
> +(env->has_hv_mode && msr_hv == 0 && !lpes0)) {
>  powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_EXTERNAL);
>  return;
>  }
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 7f25215f0192..63ded33ea7ea 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -8823,7 +8823,10 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>  /* External Exception */
>  if ((env->pending_interrupts & (1u << PPC_INTERRUPT_EXT)) &&
>  (env->spr[SPR_LPCR] & LPCR_EEE)) {
> -return true;
> +bool heic = !!(env->spr[SPR_LPCR] & LPCR_HEIC);
> +if (heic == 0 || !msr_hv || msr_pr) {
> +return true;
> +}
>  }
>  /* Decrementer Exception */
>  if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 16/19] target/ppc: Add Hypervisor Virtualization Interrupt on POWER9

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:22AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> This adds support for delivering that exception
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target/ppc/cpu.h                |  5 ++++-
>  target/ppc/excp_helper.c        | 17 ++++++++++++++++-
>  target/ppc/translate_init.inc.c | 16 +++++++++++++++-
>  3 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index b69410ea2541..385d33bd37ff 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -160,8 +160,10 @@ enum {
>  /* Server doorbell variants */
>  POWERPC_EXCP_SDOOR= 99,
>  POWERPC_EXCP_SDOOR_HV = 100,
> +/* ISA 3.00 additions */
> +POWERPC_EXCP_HVIRT= 101,
>  /* EOL   
> */
> -POWERPC_EXCP_NB   = 101,
> +POWERPC_EXCP_NB   = 102,
>  /* QEMU exceptions: used internally during code translation  
> */
>  POWERPC_EXCP_STOP = 0x200, /* stop translation   
> */
>  POWERPC_EXCP_BRANCH   = 0x201, /* branch instruction 
> */
> @@ -2344,6 +2346,7 @@ enum {
>  PPC_INTERRUPT_PERFM,  /* Performance monitor interrupt*/
>  PPC_INTERRUPT_HMI,/* Hypervisor Maintainance interrupt*/
>  PPC_INTERRUPT_HDOORBELL,  /* Hypervisor Doorbell interrupt*/
> +PPC_INTERRUPT_HVIRT,  /* Hypervisor virtualization interrupt  */
>  };
>  
>  /* Processor Compatibility mask (PCR) */
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 1a2f469a5fa2..d171a5eb6236 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -97,6 +97,9 @@ static int powerpc_reset_wakeup(CPUState *cs, CPUPPCState 
> *env, int excp,
>  case POWERPC_EXCP_HV_MAINT:
>  *msr |= 0xaull << (63 - 45);
>  break;
> +case POWERPC_EXCP_HVIRT:
> +*msr |= 0x9ull << (63 - 45);
> +break;
>  default:
>  cpu_abort(cs, "Unsupported exception %d in Power Save mode\n",
>excp);
> @@ -427,6 +430,7 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  case POWERPC_EXCP_HISEG: /* Hypervisor instruction segment exception 
> */
>  case POWERPC_EXCP_SDOOR_HV:  /* Hypervisor Doorbell interrupt
> */
>  case POWERPC_EXCP_HV_EMU:
> +case POWERPC_EXCP_HVIRT: /* Hypervisor virtualization
> */
>  srr0 = SPR_HSRR0;
>  srr1 = SPR_HSRR1;
>  new_msr |= (target_ulong)MSR_HVB;
> @@ -809,7 +813,18 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  return;
>  }
>  }
> -/* Extermal interrupt can ignore MSR:EE under some circumstances */
> +
> +/* Hypervisor virtualization interrupt */
> +if (env->pending_interrupts & (1 << PPC_INTERRUPT_HVIRT)) {
> +/* LPCR will be clear when not supported so this will work */
> +bool hvice = !!(env->spr[SPR_LPCR] & LPCR_HVICE);
> +if ((async_deliver || msr_hv == 0) && hvice) {
> +powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_HVIRT);
> +return;
> +}
> +}
> +
> +/* External interrupt can ignore MSR:EE under some circumstances */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_EXT)) {
>  bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
>  if (async_deliver || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index c1719c46a383..6ffa4a8fe0fa 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -3313,6 +3313,15 @@ static void init_excp_POWER8(CPUPPCState *env)
>  #endif
>  }
>  
> +static void init_excp_POWER9(CPUPPCState *env)
> +{
> +init_excp_POWER8(env);
> +
> +#if !defined(CONFIG_USER_ONLY)
> +env->excp_vectors[POWERPC_EXCP_HVIRT]= 0x0EA0;
> +#endif
> +}
> +
>  #endif
>  
>  
> /*/
> @@ -8783,7 +8792,7 @@ static void init_proc_POWER9(CPUPPCState *env)
>  env->icache_line_size = 128;
>  
>  /* Allocate hardware IRQ controller */
> -init_excp_POWER8(env);
> +init_excp_POWER9(env);
>  ppcPOWER7_irq_init(ppc_env_get_cpu(env));
>  }
>  
> @@ -8836,6 +8845,11 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>  (env->spr[SPR_LPCR] & LPCR_HDEE)) {
>  return true;
>  }
> +/* Hypervisor virtualization exception */
> +if ((env->pending_interrupts & (1u << PPC_INTERRUPT_HVIRT)) &&
> +(env->spr[SPR_LPCR] & LPCR_HVEE)) {
> +return true;
> +}
>  if (env->pending_interrupts & (1u << PPC_INTERRUPT_RESET)) {
>  return true;
>  }

--

Re: [Qemu-devel] [PATCH 10/19] target/ppc: Fix support for "STOP light" states on POWER9

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:16AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> STOP must act differently based on PSSCR:EC on POWER9. When set, it
> acts like the P7/P8 power management instructions and wakes up at 0x100
> based on the wakeup conditions in LPCR.
> 
> When PSSCR:EC is clear, however, it will wake up at the next instruction
> after STOP (if EE is clear) or take the corresponding interrupts (if
> EE is set).
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 
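
As a reading aid, the wakeup rule from the commit message boils down to
one expression, the same one the patch stores in in_pm_state.  The sketch
below uses stand-in names (SK_PPC_BIT, SK_PSSCR_EC, sk_pm_insn) that
mirror, but are not, the QEMU definitions; the enum only lists the values
visible in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit value. */
#define SK_PPC_BIT(n)  (UINT64_C(1) << (63 - (n)))
#define SK_PSSCR_EC    SK_PPC_BIT(43)   /* Exit Criterion */

/* Stand-in for powerpc_pm_insn_t, limited to values shown in the patch. */
typedef enum {
    SK_PM_NAP,
    SK_PM_SLEEP,
    SK_PM_RVWINKLE,
    SK_PM_STOP,
} sk_pm_insn;

/* True when the wakeup must be routed to the 0x100 vector. */
static bool sk_wakes_at_0x100(sk_pm_insn insn, uint64_t psscr)
{
    return insn != SK_PM_STOP || (psscr & SK_PSSCR_EC) != 0;
}

/* Usage: the three situations described in the commit message. */
void sk_stop_examples(void)
{
    assert(sk_wakes_at_0x100(SK_PM_NAP, 0));             /* P7/P8 style */
    assert(sk_wakes_at_0x100(SK_PM_STOP, SK_PSSCR_EC));  /* STOP, EC set */
    assert(!sk_wakes_at_0x100(SK_PM_STOP, 0));           /* "STOP light" */
}
```

Only STOP with EC clear resumes at the next instruction (or takes the
interrupt normally); every other combination keeps the 0x100 path.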

> ---
>  target/ppc/cpu-qom.h            |  1 +
>  target/ppc/cpu.h                | 12 +++++++++---
>  target/ppc/excp_helper.c        |  8 ++++++--
>  target/ppc/translate.c          | 13 ++++++++++++-
>  target/ppc/translate_init.inc.c |  7 +++++++
>  5 files changed, 35 insertions(+), 6 deletions(-)
> 
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index 4ea67692e2a6..7c54093a7122 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -122,6 +122,7 @@ typedef enum {
>  PPC_PM_NAP,
>  PPC_PM_SLEEP,
>  PPC_PM_RVWINKLE,
> +PPC_PM_STOP,
>  } powerpc_pm_insn_t;
>  
>  
> /*/
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 2c22292e7f41..7ff65c804b57 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -413,6 +413,10 @@ struct ppc_slb_t {
>  #define LPCR_HVICE PPC_BIT(62) /* HV Virtualisation Int Enable */
>  #define LPCR_HDICE PPC_BIT(63)
>  
> +/* PSSCR bits */
> +#define PSSCR_ESL PPC_BIT(42) /* Enable State Loss */
> +#define PSSCR_EC  PPC_BIT(43) /* Exit Criterion */
> +
>  #define msr_sf   ((env->msr >> MSR_SF)   & 1)
>  #define msr_isf  ((env->msr >> MSR_ISF)  & 1)
>  #define msr_shv  ((env->msr >> MSR_SHV)  & 1)
> @@ -1109,9 +1113,11 @@ struct CPUPPCState {
>   * instructions and SPRs are diallowed if MSR:HV is 0
>   */
>  bool has_hv_mode;
> -/* On P7/P8, set when in PM state, we need to handle resume
> - * in a special way (such as routing some resume causes to
> - * 0x100), so flag this here.
> +
> +/*
> + * On P7/P8/P9, set when in PM state, we need to handle resume in
> + * a special way (such as routing some resume causes to 0x100), so
> + * flag this here.
>   */
>  bool in_pm_state;
>  #endif
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 7c7c8d1b9dc6..97503193ef43 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -97,7 +97,10 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
> excp_model, int excp)
>  asrr0 = -1;
>  asrr1 = -1;
>  
> -/* check for special resume at 0x100 from doze/nap/sleep/winkle on P7/P8 
> */
> +/*
> + * check for special resume at 0x100 from doze/nap/sleep/winkle on
> + * P7/P8/P9
> + */
>  if (env->in_pm_state) {
>  env->in_pm_state = false;
>  
> @@ -960,7 +963,8 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t 
> insn)
>  env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
>  
>  /* Condition for waking up at 0x100 */
> -env->in_pm_state = true;
> +env->in_pm_state = (insn != PPC_PM_STOP) ||
> +(env->spr[SPR_PSSCR] & PSSCR_EC);
>  }
>  #endif /* defined(TARGET_PPC64) */
>  
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 55281a8975e0..07bedbb8f1ce 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -3594,7 +3594,18 @@ static void gen_nap(DisasContext *ctx)
>  
>  static void gen_stop(DisasContext *ctx)
>  {
> -gen_nap(ctx);
> +#if defined(CONFIG_USER_ONLY)
> +GEN_PRIV;
> +#else
> +TCGv_i32 t;
> +
> +CHK_HV;
> +t = tcg_const_i32(PPC_PM_STOP);
> +gen_helper_pminsn(cpu_env, t);
> +tcg_temp_free_i32(t);
> +/* Stop translation, as the CPU is supposed to sleep from now */
> +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> +#endif /* defined(CONFIG_USER_ONLY) */
>  }
>  
>  static void gen_sleep(DisasContext *ctx)
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 59e0b8676236..076d94f45755 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -8801,9 +8801,16 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>  CPUPPCState *env = &cpu->env;
>  
>  if (cs->halted) {
> +uint64_t psscr = env->spr[SPR_PSSCR];
> +
>  if (!(cs->interrupt_request & CPU_INTERRUPT_HARD)) {
>  return false;
>  }
> +
> +/* If EC is clear, just return true on any pending interrupt */
> +if (!(psscr & PSSCR_EC)) {
> +return true;
> +}
>  /* External Exception */
>  if ((env->pending_interrupts & (1u << PPC_INTERRUPT_EXT)) &&
>  (env->spr[SPR_LPCR] & LPCR_EEE)) {

-- 
David Gibson| I'll have my music baroque, and my code
david

Re: [Qemu-devel] [PATCH 17/19] target/ppc: Add POWER9 external interrupt model

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:23AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> Adds support for the Hypervisor directed interrupts in addition to the
> OS ones.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/ppc/ppc.h            |  2 ++
>  target/ppc/cpu-qom.h            |  2 ++
>  target/ppc/cpu.h                |  7 +++++++
>  hw/intc/xics.c                  |  1 +
>  hw/intc/xive.c                  |  2 +-
>  hw/ppc/ppc.c                    | 14 ++++++++++++++
>  target/ppc/translate_init.inc.c |  4 ++--
>  7 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/ppc/ppc.h b/include/hw/ppc/ppc.h
> index daaa04a22dbf..4bdcb8bacd4e 100644
> --- a/include/hw/ppc/ppc.h
> +++ b/include/hw/ppc/ppc.h
> @@ -74,6 +74,7 @@ static inline void ppc40x_irq_init(PowerPCCPU *cpu) {}
>  static inline void ppc6xx_irq_init(PowerPCCPU *cpu) {}
>  static inline void ppc970_irq_init(PowerPCCPU *cpu) {}
>  static inline void ppcPOWER7_irq_init(PowerPCCPU *cpu) {}
> +static inline void ppcPOWER9_irq_init(PowerPCCPU *cpu) {}
>  static inline void ppce500_irq_init(PowerPCCPU *cpu) {}
>  #else
>  void ppc40x_irq_init(PowerPCCPU *cpu);
> @@ -81,6 +82,7 @@ void ppce500_irq_init(PowerPCCPU *cpu);
>  void ppc6xx_irq_init(PowerPCCPU *cpu);
>  void ppc970_irq_init(PowerPCCPU *cpu);
>  void ppcPOWER7_irq_init(PowerPCCPU *cpu);
> +void ppcPOWER9_irq_init(PowerPCCPU *cpu);
>  #endif
>  
>  /* PPC machines for OpenBIOS */
> diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> index 7ff8b2d68632..079fbb9d8718 100644
> --- a/target/ppc/cpu-qom.h
> +++ b/target/ppc/cpu-qom.h
> @@ -142,6 +142,8 @@ enum powerpc_input_t {
>  PPC_FLAGS_INPUT_970,
>  /* PowerPC POWER7 bus   */
>  PPC_FLAGS_INPUT_POWER7,
> +/* PowerPC POWER9 bus   */
> +PPC_FLAGS_INPUT_POWER9,
>  /* PowerPC 401 bus  */
>  PPC_FLAGS_INPUT_401,
>  /* Freescale RCPU bus   */
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index 385d33bd37ff..cc41ae6f3017 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -2322,6 +2322,13 @@ enum {
>   * them */
>  POWER7_INPUT_NB,
>  };
> +
> +enum {
> +/* POWER9 input pins */
> +POWER9_INPUT_INT= 0, /* Must match POWER7_INPUT_INT */

Rather than having this vital comment here...

> +POWER9_INPUT_HINT   = 1,
> +POWER9_INPUT_NB,
> +};
>  #endif
>  
>  /* Hardware exceptions definitions */
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 16e8ffa2aaf7..1f786175168f 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -342,6 +342,7 @@ static void icp_realize(DeviceState *dev, Error **errp)
>  env = &cpu->env;
>  switch (PPC_INPUT(env)) {
>  case PPC_FLAGS_INPUT_POWER7:
> +case PPC_FLAGS_INPUT_POWER9: /* For SPAPR xics emulation */

... you could just split the cases here.  It's a little more verbose,
but it's more robust.
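
An untested sketch of what splitting the cases could look like, so that
nothing depends on the two enums happening to share a value.  All names
here (the SK_ prefix, sk_external_irq_pin) are stand-ins invented for the
illustration, not the QEMU definitions.

```c
#include <assert.h>

/* Stand-ins for the QEMU enums; the values are local to this sketch. */
enum { SK_POWER7_INPUT_INT = 0, SK_POWER7_INPUT_NB };
enum { SK_POWER9_INPUT_INT = 0, SK_POWER9_INPUT_HINT = 1, SK_POWER9_INPUT_NB };
enum sk_input_model { SK_INPUT_POWER7, SK_INPUT_POWER9, SK_INPUT_OTHER };

/*
 * Return the input pin carrying the OS external interrupt, or -1 when the
 * input model is not handled.  Each model names its own constant, so the
 * code stays correct even if the pin numbering diverges later.
 */
static int sk_external_irq_pin(enum sk_input_model model)
{
    switch (model) {
    case SK_INPUT_POWER7:
        return SK_POWER7_INPUT_INT;
    case SK_INPUT_POWER9:
        return SK_POWER9_INPUT_INT;
    default:
        return -1;
    }
}

/* Usage: one lookup per input model. */
void sk_split_cases_example(void)
{
    assert(sk_external_irq_pin(SK_INPUT_POWER7) == SK_POWER7_INPUT_INT);
    assert(sk_external_irq_pin(SK_INPUT_POWER9) == SK_POWER9_INPUT_INT);
    assert(sk_external_irq_pin(SK_INPUT_OTHER) == -1);
}
```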

>  icp->output = env->irq_inputs[POWER7_INPUT_INT];
>  break;
>  
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 0e0e1dc9c1b7..119bb02d345d 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -553,7 +553,7 @@ static void xive_tctx_realize(DeviceState *dev, Error 
> **errp)
>  
>  env = &cpu->env;
>  switch (PPC_INPUT(env)) {
> -case PPC_FLAGS_INPUT_POWER7:
> +case PPC_FLAGS_INPUT_POWER9:

And here, of course.

>  tctx->output = env->irq_inputs[POWER7_INPUT_INT];
>  break;
>  
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 608405f6f2ca..1a5e087c9b1e 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -289,6 +289,12 @@ static void power7_set_irq(void *opaque, int pin, int 
> level)
>  __func__, level);
>  ppc_set_irq(cpu, PPC_INTERRUPT_EXT, level);
>  break;
> +case POWER9_INPUT_HINT:

Having a POWER9 specific case in a function named for power7 is pretty
odd.  Best to split these as well, I think.
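
Roughly, and only as a sketch with invented names (struct sk_cpu,
sk_set_irq, sk_power7_set_irq, sk_power9_set_irq), the split could give
each CPU family its own pin handler.  The pending-bit bookkeeping below is
a simplification and is not meant to match ppc_set_irq() in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SK_POWER7_INPUT_INT = 0, SK_POWER7_INPUT_NB };
enum { SK_POWER9_INPUT_INT = 0, SK_POWER9_INPUT_HINT = 1, SK_POWER9_INPUT_NB };
enum { SK_INTERRUPT_EXT = 0, SK_INTERRUPT_HVIRT = 1, SK_INTERRUPT_NB };

struct sk_cpu {
    uint32_t pending_interrupts;
};

/* Set or clear one pending-interrupt bit. */
static void sk_set_irq(struct sk_cpu *cpu, int n_irq, int level)
{
    assert(cpu != NULL);
    assert(n_irq >= 0 && n_irq < SK_INTERRUPT_NB);

    if (level) {
        cpu->pending_interrupts |= UINT32_C(1) << n_irq;
    } else {
        cpu->pending_interrupts &= ~(UINT32_C(1) << n_irq);
    }
}

/* POWER7 handler: knows about the external interrupt pin only. */
static void sk_power7_set_irq(struct sk_cpu *cpu, int pin, int level)
{
    switch (pin) {
    case SK_POWER7_INPUT_INT:
        sk_set_irq(cpu, SK_INTERRUPT_EXT, level);
        break;
    default:
        break; /* unknown pin: do nothing */
    }
}

/* POWER9 handler: adds the hypervisor virtualization pin. */
static void sk_power9_set_irq(struct sk_cpu *cpu, int pin, int level)
{
    switch (pin) {
    case SK_POWER9_INPUT_INT:
        sk_set_irq(cpu, SK_INTERRUPT_EXT, level);
        break;
    case SK_POWER9_INPUT_HINT:
        sk_set_irq(cpu, SK_INTERRUPT_HVIRT, level);
        break;
    default:
        break; /* unknown pin: do nothing */
    }
}
```

The duplication is small, and a POWER7 CPU can then never see a HINT pin
by accident.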

> +/* Level sensitive - active high */
> +LOG_IRQ("%s: set the external IRQ state to %d\n",
> +__func__, level);
> +ppc_set_irq(cpu, PPC_INTERRUPT_HVIRT, level);
> +break;
>  default:
>  /* Unknown pin - do nothing */
>  LOG_IRQ("%s: unknown IRQ pin %d\n", __func__, pin);
> @@ -308,6 +314,14 @@ void ppcPOWER7_irq_init(PowerPCCPU *cpu)
>  env->irq_inputs = (void **)qemu_allocate_irqs(&power7_set_irq, cpu,
>POWER7_INPUT_NB);
>  }
> +
> +void ppcPOWER9_irq_init(PowerPCCPU *cpu)
> +{
> +CPUPPCState *env = &cpu->env;
> +
> +env->irq_inputs = (void **)qemu_allocate_irqs(&power7_set_irq, cpu,
> +  POWER9_INPUT_NB);
> +}
>  #endif /* defined(TARGET_PPC64) */
>  
>  /* PowerPC 40x internal IRQ controller */
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 6ffa4a8fe0fa..7f25215f0192 100644
> ---

Re: [Qemu-devel] [PATCH 07/19] target/ppc: Make special ORs match x86 pause and don't generate on mttcg

2019-02-12 Thread David Gibson

On Wed, Feb 13, 2019 at 11:03:12AM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2019-02-12 at 16:59 +1100, David Gibson wrote:
> > On Mon, Jan 28, 2019 at 10:46:13AM +0100, Cédric Le Goater wrote:
> > > From: Benjamin Herrenschmidt 
> > > 
> > > There's no point in going out of translation on an SMT OR with
> > > mttcg since the backend won't do anything useful such as pausing,
> > > it's only useful on traditional TCG to give time to other
> > > processors.
> > 
> > Is it actively harmful in the MTTCG case, or just pointless?
> 
> I think it can hurt performance, I don't remember for sure :)
> 
> > > Signed-off-by: Benjamin Herrenschmidt 
> > > Signed-off-by: Cédric Le Goater 
> > > ---
> > >  target/ppc/translate.c | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index e169c43643a1..7d40a1fbe6bd 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -1580,7 +1580,7 @@ static void gen_pause(DisasContext *ctx)
> > >  tcg_temp_free_i32(t0);
> > >  
> > >  /* Stop translation, this gives other CPUs a chance to run */
> > > -gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> > > +gen_exception_nip(ctx, EXCP_INTERRUPT, ctx->base.pc_next);
> > 
> > I don't see how this change relates to the rest.
> 
> Yeah not sure anymore :-)

Oh.  That certainly doesn't make this easier to review.

So, all these target/ppc patches are only indirectly related to XIVE
pnv support.  Cédric, can you split them out into their own series on
the next spin?

> 
> > >  }
> > >  #endif /* defined(TARGET_PPC64) */
> > >  
> > > @@ -1662,7 +1662,9 @@ static void gen_or(DisasContext *ctx)
> > >   * than no-op, e.g., miso(rs=26), yield(27), mdoio(29), 
> > > mdoom(30),
> > >   * and all currently undefined.
> > >   */
> > > -gen_pause(ctx);
> > > +if (!mttcg_enabled) {
> > > +gen_pause(ctx);
> > > +}
> > >  #endif
> > >  #endif
> > >  }
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v4 10/15] qdev: pass an Object * to qbus_set_hotplug_handler()

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:59PM +0100, Greg Kurz wrote:
> From: Michael Roth 
> 
> Certain device types, like memory/CPU, are now being handled using a
> hotplug interface provided by a top-level MachineClass. Hotpluggable
> host bridges are another such device where it makes sense to use a
> machine-level hotplug handler. However, unlike those devices,
> host-bridges have a parent bus (the main system bus), and devices with
> a parent bus use a different mechanism for registering their hotplug
> handlers: qbus_set_hotplug_handler(). This interface currently expects
> a handler to be a subclass of DeviceClass, but this is not the case
> for MachineClass, which derives directly from ObjectClass.
> 
> Internally, the interface only requires an ObjectClass, so expose that
> in qbus_set_hotplug_handler().
> 
> Cc: Michael S. Tsirkin 
> Cc: Eduardo Habkost 
> Signed-off-by: Michael Roth 
> Signed-off-by: Greg Kurz 
> Reviewed-by: David Gibson 
> Reviewed-by: Cornelia Huck 
> Acked-by: Halil Pasic 
> Reviewed-by: Michael S. Tsirkin 

Applied to ppc-for-4.0, this will be useful for something I have in
mind as well.
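
To illustrate why the looser parameter type matters, here is a toy model
of the class hierarchy described above.  Everything in it (struct
sk_object, sk_device, sk_machine, sk_bus, SK_OBJECT,
sk_bus_set_hotplug_handler) is hypothetical and far simpler than QOM,
which sets a link property on the bus rather than storing a bare pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy base type: every "class" embeds it as its first member. */
struct sk_object {
    const char *type_name;
};

/* A device derives from the base object... */
struct sk_device {
    struct sk_object parent_obj;
};

/* ...and so does a machine, but a machine is not a device. */
struct sk_machine {
    struct sk_object parent_obj;
};

struct sk_bus {
    struct sk_object parent_obj;
    struct sk_object *hotplug_handler;
};

/* Upcast to the common base, in the spirit of QEMU's OBJECT() macro. */
#define SK_OBJECT(p) (&(p)->parent_obj)

/*
 * Taking the base type lets both devices and machines register as the
 * handler.  A signature taking struct sk_device * would exclude machines.
 */
static void sk_bus_set_hotplug_handler(struct sk_bus *bus,
                                       struct sk_object *handler)
{
    assert(bus != NULL && handler != NULL);
    bus->hotplug_handler = handler;
}
```

Callers that previously passed a device only need to wrap it in the
upcast, which is the mechanical DEVICE() to OBJECT() change seen
throughout the patch.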

> ---
>  hw/acpi/pcihp.c   |2 +-
>  hw/acpi/piix4.c   |2 +-
>  hw/char/virtio-serial-bus.c   |2 +-
>  hw/core/bus.c |   11 ++---------
>  hw/pci/pcie.c |2 +-
>  hw/pci/shpc.c |2 +-
>  hw/ppc/spapr_pci.c|2 +-
>  hw/s390x/css-bridge.c |2 +-
>  hw/s390x/s390-pci-bus.c   |6 +++---
>  hw/scsi/virtio-scsi.c |2 +-
>  hw/scsi/vmw_pvscsi.c  |2 +-
>  hw/usb/dev-smartcard-reader.c |2 +-
>  include/hw/qdev-core.h|3 +--
>  13 files changed, 16 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 7bc7a723407b..942918132376 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -251,7 +251,7 @@ void acpi_pcihp_device_plug_cb(HotplugHandler 
> *hotplug_dev, AcpiPciHpState *s,
>  object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>  PCIBus *sec = pci_bridge_get_sec_bus(PCI_BRIDGE(pdev));
>  
> -qbus_set_hotplug_handler(BUS(sec), DEVICE(hotplug_dev),
> +qbus_set_hotplug_handler(BUS(sec), OBJECT(hotplug_dev),
>   &error_abort);
>  /* We don't have to overwrite any other hotplug handler yet */
>  assert(QLIST_EMPTY(&sec->child));
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 88f9a9ec0912..df8c0db909ce 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -536,7 +536,7 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
>  
>  piix4_acpi_system_hot_add_init(pci_address_space_io(dev),
> pci_get_bus(dev), s);
> -qbus_set_hotplug_handler(BUS(pci_get_bus(dev)), DEVICE(s), &error_abort);
> +qbus_set_hotplug_handler(BUS(pci_get_bus(dev)), OBJECT(s), &error_abort);
>  
>  piix4_pm_add_propeties(s);
>  }
> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> index d76351d7487d..bdd917bbb83c 100644
> --- a/hw/char/virtio-serial-bus.c
> +++ b/hw/char/virtio-serial-bus.c
> @@ -1052,7 +1052,7 @@ static void virtio_serial_device_realize(DeviceState 
> *dev, Error **errp)
>  /* Spawn a new virtio-serial bus on which the ports will ride as devices 
> */
>  qbus_create_inplace(&vser->bus, sizeof(vser->bus), 
> TYPE_VIRTIO_SERIAL_BUS,
>  dev, vdev->bus_name);
> -qbus_set_hotplug_handler(BUS(&vser->bus), DEVICE(vser), errp);
> +qbus_set_hotplug_handler(BUS(&vser->bus), OBJECT(vser), errp);
>  vser->bus.vser = vser;
>  QTAILQ_INIT(&vser->ports);
>  
> diff --git a/hw/core/bus.c b/hw/core/bus.c
> index 4651f244864c..e09843f6abea 100644
> --- a/hw/core/bus.c
> +++ b/hw/core/bus.c
> @@ -22,22 +22,15 @@
>  #include "hw/qdev.h"
>  #include "qapi/error.h"
>  
> -static void qbus_set_hotplug_handler_internal(BusState *bus, Object *handler,
> -  Error **errp)
> +void qbus_set_hotplug_handler(BusState *bus, Object *handler, Error **errp)
>  {
> -
>  object_property_set_link(OBJECT(bus), OBJECT(handler),
>   QDEV_HOTPLUG_HANDLER_PROPERTY, errp);
>  }
>  
> -void qbus_set_hotplug_handler(BusState *bus, DeviceState *handler, Error 
> **errp)
> -{
> -qbus_set_hotplug_handler_internal(bus, OBJECT(handler), errp);
> -}
> -
>  void qbus_set_bus_hotplug_handler(BusState *bus, Error **errp)
>  {
> -qbus_set_hotplug_handler_internal(bus, OBJECT(bus), errp);
> +qbus_set_hotplug_handler(bus, OBJECT(bus), errp);
>  }
>  
>  int qbus_walk_children(BusState *bus,
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 230478faab12..3f7c36609313 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -543,7 +543,7 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot)
>  dev->exp.hpev_notified = false;
>  
>

Re: [Qemu-devel] [PATCH v4 03/15] spapr_irq: Set LSIs at interrupt controller init

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:13PM +0100, Greg Kurz wrote:
> The pseries machine only uses LSIs to support legacy PCI devices. Every
> PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming
> in-kernel XIVE), QEMU synchronizes the state of all irqs, including these
> LSIs, later on at machine reset.
> 
> In order to support PHB hotplug, we need a way to tell KVM about the LSIs
> that doesn't require a machine reset.
> 
> Since recent machine types allocate all these LSIs in a fixed range for
> the machine lifetime, identify them when initializing the interrupt
> controller, long before they get passed to KVM.
> 
> In order to do that, first disentangle interrupt typing and allocation.
> Since the vast majority of interrupts are MSIs, make that the default
> and have only the LSI users explicitly set the type.
> 
> It is rather straightforward for XIVE. XICS needs some extra care
> though: allocation state and type are mixed up in the same bits of the
> flags field within the interrupt state. Setting the LSI bit there at
> init time would mean the interrupt is de facto allocated, even if no
> device asked for it. Introduce a bitmap to track LSIs at the ICS level.
> In order to keep the patch minimal, the bitmap is only used when writing
> the source state to KVM and when the interrupt is claimed, so that the
> code that checks the interrupt type through the flags stays untouched.
> 
> With older pseries machines using the XICS legacy IRQ allocation scheme,
> all interrupt numbers come from a common pool and there's no such thing
> as a fixed range for LSIs. Introduce a helper so that these older
> machine types can continue to set the type when allocating the LSI.
> 
> Signed-off-by: Greg Kurz 
> ---
>  hw/intc/spapr_xive.c|7 +--
>  hw/intc/xics.c  |   10 --
>  hw/intc/xics_kvm.c  |2 +-
>  hw/ppc/pnv_psi.c|3 ++-
>  hw/ppc/spapr_events.c   |4 ++--
>  hw/ppc/spapr_irq.c  |   42 --
>  hw/ppc/spapr_pci.c  |6 --
>  hw/ppc/spapr_vio.c  |2 +-
>  include/hw/ppc/spapr_irq.h  |5 +++--
>  include/hw/ppc/spapr_xive.h |2 +-
>  include/hw/ppc/xics.h   |4 +++-
>  11 files changed, 58 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 290a290e43a5..815263ca72ab 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -480,18 +480,13 @@ static void spapr_xive_register_types(void)
>  
>  type_init(spapr_xive_register_types)
>  
> -bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
> +bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn)
>  {
> -XiveSource *xsrc = &xive->source;
> -
>  if (lisn >= xive->nr_irqs) {
>  return false;
>  }
>  
>  xive->eat[lisn].w |= cpu_to_be64(EAS_VALID);
> -if (lsi) {
> -xive_source_irq_set_lsi(xsrc, lisn);
> -}
>  return true;
>  }
>  
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 7cac138067e2..26e8940d7329 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -636,6 +636,7 @@ static void ics_base_realize(DeviceState *dev, Error 
> **errp)
>  return;
>  }
>  ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
> +ics->lsi_map = bitmap_new(ics->nr_irqs);
>  }
>  
>  static int ics_base_dispatch_pre_save(void *opaque)
> @@ -733,12 +734,17 @@ ICPState *xics_icp_get(XICSFabric *xi, int server)
>  return xic->icp_get(xi, server);
>  }
>  
> -void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
> +void ics_set_lsi(ICSState *ics, int srcno)
> +{
> +set_bit(srcno, ics->lsi_map);
> +}
> +
> +void ics_claim_irq(ICSState *ics, int srcno)
>  {
>  assert(!(ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MASK));
>  
>  ics->irqs[srcno].flags |=
> -lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
> +test_bit(srcno, ics->lsi_map) ? XICS_FLAGS_IRQ_LSI : 
> XICS_FLAGS_IRQ_MSI;

I really don't like having the trigger type redundantly stored in the
lsi_map and then again in the flags fields.

In a sense the natural way to do this would be more like the hardware:
have two source objects, one for MSIs and one for LSIs, and make the
trigger type per-ICSState rather than per-IRQState.  But that would make
life hard for the legacy support.

But... thinking about it, isn't all this overkill anyway?  Can't we
fix the problem by simply forcing an ics_set_kvm_state() (and the xive
equivalent) at claim time?  It's not like it's a hot path.
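
Something along these lines, as a rough and untested sketch of the idea
only.  The types and helpers are placeholders (struct sk_ics,
sk_push_state_to_kvm, sk_claim_irq); in particular sk_push_state_to_kvm
merely counts calls where the real code would write the source state to
the in-kernel controller, and the flag values are local to the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NR_IRQS          8
#define SK_FLAGS_IRQ_LSI    0x1
#define SK_FLAGS_IRQ_MSI    0x2
#define SK_FLAGS_IRQ_MASK   (SK_FLAGS_IRQ_LSI | SK_FLAGS_IRQ_MSI)

struct sk_ics {
    uint8_t flags[SK_NR_IRQS];
    bool in_kernel;      /* true when the in-kernel controller is in use */
    unsigned kvm_syncs;  /* how many times state was pushed (demo only) */
};

/* Placeholder for pushing the source state to KVM. */
static void sk_push_state_to_kvm(struct sk_ics *ics)
{
    ics->kvm_syncs++;
}

/*
 * Claim a source and record its trigger type in the flags only, with no
 * separate LSI bitmap.  The state is pushed immediately, so no machine
 * reset is needed for KVM to learn about a newly claimed LSI.
 */
static void sk_claim_irq(struct sk_ics *ics, int srcno, bool lsi)
{
    assert(srcno >= 0 && srcno < SK_NR_IRQS);
    assert(!(ics->flags[srcno] & SK_FLAGS_IRQ_MASK)); /* not claimed yet */

    ics->flags[srcno] |= lsi ? SK_FLAGS_IRQ_LSI : SK_FLAGS_IRQ_MSI;

    if (ics->in_kernel) {
        sk_push_state_to_kvm(ics);
    }
}
```

The extra push per claim costs a little, but claims only happen at
realize or hotplug time.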

>  }
>  
>  static void xics_register_types(void)
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index dff13300504c..e63979abc7fc 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -271,7 +271,7 @@ static int ics_set_kvm_state(ICSState *ics, int 
> version_id)
>  state |= KVM_XICS_MASKED;
>  }
>  
> -if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
> +if

Re: [Qemu-devel] [PATCH v4 06/15] spapr_pci: add PHB unrealize

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:33PM +0100, Greg Kurz wrote:
> To support PHB hotplug we need to clean up lingering references,
> memory, child properties, etc. prior to the PHB object being
> finalized. Generally this will be called as a result of calling
> object_unparent() on the PHB object, which in turn would normally
> be called as the result of an unplug() operation.
> 
> When the PHB is finalized, child objects will be unparented in
> turn, and finalized if the PHB was the only reference holder, so
> we don't bother to explicitly unparent child objects of the PHB
> (spapr_iommu, spapr_drc, etc).
> 
> The formula that gives the number of DMA windows is moved to an
> inline function in the hw/pci-host/spapr.h header because it
> will have other users.
> 
> The unrealize function is able to cope with partially realized PHBs.
> It is hence used to implement proper rollback on the realize error
> path.
> 
> Signed-off-by: Michael Roth 
> Signed-off-by: Greg Kurz 

Reviewed-by: David Gibson 

> ---
> v4: - reverted to v2
> v3: - don't free LSIs at unrealize
> v2: - implement rollback with unrealize function
> ---
>  hw/ppc/spapr_pci.c  |   75 
> +--
>  include/hw/pci-host/spapr.h |5 +++
>  2 files changed, 76 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index d68595531d5a..e3781dd110b2 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1565,6 +1565,64 @@ static void spapr_pci_unplug_request(HotplugHandler 
> *plug_handler,
>  }
>  }
>  
> +static void spapr_phb_finalizefn(Object *obj)
> +{
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(obj);
> +
> +g_free(sphb->dtbusname);
> +sphb->dtbusname = NULL;
> +}
> +
> +static void spapr_phb_unrealize(DeviceState *dev, Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +SysBusDevice *s = SYS_BUS_DEVICE(dev);
> +PCIHostState *phb = PCI_HOST_BRIDGE(s);
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(phb);
> +sPAPRTCETable *tcet;
> +int i;
> +const unsigned windows_supported = spapr_phb_windows_supported(sphb);
> +
> +if (sphb->msi) {
> +g_hash_table_unref(sphb->msi);
> +sphb->msi = NULL;
> +}
> +
> +/*
> + * Remove IO/MMIO subregions and aliases, rest should get cleaned
> + * via PHB's unrealize->object_finalize
> + */
> +for (i = windows_supported - 1; i >= 0; i--) {
> +tcet = spapr_tce_find_by_liobn(sphb->dma_liobn[i]);
> +if (tcet) {
> +memory_region_del_subregion(&sphb->iommu_root,
> +spapr_tce_get_iommu(tcet));
> +}
> +}
> +
> +for (i = PCI_NUM_PINS - 1; i >= 0; i--) {
> +if (sphb->lsi_table[i].irq) {
> +spapr_irq_free(spapr, sphb->lsi_table[i].irq, 1);
> +sphb->lsi_table[i].irq = 0;
> +}
> +}
> +
> +QLIST_REMOVE(sphb, list);
> +
> +memory_region_del_subregion(&sphb->iommu_root, &sphb->msiwindow);
> +
> +address_space_destroy(&sphb->iommu_as);
> +
> +qbus_set_hotplug_handler(BUS(phb->bus), NULL, &error_abort);
> +pci_unregister_root_bus(phb->bus);
> +
> +memory_region_del_subregion(get_system_memory(), &sphb->iowindow);
> +if (sphb->mem64_win_pciaddr != (hwaddr)-1) {
> +memory_region_del_subregion(get_system_memory(), &sphb->mem64window);
> +}
> +memory_region_del_subregion(get_system_memory(), &sphb->mem32window);
> +}
> +
>  static void spapr_phb_realize(DeviceState *dev, Error **errp)
>  {
>  /* We don't use SPAPR_MACHINE() in order to exit gracefully if the user
> @@ -1582,8 +1640,7 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  PCIBus *bus;
>  uint64_t msi_window_size = 4096;
>  sPAPRTCETable *tcet;
> -const unsigned windows_supported =
> -sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1;
> +const unsigned windows_supported = spapr_phb_windows_supported(sphb);
>  
>  if (!spapr) {
>  error_setg(errp, TYPE_SPAPR_PCI_HOST_BRIDGE " needs a pseries 
> machine");
> @@ -1740,6 +1797,10 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  if (local_err) {
>  error_propagate_prepend(errp, local_err,
>  "can't allocate LSIs: ");
> +/*
> + * Older machines will never support PHB hotplug, ie, this 
> is an
> + * init only path and QEMU will terminate. No need to 
> rollback.
> + */
>  return;
>  }
>  
> @@ -1749,7 +1810,7 @@ static void spapr_phb_realize(DeviceState *dev, Error 
> **errp)
>  spapr_irq_claim(spapr, irq, &local_err);
>  if (local_err) {
>  error_propagate_prepend(errp, local_err, "can't allocate LSIs: 
> ");
> -return;
> +goto unrealize;
>  }
>  
>  sphb->lsi_table[i].irq = irq;
> @@ -1769,13

Re: [Qemu-devel] [PATCH 13/13] spapr: add KVM support to the 'dual' machine

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 08:18:19AM +0100, Cédric Le Goater wrote:
> On 2/12/19 2:11 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:39:46PM +0100, Cédric Le Goater wrote:
> >> The interrupt mode is chosen by the CAS negotiation process and
> >> activated after a reset to take into account the required changes in
> >> the machine. This brings new constraints on how the associated KVM IRQ
> >> device is initialized.
> >>
> >> Currently, each model takes care of the initialization of the KVM
> >> device in their realize method but this is not possible anymore as the
> >> initialization needs to be done globally when the interrupt mode is
> >> known, i.e. when the machine is reset. It also means that we need a way
> >> to delete a KVM device when another mode is chosen.
> >>
> >> Also, to support migration, the QEMU objects holding the state to
> >> transfer should always be available but not necessarily activated.
> >>
> >> The overall approach of this proposal is to initialize both interrupt
> >> mode at the QEMU level and keep the IRQ number space in sync to allow
> >> switching from one mode to another. For the KVM side of things, the
> >> whole initialization of the KVM device, sources and presenters, is
> >> grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
> >> handlers are modified accordingly to handle the init and the delete
> >> sequences of the KVM device.
> >>
> >> As KVM is now initialized at reset, we lose the possibility to
> >> fallback to the QEMU emulated mode in case of failure and failures
> >> become fatal to the machine.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/intc/spapr_xive.c |  8 +---
> >>  hw/intc/spapr_xive_kvm.c | 27 ++
> >>  hw/intc/xics_kvm.c   | 25 +
> >>  hw/intc/xive.c   |  4 --
> >>  hw/ppc/spapr_irq.c   | 79 
> >>  5 files changed, 109 insertions(+), 34 deletions(-)
> >>
> >> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> >> index 21f3c1ef0901..0661aca35900 100644
> >> --- a/hw/intc/spapr_xive.c
> >> +++ b/hw/intc/spapr_xive.c
> >> @@ -330,13 +330,7 @@ static void spapr_xive_realize(DeviceState *dev, 
> >> Error **errp)
> >>  xive->eat = g_new0(XiveEAS, xive->nr_irqs);
> >>  xive->endt = g_new0(XiveEND, xive->nr_ends);
> >>  
> >> -if (kvmppc_xive_enabled()) {
> >> -kvmppc_xive_connect(xive, &local_err);
> >> -if (local_err) {
> >> -error_propagate(errp, local_err);
> >> -return;
> >> -}
> >> -} else {
> >> +if (!kvmppc_xive_enabled()) {
> >>  /* TIMA initialization */
> >>  memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, 
> >> xive,
> >>"xive.tima", 4ull << TM_SHIFT);
> >> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> >> index d35814c1992e..3ebc947f2be7 100644
> >> --- a/hw/intc/spapr_xive_kvm.c
> >> +++ b/hw/intc/spapr_xive_kvm.c
> >> @@ -737,6 +737,15 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error 
> >> **errp)
> >>  Error *local_err = NULL;
> >>  size_t esb_len;
> >>  size_t tima_len;
> >> +CPUState *cs;
> >> +
> >> +/*
> >> + * The KVM XIVE device already in use. This is the case when
> >> + * rebooting XIVE -> XIVE
> > 
> > Can this case actually occur?  Further down you appear to
> > unconditionally destroy both KVM devices at reset time.
> 
> I guess you are right. I will check.
> 
> >> + */
> >> +if (xive->fd != -1) {
> >> +return;
> >> +}
> >>  
> >>  if (!kvm_enabled() || !kvmppc_has_cap_xive()) {
> >>  error_setg(errp, "IRQ_XIVE capability must be present for KVM");
> >> @@ -800,6 +809,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error 
> >> **errp)
> >>  xive->change = qemu_add_vm_change_state_handler(
> >>  kvmppc_xive_change_state_handler, xive);
> >>  
> >> +/* Connect the presenters to the initial VCPUs of the machine */
> >> +CPU_FOREACH(cs) {
> >> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> >> +
> >> +kvmppc_xive_cpu_connect(cpu->tctx, &local_err);
> >> +if (local_err) {
> >> +error_propagate(errp, local_err);
> >> +return;
> >> +}
> >> +}
> >> +
> >> +/* Update the KVM sources */
> >> +kvmppc_xive_source_reset(xsrc, &local_err);
> >> +if (local_err) {
> >> +error_propagate(errp, local_err);
> >> +return;
> >> +}
> >> +
> >>  kvm_kernel_irqchip = true;
> >>  kvm_msi_via_irqfd_allowed = true;
> >>  kvm_gsi_direct_mapping = true;
> >> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> >> index 1d21ff217b82..bfc35d71df7f 100644
> >> --- a/hw/intc/xics_kvm.c
> >> +++ b/hw/intc/xics_kvm.c
> >> @@ -448,6 +448,16 @@ static void rtas_dummy(PowerPCCPU *cpu, 
> >> sPAPRMachineState *spapr,
> >>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
> >>  {
> >>  int rc;
> >> +CPUState *cs;
> >> +Error

Re: [Qemu-devel] [PATCH v4 14/15] spapr: add hotplug hooks for PHB hotplug

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:25:25PM +0100, Greg Kurz wrote:
> Hotplugging PHBs is a machine-level operation, but PHBs reside on the
> main system bus, so we register spapr machine as the handler for the
> main system bus.
> 
> Provide the usual pre-plug, plug and unplug-request handlers.
> 
> Move the checking of the PHB index to the pre-plug handler. It is okay
> to do that and assert in the realize function because the pre-plug
> handler is always called, even for the oldest machine types we support.
> 
> Unlike with other device types, there are some cases where we cannot
> provide the FDT fragment of the PHB from the plug handler, eg, before
> KVMPPC_H_UPDATE_DT was called. Do this from a DRC callback that is
> called just before the first FDT fragment is exposed to the guest.
> 
> Signed-off-by: Michael Roth 
> (Fixed interrupt controller phandle in "interrupt-map" and
>  TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz)
> Signed-off-by: Greg Kurz 
> ---
> v4: - populate FDT fragment in a DRC callback
> v3: - reworked phandle handling some more
> v2: - reworked phandle handling
> - sync LSIs to KVM
> ---
> ---
>  hw/ppc/spapr.c |  121 
> 
>  hw/ppc/spapr_drc.c |2 +
>  hw/ppc/spapr_pci.c |   16 --
>  include/hw/ppc/spapr.h |5 ++
>  4 files changed, 127 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 021758825b7e..06ce0babcb54 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2930,6 +2930,11 @@ static void spapr_machine_init(MachineState *machine)
>  register_savevm_live(NULL, "spapr/htab", -1, 1,
>   &savevm_htab_handlers, spapr);
>  
> +if (smc->dr_phb_enabled) {
> +qbus_set_hotplug_handler(sysbus_get_default(), OBJECT(machine),
> + &error_fatal);
> +}

I think you could do this unconditionally and just check
dr_phb_enabled at pre_plug.  That makes it more consistent with the
other hotplug types, and I suspect will give us better error messages.
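
The idea of registering the handler unconditionally and testing the flag
at pre-plug time can be sketched as follows. This is only an illustration
under stated assumptions: DemoMachineClass and demo_phb_pre_plug() are
hypothetical stand-ins, not the real sPAPRMachineClass or QEMU hotplug
API, and errors are reduced to static strings. Because pre-plug also runs
for coldplugged PHBs, the sketch only rejects the hotplugged case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified machine class; not the real sPAPRMachineClass. */
typedef struct DemoMachineClass {
    bool dr_phb_enabled;
} DemoMachineClass;

/*
 * Pre-plug check. The handler is assumed to be registered for every
 * machine type, so the feature test lives here and can report a specific
 * message. Coldplugged PHBs must keep working on older machine types,
 * so only hotplug is refused when the feature is disabled.
 * Returns NULL on success, or a static error string on failure.
 */
static const char *demo_phb_pre_plug(const DemoMachineClass *mc,
                                     bool hotplugged, uint32_t index)
{
    if (hotplugged && !mc->dr_phb_enabled) {
        return "PHB hotplug is not supported by this machine type";
    }
    if (index == (uint32_t)-1) {
        return "\"index\" for PAPR PHB is mandatory";
    }
    return NULL;
}
```

With this shape the user gets "not supported by this machine type"
instead of a generic "no hotplug handler" style failure.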

>  qemu_register_boot_set(spapr_boot_set, spapr);
>  
>  if (kvm_enabled()) {
> @@ -3733,6 +3738,108 @@ out:
>  error_propagate(errp, local_err);
>  }
>  
> +int spapr_dt_phb(DeviceState *dev, sPAPRMachineState *spapr, void *fdt,
> + int *fdt_start_offset, Error **errp)
> +{
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
> +uint32_t intc_phandle;
> +
> +if (spapr_irq_get_phandle(spapr, spapr->fdt_blob, &intc_phandle, errp)) {
> +return -1;
> +}
> +
> +if (spapr_populate_pci_dt(sphb, intc_phandle, fdt, spapr->irq->nr_msis,
> +  fdt_start_offset)) {
> +error_setg(errp, "unable to create FDT node for PHB %d", 
> sphb->index);
> +return -1;
> +}
> +
> +/* generally SLOF creates these, for hotplug it's up to QEMU */
> +_FDT(fdt_setprop_string(fdt, *fdt_start_offset, "name", "pci"));
> +
> +return 0;
> +}
> +
> +static void spapr_phb_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +   Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +const unsigned windows_supported = spapr_phb_windows_supported(sphb);
> +
> +if (sphb->index == (uint32_t)-1) {
> +error_setg(errp, "\"index\" for PAPR PHB is mandatory");
> +return;
> +}
> +
> +/*
> + * This will check that sphb->index doesn't exceed the maximum number of
> + * PHBs for the current machine type.
> + */
> +smc->phb_placement(spapr, sphb->index,
> +   &sphb->buid, &sphb->io_win_addr,
> +   &sphb->mem_win_addr, &sphb->mem64_win_addr,
> +   windows_supported, sphb->dma_liobn, errp);
> +}
> +
> +static void spapr_phb_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +   Error **errp)
> +{
> +sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
> +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
> +sPAPRDRConnector *drc;
> +bool hotplugged = spapr_drc_hotplugged(dev);
> +Error *local_err = NULL;
> +
> +if (!smc->dr_phb_enabled) {
> +return;
> +}
> +
> +drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, sphb->index);
> +/* hotplug hooks should check it's enabled before getting this far */
> +assert(drc);
> +
> +/*
> + * The FDT fragment will be added during the first invocation of RTAS
> + * ibm,client-architecture-support  for this device, when we're sure
> + * that the IOMMU is configured and that QEMU knows the phandle of the
> + * interrupt controller.
> + */
> +spapr_drc_attach(drc, DEVICE(dev), NULL, 0, &local_err);
> +if (local_err) {
> +

Re: [Qemu-devel] [PATCH v2] ppc: add host-serial and host-model machine attributes

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 05:09:29PM +0530, P J P wrote:
> From: Prasad J Pandit 
> 
> On ppc hosts, hypervisor shares following system attributes
> 
>   - /proc/device-tree/system-id
>   - /proc/device-tree/model
> 
> with a guest. This could lead to information leakage and misuse.[*]
> Add machine attributes to control such system information exposure
> to a guest.
> 
> [*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028
> 
> Reported-by: Daniel P. Berrangé 
> Fix-suggested-by: Daniel P. Berrangé 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/core/machine.c   | 46 +
>  hw/ppc/spapr.c  | 36 +--
>  include/hw/boards.h |  2 ++
>  qemu-options.hx | 10 +-
>  util/qemu-config.c  |  8 
>  5 files changed, 95 insertions(+), 7 deletions(-)
> 
> Update v2: add backward compatible properties
>   -> https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg00593.html
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 2629515363..2d5a52476a 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -476,6 +476,38 @@ static void machine_set_memory_encryption(Object *obj, 
> const char *value,
>  ms->memory_encryption = g_strdup(value);
>  }
>  
> +static char *machine_get_host_serial(Object *obj, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +return g_strdup(ms->host_serial);
> +}
> +
> +static void machine_set_host_serial(Object *obj, const char *value,
> +Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +g_free(ms->host_serial);
> +ms->host_serial = g_strdup(value);
> +}
> +
> +static char *machine_get_host_model(Object *obj, Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +return g_strdup(ms->host_model);
> +}
> +
> +static void machine_set_host_model(Object *obj, const char *value,
> +   Error **errp)
> +{
> +MachineState *ms = MACHINE(obj);
> +
> +g_free(ms->host_model);
> +ms->host_model = g_strdup(value);
> +}
> +
>  void machine_class_allow_dynamic_sysbus_dev(MachineClass *mc, const char 
> *type)
>  {
>  strList *item = g_new0(strList, 1);
> @@ -760,6 +792,18 @@ static void machine_class_init(ObjectClass *oc, void 
> *data)
>  &error_abort);
>  object_class_property_set_description(oc, "memory-encryption",
>  "Set memory encryption object to use", &error_abort);
> +
> +object_class_property_add_str(oc, "host-serial",
> +machine_get_host_serial, machine_set_host_serial,
> +&error_abort);
> +object_class_property_set_description(oc, "host-serial",
> +"Set host's system-id to use", &error_abort);
> +
> +object_class_property_add_str(oc, "host-model",
> +machine_get_host_model, machine_set_host_model,
> +&error_abort);
> +object_class_property_set_description(oc, "host-model",
> +"Set host's model-id to use", &error_abort);

You're adding properties to *all* machines, for something that's only
used on the PAPR machine.  That doesn't seem right.

>  }
>  
>  static void machine_class_base_init(ObjectClass *oc, void *data)
> @@ -785,6 +829,8 @@ static void machine_initfn(Object *obj)
>  ms->dump_guest_core = true;
>  ms->mem_merge = true;
>  ms->enable_graphics = true;
> +ms->host_serial = NULL;
> +ms->host_model = NULL;
>  
>  /* Register notifier when init is done for sysbus sanity checks */
>  ms->sysbus_notifier.notify = machine_init_notify;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0942f35bf8..a70667d72d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1249,13 +1249,31 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
>   * Add info to guest to indentify which host is it being run on
>   * and what is the uuid of the guest
>   */
> -if (kvmppc_get_host_model(&buf)) {
> -_FDT(fdt_setprop_string(fdt, 0, "host-model", buf));
> -g_free(buf);
> +if (machine->host_model && !g_str_equal(machine->host_model, "none")) {
> +if (g_str_equal(machine->host_model, "passthrough")) {
> +/* -M host-model=passthrough */
> +if (kvmppc_get_host_model(&buf)) {
> +_FDT(fdt_setprop_string(fdt, 0, "host-model", buf));
> +g_free(buf);
> +}
> +} else {
> +/* -M host-model= */
> +_FDT(fdt_setprop_string(fdt, 0, "host-model", 
> machine->host_model));
> +}
>  }
> -if (kvmppc_get_host_serial(&buf)) {
> -_FDT(fdt_setprop_string(fdt, 0, "host-serial", buf));
> -g_free(buf);
> +
> +if (machine->host_serial && !g_str_equal(machine->host_serial, "none")) {
> +if (g_str_equal(machine->host_serial, "passthrough")) {
> +/* -M host-serial=passthrough */
> +if (kvmppc_get_host_serial(&buf)) {
> +_FDT(fdt_setprop_string(fdt, 0, "host-serial", buf));
> +

Re: [Qemu-devel] [PATCH v4 02/15] xive: Only set source type for LSIs

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:06PM +0100, Greg Kurz wrote:
> MSI is the default and LSI specific code is guarded by the
> xive_source_irq_is_lsi() helper. The xive_source_irq_set()
> helper is a nop for MSIs.
> 
> Simplify the code by turning xive_source_irq_set() into
> xive_source_irq_set_lsi() and only call it for LSIs. The
> call to xive_source_irq_set(false) in spapr_xive_irq_free()
> is also a nop. Just drop it.
> 
> Signed-off-by: Greg Kurz 
> Reviewed-by: Cédric Le Goater 

Looks like a reasonable cleanup regardless of the rest of the series.
Applied to ppc-for-4.0.

> ---
>  hw/intc/spapr_xive.c  |7 +++
>  include/hw/ppc/xive.h |7 ++-
>  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index a0f5ff929447..290a290e43a5 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -489,20 +489,19 @@ bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t 
> lisn, bool lsi)
>  }
>  
>  xive->eat[lisn].w |= cpu_to_be64(EAS_VALID);
> -xive_source_irq_set(xsrc, lisn, lsi);
> +if (lsi) {
> +xive_source_irq_set_lsi(xsrc, lisn);
> +}
>  return true;
>  }
>  
>  bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn)
>  {
> -XiveSource *xsrc = &xive->source;
> -
>  if (lisn >= xive->nr_irqs) {
>  return false;
>  }
>  
>  xive->eat[lisn].w &= cpu_to_be64(~EAS_VALID);
> -xive_source_irq_set(xsrc, lisn, false);
>  return true;
>  }
>  
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index ec3bb2aae45a..13a487527b11 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -283,13 +283,10 @@ static inline bool xive_source_irq_is_lsi(XiveSource 
> *xsrc, uint32_t srcno)
>  return test_bit(srcno, xsrc->lsi_map);
>  }
>  
> -static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno,
> -   bool lsi)
> +static inline void xive_source_irq_set_lsi(XiveSource *xsrc, uint32_t srcno)
>  {
>  assert(srcno < xsrc->nr_irqs);
> -if (lsi) {
> -bitmap_set(xsrc->lsi_map, srcno, 1);
> -}
> +bitmap_set(xsrc->lsi_map, srcno, 1);
>  }
>  
>  void xive_source_set_irq(void *opaque, int srcno, int val);
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v4 05/15] spapr_irq: Expose the phandle of the interrupt controller

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:26PM +0100, Greg Kurz wrote:
> This will be used by PHB hotplug in order to create the "interrupt-map"
> property of the PHB node.
> 
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Greg Kurz 
> ---
> v4: - return phandle via a pointer

You don't really need to do this.  You already have an Error ** to
return errors via, so you don't need an error return code.  Plus
phandles are not permitted to be 0 or -1, so you have some safe values
even for that case.
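
As a rough sketch of that alternative (phandle as the return value, with
0 meaning "none" and the reason carried by the error object), assuming a
toy node table and a simplified error struct: DemoError, DemoNode and
demo_get_phandle() are all hypothetical names for illustration and are
not the QEMU Error API or libfdt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's Error object, for illustration only. */
typedef struct DemoError {
    char msg[128];
} DemoError;

/* Hypothetical toy "device tree": node names and their phandles. */
typedef struct DemoNode {
    const char *name;
    uint32_t phandle;   /* 0 means the node has no phandle */
} DemoNode;

/*
 * Return the phandle of the named node, or 0 on failure.
 * 0 is a safe failure value because a valid phandle is never 0 or -1;
 * the reason for the failure is reported through *errp (must be non-NULL).
 */
static uint32_t demo_get_phandle(const DemoNode *nodes, size_t n,
                                 const char *nodename, DemoError *errp)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].name, nodename) != 0) {
            continue;
        }
        if (nodes[i].phandle == 0 || nodes[i].phandle == UINT32_MAX) {
            snprintf(errp->msg, sizeof(errp->msg),
                     "Can't get phandle of node \"%s\"", nodename);
            return 0;
        }
        return nodes[i].phandle;
    }
    snprintf(errp->msg, sizeof(errp->msg),
             "Can't find node \"%s\"", nodename);
    return 0;
}
```

The caller then needs only one check (a zero return or a set error)
rather than a return code plus an out-parameter.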

> ---
>  hw/ppc/spapr_irq.c |   26 ++
>  include/hw/ppc/spapr_irq.h |2 ++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index b8d725e251ba..31495033c37c 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -692,6 +692,32 @@ void spapr_irq_reset(sPAPRMachineState *spapr, Error 
> **errp)
>  }
>  }
>  
> +int spapr_irq_get_phandle(sPAPRMachineState *spapr, void *fdt,
> +  uint32_t *phandle, Error **errp)
> +{
> +const char *nodename = spapr->irq->get_nodename(spapr);
> +int offset, ph;
> +
> +offset = fdt_subnode_offset(fdt, 0, nodename);
> +if (offset < 0) {
> +error_setg(errp, "Can't find node \"%s\": %s", nodename,
> +   fdt_strerror(offset));
> +return -1;
> +}
> +
> +ph = fdt_get_phandle(fdt, offset);
> +if (!ph) {
> +error_setg(errp, "Can't get phandle of node \"%s\"", nodename);
> +return -1;
> +}
> +
> +if (phandle) {
> +*phandle = ph;
> +}
> +
> +return 0;
> +}
> +
>  /*
>   * XICS legacy routines - to deprecate one day
>   */
> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
> index ad7127355441..4b3303ef4f6a 100644
> --- a/include/hw/ppc/spapr_irq.h
> +++ b/include/hw/ppc/spapr_irq.h
> @@ -62,6 +62,8 @@ void spapr_irq_free(sPAPRMachineState *spapr, int irq, int 
> num);
>  qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
>  int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id);
>  void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp);
> +int spapr_irq_get_phandle(sPAPRMachineState *spapr, void *fdt,
> +  uint32_t *phandle, Error **errp);
>  
>  /*
>   * XICS legacy routines
> 


Re: [Qemu-devel] [PATCH v4 15/15] spapr: enable PHB hotplug for default pseries machine type

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:25:32PM +0100, Greg Kurz wrote:
> From: Michael Roth 
> 
> The 'dr_phb_enabled' field of that class can be set as part of
> machine-specific init code. It will be used to conditionally
> enable creation of DRC objects and device-tree description to
> facilitate hotplug of PHBs.
> 
> Since we can't migrate this state to older machine types,
> default the option to true and disable it for older machine
> types.
> 
> Signed-off-by: Michael Roth 
> Signed-off-by: Greg Kurz 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/spapr.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 06ce0babcb54..4a6b2f7f3f62 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -4166,6 +4166,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
> void *data)
>  smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF;
>  spapr_caps_add_properties(smc, &error_abort);
>  smc->irq = &spapr_irq_xics;
> +smc->dr_phb_enabled = true;
>  }
>  
>  static const TypeInfo spapr_machine_info = {
> @@ -4231,6 +4232,7 @@ static void 
> spapr_machine_3_1_class_options(MachineClass *mc)
>  compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_len);
>  mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
>  smc->update_dt_enabled = false;
> +smc->dr_phb_enabled = false;
>  }
>  
>  DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
> 


Re: [Qemu-devel] [RFC 4/4] numa: check threads of the same core are on the same node

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 10:48:27PM +0100, Laurent Vivier wrote:
> A core cannot be split between two nodes.
> To check if a thread of the same core has already been assigned to a node,
> this patch reverses the numa topology checking order and exits if the
> topology is not valid.

I'm not entirely sure if this makes sense to enforce generically.

It's certainly true for PAPR - we have no way to represent threads
with different NUMA nodes to the guest.

It probably makes sense for everything - the whole point of threading
is to take better advantage of latencies accessing memory, so it seems
implausible that the threads would have different paths to memory.

But... there are some pretty weird setups out there, so I'm not sure
it's a good idea to enforce a restriction generically that's not
actually inherent in the structure of the problem.
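
For reference, the restriction under discussion (all threads of one core
must land on the same node) can be sketched like this. It is a simplified
model, not the patch's logic: DemoCpuSlot and demo_set_cpu_node() are
hypothetical, and a real implementation would work on the machine's
possible-CPU list and report errors through Error **.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CPU slot; not QEMU's CpuInstanceProperties. */
typedef struct DemoCpuSlot {
    int socket_id;
    int core_id;
    int thread_id;
    bool has_node_id;
    int node_id;
} DemoCpuSlot;

/*
 * Assign node_id to the slot at index idx, refusing the assignment if any
 * other thread of the same socket/core is already bound to a different
 * node. Returns true on success, false if the core would be split.
 */
static bool demo_set_cpu_node(DemoCpuSlot *slots, size_t n, size_t idx,
                              int node_id)
{
    assert(idx < n);

    for (size_t i = 0; i < n; i++) {
        if (i == idx) {
            continue;
        }
        if (slots[i].socket_id == slots[idx].socket_id &&
            slots[i].core_id == slots[idx].core_id &&
            slots[i].has_node_id && slots[i].node_id != node_id) {
            return false;   /* sibling thread already lives elsewhere */
        }
    }

    slots[idx].has_node_id = true;
    slots[idx].node_id = node_id;
    return true;
}
```

Whether such a check belongs in generic code or only in the PAPR machine
is exactly the open question above.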

> 
> Update test/numa-test accordingly.
> 
> Fixes: 722387e78daf ("spapr: get numa node mapping from possible_cpus instead 
> of numa_get_node_for_cpu()")
> Cc: imamm...@redhat.com
> Signed-off-by: Laurent Vivier 
> ---
>  hw/core/machine.c | 27 ---
>  tests/numa-test.c |  4 ++--
>  2 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index a2c29692b55e..c0a556b0dce7 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -602,6 +602,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>  bool match = false;
>  int i;
> +const CpuInstanceProperties *previous_props = NULL;
>  
>  if (!mc->possible_cpu_arch_ids) {
>  error_setg(errp, "mapping of CPUs to NUMA node is not supported");
> @@ -634,18 +635,38 @@ void machine_set_cpu_numa_node(MachineState *machine,
>  }
>  
>  /* skip slots with explicit mismatch */
> -if (props->has_thread_id && props->thread_id != 
> slot->props.thread_id) {
> +if (props->has_socket_id && props->socket_id != 
> slot->props.socket_id) {
>  continue;
>  }
>  
> -if (props->has_core_id && props->core_id != slot->props.core_id) {
> +if (props->has_core_id) {
> +if (props->core_id != slot->props.core_id) {
>  continue;
> +}
> +if (slot->props.has_node_id) {
> +/* we have a node where our core is already assigned */
> +previous_props = &slot->props;
> +}
>  }
>  
> -if (props->has_socket_id && props->socket_id != 
> slot->props.socket_id) {
> +if (props->has_thread_id && props->thread_id != 
> slot->props.thread_id) {
>  continue;
>  }
>  
> +/* check current thread matches node of the thread of the same core 
> */
> +if (previous_props && previous_props->has_node_id &&
> +previous_props->node_id != props->node_id) {
> +char *cpu_str = cpu_props_to_string(props);
> +char *node_str = cpu_props_to_string(previous_props);
> +error_setg(errp,  "Invalid node-id=%"PRIu64" of [%s]: core-id "
> +  "[%s] is already assigned to node-id %"PRIu64,
> +  props->node_id, cpu_str,
> +  node_str, previous_props->node_id);
> +g_free(cpu_str);
> +g_free(node_str);
> +return;
> +}
> +
>  /* reject assignment if slot is already assigned, for compatibility
>   * of legacy cpu_index mapping with SPAPR core based mapping do not
>   * error out if cpu thread and matched core have the same node-id */
> diff --git a/tests/numa-test.c b/tests/numa-test.c
> index 5280573fc992..a7c3c5b4dee8 100644
> --- a/tests/numa-test.c
> +++ b/tests/numa-test.c
> @@ -112,7 +112,7 @@ static void pc_numa_cpu(const void *data)
>  "-numa cpu,node-id=1,socket-id=0 "
>  "-numa cpu,node-id=0,socket-id=1,core-id=0 "
>  "-numa cpu,node-id=0,socket-id=1,core-id=1,thread-id=0 "
> -"-numa cpu,node-id=1,socket-id=1,core-id=1,thread-id=1");
> +"-numa cpu,node-id=0,socket-id=1,core-id=1,thread-id=1");
>  qtest_start(cli);
>  cpus = get_cpus();
>  g_assert(cpus);
> @@ -141,7 +141,7 @@ static void pc_numa_cpu(const void *data)
>  } else if (socket == 1 && core == 1 && thread == 0) {
>  g_assert_cmpint(node, ==, 0);
>  } else if (socket == 1 && core == 1 && thread == 1) {
> -g_assert_cmpint(node, ==, 1);
> +g_assert_cmpint(node, ==, 0);
>  } else {
>  g_assert(false);
>  }


Re: [Qemu-devel] [PATCH v4 13/15] spapr_drc: Allow FDT fragment to be added later

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:25:19PM +0100, Greg Kurz wrote:
> The current logic is to provide the FDT fragment when attaching a device
> to a DRC. This works perfectly fine for our current hotplug support, but
> soon we will add support for PHB hotplug which has some constraints, that
> CPU, PCI and LMB devices don't seem to have.
> 
> The first constraint is that the "ibm,dma-window" property of the PHB
> node requires the IOMMU to be configured, ie, spapr_tce_table_enable()
> has been called, which happens during PHB reset. It is okay in the case
> of hotplug since the device is reset before the hotplug handler is
> called. On the contrary with coldplug, the hotplug handler is called
> first and device is only reset during the initial system reset. Trying
> to create the FDT fragment on the hotplug path in this case, would
> result in something like this:
> 
> ibm,dma-window = < 0x8000 0x00 0x00 0x00 0x00 >;
> 
> This will cause linux in the guest to panic, by simply removing and
> re-adding the PHB using the drmgr command:
> 
>   page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz));
>   if (!page)
>   panic("iommu_init_table: Can't allocate %ld bytes\n", sz);
> 
> The second and maybe more problematic constraint is that the
> "interrupt-map" property needs to reference the interrupt controller
> node using the very same phandle that SLOF has already exposed to the
> guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall
> at some point to know about this phandle. With the latest QEMU and SLOF,
> this happens when SLOF gets quiesced. This means that if the PHB gets
> hotplugged after CAS but before SLOF quiesce, then we're sure that the
> phandle is not known when the hotplug handler is called.
> 
> The FDT is only needed when the guest first invokes RTAS to configure
> the connector actually, long after SLOF quiesce. Let's postpone the
> creation of FDT fragments for PHBs to rtas_ibm_configure_connector().
> 
> Since we only need this for PHBs, introduce a new method in the base
> DRC class for that. It will be implemented for "spapr-drc-phb" DRCs in
> a subsequent patch.
> 
> Allow spapr_drc_attach() to be passed a NULL fdt argument if the method
> is available.
> 
> Signed-off-by: Greg Kurz 

The basic solution looks fine.  However I don't much like the fact
that this leaves us with two ways to handle the fdt fragment - either
at connect time or at configure connector time via a callback.  QEMU
already has way too many places where there are confusingly multiple
ways to do things.

I know it's a detour, but I'd really prefer to convert the existing
DRC handling to this new callback scheme, rather than have two
different approaches.
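
To make the "single callback scheme" concrete, here is a minimal sketch
in which attach never takes a fragment and every connector type builds
it lazily at configure time. All names (DemoConnector, demo_attach,
demo_configure, demo_populate_phb) are hypothetical, and a plain string
stands in for the FDT blob; it is not the spapr_drc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: build the fragment for the attached device. */
typedef char *(*DemoPopulateFn)(const char *dev_name);

typedef struct DemoConnector {
    const char *dev_name;      /* attached device, NULL if none */
    char *fragment;            /* built lazily, NULL until configured */
    DemoPopulateFn populate;   /* required for every connector type */
} DemoConnector;

/* Attach records the device only; no fragment is passed in. */
static void demo_attach(DemoConnector *c, const char *dev_name)
{
    assert(c->populate);       /* single path: a callback must exist */
    assert(!c->dev_name);
    c->dev_name = dev_name;
    c->fragment = NULL;
}

/* Configure builds the fragment on first use and caches it. */
static const char *demo_configure(DemoConnector *c)
{
    assert(c->dev_name);
    if (!c->fragment) {
        c->fragment = c->populate(c->dev_name);
    }
    return c->fragment;
}

/* Example callback: a stand-in "fragment" naming the PHB node. */
static char *demo_populate_phb(const char *dev_name)
{
    const char prefix[] = "pci@";
    char *s = malloc(sizeof(prefix) + strlen(dev_name));

    if (s) {
        strcpy(s, prefix);
        strcat(s, dev_name);
    }
    return s;
}
```

The point of the design is that there is one place where fragments are
created, so CPU, LMB, PCI and PHB connectors would differ only in the
callback they install.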

> ---
>  hw/ppc/spapr_drc.c |   34 +-
>  include/hw/ppc/spapr_drc.h |6 ++
>  2 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 189ee681062a..c5a281915665 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -22,6 +22,7 @@
>  #include "qemu/error-report.h"
>  #include "hw/ppc/spapr.h" /* for RTAS return codes */
>  #include "hw/pci-host/spapr.h" /* spapr_phb_remove_pci_device_cb callback */
> +#include "sysemu/device_tree.h"
>  #include "trace.h"
>  
>  #define DRC_CONTAINER_PATH "/dr-connector"
> @@ -376,6 +377,8 @@ static void prop_get_fdt(Object *obj, Visitor *v, const 
> char *name,
>  void spapr_drc_attach(sPAPRDRConnector *drc, DeviceState *d, void *fdt,
>int fdt_start_offset, Error **errp)
>  {
> +sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
>  trace_spapr_drc_attach(spapr_drc_index(drc));
>  
>  if (drc->dev) {
> @@ -384,11 +387,14 @@ void spapr_drc_attach(sPAPRDRConnector *drc, 
> DeviceState *d, void *fdt,
>  }
>  g_assert((drc->state == SPAPR_DRC_STATE_LOGICAL_UNUSABLE)
>   || (drc->state == SPAPR_DRC_STATE_PHYSICAL_POWERON));
> -g_assert(fdt);
> +g_assert(fdt || drck->populate_dt);
>  
>  drc->dev = d;
> -drc->fdt = fdt;
> -drc->fdt_start_offset = fdt_start_offset;
> +
> +if (fdt) {
> +drc->fdt = fdt;
> +drc->fdt_start_offset = fdt_start_offset;
> +}
>  
>  object_property_add_link(OBJECT(drc), "device",
>   object_get_typename(OBJECT(drc->dev)),
> @@ -1118,10 +1124,28 @@ static void rtas_ibm_configure_connector(PowerPCCPU 
> *cpu,
>  goto out;
>  }
>  
> -g_assert(drc->fdt);
> -
>  drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> +g_assert(drc->fdt || drck->populate_dt);
> +
> +if (!drc->fdt) {
> +Error *local_err = NULL;
> +void *fdt;
> +int fdt_size;
> +
> +fdt = create_device_tree(&fdt_size);
> +
> +if (drck->populate_dt(drc->dev, spapr, fdt, &drc->fdt_start_offset,
> +   &local_err)) {
> +g_free(fdt);
> +error_free(local_err);
> +

Re: [Qemu-devel] [PATCH v4 01/15] spapr_irq: Add an @xics_offset field to sPAPRIrq

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:00PM +0100, Greg Kurz wrote:
> Only pseries machines, either recent ones started with ic-mode=xics
> or older ones using the legacy irq allocation scheme, need to set the
> @offset of the ICS to XICS_IRQ_BASE. Recent pseries started with
> ic-mode=dual set it to 0 and powernv machines set it to some other
> value at runtime.
> 
> It thus doesn't really help to set the default value of the ICS offset
> to XICS_IRQ_BASE in ics_base_instance_init().
> 
> Drop that code from XICS and let the pseries code set the offset
> explicitly for clarity.
> 
> Signed-off-by: Greg Kurz 

So this actually relates to a discussion I've had on some of Cédric's
more recent patches.  Changing the ics offset in ic-mode=dual doesn't
make sense to me.  The global (guest) interrupt numbers need to match
between XICS and XIVE, but the global interrupt numbers don't have to
match the ICS source numbers, which is what ics->offset is about.
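
A minimal sketch of that distinction, assuming a hypothetical MiniICS
type (not the real ICSState): the offset only translates a global irq
number into an index into the per-source state, so two controllers can
agree on global numbers while using different offsets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature of an interrupt source controller. */
typedef struct {
    uint32_t offset;  /* global irq number of source 0 */
    uint32_t nr_irqs; /* number of sources owned by this controller */
} MiniICS;

/* True if the global irq number falls inside this controller's window. */
static bool mini_ics_valid_irq(const MiniICS *ics, uint32_t irq)
{
    return irq >= ics->offset && irq - ics->offset < ics->nr_irqs;
}

/* Translate a (valid) global irq number into a source number. */
static uint32_t mini_ics_srcno(const MiniICS *ics, uint32_t irq)
{
    return irq - ics->offset;
}
```

With offset = 0x1000 (the XICS_IRQ_BASE value quoted in this thread),
global irq 0x1002 is simply source 2; nothing below 0x1000 needs a
state slot.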

> ---
>  hw/intc/xics.c |8 
>  hw/ppc/spapr_irq.c |   33 -
>  include/hw/ppc/spapr_irq.h |1 +
>  3 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 16e8ffa2aaf7..7cac138067e2 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -638,13 +638,6 @@ static void ics_base_realize(DeviceState *dev, Error 
> **errp)
>  ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>  }
>  
> -static void ics_base_instance_init(Object *obj)
> -{
> -ICSState *ics = ICS_BASE(obj);
> -
> -ics->offset = XICS_IRQ_BASE;
> -}
> -
>  static int ics_base_dispatch_pre_save(void *opaque)
>  {
>  ICSState *ics = opaque;
> @@ -720,7 +713,6 @@ static const TypeInfo ics_base_info = {
>  .parent = TYPE_DEVICE,
>  .abstract = true,
>  .instance_size = sizeof(ICSState),
> -.instance_init = ics_base_instance_init,
>  .class_init = ics_base_class_init,
>  .class_size = sizeof(ICSStateClass),
>  };
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 80b0083b8e38..8217e0215411 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -68,10 +68,11 @@ void spapr_irq_msi_reset(sPAPRMachineState *spapr)
>  
>  static ICSState *spapr_ics_create(sPAPRMachineState *spapr,
>const char *type_ics,
> -  int nr_irqs, Error **errp)
> +  int nr_irqs, int offset, Error **errp)
>  {
>  Error *local_err = NULL;
>  Object *obj;
> +ICSState *ics;
>  
>  obj = object_new(type_ics);
>  object_property_add_child(OBJECT(spapr), "ics", obj, &error_abort);
> @@ -86,7 +87,10 @@ static ICSState *spapr_ics_create(sPAPRMachineState *spapr,
>  goto error;
>  }
>  
> -return ICS_BASE(obj);
> +ics = ICS_BASE(obj);
> +ics->offset = offset;
> +
> +return ics;
>  
>  error:
>  error_propagate(errp, local_err);
> @@ -104,6 +108,7 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, 
> Error **errp)
>  !xics_kvm_init(spapr, &local_err)) {
>  spapr->icp_type = TYPE_KVM_ICP;
>  spapr->ics = spapr_ics_create(spapr, TYPE_ICS_KVM, nr_irqs,
> +  spapr->irq->xics_offset,
>  &local_err);
>  }
>  if (machine_kernel_irqchip_required(machine) && !spapr->ics) {
> @@ -119,6 +124,7 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, 
> Error **errp)
>  xics_spapr_init(spapr);
>  spapr->icp_type = TYPE_ICP;
>  spapr->ics = spapr_ics_create(spapr, TYPE_ICS_SIMPLE, nr_irqs,
> +  spapr->irq->xics_offset,
>  &local_err);
>  }
>  
> @@ -246,6 +252,7 @@ sPAPRIrq spapr_irq_xics = {
>  .nr_irqs = SPAPR_IRQ_XICS_NR_IRQS,
>  .nr_msis = SPAPR_IRQ_XICS_NR_MSIS,
>  .ov5 = SPAPR_OV5_XIVE_LEGACY,
> +.xics_offset = XICS_IRQ_BASE,
>  
>  .init= spapr_irq_init_xics,
>  .claim   = spapr_irq_claim_xics,
> @@ -451,17 +458,6 @@ static void spapr_irq_init_dual(sPAPRMachineState 
> *spapr, Error **errp)
>  return;
>  }
>  
> -/*
> - * Align the XICS and the XIVE IRQ number space under QEMU.
> - *
> - * However, the XICS KVM device still considers that the IRQ
> - * numbers should start at XICS_IRQ_BASE (0x1000). Either we
> - * should introduce a KVM device ioctl to set the offset or ignore
> - * the lower 4K numbers when using the get/set ioctl of the XICS
> - * KVM device. The second option seems the least intrusive.
> - */
> -spapr->ics->offset = 0;
> -
>  spapr_irq_xive.init(spapr, &local_err);
>  if (local_err) {
>  error_propagate(errp, local_err);
> @@ -582,6 +578,16 @@ sPAPRIrq spapr_irq_dual = {
>  .nr_irqs = SPAPR_IRQ_DUAL_NR_IRQS,
>  .nr_msis = SPAPR_IRQ_DUAL_NR_MSIS,
>

Re: [Qemu-devel] [PATCH 11/13] spapr: check for the activation of the KVM IRQ device

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 08:12:28AM +0100, Cédric Le Goater wrote:
> On 2/12/19 2:01 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:39:44PM +0100, Cédric Le Goater wrote:
> >> The activation of the KVM IRQ device depends on the interrupt mode
> >> chosen at CAS time by the machine and some methods used at reset or by
> >> the migration need to be protected.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/intc/spapr_xive_kvm.c | 28 
> >>  hw/intc/xics_kvm.c   | 25 -
> >>  2 files changed, 52 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> >> index 93ea8e71047a..d35814c1992e 100644
> >> --- a/hw/intc/spapr_xive_kvm.c
> >> +++ b/hw/intc/spapr_xive_kvm.c
> >> @@ -95,9 +95,15 @@ static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, 
> >> Error **errp)
> >>  
> >>  void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
> >>  {
> >> +sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
> >>  uint64_t state[4] = { 0 };
> >>  int ret;
> >>  
> >> +/* The KVM XIVE device is not in use */
> >> +if (xive->fd == -1) {
> >> +return;
> >> +}
> >> +
> >>  ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
> >>  if (ret != 0) {
> >>  error_setg_errno(errp, errno,
> >> @@ -151,6 +157,11 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error 
> >> **errp)
> >>  unsigned long vcpu_id;
> >>  int ret;
> >>  
> >> +/* The KVM XIVE device is not in use */
> >> +if (xive->fd == -1) {
> >> +return;
> >> +}
> >> +
> >>  /* Check if CPU was hot unplugged and replugged. */
> >>  if (kvm_cpu_is_enabled(tctx->cs)) {
> >>  return;
> >> @@ -234,9 +245,13 @@ static void kvmppc_xive_source_get_state(XiveSource 
> >> *xsrc)
> >>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
> >>  {
> >>  XiveSource *xsrc = opaque;
> >> +sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
> >>  struct kvm_irq_level args;
> >>  int rc;
> >>  
> >> +/* The KVM XIVE device should be in use */
> >> +assert(xive->fd != -1);
> >> +
> >>  args.irq = srcno;
> >>  if (!xive_source_irq_is_lsi(xsrc, srcno)) {
> >>  if (!val) {
> >> @@ -580,6 +595,11 @@ int kvmppc_xive_pre_save(sPAPRXive *xive)
> >>  Error *local_err = NULL;
> >>  CPUState *cs;
> >>  
> >> +/* The KVM XIVE device is not in use */
> >> +if (xive->fd == -1) {
> >> +return 0;
> >> +}
> >> +
> >>  /* Grab the EAT */
> >>  kvmppc_xive_get_eas_state(xive, &local_err);
> >>  if (local_err) {
> >> @@ -612,6 +632,9 @@ int kvmppc_xive_post_load(sPAPRXive *xive, int 
> >> version_id)
> >>  Error *local_err = NULL;
> >>  CPUState *cs;
> >>  
> >> +/* The KVM XIVE device should be in use */
> >> +assert(xive->fd != -1);
> > 
> > I'm guessing this is an assert() because the handler shouldn't be
> > registered when we're not in KVM mode.  But wouldn't that also be true
> > of the pre_save hook, which errors out rather than asserting?
> 
> The handlers are not symmetric.
> 
> The pre_save is registered in the vmstate of the sPAPRXive model and the 
> post_load is handled at the machine level after all XIVE state has been
> transferred.

Ah, ok.  Some comments on site explaining why that's so would be useful.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH] ppc: fix crash during branch stepping

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 01:12:55PM +0100, Roman Kapl wrote:
> The PPC BRANCH exception could bubble up, but this is a QEMU-internal
> exception and QEMU then crashed. Instead it should trigger a TRACE exception,
> according to the PPC 2.07 book. It could happen only when using branch
> stepping, which is not commonly used.
> 
> Change gen_prep_dbgex to trigger TRACE. The excp argument is now removed,
> since the type of exception can be inferred from the singlestep_enabled flags.
> Also remove the guards around gen_exception, since they are unnecessary.
> 
> Fixes: 0e3bf48909 ("ppc: add DBCR based debugging").
> Signed-off-by: Roman Kapl 

Applied to ppc-for-4.0, thanks.

> ---
>  target/ppc/translate.c | 37 +++--
>  1 file changed, 15 insertions(+), 22 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index e169c43643..c22d1a69c7 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -287,26 +287,22 @@ static void gen_exception_nip(DisasContext *ctx, 
> uint32_t excp,
>  ctx->exception = (excp);
>  }
>  
> -/* Translates the EXCP_TRACE/BRANCH exceptions used on most PowerPCs to
> - * EXCP_DEBUG, if we are running on cores using the debug enable bit (e.g.
> - * BookE).
> +/*
> + * Tells the caller what is the appropriate exception to generate and 
> prepares
> + * SPR registers for this exception.
> + *
> + * The exception can be either POWERPC_EXCP_TRACE (on most PowerPCs) or
> + * POWERPC_EXCP_DEBUG (on BookE).
>   */
> -static uint32_t gen_prep_dbgex(DisasContext *ctx, uint32_t excp)
> +static uint32_t gen_prep_dbgex(DisasContext *ctx)
>  {
> -if ((ctx->singlestep_enabled & CPU_SINGLE_STEP)
> -&& (excp == POWERPC_EXCP_BRANCH)) {
> -/* Trace excpt. has priority */
> -excp = POWERPC_EXCP_TRACE;
> -}
>  if (ctx->flags & POWERPC_FLAG_DE) {
>  target_ulong dbsr = 0;
> -switch (excp) {
> -case POWERPC_EXCP_TRACE:
> +if (ctx->singlestep_enabled & CPU_SINGLE_STEP) {
>  dbsr = DBCR0_ICMP;
> -break;
> -case POWERPC_EXCP_BRANCH:
> +} else {
> +/* Must have been branch */
>  dbsr = DBCR0_BRT;
> -break;
>  }
>  TCGv t0 = tcg_temp_new();
>  gen_load_spr(t0, SPR_BOOKE_DBSR);
> @@ -315,7 +311,7 @@ static uint32_t gen_prep_dbgex(DisasContext *ctx, 
> uint32_t excp)
>  tcg_temp_free(t0);
>  return POWERPC_EXCP_DEBUG;
>  } else {
> -return excp;
> +return POWERPC_EXCP_TRACE;
>  }
>  }
>  
> @@ -3652,10 +3648,8 @@ static void gen_lookup_and_goto_ptr(DisasContext *ctx)
>  if (sse & GDBSTUB_SINGLE_STEP) {
>  gen_debug_exception(ctx);
>  } else if (sse & (CPU_SINGLE_STEP | CPU_BRANCH_STEP)) {
> -uint32_t excp = gen_prep_dbgex(ctx, POWERPC_EXCP_BRANCH);
> -if (excp != POWERPC_EXCP_NONE) {
> -gen_exception(ctx, excp);
> -}
> +uint32_t excp = gen_prep_dbgex(ctx);
> +gen_exception(ctx, excp);
>  }
>  tcg_gen_exit_tb(NULL, 0);
>  } else {
> @@ -7785,9 +7779,8 @@ static void ppc_tr_translate_insn(DisasContextBase 
> *dcbase, CPUState *cs)
>   ctx->exception != POWERPC_SYSCALL &&
>   ctx->exception != POWERPC_EXCP_TRAP &&
>   ctx->exception != POWERPC_EXCP_BRANCH)) {
> -uint32_t excp = gen_prep_dbgex(ctx, POWERPC_EXCP_TRACE);
> -if (excp != POWERPC_EXCP_NONE)
> -gen_exception_nip(ctx, excp, ctx->base.pc_next);
> +uint32_t excp = gen_prep_dbgex(ctx);
> +gen_exception_nip(ctx, excp, ctx->base.pc_next);
>  }
>  
>  if (tcg_check_temp_count()) {


Re: [Qemu-devel] [PATCH v4 04/15] spapr: Expose the name of the interrupt controller node

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 07:24:19PM +0100, Greg Kurz wrote:
> This will be needed by PHB hotplug in order to access the "phandle"
> property of the interrupt controller node.
> 
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Greg Kurz 

Reviewed-by: David Gibson 

> ---
> v4: - folded some changes from patches 15, 16 and 17 of v3
> - dropped useless helpers
> ---
>  hw/intc/spapr_xive.c|9 -
>  hw/intc/xics_spapr.c|2 +-
>  hw/ppc/spapr_irq.c  |   21 -
>  include/hw/ppc/spapr_irq.h  |1 +
>  include/hw/ppc/spapr_xive.h |3 +++
>  include/hw/ppc/xics_spapr.h |2 ++
>  6 files changed, 31 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 815263ca72ab..f14e436ad4b9 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -317,6 +317,9 @@ static void spapr_xive_realize(DeviceState *dev, Error 
> **errp)
>  /* Map all regions */
>  spapr_xive_map_mmio(xive);
>  
> +xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
> +   xive->tm_base + XIVE_TM_USER_PAGE * (1 << 
> TM_SHIFT));
> +
>  qemu_register_reset(spapr_xive_reset, dev);
>  }
>  
> @@ -1443,7 +1446,6 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t 
> nr_servers, void *fdt,
>  cpu_to_be32(7),/* start */
>  cpu_to_be32(0xf8), /* count */
>  };
> -gchar *nodename;
>  
>  /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>  timas[0] = cpu_to_be64(xive->tm_base +
> @@ -1453,10 +1455,7 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t 
> nr_servers, void *fdt,
> XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>  timas[3] = cpu_to_be64(1ull << TM_SHIFT);
>  
> -nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
> -   xive->tm_base + XIVE_TM_USER_PAGE * (1 << 
> TM_SHIFT));
> -_FDT(node = fdt_add_subnode(fdt, 0, nodename));
> -g_free(nodename);
> +_FDT(node = fdt_add_subnode(fdt, 0, xive->nodename));
>  
>  _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
>  _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
> diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
> index e2d8b3818336..53bda6661b2a 100644
> --- a/hw/intc/xics_spapr.c
> +++ b/hw/intc/xics_spapr.c
> @@ -254,7 +254,7 @@ void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t 
> nr_servers, void *fdt,
>  };
>  int node;
>  
> -_FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
> +_FDT(node = fdt_add_subnode(fdt, 0, XICS_NODENAME));
>  
>  _FDT(fdt_setprop_string(fdt, node, "device_type",
>  "PowerPC-External-Interrupt-Presentation"));
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 3fc34d7c8a43..b8d725e251ba 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -256,6 +256,11 @@ static void spapr_irq_reset_xics(sPAPRMachineState 
> *spapr, Error **errp)
>  /* TODO: create the KVM XICS device */
>  }
>  
> +static const char *spapr_irq_get_nodename_xics(sPAPRMachineState *spapr)
> +{
> +return XICS_NODENAME;
> +}
> +
>  #define SPAPR_IRQ_XICS_NR_IRQS 0x1000
>  #define SPAPR_IRQ_XICS_NR_MSIS \
>  (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI)
> @@ -276,6 +281,7 @@ sPAPRIrq spapr_irq_xics = {
>  .post_load   = spapr_irq_post_load_xics,
>  .reset   = spapr_irq_reset_xics,
>  .set_irq = spapr_irq_set_irq_xics,
> +.get_nodename = spapr_irq_get_nodename_xics,
>  };
>  
>  /*
> @@ -415,6 +421,11 @@ static void spapr_irq_set_irq_xive(void *opaque, int 
> srcno, int val)
>  xive_source_set_irq(&spapr->xive->source, srcno, val);
>  }
>  
> +static const char *spapr_irq_get_nodename_xive(sPAPRMachineState *spapr)
> +{
> +return spapr->xive->nodename;
> +}
> +
>  /*
>   * XIVE uses the full IRQ number space. Set it to 8K to be compatible
>   * with XICS.
> @@ -438,6 +449,7 @@ sPAPRIrq spapr_irq_xive = {
>  .post_load   = spapr_irq_post_load_xive,
>  .reset   = spapr_irq_reset_xive,
>  .set_irq = spapr_irq_set_irq_xive,
> +.get_nodename = spapr_irq_get_nodename_xive,
>  };
>  
>  /*
> @@ -585,6 +597,11 @@ static void spapr_irq_set_irq_dual(void *opaque, int 
> srcno, int val)
>  spapr_irq_current(spapr)->set_irq(spapr, srcno, val);
>  }
>  
> +static const char *spapr_irq_get_nodename_dual(sPAPRMachineState *spapr)
> +{
> +return spapr_irq_current(spapr)->get_nodename(spapr);
> +}
> +
>  /*
>   * Define values in sync with the XIVE and XICS backend
>   */
> @@ -615,7 +632,8 @@ sPAPRIrq spapr_irq_dual = {
>  .cpu_intc_create = spapr_irq_cpu_intc_create_dual,
>  .post_load   = spapr_irq_post_load_dual,
>  .reset   = spapr_irq_reset_dual,
> -.set_irq = spapr_irq_set_irq_dual
> +.set_irq = spapr_irq_set_irq_dual,
> +.get_nodename =

Re: [Qemu-devel] [RFC 1/4] numa, spapr: add thread-id in the possible_cpus list

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 10:48:24PM +0100, Laurent Vivier wrote:
> spapr_possible_cpu_arch_ids() counts only cores, and so
> the number of available CPUs is the number of vCPU divided
> by smp_threads.
> 
> ... -smp 4,maxcpus=8,cores=2,threads=2,sockets=2 -numa node,cpus=0,cpus=1 \
>  -numa node,cpus=3,cpus=4 \
>  -numa node -numa node
> 
> This generates (info hotpluggable-cpus)
> 
>   node-id: 0 core-id: 0 thread-id: 0 [thread-id: 1]
>   node-id: 0 core-id: 6 thread-id: 0 [thread-id: 1]
>   node-id: 1 core-id: 2 thread-id: 0 [thread-id: 1]
>   node-id: 1 core-id: 4 thread-id: 0 [thread-id: 1]
> 
> And this command line generates the following error:
> 
>   CPU(s) not present in any NUMA nodes: CPU 3 [core-id: 6]
> 
> That is wrong because CPU 3 [core-id: 6] is assigned to node-id 0
> Moreover "cpus=4" is not valid, because it means core-id 8 but
> maxcpus is 8.
> 
> With this patch we have now:
> 
>   node-id: 0 core-id: 0 thread-id: 0
>   node-id: 0 core-id: 0 thread-id: 1
>   node-id: 0 core-id: 1 thread-id: 0
>   node-id: 1 core-id: 1 thread-id: 1
>   node-id: 0 core-id: 2 thread-id: 1
>   node-id: 1 core-id: 2 thread-id: 0
>   node-id: 0 core-id: 3 thread-id: 1
>   node-id: 0 core-id: 3 thread-id: 0

I'm afraid this is not the right solution.  The point of the
hotpluggable cpus table is that it has exactly one entry for each
hotpluggable unit.  For PAPR that's a core, not a thread.

So, the problem is with how the NUMA configuration code is
interpreting possible-cpus, not how the machine is building the table.
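
For reference, a small sketch of the per-core view, using hypothetical
names (CoreSlot, find_core_slot) rather than the real spapr code: the
table has one slot per core, and a flat cpu index is mapped onto it by
dividing by the number of threads per core.

```c
#include <stddef.h>

/* One entry per hotpluggable unit, i.e. per core. Hypothetical type. */
typedef struct {
    int core_id;
} CoreSlot;

/*
 * Map a flat cpu index to the slot of the core that contains it.
 * Returns NULL when the index is beyond the table or the thread
 * count is zero.
 */
static CoreSlot *find_core_slot(CoreSlot *slots, size_t nr_slots,
                                unsigned cpu_index,
                                unsigned threads_per_core)
{
    size_t index;

    if (threads_per_core == 0) {
        return NULL;
    }
    index = cpu_index / threads_per_core;
    if (index >= nr_slots) {
        return NULL;
    }
    return &slots[index];
}
```

With 8 possible vCPUs and 2 threads per core, the table has 4 slots, so
"cpus=3" should resolve to the second core rather than to a fourth
table entry.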

> CPUs 0 (core-id: 0 thread-id: 0) and 1 (core-id: 0 thread-id: 1) are
> correctly assigned to node-id 0, CPUs 3 (core-id: 1 thread-id: 1) and
>  4 (core-id: 2 thread-id: 0) are correctly assigned to node-id 1.
> All other CPUs are assigned to node-id 0 by default.
> 
> And the error message is also correct:
> 
>   CPU(s) not present in any NUMA nodes: CPU 2 [core-id: 1, thread-id: 0], \
> CPU 5 [core-id: 2, thread-id: 1], \
> CPU 6 [core-id: 3, thread-id: 0], \
> CPU 7 [core-id: 3, thread-id: 1]
> 
> Fixes: ec78f8114bc4 ("numa: use possible_cpus for not mapped CPUs check")
> Cc: imamm...@redhat.com
> 
> Before commit ec78f8114bc4, output was correct:
> 
>   CPU(s) not present in any NUMA nodes: 2 5 6 7
> 
> Signed-off-by: Laurent Vivier 
> ---
>  hw/ppc/spapr.c | 33 +
>  1 file changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 332cba89d425..7196ba09da34 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2404,15 +2404,13 @@ static void spapr_validate_node_memory(MachineState 
> *machine, Error **errp)
>  /* find cpu slot in machine->possible_cpus by core_id */
>  static CPUArchId *spapr_find_cpu_slot(MachineState *ms, uint32_t id, int 
> *idx)
>  {
> -int index = id / smp_threads;
> -
> -if (index >= ms->possible_cpus->len) {
> +if (id >= ms->possible_cpus->len) {
>  return NULL;
>  }
>  if (idx) {
> -*idx = index;
> +*idx = id;
>  }
> -return &ms->possible_cpus->cpus[index];
> +return &ms->possible_cpus->cpus[id];
>  }
>  
>  static void spapr_set_vsmt_mode(sPAPRMachineState *spapr, Error **errp)
> @@ -2514,7 +2512,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>  error_report("This machine version does not support CPU 
> hotplug");
>  exit(1);
>  }
> -boot_cores_nr = possible_cpus->len;
> +boot_cores_nr = possible_cpus->len / smp_threads;
>  }
>  
>  if (smc->pre_2_10_has_unused_icps) {
> @@ -2528,7 +2526,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
>  }
>  }
>  
> -for (i = 0; i < possible_cpus->len; i++) {
> +for (i = 0; i < possible_cpus->len / smp_threads; i++) {
>  int core_id = i * smp_threads;
>  
>  if (mc->has_hotpluggable_cpus) {
> @@ -3795,21 +3793,16 @@ spapr_cpu_index_to_props(MachineState *machine, 
> unsigned cpu_index)
>  
>  static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
>  {
> -return idx / smp_cores % nb_numa_nodes;
> +return idx / (smp_cores * smp_threads) % nb_numa_nodes;
>  }
>  
>  static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState 
> *machine)
>  {
>  int i;
>  const char *core_type;
> -int spapr_max_cores = max_cpus / smp_threads;
> -MachineClass *mc = MACHINE_GET_CLASS(machine);
>  
> -if (!mc->has_hotpluggable_cpus) {
> -spapr_max_cores = QEMU_ALIGN_UP(smp_cpus, smp_threads) / smp_threads;
> -}
>  if (machine->possible_cpus) {
> -assert(machine->possible_cpus->len == spapr_max_cores);
> +assert(machine->possible_cpus->len == max_cpus);
>  return machine->possible_cpus;
>  }
>  
> @@

Re: [Qemu-devel] [PATCH 12/13] spapr/xics: ignore the lower 4K in the IRQ number space

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 08:05:53AM +0100, Cédric Le Goater wrote:
> On 2/12/19 2:06 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:39:45PM +0100, Cédric Le Goater wrote:
> >> The IRQ number space of the XIVE and XICS interrupt mode are aligned
> >> when using the dual interrupt mode for the machine. This means that
> >> the ICS offset is set to zero in QEMU and that the KVM XICS device
> >> should be informed of this new value. Unfortunately, there is no way
> >> to do so and KVM still maintains the XICS_IRQ_BASE (0x1000) offset.
> >>
> >> Ignore the lower 4K which are not used under the XICS interrupt
> >> mode. These IRQ numbers are only claimed by XIVE for the CPU IPIs.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/intc/xics_kvm.c | 18 ++
> >>  1 file changed, 18 insertions(+)
> >>
> >> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> >> index 651bbfdf6966..1d21ff217b82 100644
> >> --- a/hw/intc/xics_kvm.c
> >> +++ b/hw/intc/xics_kvm.c
> >> @@ -238,6 +238,15 @@ static void ics_get_kvm_state(ICSState *ics)
> >>  for (i = 0; i < ics->nr_irqs; i++) {
> >>  ICSIRQState *irq = &ics->irqs[i];
> >>  
> >> +/*
> >> + * The KVM XICS device considers that the IRQ numbers should
> >> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> >> + * numbers (only claimed by XIVE for the CPU IPIs).
> >> + */
> >> +if (i + ics->offset < XICS_IRQ_BASE) {
> >> +continue;
> >> +}
> >> +
> > 
> > This seems bogus to me.  The guest-visible irq numbers need to line up
> > between xics and xive mode, yes, but that doesn't mean we need to keep
> > > around a great big array of unused ICS irq states, even in
> > TCG mode.
> 
> This is because the qirqs[] array is under the machine and shared between 
> both interrupt modes, xics and xive.

I don't see how that follows.  ICSIRQState is indexed in terms of the
ICS source number, not the global irq number, so I don't see why it
has to match up with the qirq array.

> 
> C.
> 
> > 
> >>  kvm_device_access(kernel_xics_fd, KVM_DEV_XICS_GRP_SOURCES,
> >>i + ics->offset, &state, false, &error_fatal);
> >>  
> >> @@ -303,6 +312,15 @@ static int ics_set_kvm_state(ICSState *ics, int 
> >> version_id)
> >>  ICSIRQState *irq = &ics->irqs[i];
> >>  int ret;
> >>  
> >> +/*
> >> + * The KVM XICS device considers that the IRQ numbers should
> >> + * start at XICS_IRQ_BASE (0x1000). Ignore the lower 4K
> >> + * numbers (only claimed by XIVE for the CPU IPIs).
> >> + */
> >> +if (i + ics->offset < XICS_IRQ_BASE) {
> >> +continue;
> >> +}
> >> +
> >>  state = irq->server;
> >>  state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
> >>  << KVM_XICS_PRIORITY_SHIFT;
> > 
> 


Re: [Qemu-devel] [Qemu-ppc] [PATCH] cuda: decrease time delay before raising VIA SR interrupt

2019-02-12 Thread David Gibson

On Tue, Feb 12, 2019 at 08:01:22PM +, Mark Cave-Ayland wrote:
> On 12/02/2019 18:21, Philippe Mathieu-Daudé wrote:
> 
> > On 2/12/19 6:50 PM, Mark Cave-Ayland wrote:
> >> On 12/02/2019 17:21, Philippe Mathieu-Daudé wrote:
> >>
> > If this delay is to prevent a bug which only happens in MacOS then 
> > that's the hack
> > not the normal code path to run without the delay that you've just 
> > removed. So maybe
> > this should be kept if possible to avoid unnecessary delays for other 
> > guests.
> > (Although if this only affects mac99,via=cuda but not mac99,via=pmu 
> > then I don't care
> > much as long as pmu works.)
> 
>  Well the reality is that the detection above doesn't actually seem to 
>  work anyway -
>  at least a quick boot test with Linux, MacOS X and MacOS 9 with a 
>  printf() added into
>  the if() shows nothing firing once the kernel takes over. So the slow 
>  path with the
>  delay included was always being taken within the OS anyway.
> 
>  And indeed, the code doesn't affect pmu so you won't see any difference 
>  there.
> 
> >> As a plus it also prevents a guest OS from accidentally triggering the 
> >> hack whilst
> >> programming the VIA port.
> >
> > That may be a problem though. What's the issue exactly? Why is the 
> > delay needed in
> > the first place?
> 
>  It's some kind of racy polling with OS 9 (I wasn't involved in the 
>  technical details,
>  sorry) which causes OS 9 to hang on boot if the delay isn't present. And 
>  even better
>  the slow path that was previously always being taken has now been 
>  reduced from 300us
>  to 30us so whichever way you look at it, having this patch applied is a 
>  win.
> >>>
> >>> Can you write a paragraph about this, that David can amend to your
> >>> patch? That would stop worrying me about looking at this patch in
> >>> various months...
> >>
> >> H well the existing description already describes the interrupt race 
> >> in OS 9 so I
> >> guess the only part missing is the bit about the fast path. How about the 
> >> revised
> >> text below for the patch description?
> >>
> >>
> >> cuda: decrease time delay before raising VIA SR interrupt and remove 
> >> fast path
> >>
> >> In order to handle a race condition in the MacOS 9 CUDA driver, a 
> >> delay was
> >> introduced when raising the VIA SR interrupt inspired by similar code 
> >> in
> >> MacOnLinux.
> >>
> >> During original testing of the MacOS 9 patches it was found that the 
> >> 30us
> >> delay used in MacOnLinux did not work reliably within QEMU, and a 
> >> value of
> >> 300us was required to function correctly.
> >>
> >> Recent experiments have shown two things: firstly when booting Linux, 
> >> MacOS
> >> 9 and MacOS X the fast path which bypasses the delay is never 
> >> triggered once the
> >> OS kernel is loaded making it effectively useless. Rather than leave 
> >> this code
> >> in place where a guest could potentially enable it by accident and 
> >> break itself,
> >> we might as well just remove it.
> >>
> >> Secondly the previous reliability issues are no longer present, and 
> >> this value
> >> can be reduced down to 20us with no apparent ill effects. This has the 
> >> benefit of
> >> considerably improving the responsiveness of the ADB keyboard and 
> >> mouse within
> >> the guest.
> >>
> >> Signed-off-by: Mark Cave-Ayland 
> >>
> > 
> > Thanks!
> > 
> > Phil.
> 
> No worries. David, are you able to update the commit message in your 
> ppc-for-4.0
> branch accordingly?

Done.

> 
> 
> ATB,
> 
> Mark.
> 


[Qemu-devel] [PATCH v5 1/2] Add generic Nios II board.

2019-02-12 Thread Sandra Loosemore

This patch adds support for a generic MMU-less Nios II board that can
be used e.g. for bare-metal compiler testing.  Nios II booting is also
tweaked so that bare-metal binaries start executing in RAM starting at
0x00000000, rather than an alias at 0xc0000000, which allows features
such as unwinding to work when binaries are linked to start at the
beginning of the address space.
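
A stand-alone sketch of the resulting entry-point selection follows.
The helper name bootstrap_pc and the two macros are hypothetical, and
the constant values are assumptions (the kernel alias base and the
entry mask) rather than something this description states outright:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: alias base for kernel images and the entry mask. */
#define KERNEL_ALIAS_BASE 0xc0000000u
#define ENTRY_MASK        0x07ffffffu

/*
 * Pick the address execution starts at.  Images that went through the
 * kernel address translation start in the alias; bare-metal images
 * start directly in RAM at ddr_base.
 */
static uint32_t bootstrap_pc(uint32_t ddr_base, uint64_t entry,
                             bool kernel_space)
{
    return ddr_base + (kernel_space ? KERNEL_ALIAS_BASE : 0)
           + (uint32_t)(entry & ENTRY_MASK);
}
```

A bare-metal binary linked at address 0 with entry 0x100 thus starts at
ddr_base + 0x100, so addresses seen at run time match the link-time
ones.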

The generic_nommu.c parts are by Andrew Jenner, based on code by Marek
Vasut.

Originally by Marek Vasut and Andrew Jenner.

Signed-off-by: Sandra Loosemore 
Signed-off-by: Julian Brown 
Signed-off-by: Andrew Jenner 
Signed-off-by: Marek Vasut 
---
 default-configs/nios2-softmmu.mak |   1 +
 hw/nios2/Makefile.objs|   1 +
 hw/nios2/boot.c   |   5 +-
 hw/nios2/generic_nommu.c  | 130 ++
 4 files changed, 136 insertions(+), 1 deletion(-)
 create mode 100644 hw/nios2/generic_nommu.c

diff --git a/default-configs/nios2-softmmu.mak 
b/default-configs/nios2-softmmu.mak
index ab42d0f..95ed1c2 100644
--- a/default-configs/nios2-softmmu.mak
+++ b/default-configs/nios2-softmmu.mak
@@ -5,3 +5,4 @@ CONFIG_SERIAL=y
 CONFIG_PTIMER=y
 CONFIG_ALTERA_TIMER=y
 CONFIG_NIOS2_10M50=y
+CONFIG_NIOS2_GENERIC_NOMMU=y
diff --git a/hw/nios2/Makefile.objs b/hw/nios2/Makefile.objs
index 89a419a..3e01798 100644
--- a/hw/nios2/Makefile.objs
+++ b/hw/nios2/Makefile.objs
@@ -1,2 +1,3 @@
 obj-y = boot.o cpu_pic.o
 obj-$(CONFIG_NIOS2_10M50) += 10m50_devboard.o
+obj-$(CONFIG_NIOS2_GENERIC_NOMMU) += generic_nommu.o
diff --git a/hw/nios2/boot.c b/hw/nios2/boot.c
index 5f0ab2f..c697047 100644
--- a/hw/nios2/boot.c
+++ b/hw/nios2/boot.c
@@ -140,6 +140,7 @@ void nios2_load_kernel(Nios2CPU *cpu, hwaddr ddr_base,
 uint64_t entry, low, high;
 uint32_t base32;
 int big_endian = 0;
+int kernel_space = 0;
 
 #ifdef TARGET_WORDS_BIGENDIAN
 big_endian = 1;
@@ -155,10 +156,12 @@ void nios2_load_kernel(Nios2CPU *cpu, hwaddr ddr_base,
translate_kernel_address, NULL,
 &entry, NULL, NULL,
big_endian, EM_ALTERA_NIOS2, 0, 0);
+kernel_space = 1;
 }
 
 /* Always boot into physical ram. */
-boot_info.bootstrap_pc = ddr_base + 0xc0000000 + (entry & 0x07ffffff);
+boot_info.bootstrap_pc = ddr_base + (kernel_space ? 0xc0000000 : 0)
+ + (entry & 0x07ffffff);
 
 /* If it wasn't an ELF image, try an u-boot image. */
 if (kernel_size < 0) {
diff --git a/hw/nios2/generic_nommu.c b/hw/nios2/generic_nommu.c
new file mode 100644
index 0000000..502567f
--- /dev/null
+++ b/hw/nios2/generic_nommu.c
@@ -0,0 +1,130 @@
+/*
+ * Generic simulator target with no MMU
+ *
+ * Copyright (c) 2018-2019 Mentor Graphics
+ *
+ * Copyright (c) 2016 Marek Vasut 
+ *
+ * Based on LabX device code
+ *
+ * Copyright (c) 2012 Chris Wulff 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see
+ * 
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "cpu.h"
+
+#include "hw/sysbus.h"
+#include "hw/hw.h"
+#include "hw/char/serial.h"
+#include "sysemu/sysemu.h"
+#include "hw/boards.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "qemu/config-file.h"
+
+#include "boot.h"
+
+#define BINARY_DEVICE_TREE_FILE "generic-nommu.dtb"
+
+static void nios2_generic_nommu_init(MachineState *machine)
+{
+Nios2CPU *cpu;
+DeviceState *dev;
+MemoryRegion *address_space_mem = get_system_memory();
+MemoryRegion *phys_tcm = g_new(MemoryRegion, 1);
+MemoryRegion *phys_tcm_alias = g_new(MemoryRegion, 1);
+MemoryRegion *phys_ram = g_new(MemoryRegion, 1);
+MemoryRegion *phys_ram_alias = g_new(MemoryRegion, 1);
+ram_addr_t tcm_base = 0x0;
+ram_addr_t tcm_size = 0x1000; /* 1 kiB, but QEMU limit is 4 kiB */
+ram_addr_t ram_base = 0x1000;
+ram_addr_t ram_size = 0x0800;
+qemu_irq *cpu_irq, irq[32];
+int i;
+
+/* Physical TCM (tb_ram_1k) with alias at 0xc000 */
+memory_region_init_ram(phys_tcm, NULL, "nios2.tcm", tcm_size,
+   &error_abort);
+memory_region_init_alias(phys_tcm_alias, NULL, "nios2.tcm.alias",
+ phys_tcm, 0, tcm_size);
+

[Qemu-devel] [PATCH v5 0/2] Nios II generic board config and semihosting

2019-02-12 Thread Sandra Loosemore

This is the fifth version of the patch series last posted here:

http://lists.nongnu.org/archive/html/qemu-devel/2018-08/msg01987.html

Since the previous version, I've updated the copyrights on the new
files, refreshed the patches against current trunk, and fixed bugs in
the implementations of lseek() and gettimeofday().

The original version of these patches was rejected because there was
no corresponding open-source BSP or I/O library support, making it
difficult to test the code.  I contributed those pieces to libgloss
last summer (commit fddc74d12bf7f765c04c3182a7237ecf23893d27), so that
should no longer be a blocking issue.

Sandra Loosemore (2):
  Add generic Nios II board.
  Add Nios II semihosting support.

 default-configs/nios2-softmmu.mak |   1 +
 hw/nios2/Makefile.objs            |   1 +
 hw/nios2/boot.c                   |   5 +-
 hw/nios2/generic_nommu.c          | 130 +++
 qemu-options.hx                   |   8 +-
 target/nios2/Makefile.objs        |   2 +-
 target/nios2/cpu.h                |   4 +-
 target/nios2/helper.c             |  11 +
 target/nios2/nios2-semi.c         | 446 ++
 9 files changed, 601 insertions(+), 7 deletions(-)
 create mode 100644 hw/nios2/generic_nommu.c
 create mode 100644 target/nios2/nios2-semi.c

-- 
2.8.1

[Qemu-devel] [PATCH v5 2/2] Add Nios II semihosting support.

2019-02-12 Thread Sandra Loosemore

This patch adds support for libgloss semihosting to Nios II bare-metal
emulation.

Signed-off-by: Sandra Loosemore 
Signed-off-by: Julian Brown 
---
 qemu-options.hx            |   8 +-
 target/nios2/Makefile.objs |   2 +-
 target/nios2/cpu.h         |   4 +-
 target/nios2/helper.c      |  11 ++
 target/nios2/nios2-semi.c  | 446 +
 5 files changed, 465 insertions(+), 6 deletions(-)
 create mode 100644 target/nios2/nios2-semi.c

diff --git a/qemu-options.hx b/qemu-options.hx
index 06ef1a7..5019ede 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3712,21 +3712,21 @@ ETEXI
 DEF("semihosting", 0, QEMU_OPTION_semihosting,
 "-semihosting    semihosting mode\n",
 QEMU_ARCH_ARM | QEMU_ARCH_M68K | QEMU_ARCH_XTENSA | QEMU_ARCH_LM32 |
-QEMU_ARCH_MIPS)
+QEMU_ARCH_MIPS | QEMU_ARCH_NIOS2)
 STEXI
 @item -semihosting
 @findex -semihosting
-Enable semihosting mode (ARM, M68K, Xtensa, MIPS only).
+Enable semihosting mode (ARM, M68K, Xtensa, MIPS, Nios II only).
 ETEXI
 DEF("semihosting-config", HAS_ARG, QEMU_OPTION_semihosting_config,
 "-semihosting-config [enable=on|off][,target=native|gdb|auto][,arg=str[,...]]\n" \
 "semihosting configuration\n",
 QEMU_ARCH_ARM | QEMU_ARCH_M68K | QEMU_ARCH_XTENSA | QEMU_ARCH_LM32 |
-QEMU_ARCH_MIPS)
+QEMU_ARCH_MIPS | QEMU_ARCH_NIOS2)
 STEXI
 @item -semihosting-config [enable=on|off][,target=native|gdb|auto][,arg=str[,...]]
 @findex -semihosting-config
-Enable and configure semihosting (ARM, M68K, Xtensa, MIPS only).
+Enable and configure semihosting (ARM, M68K, Xtensa, MIPS, Nios II only).
 @table @option
 @item target=@code{native|gdb|auto}
 Defines where the semihosting calls will be addressed, to QEMU (@code{native})
diff --git a/target/nios2/Makefile.objs b/target/nios2/Makefile.objs
index 2a11c5c..010de0e 100644
--- a/target/nios2/Makefile.objs
+++ b/target/nios2/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += translate.o op_helper.o helper.o cpu.o mmu.o
+obj-y += translate.o op_helper.o helper.o cpu.o mmu.o nios2-semi.o
 obj-$(CONFIG_SOFTMMU) += monitor.o
 
 $(obj)/op_helper.o: QEMU_CFLAGS += $(HELPER_CFLAGS)
diff --git a/target/nios2/cpu.h b/target/nios2/cpu.h
index 047f376..afd30d5 100644
--- a/target/nios2/cpu.h
+++ b/target/nios2/cpu.h
@@ -141,7 +141,7 @@ typedef struct Nios2CPUClass {
 #define R_PC 64
 
 /* Exceptions */
-#define EXCP_BREAK    -1
+#define EXCP_BREAK    0x1000
 #define EXCP_RESET    0
 #define EXCP_PRESET   1
 #define EXCP_IRQ      2
@@ -223,6 +223,8 @@ void nios2_cpu_do_unaligned_access(CPUState *cpu, vaddr addr,
 qemu_irq *nios2_cpu_pic_init(Nios2CPU *cpu);
 void nios2_check_interrupts(CPUNios2State *env);
 
+void do_nios2_semihosting(CPUNios2State *env);
+
 #define TARGET_PHYS_ADDR_SPACE_BITS 32
 #ifdef CONFIG_USER_ONLY
 # define TARGET_VIRT_ADDR_SPACE_BITS 31
diff --git a/target/nios2/helper.c b/target/nios2/helper.c
index a8b8ec6..ca3b087 100644
--- a/target/nios2/helper.c
+++ b/target/nios2/helper.c
@@ -25,6 +25,7 @@
 #include "exec/exec-all.h"
 #include "exec/log.h"
 #include "exec/helper-proto.h"
+#include "exec/semihost.h"
 
 #if defined(CONFIG_USER_ONLY)
 
@@ -169,6 +170,16 @@ void nios2_cpu_do_interrupt(CPUState *cs)
 break;
 
 case EXCP_BREAK:
+qemu_log_mask(CPU_LOG_INT, "BREAK exception at pc=%x\n",
+  env->regs[R_PC]);
+
+if (semihosting_enabled()) {
+qemu_log_mask(CPU_LOG_INT, "Entering semihosting\n");
+env->regs[R_PC] += 4;
+do_nios2_semihosting(env);
+break;
+}
+
 if ((env->regs[CR_STATUS] & CR_STATUS_EH) == 0) {
 env->regs[CR_BSTATUS] = env->regs[CR_STATUS];
 env->regs[R_BA] = env->regs[R_PC] + 4;
diff --git a/target/nios2/nios2-semi.c b/target/nios2/nios2-semi.c
new file mode 100644
index 000..9db518a
--- /dev/null
+++ b/target/nios2/nios2-semi.c
@@ -0,0 +1,446 @@
+/*
+ *  Nios II Semihosting syscall interface.
+ *  This code is derived from m68k-semi.c.
+ *
+ *  Copyright (c) 2017-2019 Mentor Graphics
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "cpu.h"
+#if defined(CONFIG_USER_ONLY)
+#include "qemu.h"
+#else
+#include "qemu-common.h"
+#include "exec/gdbstub.h"
+#include "exec/softmmu-semi.h"
+#endif
+#include "qemu/log.h"
+#include "sysemu/sysemu.h"
+

Re: [Qemu-devel] [PATCH 0/3] pci, vhost-user: Fix two incorrectly applied patches

2019-02-12 Thread Peter Xu

On Tue, Feb 12, 2019 at 03:06:18PM +0100, Philippe Mathieu-Daudé wrote:
> Commit a56de056c91f8 squashed two unrelated commits at once.
> Revert it and reapply the two commits to avoid confusion.
> 
> See: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg02966.html

This does look slightly better.  Thanks Phil & Michael.

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v2] Kconfig: add documentation

2019-02-12 Thread Peter Xu

On Tue, Feb 12, 2019 at 10:57:49AM +0100, Paolo Bonzini wrote:

[...]

> +Writing and modifying default configurations
> +
> +
> +In addition to the Kconfig files under hw/, each target also includes
> +a file called ``default-configs/TARGETNAME-softmmu.mak``.  These files
> +initialize some Kconfig variables to non-default values and provide the
> +starting point to turn on devices and subsystems.
> +
> +A file in ``default-configs/`` looks like the following example::
> +
> +# Default configuration for alpha-softmmu
> +
> +# Uncomment the following lines to disable these optional devices:
> +#
> +#CONFIG_PCI_DEVICES=n
> +#CONFIG_TEST_DEVICES=n
> +
> +# Boards:
> +#
> +CONFIG_DP264=y
> +
> +The first part, consisting of commented-out ``=n`` assignments, tells
> +the user which devices or device groups are implied by the boards.
> +The second part, consisting of ``=y`` assignments, tells the user which
> +boards are supported by the target.  The user will typically modify
> +default the configuration by uncommenting lines in the first group,

(noticed a trivial typo when read...)

s/default the/the default/

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v6 00/35] target/riscv: Convert to decodetree

2019-02-12 Thread Palmer Dabbelt

On Tue, Feb 12, 2019 at 3:21 PM Palmer Dabbelt  wrote:

> On Wed, 23 Jan 2019 01:25:03 PST (-0800), Bastian Koppelmann wrote:
> > Hi,
> >
> > this patchset converts the RISC-V decoder to decodetree in four major
> > steps:
> >
> > 1) Convert 32-bit instructions to decodetree [Patch 1-16]:
> > Many of the gen_* functions are called by the decode functions for
> > 16-bit and 32-bit functions. If we move translation code from the
> > gen_* functions to the generated trans_* functions of decode-tree, we
> > get a lot of duplication. Therefore, we mostly generate calls to the
> > old gen_* function which are properly replaced after step 2).
> >
> > Each of the trans_ functions are grouped into files corresponding to
> > their ISA extension, e.g. addi which is in RV32I is translated in the
> > file 'trans_rvi.inc.c'.
> >
> > 2) Convert 16-bit instructions to decodetree [Patch 17-19]:
> > All 16 bit instructions have a direct mapping to a 32 bit
> > instruction. Thus, we convert the arguments in the 16 bit trans_
> > function to the arguments of the corresponding 32 bit instruction and
> > call the 32 bit trans_ function.
> >
> > 3) Remove old manual decoding in gen_* function [Patch 20-30]:
> > this move all manual translation code into the trans_* instructions
> > of decode tree, such that we can remove the old decode_* functions.
> >
> > 4) Simplify RVC by reusing as much as possible from the RVG decoder as
> > suggested by Richard. [Patch 31-35]
> >
> > full tree available at
> > https://github.com/bkoppelmann/qemu/tree/riscv-dt-v6
> >
> > Cheers,
> > Bastian
> >
> > v5 -> v6:
> > - fixed funky indentation
> >
> >
> > Bastian Koppelmann (35):
> >   target/riscv: Move CPURISCVState pointer to DisasContext
> >   target/riscv: Activate decodetree and implemnt LUI & AUIPC
> >   target/riscv: Convert RVXI branch insns to decodetree
> >   target/riscv: Convert RV32I load/store insns to decodetree
> >   target/riscv: Convert RV64I load/store insns to decodetree
> >   target/riscv: Convert RVXI arithmetic insns to decodetree
> >   target/riscv: Convert RVXI fence insns to decodetree
> >   target/riscv: Convert RVXI csr insns to decodetree
> >   target/riscv: Convert RVXM insns to decodetree
> >   target/riscv: Convert RV32A insns to decodetree
> >   target/riscv: Convert RV64A insns to decodetree
> >   target/riscv: Convert RV32F insns to decodetree
> >   target/riscv: Convert RV64F insns to decodetree
> >   target/riscv: Convert RV32D insns to decodetree
> >   target/riscv: Convert RV64D insns to decodetree
> >   target/riscv: Convert RV priv insns to decodetree
> >   target/riscv: Convert quadrant 0 of RVXC insns to decodetree
> >   target/riscv: Convert quadrant 1 of RVXC insns to decodetree
> >   target/riscv: Convert quadrant 2 of RVXC insns to decodetree
> >   target/riscv: Remove gen_jalr()
> >   target/riscv: Remove manual decoding from gen_branch()
> >   target/riscv: Remove manual decoding from gen_load()
> >   target/riscv: Remove manual decoding from gen_store()
> >   target/riscv: Move gen_arith_imm() decoding into trans_* functions
> >   target/riscv: make ADD/SUB/OR/XOR/AND insn use arg lists
> >   target/riscv: Remove shift and slt insn manual decoding
> >   target/riscv: Remove manual decoding of RV32/64M insn
> >   target/riscv: Rename trans_arith to gen_arith
> >   target/riscv: Remove gen_system()
> >   target/riscv: Remove decode_RV32_64G()
> >   target/riscv: Convert @cs_2 insns to share translation functions
> >   target/riscv: Convert @cl_d, @cl_w, @cs_d, @cs_w insns
> >   target/riscv: Splice fsw_sd and flw_ld for riscv32 vs riscv64
> >   target/riscv: Splice remaining compressed insn pairs for riscv32 vs
> > riscv64
> >   target/riscv: Remaining rvc insn reuse 32 bit translators
> >
> >  target/riscv/Makefile.objs|   22 +
> >  target/riscv/insn16-32.decode |   31 +
> >  target/riscv/insn16-64.decode |   33 +
> >  target/riscv/insn16.decode|  114 ++
> >  target/riscv/insn32-64.decode |   72 +
> >  target/riscv/insn32.decode|  203 ++
> >  .../riscv/insn_trans/trans_privileged.inc.c   |  110 +
> >  target/riscv/insn_trans/trans_rva.inc.c   |  207 ++
> >  target/riscv/insn_trans/trans_rvc.inc.c   |  149 ++
> >  target/riscv/insn_trans/trans_rvd.inc.c   |  388 
> >  target/riscv/insn_trans/trans_rvf.inc.c   |  388 
> >  target/riscv/insn_trans/trans_rvi.inc.c   |  568 ++
> >  target/riscv/insn_trans/trans_rvm.inc.c   |  107 +
> >  target/riscv/translate.c  | 1781 ++---
> >  14 files changed, 2611 insertions(+), 1562 deletions(-)
> >  create mode 100644 target/riscv/insn16-32.decode
> >  create mode 100644 target/riscv/insn16-64.decode
> >  create mode 100644 target/riscv/insn16.decode
> >  create mode 100644 target/riscv/insn32-64.decode
> >  create

Re: [Qemu-devel] [PATCH v4] virtio-blk: set correct config size for the host driver

2019-02-12 Thread Michael S. Tsirkin

On Wed, Feb 13, 2019 at 09:48:57AM +0800, Changpeng Liu wrote:
> Commit caa1ee43 "vhost-user-blk: add discard/write zeroes features
> support" added fields to struct virtio_blk_config. This changes
> the size of the config space and breaks migration from QEMU 3.1
> and older:
> 
> qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0
> qemu-system-ppc64: Failed to load PCIDevice:config
> qemu-system-ppc64: Failed to load virtio-blk:virtio
> qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@8002000:01.0/virtio-blk'
> qemu-system-ppc64: load of migration failed: Invalid argument
> 
> Since virtio-blk doesn't support the "discard" and "write zeroes"
> features, it shouldn't even expose the associated fields in the
> config space actually. Just include all fields up to num_queues to
> match QEMU 3.1 and older.
> 
> Signed-off-by: Changpeng Liu 


Reviewed-by: Michael S. Tsirkin 

Stefan, are you merging this?

> ---
>  hw/block/virtio-blk.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 9a87b3b..6fce9c7 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -28,6 +28,10 @@
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
>  
> +/* We don't support discard yet, hide associated config fields. */
> +#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
> + max_discard_sectors)
> +
>  static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
>  VirtIOBlockReq *req)
>  {
> @@ -761,7 +765,8 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
>  blkcfg.alignment_offset = 0;
>  blkcfg.wce = blk_enable_write_cache(s->blk);
>  virtio_stw_p(vdev, &blkcfg.num_queues, s->conf.num_queues);
> -memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
> +memcpy(config, &blkcfg, VIRTIO_BLK_CFG_SIZE);
> +QEMU_BUILD_BUG_ON(VIRTIO_BLK_CFG_SIZE > sizeof(blkcfg));
>  }
>  
>  static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
> @@ -769,7 +774,8 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
>  VirtIOBlock *s = VIRTIO_BLK(vdev);
>  struct virtio_blk_config blkcfg;
>  
> -memcpy(&blkcfg, config, sizeof(blkcfg));
> +memcpy(&blkcfg, config, VIRTIO_BLK_CFG_SIZE);
> +QEMU_BUILD_BUG_ON(VIRTIO_BLK_CFG_SIZE > sizeof(blkcfg));
>  
>  aio_context_acquire(blk_get_aio_context(s->blk));
>  blk_set_enable_write_cache(s->blk, blkcfg.wce != 0);
> @@ -952,8 +958,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
>  return;
>  }
>  
> -virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> -sizeof(struct virtio_blk_config));
> +virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, VIRTIO_BLK_CFG_SIZE);
>  
>  s->blk = conf->conf.blk;
>  s->rq = NULL;
> -- 
> 1.9.3

[Qemu-devel] Combining -loadvm and -snapshot

2019-02-12 Thread Drew DeVault

I recently ran into an issue where I found I couldn't combine the
-loadvm and -snapshot flags, nor any conceivable combination of
alternate approaches like loadvm via the monitor. Independently, both
options work as expected, but together I get this error:

qemu-system-x86_64: Device 'virtio0' does not have the requested snapshot 'base'

The goal here is to resume the VM state from a snapshot, but to prevent
the guest from persisting writes to the underlying qcow2.

I started digging into the code to understand this problem more, and I
was pretty deep in the weeds when I realized what the underlying problem
probably was and the kind of refactoring necessary to fix it - so I'm
here to touch base before moving any further.

I believe this happens because -snapshot creates a temporary qcow2
overlaid on top of the disk you're using, and this overlay does not have
any snapshots copied, nor does any of the snapshot reading code (e.g.
qcow2_snapshot_list or qcow2_snapshot_goto) iterate over backing disks
to load their snapshots.

At first I was going to adjust the qcow2 snapshot loading code (those
two functions in particular) to read through their backends, but I'm a
little unfamiliar with this code and the refactoring is not minor so I
would like to get feedback from some of the wiser folks on this mailing
list before I sink too much time into this.

Thoughts?

--
Drew DeVault

Re: [Qemu-devel] [PATCH v3] virtio-blk: set correct config size for the host driver

2019-02-12 Thread Liu, Changpeng




> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, February 12, 2019 11:11 PM
> To: Liu, Changpeng 
> Cc: qemu-devel@nongnu.org; stefa...@redhat.com; sgarz...@redhat.com;
> dgilb...@redhat.com; ldok...@redhat.com
> Subject: Re: [PATCH v3] virtio-blk: set correct config size for the host 
> driver
> 
> On Tue, Feb 12, 2019 at 11:19:49PM +0800, Changpeng Liu wrote:
> > Commit caa1ee43 "vhost-user-blk: add discard/write zeroes features
> > support" added fields to struct virtio_blk_config. This changes
> > the size of the config space and breaks migration from QEMU 3.1
> > and older:
> >
> > qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0
> > qemu-system-ppc64: Failed to load PCIDevice:config
> > qemu-system-ppc64: Failed to load virtio-blk:virtio
> > qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@8002000:01.0/virtio-blk'
> > qemu-system-ppc64: load of migration failed: Invalid argument
> >
> > Since virtio-blk doesn't support the "discard" and "write zeroes"
> > features, it shouldn't even expose the associated fields in the
> > config space actually. Just include all fields up to num_queues to
> > match QEMU 3.1 and older.
> >
> > Signed-off-by: Changpeng Liu 
> 
> OK almost.
> 
> > ---
> >  hw/block/virtio-blk.c | 14 ++
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index 9a87b3b..0ff5315 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -28,6 +28,10 @@
> >  #include "hw/virtio/virtio-bus.h"
> >  #include "hw/virtio/virtio-access.h"
> >
> > +/* We don't support discard yet, hide associated config fields. */
> > +#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
> > + max_discard_sectors)
> > +
> >  static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
> >  VirtIOBlockReq *req)
> >  {
> > @@ -761,7 +765,9 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> >  blkcfg.alignment_offset = 0;
> >  blkcfg.wce = blk_enable_write_cache(s->blk);
> >  virtio_stw_p(vdev, &blkcfg.num_queues, s->conf.num_queues);
> > -memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
> > +memcpy(config, &blkcfg, VIRTIO_BLK_CFG_SIZE);
> > +QEMU_BUILD_BUG_ON(VIRTIO_BLK_CFG_SIZE > sizeof(struct virtio_blk_config));
> > +
> 
> Oh probably sizeof blkcfg here, right?
> Also we don't need the empty line here.
Ok, removed the empty line and use sizeof(blkcfg) instead with v4.
> 
> >  }
> >
> >  static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
> > @@ -769,7 +775,8 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
> >  VirtIOBlock *s = VIRTIO_BLK(vdev);
> >  struct virtio_blk_config blkcfg;
> >
> > -memcpy(&blkcfg, config, sizeof(blkcfg));
> > +memcpy(&blkcfg, config, VIRTIO_BLK_CFG_SIZE);
> > +QEMU_BUILD_BUG_ON(VIRTIO_BLK_CFG_SIZE > sizeof(blkcfg));
> >
> >  aio_context_acquire(blk_get_aio_context(s->blk));
> >  blk_set_enable_write_cache(s->blk, blkcfg.wce != 0);
> > @@ -952,8 +959,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
> >  return;
> >  }
> >
> > -virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> > -sizeof(struct virtio_blk_config));
> > +virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, VIRTIO_BLK_CFG_SIZE);
> >
> >  s->blk = conf->conf.blk;
> >  s->rq = NULL;
> > --
> > 1.9.3

[Qemu-devel] [PATCH v4] virtio-blk: set correct config size for the host driver

2019-02-12 Thread Changpeng Liu

Commit caa1ee43 "vhost-user-blk: add discard/write zeroes features
support" added fields to struct virtio_blk_config. This changes
the size of the config space and breaks migration from QEMU 3.1
and older:

qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x10 read: 41 device: 1 cmask: ff wmask: 80 w1cmask:0
qemu-system-ppc64: Failed to load PCIDevice:config
qemu-system-ppc64: Failed to load virtio-blk:virtio
qemu-system-ppc64: error while loading state for instance 0x0 of device 'pci@8002000:01.0/virtio-blk'
qemu-system-ppc64: load of migration failed: Invalid argument

Since virtio-blk doesn't support the "discard" and "write zeroes"
features, it shouldn't even expose the associated fields in the
config space actually. Just include all fields up to num_queues to
match QEMU 3.1 and older.

Signed-off-by: Changpeng Liu 
---
 hw/block/virtio-blk.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 9a87b3b..6fce9c7 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -28,6 +28,10 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 
+/* We don't support discard yet, hide associated config fields. */
+#define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
+ max_discard_sectors)
+
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
 VirtIOBlockReq *req)
 {
@@ -761,7 +765,8 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
 blkcfg.alignment_offset = 0;
 blkcfg.wce = blk_enable_write_cache(s->blk);
 virtio_stw_p(vdev, &blkcfg.num_queues, s->conf.num_queues);
-memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
+memcpy(config, &blkcfg, VIRTIO_BLK_CFG_SIZE);
+QEMU_BUILD_BUG_ON(VIRTIO_BLK_CFG_SIZE > sizeof(blkcfg));
 }
 
 static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
@@ -769,7 +774,8 @@ static void virtio_blk_set_config(VirtIODevice *vdev, const uint8_t *config)
 VirtIOBlock *s = VIRTIO_BLK(vdev);
 struct virtio_blk_config blkcfg;
 
-memcpy(&blkcfg, config, sizeof(blkcfg));
+memcpy(&blkcfg, config, VIRTIO_BLK_CFG_SIZE);
+QEMU_BUILD_BUG_ON(VIRTIO_BLK_CFG_SIZE > sizeof(blkcfg));
 
 aio_context_acquire(blk_get_aio_context(s->blk));
 blk_set_enable_write_cache(s->blk, blkcfg.wce != 0);
@@ -952,8 +958,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
-sizeof(struct virtio_blk_config));
+virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, VIRTIO_BLK_CFG_SIZE);
 
 s->blk = conf->conf.blk;
 s->rq = NULL;
-- 
1.9.3
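
As an illustration of the approach taken in this patch, the sketch below shows how offsetof() can cap the exported config size at the first unsupported field. It rests on assumptions: `struct demo_blk_config` is a made-up struct and not the real layout of struct virtio_blk_config, `DEMO_CFG_SIZE` and `demo_export_config` are hypothetical names, and C11 static_assert stands in for QEMU_BUILD_BUG_ON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device config struct; NOT the real layout. */
struct demo_blk_config {
    uint64_t capacity;
    uint16_t num_queues;
    uint16_t unused;
    /* Fields below belong to a feature the device does not offer. */
    uint32_t max_discard_sectors;
    uint32_t max_write_zeroes_sectors;
};

/* Expose only the bytes that precede the first unsupported field. */
#define DEMO_CFG_SIZE offsetof(struct demo_blk_config, max_discard_sectors)

/* Compile-time guard, playing the role of QEMU_BUILD_BUG_ON in the patch. */
static_assert(DEMO_CFG_SIZE <= sizeof(struct demo_blk_config),
              "exported config must fit inside the struct");

/* Copy the visible prefix of cfg into dst and return the bytes written. */
static size_t demo_export_config(uint8_t *dst,
                                 const struct demo_blk_config *cfg)
{
    memcpy(dst, cfg, DEMO_CFG_SIZE);
    return DEMO_CFG_SIZE;
}
```

Because the size is derived from the field offset, appending further fields to the end of the struct later would not change what this function exposes.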

Re: [Qemu-devel] [PATCH] SiFive RISC-V GPIO Device

2019-02-12 Thread Alistair Francis

On Tue, Feb 12, 2019 at 9:39 AM Fabien Chouteau  wrote:
>
> QEMU model of the GPIO device on the SiFive E300 series SOCs.
>
> The pins are not used by a board definition yet, however this
> implementation can already be used to trigger GPIO interrupts from the
> software by configuring a pin as both output and input.
>
> Signed-off-by: Fabien Chouteau 

Hey,

Thanks for the patch!

> ---
>  Makefile.objs  |   1 +
>  hw/riscv/Makefile.objs |   1 +
>  hw/riscv/sifive_e.c|  28 ++-
>  hw/riscv/sifive_gpio.c | 388 +
>  hw/riscv/trace-events  |   7 +
>  include/hw/riscv/sifive_e.h|   8 +-
>  include/hw/riscv/sifive_gpio.h |  72 ++
>  7 files changed, 501 insertions(+), 4 deletions(-)
>  create mode 100644 hw/riscv/sifive_gpio.c
>  create mode 100644 hw/riscv/trace-events
>  create mode 100644 include/hw/riscv/sifive_gpio.h
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 67a054b08a..d40eb089ae 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -184,6 +184,7 @@ trace-events-subdirs += hw/virtio
>  trace-events-subdirs += hw/watchdog
>  trace-events-subdirs += hw/xen
>  trace-events-subdirs += hw/gpio
> +trace-events-subdirs += hw/riscv
>  trace-events-subdirs += io
>  trace-events-subdirs += linux-user
>  trace-events-subdirs += migration
> diff --git a/hw/riscv/Makefile.objs b/hw/riscv/Makefile.objs
> index 1dde01d39d..ced7935371 100644
> --- a/hw/riscv/Makefile.objs
> +++ b/hw/riscv/Makefile.objs
> @@ -7,5 +7,6 @@ obj-y += sifive_plic.o
>  obj-y += sifive_test.o
>  obj-y += sifive_u.o
>  obj-y += sifive_uart.o
> +obj-y += sifive_gpio.o

I know the other RISC-V files don't do it, but this should go in the
hw/gpio directory instead of hw/riscv.

>  obj-y += spike.o
>  obj-y += virt.o
> diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
> index 5d9d65ff29..49c1dd986c 100644
> --- a/hw/riscv/sifive_e.c
> +++ b/hw/riscv/sifive_e.c
> @@ -146,11 +146,15 @@ static void riscv_sifive_e_soc_init(Object *obj)
>  &error_abort);
>  object_property_set_int(OBJECT(&s->cpus), smp_cpus, "num-harts",
>  &error_abort);
> +sysbus_init_child_obj(obj, "riscv.sifive.e.gpio0",
> +  &s->gpio, sizeof(s->gpio),
> +  TYPE_SIFIVE_GPIO);
>  }
>
>  static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp)
>  {
>  const struct MemmapEntry *memmap = sifive_e_memmap;
> +Error *err = NULL;
>
>  SiFiveESoCState *s = RISCV_E_SOC(dev);
>  MemoryRegion *sys_mem = get_system_memory();
> @@ -184,8 +188,28 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp)
>  sifive_mmio_emulate(sys_mem, "riscv.sifive.e.aon",
>  memmap[SIFIVE_E_AON].base, memmap[SIFIVE_E_AON].size);
>  sifive_prci_create(memmap[SIFIVE_E_PRCI].base);
> -sifive_mmio_emulate(sys_mem, "riscv.sifive.e.gpio0",
> -memmap[SIFIVE_E_GPIO0].base, memmap[SIFIVE_E_GPIO0].size);
> +
> +/* GPIO */
> +
> +object_property_set_bool(OBJECT(&s->gpio), true, "realized", &err);
> +if (err) {
> +error_propagate(errp, err);
> +return;
> +}
> +
> +/* Map GPIO registers */
> +sysbus_mmio_map(SYS_BUS_DEVICE(&s->gpio), 0, memmap[SIFIVE_E_GPIO0].base);
> +
> +/* Pass all GPIOs to the SOC layer so they are available to the board */
> +qdev_pass_gpios(DEVICE(&s->gpio), dev, NULL);
> +
> +/* Connect GPIO interrupts to the PLIC */
> +for (int i = 0; i < 32; i++) {
> +sysbus_connect_irq(SYS_BUS_DEVICE(&s->gpio), i,
> +   qdev_get_gpio_in(DEVICE(s->plic),
> +SIFIVE_E_GPIO0_IRQ0 + i));
> +}
> +

It's common in QEMU world to split your patch in two. One that adds
the device and then one that connects it.

In this case the patch isn't too complex so it's fine, just for future
reference.

>  sifive_uart_create(sys_mem, memmap[SIFIVE_E_UART0].base,
>  serial_hd(0), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_E_UART0_IRQ));
>  sifive_mmio_emulate(sys_mem, "riscv.sifive.e.qspi0",
> diff --git a/hw/riscv/sifive_gpio.c b/hw/riscv/sifive_gpio.c
> new file mode 100644
> index 00..06bd8112d7
> --- /dev/null
> +++ b/hw/riscv/sifive_gpio.c
> @@ -0,0 +1,388 @@
> +/*
> + * sifive System-on-Chip general purpose input/output register definition
> + *
> + * Copyright 2019 AdaCore
> + *
> + * Base on nrf51_gpio.c:
> + *
> + * Copyright 2018 Steffen Görtz 
> + *
> + * This code is licensed under the GPL version 2 or later.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "hw/riscv/sifive_gpio.h"
> +#include "trace.h"
> +
> +static void update_output_irq(SIFIVEGPIOState *s)
> +{
> +

Remove the new line.

> +uint32_t pending;
> +uint32_t pin;
> +
> +pending = s->high_ip & s->high_ie;
> +pending |= s->low_ip & s->low_ie;
> +

Re: [Qemu-devel] [PATCH] hw/display: Add basic ATI VGA emulation

2019-02-12 Thread BALATON Zoltan


Hello,

On Tue, 12 Feb 2019, Philippe Mathieu-Daudé wrote:

Hi Zoltan,


Thanks for the quick review and testing. I'll use your suggestions for the 
other (mips) patches in a v2. For this one I'm not convinced.



On 2/11/19 4:19 AM, BALATON Zoltan wrote:

[...]

+
+static void ati_reg_write_offs(uint32_t *reg, int offs,
+   uint64_t data, unsigned int size)
+{
+int shift, i;
+uint32_t mask;
+
+for (i = 0; i < size; i++) {
+shift = (offs + i) * 8;
+mask = 0xffUL << shift;
+*reg &= ~mask;
+*reg |= (data & 0xff) << shift;
+data >>= 8;


I'd have use a pair of extract32/deposit32 but this is probably easier
to singlestep.


You've told me that before but I have concerns about the asserts in those 
functions which to me seem like unnecessary overhead in such low level 
functions so unless these are removed or *_noassert versions introduced 
I'll stay away from them.


But I'm also not too happy about these *_offs functions but some registers 
support 8/16/32 bit access and guest code seems to actually do this to 
update bits in the middle of the register at an odd address. Best would be 
if I could just set .impl.min = 4, .impl.max = 4 and .valid.min = 1 
.valid.max = 4 for the mem region ops but I'm not sure that would work or 
would it? If that's working maybe I should just go with that instead.
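
For reference, a self-contained sketch of the byte-wise update discussed above, completed so it can be compiled and stepped through on its own. The name `demo_reg_write_offs` and the bounds assert are additions for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Replace `size` bytes of *reg, starting at byte offset `offs`, with the
 * low-order bytes of `data` (least significant byte first). */
static void demo_reg_write_offs(uint32_t *reg, int offs,
                                uint64_t data, unsigned int size)
{
    /* The access must stay inside the 32-bit register. */
    assert(offs >= 0 && size <= 4 && (unsigned int)offs + size <= 4);

    for (unsigned int i = 0; i < size; i++) {
        unsigned int shift = ((unsigned int)offs + i) * 8;
        uint32_t mask = UINT32_C(0xff) << shift;

        *reg &= ~mask;
        *reg |= (uint32_t)(data & 0xff) << shift;
        data >>= 8;
    }
}
```

For example, a 2-byte write of 0xABCD at offset 1 into a register holding 0x11223344 leaves 0x11ABCD44: only the two middle bytes change.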


> [...]
> > diff --git a/hw/display/ati_int.h b/hw/display/ati_int.h
> > new file mode 100644
> > index 00..85d045517c
> > --- /dev/null
> > +++ b/hw/display/ati_int.h
> > @@ -0,0 +1,67 @@
> > +/*
> > + * QEMU ATI SVGA emulation
> > + *
> > + * Copyright (c) 2019 BALATON Zoltan
> > + *
> > + * This work is licensed under the GNU GPL license version 2 or later.
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "hw/pci/pci.h"
> > +#include "vga_int.h"
> > +
> > +#undef DEBUG_ATI
> > +
> > +#ifdef DEBUG_ATI
> > +#define DPRINTF(fmt, ...) printf("%s: " fmt, __func__, ## __VA_ARGS__)
> > +#else
> > +#define DPRINTF(fmt, ...) do {} while (0)
>
> Please use tracepoints (you already add some!).


I won't, and here's why: this is not a finished device model, and I expect 
to add debug logs and change them frequently during further development. 
For such ad-hoc debugging DPRINTF is still easier to use because I don't 
have to define the format string in one file and use it somewhere else. 
With DPRINTF I can add a debug log in one place and change it without 
editing two unrelated places. Once development is finished, the logs we 
intend to keep for later tracing can be converted to trace points (which 
is what trace points are better for), and the DPRINTF macro can be removed 
at that point. We still have enough DPRINTFs in QEMU so this should be OK. 
I've already added trace points in two such places, but even for those I 
almost ditched them when checkpatch insisted I add a 0x prefix to hex 
numbers. (I don't like this because I know these are hex, and printing 
e.g. 0x8 instead of 8 just distracts from the actual value, which is what 
counts when I'm looking at a lot of these during debugging. Anything that 
distracts from the values and makes the output harder to read, such as the 
timestamps and pids added by trace, is bad, so I've considered going back 
to DPRINTF even for those trace points, but I'll see if I can live with 
them for now.) The remaining DPRINTFs won't be converted to trace points; 
they are supposed to be removed when no longer needed.
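
As a side note, a common variant of the DPRINTF idiom keeps the call compiled even when logging is off, so format strings do not bit-rot between debugging sessions. The sketch below is only an illustration of that variant, not what the patch does; DEBUG_ATI_SKETCH and read_reg_sketch are made-up names, and the `## __VA_ARGS__` form is the same GNU extension the patch already relies on.

```c
#include <stdio.h>

/* Hypothetical switch for this sketch; set to 1 while debugging. */
#ifndef DEBUG_ATI_SKETCH
#define DEBUG_ATI_SKETCH 0
#endif

/*
 * The fprintf call is always compiled, so format/argument mismatches are
 * still diagnosed when logging is disabled; the compiler can drop the
 * dead branch.
 */
#define DPRINTF(fmt, ...) \
    do { \
        if (DEBUG_ATI_SKETCH) { \
            fprintf(stderr, "%s: " fmt, __func__, ## __VA_ARGS__); \
        } \
    } while (0)

/* Stand-in for a register read that logs what it returns. */
static unsigned read_reg_sketch(unsigned addr)
{
    unsigned val = addr ^ 0xffu;

    DPRINTF("addr=%x val=%x\n", addr, val);
    return val;
}
```

The log line still lives in one place next to the code it describes, which is the property argued for above.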


[...]

I don't understand the display code well, but the result works very
well, nice work :)

Tested-by: Philippe Mathieu-Daudé 


Thanks, it's a start. It currently only targets the Linux console, with a 
lot more to do for it to be more useful. But I have limited time for this, 
and since it's already useful for getting mips_fulong2e working, I thought 
that justified including it now so others have a chance to look at it and 
maybe even help improve it, which can't happen if it's only sitting on my 
machine.


Regards,
BALATON Zoltan

Re: [Qemu-devel] [PATCH 4/4] mips_fulong2e: Add on-board graphics chip

2019-02-12 Thread BALATON Zoltan


On Tue, 12 Feb 2019, Philippe Mathieu-Daudé wrote:

On 2/11/19 5:01 AM, BALATON Zoltan wrote:

Add (partial) emulation of the on-board GPU of the machine. This
allows the PMON2000 firmware to run and should also work with Linux
console but probably not with X yet.

Signed-off-by: BALATON Zoltan 
---
Depends on hw/display: Add basic ATI VGA emulation

 hw/mips/mips_fulong2e.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/mips/mips_fulong2e.c b/hw/mips/mips_fulong2e.c
index eec6fd02c8..68bd030fc1 100644
--- a/hw/mips/mips_fulong2e.c
+++ b/hw/mips/mips_fulong2e.c
@@ -287,6 +287,7 @@ static void mips_fulong2e_init(MachineState *machine)
 I2CBus *smbus;
 MIPSCPU *cpu;
 CPUMIPSState *env;
+DeviceState *dev;

 /* init CPUs */
 cpu = MIPS_CPU(cpu_create(machine->cpu_type));
@@ -347,6 +348,11 @@ static void mips_fulong2e_init(MachineState *machine)
 vt82c686b_southbridge_init(pci_bus, FULONG2E_VIA_SLOT, env->irq[5],
 &smbus, &isa_bus);

+/* GPU */
+dev = DEVICE(pci_create(pci_bus, -1, "ati-vga"));


You missed in your cover:
Based-on: 20190211040434.1c986745...@zero.eik.bme.hu


I forgot what this was called, thanks for reminding me. But patchew seems 
to be down anyway.


Regards,
BALATON Zoltan



Otherwise, when testing, we get:
qemu-system-mips64el: Unknown device 'ati-vga' for bus 'PCI'
Aborted (core dumped)


+qdev_prop_set_uint16(dev, "device_id", 0x5159);
+qdev_init_nofail(dev);
+
 /* Populate SPD eeprom data */
 spd_data = spd_data_generate(DDR, ram_size, &err);
 if (err) {



Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé

Re: [Qemu-devel] [PATCH 07/19] target/ppc: Make special ORs match x86 pause and don't generate on mttcg

2019-02-12 Thread Benjamin Herrenschmidt

On Tue, 2019-02-12 at 16:59 +1100, David Gibson wrote:
> On Mon, Jan 28, 2019 at 10:46:13AM +0100, Cédric Le Goater wrote:
> > From: Benjamin Herrenschmidt 
> > 
> > There's no point in going out of translation on an SMT OR with
> > mttcg since the backend won't do anything useful such as pausing,
> > it's only useful on traditional TCG to give time to other
> > processors.
> 
> Is it actively harmful in the MTTCG case, or just pointless?

I think it can hurt performance, I don't remember for sure :)

> > Signed-off-by: Benjamin Herrenschmidt 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  target/ppc/translate.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > index e169c43643a1..7d40a1fbe6bd 100644
> > --- a/target/ppc/translate.c
> > +++ b/target/ppc/translate.c
> > @@ -1580,7 +1580,7 @@ static void gen_pause(DisasContext *ctx)
> >  tcg_temp_free_i32(t0);
> >  
> >  /* Stop translation, this gives other CPUs a chance to run */
> > -gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> > +gen_exception_nip(ctx, EXCP_INTERRUPT, ctx->base.pc_next);
> 
> I don't see how this change relates to the rest.

Yeah not sure anymore :-)

> >  }
> >  #endif /* defined(TARGET_PPC64) */
> >  
> > @@ -1662,7 +1662,9 @@ static void gen_or(DisasContext *ctx)
> >   * than no-op, e.g., miso(rs=26), yield(27), mdoio(29), mdoom(30),
> >   * and all currently undefined.
> >   */
> > -gen_pause(ctx);
> > +if (!mttcg_enabled) {
> > +gen_pause(ctx);
> > +}
> >  #endif
> >  #endif
> >  }

Re: [Qemu-devel] [PATCH 08/19] target/ppc: Fix nip on power management instructions

2019-02-12 Thread Benjamin Herrenschmidt

On Tue, 2019-02-12 at 17:02 +1100, David Gibson wrote:
> On Mon, Jan 28, 2019 at 10:46:14AM +0100, Cédric Le Goater wrote:
> > From: Benjamin Herrenschmidt 
> > 
> > Those instructions currently raise an exception from within
> > the helper. This tends to result in a bogus nip value in
> > the env context (typically the beginning of the TB). Such
> > a helper needs a gen_update_nip() first.
> > 
> > This fixes it with a different approach which is to throw
> > the exception from translate.c instead of the helper using
> > gen_exception_nip() which does the right thing.
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > Signed-off-by: Cédric Le Goater 
> > ---
> >  target/ppc/excp_helper.c |  1 -
> >  target/ppc/translate.c   | 12 
> >  2 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> > index 751d759fcc1d..8407e0ade938 100644
> > --- a/target/ppc/excp_helper.c
> > +++ b/target/ppc/excp_helper.c
> > @@ -958,7 +958,6 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t 
> > insn)
> >   * but this doesn't seem to be a problem.
> >   */
> >  env->msr |= (1ull << MSR_EE);
> > -raise_exception(env, EXCP_HLT);
> >  }
> >  #endif /* defined(TARGET_PPC64) */
> >  
> > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > index 7d40a1fbe6bd..55281a8975e0 100644
> > --- a/target/ppc/translate.c
> > +++ b/target/ppc/translate.c
> > @@ -3571,7 +3571,8 @@ static void gen_doze(DisasContext *ctx)
> >  t = tcg_const_i32(PPC_PM_DOZE);
> >  gen_helper_pminsn(cpu_env, t);
> >  tcg_temp_free_i32(t);
> > -gen_stop_exception(ctx);
> > +/* Stop translation, as the CPU is supposed to sleep from now */
> > +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> 
> IIUC this also changes from EXCP_STOP to EXCP_HLT, is that intentional?

It might be to break out of the outer exec loop, but I don't remember
offhand.

> >  #endif /* defined(CONFIG_USER_ONLY) */
> >  }
> >  
> > @@ -3586,7 +3587,8 @@ static void gen_nap(DisasContext *ctx)
> >  t = tcg_const_i32(PPC_PM_NAP);
> >  gen_helper_pminsn(cpu_env, t);
> >  tcg_temp_free_i32(t);
> > -gen_stop_exception(ctx);
> > +/* Stop translation, as the CPU is supposed to sleep from now */
> > +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> >  #endif /* defined(CONFIG_USER_ONLY) */
> >  }
> >  
> > @@ -3606,7 +3608,8 @@ static void gen_sleep(DisasContext *ctx)
> >  t = tcg_const_i32(PPC_PM_SLEEP);
> >  gen_helper_pminsn(cpu_env, t);
> >  tcg_temp_free_i32(t);
> > -gen_stop_exception(ctx);
> > +/* Stop translation, as the CPU is supposed to sleep from now */
> > +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> >  #endif /* defined(CONFIG_USER_ONLY) */
> >  }
> >  
> > @@ -3621,7 +3624,8 @@ static void gen_rvwinkle(DisasContext *ctx)
> >  t = tcg_const_i32(PPC_PM_RVWINKLE);
> >  gen_helper_pminsn(cpu_env, t);
> >  tcg_temp_free_i32(t);
> > -gen_stop_exception(ctx);
> > +/* Stop translation, as the CPU is supposed to sleep from now */
> > +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> >  #endif /* defined(CONFIG_USER_ONLY) */
> >  }
> >  #endif /* #if defined(TARGET_PPC64) */

Re: [Qemu-devel] [PATCH 08/19] target/ppc: Fix nip on power management instructions

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:14AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> Those instructions currently raise an exception from within
> the helper. This tends to result in a bogus nip value in
> the env context (typically the beginning of the TB). Such
> a helper needs a gen_update_nip() first.
> 
> This fixes it with a different approach which is to throw
> the exception from translate.c instead of the helper using
> gen_exception_nip() which does the right thing.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 
> ---
>  target/ppc/excp_helper.c |  1 -
>  target/ppc/translate.c   | 12 
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 751d759fcc1d..8407e0ade938 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -958,7 +958,6 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t 
> insn)
>   * but this doesn't seem to be a problem.
>   */
>  env->msr |= (1ull << MSR_EE);
> -raise_exception(env, EXCP_HLT);
>  }
>  #endif /* defined(TARGET_PPC64) */
>  
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 7d40a1fbe6bd..55281a8975e0 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -3571,7 +3571,8 @@ static void gen_doze(DisasContext *ctx)
>  t = tcg_const_i32(PPC_PM_DOZE);
>  gen_helper_pminsn(cpu_env, t);
>  tcg_temp_free_i32(t);
> -gen_stop_exception(ctx);
> +/* Stop translation, as the CPU is supposed to sleep from now */
> +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);

IIUC this also changes from EXCP_STOP to EXCP_HLT, is that intentional?

>  #endif /* defined(CONFIG_USER_ONLY) */
>  }
>  
> @@ -3586,7 +3587,8 @@ static void gen_nap(DisasContext *ctx)
>  t = tcg_const_i32(PPC_PM_NAP);
>  gen_helper_pminsn(cpu_env, t);
>  tcg_temp_free_i32(t);
> -gen_stop_exception(ctx);
> +/* Stop translation, as the CPU is supposed to sleep from now */
> +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
>  #endif /* defined(CONFIG_USER_ONLY) */
>  }
>  
> @@ -3606,7 +3608,8 @@ static void gen_sleep(DisasContext *ctx)
>  t = tcg_const_i32(PPC_PM_SLEEP);
>  gen_helper_pminsn(cpu_env, t);
>  tcg_temp_free_i32(t);
> -gen_stop_exception(ctx);
> +/* Stop translation, as the CPU is supposed to sleep from now */
> +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
>  #endif /* defined(CONFIG_USER_ONLY) */
>  }
>  
> @@ -3621,7 +3624,8 @@ static void gen_rvwinkle(DisasContext *ctx)
>  t = tcg_const_i32(PPC_PM_RVWINKLE);
>  gen_helper_pminsn(cpu_env, t);
>  tcg_temp_free_i32(t);
> -gen_stop_exception(ctx);
> +/* Stop translation, as the CPU is supposed to sleep from now */
> +gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
>  #endif /* defined(CONFIG_USER_ONLY) */
>  }
>  #endif /* #if defined(TARGET_PPC64) */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 09/19] target/ppc: Don't clobber MSR:EE on PM instructions

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:15AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> When issuing a power management instruction, we set MSR:EE
> to force ppc_hw_interrupt() into calling powerpc_excp()
> to deal with the fact that on P7 and P8, the system reset
> caused by the wakeup needs to be generated regardless of
> the MSR:EE value (using LPCR only).
> 
> This however means that the OS will see a bogus SRR1:EE
> value which is a problem. It also prevents properly
> implementing P9 STOP "light".
> 
> So fix this by instead putting some logic in ppc_hw_interrupt()
> to decide whether to deliver or not by taking into account the
> fact that we are waking up from sleep.
> 
> The LPCR isn't checked as this is done in the has_work() test.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: David Gibson 

> ---
>  target/ppc/excp_helper.c | 27 +++
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index 8407e0ade938..7c7c8d1b9dc6 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -748,6 +748,7 @@ void ppc_cpu_do_interrupt(CPUState *cs)
>  static void ppc_hw_interrupt(CPUPPCState *env)
>  {
>  PowerPCCPU *cpu = ppc_env_get_cpu(env);
> +bool async_deliver;
>  
>  /* External reset */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_RESET)) {
> @@ -769,11 +770,20 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  return;
>  }
>  #endif
> +
> +/*
> + * For interrupts that gate on MSR:EE, we need to do something a
> + * bit more subtle, as we need to let them through even when EE is
> + * clear when coming out of some power management states (in order
> + * for them to become a 0x100).
> + */
> +async_deliver = (msr_ee != 0) || env->in_pm_state;
> +
>  /* Hypervisor decrementer exception */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_HDECR)) {
>  /* LPCR will be clear when not supported so this will work */
>  bool hdice = !!(env->spr[SPR_LPCR] & LPCR_HDICE);
> -if ((msr_ee != 0 || msr_hv == 0) && hdice) {
> +if ((async_deliver || msr_hv == 0) && hdice) {
>  /* HDEC clears on delivery */
>  env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
>  powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_HDECR);
> @@ -783,7 +793,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  /* Extermal interrupt can ignore MSR:EE under some circumstances */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_EXT)) {
>  bool lpes0 = !!(env->spr[SPR_LPCR] & LPCR_LPES0);
> -if (msr_ee != 0 || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
> +if (async_deliver || (env->has_hv_mode && msr_hv == 0 && !lpes0)) {
>  powerpc_excp(cpu, env->excp_model, POWERPC_EXCP_EXTERNAL);
>  return;
>  }
> @@ -795,7 +805,7 @@ static void ppc_hw_interrupt(CPUPPCState *env)
>  return;
>  }
>  }
> -if (msr_ee != 0) {
> +if (async_deliver != 0) {
>  /* Watchdog timer on embedded PowerPC */
>  if (env->pending_interrupts & (1 << PPC_INTERRUPT_WDT)) {
>  env->pending_interrupts &= ~(1 << PPC_INTERRUPT_WDT);
> @@ -943,21 +953,14 @@ void helper_pminsn(CPUPPCState *env, powerpc_pm_insn_t 
> insn)
>  
>  cs = CPU(ppc_env_get_cpu(env));
>  cs->halted = 1;
> -env->in_pm_state = true;
>  
>  /* The architecture specifies that HDEC interrupts are
>   * discarded in PM states
>   */
>  env->pending_interrupts &= ~(1 << PPC_INTERRUPT_HDECR);
>  
> -/* Technically, nap doesn't set EE, but if we don't set it
> - * then ppc_hw_interrupt() won't deliver. We could add some
> - * other tests there based on LPCR but it's simpler to just
> - * whack EE in. It will be cleared by the 0x100 at wakeup
> - * anyway. It will still be observable by the guest in SRR1
> - * but this doesn't seem to be a problem.
> - */
> -env->msr |= (1ull << MSR_EE);
> +/* Condition for waking up at 0x100 */
> +env->in_pm_state = true;
>  }
>  #endif /* defined(TARGET_PPC64) */
>  
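
To make the delivery rule in the quoted patch concrete, here is a minimal sketch in plain C. The struct and function names are hypothetical and heavily simplified; only the async_deliver expression and the external-interrupt condition are taken from the diff above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the CPU state the patch reads. */
struct irq_state_sketch {
    bool msr_ee;       /* MSR:EE */
    bool msr_hv;       /* MSR:HV */
    bool in_pm_state;  /* set by the power management instruction */
    bool has_hv_mode;
    bool lpes0;        /* LPCR:LPES0 */
};

/* Interrupts gated on MSR:EE also get through when waking from a PM state. */
static bool async_deliver(const struct irq_state_sketch *s)
{
    return s->msr_ee || s->in_pm_state;
}

/* Mirrors the external interrupt condition in ppc_hw_interrupt(). */
static bool deliver_external(const struct irq_state_sketch *s)
{
    return async_deliver(s) ||
           (s->has_hv_mode && !s->msr_hv && !s->lpes0);
}
```

With EE clear and in_pm_state set, deliver_external() returns true without MSR:EE having been modified, so SRR1 would no longer carry a bogus EE bit.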

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH 07/19] target/ppc: Make special ORs match x86 pause and don't generate on mttcg

2019-02-12 Thread David Gibson

On Mon, Jan 28, 2019 at 10:46:13AM +0100, Cédric Le Goater wrote:
> From: Benjamin Herrenschmidt 
> 
> There's no point in going out of translation on an SMT OR with
> mttcg since the backend won't do anything useful such as pausing,
> it's only useful on traditional TCG to give time to other
> processors.

Is it actively harmful in the MTTCG case, or just pointless?

> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Cédric Le Goater 
> ---
>  target/ppc/translate.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index e169c43643a1..7d40a1fbe6bd 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -1580,7 +1580,7 @@ static void gen_pause(DisasContext *ctx)
>  tcg_temp_free_i32(t0);
>  
>  /* Stop translation, this gives other CPUs a chance to run */
> -gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> +gen_exception_nip(ctx, EXCP_INTERRUPT, ctx->base.pc_next);

I don't see how this change relates to the rest.

>  }
>  #endif /* defined(TARGET_PPC64) */
>  
> @@ -1662,7 +1662,9 @@ static void gen_or(DisasContext *ctx)
>   * than no-op, e.g., miso(rs=26), yield(27), mdoio(29), mdoom(30),
>   * and all currently undefined.
>   */
> -gen_pause(ctx);
> +if (!mttcg_enabled) {
> +gen_pause(ctx);
> +}
>  #endif
>  #endif
>  }

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH v4 5/5] RISC-V: Add hooks to use the gdb xml files.

2019-02-12 Thread Alistair Francis

On Tue, Feb 12, 2019 at 3:10 PM Jim Wilson  wrote:
>
> The gdb CSR xml file has registers in documentation order, not numerical
> order, so we need a table to map the register numbers.  This also adds
> fairly standard gdb hooks to access xml specified registers.
>
> Signed-off-by: Jim Wilson 

Reviewed-by: Alistair Francis 

Alistair
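
For readers following along, the mapping described in the commit message can be sketched as a small lookup table. This is only an illustration: csr_map_sketch and gdb_index_to_csr are hypothetical names, the table holds just the first eleven entries of the one in the patch, and the CSR numbers are the ones assigned by the RISC-V privileged specification.

```c
#include <stddef.h>

/* CSR numbers as assigned by the RISC-V privileged specification. */
enum {
    CSR_USTATUS  = 0x000,
    CSR_FFLAGS   = 0x001,
    CSR_FRM      = 0x002,
    CSR_FCSR     = 0x003,
    CSR_UIE      = 0x004,
    CSR_UTVEC    = 0x005,
    CSR_USCRATCH = 0x040,
    CSR_UEPC     = 0x041,
    CSR_UCAUSE   = 0x042,
    CSR_UTVAL    = 0x043,
    CSR_UIP      = 0x044,
};

/* Index = position in the gdb XML file, value = hardware CSR number. */
static const int csr_map_sketch[] = {
    CSR_USTATUS, CSR_UIE, CSR_UTVEC, CSR_USCRATCH, CSR_UEPC, CSR_UCAUSE,
    CSR_UTVAL, CSR_UIP, CSR_FFLAGS, CSR_FRM, CSR_FCSR,
};

/* Translate a gdb register index; returns -1 when it is out of range. */
static int gdb_index_to_csr(size_t n)
{
    size_t count = sizeof(csr_map_sketch) / sizeof(csr_map_sketch[0]);

    return n < count ? csr_map_sketch[n] : -1;
}
```

Note how gdb index 8 maps to CSR 0x001 (fflags) while index 3 maps to 0x040 (uscratch), which is why a direct index cannot be used.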

> ---
>  target/riscv/cpu.c |   9 +-
>  target/riscv/cpu.h |   2 +
>  target/riscv/gdbstub.c | 348 
> +++--
>  3 files changed, 347 insertions(+), 12 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 28d7e53..c23bd01 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -311,6 +311,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  return;
>  }
>
> +riscv_cpu_register_gdb_regs_for_features(cs);
> +
>  qemu_init_vcpu(cs);
>  cpu_reset(cs);
>
> @@ -351,7 +353,12 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> *data)
>  cc->synchronize_from_tb = riscv_cpu_synchronize_from_tb;
>  cc->gdb_read_register = riscv_cpu_gdb_read_register;
>  cc->gdb_write_register = riscv_cpu_gdb_write_register;
> -cc->gdb_num_core_regs = 65;
> +cc->gdb_num_core_regs = 33;
> +#if defined(TARGET_RISCV32)
> +cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
> +#elif defined(TARGET_RISCV64)
> +cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
> +#endif
>  cc->gdb_stop_before_watchpoint = true;
>  cc->disas_set_info = riscv_cpu_disas_set_info;
>  #ifdef CONFIG_USER_ONLY
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 04a050e..c10e86c 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -329,6 +329,8 @@ typedef struct {
>  void riscv_get_csr_ops(int csrno, riscv_csr_operations *ops);
>  void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
>
> +void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
> +
>  #include "exec/cpu-all.h"
>
>  #endif /* RISCV_CPU_H */
> diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
> index 3cabb21..621206d 100644
> --- a/target/riscv/gdbstub.c
> +++ b/target/riscv/gdbstub.c
> @@ -21,6 +21,255 @@
>  #include "exec/gdbstub.h"
>  #include "cpu.h"
>
> +/*
> + * The GDB CSR xml files list them in documentation order, not numerical 
> order,
> + * and are missing entries for unnamed CSRs.  So we need to map the gdb 
> numbers
> + * to the hardware numbers.
> + */
> +
> +static int csr_register_map[] = {
> +CSR_USTATUS,
> +CSR_UIE,
> +CSR_UTVEC,
> +CSR_USCRATCH,
> +CSR_UEPC,
> +CSR_UCAUSE,
> +CSR_UTVAL,
> +CSR_UIP,
> +CSR_FFLAGS,
> +CSR_FRM,
> +CSR_FCSR,
> +CSR_CYCLE,
> +CSR_TIME,
> +CSR_INSTRET,
> +CSR_HPMCOUNTER3,
> +CSR_HPMCOUNTER4,
> +CSR_HPMCOUNTER5,
> +CSR_HPMCOUNTER6,
> +CSR_HPMCOUNTER7,
> +CSR_HPMCOUNTER8,
> +CSR_HPMCOUNTER9,
> +CSR_HPMCOUNTER10,
> +CSR_HPMCOUNTER11,
> +CSR_HPMCOUNTER12,
> +CSR_HPMCOUNTER13,
> +CSR_HPMCOUNTER14,
> +CSR_HPMCOUNTER15,
> +CSR_HPMCOUNTER16,
> +CSR_HPMCOUNTER17,
> +CSR_HPMCOUNTER18,
> +CSR_HPMCOUNTER19,
> +CSR_HPMCOUNTER20,
> +CSR_HPMCOUNTER21,
> +CSR_HPMCOUNTER22,
> +CSR_HPMCOUNTER23,
> +CSR_HPMCOUNTER24,
> +CSR_HPMCOUNTER25,
> +CSR_HPMCOUNTER26,
> +CSR_HPMCOUNTER27,
> +CSR_HPMCOUNTER28,
> +CSR_HPMCOUNTER29,
> +CSR_HPMCOUNTER30,
> +CSR_HPMCOUNTER31,
> +CSR_CYCLEH,
> +CSR_TIMEH,
> +CSR_INSTRETH,
> +CSR_HPMCOUNTER3H,
> +CSR_HPMCOUNTER4H,
> +CSR_HPMCOUNTER5H,
> +CSR_HPMCOUNTER6H,
> +CSR_HPMCOUNTER7H,
> +CSR_HPMCOUNTER8H,
> +CSR_HPMCOUNTER9H,
> +CSR_HPMCOUNTER10H,
> +CSR_HPMCOUNTER11H,
> +CSR_HPMCOUNTER12H,
> +CSR_HPMCOUNTER13H,
> +CSR_HPMCOUNTER14H,
> +CSR_HPMCOUNTER15H,
> +CSR_HPMCOUNTER16H,
> +CSR_HPMCOUNTER17H,
> +CSR_HPMCOUNTER18H,
> +CSR_HPMCOUNTER19H,
> +CSR_HPMCOUNTER20H,
> +CSR_HPMCOUNTER21H,
> +CSR_HPMCOUNTER22H,
> +CSR_HPMCOUNTER23H,
> +CSR_HPMCOUNTER24H,
> +CSR_HPMCOUNTER25H,
> +CSR_HPMCOUNTER26H,
> +CSR_HPMCOUNTER27H,
> +CSR_HPMCOUNTER28H,
> +CSR_HPMCOUNTER29H,
> +CSR_HPMCOUNTER30H,
> +CSR_HPMCOUNTER31H,
> +CSR_SSTATUS,
> +CSR_SEDELEG,
> +CSR_SIDELEG,
> +CSR_SIE,
> +CSR_STVEC,
> +CSR_SCOUNTEREN,
> +CSR_SSCRATCH,
> +CSR_SEPC,
> +CSR_SCAUSE,
> +CSR_STVAL,
> +CSR_SIP,
> +CSR_SATP,
> +CSR_MVENDORID,
> +CSR_MARCHID,
> +CSR_MIMPID,
> +CSR_MHARTID,
> +CSR_MSTATUS,
> +CSR_MISA,
> +CSR_MEDELEG,
> +CSR_MIDELEG,
> +CSR_MIE,
> +CSR_MTVEC,
> +CSR_MCOUNTEREN,
> +CSR_MSCRATCH,
> +CSR_MEPC,
> +CSR_MCAUSE,
> +CSR_MTVAL,
> +CSR_MIP,
> +CSR_PMPCFG0,
> +CSR_PMPCFG1,
> +CSR_PMPCFG2,
> +CSR_PMPCFG3,
> +CSR_PMPADDR0,
> +CSR_PMPADDR1,
> +CSR_PMPADDR2,
> +CSR_PMPADDR3,
> +CSR_PMPADDR4,
> +

Re: [Qemu-devel] [PATCH v6 00/35] target/riscv: Convert to decodetree

2019-02-12 Thread Palmer Dabbelt


On Wed, 23 Jan 2019 01:25:03 PST (-0800), Bastian Koppelmann wrote:

Hi,

this patchset converts the RISC-V decoder to decodetree in four major steps:

1) Convert 32-bit instructions to decodetree [Patch 1-16]:
Many of the gen_* functions are called by the decode functions for 16-bit
and 32-bit instructions. If we move translation code from the gen_*
functions to the generated trans_* functions of decode-tree, we get a lot of
duplication. Therefore, we mostly generate calls to the old gen_* function
which are properly replaced after step 2).

The trans_ functions are grouped into files corresponding to their
ISA extension, e.g. addi, which is in RV32I, is translated in the file
'trans_rvi.inc.c'.

2) Convert 16-bit instructions to decodetree [Patch 17-19]:
All 16 bit instructions have a direct mapping to a 32 bit instruction. Thus,
we convert the arguments in the 16 bit trans_ function to the arguments of
the corresponding 32 bit instruction and call the 32 bit trans_ function.

3) Remove old manual decoding in gen_* function [Patch 20-30]:
This moves all manual translation code into the trans_* functions of
decodetree, such that we can remove the old decode_* functions.

4) Simplify RVC by reusing as much as possible from the RVG decoder as suggested
   by Richard. [Patch 31-35]
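
Step 2 can be pictured with a small sketch, using c.addi as the example (c.addi rd, imm expands to addi rd, rd, imm). Everything here is hypothetical and simplified: the argument structs only resemble what decodetree generates, and a plain register array stands in for the TCG code generation that the real trans_ functions perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical argument structs, loosely modelled on decodetree output. */
typedef struct { int rd; int rs1; int32_t imm; } arg_addi_sketch;
typedef struct { int rd; int32_t imm; } arg_c_addi_sketch;

/* Toy register file; the real code emits TCG ops instead. */
static int64_t regs_sketch[32];

/* 32-bit instruction: addi rd, rs1, imm */
static bool trans_addi_sketch(const arg_addi_sketch *a)
{
    if (a->rd != 0) {   /* x0 is hard-wired to zero */
        regs_sketch[a->rd] = regs_sketch[a->rs1] + a->imm;
    }
    return true;
}

/* 16-bit instruction: convert the arguments and reuse the 32-bit path. */
static bool trans_c_addi_sketch(const arg_c_addi_sketch *a)
{
    arg_addi_sketch arg = { .rd = a->rd, .rs1 = a->rd, .imm = a->imm };

    return trans_addi_sketch(&arg);
}
```

The 16-bit function contains no translation logic of its own, which is what avoids the duplication mentioned in step 1.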

full tree available at
https://github.com/bkoppelmann/qemu/tree/riscv-dt-v6

Cheers,
Bastian

v5 -> v6:
- fixed funky indentation


Bastian Koppelmann (35):
  target/riscv: Move CPURISCVState pointer to DisasContext
  target/riscv: Activate decodetree and implemnt LUI & AUIPC
  target/riscv: Convert RVXI branch insns to decodetree
  target/riscv: Convert RV32I load/store insns to decodetree
  target/riscv: Convert RV64I load/store insns to decodetree
  target/riscv: Convert RVXI arithmetic insns to decodetree
  target/riscv: Convert RVXI fence insns to decodetree
  target/riscv: Convert RVXI csr insns to decodetree
  target/riscv: Convert RVXM insns to decodetree
  target/riscv: Convert RV32A insns to decodetree
  target/riscv: Convert RV64A insns to decodetree
  target/riscv: Convert RV32F insns to decodetree
  target/riscv: Convert RV64F insns to decodetree
  target/riscv: Convert RV32D insns to decodetree
  target/riscv: Convert RV64D insns to decodetree
  target/riscv: Convert RV priv insns to decodetree
  target/riscv: Convert quadrant 0 of RVXC insns to decodetree
  target/riscv: Convert quadrant 1 of RVXC insns to decodetree
  target/riscv: Convert quadrant 2 of RVXC insns to decodetree
  target/riscv: Remove gen_jalr()
  target/riscv: Remove manual decoding from gen_branch()
  target/riscv: Remove manual decoding from gen_load()
  target/riscv: Remove manual decoding from gen_store()
  target/riscv: Move gen_arith_imm() decoding into trans_* functions
  target/riscv: make ADD/SUB/OR/XOR/AND insn use arg lists
  target/riscv: Remove shift and slt insn manual decoding
  target/riscv: Remove manual decoding of RV32/64M insn
  target/riscv: Rename trans_arith to gen_arith
  target/riscv: Remove gen_system()
  target/riscv: Remove decode_RV32_64G()
  target/riscv: Convert @cs_2 insns to share translation functions
  target/riscv: Convert @cl_d, @cl_w, @cs_d, @cs_w insns
  target/riscv: Splice fsw_sd and flw_ld for riscv32 vs riscv64
  target/riscv: Splice remaining compressed insn pairs for riscv32 vs
riscv64
  target/riscv: Remaining rvc insn reuse 32 bit translators

 target/riscv/Makefile.objs|   22 +
 target/riscv/insn16-32.decode |   31 +
 target/riscv/insn16-64.decode |   33 +
 target/riscv/insn16.decode|  114 ++
 target/riscv/insn32-64.decode |   72 +
 target/riscv/insn32.decode|  203 ++
 .../riscv/insn_trans/trans_privileged.inc.c   |  110 +
 target/riscv/insn_trans/trans_rva.inc.c   |  207 ++
 target/riscv/insn_trans/trans_rvc.inc.c   |  149 ++
 target/riscv/insn_trans/trans_rvd.inc.c   |  388 
 target/riscv/insn_trans/trans_rvf.inc.c   |  388 
 target/riscv/insn_trans/trans_rvi.inc.c   |  568 ++
 target/riscv/insn_trans/trans_rvm.inc.c   |  107 +
 target/riscv/translate.c  | 1781 ++---
 14 files changed, 2611 insertions(+), 1562 deletions(-)
 create mode 100644 target/riscv/insn16-32.decode
 create mode 100644 target/riscv/insn16-64.decode
 create mode 100644 target/riscv/insn16.decode
 create mode 100644 target/riscv/insn32-64.decode
 create mode 100644 target/riscv/insn32.decode
 create mode 100644 target/riscv/insn_trans/trans_privileged.inc.c
 create mode 100644 target/riscv/insn_trans/trans_rva.inc.c
 create mode 100644 target/riscv/insn_trans/trans_rvc.inc.c
 create mode 100644 target/riscv/insn_trans/trans_rvd.inc.c
 create mode 100644 target/riscv/insn_trans/trans_rvf.inc.c
 create mode 100644 target/riscv/insn_trans/trans_rvi.inc.c
 create mode 100644

[Qemu-devel] [PATCH v4 4/5] RISC-V: Add debug support for accessing CSRs.

2019-02-12 Thread Jim Wilson

Add a debugger field to CPURISCVState.  Add riscv_csrrw_debug function
to set it.  Disable mode checks when debugger field true.

Signed-off-by: Jim Wilson 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.h |  5 +
 target/riscv/csr.c | 34 ++
 2 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 743f02c..04a050e 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -170,6 +170,9 @@ struct CPURISCVState {
 
 /* physical memory protection */
 pmp_table_t pmp_state;
+
+/* True if in debugger mode.  */
+bool debugger;
 #endif
 
 float_status fp_status;
@@ -292,6 +295,8 @@ static inline void cpu_get_tb_cpu_state(CPURISCVState *env, 
target_ulong *pc,
 
 int riscv_csrrw(CPURISCVState *env, int csrno, target_ulong *ret_value,
 target_ulong new_value, target_ulong write_mask);
+int riscv_csrrw_debug(CPURISCVState *env, int csrno, target_ulong *ret_value,
+  target_ulong new_value, target_ulong write_mask);
 
 static inline void csr_write_helper(CPURISCVState *env, target_ulong val,
 int csrno)
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 5e7e7d1..de28a5d 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -46,7 +46,7 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 static int fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 #endif
@@ -58,7 +58,7 @@ static int ctr(CPURISCVState *env, int csrno)
 #if !defined(CONFIG_USER_ONLY)
 target_ulong ctr_en = env->priv == PRV_U ? env->scounteren :
   env->priv == PRV_S ? env->mcounteren : -1U;
-if (!(ctr_en & (1 << (csrno & 31)))) {
+if (!env->debugger && !(ctr_en & (1 << (csrno & 31)))) {
 return -1;
 }
 #endif
@@ -86,7 +86,7 @@ static int pmp(CPURISCVState *env, int csrno)
 static int read_fflags(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 #endif
@@ -97,7 +97,7 @@ static int read_fflags(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fflags(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 env->mstatus |= MSTATUS_FS;
@@ -109,7 +109,7 @@ static int write_fflags(CPURISCVState *env, int csrno, 
target_ulong val)
 static int read_frm(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 #endif
@@ -120,7 +120,7 @@ static int read_frm(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_frm(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 env->mstatus |= MSTATUS_FS;
@@ -132,7 +132,7 @@ static int write_frm(CPURISCVState *env, int csrno, 
target_ulong val)
 static int read_fcsr(CPURISCVState *env, int csrno, target_ulong *val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 #endif
@@ -144,7 +144,7 @@ static int read_fcsr(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_fcsr(CPURISCVState *env, int csrno, target_ulong val)
 {
 #if !defined(CONFIG_USER_ONLY)
-if (!(env->mstatus & MSTATUS_FS)) {
+if (!env->debugger && !(env->mstatus & MSTATUS_FS)) {
 return -1;
 }
 env->mstatus |= MSTATUS_FS;
@@ -772,6 +772,24 @@ int riscv_csrrw(CPURISCVState *env, int csrno, 
target_ulong *ret_value,
 return 0;
 }
 
+/*
+ * Debugger support.  If not in user mode, set env->debugger before the
+ * riscv_csrrw call and clear it after the call.
+ */
+int riscv_csrrw_debug(CPURISCVState *env, int csrno, target_ulong *ret_value,
+target_ulong new_value, target_ulong write_mask)
+{
+int ret;
+#if !defined(CONFIG_USER_ONLY)
+env->debugger = true;
+#endif
+ret = riscv_csrrw(env, csrno, ret_value, new_value, write_mask);
+#if !defined(CONFIG_USER_ONLY)
+env->debugger = false;
+#endif
+return ret;
+}
+
 /* Control and Status Register function table */
 static riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 /* User Floating-Point CSRs */
-- 
2.7.4
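
The set/clear pattern used by riscv_csrrw_debug in the patch above can be sketched in isolation as follows. This is a toy model under stated assumptions: env_sketch, csrrw_sketch and csrrw_debug_sketch are hypothetical names, a single fcsr field stands in for the whole CSR file, and fs_enabled stands in for the mstatus FS check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for CPURISCVState. */
typedef struct {
    bool fs_enabled;    /* stand-in for (mstatus & MSTATUS_FS) != 0 */
    bool debugger;      /* true only while the debugger accesses a CSR */
    unsigned long fcsr;
} env_sketch;

/* Normal access path: fails when the FPU state is off, unless debugging. */
static int csrrw_sketch(env_sketch *env, unsigned long *ret_value,
                        unsigned long new_value, unsigned long write_mask)
{
    if (!env->debugger && !env->fs_enabled) {
        return -1;
    }

    unsigned long old = env->fcsr;

    if (write_mask) {
        env->fcsr = (old & ~write_mask) | (new_value & write_mask);
    }
    if (ret_value != NULL) {
        *ret_value = old;
    }
    return 0;
}

/* Debugger path: bypass the mode check for the duration of one access. */
static int csrrw_debug_sketch(env_sketch *env, unsigned long *ret_value,
                              unsigned long new_value,
                              unsigned long write_mask)
{
    env->debugger = true;
    int ret = csrrw_sketch(env, ret_value, new_value, write_mask);
    env->debugger = false;
    return ret;
}
```

The flag is cleared again before returning, so guest-visible accesses that follow are still subject to the normal checks.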

[Qemu-devel] [PATCH v4 2/5] RISC-V: Add 64-bit gdb xml files.

2019-02-12 Thread Jim Wilson

Signed-off-by: Jim Wilson 
Reviewed-by: Alistair Francis 
---
 configure   |   1 +
 gdb-xml/riscv-64bit-cpu.xml |  43 
 gdb-xml/riscv-64bit-csr.xml | 250 
 gdb-xml/riscv-64bit-fpu.xml |  52 +
 4 files changed, 346 insertions(+)
 create mode 100644 gdb-xml/riscv-64bit-cpu.xml
 create mode 100644 gdb-xml/riscv-64bit-csr.xml
 create mode 100644 gdb-xml/riscv-64bit-fpu.xml

diff --git a/configure b/configure
index febe292..d7cae4e 100755
--- a/configure
+++ b/configure
@@ -7258,6 +7258,7 @@ case "$target_name" in
 TARGET_BASE_ARCH=riscv
 TARGET_ABI_DIR=riscv
 mttcg=yes
+gdb_xml_files="riscv-64bit-cpu.xml riscv-64bit-fpu.xml riscv-64bit-csr.xml"
 target_compiler=$cross_cc_riscv64
   ;;
   sh4|sh4eb)
diff --git a/gdb-xml/riscv-64bit-cpu.xml b/gdb-xml/riscv-64bit-cpu.xml
new file mode 100644
index 000..f37d7f3
--- /dev/null
+++ b/gdb-xml/riscv-64bit-cpu.xml
@@ -0,0 +1,43 @@
+[43 lines of XML target description; markup not preserved in this archive]
diff --git a/gdb-xml/riscv-64bit-csr.xml b/gdb-xml/riscv-64bit-csr.xml
new file mode 100644
index 000..a3de834
--- /dev/null
+++ b/gdb-xml/riscv-64bit-csr.xml
@@ -0,0 +1,250 @@
+[250 lines of XML target description; markup not preserved in this archive]
diff --git a/gdb-xml/riscv-64bit-fpu.xml b/gdb-xml/riscv-64bit-fpu.xml
new file mode 100644
index 000..fb24b72
--- /dev/null
+++ b/gdb-xml/riscv-64bit-fpu.xml
@@ -0,0 +1,52 @@
+[52 lines of XML target description; markup not preserved in this archive]
-- 
2.7.4

[Qemu-devel] [PATCH v4 5/5] RISC-V: Add hooks to use the gdb xml files.

2019-02-12 Thread Jim Wilson

The gdb CSR xml file has registers in documentation order, not numerical
order, so we need a table to map the register numbers.  This also adds
fairly standard gdb hooks to access xml specified registers.
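
A minimal sketch of the mapping idea, assuming a heavily abbreviated table: the array index is the position of a register in the xml file and the value is the hardware CSR number. The SKETCH_* names and sketch_gdb_to_csr() are illustrative only; the addresses are taken from the RISC-V privileged specification, and the real table in this patch is much longer.

```c
#include <assert.h>
#include <stddef.h>

/* A few CSR addresses from the RISC-V privileged spec (excerpt only). */
enum {
    SKETCH_CSR_USTATUS = 0x000,
    SKETCH_CSR_FFLAGS  = 0x001,
    SKETCH_CSR_FRM     = 0x002,
    SKETCH_CSR_FCSR    = 0x003,
    SKETCH_CSR_UIE     = 0x004,
    SKETCH_CSR_UTVEC   = 0x005,
    SKETCH_CSR_CYCLE   = 0xc00,
};

/* Index = gdb register position (documentation order), value = CSR number. */
static const int sketch_csr_map[] = {
    SKETCH_CSR_USTATUS,
    SKETCH_CSR_UIE,
    SKETCH_CSR_UTVEC,
    SKETCH_CSR_FFLAGS,
    SKETCH_CSR_FRM,
    SKETCH_CSR_FCSR,
    SKETCH_CSR_CYCLE,
};

/* Translate a gdb CSR index to a hardware CSR number, or -1 if out of range. */
static int sketch_gdb_to_csr(size_t gdb_index)
{
    size_t n = sizeof(sketch_csr_map) / sizeof(sketch_csr_map[0]);

    if (gdb_index >= n) {
        return -1;
    }
    return sketch_csr_map[gdb_index];
}
```

The bounds check matters because gdb may ask for a register number beyond the table.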

Signed-off-by: Jim Wilson 
---
 target/riscv/cpu.c |   9 +-
 target/riscv/cpu.h |   2 +
 target/riscv/gdbstub.c | 348 +++--
 3 files changed, 347 insertions(+), 12 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 28d7e53..c23bd01 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -311,6 +311,8 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+riscv_cpu_register_gdb_regs_for_features(cs);
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
@@ -351,7 +353,12 @@ static void riscv_cpu_class_init(ObjectClass *c, void *data)
 cc->synchronize_from_tb = riscv_cpu_synchronize_from_tb;
 cc->gdb_read_register = riscv_cpu_gdb_read_register;
 cc->gdb_write_register = riscv_cpu_gdb_write_register;
-cc->gdb_num_core_regs = 65;
+cc->gdb_num_core_regs = 33;
+#if defined(TARGET_RISCV32)
+cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
+#elif defined(TARGET_RISCV64)
+cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
+#endif
 cc->gdb_stop_before_watchpoint = true;
 cc->disas_set_info = riscv_cpu_disas_set_info;
 #ifdef CONFIG_USER_ONLY
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 04a050e..c10e86c 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -329,6 +329,8 @@ typedef struct {
 void riscv_get_csr_ops(int csrno, riscv_csr_operations *ops);
 void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops);
 
+void riscv_cpu_register_gdb_regs_for_features(CPUState *cs);
+
 #include "exec/cpu-all.h"
 
 #endif /* RISCV_CPU_H */
diff --git a/target/riscv/gdbstub.c b/target/riscv/gdbstub.c
index 3cabb21..621206d 100644
--- a/target/riscv/gdbstub.c
+++ b/target/riscv/gdbstub.c
@@ -21,6 +21,255 @@
 #include "exec/gdbstub.h"
 #include "cpu.h"
 
+/*
+ * The GDB CSR xml files list them in documentation order, not numerical order,
+ * and are missing entries for unnamed CSRs.  So we need to map the gdb numbers
+ * to the hardware numbers.
+ */
+
+static int csr_register_map[] = {
+CSR_USTATUS,
+CSR_UIE,
+CSR_UTVEC,
+CSR_USCRATCH,
+CSR_UEPC,
+CSR_UCAUSE,
+CSR_UTVAL,
+CSR_UIP,
+CSR_FFLAGS,
+CSR_FRM,
+CSR_FCSR,
+CSR_CYCLE,
+CSR_TIME,
+CSR_INSTRET,
+CSR_HPMCOUNTER3,
+CSR_HPMCOUNTER4,
+CSR_HPMCOUNTER5,
+CSR_HPMCOUNTER6,
+CSR_HPMCOUNTER7,
+CSR_HPMCOUNTER8,
+CSR_HPMCOUNTER9,
+CSR_HPMCOUNTER10,
+CSR_HPMCOUNTER11,
+CSR_HPMCOUNTER12,
+CSR_HPMCOUNTER13,
+CSR_HPMCOUNTER14,
+CSR_HPMCOUNTER15,
+CSR_HPMCOUNTER16,
+CSR_HPMCOUNTER17,
+CSR_HPMCOUNTER18,
+CSR_HPMCOUNTER19,
+CSR_HPMCOUNTER20,
+CSR_HPMCOUNTER21,
+CSR_HPMCOUNTER22,
+CSR_HPMCOUNTER23,
+CSR_HPMCOUNTER24,
+CSR_HPMCOUNTER25,
+CSR_HPMCOUNTER26,
+CSR_HPMCOUNTER27,
+CSR_HPMCOUNTER28,
+CSR_HPMCOUNTER29,
+CSR_HPMCOUNTER30,
+CSR_HPMCOUNTER31,
+CSR_CYCLEH,
+CSR_TIMEH,
+CSR_INSTRETH,
+CSR_HPMCOUNTER3H,
+CSR_HPMCOUNTER4H,
+CSR_HPMCOUNTER5H,
+CSR_HPMCOUNTER6H,
+CSR_HPMCOUNTER7H,
+CSR_HPMCOUNTER8H,
+CSR_HPMCOUNTER9H,
+CSR_HPMCOUNTER10H,
+CSR_HPMCOUNTER11H,
+CSR_HPMCOUNTER12H,
+CSR_HPMCOUNTER13H,
+CSR_HPMCOUNTER14H,
+CSR_HPMCOUNTER15H,
+CSR_HPMCOUNTER16H,
+CSR_HPMCOUNTER17H,
+CSR_HPMCOUNTER18H,
+CSR_HPMCOUNTER19H,
+CSR_HPMCOUNTER20H,
+CSR_HPMCOUNTER21H,
+CSR_HPMCOUNTER22H,
+CSR_HPMCOUNTER23H,
+CSR_HPMCOUNTER24H,
+CSR_HPMCOUNTER25H,
+CSR_HPMCOUNTER26H,
+CSR_HPMCOUNTER27H,
+CSR_HPMCOUNTER28H,
+CSR_HPMCOUNTER29H,
+CSR_HPMCOUNTER30H,
+CSR_HPMCOUNTER31H,
+CSR_SSTATUS,
+CSR_SEDELEG,
+CSR_SIDELEG,
+CSR_SIE,
+CSR_STVEC,
+CSR_SCOUNTEREN,
+CSR_SSCRATCH,
+CSR_SEPC,
+CSR_SCAUSE,
+CSR_STVAL,
+CSR_SIP,
+CSR_SATP,
+CSR_MVENDORID,
+CSR_MARCHID,
+CSR_MIMPID,
+CSR_MHARTID,
+CSR_MSTATUS,
+CSR_MISA,
+CSR_MEDELEG,
+CSR_MIDELEG,
+CSR_MIE,
+CSR_MTVEC,
+CSR_MCOUNTEREN,
+CSR_MSCRATCH,
+CSR_MEPC,
+CSR_MCAUSE,
+CSR_MTVAL,
+CSR_MIP,
+CSR_PMPCFG0,
+CSR_PMPCFG1,
+CSR_PMPCFG2,
+CSR_PMPCFG3,
+CSR_PMPADDR0,
+CSR_PMPADDR1,
+CSR_PMPADDR2,
+CSR_PMPADDR3,
+CSR_PMPADDR4,
+CSR_PMPADDR5,
+CSR_PMPADDR6,
+CSR_PMPADDR7,
+CSR_PMPADDR8,
+CSR_PMPADDR9,
+CSR_PMPADDR10,
+CSR_PMPADDR11,
+CSR_PMPADDR12,
+CSR_PMPADDR13,
+CSR_PMPADDR14,
+CSR_PMPADDR15,
+CSR_MCYCLE,
+CSR_MINSTRET,
+CSR_MHPMCOUNTER3,
+CSR_MHPMCOUNTER4,
+CSR_MHPMCOUNTER5,
+CSR_MHPMCOUNTER6,
+CSR_MHPMCOUNTER7,
+CSR_MHPMCOUNTER8,
+CSR_MHPMCOUNTER9,
+CSR_MHPMCOUNTER10,
+CSR_MHPMCOUNTER11,
+

[Qemu-devel] [PATCH v4 3/5] RISC-V: Fixes to CSR_* register macros.

2019-02-12 Thread Jim Wilson

This adds some missing CSR_* register macros, and documents some as being
priv v1.9.1 specific.

Signed-off-by: Jim Wilson 
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu_bits.h | 35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 5439f47..316d500 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -135,16 +135,22 @@
 /* Legacy Counter Setup (priv v1.9.1) */
 #define CSR_MUCOUNTEREN 0x320
 #define CSR_MSCOUNTEREN 0x321
+#define CSR_MHCOUNTEREN 0x322
 
 /* Machine Trap Handling */
 #define CSR_MSCRATCH0x340
 #define CSR_MEPC0x341
 #define CSR_MCAUSE  0x342
-#define CSR_MBADADDR0x343
+#define CSR_MTVAL   0x343
 #define CSR_MIP 0x344
 
+/* Legacy Machine Trap Handling (priv v1.9.1) */
+#define CSR_MBADADDR0x343
+
 /* Supervisor Trap Setup */
 #define CSR_SSTATUS 0x100
+#define CSR_SEDELEG 0x102
+#define CSR_SIDELEG 0x103
 #define CSR_SIE 0x104
 #define CSR_STVEC   0x105
 #define CSR_SCOUNTEREN  0x106
@@ -153,9 +159,12 @@
 #define CSR_SSCRATCH0x140
 #define CSR_SEPC0x141
 #define CSR_SCAUSE  0x142
-#define CSR_SBADADDR0x143
+#define CSR_STVAL   0x143
 #define CSR_SIP 0x144
 
+/* Legacy Supervisor Trap Handling (priv v1.9.1) */
+#define CSR_SBADADDR0x143
+
 /* Supervisor Protection and Translation */
 #define CSR_SPTBR   0x180
 #define CSR_SATP0x180
@@ -282,6 +291,28 @@
 #define CSR_MHPMCOUNTER30H  0xb9e
 #define CSR_MHPMCOUNTER31H  0xb9f
 
+/* Legacy Hypervisor Trap Setup (priv v1.9.1) */
+#define CSR_HSTATUS 0x200
+#define CSR_HEDELEG 0x202
+#define CSR_HIDELEG 0x203
+#define CSR_HIE 0x204
+#define CSR_HTVEC   0x205
+
+/* Legacy Hypervisor Trap Handling (priv v1.9.1) */
+#define CSR_HSCRATCH0x240
+#define CSR_HEPC0x241
+#define CSR_HCAUSE  0x242
+#define CSR_HBADADDR0x243
+#define CSR_HIP 0x244
+
+/* Legacy Machine Protection and Translation (priv v1.9.1) */
+#define CSR_MBASE   0x380
+#define CSR_MBOUND  0x381
+#define CSR_MIBASE  0x382
+#define CSR_MIBOUND 0x383
+#define CSR_MDBASE  0x384
+#define CSR_MDBOUND 0x385
+
 /* mstatus CSR bits */
 #define MSTATUS_UIE 0x0001
 #define MSTATUS_SIE 0x0002
-- 
2.7.4
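
The patch above keeps both the legacy and the current macro for the same CSR address (for example 0x343 is CSR_MBADADDR in priv v1.9.1 and CSR_MTVAL later). A small sketch of what that aliasing means for a name lookup follows; SketchPrivVersion and sketch_csr_name() are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical selector for the privileged-spec revision in use. */
typedef enum {
    SKETCH_PRIV_1_09_1,
    SKETCH_PRIV_1_10
} SketchPrivVersion;

/* Same address, different name depending on the spec revision. */
static const char *sketch_csr_name(int csrno, SketchPrivVersion v)
{
    switch (csrno) {
    case 0x343:
        return v == SKETCH_PRIV_1_09_1 ? "mbadaddr" : "mtval";
    case 0x143:
        return v == SKETCH_PRIV_1_09_1 ? "sbadaddr" : "stval";
    default:
        return NULL;    /* not covered by this sketch */
    }
}
```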

[Qemu-devel] [PATCH v4 1/5] RISC-V: Add 32-bit gdb xml files.

2019-02-12 Thread Jim Wilson

Signed-off-by: Jim Wilson 
Reviewed-by: Alistair Francis 
---
 configure   |   1 +
 gdb-xml/riscv-32bit-cpu.xml |  43 
 gdb-xml/riscv-32bit-csr.xml | 250 
 gdb-xml/riscv-32bit-fpu.xml |  46 
 4 files changed, 340 insertions(+)
 create mode 100644 gdb-xml/riscv-32bit-cpu.xml
 create mode 100644 gdb-xml/riscv-32bit-csr.xml
 create mode 100644 gdb-xml/riscv-32bit-fpu.xml

diff --git a/configure b/configure
index fbd0825..febe292 100755
--- a/configure
+++ b/configure
@@ -7251,6 +7251,7 @@ case "$target_name" in
 TARGET_BASE_ARCH=riscv
 TARGET_ABI_DIR=riscv
 mttcg=yes
+gdb_xml_files="riscv-32bit-cpu.xml riscv-32bit-fpu.xml riscv-32bit-csr.xml"
 target_compiler=$cross_cc_riscv32
   ;;
   riscv64)
diff --git a/gdb-xml/riscv-32bit-cpu.xml b/gdb-xml/riscv-32bit-cpu.xml
new file mode 100644
index 000..c02f86c
--- /dev/null
+++ b/gdb-xml/riscv-32bit-cpu.xml
@@ -0,0 +1,43 @@
+[43 lines of XML target description; markup not preserved in this archive]
diff --git a/gdb-xml/riscv-32bit-csr.xml b/gdb-xml/riscv-32bit-csr.xml
new file mode 100644
index 000..4aea9e6
--- /dev/null
+++ b/gdb-xml/riscv-32bit-csr.xml
@@ -0,0 +1,250 @@
+[250 lines of XML target description; markup not preserved in this archive]
diff --git a/gdb-xml/riscv-32bit-fpu.xml b/gdb-xml/riscv-32bit-fpu.xml
new file mode 100644
index 000..783287d
--- /dev/null
+++ b/gdb-xml/riscv-32bit-fpu.xml
@@ -0,0 +1,46 @@
+[46 lines of XML target description; markup not preserved in this archive]
-- 
2.7.4

[Qemu-devel] [PATCH v4 0/5] RISC-V: Add gdb xml files and gdbstub support.

2019-02-12 Thread Jim Wilson

This is the 4th version of the patch set.  Updated as per the review
from Alistair, it has the riscv_csrrw_debug function added, and
Reviewed-By lines added.  Otherwise it is the same as the 3rd version.

Jim

Re: [Qemu-devel] [PATCH v2] blockdev: acquire aio_context for bitmap add/remove

2019-02-12 Thread Eric Blake

On 2/12/19 3:37 PM, John Snow wrote:
> 
> 
> On 2/12/19 3:16 PM, Eric Blake wrote:
>> On 2/12/19 2:07 PM, John Snow wrote:
>>> When bitmaps are persistent, they may incur a disk read or write when bitmaps
>>> are added or removed. For configurations like virtio-dataplane, failing to
>>> acquire this lock will abort QEMU when disk IO occurs.
>>>
>>> We used to acquire aio_context as part of the bitmap lookup, so re-introduce
>>> the lock for just the cases that have an IO penalty.
>>
>> It would be nice to call out which commit id dropped the aio_context
>> acquisition during bitmap lookup (making it easier to analyze how long
>> this has been broken, and which downstream builds need the backport).
>>
> 
> OK, I will amend this.
> 
> Looks like:
> 
> commit 2119882c7eb7e2c612b24fc0c8d86f5887d6f1c3
> Author: Paolo Bonzini 
> Date:   Mon Jun 5 14:39:03 2017 +0200
> 
> since 2.10.

Hmm. block-dirty-bitmap-add's "persistent":true parameter was also added
in 2.10 in commit fd5ae4cc.  In fact, 2119882c was made at a time when
there were not persistent bitmaps; so I guess that this means we have
always been broken since fd5ae4cc.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




[Qemu-devel] [RFC 4/4] numa: check threads of the same core are on the same node

2019-02-12 Thread Laurent Vivier

A core cannot be split between two nodes.
To check if a thread of the same core has already been assigned to a node,
this patch reverses the numa topology checking order and exits if the
topology is not valid.

Update test/numa-test accordingly.

Fixes: 722387e78daf ("spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu()")
Cc: imamm...@redhat.com
Signed-off-by: Laurent Vivier 
---
 hw/core/machine.c | 27 ---
 tests/numa-test.c |  4 ++--
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index a2c29692b55e..c0a556b0dce7 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -602,6 +602,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 bool match = false;
 int i;
+const CpuInstanceProperties *previous_props = NULL;
 
 if (!mc->possible_cpu_arch_ids) {
 error_setg(errp, "mapping of CPUs to NUMA node is not supported");
@@ -634,18 +635,38 @@ void machine_set_cpu_numa_node(MachineState *machine,
 }
 
 /* skip slots with explicit mismatch */
-if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
+if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
 continue;
 }
 
-if (props->has_core_id && props->core_id != slot->props.core_id) {
+if (props->has_core_id) {
+if (props->core_id != slot->props.core_id) {
 continue;
+}
+if (slot->props.has_node_id) {
+/* we have a node where our core is already assigned */
+previous_props = &slot->props;
+}
 }
 
-if (props->has_socket_id && props->socket_id != slot->props.socket_id) {
+if (props->has_thread_id && props->thread_id != slot->props.thread_id) {
 continue;
 }
 
+/* check current thread matches node of the thread of the same core */
+if (previous_props && previous_props->has_node_id &&
+previous_props->node_id != props->node_id) {
+char *cpu_str = cpu_props_to_string(props);
+char *node_str = cpu_props_to_string(previous_props);
+error_setg(errp,  "Invalid node-id=%"PRIu64" of [%s]: core-id "
+  "[%s] is already assigned to node-id %"PRIu64,
+  props->node_id, cpu_str,
+  node_str, previous_props->node_id);
+g_free(cpu_str);
+g_free(node_str);
+return;
+}
+
 /* reject assignment if slot is already assigned, for compatibility
  * of legacy cpu_index mapping with SPAPR core based mapping do not
  * error out if cpu thread and matched core have the same node-id */
diff --git a/tests/numa-test.c b/tests/numa-test.c
index 5280573fc992..a7c3c5b4dee8 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -112,7 +112,7 @@ static void pc_numa_cpu(const void *data)
 "-numa cpu,node-id=1,socket-id=0 "
 "-numa cpu,node-id=0,socket-id=1,core-id=0 "
 "-numa cpu,node-id=0,socket-id=1,core-id=1,thread-id=0 "
-"-numa cpu,node-id=1,socket-id=1,core-id=1,thread-id=1");
+"-numa cpu,node-id=0,socket-id=1,core-id=1,thread-id=1");
 qtest_start(cli);
 cpus = get_cpus();
 g_assert(cpus);
@@ -141,7 +141,7 @@ static void pc_numa_cpu(const void *data)
 } else if (socket == 1 && core == 1 && thread == 0) {
 g_assert_cmpint(node, ==, 0);
 } else if (socket == 1 && core == 1 && thread == 1) {
-g_assert_cmpint(node, ==, 1);
+g_assert_cmpint(node, ==, 0);
 } else {
 g_assert(false);
 }
-- 
2.20.1
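
The rule this patch enforces (all threads of one core must sit on the same node) can be sketched independently of QEMU. SketchSlot and sketch_core_node_ok() below are hypothetical; the real code works on CpuInstanceProperties inside machine_set_cpu_numa_node() and reports the conflict through error_setg().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, flattened view of one possible-CPU slot. */
typedef struct {
    int socket_id;
    int core_id;
    int thread_id;
    bool has_node_id;
    int node_id;
} SketchSlot;

/*
 * Return true if assigning node_id to slots[target] keeps every already
 * assigned thread of the same socket/core on a single node.
 */
static bool sketch_core_node_ok(const SketchSlot *slots, size_t n,
                                size_t target, int node_id)
{
    assert(slots && target < n);
    for (size_t i = 0; i < n; i++) {
        if (i == target) {
            continue;
        }
        if (slots[i].socket_id != slots[target].socket_id ||
            slots[i].core_id != slots[target].core_id) {
            continue;   /* different core, no constraint */
        }
        if (slots[i].has_node_id && slots[i].node_id != node_id) {
            return false;   /* sibling thread already lives elsewhere */
        }
    }
    return true;
}
```

This mirrors the test change above: with socket 1, core 1, thread 0 on node 0, thread 1 of that core can only go to node 0.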

[Qemu-devel] [RFC 3/4] numa: move cpu_slot_to_string() upper in the function

2019-02-12 Thread Laurent Vivier

This will allow it to be used in more functions in the future.

Because the prototype now takes CpuInstanceProperties directly
instead of CPUArchId, rename the function to cpu_props_to_string().

Signed-off-by: Laurent Vivier 
---
 hw/core/machine.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 7c74b318f635..a2c29692b55e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -550,6 +550,27 @@ HotpluggableCPUList *machine_query_hotpluggable_cpus(MachineState *machine)
 return head;
 }
 
+static char *cpu_props_to_string(const CpuInstanceProperties *props)
+{
+GString *s = g_string_new(NULL);
+if (props->has_socket_id) {
+g_string_append_printf(s, "socket-id: %"PRId64, props->socket_id);
+}
+if (props->has_core_id) {
+if (s->len) {
+g_string_append_printf(s, ", ");
+}
+g_string_append_printf(s, "core-id: %"PRId64, props->core_id);
+}
+if (props->has_thread_id) {
+if (s->len) {
+g_string_append_printf(s, ", ");
+}
+g_string_append_printf(s, "thread-id: %"PRId64, props->thread_id);
+}
+return g_string_free(s, false);
+}
+
 /**
  * machine_set_cpu_numa_node:
  * @machine: machine object to modify
@@ -849,27 +870,6 @@ bool machine_mem_merge(MachineState *machine)
 return machine->mem_merge;
 }
 
-static char *cpu_slot_to_string(const CPUArchId *cpu)
-{
-GString *s = g_string_new(NULL);
-if (cpu->props.has_socket_id) {
-g_string_append_printf(s, "socket-id: %"PRId64, cpu->props.socket_id);
-}
-if (cpu->props.has_core_id) {
-if (s->len) {
-g_string_append_printf(s, ", ");
-}
-g_string_append_printf(s, "core-id: %"PRId64, cpu->props.core_id);
-}
-if (cpu->props.has_thread_id) {
-if (s->len) {
-g_string_append_printf(s, ", ");
-}
-g_string_append_printf(s, "thread-id: %"PRId64, cpu->props.thread_id);
-}
-return g_string_free(s, false);
-}
-
 static void machine_numa_finish_cpu_init(MachineState *machine)
 {
 int i;
@@ -887,7 +887,7 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
 default_mapping = false;
 } else {
 /* record slots with not set mapping, */
-char *cpu_str = cpu_slot_to_string(cpu_slot);
+char *cpu_str = cpu_props_to_string(&cpu_slot->props);
 g_string_append_printf(s, "%sCPU %d [%s]",
s->len ? ", " : "", i, cpu_str);
 g_free(cpu_str);
-- 
2.20.1
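
For readers without GLib at hand, the formatting done by cpu_props_to_string() can be approximated with plain snprintf(). This is a sketch under that assumption: SketchProps, sketch_append() and sketch_props_to_string() are hypothetical names, and a caller-provided buffer replaces the GString used in the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of CpuInstanceProperties. */
typedef struct {
    bool has_socket_id;
    int64_t socket_id;
    bool has_core_id;
    int64_t core_id;
    bool has_thread_id;
    int64_t thread_id;
} SketchProps;

/* Append "label: value", preceded by ", " when the buffer is not empty. */
static void sketch_append(char *buf, size_t size, const char *label,
                          int64_t value)
{
    size_t len = strlen(buf);

    snprintf(buf + len, size - len, "%s%s: %" PRId64,
             len ? ", " : "", label, value);
}

/* Format only the properties that are present, in socket/core/thread order. */
static void sketch_props_to_string(const SketchProps *p, char *buf,
                                   size_t size)
{
    assert(p && buf && size > 0);
    buf[0] = '\0';
    if (p->has_socket_id) {
        sketch_append(buf, size, "socket-id", p->socket_id);
    }
    if (p->has_core_id) {
        sketch_append(buf, size, "core-id", p->core_id);
    }
    if (p->has_thread_id) {
        sketch_append(buf, size, "thread-id", p->thread_id);
    }
}
```

Output that does not fit is truncated by snprintf() rather than overflowing the buffer.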

[Qemu-devel] [RFC 0/4] numa, spapr: add thread-id in the possible_cpus list

2019-02-12 Thread Laurent Vivier

There are inconsistencies between the command line using
"-numa node,cpus=XX" and what is checked internally:
the XX is supposed to be a CPU number, but for SPAPR
it's taken as a core number, ignoring the threads.
(See the description message of PATCH 1 for more details)

This series fixes this problem by introducing the threads
in the possible_cpus list instead of only the cores.
To avoid an inconsistent topology, it no longer allows
an incomplete CPU NUMA config on the command line
(a message announcing that this would be obsoleted has
been in place for 2 years).

Laurent Vivier (4):
  numa,spapr: add thread-id in the possible_cpus list
  numa: exit on incomplete CPU mapping
  numa: move cpu_slot_to_string() upper in the function
  numa: check threads of the same core are on the same node

 hw/core/machine.c | 115 ++
 hw/ppc/spapr.c|  33 ++---
 tests/numa-test.c |  24 +-
 3 files changed, 81 insertions(+), 91 deletions(-)

-- 
2.20.1

[Qemu-devel] [RFC 2/4] numa: exit on incomplete CPU mapping

2019-02-12 Thread Laurent Vivier

Change the existing message to an error and exit.

This message has been a warning since May 10 2017
(ec78f8114bc4 "numa: use possible_cpus for not mapped CPUs check"),
and it already announced that the ability would be removed in the future.

Update tests/numa-test to remove the incomplete CPU mapping test.

Signed-off-by: Laurent Vivier 
---
 hw/core/machine.c | 46 +-
 tests/numa-test.c | 20 
 2 files changed, 21 insertions(+), 45 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 077fbd182adf..7c74b318f635 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -873,51 +873,47 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
 static void machine_numa_finish_cpu_init(MachineState *machine)
 {
 int i;
-bool default_mapping;
+bool default_mapping = true;
 GString *s = g_string_new(NULL);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(machine);
+const CPUArchId *cpu_slot;
 
 assert(nb_numa_nodes);
 for (i = 0; i < possible_cpus->len; i++) {
-if (possible_cpus->cpus[i].props.has_node_id) {
-break;
+cpu_slot = &possible_cpus->cpus[i];
+
+if (cpu_slot->props.has_node_id) {
+default_mapping = false;
+} else {
+/* record slots with not set mapping, */
+char *cpu_str = cpu_slot_to_string(cpu_slot);
+g_string_append_printf(s, "%sCPU %d [%s]",
+   s->len ? ", " : "", i, cpu_str);
+g_free(cpu_str);
 }
 }
-default_mapping = (i == possible_cpus->len);
+if (!default_mapping && s->len && !qtest_enabled()) {
+error_report("CPU(s) not present in any NUMA nodes: %s", s->str);
+error_report("All CPU(s) up to maxcpus must be described "
+"in NUMA config");
+g_string_free(s, true);
+exit(1);
+}
+g_string_free(s, true);
 
 for (i = 0; i < possible_cpus->len; i++) {
-const CPUArchId *cpu_slot = &possible_cpus->cpus[i];
+cpu_slot = &possible_cpus->cpus[i];
 
 if (!cpu_slot->props.has_node_id) {
 /* fetch default mapping from board and enable it */
 CpuInstanceProperties props = cpu_slot->props;
 
 props.node_id = mc->get_default_cpu_node_id(machine, i);
-if (!default_mapping) {
-/* record slots with not set mapping,
- * TODO: make it hard error in future */
-char *cpu_str = cpu_slot_to_string(cpu_slot);
-g_string_append_printf(s, "%sCPU %d [%s]",
-   s->len ? ", " : "", i, cpu_str);
-g_free(cpu_str);
-
-/* non mapped cpus used to fallback to node 0 */
-props.node_id = 0;
-}
-
 props.has_node_id = true;
 machine_set_cpu_numa_node(machine, &props, &error_fatal);
 }
 }
-if (s->len && !qtest_enabled()) {
-warn_report("CPU(s) not present in any NUMA nodes: %s",
-s->str);
-warn_report("All CPU(s) up to maxcpus should be described "
-"in NUMA config, ability to start up with partial NUMA "
-"mappings is obsoleted and will be removed in future");
-}
-g_string_free(s, true);
 }
 
 void machine_run_board_init(MachineState *machine)
diff --git a/tests/numa-test.c b/tests/numa-test.c
index 9824fdd5875e..5280573fc992 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -55,25 +55,6 @@ static void test_mon_default(const void *data)
 g_free(cli);
 }
 
-static void test_mon_partial(const void *data)
-{
-char *s;
-char *cli;
-
-cli = make_cli(data, "-smp 8 "
-   "-numa node,nodeid=0,cpus=0-1 "
-   "-numa node,nodeid=1,cpus=4-5 ");
-qtest_start(cli);
-
-s = hmp("info numa");
-g_assert(strstr(s, "node 0 cpus: 0 1 2 3 6 7"));
-g_assert(strstr(s, "node 1 cpus: 4 5"));
-g_free(s);
-
-qtest_end();
-g_free(cli);
-}
-
 static QList *get_cpus(QDict **resp)
 {
 *resp = qmp("{ 'execute': 'query-cpus' }");
@@ -333,7 +314,6 @@ int main(int argc, char **argv)
 
 qtest_add_data_func("/numa/mon/default", args, test_mon_default);
 qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
-qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
 qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);
 
 if (!strcmp(arch, "i386") || !strcmp(arch, "x86_64")) {
-- 
2.20.1
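
The decision made in machine_numa_finish_cpu_init() after this patch boils down to: no slot mapped means the board default applies; otherwise every slot must be mapped. A minimal sketch of that check follows; sketch_count_unmapped() is a hypothetical helper, and the real code also builds the list of offending CPUs for the error message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the number of slots that make the configuration invalid:
 * 0 when nothing is mapped (default mapping will be used) or when
 * everything is mapped, otherwise the count of unmapped slots.
 */
static size_t sketch_count_unmapped(const bool *has_node_id, size_t n)
{
    size_t mapped = 0;

    assert(has_node_id || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (has_node_id[i]) {
            mapped++;
        }
    }
    if (mapped == 0) {
        return 0;       /* default mapping, nothing to complain about */
    }
    return n - mapped;  /* partial mapping is an error if non-zero */
}
```

A caller would report an error and exit when the result is non-zero, which is what the patch does with error_report() and exit(1).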

[Qemu-devel] [RFC 1/4] numa, spapr: add thread-id in the possible_cpus list

2019-02-12 Thread Laurent Vivier

spapr_possible_cpu_arch_ids() counts only cores, and so
the number of available CPUs is the number of vCPU divided
by smp_threads.

... -smp 4,maxcpus=8,cores=2,threads=2,sockets=2 -numa node,cpus=0,cpus=1 \
 -numa node,cpus=3,cpus=4 \
 -numa node -numa node

This generates (info hotpluggable-cpus)

  node-id: 0 core-id: 0 thread-id: 0 [thread-id: 1]
  node-id: 0 core-id: 6 thread-id: 0 [thread-id: 1]
  node-id: 1 core-id: 2 thread-id: 0 [thread-id: 1]
  node-id: 1 core-id: 4 thread-id: 0 [thread-id: 1]

And this command line generates the following error:

  CPU(s) not present in any NUMA nodes: CPU 3 [core-id: 6]

That is wrong because CPU 3 [core-id: 6] is assigned to node-id 0.
Moreover "cpus=4" is not valid, because it means core-id 8, which is
out of range when maxcpus is 8.

With this patch we now have:

  node-id: 0 core-id: 0 thread-id: 0
  node-id: 0 core-id: 0 thread-id: 1
  node-id: 0 core-id: 1 thread-id: 0
  node-id: 1 core-id: 1 thread-id: 1
  node-id: 0 core-id: 2 thread-id: 1
  node-id: 1 core-id: 2 thread-id: 0
  node-id: 0 core-id: 3 thread-id: 1
  node-id: 0 core-id: 3 thread-id: 0

CPUs 0 (core-id: 0 thread-id: 0) and 1 (core-id: 0 thread-id: 1) are
correctly assigned to node-id 0, CPUs 3 (core-id: 1 thread-id: 1) and
 4 (core-id: 2 thread-id: 0) are correctly assigned to node-id 1.
All other CPUs are assigned to node-id 0 by default.

And the error message is also correct:

  CPU(s) not present in any NUMA nodes: CPU 2 [core-id: 1, thread-id: 0], \
CPU 5 [core-id: 2, thread-id: 1], \
CPU 6 [core-id: 3, thread-id: 0], \
CPU 7 [core-id: 3, thread-id: 1]

Fixes: ec78f8114bc4 ("numa: use possible_cpus for not mapped CPUs check")
Cc: imamm...@redhat.com

Before commit ec78f8114bc4, output was correct:

  CPU(s) not present in any NUMA nodes: 2 5 6 7
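
The index arithmetic behind the new listing can be sketched as follows. The core-id/thread-id split is an assumption that matches the numbering shown in the listing in this message (CPU 3 is core-id 1, thread-id 1 with threads=2); the default-node formula is the one introduced in spapr_get_default_cpu_node_id() by the diff below. SketchCpuId and the sketch_* functions are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int core_id;
    int thread_id;
} SketchCpuId;

/* Split a per-thread CPU index into core and thread numbers. */
static SketchCpuId sketch_index_to_ids(int cpu_index, int smp_threads)
{
    SketchCpuId id;

    assert(cpu_index >= 0 && smp_threads > 0);
    id.core_id = cpu_index / smp_threads;
    id.thread_id = cpu_index % smp_threads;
    return id;
}

/* Default node: whole sockets (cores * threads) are spread round-robin. */
static int sketch_default_node(int idx, int smp_cores, int smp_threads,
                               int nb_numa_nodes)
{
    assert(smp_cores > 0 && smp_threads > 0 && nb_numa_nodes > 0);
    return idx / (smp_cores * smp_threads) % nb_numa_nodes;
}
```

Because the slot index now counts threads, the divisor in the default-node formula has to include smp_threads; otherwise threads of one socket would be spread over several nodes.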

Signed-off-by: Laurent Vivier 
---
 hw/ppc/spapr.c | 33 +
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 332cba89d425..7196ba09da34 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2404,15 +2404,13 @@ static void spapr_validate_node_memory(MachineState *machine, Error **errp)
 /* find cpu slot in machine->possible_cpus by core_id */
 static CPUArchId *spapr_find_cpu_slot(MachineState *ms, uint32_t id, int *idx)
 {
-int index = id / smp_threads;
-
-if (index >= ms->possible_cpus->len) {
+if (id >= ms->possible_cpus->len) {
 return NULL;
 }
 if (idx) {
-*idx = index;
+*idx = id;
 }
-return &ms->possible_cpus->cpus[index];
+return &ms->possible_cpus->cpus[id];
 }
 
 static void spapr_set_vsmt_mode(sPAPRMachineState *spapr, Error **errp)
@@ -2514,7 +2512,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
 error_report("This machine version does not support CPU hotplug");
 exit(1);
 }
-boot_cores_nr = possible_cpus->len;
+boot_cores_nr = possible_cpus->len / smp_threads;
 }
 
 if (smc->pre_2_10_has_unused_icps) {
@@ -2528,7 +2526,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr)
 }
 }
 
-for (i = 0; i < possible_cpus->len; i++) {
+for (i = 0; i < possible_cpus->len / smp_threads; i++) {
 int core_id = i * smp_threads;
 
 if (mc->has_hotpluggable_cpus) {
@@ -3795,21 +3793,16 @@ spapr_cpu_index_to_props(MachineState *machine, unsigned cpu_index)
 
 static int64_t spapr_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-return idx / smp_cores % nb_numa_nodes;
+return idx / (smp_cores * smp_threads) % nb_numa_nodes;
 }
 
 static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
 {
 int i;
 const char *core_type;
-int spapr_max_cores = max_cpus / smp_threads;
-MachineClass *mc = MACHINE_GET_CLASS(machine);
 
-if (!mc->has_hotpluggable_cpus) {
-spapr_max_cores = QEMU_ALIGN_UP(smp_cpus, smp_threads) / smp_threads;
-}
 if (machine->possible_cpus) {
-assert(machine->possible_cpus->len == spapr_max_cores);
+assert(machine->possible_cpus->len == max_cpus);
 return machine->possible_cpus;
 }
 
@@ -3820,16 +3813,16 @@ static const CPUArchIdList *spapr_possible_cpu_arch_ids(MachineState *machine)
 }
 
 machine->possible_cpus = g_malloc0(sizeof(CPUArchIdList) +
- sizeof(CPUArchId) * spapr_max_cores);
-machine->possible_cpus->len = spapr_max_cores;
+ sizeof(CPUArchId) * max_cpus);
+machine->possible_cpus->len = max_cpus;
 for (i = 0; i < machine->possible_cpus->len; i++) {
-int core_id = i * smp_threads;
-
 machine->possible_cpus->cpus[i].type = core_type;
 machine->possible_cpus->cpus[i].vcpus_count = smp_threads;
-

[Qemu-devel] [PATCH] gdbstub: Send a reply to the vKill packet.

2019-02-12 Thread Sandra Loosemore

Per the GDB remote protocol documentation

https://sourceware.org/gdb/current/onlinedocs/gdb/Packets.html#index-vKill-packet

the debug stub is expected to send a reply to the 'vKill' packet.  At
least some versions of GDB crash if the gdb stub simply exits without
sending a reply.  This patch fixes QEMU's gdb stub to conform to the
expected behavior.

Note that QEMU's existing handling of the legacy 'k' packet is
correct: in that case GDB does not expect a reply, and QEMU does not
send one.

Signed-off-by: Sandra Loosemore 
---
 gdbstub.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gdbstub.c b/gdbstub.c
index 70cf330..eb129f6 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1363,6 +1363,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf)
 break;
 } else if (strncmp(p, "Kill;", 5) == 0) {
 /* Kill the target */
+put_packet(s, "OK");
 error_report("QEMU: Terminated via GDBstub");
 exit(0);
 } else {
-- 
2.8.1
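
For context, a reply such as "OK" travels on the wire framed as '$' payload '#' followed by a two-digit hex checksum (the payload bytes summed modulo 256), per the GDB remote protocol. The sketch below shows only that framing; sketch_frame_packet() is a hypothetical helper, not QEMU's put_packet(), and it ignores escaping of special characters and the acknowledgement handshake.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "$<payload>#<checksum>" into out. Returns the number of characters
 * written (excluding the NUL), or -1 if the buffer is too small.
 */
static int sketch_frame_packet(const char *payload, char *out,
                               size_t out_size)
{
    unsigned char sum = 0;
    size_t len;

    assert(payload && out);
    len = strlen(payload);
    if (out_size < len + 5) {   /* '$' + payload + '#' + 2 hex + NUL */
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        sum += (unsigned char)payload[i];   /* wraps modulo 256 */
    }
    return snprintf(out, out_size, "$%s#%02x", payload, (unsigned)sum);
}
```

For the payload "OK" this yields "$OK#9a", since 'O' (0x4f) plus 'K' (0x4b) is 0x9a.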

Re: [Qemu-devel] [PATCH v2 1/2] linux-user: Add ELF_PLATFORM for arm

2019-02-12 Thread Richard Henderson

On 2/12/19 12:31 AM, Laurent Vivier wrote:
> I know nothing about ARM, but in kernel we have also a "v5t"
> (cpu_elf_name) and in QEMU we have a ARM_FEATURE_V4T which is set with
> ARM_FEATURE_V5. Is it related?

From the ARM ARM (DDI 0406C, page A1-30):

The valid variants of ARMv4, ARMv5, and ARMv6 are as follows:
ARMv4, ARMv4T, ARMv5T, ARMv5TE, ARMv5TEJ, ARMv6, ARMv6K, ARMv6T2

So all v5 are "t".  The use of "v5t" within the kernel seems to be an outlier
and perhaps a bug to be squashed:

$ grep -r cpu_elf_name . | grep v5
./mm/proc-xsc3.S:   string  cpu_elf_name, "v5"
./mm/proc-arm1020.S:string  cpu_elf_name, "v5"
./mm/proc-arm946.S: string  cpu_elf_name, "v5t"
./mm/proc-arm1020e.S:   string  cpu_elf_name, "v5"
./mm/proc-arm1022.S:string  cpu_elf_name, "v5"
./mm/proc-feroceon.S:   string  cpu_elf_name, "v5"
./mm/proc-xscale.S: string  cpu_elf_name, "v5"
./mm/proc-mohawk.S: string  cpu_elf_name, "v5"
./mm/proc-arm926.S: string  cpu_elf_name, "v5"
./mm/proc-arm1026.S:string  cpu_elf_name, "v5"


r~

[Qemu-devel] [PATCH 2/2] qga-win: fix VSS build breakage due to unintended gnu99 C++ flag

2019-02-12 Thread Michael Roth

Commit 7be41675f7c set -std=gnu99 for C code via QEMU_CFLAGS. Currently
we generate a "custom" QEMU_CXXFLAGS for VSS DLL C++ build by
filtering out some options from QEMU_CFLAGS and adding some others.
Since we don't filter out -std=gnu99 currently this breaks builds when
VSS support is enabled.

We could keep the existing approach, filter out -std=gnu99 from
QEMU_CFLAGS, and add -std=gnu++98, like configure currently does for
QEMU_CXXFLAGS, but as it turns out our resulting QEMU_CXXFLAGS would
be exactly what configure already generates, just with these filtered
out:

  -fstack-protector-all -fstack-protector-strong

and these added:

  -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor

So fix the issue by re-using configure-generated QEMU_CXXFLAGS and
just handling these specific changes.

Signed-off-by: Michael Roth 
---
 qga/vss-win32/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/vss-win32/Makefile.objs b/qga/vss-win32/Makefile.objs
index dad9d1b0ba..fd3ba1896b 100644
--- a/qga/vss-win32/Makefile.objs
+++ b/qga/vss-win32/Makefile.objs
@@ -3,7 +3,7 @@
 qga-vss-dll-obj-y += requester.o provider.o install.o
 
 obj-qga-vss-dll-obj-y = $(addprefix $(obj)/, $(qga-vss-dll-obj-y))
-$(obj-qga-vss-dll-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wold-style-declaration -Wold-style-definition -Wredundant-decls -fstack-protector-all -fstack-protector-strong, $(QEMU_CFLAGS)) -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor
+$(obj-qga-vss-dll-obj-y): QEMU_CXXFLAGS := $(filter-out -fstack-protector-all -fstack-protector-strong, $(QEMU_CXXFLAGS)) -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor
 
 $(obj)/qga-vss.dll: LDFLAGS = -shared -Wl,--add-stdcall-alias,--enable-stdcall-fixup -lglib-2.0 -lole32 -loleaut32 -lshlwapi -luuid -lintl -lws2_32 -static
 $(obj)/qga-vss.dll: $(obj-qga-vss-dll-obj-y) $(SRC_PATH)/$(obj)/qga-vss.def
-- 
2.17.1

[Qemu-devel] [PULL v2 00/24] target/hppa patch queue

2019-02-12 Thread Richard Henderson

v2 fixes the clang build failure.  I've addressed this by changing
the names of two of the insns.decode argument sets.  This could
probably use an additional error from decodetree.py itself...

Only reposting the changed patch, 12/24.


r~


The following changes since commit 0b5e750bea635b167eb03d86c3d9a09bbd43bc06:

  Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging (2019-02-12 10:53:37 +)

are available in the Git repository at:

  https://github.com/rth7680/qemu.git tags/pull-hppa-20190212

for you to fetch changes up to cb82c5728c3402248068002c0d13f55f9adcb073:

  hw/hppa: forward requests to CPU HPA (2019-02-12 08:59:21 -0800)


Convert to decodetree.
Fix signed overflow conditions.
Fix dcor.
Add CPU MIE to PCI address space.


Richard Henderson (20):
  target/hppa: Use DisasContextBase.is_jmp
  target/hppa: Begin using scripts/decodetree.py
  target/hppa: Convert move to/from system registers
  target/hppa: Convert remainder of system insns
  target/hppa: Unify specializations of OR
  target/hppa: Convert memory management insns
  target/hppa: Convert arithmetic/logical insns
  target/hppa: Convert indexed memory insns
  target/hppa: Convert fp multiply-add
  target/hppa: Convert conditional branches
  target/hppa: Convert shift, extract, deposit insns
  target/hppa: Convert direct and indirect branches
  target/hppa: Convert arithmetic immediate insns
  target/hppa: Convert offset memory insns
  target/hppa: Convert fp indexed memory insns
  target/hppa: Convert halt/reset insns
  target/hppa: Convert fp fused multiply-add insns
  target/hppa: Convert fp operate insns
  target/hppa: Merge translate_one into hppa_tr_translate_insn
  target/hppa: Rearrange log conditions

Sven Schnelle (4):
  target/hppa: move GETPC to HELPER() functions
  target/hppa: Fix addition '

Re: [Qemu-devel] [PATCH v2] blockdev: acquire aio_context for bitmap add/remove

2019-02-12 Thread John Snow




On 2/12/19 3:16 PM, Eric Blake wrote:
> On 2/12/19 2:07 PM, John Snow wrote:
>> When bitmaps are persistent, they may incur a disk read or write when bitmaps
>> are added or removed. For configurations like virtio-dataplane, failing to
>> acquire this lock will abort QEMU when disk IO occurs.
>>
>> We used to acquire aio_context as part of the bitmap lookup, so re-introduce
>> the lock for just the cases that have an IO penalty.
> 
> It would be nice to call out which commit id dropped the aio_context
> acquisition during bitmap lookup (making it easier to analyze how long
> this has been broken, and which downstream builds need the backport).
> 

OK, I will amend this.

Looks like:

commit 2119882c7eb7e2c612b24fc0c8d86f5887d6f1c3
Author: Paolo Bonzini 
Date:   Mon Jun 5 14:39:03 2017 +0200

since 2.10.

>>
>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1672010
>> Reported-By: Aihua Liang 
>> ---
>>  blockdev.c | 26 --
>>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> Reviewed-by: Eric Blake 
>

[Qemu-devel] [PULL v2 12/24] target/hppa: Convert direct and indirect branches

2019-02-12 Thread Richard Henderson

Tested-by: Helge Deller 
Tested-by: Sven Schnelle 
Signed-off-by: Richard Henderson 
---
 target/hppa/translate.c  | 131 +--
 target/hppa/insns.decode |  34 +-
 2 files changed, 63 insertions(+), 102 deletions(-)
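
For readers following the removal of assemble_17() in the diff below, here is a standalone sketch of what that helper computes: bit 0 of the instruction word supplies the sign, three further fields are concatenated beneath it, and the result is scaled by 4. The names extract32_sketch and assemble_17_sketch are hypothetical, and the fixed 32-bit types are an assumption standing in for target_ureg/target_sreg. This is not code from the patch itself, where the field extraction presumably moves into insns.decode.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's extract32(): return 'length' bits of 'value'
 * starting at bit 'start' (bit 0 is the least significant bit). */
static uint32_t extract32_sketch(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Mirrors the removed assemble_17(); 32-bit types are an assumption. */
static int32_t assemble_17_sketch(uint32_t insn)
{
    /* All ones when the sign bit (bit 0) is set, otherwise zero. */
    uint32_t x = -(uint32_t)(insn & 1);

    x = (x << 5) | extract32_sketch(insn, 16, 5);   /* 5-bit field  */
    x = (x << 1) | extract32_sketch(insn, 2, 1);    /* 1-bit field  */
    x = (x << 10) | extract32_sketch(insn, 3, 10);  /* 10-bit field */

    /* Displacements are in units of 4 bytes. */
    return (int32_t)(x << 2);
}
```

With only the sign bit set, the sketch yields the most negative displacement, -(1 << 18); with only bit 3 set it yields 4.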

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 83d898212e..26b5cd205b 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -895,15 +895,6 @@ static target_sreg assemble_16a(uint32_t insn)
 return x << 2;
 }
 
-static target_sreg assemble_17(uint32_t insn)
-{
-target_ureg x = -(target_ureg)(insn & 1);
-x = (x <<  5) | extract32(insn, 16, 5);
-x = (x <<  1) | extract32(insn, 2, 1);
-x = (x << 10) | extract32(insn, 3, 10);
-return x << 2;
-}
-
 static target_sreg assemble_21(uint32_t insn)
 {
 target_ureg x = -(target_ureg)(insn & 1);
@@ -914,15 +905,6 @@ static target_sreg assemble_21(uint32_t insn)
 return x << 11;
 }
 
-static target_sreg assemble_22(uint32_t insn)
-{
-target_ureg x = -(target_ureg)(insn & 1);
-x = (x << 10) | extract32(insn, 16, 10);
-x = (x <<  1) | extract32(insn, 2, 1);
-x = (x << 10) | extract32(insn, 3, 10);
-return x << 2;
-}
-
 /* The parisc documentation describes only the general interpretation of
the conditions, without describing their exact implementation.  The
interpretations do not stand up well when considering ADD,C and SUB,B.
@@ -3546,11 +3528,8 @@ static bool trans_depwi_sar(DisasContext *ctx, arg_depwi_sar *a)
 return do_depw_sar(ctx, a->t, a->c, a->nz, a->clen, load_const(ctx, a->i));
 }
 
-static bool trans_be(DisasContext *ctx, uint32_t insn, bool is_l)
+static bool trans_be(DisasContext *ctx, arg_be *a)
 {
-unsigned n = extract32(insn, 1, 1);
-unsigned b = extract32(insn, 21, 5);
-target_sreg disp = assemble_17(insn);
 TCGv_reg tmp;
 
 #ifdef CONFIG_USER_ONLY
@@ -3562,29 +3541,28 @@ static bool trans_be(DisasContext *ctx, uint32_t insn, bool is_l)
 /* Since we don't implement spaces, just branch.  Do notice the special
case of "be disp(*,r0)" using a direct branch to disp, so that we can
goto_tb to the TB containing the syscall.  */
-if (b == 0) {
-return do_dbranch(ctx, disp, is_l ? 31 : 0, n);
+if (a->b == 0) {
+return do_dbranch(ctx, a->disp, a->l, a->n);
 }
 #else
-int sp = assemble_sr3(insn);
 nullify_over(ctx);
 #endif
 
 tmp = get_temp(ctx);
-tcg_gen_addi_reg(tmp, load_gpr(ctx, b), disp);
+tcg_gen_addi_reg(tmp, load_gpr(ctx, a->b), a->disp);
 tmp = do_ibranch_priv(ctx, tmp);
 
 #ifdef CONFIG_USER_ONLY
-return do_ibranch(ctx, tmp, is_l ? 31 : 0, n);
+return do_ibranch(ctx, tmp, a->l, a->n);
 #else
 TCGv_i64 new_spc = tcg_temp_new_i64();
 
-load_spr(ctx, new_spc, sp);
-if (is_l) {
+load_spr(ctx, new_spc, a->sp);
+if (a->l) {
 copy_iaoq_entry(cpu_gr[31], ctx->iaoq_n, ctx->iaoq_n_var);
 tcg_gen_mov_i64(cpu_sr[0], cpu_iasq_f);
 }
-if (n && use_nullify_skip(ctx)) {
+if (a->n && use_nullify_skip(ctx)) {
 tcg_gen_mov_reg(cpu_iaoq_f, tmp);
 tcg_gen_addi_reg(cpu_iaoq_b, cpu_iaoq_f, 4);
 tcg_gen_mov_i64(cpu_iasq_f, new_spc);
@@ -3596,7 +3574,7 @@ static bool trans_be(DisasContext *ctx, uint32_t insn, bool is_l)
 }
 tcg_gen_mov_reg(cpu_iaoq_b, tmp);
 tcg_gen_mov_i64(cpu_iasq_b, new_spc);
-nullify_set(ctx, n);
+nullify_set(ctx, a->n);
 }
 tcg_temp_free_i64(new_spc);
 tcg_gen_lookup_and_goto_ptr();
@@ -3605,22 +3583,14 @@ static bool trans_be(DisasContext *ctx, uint32_t insn, bool is_l)
 #endif
 }
 
-static bool trans_bl(DisasContext *ctx, uint32_t insn, const DisasInsn *di)
+static bool trans_bl(DisasContext *ctx, arg_bl *a)
 {
-unsigned n = extract32(insn, 1, 1);
-unsigned link = extract32(insn, 21, 5);
-target_sreg disp = assemble_17(insn);
-
-do_dbranch(ctx, iaoq_dest(ctx, disp), link, n);
-return true;
+return do_dbranch(ctx, iaoq_dest(ctx, a->disp), a->l, a->n);
 }
 
-static bool trans_b_gate(DisasContext *ctx, uint32_t insn, const DisasInsn *di)
+static bool trans_b_gate(DisasContext *ctx, arg_b_gate *a)
 {
-unsigned n = extract32(insn, 1, 1);
-unsigned link = extract32(insn, 21, 5);
-target_sreg disp = assemble_17(insn);
-target_ureg dest = iaoq_dest(ctx, disp);
+target_ureg dest = iaoq_dest(ctx, a->disp);
 
 /* Make sure the caller hasn't done something weird with the queue.
  * ??? This is not quite the same as the PSW[B] bit, which would be
@@ -3659,65 +3629,44 @@ static bool trans_b_gate(DisasContext *ctx, uint32_t insn, const DisasInsn *di)
 }
 #endif
 
-do_dbranch(ctx, dest, link, n);
-return true;
+return do_dbranch(ctx, dest, a->l, a->n);
 }
 
-static bool trans_bl_long(DisasContext *ctx, uint32_t insn, const DisasInsn *di)
+static bool trans_blr(DisasContext *ctx, arg_blr *a)
 {
-unsigned n = extract32(insn,

[Qemu-devel] [PATCH 0/2] qga-win: fixes for builds with VSS/fsfreeze enabled

2019-02-12 Thread Michael Roth

These fix a couple of build regressions that have slipped in over the past
couple of months and hopefully will help avoid future breakages.

 qga/vss-win32/Makefile.objs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

[Qemu-devel] [PATCH 1/2] qga-win: include glib when building VSS DLL

2019-02-12 Thread Michael Roth

Commit 3ebee3b191e defined assert() as g_assert(), but when we build
the VSS DLL component of QGA (to handle fsfreeze) we do not include
glib, which results in breakage when building with VSS support enabled.

Fix this by including glib. Since the VSS DLL is built statically,
this introduces an additional dependency on static glib and supporting
libs for the mingw environment (possibly why we didn't include glib
originally), but VSS support already has very specific prerequisites
so it shouldn't affect too many build environments.

Since the VSS DLL code does use qemu/osdep.h, this should also help
avoid future breakages and possibly allow for some clean ups in current
VSS code.

Suggested-by: Daniel P. Berrangé 
Cc: Daniel P. Berrangé 
Cc: qemu-sta...@nongnu.org
Signed-off-by: Michael Roth 
---
 qga/vss-win32/Makefile.objs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qga/vss-win32/Makefile.objs b/qga/vss-win32/Makefile.objs
index 23d08da225..dad9d1b0ba 100644
--- a/qga/vss-win32/Makefile.objs
+++ b/qga/vss-win32/Makefile.objs
@@ -5,7 +5,7 @@ qga-vss-dll-obj-y += requester.o provider.o install.o
 obj-qga-vss-dll-obj-y = $(addprefix $(obj)/, $(qga-vss-dll-obj-y))
 $(obj-qga-vss-dll-obj-y): QEMU_CXXFLAGS = $(filter-out -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wold-style-declaration -Wold-style-definition -Wredundant-decls -fstack-protector-all -fstack-protector-strong, $(QEMU_CFLAGS)) -Wno-unknown-pragmas -Wno-delete-non-virtual-dtor
 
-$(obj)/qga-vss.dll: LDFLAGS = -shared -Wl,--add-stdcall-alias,--enable-stdcall-fixup -lole32 -loleaut32 -lshlwapi -luuid -static
+$(obj)/qga-vss.dll: LDFLAGS = -shared -Wl,--add-stdcall-alias,--enable-stdcall-fixup -lglib-2.0 -lole32 -loleaut32 -lshlwapi -luuid -lintl -lws2_32 -static
 $(obj)/qga-vss.dll: $(obj-qga-vss-dll-obj-y) $(SRC_PATH)/$(obj)/qga-vss.def
 	$(call quiet-command,$(CXX) -o $@ $(qga-vss-dll-obj-y) $(SRC_PATH)/qga/vss-win32/qga-vss.def $(CXXFLAGS) $(LDFLAGS),"LINK","$(TARGET_DIR)$@")
 
-- 
2.17.1

[Qemu-devel] [Bug 1813034] Re: create_elf_tables() doesn't set AT_PLATFORM for 32bit ARM platforms

2019-02-12 Thread Richard Henderson

Patches posted:
https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg02863.html

** Changed in: qemu
 Assignee: (unassigned) => Richard Henderson (rth)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1813034

Title:
  create_elf_tables() doesn't set AT_PLATFORM for 32bit ARM platforms

Status in QEMU:
  New

Bug description:
  The dynamic linker uses AT_PLATFORM from getauxval to substitute
  $PLATFORM in certain places (man ld.so). It would be nice if it was
  set to 'v6l', 'v7l' and whatever other platforms there are according
  to the chosen CPU or via an environment variable. AT_PLATFORM is not
  guaranteed to be set, so this isn't a major bug, but this is one case
  where it makes things difficult.
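
  A minimal sketch of how a guest program can inspect the value discussed
  above, assuming a Linux libc that provides getauxval() in <sys/auxv.h>
  (glibc 2.16 or later). The helper names platform_or_unset and
  print_platform are hypothetical and exist only for illustration.

```c
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), AT_PLATFORM */

/* Hypothetical helper: return the AT_PLATFORM string, or "(unset)"
 * when the kernel or emulator did not supply that auxv entry. */
static const char *platform_or_unset(void)
{
    /* getauxval() returns 0 when the requested entry is absent. */
    unsigned long val = getauxval(AT_PLATFORM);

    return val ? (const char *)val : "(unset)";
}

/* Print the value, e.g. to compare real hardware against an emulator. */
static void print_platform(void)
{
    printf("AT_PLATFORM: %s\n", platform_or_unset());
}
```

  Running such a helper natively and under the user-mode emulator is one
  way to see whether the entry is being populated.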

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1813034/+subscriptions
