Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Christoph Hellwig
On Mon, Feb 04, 2019 at 04:38:21PM -0500, Michael S. Tsirkin wrote:
> It was designed so that, when set, as many guests as possible work
> correctly, and it seems to be successful in doing exactly that.
> 
> Unfortunately there could be legacy guests that do work correctly but
> become slow. Whether trying to somehow work around that
> can paint us into a corner where things again don't
> work for some people is a question worth discussing.

The other problem is that some qemu machines just throw passthrough
devices and virtio devices on the same virtual PCI(e) bus, and have a
common IOMMU setup for the whole bus / root port / domain.  I think
this is completely bogus, but unfortunately it is out in the field.

Given that power is one of these examples I suspect that is what
Thiago refers to.  But in this case the answer can't be that we pile
one hack on top of another; instead we should introduce a new qemu
machine that separates these clearly, and make that mandatory for
the secure guest support.


Re: [PATCH v2] powerpc: drop page_is_ram() and walk_system_ram_range()

2019-02-04 Thread Christophe Leroy




Le 04/02/2019 à 11:24, Michael Ellerman a écrit :

Christophe Leroy  writes:


Since commit c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
it is possible to use the generic walk_system_ram_range() and
the generic page_is_ram().

To enable the use of walk_system_ram_range() by the IBM EHEA
ethernet driver, the generic function has to be exported.


I'm not sure if we have a policy on that, but I suspect we'd rather not
add a new export on all arches unless we need to. Especially seeing as
the only user is the EHEA code which is heavily in maintenance mode.


Take the example of walk_iomem_res_desc(): it's similar. It seems to be
used only by x86, and exported only for the nvdimm/e820 driver.


See commit d76401ade0bb6ab0a7 ("libnvdimm, e820: Register all pmem resources")




I'll put the export in powerpc code and make sure that builds.


I thought there was a rule that EXPORT_SYMBOL has to immediately follow 
the function it exports. At least checkpatch checks for that.


Christophe




As powerpc was the only (last?) user of CONFIG_ARCH_HAS_WALK_MEMORY,
the #ifdef around the generic walk_system_ram_range() has become
useless and can be dropped.


Yes it was the only user:

a99824f327c7 ("[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc")

I'll update the changelog.

cheers



Fixes: c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/Kconfig|  3 ---
  arch/powerpc/include/asm/page.h |  1 -
  arch/powerpc/mm/mem.c   | 33 -
  kernel/resource.c   |  5 +
  4 files changed, 1 insertion(+), 41 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2890d36eb531..f92e6754edf1 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -478,9 +478,6 @@ config ARCH_CPU_PROBE_RELEASE
  config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
  
-config ARCH_HAS_WALK_MEMORY
-   def_bool y
-
  config ARCH_ENABLE_MEMORY_HOTREMOVE
def_bool y
  
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 5c5ea2413413..aa4497175bd3 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -326,7 +326,6 @@ struct page;
  extern void clear_user_page(void *page, unsigned long vaddr, struct page *pg);
  extern void copy_user_page(void *to, void *from, unsigned long vaddr,
struct page *p);
-extern int page_is_ram(unsigned long pfn);
  extern int devmem_is_allowed(unsigned long pfn);
  
  #ifdef CONFIG_PPC_SMLPAR
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 33cc6f676fa6..fa9916c2c662 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -80,11 +80,6 @@ static inline pte_t *virt_to_kpte(unsigned long vaddr)
  #define TOP_ZONE ZONE_NORMAL
  #endif
  
-int page_is_ram(unsigned long pfn)
-{
-   return memblock_is_memory(__pfn_to_phys(pfn));
-}
-
  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
  unsigned long size, pgprot_t vma_prot)
  {
@@ -176,34 +171,6 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 size,
  #endif
  #endif /* CONFIG_MEMORY_HOTPLUG */
  
-/*
- * walk_memory_resource() needs to make sure there is no holes in a given
- * memory range.  PPC64 does not maintain the memory layout in /proc/iomem.
- * Instead it maintains it in memblock.memory structures.  Walk through the
- * memory regions, find holes and callback for contiguous regions.
- */
-int
-walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
-   void *arg, int (*func)(unsigned long, unsigned long, void *))
-{
-   struct memblock_region *reg;
-   unsigned long end_pfn = start_pfn + nr_pages;
-   unsigned long tstart, tend;
-   int ret = -1;
-
-   for_each_memblock(memory, reg) {
-   tstart = max(start_pfn, memblock_region_memory_base_pfn(reg));
-   tend = min(end_pfn, memblock_region_memory_end_pfn(reg));
-   if (tstart >= tend)
-   continue;
-   ret = (*func)(tstart, tend - tstart, arg);
-   if (ret)
-   break;
-   }
-   return ret;
-}
-EXPORT_SYMBOL_GPL(walk_system_ram_range);
-
  #ifndef CONFIG_NEED_MULTIPLE_NODES
  void __init mem_topology_setup(void)
  {
diff --git a/kernel/resource.c b/kernel/resource.c
index 915c02e8e5dd..2e1636041508 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -448,8 +448,6 @@ int walk_mem_res(u64 start, u64 end, void *arg,
 arg, func);
  }
  
-#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
-
  /*
   * This function calls the @func callback against all memory ranges of type
   * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
@@ -480,8 +478,7 @@ int walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
}

Re: [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration

2019-02-04 Thread David Gibson
On Mon, Feb 04, 2019 at 05:07:28PM +0100, Cédric Le Goater wrote:
> On 2/4/19 6:21 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote:
> >> These are used to capture the XIVE EAS table of the KVM device, i.e.
> >> the configuration of the source targets.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  arch/powerpc/include/uapi/asm/kvm.h   | 11 
> >>  arch/powerpc/kvm/book3s_xive_native.c | 87 +++
> >>  2 files changed, 98 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
> >> b/arch/powerpc/include/uapi/asm/kvm.h
> >> index 1a8740629acf..faf024f39858 100644
> >> --- a/arch/powerpc/include/uapi/asm/kvm.h
> >> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> >> @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
> >>  #define   KVM_DEV_XIVE_SAVE_EQ_PAGES  4
> >>  #define KVM_DEV_XIVE_GRP_SOURCES  2   /* 64-bit source attributes */
> >>  #define KVM_DEV_XIVE_GRP_SYNC 3   /* 64-bit source 
> >> attributes */
> >> +#define KVM_DEV_XIVE_GRP_EAS  4   /* 64-bit eas 
> >> attributes */
> >>  
> >>  /* Layout of 64-bit XIVE source attribute values */
> >>  #define KVM_XIVE_LEVEL_SENSITIVE  (1ULL << 0)
> >>  #define KVM_XIVE_LEVEL_ASSERTED   (1ULL << 1)
> >>  
> >> +/* Layout of 64-bit eas attribute values */
> >> +#define KVM_XIVE_EAS_PRIORITY_SHIFT   0
> >> +#define KVM_XIVE_EAS_PRIORITY_MASK0x7
> >> +#define KVM_XIVE_EAS_SERVER_SHIFT 3
> >> +#define KVM_XIVE_EAS_SERVER_MASK  0xfff8ULL
> >> +#define KVM_XIVE_EAS_MASK_SHIFT   32
> >> +#define KVM_XIVE_EAS_MASK_MASK0x1ULL
> >> +#define KVM_XIVE_EAS_EISN_SHIFT   33
> >> +#define KVM_XIVE_EAS_EISN_MASK0xfffeULL
> >> +
> >>  #endif /* __LINUX_KVM_POWERPC_H */
> >> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
> >> b/arch/powerpc/kvm/book3s_xive_native.c
> >> index f2de1bcf3b35..0468b605baa7 100644
> >> --- a/arch/powerpc/kvm/book3s_xive_native.c
> >> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> >> @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive 
> >> *xive, long irq, u64 addr)
> >>return 0;
> >>  }
> >>  
> >> +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
> >> +u64 addr)
> > 
> > I'd prefer to avoid the name "EAS" here.  IIUC these aren't "raw" EAS
> > values, but rather essentially the "source config" in the terminology
> > of the PAPR hcalls.  Which, yes, is basically implemented by setting
> > the EAS, but since it's the PAPR architected state that we need to
> > preserve across migration, I'd prefer to stick as close as we can to
> > the PAPR terminology.
> 
> But we don't have an equivalent name in the PAPR specs for the tuple 
> (prio, server). We could maybe use the generic 'target' name, even 
> though it usually refers to a CPU number.

Um.. what?  That's about terminology for one of the fields in this
thing, not about the name for the thing itself.

> Or IVE (Interrupt Vector Entry), which makes some sense. That was the 
> former name in HW. I think we could recycle it for KVM.

That's a terrible idea, which will make a confusing situation even
more confusing.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode

2019-02-04 Thread David Gibson
On Mon, Feb 04, 2019 at 12:19:07PM +0100, Cédric Le Goater wrote:
> On 2/4/19 5:25 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:43:17PM +0100, Cédric Le Goater wrote:
> >> This is the basic framework for the new KVM device supporting the XIVE
> >> native exploitation mode. The user interface exposes a new capability
> >> and a new KVM device to be used by QEMU.
> >>
> >> Internally, the interface to the new KVM device is protected with a
> >> new interrupt mode: KVMPPC_IRQ_XIVE.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  arch/powerpc/include/asm/kvm_host.h   |   2 +
> >>  arch/powerpc/include/asm/kvm_ppc.h|  21 ++
> >>  arch/powerpc/kvm/book3s_xive.h|   3 +
> >>  include/uapi/linux/kvm.h  |   3 +
> >>  arch/powerpc/kvm/book3s.c |   7 +-
> >>  arch/powerpc/kvm/book3s_xive_native.c | 332 ++
> >>  arch/powerpc/kvm/powerpc.c|  30 +++
> >>  arch/powerpc/kvm/Makefile |   2 +-
> >>  8 files changed, 398 insertions(+), 2 deletions(-)
> >>  create mode 100644 arch/powerpc/kvm/book3s_xive_native.c
> >>
> >> diff --git a/arch/powerpc/include/asm/kvm_host.h 
> >> b/arch/powerpc/include/asm/kvm_host.h
> >> index 0f98f00da2ea..c522e8274ad9 100644
> >> --- a/arch/powerpc/include/asm/kvm_host.h
> >> +++ b/arch/powerpc/include/asm/kvm_host.h
> >> @@ -220,6 +220,7 @@ extern struct kvm_device_ops kvm_xics_ops;
> >>  struct kvmppc_xive;
> >>  struct kvmppc_xive_vcpu;
> >>  extern struct kvm_device_ops kvm_xive_ops;
> >> +extern struct kvm_device_ops kvm_xive_native_ops;
> >>  
> >>  struct kvmppc_passthru_irqmap;
> >>  
> >> @@ -446,6 +447,7 @@ struct kvmppc_passthru_irqmap {
> >>  #define KVMPPC_IRQ_DEFAULT0
> >>  #define KVMPPC_IRQ_MPIC   1
> >>  #define KVMPPC_IRQ_XICS   2 /* Includes a XIVE option */
> >> +#define KVMPPC_IRQ_XIVE   3 /* XIVE native exploitation mode */
> >>  
> >>  #define MMIO_HPTE_CACHE_SIZE  4
> >>  
> >> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> >> b/arch/powerpc/include/asm/kvm_ppc.h
> >> index eb0d79f0ca45..1bb313f238fe 100644
> >> --- a/arch/powerpc/include/asm/kvm_ppc.h
> >> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> >> @@ -591,6 +591,18 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, 
> >> u64 icpval);
> >>  extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 
> >> irq,
> >>   int level, bool line_status);
> >>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
> >> +
> >> +static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
> >> +{
> >> +  return vcpu->arch.irq_type == KVMPPC_IRQ_XIVE;
> >> +}
> >> +
> >> +extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
> >> +  struct kvm_vcpu *vcpu, u32 cpu);
> >> +extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
> >> +extern void kvmppc_xive_native_init_module(void);
> >> +extern void kvmppc_xive_native_exit_module(void);
> >> +
> >>  #else
> >>  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 
> >> server,
> >>   u32 priority) { return -1; }
> >> @@ -614,6 +626,15 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu 
> >> *vcpu, u64 icpval) { retur
> >>  static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, 
> >> u32 irq,
> >>  int level, bool line_status) { return 
> >> -ENODEV; }
> >>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
> >> +
> >> +static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
> >> +  { return 0; }
> >> +static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
> >> +struct kvm_vcpu *vcpu, u32 
> >> cpu) { return -EBUSY; }
> >> +static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) 
> >> { }
> >> +static inline void kvmppc_xive_native_init_module(void) { }
> >> +static inline void kvmppc_xive_native_exit_module(void) { }
> >> +
> >>  #endif /* CONFIG_KVM_XIVE */
> >>  
> >>  /*
> >> diff --git a/arch/powerpc/kvm/book3s_xive.h 
> >> b/arch/powerpc/kvm/book3s_xive.h
> >> index 10c4aa5cd010..5f22415520b4 100644
> >> --- a/arch/powerpc/kvm/book3s_xive.h
> >> +++ b/arch/powerpc/kvm/book3s_xive.h
> >> @@ -12,6 +12,9 @@
> >>  #ifdef CONFIG_KVM_XICS
> >>  #include "book3s_xics.h"
> >>  
> >> +#define KVMPPC_XIVE_FIRST_IRQ 0
> >> +#define KVMPPC_XIVE_NR_IRQS   KVMPPC_XICS_NR_IRQS
> >> +
> >>  /*
> >>   * State for one guest irq source.
> >>   *
> >> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >> index 6d4ea4b6c922..52bf74a1616e 100644
> >> --- a/include/uapi/linux/kvm.h
> >> +++ b/include/uapi/linux/kvm.h
> >> @@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
> >>  #define KVM_CAP_ARM_VM_IPA_SIZE 165
> >>  #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
> >>  #define KVM_CAP_HYPERV_CPUID 167
> >> +#define 

Re: [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty

2019-02-04 Thread David Gibson
On Mon, Feb 04, 2019 at 04:46:00PM +0100, Cédric Le Goater wrote:
> On 2/4/19 6:18 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:43:26PM +0100, Cédric Le Goater wrote:
> >> When the VM is stopped in a migration sequence, the sources are masked
> >> and the XIVE IC is synced to stabilize the EQs. When done, the KVM
> >> ioctl KVM_DEV_XIVE_SAVE_EQ_PAGES is called to mark dirty the EQ pages.
> >>
> >> The migration can then transfer the remaining dirty pages to the
> >> destination and start collecting the state of the devices.
> > 
> > Is there a reason to make this a separate step from the SYNC
> > operation?
> 
> Hmm, apart from letting QEMU orchestrate the migration step by step, no.
> 
> We could merge SYNC and SAVE_EQ_PAGES into a single KVM operation. 
> I think that should be fine.

I think that makes sense.  SYNC is supposed to complete delivery of
any in-flight interrupts, and to me writing to the queue page and
marking it dirty as a result is a logical part of that.

> However, it does not make sense to call this operation without the VM 
> being stopped. I wonder how this can be checked from KVM. Maybe we
> can't.

I don't think it matters.  qemu is allowed to shoot itself in the
foot.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device

2019-02-04 Thread David Gibson
On Mon, Feb 04, 2019 at 12:30:39PM +0100, Cédric Le Goater wrote:
> On 2/4/19 5:45 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
> >> This will let the guest create a memory mapping to expose the ESB MMIO
> >> regions used to control the interrupt sources, to trigger events, to
> >> EOI or to turn off the sources.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
> >>  arch/powerpc/kvm/book3s_xive_native.c | 97 +++
> >>  2 files changed, 101 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
> >> b/arch/powerpc/include/uapi/asm/kvm.h
> >> index 8c876c166ef2..6bb61ba141c2 100644
> >> --- a/arch/powerpc/include/uapi/asm/kvm.h
> >> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> >> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
> >>  #define  KVM_XICS_PRESENTED   (1ULL << 43)
> >>  #define  KVM_XICS_QUEUED  (1ULL << 44)
> >>  
> >> +/* POWER9 XIVE Native Interrupt Controller */
> >> +#define KVM_DEV_XIVE_GRP_CTRL 1
> >> +#define   KVM_DEV_XIVE_GET_ESB_FD 1
> > 
> > Introducing a new FD for ESB and TIMA seems overkill.  Can't you get
> > to both with an mmap() directly on the xive device fd?  Using the
> > offset to distinguish which one to map, obviously.
> 
> The page offset would define some sort of user API. It seems feasible.
> But I am not sure this would be practical in the future if we need to 
> tune the length.

Um.. why not?  I mean, yes the XIVE supports rather a lot of
interrupts, but we have 64-bits of offset we can play with - we can
leave room for billions of ESB slots and still have room for billions
of VPs.

> The TIMA has two pages that can be exposed at guest level for interrupt 
> management : the OS and the USER page. That should be OK.
> 
> But we might want to map only portions of the interrupt ESB space, for 
> PCI passthrough for instance as Paul proposed. I am still looking at that.
> 
> Thanks,
> 
> C.
> 
> >>  #endif /* __LINUX_KVM_POWERPC_H */
> >> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
> >> b/arch/powerpc/kvm/book3s_xive_native.c
> >> index 115143e76c45..e20081f0c8d4 100644
> >> --- a/arch/powerpc/kvm/book3s_xive_native.c
> >> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> >> @@ -153,6 +153,85 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device 
> >> *dev,
> >>return rc;
> >>  }
> >>  
> >> +static int xive_native_esb_fault(struct vm_fault *vmf)
> >> +{
> >> +  struct vm_area_struct *vma = vmf->vma;
> >> +  struct kvmppc_xive *xive = vma->vm_file->private_data;
> >> +  struct kvmppc_xive_src_block *sb;
> >> +  struct kvmppc_xive_irq_state *state;
> >> +  struct xive_irq_data *xd;
> >> +  u32 hw_num;
> >> +  u16 src;
> >> +  u64 page;
> >> +  unsigned long irq;
> >> +
> >> +  /*
> >> +   * Linux/KVM uses a two pages ESB setting, one for trigger and
> >> +   * one for EOI
> >> +   */
> >> +  irq = vmf->pgoff / 2;
> >> +
> >> +  sb = kvmppc_xive_find_source(xive, irq, &src);
> >> +  if (!sb) {
> >> +  pr_err("%s: source %lx not found !\n", __func__, irq);
> >> +  return VM_FAULT_SIGBUS;
> >> +  }
> >> +
> >> +  state = &sb->irq_state[src];
> >> +  kvmppc_xive_select_irq(state, &hw_num, &xd);
> >> +
> >> +  arch_spin_lock(&sb->lock);
> >> +
> >> +  /*
> >> +   * first/even page is for trigger
> >> +   * second/odd page is for EOI and management.
> >> +   */
> >> +  page = vmf->pgoff % 2 ? xd->eoi_page : xd->trig_page;
> >> +  arch_spin_unlock(&sb->lock);
> >> +
> >> +  if (!page) {
> >> +  pr_err("%s: accessing invalid ESB page for source %lx !\n",
> >> + __func__, irq);
> >> +  return VM_FAULT_SIGBUS;
> >> +  }
> >> +
> >> +  vmf_insert_pfn(vma, vmf->address, page >> PAGE_SHIFT);
> >> +  return VM_FAULT_NOPAGE;
> >> +}
> >> +
> >> +static const struct vm_operations_struct xive_native_esb_vmops = {
> >> +  .fault = xive_native_esb_fault,
> >> +};
> >> +
> >> +static int xive_native_esb_mmap(struct file *file, struct vm_area_struct 
> >> *vma)
> >> +{
> >> +  /* There are two ESB pages (trigger and EOI) per IRQ */
> >> +  if (vma_pages(vma) + vma->vm_pgoff > KVMPPC_XIVE_NR_IRQS * 2)
> >> +  return -EINVAL;
> >> +
> >> +  vma->vm_flags |= VM_IO | VM_PFNMAP;
> >> +  vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >> +  vma->vm_ops = &xive_native_esb_vmops;
> >> +  return 0;
> >> +}
> >> +
> >> +static const struct file_operations xive_native_esb_fops = {
> >> +  .mmap = xive_native_esb_mmap,
> >> +};
> >> +
> >> +static int kvmppc_xive_native_get_esb_fd(struct kvmppc_xive *xive, u64 
> >> addr)
> >> +{
> >> +  u64 __user *ubufp = (u64 __user *) addr;
> >> +  int ret;
> >> +
> >> +  ret = anon_inode_getfd("[xive-esb]", &xive_native_esb_fops, xive,
> >> +  O_RDWR | O_CLOEXEC);
> >> +  if (ret < 0)
> >> +  return ret;
> >> +
> >> +  return put_user(ret, ubufp);
> >> +}
> >> +
> >>  static int kvmppc_xive_native_set_attr(struct 

Re: [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state

2019-02-04 Thread David Gibson
On Mon, Feb 04, 2019 at 07:57:26PM +0100, Cédric Le Goater wrote:
> On 2/4/19 6:26 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 08:10:04PM +0100, Cédric Le Goater wrote:
> >> At a VCPU level, the state of the thread context interrupt management
> >> registers needs to be collected. These registers are cached under the
> >> 'xive_saved_state.w01' field of the VCPU when the VCPU context is
> >> pulled from the HW thread. An OPAL call retrieves the backup of the
> >> IPB register in the NVT structure and merges it in the KVM state.
> >>
> >> The structures of the interface between QEMU and KVM provision some
> >> extra room (two u64) for further extensions if more state needs to be
> >> transferred back to QEMU.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  arch/powerpc/include/asm/kvm_ppc.h|  5 ++
> >>  arch/powerpc/include/uapi/asm/kvm.h   |  2 +
> >>  arch/powerpc/kvm/book3s.c | 24 +
> >>  arch/powerpc/kvm/book3s_xive_native.c | 78 +++
> >>  4 files changed, 109 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> >> b/arch/powerpc/include/asm/kvm_ppc.h
> >> index 4cc897039485..49c488af168c 100644
> >> --- a/arch/powerpc/include/asm/kvm_ppc.h
> >> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> >> @@ -270,6 +270,7 @@ union kvmppc_one_reg {
> >>u64 addr;
> >>u64 length;
> >>}   vpaval;
> >> +  u64 xive_timaval[4];
> >>  };
> >>  
> >>  struct kvmppc_ops {
> >> @@ -603,6 +604,8 @@ extern void kvmppc_xive_native_cleanup_vcpu(struct 
> >> kvm_vcpu *vcpu);
> >>  extern void kvmppc_xive_native_init_module(void);
> >>  extern void kvmppc_xive_native_exit_module(void);
> >>  extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
> >> +extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union 
> >> kvmppc_one_reg *val);
> >> +extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union 
> >> kvmppc_one_reg *val);
> >>  
> >>  #else
> >>  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 
> >> server,
> >> @@ -637,6 +640,8 @@ static inline void 
> >> kvmppc_xive_native_init_module(void) { }
> >>  static inline void kvmppc_xive_native_exit_module(void) { }
> >>  static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
> >>{ return 0; }
> >> +static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union 
> >> kvmppc_one_reg *val) { return 0; }
> >> +static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union 
> >> kvmppc_one_reg *val) { return -ENOENT; }
> > 
> > IIRC "VP" is the old name for "TCTX".  Since we're using tctx in the
> > rest of the XIVE code, can we use it here as well.
> 
> OK. The state we are getting or setting is indeed related to the thread 
> interrupt context registers. 
> 
> The name VP is related to an identifier to some interrupt context under 
> OPAL (NVT in HW to be precise).

Oh, sorry, "NVT" was the name I was looking for, not "TCTX".  But in
any case, please lets standardize on one.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE control to the XIVE native device

2019-02-04 Thread David Gibson
On Mon, Feb 04, 2019 at 08:07:20PM +0100, Cédric Le Goater wrote:
> On 2/4/19 5:57 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:43:21PM +0100, Cédric Le Goater wrote:
[snip]
> >> +  sb = kvmppc_xive_create_src_block(xive, irq);
> >> +  if (!sb) {
> >> +  pr_err("Failed to create block...\n");
> >> +  return -ENOMEM;
> >> +  }
> >> +  }
> >> +  state = &sb->irq_state[idx];
> >> +
> >> +  if (get_user(val, ubufp)) {
> >> +  pr_err("fault getting user info !\n");
> >> +  return -EFAULT;
> >> +  }
> >> +
> >> +  /*
> >> +   * If the source doesn't already have an IPI, allocate
> >> +   * one and get the corresponding data
> >> +   */
> >> +  if (!state->ipi_number) {
> >> +  state->ipi_number = xive_native_alloc_irq();
> >> +  if (state->ipi_number == 0) {
> >> +  pr_err("Failed to allocate IRQ !\n");
> >> +  return -ENOMEM;
> >> +  }
> > 
> > Am I right in thinking this is the point at which a specific guest irq
> > number gets bound to a specific host irq number?
> 
> yes. The XIVE IRQ state caches this information, and 'state' should 
> indeed be protected before being assigned... The XICS-over-XIVE device
> has the same race issue.
> 
> It's not showing because we're initializing the KVM device sequentially
> from QEMU, and only once.

Ok.

So, for the passthrough case, what's the point at which we know that a
particular guest interrupt needs to be bound to a specific real
hardware interrupt, rather than a generic IPI?

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson




Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-02-04 Thread Alistair Popple
On Thursday, 31 January 2019 12:11:06 PM AEDT Andrea Arcangeli wrote:
> On Thu, Jan 31, 2019 at 06:30:22PM +0800, Peter Xu wrote:
> > The change_pte() notifier was designed to be used as a quick path to
> > update secondary MMU PTEs on write permission changes or PFN changes.
> > For KVM, it could reduce the vm-exits when a vcpu faults on pages
> > that were touched up by KSM.  It's not suited for cache invalidations,
> > for example, since the notifier is called before the real PTE update
> > (see set_pte_at_notify(), where set_pte_at() is called afterwards).

Thanks for the fixup. I didn't realise that invalidate_range() always gets 
called, but I now see that is the case, so this change looks good to me as well.

Reviewed-by: Alistair Popple 

> > All the necessary cache invalidation should all be done in
> > invalidate_range() already.
> > 
> > CC: Benjamin Herrenschmidt 
> > CC: Paul Mackerras 
> > CC: Michael Ellerman 
> > CC: Alistair Popple 
> > CC: Alexey Kardashevskiy 
> > CC: Mark Hairgrove 
> > CC: Balbir Singh 
> > CC: David Gibson 
> > CC: Andrea Arcangeli 
> > CC: Jerome Glisse 
> > CC: Jason Wang 
> > CC: linuxppc-dev@lists.ozlabs.org
> > CC: linux-ker...@vger.kernel.org
> > Signed-off-by: Peter Xu 
> > ---
> > 
> >  arch/powerpc/platforms/powernv/npu-dma.c | 10 --
> >  1 file changed, 10 deletions(-)
> 
> Reviewed-by: Andrea Arcangeli 
> 
> It doesn't make sense to implement change_pte as an invalidate,
> change_pte is not compulsory to implement so if one wants to have
> invalidates only, change_pte method shouldn't be implemented in the
> first place and the common code will guarantee to invoke the range
> invalidates instead.
> 
> Currently the whole change_pte optimization is effectively disabled as
> noted in past discussions with Jerome (because of the range
> invalidates that always surround it), so we need to revisit the whole
> change_pte logic and decide whether to re-enable it or to drop it as a
> whole, but in the meantime it's good to clean up spots like the one
> below that should leave change_pte alone.
> 
> There are several examples of mmu_notifier_ops in the kernel that
> don't implement change_pte, in fact it's the majority. Of all mmu
> notifier users, only nv_nmmu_notifier_ops, intel_mmuops_change and
> kvm_mmu_notifier_ops implement change_pte, and as Peter found out by
> source review, nv_nmmu_notifier_ops and intel_mmuops_change are wrong
> about it and should stop implementing it as an invalidate.
> 
> In short, change_pte is only implemented correctly by KVM, which can
> really update the spte and flush the TLB; the spte update remains
> and could avoid a vmexit if we figure out how to re-enable the
> optimization safely (the TLB fill after change_pte in the KVM EPT/shadow
> secondary MMU will be looked up by the CPU in hardware).
> 
> If change_pte is implemented, it should update the mapping like KVM
> does and not do an invalidate.
> 
> Thanks,
> Andrea
> 
> > diff --git a/arch/powerpc/platforms/powernv/npu-dma.c
> > b/arch/powerpc/platforms/powernv/npu-dma.c index
> > 3f58c7dbd581..c003b29d870e 100644
> > --- a/arch/powerpc/platforms/powernv/npu-dma.c
> > +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> > @@ -917,15 +917,6 @@ static void pnv_npu2_mn_release(struct mmu_notifier
> > *mn,> 
> > mmio_invalidate(npu_context, 0, ~0UL);
> >  
> >  }
> > 
> > -static void pnv_npu2_mn_change_pte(struct mmu_notifier *mn,
> > -   struct mm_struct *mm,
> > -   unsigned long address,
> > -   pte_t pte)
> > -{
> > -   struct npu_context *npu_context = mn_to_npu_context(mn);
> > -   mmio_invalidate(npu_context, address, PAGE_SIZE);
> > -}
> > -
> > 
> >  static void pnv_npu2_mn_invalidate_range(struct mmu_notifier *mn,
> >  
> > struct mm_struct *mm,
> > unsigned long start, unsigned long end)
> > 
> > @@ -936,7 +927,6 @@ static void pnv_npu2_mn_invalidate_range(struct
> > mmu_notifier *mn,> 
> >  static const struct mmu_notifier_ops nv_nmmu_notifier_ops = {
> >  
> > .release = pnv_npu2_mn_release,
> > 
> > -   .change_pte = pnv_npu2_mn_change_pte,
> > 
> > .invalidate_range = pnv_npu2_mn_invalidate_range,
> >  
> >  };




Re: [PATCH] scsi: cxlflash: Prevent deadlock when adapter probe fails

2019-02-04 Thread Martin K. Petersen


Vaibhav,

> Presently, when an error is encountered during probe of the cxlflash
> adapter, a deadlock is seen with a cpu thread stuck inside
> cxlflash_remove(). Below is the trace of the deadlock as logged by
> khungtaskd:

Applied to 5.0/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 12/17] powerpc/64s/exception: unwind exception-64s.h macros

2019-02-04 Thread kbuild test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.0-rc4 next-20190204]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-tidy-and-gasify-exception-handler-code-round-1/20190205-020038
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ps3_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.2.0 make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0xcc): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function `exc_real_0xe60_hmi_exception':
>> (.head.text.real_vectors+0xd7c): undefined reference to `hanndler'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function `exc_virt_0x4300_data_access':
   (.head.text.virt_vectors+0x354): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function `exc_virt_0x4380_data_access_slb':
   (.head.text.virt_vectors+0x3d4): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function `exc_virt_0x4400_instruction_access':
   (.head.text.virt_vectors+0x454): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function `exc_virt_0x4480_instruction_access_slb':
   (.head.text.virt_vectors+0x4d4): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function `exc_virt_0x4500_hardware_interrupt':
   (.head.text.virt_vectors+0x564): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o:(.head.text.virt_vectors+0x654): more undefined references to `label' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all       Intel Corporation




Re: [PATCH v2 10/21] memblock: refactor internal allocation functions

2019-02-04 Thread Stephen Rothwell
Hi all,

On Mon, 04 Feb 2019 19:45:17 +1100 Michael Ellerman  wrote:
>
> Mike Rapoport  writes:
> > On Sun, Feb 03, 2019 at 08:39:20PM +1100, Michael Ellerman wrote:  
> >> Mike Rapoport  writes:  
> >> > Currently, memblock has several internal functions with overlapping
> >> > functionality. They all call memblock_find_in_range_node() to find free
> >> > memory and then reserve the allocated range and mark it with kmemleak.
> >> > However, there is difference in the allocation constraints and in 
> >> > fallback
> >> > strategies.  
> ...
> >> 
> >> This is causing problems on some of my machines.  
> ...
> >> 
> >> On some of my other systems it does that, and then panics because it
> >> can't allocate anything at all:
> >> 
> >> [0.00] numa:   NODE_DATA [mem 0x7ffcaee80-0x7ffcb3fff]
> >> [0.00] numa:   NODE_DATA [mem 0x7ffc99d00-0x7ffc9ee7f]
> >> [0.00] numa: NODE_DATA(1) on node 0
> >> [0.00] Kernel panic - not syncing: Cannot allocate 20864 bytes for 
> >> node 16 data
> >> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
> >> 5.0.0-rc4-gccN-next-20190201-gdc4c899 #1
> >> [0.00] Call Trace:
> >> [0.00] [c11cfca0] [c0c11044] dump_stack+0xe8/0x164 
> >> (unreliable)
> >> [0.00] [c11cfcf0] [c00fdd6c] panic+0x17c/0x3e0
> >> [0.00] [c11cfd90] [c0f61bc8] 
> >> initmem_init+0x128/0x260
> >> [0.00] [c11cfe60] [c0f57940] setup_arch+0x398/0x418
> >> [0.00] [c11cfee0] [c0f50a94] 
> >> start_kernel+0xa0/0x684
> >> [0.00] [c11cff90] [c000af70] 
> >> start_here_common+0x1c/0x52c
> >> [0.00] Rebooting in 180 seconds..
> >> 
> >> 
> >> So there's something going wrong there, I haven't had time to dig into
> >> it though (Sunday night here).  
> >
> > Yeah, I've misplaced 'nid' and 'MEMBLOCK_ALLOC_ACCESSIBLE' in
> > memblock_phys_alloc_try_nid() :(
> >
> > Can you please check if the below patch fixes the issue on your systems?  
> 
> Yes it does, thanks.
> 
> Tested-by: Michael Ellerman 
> 
> cheers
> 
> 
> > From 5875b7440e985ce551e6da3cb28aa8e9af697e10 Mon Sep 17 00:00:00 2001
> > From: Mike Rapoport 
> > Date: Sun, 3 Feb 2019 13:35:42 +0200
> > Subject: [PATCH] memblock: fix parameter order in
> >  memblock_phys_alloc_try_nid()
> >
> > The refactoring of internal memblock allocation functions used wrong order
> > of parameters in memblock_alloc_range_nid() call from
> > memblock_phys_alloc_try_nid().
> > Fix it.
> >
> > Signed-off-by: Mike Rapoport 
> > ---
> >  mm/memblock.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index e047933..0151a5b 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
> > @@ -1402,8 +1402,8 @@ phys_addr_t __init 
> > memblock_phys_alloc_range(phys_addr_t size,
> >  
> >  phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, 
> > phys_addr_t align, int nid)
> >  {
> > -   return memblock_alloc_range_nid(size, align, 0, nid,
> > -   MEMBLOCK_ALLOC_ACCESSIBLE);
> > +   return memblock_alloc_range_nid(size, align, 0,
> > +   MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> >  }
> >  
> >  /**
> > -- 
> > 2.7.4

I have applied that patch to the akpm tree in linux-next from today.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH 13/19] KVM: PPC: Book3S HV: add a SYNC control for the XIVE native migration

2019-02-04 Thread Cédric Le Goater
On 2/4/19 6:17 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:25PM +0100, Cédric Le Goater wrote:
>> When migration of a VM is initiated, a first copy of the RAM is
>> transferred to the destination before the VM is stopped. At that time,
>> QEMU needs to perform a XIVE quiesce sequence to stop the flow of
>> event notifications and stabilize the EQs. The sources are masked and
>> the XIVE IC is synced with the KVM ioctl KVM_DEV_XIVE_GRP_SYNC.
>>
> 
> Don't you also need to make sure the guests queue pages are marked
> dirty here, in case they were already migrated?

I have added an extra KVM service to mark the EQ pages dirty. That 
might be overkill as it seems you are suggesting.

C. 
 
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  1 +
>>  arch/powerpc/kvm/book3s_xive_native.c | 32 +++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 6fc9660c5aec..f3b859223b80 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -681,6 +681,7 @@ struct kvm_ppc_cpu_char {
>>  #define   KVM_DEV_XIVE_GET_TIMA_FD  2
>>  #define   KVM_DEV_XIVE_VC_BASE  3
>> +#define KVM_DEV_XIVE_GRP_SOURCES    2   /* 64-bit source attributes */
>> +#define KVM_DEV_XIVE_GRP_SYNC   3   /* 64-bit source 
>> attributes */
>>  
>>  /* Layout of 64-bit XIVE source attribute values */
>> +#define KVM_XIVE_LEVEL_SENSITIVE    (1ULL << 0)
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index 4ca75aade069..a8052867afc1 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -459,6 +459,35 @@ static int kvmppc_xive_native_set_source(struct 
>> kvmppc_xive *xive, long irq,
>>  return 0;
>>  }
>>  
>> +static int kvmppc_xive_native_sync(struct kvmppc_xive *xive, long irq, u64 
>> addr)
>> +{
>> +struct kvmppc_xive_src_block *sb;
>> +struct kvmppc_xive_irq_state *state;
>> +struct xive_irq_data *xd;
>> +u32 hw_num;
>> +u16 src;
>> +
>> +pr_devel("%s irq=0x%lx\n", __func__, irq);
>> +
>> +sb = kvmppc_xive_find_source(xive, irq, &src);
>> +if (!sb)
>> +return -ENOENT;
>> +
>> +state = &sb->irq_state[src];
>> +
>> +if (!state->valid)
>> +return -ENOENT;
>> +
>> +arch_spin_lock(&sb->lock);
>> +
>> +kvmppc_xive_select_irq(state, &hw_num, &xd);
>> +xive_native_sync_source(hw_num);
>> +xive_native_sync_queue(hw_num);
>> +
>> +arch_spin_unlock(&sb->lock);
>> +return 0;
>> +}
>> +
>>  static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>>  {
>> @@ -474,6 +503,8 @@ static int kvmppc_xive_native_set_attr(struct kvm_device 
>> *dev,
>>  case KVM_DEV_XIVE_GRP_SOURCES:
>>  return kvmppc_xive_native_set_source(xive, attr->attr,
>>   attr->addr);
>> +case KVM_DEV_XIVE_GRP_SYNC:
>> +return kvmppc_xive_native_sync(xive, attr->attr, attr->addr);
>>  }
>>  return -ENXIO;
>>  }
>> @@ -511,6 +542,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device 
>> *dev,
>>  }
>>  break;
>>  case KVM_DEV_XIVE_GRP_SOURCES:
>> +case KVM_DEV_XIVE_GRP_SYNC:
>>  if (attr->attr >= KVMPPC_XIVE_FIRST_IRQ &&
>>  attr->attr < KVMPPC_XIVE_NR_IRQS)
>>  return 0;
> 



Re: [PATCH 02/17] powerpc/64s/exception: remove H concatenation for EXC_HV variants

2019-02-04 Thread kbuild test robot
Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.0-rc4 next-20190204]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-tidy-and-gasify-exception-handler-code-round-1/20190205-020038
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ps3_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.2.0 make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

>> powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o:(__ftr_alt_97+0x104): 
>> undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function 
`exc_virt_0x4300_data_access':
>> (.head.text.virt_vectors+0x354): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function 
`exc_virt_0x4380_data_access_slb':
   (.head.text.virt_vectors+0x3d4): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function 
`exc_virt_0x4400_instruction_access':
   (.head.text.virt_vectors+0x454): undefined reference to `label'
   powerpc64-linux-gnu-ld: arch/powerpc/kernel/head_64.o: in function 
`exc_virt_0x4480_instruction_access_slb':
   (.head.text.virt_vectors+0x4d4): undefined reference to `label'
   powerpc64-linux-gnu-ld: 
arch/powerpc/kernel/head_64.o:(.head.text.virt_vectors+0x564): more undefined 
references to `label' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all       Intel Corporation




Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Michael S. Tsirkin
On Mon, Feb 04, 2019 at 04:15:41PM -0200, Thiago Jung Bauermann wrote:
> 
> Christoph Hellwig  writes:
> 
> > On Tue, Jan 29, 2019 at 09:36:08PM -0500, Michael S. Tsirkin wrote:
> >> This has been discussed ad nauseum. virtio is all about compatibility.
> >> Losing a couple of lines of code isn't worth breaking working setups.
> >> People that want "just use DMA API no tricks" now have the option.
> >> Setting a flag in a feature bit map is literally a single line
> >> of code in the hypervisor. So stop pushing for breaking working
> >> legacy setups and just fix it in the right place.
> >
> > I agree with the legacy aspect.  What I am missing is an extremely
> > strong wording that says you SHOULD always set this flag for new
> > hosts, including an explanation why.
> 
> My understanding of ACCESS_PLATFORM is that it means "this device will
> behave in all aspects like a regular device attached to this bus".


Not really. Look it up in the spec:

VIRTIO_F_ACCESS_PLATFORM(33) This feature indicates that the device can be
used on a platform where device access to data in memory is limited and/or
translated. E.g. this is the case if the device can be located behind an
IOMMU that translates bus addresses from the device into physical addresses
in memory, if the device can be limited to only access certain memory
addresses or if special commands such as a cache flush can be needed to
synchronise data in memory with the device. Whether accesses are actually
limited or translated is described by platform-specific means. If this
feature bit is set to 0, then the device has same access to memory
addresses supplied to it as the driver has. In particular, the device will
always use physical addresses matching addresses used by the driver
(typically meaning physical addresses used by the CPU) and not translated
further, and can access any address supplied to it by the driver. When
clear, this overrides any platform-specific description of whether device
access is limited or translated in any way, e.g. whether an IOMMU may be
present.



> Is
> that it? Therefore it should be set because it's the sane thing to do?

It's the sane thing to do unless you want the very specific thing that
having it clear means, which is just have it be another CPU.

It was designed to make, when set, as many guests as we can work
correctly, and it seems to be successful in doing exactly that.

Unfortunately there could be legacy guests that do work correctly but
become slow. Whether trying to somehow work around that
can paint us into a corner where things again don't
work for some people is a question worth discussing.


> --
> Thiago Jung Bauermann
> IBM Linux Technology Center


Re: [PATCH] soc: fsl: dpio: Use after free in dpaa2_dpio_remove()

2019-02-04 Thread Li Yang
On Mon, Feb 4, 2019 at 8:12 AM Dan Carpenter  wrote:
>
> The dpaa2_io_down(priv->io) call frees "priv->io" so I've shifted the
> code around a little bit to avoid the use after free.
>
> Fixes: 991e873223e9 ("soc: fsl: dpio: use a cpumask to identify which cpus 
> are unused")
> Signed-off-by: Dan Carpenter 

Applied.  Thanks.

> ---
>  drivers/soc/fsl/dpio/dpio-driver.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/soc/fsl/dpio/dpio-driver.c 
> b/drivers/soc/fsl/dpio/dpio-driver.c
> index 2d4af32a0dec..a28799b62d53 100644
> --- a/drivers/soc/fsl/dpio/dpio-driver.c
> +++ b/drivers/soc/fsl/dpio/dpio-driver.c
> @@ -220,12 +220,12 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
> *dpio_dev)
>
> dev = &dpio_dev->dev;
> priv = dev_get_drvdata(dev);
> +   cpu = dpaa2_io_get_cpu(priv->io);
>
> dpaa2_io_down(priv->io);
>
> dpio_teardown_irqs(dpio_dev);
>
> -   cpu = dpaa2_io_get_cpu(priv->io);
> cpumask_set_cpu(cpu, cpus_unused_mask);
>
> err = dpio_open(dpio_dev->mc_io, 0, dpio_dev->obj_desc.id,
> --
> 2.17.1
>


Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Michael S. Tsirkin
On Mon, Feb 04, 2019 at 04:14:20PM -0200, Thiago Jung Bauermann wrote:
> 
> Hello Michael,
> 
> Michael S. Tsirkin  writes:
> 
> > On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
> >>
> >> Fixing address of powerpc mailing list.
> >>
> >> Thiago Jung Bauermann  writes:
> >>
> >> > Hello,
> >> >
> >> > With Christoph's rework of the DMA API that recently landed, the patch
> >> > below is the only change needed in virtio to make it work in a POWER
> >> > secure guest under the ultravisor.
> >> >
> >> > The other change we need (making sure the device's dma_map_ops is NULL
> >> > so that the dma-direct/swiotlb code is used) can be made in
> >> > powerpc-specific code.
> >> >
> >> > Of course, I also have patches (soon to be posted as RFC) which hook up
> >> >  to the powerpc secure guest support code.
> >> >
> >> > What do you think?
> >> >
> >> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
> >> > From: Thiago Jung Bauermann 
> >> > Date: Thu, 24 Jan 2019 22:08:02 -0200
> >> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is 
> >> > encrypted
> >> >
> >> > The host can't access the guest memory when it's encrypted, so using
> >> > regular memory pages for the ring isn't an option. Go through the DMA 
> >> > API.
> >> >
> >> > Signed-off-by: Thiago Jung Bauermann 
> >
> > Well I think this will come back to bite us (witness xen which is now
> > reworking precisely this path - but at least they aren't to blame, xen
> > came before ACCESS_PLATFORM).
> >
> > I also still think the right thing would have been to set
> > ACCESS_PLATFORM for all systems where device can't access all memory.
> 
> I understand. The problem with that approach for us is that because we
> don't know which guests will become secure guests and which will remain
> regular guests, QEMU would need to offer ACCESS_PLATFORM to all guests.
> 
> And the problem with that is that for QEMU on POWER, having
> ACCESS_PLATFORM turned off means that it can bypass the IOMMU for the
> device (which makes sense considering that the name of the flag was
> IOMMU_PLATFORM). And we need that for regular guests to avoid
> performance degradation.

You don't really, ACCESS_PLATFORM means just that, platform decides.

> So while ACCESS_PLATFORM solves our problems for secure guests, we can't
> turn it on by default because we can't affect legacy systems. Doing so
> would penalize existing systems that can access all memory. They would
> all have to unnecessarily go through address translations, and take a
> performance hit.

So as step one, you just give hypervisor admin an option to run legacy
systems faster by blocking secure mode. I don't see why that is
so terrible.

But as step two, assuming you use above step one to make legacy
guests go fast - maybe there is a point in detecting
such a hypervisor and doing something smarter with it.
By all means let's have a discussion around this but that is no longer
"to make it work" as the commit log says it's more a performance
optimization.


> The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
> in advance - right when the VM is instantiated - that it will not have
> access to all guest memory.

Not quite. It just means that hypervisor can live with not having
access to all memory. If platform wants to give it access
to all memory that is quite all right.


> Unfortunately that assumption is subtly
> broken on our secure-platform. The hypervisor/QEMU realizes that the
> platform is going secure only *after the VM is instantiated*. It's the
> kernel running in the VM that determines that it wants to switch the
> platform to secure-mode.

ACCESS_PLATFORM is there so guests can detect legacy hypervisors
which always assumed it's another CPU.

> Another way of looking at this issue which also explains our reluctance
> is that the only difference between a secure guest and a regular guest
> (at least regarding virtio) is that the former uses swiotlb while the
> latter doens't.

But swiotlb is just one implementation. It's a guest internal thing. The
issue is that memory isn't host accessible.  Yes linux does not use that
info too much right now but it already begins to seep out of the
abstraction.  For example as you are doing data copies you should maybe
calculate the packet checksum just as well.  Not something DMA API will
let you know right now, but that's because any bounce buffer users so
far weren't terribly fast anyway - it was all for 16 bit hardware and
such.


> And from the device's point of view they're
> indistinguishable. It can't tell one guest that is using swiotlb from
> one that isn't. And that implies that secure guest vs regular guest
> isn't a virtio interface issue, it's "guest internal affairs". So
> there's no reason to reflect that in the feature flags.

So don't. The way not to reflect that in the feature flags is
to set ACCESS_PLATFORM.  Then you say *I don't care, let platform decide*.


Without 

Re: [PATCH v3 1/2] dt-bindings: soc: fsl: Document Qixis FPGA usage

2019-02-04 Thread Li Yang
Please include device tree binding mailing list and maintainers for
binding patches(cc'ed now).

On Mon, Feb 4, 2019 at 3:15 AM Pankaj Bansal  wrote:
>
> an FPGA-based system controller, called “Qixis”, which
> manages several critical system features, including:
> • Reset sequencing
> • Power supply configuration
> • Board configuration
> • hardware configuration
>
> The qixis registers are accessible over one or more system-specific
> interfaces, typically I2C, JTAG or an embedded processor.
>
> Signed-off-by: Pankaj Bansal 
> ---
>
> Notes:
> V3:
> - Added boardname based compatible field in bindings
> - Added bindings for MMIO based FPGA
> V2:
> - No change
>
>  .../bindings/soc/fsl/qixis_ctrl.txt  | 53 ++
>  1 file changed, 53 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt 
> b/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
> new file mode 100644
> index ..5d510df14be8
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
> @@ -0,0 +1,53 @@
> +* QIXIS FPGA block
> +
> +an FPGA-based system controller, called “Qixis”, which
> +manages several critical system features, including:
> +• Configuration switch monitoring
> +• Power on/off sequencing
> +• Reset sequencing
> +• Power supply configuration
> +• Board configuration
> +• hardware configuration
> +• Background power data collection (DCM)
> +• Fault monitoring
> +• RCW bypass SRAM (replace flash RCW with internal RCW) (NOR only)
> +• Dedicated functional validation blocks (POSt/IRS, triggered event, and so 
> on)
> +• I2C master for remote board control even with no DUT available
> +
> +The qixis registers are accessible over one or more system-specific 
> interfaces,
> +typically I2C, JTAG or an embedded processor.
> +
> +FPGA connected to I2C:
> +Required properties:
> +
> + - compatible: should be a board-specific string followed by a string
> +   indicating the type of FPGA.  Example:
> +   "fsl,-fpga", "fsl,fpga-qixis-i2c"
> + - reg : i2c address of the qixis device.
> +
> +Example (LX2160A-QDS):
> +   /* The FPGA node */
> +fpga@66 {
> +   compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> +   reg = <0x66>;
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   };
> +
> +* Freescale on-board FPGA
> +
> +This is the memory-mapped registers for on board FPGA.
> +
> +Required properties:
> +- compatible: should be a board-specific string followed by a string
> +  indicating the type of FPGA.  Example:
> +   "fsl,-fpga", "fsl,fpga-qixis"
> +- reg: should contain the address and the length of the FPGA register set.
> +
> +Example (LS2080A-RDB):
> +
> +cpld@3,0 {
> +compatible = "fsl,ls2080ardb-fpga", "fsl,fpga-qixis";
> +reg = <0x3 0 0x1>;
> +};
> +
> --
> 2.17.1
>


Re: [PATCH 09/19] KVM: PPC: Book3S HV: add a SET_SOURCE control to the XIVE native device

2019-02-04 Thread Cédric Le Goater
On 2/4/19 5:57 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:21PM +0100, Cédric Le Goater wrote:
>> Interrupt sources are simply created at the OPAL level and then
>> MASKED. KVM only needs to know about their type: LSI or MSI.
> 
> This commit message isn't very illuminating.

There is room for improvement certainly.
 
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  5 +
>>  arch/powerpc/kvm/book3s_xive_native.c | 98 +++
>>  .../powerpc/kvm/book3s_xive_native_template.c | 27 +
>>  3 files changed, 130 insertions(+)
>>  create mode 100644 arch/powerpc/kvm/book3s_xive_native_template.c
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 8b78b12aa118..6fc9660c5aec 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -680,5 +680,10 @@ struct kvm_ppc_cpu_char {
>>  #define   KVM_DEV_XIVE_GET_ESB_FD   1
>>  #define   KVM_DEV_XIVE_GET_TIMA_FD  2
>>  #define   KVM_DEV_XIVE_VC_BASE  3
>> +#define KVM_DEV_XIVE_GRP_SOURCES    2   /* 64-bit source attributes */
>> +
>> +/* Layout of 64-bit XIVE source attribute values */
>> +#define KVM_XIVE_LEVEL_SENSITIVE    (1ULL << 0)
>> +#define KVM_XIVE_LEVEL_ASSERTED (1ULL << 1)
>>  
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index 29a62914de55..2518640d4a58 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -31,6 +31,24 @@
>>  
>>  #include "book3s_xive.h"
>>  
>> +/*
>> + * We still instantiate them here because we use some of the
>> + * generated utility functions as well in this file.
> 
> And this comment is downright cryptic.

I have removed this part now that the hcalls are not done under
real mode anymore.
 
> 
>> + */
>> +#define XIVE_RUNTIME_CHECKS
>> +#define X_PFX xive_vm_
>> +#define X_STATIC static
>> +#define X_STAT_PFX stat_vm_
>> +#define __x_tima            xive_tima
>> +#define __x_eoi_page(xd)    ((void __iomem *)((xd)->eoi_mmio))
>> +#define __x_trig_page(xd)   ((void __iomem *)((xd)->trig_mmio))
>> +#define __x_writeb  __raw_writeb
>> +#define __x_readw   __raw_readw
>> +#define __x_readq   __raw_readq
>> +#define __x_writeq  __raw_writeq
>> +
>> +#include "book3s_xive_native_template.c"
>> +
>>  static void xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
>>  {
>>  struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>> @@ -305,6 +323,78 @@ static int kvmppc_xive_native_get_tima_fd(struct 
>> kvmppc_xive *xive, u64 addr)
>>  return put_user(ret, ubufp);
>>  }
>>  
>> +static int kvmppc_xive_native_set_source(struct kvmppc_xive *xive, long irq,
>> + u64 addr)
>> +{
>> +struct kvmppc_xive_src_block *sb;
>> +struct kvmppc_xive_irq_state *state;
>> +u64 __user *ubufp = (u64 __user *) addr;
>> +u64 val;
>> +u16 idx;
>> +
>> +pr_devel("%s irq=0x%lx\n", __func__, irq);
>> +
>> +if (irq < KVMPPC_XIVE_FIRST_IRQ || irq >= KVMPPC_XIVE_NR_IRQS)
>> +return -ENOENT;
>> +
>> +sb = kvmppc_xive_find_source(xive, irq, &idx);
>> +if (!sb) {
>> +pr_debug("No source, creating source block...\n");
> 
> Doesn't this need to be protected by some lock?
> 
>> +sb = kvmppc_xive_create_src_block(xive, irq);
>> +if (!sb) {
>> +pr_err("Failed to create block...\n");
>> +return -ENOMEM;
>> +}
>> +}
>> +state = &sb->irq_state[idx];
>> +
>> +if (get_user(val, ubufp)) {
>> +pr_err("fault getting user info !\n");
>> +return -EFAULT;
>> +}
>> +
>> +/*
>> + * If the source doesn't already have an IPI, allocate
>> + * one and get the corresponding data
>> + */
>> +if (!state->ipi_number) {
>> +state->ipi_number = xive_native_alloc_irq();
>> +if (state->ipi_number == 0) {
>> +pr_err("Failed to allocate IRQ !\n");
>> +return -ENOMEM;
>> +}
> 
> Am I right in thinking this is the point at which a specific guest irq
> number gets bound to a specific host irq number?

yes. the XIVE IRQ state caches this information and 'state' should be 
protected before being assigned, indeed ... The XICS-over-XIVE device
also has the same race issue.

It's not showing because we are initializing the KVM device sequentially
from QEMU, and only once.

Thanks,

C. 
 

> 
>> +xive_native_populate_irq_data(state->ipi_number,
>> +  &state->ipi_data);
>> +pr_debug("%s allocated hw_irq=0x%x for irq=0x%lx\n", __func__,
>> + state->ipi_number, irq);
>> +}
>> +
>> +arch_spin_lock(&sb->lock);
>> +
>> +/* Restore LSI state */
>> +if (val & 

Re: [PATCH 17/19] KVM: PPC: Book3S HV: add get/set accessors for the VP XIVE state

2019-02-04 Thread Cédric Le Goater
On 2/4/19 6:26 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 08:10:04PM +0100, Cédric Le Goater wrote:
>> At a VCPU level, the state of the thread context interrupt management
>> registers needs to be collected. These registers are cached under the
>> 'xive_saved_state.w01' field of the VCPU when the VPCU context is
>> pulled from the HW thread. An OPAL call retrieves the backup of the
>> IPB register in the NVT structure and merges it in the KVM state.
>>
>> The structures of the interface between QEMU and KVM provisions some
>> extra room (two u64) for further extensions if more state needs to be
>> transferred back to QEMU.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/kvm_ppc.h|  5 ++
>>  arch/powerpc/include/uapi/asm/kvm.h   |  2 +
>>  arch/powerpc/kvm/book3s.c | 24 +
>>  arch/powerpc/kvm/book3s_xive_native.c | 78 +++
>>  4 files changed, 109 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index 4cc897039485..49c488af168c 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -270,6 +270,7 @@ union kvmppc_one_reg {
>>  u64 addr;
>>  u64 length;
>>  }   vpaval;
>> +u64 xive_timaval[4];
>>  };
>>  
>>  struct kvmppc_ops {
>> @@ -603,6 +604,8 @@ extern void kvmppc_xive_native_cleanup_vcpu(struct 
>> kvm_vcpu *vcpu);
>>  extern void kvmppc_xive_native_init_module(void);
>>  extern void kvmppc_xive_native_exit_module(void);
>>  extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
>> +extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union 
>> kvmppc_one_reg *val);
>> +extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union 
>> kvmppc_one_reg *val);
>>  
>>  #else
>>  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
>> @@ -637,6 +640,8 @@ static inline void kvmppc_xive_native_init_module(void) 
>> { }
>>  static inline void kvmppc_xive_native_exit_module(void) { }
>>  static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>>  { return 0; }
>> +static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu, union 
>> kvmppc_one_reg *val) { return 0; }
>> +static inline int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu, union 
>> kvmppc_one_reg *val) { return -ENOENT; }
> 
> IIRC "VP" is the old name for "TCTX".  Since we're using tctx in the
> rest of the XIVE code, can we use it here as well.

OK. The state we are getting or setting is indeed related to the thread
interrupt context registers.

The name VP refers to an identifier for an interrupt context under OPAL
(an NVT in HW, to be precise).

C.

> 
>>  #endif /* CONFIG_KVM_XIVE */
>>  
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 95302558ce10..3c958c39a782 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -480,6 +480,8 @@ struct kvm_ppc_cpu_char {
>>  #define  KVM_REG_PPC_ICP_PPRI_SHIFT 16  /* pending irq priority */
>>  #define  KVM_REG_PPC_ICP_PPRI_MASK  0xff
>>  
>> +#define KVM_REG_PPC_VP_STATE    (KVM_REG_PPC | KVM_REG_SIZE_U256 | 0x8d)
>> +
>>  /* Device control API: PPC-specific devices */
>>  #define KVM_DEV_MPIC_GRP_MISC   1
>>  #define   KVM_DEV_MPIC_BASE_ADDR0   /* 64-bit */
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index de7eed191107..5ad658077a35 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -641,6 +641,18 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id,
>>  *val = get_reg_val(id, 
>> kvmppc_xics_get_icp(vcpu));
>>  break;
>>  #endif /* CONFIG_KVM_XICS */
>> +#ifdef CONFIG_KVM_XIVE
>> +case KVM_REG_PPC_VP_STATE:
>> +if (!vcpu->arch.xive_vcpu) {
>> +r = -ENXIO;
>> +break;
>> +}
>> +if (xive_enabled())
>> +r = kvmppc_xive_native_get_vp(vcpu, val);
>> +else
>> +r = -ENXIO;
>> +break;
>> +#endif /* CONFIG_KVM_XIVE */
>>  case KVM_REG_PPC_FSCR:
>>  *val = get_reg_val(id, vcpu->arch.fscr);
>>  break;
>> @@ -714,6 +726,18 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id,
>>  r = kvmppc_xics_set_icp(vcpu, set_reg_val(id, 
>> *val));
>>  break;
>>  #endif /* CONFIG_KVM_XICS */
>> +#ifdef CONFIG_KVM_XIVE
>> +case KVM_REG_PPC_VP_STATE:
>> +if (!vcpu->arch.xive_vcpu) {
>> +r = -ENXIO;
>> +break;
>> +}
>> +if 

Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Thiago Jung Bauermann


Christoph Hellwig  writes:

> On Tue, Jan 29, 2019 at 09:36:08PM -0500, Michael S. Tsirkin wrote:
>> This has been discussed ad nauseum. virtio is all about compatibility.
>> Losing a couple of lines of code isn't worth breaking working setups.
>> People that want "just use DMA API no tricks" now have the option.
>> Setting a flag in a feature bit map is literally a single line
>> of code in the hypervisor. So stop pushing for breaking working
>> legacy setups and just fix it in the right place.
>
> I agree with the legacy aspect.  What I am missing is an extremely
> strong wording that says you SHOULD always set this flag for new
> hosts, including an explanation why.

My understanding of ACCESS_PLATFORM is that it means "this device will
behave in all aspects like a regular device attached to this bus". Is
that it? Therefore it should be set because it's the sane thing to do?

--
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-02-04 Thread Thiago Jung Bauermann


Hello Michael,

Michael S. Tsirkin  writes:

> On Tue, Jan 29, 2019 at 03:42:44PM -0200, Thiago Jung Bauermann wrote:
>>
>> Fixing address of powerpc mailing list.
>>
>> Thiago Jung Bauermann  writes:
>>
>> > Hello,
>> >
>> > With Christoph's rework of the DMA API that recently landed, the patch
>> > below is the only change needed in virtio to make it work in a POWER
>> > secure guest under the ultravisor.
>> >
>> > The other change we need (making sure the device's dma_map_ops is NULL
>> > so that the dma-direct/swiotlb code is used) can be made in
>> > powerpc-specific code.
>> >
>> > Of course, I also have patches (soon to be posted as RFC) which hook up
>> >  to the powerpc secure guest support code.
>> >
>> > What do you think?
>> >
>> > From d0629a36a75c678b4a72b853f8f7f8c17eedd6b3 Mon Sep 17 00:00:00 2001
>> > From: Thiago Jung Bauermann 
>> > Date: Thu, 24 Jan 2019 22:08:02 -0200
>> > Subject: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
>> >
>> > The host can't access the guest memory when it's encrypted, so using
>> > regular memory pages for the ring isn't an option. Go through the DMA API.
>> >
>> > Signed-off-by: Thiago Jung Bauermann 
>
> Well I think this will come back to bite us (witness xen which is now
> reworking precisely this path - but at least they aren't to blame, xen
> came before ACCESS_PLATFORM).
>
> I also still think the right thing would have been to set
> ACCESS_PLATFORM for all systems where device can't access all memory.

I understand. The problem with that approach for us is that because we
don't know which guests will become secure guests and which will remain
regular guests, QEMU would need to offer ACCESS_PLATFORM to all guests.

And the problem with that is that for QEMU on POWER, having
ACCESS_PLATFORM turned off means that it can bypass the IOMMU for the
device (which makes sense considering that the name of the flag was
IOMMU_PLATFORM). And we need that for regular guests to avoid
performance degradation.

So while ACCESS_PLATFORM solves our problems for secure guests, we can't
turn it on by default because we can't risk affecting legacy systems. Doing so
would penalize existing systems that can access all memory. They would
all have to unnecessarily go through address translations, and take a
performance hit.

The semantics of ACCESS_PLATFORM assume that the hypervisor/QEMU knows
in advance - right when the VM is instantiated - that it will not have
access to all guest memory. Unfortunately that assumption is subtly
broken on our secure-platform. The hypervisor/QEMU realizes that the
platform is going secure only *after the VM is instantiated*. It's the
kernel running in the VM that determines that it wants to switch the
platform to secure-mode.

Another way of looking at this issue which also explains our reluctance
is that the only difference between a secure guest and a regular guest
(at least regarding virtio) is that the former uses swiotlb while the
latter doesn't. And from the device's point of view they're
indistinguishable. It can't tell one guest that is using swiotlb from
one that isn't. And that implies that secure guest vs regular guest
isn't a virtio interface issue, it's "guest internal affairs". So
there's no reason to reflect that in the feature flags.

That said, we still would like to arrive at a proper design for this
rather than add yet another hack if we can avoid it. So here's another
proposal: considering that the dma-direct code (in kernel/dma/direct.c)
automatically uses swiotlb when necessary (thanks to Christoph's recent
DMA work), would it be ok to replace virtio's own direct-memory code
that is used in the !ACCESS_PLATFORM case with the dma-direct code? That
way we'll get swiotlb even with !ACCESS_PLATFORM, and virtio will get a
code cleanup (replace open-coded stuff with calls to existing
infrastructure).

> But I also think I don't have the energy to argue about power secure
> guest anymore.  So be it for power secure guest since the involved
> engineers disagree with me.  Hey I've been wrong in the past ;).

Yeah, it's been a difficult discussion. Thanks for still engaging!
I honestly thought that this patch was a good solution (if the guest has
encrypted memory it means that the DMA API needs to be used), but I can
see where you are coming from. As I said, we'd like to arrive at a good
solution if possible.

> But the name "sev_active" makes me scared because at least AMD guys who
> were doing the sensible thing and setting ACCESS_PLATFORM

My understanding is that the AMD guest platform knows in advance that its
guest will run in secure mode and hence sets the flag at the time of VM
instantiation. Unfortunately we don't have that luxury on our platforms.

> (unless I'm
> wrong? I remember distinctly that's so) will likely be affected too.
> We don't want that.
>
> So let's find a way to make sure it's just power secure guest for now
> pls.

Yes, my understanding is that they turn ACCESS_PLATFORM on. And because
of 

Re: [PATCH 15/19] KVM: PPC: Book3S HV: add get/set accessors for the source configuration

2019-02-04 Thread Cédric Le Goater
On 2/4/19 6:21 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:27PM +0100, Cédric Le Goater wrote:
>> These are used to capture the XIVE EAS table of the KVM device, the
>> configuration of the source targets.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   | 11 
>>  arch/powerpc/kvm/book3s_xive_native.c | 87 +++
>>  2 files changed, 98 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 1a8740629acf..faf024f39858 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -683,9 +683,20 @@ struct kvm_ppc_cpu_char {
>>  #define   KVM_DEV_XIVE_SAVE_EQ_PAGES    4
>>  #define KVM_DEV_XIVE_GRP_SOURCES    2   /* 64-bit source attributes */
>>  #define KVM_DEV_XIVE_GRP_SYNC       3   /* 64-bit source attributes */
>> +#define KVM_DEV_XIVE_GRP_EAS        4   /* 64-bit eas attributes */
>>  
>>  /* Layout of 64-bit XIVE source attribute values */
>>  #define KVM_XIVE_LEVEL_SENSITIVE(1ULL << 0)
>>  #define KVM_XIVE_LEVEL_ASSERTED (1ULL << 1)
>>  
>> +/* Layout of 64-bit eas attribute values */
>> +#define KVM_XIVE_EAS_PRIORITY_SHIFT 0
>> +#define KVM_XIVE_EAS_PRIORITY_MASK  0x7
>> +#define KVM_XIVE_EAS_SERVER_SHIFT   3
>> +#define KVM_XIVE_EAS_SERVER_MASK    0xfffffff8ULL
>> +#define KVM_XIVE_EAS_MASK_SHIFT 32
>> +#define KVM_XIVE_EAS_MASK_MASK      0x100000000ULL
>> +#define KVM_XIVE_EAS_EISN_SHIFT 33
>> +#define KVM_XIVE_EAS_EISN_MASK      0xfffffffe00000000ULL
>> +
>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index f2de1bcf3b35..0468b605baa7 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -525,6 +525,88 @@ static int kvmppc_xive_native_sync(struct kvmppc_xive 
>> *xive, long irq, u64 addr)
>>  return 0;
>>  }
>>  
>> +static int kvmppc_xive_native_set_eas(struct kvmppc_xive *xive, long irq,
>> +  u64 addr)
> 
> I'd prefer to avoid the name "EAS" here.  IIUC these aren't "raw" EAS
> values, but rather essentially the "source config" in the terminology
> of the PAPR hcalls.  Which, yes, is basically implemented by setting
> the EAS, but since it's the PAPR architected state that we need to
> preserve across migration, I'd prefer to stick as close as we can to
> the PAPR terminology.

But we don't have an equivalent name in the PAPR specs for the tuple 
(prio, server). We could maybe use the generic 'target' name, even 
if that usually refers to a CPU number.

Or IVE (Interrupt Vector Entry), which makes some sense. 
This was the former name in HW. I think we could recycle it for KVM. 
 
C.  

> 
>> +{
>> +struct kvmppc_xive_src_block *sb;
>> +struct kvmppc_xive_irq_state *state;
>> +u64 __user *ubufp = (u64 __user *) addr;
>> +u16 src;
>> +u64 kvm_eas;
>> +u32 server;
>> +u8 priority;
>> +u32 eisn;
>> +
>> +sb = kvmppc_xive_find_source(xive, irq, &src);
>> +if (!sb)
>> +return -ENOENT;
>> +
>> +state = &sb->irq_state[src];
>> +
>> +if (!state->valid)
>> +return -EINVAL;
>> +
>> +if (get_user(kvm_eas, ubufp))
>> +return -EFAULT;
>> +
>> +pr_devel("%s irq=0x%lx eas=%016llx\n", __func__, irq, kvm_eas);
>> +
>> +priority = (kvm_eas & KVM_XIVE_EAS_PRIORITY_MASK) >>
>> +KVM_XIVE_EAS_PRIORITY_SHIFT;
>> +server = (kvm_eas & KVM_XIVE_EAS_SERVER_MASK) >>
>> +KVM_XIVE_EAS_SERVER_SHIFT;
>> +eisn = (kvm_eas & KVM_XIVE_EAS_EISN_MASK) >> KVM_XIVE_EAS_EISN_SHIFT;
>> +
>> +if (priority != xive_prio_from_guest(priority)) {
>> +pr_err("invalid priority for queue %d for VCPU %d\n",
>> +   priority, server);
>> +return -EINVAL;
>> +}
>> +
>> +return kvmppc_xive_native_set_source_config(xive, sb, state, server,
>> +priority, eisn);
>> +}
>> +
>> +static int kvmppc_xive_native_get_eas(struct kvmppc_xive *xive, long irq,
>> +  u64 addr)
>> +{
>> +struct kvmppc_xive_src_block *sb;
>> +struct kvmppc_xive_irq_state *state;
>> +u64 __user *ubufp = (u64 __user *) addr;
>> +u16 src;
>> +u64 kvm_eas;
>> +
>> +sb = kvmppc_xive_find_source(xive, irq, &src);
>> +if (!sb)
>> +return -ENOENT;
>> +
>> +state = &sb->irq_state[src];
>> +
>> +if (!state->valid)
>> +return -EINVAL;
>> +
>> +arch_spin_lock(&sb->lock);
>> +
>> +if (state->act_priority == MASKED)
>> +kvm_eas = KVM_XIVE_EAS_MASK_MASK;
>> +else {
>> +kvm_eas = (state->act_priority << KVM_XIVE_EAS_PRIORITY_SHIFT) &
>> +KVM_XIVE_EAS_PRIORITY_MASK;
>> +kvm_eas |= 

Re: [PATCH 12/19] KVM: PPC: Book3S HV: record guest queue page address

2019-02-04 Thread Cédric Le Goater
On 2/4/19 6:15 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:24PM +0100, Cédric Le Goater wrote:
>> The guest physical address of the event queue will be part of the
>> state to transfer in the migration. Cache its value when the queue is
>> configured; it will save us an OPAL call.
> 
> That doesn't sound like a very compelling case - migration is already
> a hundreds of milliseconds type operation, I wouldn't expect a few
> extra OPAL calls to be an issue.

OK. I don't think this is much of a problem anyhow. Let's call OPAL.

C. 

 
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/xive.h   | 2 ++
>>  arch/powerpc/kvm/book3s_xive_native.c | 4 
>>  2 files changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/xive.h 
>> b/arch/powerpc/include/asm/xive.h
>> index 7a7aa22d8258..e90c3c5d9533 100644
>> --- a/arch/powerpc/include/asm/xive.h
>> +++ b/arch/powerpc/include/asm/xive.h
>> @@ -74,6 +74,8 @@ struct xive_q {
>>  u32 esc_irq;
>>  atomic_tcount;
>>  atomic_tpending_count;
>> +u64 guest_qpage;
>> +u32 guest_qsize;
>>  };
>>  
>>  /* Global enable flags for the XIVE support */
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index 35d806740c3a..4ca75aade069 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -708,6 +708,10 @@ static int kvmppc_h_int_set_queue_config(struct 
>> kvm_vcpu *vcpu,
>>  }
>>  qaddr = page_to_virt(page) + (qpage & ~PAGE_MASK);
>>  
>> +/* Backup queue page address and size for migration */
>> +q->guest_qpage = qpage;
>> +q->guest_qsize = qsize;
>> +
>>  rc = xive_native_configure_queue(xc->vp_id, q, priority,
>>   (__be32 *) qaddr, qsize, true);
>>  if (rc) {
> 



[PATCH RFC v3 06/21] PCI: Pause the devices with movable BARs during rescan

2019-02-04 Thread Sergey Miroshnichenko
Drivers indicate their support of movable BARs by implementing the
new rescan_prepare() and rescan_done() hooks in the struct pci_driver.

All device activity must be stopped during a rescan, and
iounmap()+ioremap() must be applied to every used BAR.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/probe.c | 51 +++--
 include/linux/pci.h |  2 ++
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bbc12934f041..e18d07996cf3 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3172,6 +3172,38 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev 
*bridge)
return max;
 }
 
+static void pci_bus_rescan_prepare(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+
+   list_for_each_entry(dev, &bus->devices, bus_list) {
+   struct pci_bus *child = dev->subordinate;
+
+   if (child) {
+   pci_bus_rescan_prepare(child);
+   } else if (dev->driver &&
+  dev->driver->rescan_prepare) {
+   dev->driver->rescan_prepare(dev);
+   }
+   }
+}
+
+static void pci_bus_rescan_done(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+
+   list_for_each_entry(dev, &bus->devices, bus_list) {
+   struct pci_bus *child = dev->subordinate;
+
+   if (child) {
+   pci_bus_rescan_done(child);
+   } else if (dev->driver &&
+  dev->driver->rescan_done) {
+   dev->driver->rescan_done(dev);
+   }
+   }
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3185,8 +3217,23 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
unsigned int max;
 
-   max = pci_scan_child_bus(bus);
-   pci_assign_unassigned_bus_resources(bus);
+   if (pci_movable_bars_enabled()) {
+   struct pci_bus *root = bus;
+
+   while (!pci_is_root_bus(root))
+   root = root->parent;
+
+   pci_bus_rescan_prepare(root);
+
+   max = pci_scan_child_bus(root);
+   pci_assign_unassigned_root_bus_resources(root);
+
+   pci_bus_rescan_done(root);
+   } else {
+   max = pci_scan_child_bus(bus);
+   pci_assign_unassigned_bus_resources(bus);
+   }
+
pci_bus_add_devices(bus);
 
return max;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ba0b1d0ea2d2..5cd534b6631b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -777,6 +777,8 @@ struct pci_driver {
int  (*resume)(struct pci_dev *dev);/* Device woken up */
void (*shutdown)(struct pci_dev *dev);
int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
+   void (*rescan_prepare)(struct pci_dev *dev);
+   void (*rescan_done)(struct pci_dev *dev);
const struct pci_error_handlers *err_handler;
const struct attribute_group **groups;
struct device_driverdriver;
-- 
2.20.1



Re: [PATCH 14/19] KVM: PPC: Book3S HV: add a control to make the XIVE EQ pages dirty

2019-02-04 Thread Cédric Le Goater
On 2/4/19 6:18 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:26PM +0100, Cédric Le Goater wrote:
>> When the VM is stopped in a migration sequence, the sources are masked
>> and the XIVE IC is synced to stabilize the EQs. When done, the KVM
>> ioctl KVM_DEV_XIVE_SAVE_EQ_PAGES is called to mark dirty the EQ pages.
>>
>> The migration can then transfer the remaining dirty pages to the
>> destination and start collecting the state of the devices.
> 
> Is there a reason to make this a separate step from the SYNC
> operation?

Hmm, apart from letting QEMU orchestrate the migration step by step, no.

We could merge the SYNC and the SAVE_EQ_PAGES in a single KVM operation. 
I think that should be fine. 

However, it does not make sense to call this operation without the VM 
being stopped. I wonder how this can be checked from KVM. Maybe we can't. 

C. 

> 
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  1 +
>>  arch/powerpc/kvm/book3s_xive_native.c | 40 +++
>>  2 files changed, 41 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index f3b859223b80..1a8740629acf 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -680,6 +680,7 @@ struct kvm_ppc_cpu_char {
>>  #define   KVM_DEV_XIVE_GET_ESB_FD   1
>>  #define   KVM_DEV_XIVE_GET_TIMA_FD  2
>>  #define   KVM_DEV_XIVE_VC_BASE  3
>> +#define   KVM_DEV_XIVE_SAVE_EQ_PAGES    4
>>  #define KVM_DEV_XIVE_GRP_SOURCES    2   /* 64-bit source attributes */
>>  #define KVM_DEV_XIVE_GRP_SYNC       3   /* 64-bit source attributes */
>>  
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index a8052867afc1..f2de1bcf3b35 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -373,6 +373,43 @@ static int kvmppc_xive_native_get_tima_fd(struct 
>> kvmppc_xive *xive, u64 addr)
>>  return put_user(ret, ubufp);
>>  }
>>  
>> +static int kvmppc_xive_native_vcpu_save_eq_pages(struct kvm_vcpu *vcpu)
>> +{
>> +struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
>> +unsigned int prio;
>> +
>> +if (!xc)
>> +return -ENOENT;
>> +
>> +for (prio = 0; prio < KVMPPC_XIVE_Q_COUNT; prio++) {
>> +struct xive_q *q = &xc->queues[prio];
>> +
>> +if (!q->qpage)
>> +continue;
>> +
>> +/* Mark EQ page dirty for migration */
>> +mark_page_dirty(vcpu->kvm, gpa_to_gfn(q->guest_qpage));
>> +}
>> +return 0;
>> +}
>> +
>> +static int kvmppc_xive_native_save_eq_pages(struct kvmppc_xive *xive)
>> +{
>> +struct kvm *kvm = xive->kvm;
>> +struct kvm_vcpu *vcpu;
>> +unsigned int i;
>> +
>> +pr_devel("%s\n", __func__);
>> +
>> +mutex_lock(&kvm->lock);
>> +kvm_for_each_vcpu(i, vcpu, kvm) {
>> +kvmppc_xive_native_vcpu_save_eq_pages(vcpu);
>> +}
>> +mutex_unlock(&kvm->lock);
>> +
>> +return 0;
>> +}
>> +
>>  static int xive_native_validate_queue_size(u32 qsize)
>>  {
>>  switch (qsize) {
>> @@ -498,6 +535,8 @@ static int kvmppc_xive_native_set_attr(struct kvm_device 
>> *dev,
>>  switch (attr->attr) {
>>  case KVM_DEV_XIVE_VC_BASE:
>>  return kvmppc_xive_native_set_vc_base(xive, attr->addr);
>> +case KVM_DEV_XIVE_SAVE_EQ_PAGES:
>> +return kvmppc_xive_native_save_eq_pages(xive);
>>  }
>>  break;
>>  case KVM_DEV_XIVE_GRP_SOURCES:
>> @@ -538,6 +577,7 @@ static int kvmppc_xive_native_has_attr(struct kvm_device 
>> *dev,
>>  case KVM_DEV_XIVE_GET_ESB_FD:
>>  case KVM_DEV_XIVE_GET_TIMA_FD:
>>  case KVM_DEV_XIVE_VC_BASE:
>> +case KVM_DEV_XIVE_SAVE_EQ_PAGES:
>>  return 0;
>>  }
>>  break;
> 



[PATCH RFC v3 02/21] PCI: Fix race condition in pci_enable/disable_device()

2019-02-04 Thread Sergey Miroshnichenko
 CPU0  CPU1

 pci_enable_device_mem()   pci_enable_device_mem()
   pci_enable_bridge()   pci_enable_bridge()
 pci_is_enabled()
   return false;
 atomic_inc_return(enable_cnt)
 Start actual enabling the bridge
 ...   pci_is_enabled()
 ... return true;
 ...   Start memory requests <-- FAIL
 ...
 Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

This patch protects the pci_enable/disable_device() and pci_enable_bridge()
with mutexes.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci.c   | 26 ++
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e1fc93c9eea1..3a83e05f8363 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1571,6 +1571,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
struct pci_dev *bridge;
int retval;
 
+   mutex_lock(&dev->enable_mutex);
+
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_enable_bridge(bridge);
@@ -1578,6 +1580,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
if (pci_is_enabled(dev)) {
if (!dev->is_busmaster)
pci_set_master(dev);
+   mutex_unlock(&dev->enable_mutex);
return;
}
 
@@ -1586,11 +1589,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
pci_err(dev, "Error enabling bridge (%d), continuing\n",
retval);
pci_set_master(dev);
+   mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
struct pci_dev *bridge;
+   /* Enable-locking of bridges is performed within pci_enable_bridge() */
+   bool need_lock = !dev->subordinate;
int err;
int i, bars = 0;
 
@@ -1606,8 +1612,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, 
unsigned long flags)
dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
}
 
-   if (atomic_inc_return(&dev->enable_cnt) > 1)
+   if (need_lock)
+   mutex_lock(&dev->enable_mutex);
+   if (pci_is_enabled(dev)) {
+   if (need_lock)
+   mutex_unlock(&dev->enable_mutex);
return 0;   /* already enabled */
+   }
 
bridge = pci_upstream_bridge(dev);
if (bridge)
@@ -1622,8 +1633,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, 
unsigned long flags)
bars |= (1 << i);
 
err = do_pci_enable_device(dev, bars);
-   if (err < 0)
-   atomic_dec(&dev->enable_cnt);
+   if (err >= 0)
+   atomic_inc(&dev->enable_cnt);
+   if (need_lock)
+   mutex_unlock(&dev->enable_mutex);
return err;
 }
 
@@ -1866,15 +1879,20 @@ void pci_disable_device(struct pci_dev *dev)
if (dr)
dr->enabled = 0;
 
+   mutex_lock(&dev->enable_mutex);
dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
  "disabling already-disabled device");
 
-   if (atomic_dec_return(&dev->enable_cnt) != 0)
+   if (atomic_dec_return(&dev->enable_cnt) != 0) {
+   mutex_unlock(&dev->enable_mutex);
return;
+   }
 
do_pci_disable_device(dev);
 
dev->is_busmaster = 0;
+
+   mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 257b9f6f2ebb..bbc12934f041 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2191,6 +2191,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
INIT_LIST_HEAD(>bus_list);
dev->dev.type = &pci_dev_type;
dev->bus = pci_bus_get(bus);
+   mutex_init(&dev->enable_mutex);
 
return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 65f1d8c2f082..28fecfdd598d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -416,6 +416,7 @@ struct pci_dev {
unsigned intno_vf_scan:1;   /* Don't scan for VFs after IOV 
enablement */
pci_dev_flags_t dev_flags;
atomic_tenable_cnt; /* pci_enable_device has been called */
+   struct mutexenable_mutex;
 
u32 saved_config_space[16]; /* Config space saved at 
suspend time */
struct hlist_head saved_cap_space;
-- 
2.20.1



[PATCH RFC v3 05/21] PCI: hotplug: Add a flag for the movable BARs feature

2019-02-04 Thread Sergey Miroshnichenko
If a new PCIe device is hot-plugged between two active ones without a
big enough gap between their BARs, these BARs should be moved
if their drivers support this feature. The drivers should be notified
and paused during the procedure:

1) dev 8 (new)
   |
   v
.. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
.. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |

2) dev 8
 |
 v
.. |  dev 3  |  dev 3  | -->   --> |  dev 5  |  dev 7  |
.. |  BAR 0  |  BAR 1  | -->   --> |  BAR 0  |  BAR 0  |

 3)

.. |  dev 3  |  dev 3  |  dev 8  |  dev 8  |  dev 5  |  dev 7  |
.. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |

Thus, prior reservation of memory regions by BIOS/bootloader/firmware
is not required anymore for PCIe hotplug.

The PCI_MOVABLE_BARS flag is set by the platform if this feature is
supported and tested, but it can be overridden by the following command
line option:
pcie_movable_bars={ off | force }

Signed-off-by: Sergey Miroshnichenko 
---
 .../admin-guide/kernel-parameters.txt |  7 ++
 drivers/pci/pci.c | 24 +++
 include/linux/pci.h   |  2 ++
 3 files changed, 33 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index b799bcf67d7b..2165c4b5aea6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3387,6 +3387,13 @@
nomsi   Do not use MSI for native PCIe PME signaling (this makes
all PCIe root ports use INTx for all services).
 
+   pcie_movable_bars=[PCIE]
+   Override the movable BARs support detection:
+   off
+   Disable even if supported by the platform
+   force
+   Enable even if not explicitly declared as supported
+
pcmv=   [HW,PCMCIA] BadgePAD 4
 
pd_ignore_unused
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 55cf18389c15..096413f9ee67 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
+static bool pcie_movable_bars_off;
+static bool pcie_movable_bars_force;
+static int __init pcie_movable_bars_setup(char *str)
+{
+   if (!strcmp(str, "off"))
+   pcie_movable_bars_off = true;
+   else if (!strcmp(str, "force"))
+   pcie_movable_bars_force = true;
+   return 1;
+}
+__setup("pcie_movable_bars=", pcie_movable_bars_setup);
+
+bool pci_movable_bars_enabled(void)
+{
+   if (pcie_movable_bars_off)
+   return false;
+
+   if (pcie_movable_bars_force)
+   return true;
+
+   return pci_has_flag(PCI_MOVABLE_BARS);
+}
+EXPORT_SYMBOL(pci_movable_bars_enabled);
+
 /* Time to wait after a reset for device to become responsive */
#define PCIE_RESET_READY_POLL_MS 60000
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 28fecfdd598d..ba0b1d0ea2d2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -863,6 +863,7 @@ enum {
PCI_ENABLE_PROC_DOMAINS = 0x0010,   /* Enable domains in /proc */
PCI_COMPAT_DOMAIN_0 = 0x0020,   /* ... except domain 0 */
PCI_SCAN_ALL_PCIE_DEVS  = 0x0040,   /* Scan all, not just dev 0 */
+   PCI_MOVABLE_BARS= 0x0080,   /* Runtime BAR reassign after 
hotplug */
 };
 
 /* These external functions are only available when PCI support is enabled */
@@ -1342,6 +1343,7 @@ unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
 resource_size_t pcibios_window_alignment(struct pci_bus *bus,
 unsigned long type);
+bool pci_movable_bars_enabled(void);
 
 #define PCI_VGA_STATE_CHANGE_BRIDGE (1 << 0)
 #define PCI_VGA_STATE_CHANGE_DECODES (1 << 1)
-- 
2.20.1



[PATCH RFC v3 00/21] PCI: Allow BAR movement during hotplug

2019-02-04 Thread Sergey Miroshnichenko
If the firmware or kernel has arranged memory for PCIe devices in a way that
doesn't provide enough space for BARs of a new hotplugged device, the kernel
can pause the drivers of the "obstructing" devices and move their BARs, so
new BARs can fit into the freed spaces.

When a driver is un-paused by the kernel after the PCIe rescan, it should
check if its BARs had moved, and ioremap() them if needed.

Drivers indicate their support of the feature by implementing the new
rescan_prepare() and rescan_done() hooks in the struct pci_driver. If a
driver doesn't yet support the feature, BARs of its devices will be marked
as immovable by the IORESOURCE_PCI_FIXED flag.

To re-arrange the BARs and bridge windows, this patchset releases all of
them during a rescan and re-assigns them in the same way as during the
initial PCIe topology scan at system boot.

Tested on:
 - x86_64 with "pci=realloc,assign-busses,use_crs pcie_movable_bars=force"
 - ppc64le POWER8 PowerNV (with extra arch-specific patches which will be
   introduced later) with "pci=realloc pcie_movable_bars=force"

Not many platforms and test cases have been covered so far, so anyone
interested is highly welcome to test on their own setups - the more
exotic the better!

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

This patchset is a part of our work on adding support for hotplugging
bridges full of NVME and GPU devices (without special requirements such as
a Hot-Plug Controller, reservation of bus numbers and memory regions by
firmware, etc.). Next patchset will implement the movable bus numbers.

Sergey Miroshnichenko (21):
  PCI: Fix writing invalid BARs during pci_restore_state()
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Enable bridge's I/O and MEM access for hotplugged devices
  PCI: Define PCI-specific version of the release_child_resources()
  PCI: hotplug: Add a flag for the movable BARs feature
  PCI: Pause the devices with movable BARs during rescan
  PCI: Wake up bridges during rescan when movable BARs enabled
  nvme-pci: Handle movable BARs
  PCI: Mark immovable BARs with PCI_FIXED
  PCI: Fix assigning of fixed prefetchable resources
  PCI: Release and reassign the root bridge resources during rescan
  PCI: Don't allow hotplugged devices to steal resources
  PCI: Include fixed BARs into the bus size calculating
  PCI: Don't reserve memory for hotplug when enabled movable BARs
  PCI: Allow the failed resources to be reassigned later
  PCI: Calculate fixed areas of bridge windows based on fixed BARs
  PCI: Calculate boundaries for bridge windows
  PCI: Make sure bridge windows include their fixed BARs
  PCI: Prioritize fixed BAR assigning over the movable ones
  PCI: pciehp: Add support for the movable BARs feature
  powerpc/pci: Fix crash with enabled movable BARs

 .../admin-guide/kernel-parameters.txt |   7 +
 arch/powerpc/platforms/powernv/pci-ioda.c |   3 +-
 drivers/nvme/host/pci.c   |  29 +-
 drivers/pci/bus.c |   7 +-
 drivers/pci/hotplug/pciehp_pci.c  |  14 +-
 drivers/pci/pci.c |  60 +++-
 drivers/pci/pci.h |  26 ++
 drivers/pci/probe.c   | 271 +-
 drivers/pci/setup-bus.c   | 245 ++--
 drivers/pci/setup-res.c   |  43 ++-
 include/linux/pci.h   |  14 +
 11 files changed, 678 insertions(+), 41 deletions(-)

-- 
2.20.1



[PATCH RFC v3 21/21] powerpc/pci: Fix crash with enabled movable BARs

2019-02-04 Thread Sergey Miroshnichenko
Check a resource for the IORESOURCE_UNSET flag.

Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 7db3119f8a5b..5354dfdf1028 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3038,7 +3038,8 @@ static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
int index;
int64_t rc;
 
-   if (!res || !res->flags || res->start > res->end)
+   if (!res || !res->flags || res->start > res->end ||
+   (res->flags & IORESOURCE_UNSET))
return;
 
if (res->flags & IORESOURCE_IO) {
-- 
2.20.1



Re: [PATCH 08/19] KVM: PPC: Book3S HV: add a VC_BASE control to the XIVE native device

2019-02-04 Thread Cédric Le Goater
On 2/4/19 5:49 AM, David Gibson wrote:
> On Wed, Jan 23, 2019 at 05:56:26PM +0100, Cédric Le Goater wrote:
>> On 1/22/19 6:14 AM, Paul Mackerras wrote:
>>> On Mon, Jan 07, 2019 at 07:43:20PM +0100, Cédric Le Goater wrote:
 The ESB MMIO region controls the interrupt sources of the guest. QEMU
 will query an fd (GET_ESB_FD ioctl) and map this region at a specific
 address for the guest to use. The guest will obtain this information
 using the H_INT_GET_SOURCE_INFO hcall. To inform KVM of the address
 setting used by QEMU, add a VC_BASE control to the KVM XIVE device
>>>
>>> This needs a little more explanation.  I *think* the only way this
>>> gets used is that it gets returned to the guest by the new
>>> hypercalls.  If that is indeed the case it would be useful to mention
>>> that in the patch description, because otherwise taking a value that
>>> userspace provides and which looks like it is an address, and not
>>> doing any validation on it, looks a bit scary.
>>
>> I think we have solved this problem in another email thread. 
>>
>> The H_INT_GET_SOURCE_INFO hcall does not need to be implemented in KVM
>> as all the source information should already be available in QEMU. In
>> that case, there is no need to inform KVM of where the ESB pages are 
>> mapped in the guest address space. So we don't need that extra control
>> on the KVM device. This is good news.
> 
> Ah, good to hear.  I thought this looked strange.

Yes. I didn't know which path to choose between HV real mode, HV, and QEMU.
It's clarified now. 

But now we have nested virtualization, and this is adding quite a bit of 
strangeness to the hcall possibilities.

C.  


[PATCH 17/17] powerpc/64s/exception: move head-64.h code to exception-64s.S where it is used

2019-02-04 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/head-64.h   | 252 ---
 arch/powerpc/kernel/exceptions-64s.S | 251 ++
 2 files changed, 251 insertions(+), 252 deletions(-)

diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index 341f23306a80..a466765709a9 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -169,53 +169,6 @@ end_##sname:
 
 #define ABS_ADDR(label) (label - fs_label + fs_start)
 
-/*
- * Following are the BOOK3S exception handler helper macros.
- * Handlers come in a number of types, and each type has a number of varieties.
- *
- * EXC_REAL_* - real, unrelocated exception vectors
- * EXC_VIRT_* - virt (AIL), unrelocated exception vectors
- * TRAMP_REAL_*   - real, unrelocated helpers (virt can call these)
- * TRAMP_VIRT_*   - virt, unreloc helpers (in practice, real can use)
- * TRAMP_KVM  - KVM handlers that get put into real, unrelocated
- * EXC_COMMON - virt, relocated common handlers
- *
- * The EXC handlers are given a name, and branch to name_common, or the
- * appropriate KVM or masking function. Vector handler verieties are as
- * follows:
- *
- * EXC_{REAL|VIRT}_BEGIN/END - used to open-code the exception
- *
- * EXC_{REAL|VIRT}  - standard exception
- *
- * EXC_{REAL|VIRT}_suffix
- * where _suffix is:
- *   - _MASKABLE   - maskable exception
- *   - _OOL- out of line with trampoline to common handler
- *   - _HV - HV exception
- *
- * There can be combinations, e.g., EXC_VIRT_OOL_MASKABLE_HV
- *
- * The one unusual case is __EXC_REAL_OOL_HV_DIRECT, which is
- * an OOL vector that branches to a specified handler rather than the usual
- * trampoline that goes to common. It, and other underscore macros, should
- * be used with care.
- *
- * KVM handlers come in the following verieties:
- * TRAMP_KVM
- * TRAMP_KVM_SKIP
- * TRAMP_KVM_HV
- * TRAMP_KVM_HV_SKIP
- *
- * COMMON handlers come in the following verieties:
- * EXC_COMMON_BEGIN/END - used to open-code the handler
- * EXC_COMMON
- * EXC_COMMON_ASYNC
- *
- * TRAMP_REAL and TRAMP_VIRT can be used with BEGIN/END. KVM
- * and OOL handlers are implemented as types of TRAMP and TRAMP_VIRT handlers.
- */
-
 #define EXC_REAL_BEGIN(name, start, size)  \
	FIXED_SECTION_ENTRY_BEGIN_LOCATION(real_vectors, exc_real_##start##_##name, start, size)
 
@@ -257,211 +210,6 @@ end_##sname:
	FIXED_SECTION_ENTRY_BEGIN_LOCATION(virt_vectors, exc_virt_##start##_##unused, start, size); \
	FIXED_SECTION_ENTRY_END_LOCATION(virt_vectors, exc_virt_##start##_##unused, start, size)
 
-
-#define __EXC_REAL(name, start, size, area)\
-   EXC_REAL_BEGIN(name, start, size);  \
-   SET_SCRATCH0(r13);  /* save r13 */  \
-   EXCEPTION_PROLOG_0 area ;   \
-   EXCEPTION_PROLOG_1 EXC_STD, area, 1, start, 0 ; \
-   EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 ; \
-   EXC_REAL_END(name, start, size)
-
-#define EXC_REAL(name, start, size)\
-   __EXC_REAL(name, start, size, PACA_EXGEN)
-
-#define __EXC_VIRT(name, start, size, realvec, area)   \
-   EXC_VIRT_BEGIN(name, start, size);  \
-   SET_SCRATCH0(r13);/* save r13 */\
-   EXCEPTION_PROLOG_0 area ;   \
-   EXCEPTION_PROLOG_1 EXC_STD, area, 0, realvec, 0;\
-   EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD ;\
-   EXC_VIRT_END(name, start, size)
-
-#define EXC_VIRT(name, start, size, realvec)   \
-   __EXC_VIRT(name, start, size, realvec, PACA_EXGEN)
-
-#define EXC_REAL_MASKABLE(name, start, size, bitmask)  \
-   EXC_REAL_BEGIN(name, start, size);  \
-   SET_SCRATCH0(r13);/* save r13 */\
-   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
-   EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, start, bitmask ; \
-   EXCEPTION_PROLOG_2_REAL name##_common, EXC_STD, 1 ; \
-   EXC_REAL_END(name, start, size)
-
-#define EXC_VIRT_MASKABLE(name, start, size, realvec, bitmask) \
-   EXC_VIRT_BEGIN(name, start, size);  \
-   SET_SCRATCH0(r13);/* save r13 */\
-   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
-   EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, realvec, bitmask ;   \
-   EXCEPTION_PROLOG_2_VIRT name##_common, EXC_STD ;\
-   EXC_VIRT_END(name, start, size)
-
-#define EXC_REAL_HV(name, start, size)

[PATCH 16/17] powerpc/64s/exception: move exception-64s.h code to exception-64s.S where it is used

2019-02-04 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 429 ---
 arch/powerpc/kernel/exceptions-64s.S | 429 +++
 2 files changed, 429 insertions(+), 429 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 3eb8f9a4eac8..d9f48831fbd0 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -149,435 +149,6 @@
RFI_FLUSH_SLOT; \
hrfid;  \
b   hrfi_flush_fallback
-
-/*
- * We're short on space and time in the exception prolog, so we can't
- * use the normal LOAD_REG_IMMEDIATE macro to load the address of label.
- * Instead we get the base of the kernel from paca->kernelbase and or in the low
- * part of label. This requires that the label be within 64KB of kernelbase, and
- * that kernelbase be 64K aligned.
- */
-#define LOAD_HANDLER(reg, label)   \
-   ld  reg,PACAKBASE(r13); /* get high part of  */   \
-   ori reg,reg,FIXED_SYMBOL_ABS_ADDR(label)
-
-#define __LOAD_HANDLER(reg, label) \
-   ld  reg,PACAKBASE(r13); \
-   ori reg,reg,(ABS_ADDR(label))@l
-
-/*
- * Branches from unrelocated code (e.g., interrupts) to labels outside
- * head-y require >64K offsets.
- */
-#define __LOAD_FAR_HANDLER(reg, label) \
-   ld  reg,PACAKBASE(r13); \
-   ori reg,reg,(ABS_ADDR(label))@l;\
-   addis   reg,reg,(ABS_ADDR(label))@h
-
-/* Exception register prefixes */
-#define EXC_HV 1
-#define EXC_STD0
-
-#if defined(CONFIG_RELOCATABLE)
-/*
- * If we support interrupts with relocation on AND we're a relocatable kernel,
- * we need to use CTR to get to the 2nd level handler.  So, save/restore it
- * when required.
- */
-#define SAVE_CTR(reg, area)mfctr   reg ;   std reg,area+EX_CTR(r13)
-#define GET_CTR(reg, area) ld  reg,area+EX_CTR(r13)
-#define RESTORE_CTR(reg, area) ld  reg,area+EX_CTR(r13) ; mtctr reg
-#else
-/* ...else CTR is unused and in register. */
-#define SAVE_CTR(reg, area)
-#define GET_CTR(reg, area) mfctr   reg
-#define RESTORE_CTR(reg, area)
-#endif
-
-/*
- * PPR save/restore macros used in exceptions_64s.S  
- * Used for P7 or later processors
- */
-#define SAVE_PPR(area, ra) \
-BEGIN_FTR_SECTION_NESTED(940)  \
-   ld  ra,area+EX_PPR(r13);/* Read PPR from paca */\
-   std ra,_PPR(r1);\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,940)
-
-#define RESTORE_PPR_PACA(area, ra) \
-BEGIN_FTR_SECTION_NESTED(941)  \
-   ld  ra,area+EX_PPR(r13);\
-   mtspr   SPRN_PPR,ra;\
-END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,941)
-
-/*
- * Get an SPR into a register if the CPU has the given feature
- */
-#define OPT_GET_SPR(ra, spr, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   mfspr   ra,spr; \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Set an SPR from a register if the CPU has the given feature
- */
-#define OPT_SET_SPR(ra, spr, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   mtspr   spr,ra; \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-/*
- * Save a register to the PACA if the CPU has the given feature
- */
-#define OPT_SAVE_REG_TO_PACA(offset, ra, ftr)  \
-BEGIN_FTR_SECTION_NESTED(943)  \
-   std ra,offset(r13); \
-END_FTR_SECTION_NESTED(ftr,ftr,943)
-
-.macro EXCEPTION_PROLOG_0 area
-   GET_PACA(r13)
-   std r9,\area\()+EX_R9(r13)  /* save r9 */
-   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
-   HMT_MEDIUM
-   std r10,\area\()+EX_R10(r13)/* save r10 - r12 */
-   OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
-.endm
-
-.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask
-   OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR)
-   OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR)
-   INTERRUPT_TO_KERNEL
-   SAVE_CTR(r10, \area\())
-   mfcrr9
-   .if \kvm
-   KVMTEST \hsrr \vec
-   .endif
-   .if 

[PATCH 15/17] powerpc/64s/exception: move KVM related code together

2019-02-04 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 4d3bd10ea59a..3eb8f9a4eac8 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -339,18 +339,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #endif
 .endm
 
-
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-/*
- * If hv is possible, interrupts come into to the hv version
- * of the kvmppc_interrupt code, which then jumps to the PR handler,
- * kvmppc_interrupt_pr, if the guest is a PR guest.
- */
-#define kvmppc_interrupt kvmppc_interrupt_hv
-#else
-#define kvmppc_interrupt kvmppc_interrupt_pr
-#endif
-
 /*
  * Branch to label using its 0xC000 address. This results in instruction
  * address suitable for MSR[IR]=0 or 1, which allows relocation to be turned
@@ -375,6 +363,17 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
mtctr   r12;\
bctrl
 
+#else
+#define BRANCH_TO_COMMON(reg, label)   \
+   b   label
+
+#define BRANCH_LINK_TO_FAR(label)  \
+   bl  label
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+
+#ifdef CONFIG_RELOCATABLE
 /*
  * KVM requires __LOAD_FAR_HANDLER.
  *
@@ -391,19 +390,22 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
bctr
 
 #else
-#define BRANCH_TO_COMMON(reg, label)   \
-   b   label
-
-#define BRANCH_LINK_TO_FAR(label)  \
-   bl  label
-
 #define __BRANCH_TO_KVM_EXIT(area, label)  \
ld  r9,area+EX_R9(r13); \
b   label
+#endif
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+/*
+ * If hv is possible, interrupts come into to the hv version
+ * of the kvmppc_interrupt code, which then jumps to the PR handler,
+ * kvmppc_interrupt_pr, if the guest is a PR guest.
+ */
+#define kvmppc_interrupt kvmppc_interrupt_hv
+#else
+#define kvmppc_interrupt kvmppc_interrupt_pr
 #endif
 
-#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 .macro KVMTEST hsrr, n
lbz r10,HSTATE_IN_GUEST(r13)
cmpwi   r10,0
-- 
2.18.0



[PATCH 14/17] powerpc/64s/exception: remove STD_EXCEPTION_COMMON variants

2019-02-04 Thread Nicholas Piggin
These are only called in one place each.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 22 --
 arch/powerpc/include/asm/head-64.h   | 19 +--
 2 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index e7e5c71b1ad0..4d3bd10ea59a 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -557,28 +557,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
EXCEPTION_PROLOG_COMMON_2(area);\
EXCEPTION_PROLOG_COMMON_3(trap)
 
-#define STD_EXCEPTION_COMMON(trap, hdlr)   \
-   EXCEPTION_COMMON(PACA_EXGEN, trap); \
-   bl  save_nvgprs;\
-   RECONCILE_IRQ_STATE(r10, r11);  \
-   addir3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret_from_except
-
-/*
- * Like STD_EXCEPTION_COMMON, but for exceptions that can occur
- * in the idle task and therefore need the special idle handling
- * (finish nap and runlatch)
- */
-#define STD_EXCEPTION_COMMON_ASYNC(trap, hdlr) \
-   EXCEPTION_COMMON(PACA_EXGEN, trap); \
-   FINISH_NAP; \
-   RECONCILE_IRQ_STATE(r10, r11);  \
-   RUNLATCH_ON;\
-   addir3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret_from_except_lite
-
 /*
  * When the idle code in power4_idle puts the CPU into NAP mode,
  * it has to do so in a loop, and relies on the external interrupt
diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index 40096c097c3c..341f23306a80 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -441,11 +441,26 @@ end_##sname:
 
 #define EXC_COMMON(name, realvec, hdlr)					\
EXC_COMMON_BEGIN(name); \
-   STD_EXCEPTION_COMMON(realvec, hdlr)
+   EXCEPTION_COMMON(PACA_EXGEN, realvec);  \
+   bl  save_nvgprs;\
+   RECONCILE_IRQ_STATE(r10, r11);  \
+   addir3,r1,STACK_FRAME_OVERHEAD; \
+   bl  hdlr;   \
+   b   ret_from_except
 
+/*
+ * Like EXC_COMMON, but for exceptions that can occur in the idle task and
+ * therefore need the special idle handling (finish nap and runlatch)
+ */
 #define EXC_COMMON_ASYNC(name, realvec, hdlr)  \
EXC_COMMON_BEGIN(name); \
-   STD_EXCEPTION_COMMON_ASYNC(realvec, hdlr)
+   EXCEPTION_COMMON(PACA_EXGEN, realvec);  \
+   FINISH_NAP; \
+   RECONCILE_IRQ_STATE(r10, r11);  \
+   RUNLATCH_ON;\
+   addir3,r1,STACK_FRAME_OVERHEAD; \
+   bl  hdlr;   \
+   b   ret_from_except_lite
 
 #endif /* __ASSEMBLY__ */
 
-- 
2.18.0



[PATCH 13/17] powerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical place

2019-02-04 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 113 ---
 1 file changed, 57 insertions(+), 56 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index f8f657321c88..e7e5c71b1ad0 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -174,62 +174,6 @@
ori reg,reg,(ABS_ADDR(label))@l;\
addis   reg,reg,(ABS_ADDR(label))@h
 
-.macro EXCEPTION_PROLOG_2_REAL label, hsrr, set_ri
-   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
-   .if ! \set_ri
-   xorir10,r10,MSR_RI  /* Clear MSR_RI */
-   .endif
-   .if \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   .endif
-   LOAD_HANDLER(r12, \label\())
-   .if \hsrr
-   mtspr   SPRN_HSRR0,r12
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   HRFI_TO_KERNEL
-   .else
-   mtspr   SPRN_SRR0,r12
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   RFI_TO_KERNEL
-   .endif
-   b   .   /* prevent speculative execution */
-.endm
-
-.macro EXCEPTION_PROLOG_2_VIRT label, hsrr
-#ifdef CONFIG_RELOCATABLE
-   .if \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   .endif
-   LOAD_HANDLER(r12, \label\())
-   mtctr   r12
-   .if \hsrr
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   .else
-   mfspr   r12,SPRN_SRR1   /* and HSRR1 */
-   .endif
-   li  r10,MSR_RI
-   mtmsrd  r10,1   /* Set RI (EE=0) */
-   bctr
-#else
-   .if \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   .endif
-   li  r10,MSR_RI
-   mtmsrd  r10,1   /* Set RI (EE=0) */
-   b   label
-#endif
-.endm
-
 /* Exception register prefixes */
 #define EXC_HV 1
 #define EXC_STD0
@@ -339,6 +283,63 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,\area\()+EX_R13(r13)
 .endm
 
+.macro EXCEPTION_PROLOG_2_REAL label, hsrr, set_ri
+   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
+   .if ! \set_ri
+   xorir10,r10,MSR_RI  /* Clear MSR_RI */
+   .endif
+   .if \hsrr
+   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
+   .else
+   mfspr   r11,SPRN_SRR0   /* save SRR0 */
+   .endif
+   LOAD_HANDLER(r12, \label\())
+   .if \hsrr
+   mtspr   SPRN_HSRR0,r12
+   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
+   mtspr   SPRN_HSRR1,r10
+   HRFI_TO_KERNEL
+   .else
+   mtspr   SPRN_SRR0,r12
+   mfspr   r12,SPRN_SRR1   /* and SRR1 */
+   mtspr   SPRN_SRR1,r10
+   RFI_TO_KERNEL
+   .endif
+   b   .   /* prevent speculative execution */
+.endm
+
+.macro EXCEPTION_PROLOG_2_VIRT label, hsrr
+#ifdef CONFIG_RELOCATABLE
+   .if \hsrr
+   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
+   .else
+   mfspr   r11,SPRN_SRR0   /* save SRR0 */
+   .endif
+   LOAD_HANDLER(r12, \label\())
+   mtctr   r12
+   .if \hsrr
+   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
+   .else
+   mfspr   r12,SPRN_SRR1   /* and HSRR1 */
+   .endif
+   li  r10,MSR_RI
+   mtmsrd  r10,1   /* Set RI (EE=0) */
+   bctr
+#else
+   .if \hsrr
+   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
+   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
+   .else
+   mfspr   r11,SPRN_SRR0   /* save SRR0 */
+   mfspr   r12,SPRN_SRR1   /* and SRR1 */
+   .endif
+   li  r10,MSR_RI
+   mtmsrd  r10,1   /* Set RI (EE=0) */
+   b   label
+#endif
+.endm
+
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * If hv is possible, interrupts come into to the hv version
-- 
2.18.0



[PATCH 12/17] powerpc/64s/exception: unwind exception-64s.h macros

2019-02-04 Thread Nicholas Piggin
Many of these macros expand to just 1-4 lines and are called only a
few times each at most, often just once. Remove this indirection.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 101 ---
 arch/powerpc/include/asm/head-64.h   |  76 -
 arch/powerpc/kernel/exceptions-64s.S |  50 +--
 3 files changed, 78 insertions(+), 149 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 05e8aff58d96..f8f657321c88 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -230,17 +230,6 @@
 #endif
 .endm
 
-/*
- * As EXCEPTION_PROLOG(), except we've already got relocation on so no need to
- * rfid. Save CTR in case we're CONFIG_RELOCATABLE, in which case
- * EXCEPTION_PROLOG_2_VIRT will be using CTR.
- */
-#define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\
-   SET_SCRATCH0(r13);  /* save r13 */  \
-   EXCEPTION_PROLOG_0 area ;   \
-   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
-   EXCEPTION_PROLOG_2_VIRT label, hsrr
-
 /* Exception register prefixes */
 #define EXC_HV 1
 #define EXC_STD0
@@ -350,12 +339,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,\area\()+EX_R13(r13)
 .endm
 
-#define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec)  \
-   SET_SCRATCH0(r13);  /* save r13 */  \
-   EXCEPTION_PROLOG_0 area ;   \
-   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
-   EXCEPTION_PROLOG_2_REAL label, hsrr, 1
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
  * If hv is possible, interrupts come into to the hv version
@@ -419,12 +402,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #endif
 
-/* Do not enable RI */
-#define EXCEPTION_PROLOG_NORI(area, label, hsrr, kvm, vec) \
-   EXCEPTION_PROLOG_0 area ;   \
-   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
-   EXCEPTION_PROLOG_2_REAL label, hsrr, 0
-
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 .macro KVMTEST hsrr, n
lbz r10,HSTATE_IN_GUEST(r13)
@@ -560,84 +537,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,RESULT(r1); /* clear regs->result   */ \
std r11,STACK_FRAME_OVERHEAD-16(r1); /* mark the frame  */
 
-/*
- * Exception vectors.
- */
-#define STD_EXCEPTION(vec, label)  \
-   EXCEPTION_PROLOG(PACA_EXGEN, label, EXC_STD, 1, vec);
-
-/* Version of above for when we have to branch out-of-line */
-#define __OOL_EXCEPTION(vec, label, hdlr)  \
-   SET_SCRATCH0(r13);  \
-   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
-   b hdlr
-
-#define STD_EXCEPTION_OOL(vec, label)  \
-   EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, 0 ; \
-   EXCEPTION_PROLOG_2_REAL label, EXC_STD, 1
-
-#define STD_EXCEPTION_HV(loc, vec, label)  \
-   EXCEPTION_PROLOG(PACA_EXGEN, label, EXC_HV, 1, vec)
-
-#define STD_EXCEPTION_HV_OOL(vec, label)   \
-   EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, 0 ;  \
-   EXCEPTION_PROLOG_2_REAL label, EXC_HV, 1
-
-#define STD_RELON_EXCEPTION(loc, vec, label)   \
-   /* No guest interrupts come through here */ \
-   EXCEPTION_RELON_PROLOG(PACA_EXGEN, label, EXC_STD, 0, vec)
-
-#define STD_RELON_EXCEPTION_OOL(vec, label)\
-   EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 0, vec, 0 ; \
-   EXCEPTION_PROLOG_2_VIRT label, EXC_STD
-
-#define STD_RELON_EXCEPTION_HV(loc, vec, label)\
-   EXCEPTION_RELON_PROLOG(PACA_EXGEN, label, EXC_HV, 1, vec)
-
-#define STD_RELON_EXCEPTION_HV_OOL(vec, label) \
-   EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, 0 ;  \
-   EXCEPTION_PROLOG_2_VIRT label, EXC_HV
-
-#define __MASKABLE_EXCEPTION(vec, label, hsrr, kvm, bitmask)   \
-   SET_SCRATCH0(r13);/* save r13 */\
-   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
-   EXCEPTION_PROLOG_1 hsrr, PACA_EXGEN, kvm, vec, bitmask ;\
-   EXCEPTION_PROLOG_2_REAL label, hsrr, 1
-
-#define MASKABLE_EXCEPTION(vec, label, bitmask)				\
-   __MASKABLE_EXCEPTION(vec, label, EXC_STD, 1, bitmask)
-
-#define MASKABLE_EXCEPTION_OOL(vec, label, bitmask)\
-   EXCEPTION_PROLOG_1 EXC_STD, PACA_EXGEN, 1, vec, bitmask ;   \
-   EXCEPTION_PROLOG_2_REAL label, EXC_STD, 1
-
-#define MASKABLE_EXCEPTION_HV(vec, label, bitmask) \
-   __MASKABLE_EXCEPTION(vec, 

[PATCH 11/17] powerpc/64s/exception: Move EXCEPTION_COMMON additions into callers

2019-02-04 Thread Nicholas Piggin
More cases of code insertion via macros that do not add a great
deal. All the additions have to be specified in the macro arguments,
so they can just as well go after the macro.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 42 +++---
 arch/powerpc/include/asm/head-64.h   |  4 +--
 arch/powerpc/kernel/exceptions-64s.S | 45 +---
 3 files changed, 39 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 676c877f6190..05e8aff58d96 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -638,21 +638,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
EXCEPTION_PROLOG_1 EXC_HV, PACA_EXGEN, 1, vec, bitmask ;\
EXCEPTION_PROLOG_2_VIRT label, EXC_HV
 
-/*
- * Our exception common code can be passed various "additions"
- * to specify the behaviour of interrupts, whether to kick the
- * runlatch, etc...
- */
-
-/*
- * This addition reconciles our actual IRQ state with the various software
- * flags that track it. This may call C code.
- */
-#define ADD_RECONCILE  RECONCILE_IRQ_STATE(r10,r11)
-
-#define ADD_NVGPRS \
-   bl  save_nvgprs
-
 #define RUNLATCH_ON\
 BEGIN_FTR_SECTION  \
CURRENT_THREAD_INFO(r3, r1);\
@@ -661,24 +646,21 @@ BEGIN_FTR_SECTION \
beqlppc64_runlatch_on_trampoline;   \
 END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 
-#define EXCEPTION_COMMON(area, trap, label, additions) \
+#define EXCEPTION_COMMON(area, trap)   \
EXCEPTION_PROLOG_COMMON(trap, area);\
-   /* Volatile regs are potentially clobbered here */  \
-   additions
 
 /*
- * Exception where stack is already set in r1, r1 is saved in r10, and it
- * continues rather than returns.
+ * Exception where stack is already set in r1, r1 is saved in r10
  */
-#define EXCEPTION_COMMON_NORET_STACK(area, trap, label, additions) \
+#define EXCEPTION_COMMON_STACK(area, trap) \
EXCEPTION_PROLOG_COMMON_1();\
EXCEPTION_PROLOG_COMMON_2(area);\
-   EXCEPTION_PROLOG_COMMON_3(trap);\
-   /* Volatile regs are potentially clobbered here */  \
-   additions
+   EXCEPTION_PROLOG_COMMON_3(trap)
 
-#define STD_EXCEPTION_COMMON(trap, label, hdlr)\
-   EXCEPTION_COMMON(PACA_EXGEN, trap, label, ADD_NVGPRS;ADD_RECONCILE); \
+#define STD_EXCEPTION_COMMON(trap, hdlr)   \
+   EXCEPTION_COMMON(PACA_EXGEN, trap); \
+   bl  save_nvgprs;\
+   RECONCILE_IRQ_STATE(r10, r11);  \
addir3,r1,STACK_FRAME_OVERHEAD; \
bl  hdlr;   \
b   ret_from_except
@@ -688,9 +670,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
  * in the idle task and therefore need the special idle handling
  * (finish nap and runlatch)
  */
-#define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr)  \
-   EXCEPTION_COMMON(PACA_EXGEN, trap, label,   \
-   FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON);  \
+#define STD_EXCEPTION_COMMON_ASYNC(trap, hdlr) \
+   EXCEPTION_COMMON(PACA_EXGEN, trap); \
+   FINISH_NAP; \
+   RECONCILE_IRQ_STATE(r10, r11);  \
+   RUNLATCH_ON;\
addir3,r1,STACK_FRAME_OVERHEAD; \
bl  hdlr;   \
b   ret_from_except_lite
diff --git a/arch/powerpc/include/asm/head-64.h b/arch/powerpc/include/asm/head-64.h
index bdd67a26e959..acd94fcf9f40 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -403,11 +403,11 @@ end_##sname:
 
 #define EXC_COMMON(name, realvec, hdlr)					\
EXC_COMMON_BEGIN(name); \
-   STD_EXCEPTION_COMMON(realvec, name, hdlr)
+   STD_EXCEPTION_COMMON(realvec, hdlr)
 
 #define EXC_COMMON_ASYNC(name, realvec, hdlr)  \
EXC_COMMON_BEGIN(name); \
-   STD_EXCEPTION_COMMON_ASYNC(realvec, name, hdlr)
+   STD_EXCEPTION_COMMON_ASYNC(realvec, hdlr)
 
 #endif /* __ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 6f0c270f45e4..346df79dca6a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -149,21 +149,6 @@ 

[PATCH 10/17] powerpc/64s/exception: Move EXCEPTION_COMMON handler and return branches into callers

2019-02-04 Thread Nicholas Piggin
The aim is to reduce the amount of indirection it takes to get through
the exception handler macros, particularly where it provides little
code sharing.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 26 
 arch/powerpc/kernel/exceptions-64s.S | 21 +++
 2 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index 35efb46c6f5f..676c877f6190 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -661,30 +661,27 @@ BEGIN_FTR_SECTION \
beqlppc64_runlatch_on_trampoline;   \
 END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
 
-#define EXCEPTION_COMMON(area, trap, label, hdlr, ret, additions) \
+#define EXCEPTION_COMMON(area, trap, label, additions) \
EXCEPTION_PROLOG_COMMON(trap, area);\
/* Volatile regs are potentially clobbered here */  \
-   additions;  \
-   addir3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr;   \
-   b   ret
+   additions
 
 /*
  * Exception where stack is already set in r1, r1 is saved in r10, and it
  * continues rather than returns.
  */
-#define EXCEPTION_COMMON_NORET_STACK(area, trap, label, hdlr, additions) \
+#define EXCEPTION_COMMON_NORET_STACK(area, trap, label, additions) \
EXCEPTION_PROLOG_COMMON_1();\
EXCEPTION_PROLOG_COMMON_2(area);\
EXCEPTION_PROLOG_COMMON_3(trap);\
/* Volatile regs are potentially clobbered here */  \
-   additions;  \
-   addir3,r1,STACK_FRAME_OVERHEAD; \
-   bl  hdlr
+   additions
 
 #define STD_EXCEPTION_COMMON(trap, label, hdlr)\
-   EXCEPTION_COMMON(PACA_EXGEN, trap, label, hdlr, \
-   ret_from_except, ADD_NVGPRS;ADD_RECONCILE)
+   EXCEPTION_COMMON(PACA_EXGEN, trap, label, ADD_NVGPRS;ADD_RECONCILE); \
+   addir3,r1,STACK_FRAME_OVERHEAD; \
+   bl  hdlr;   \
+   b   ret_from_except
 
 /*
  * Like STD_EXCEPTION_COMMON, but for exceptions that can occur
@@ -692,8 +689,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_CTRL)
  * (finish nap and runlatch)
  */
 #define STD_EXCEPTION_COMMON_ASYNC(trap, label, hdlr)  \
-   EXCEPTION_COMMON(PACA_EXGEN, trap, label, hdlr, \
-   ret_from_except_lite, FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON)
+   EXCEPTION_COMMON(PACA_EXGEN, trap, label,   \
+   FINISH_NAP;ADD_RECONCILE;RUNLATCH_ON);  \
+   addir3,r1,STACK_FRAME_OVERHEAD; \
+   bl  hdlr;   \
+   b   ret_from_except_lite
 
 /*
  * When the idle code in power4_idle puts the CPU into NAP mode,
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index d6316725a43b..6f0c270f45e4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -180,9 +180,10 @@ EXC_COMMON_BEGIN(system_reset_common)
mr  r10,r1
ld  r1,PACA_NMI_EMERG_SP(r13)
subir1,r1,INT_FRAME_SIZE
-   EXCEPTION_COMMON_NORET_STACK(PACA_EXNMI, 0x100,
-   system_reset, system_reset_exception,
-   ADD_NVGPRS;ADD_RECONCILE_NMI)
+   EXCEPTION_COMMON_NORET_STACK(PACA_EXNMI, 0x100, system_reset,
+   ADD_NVGPRS;ADD_RECONCILE_NMI)
+   addir3,r1,STACK_FRAME_OVERHEAD
+   bl  system_reset_exception
 
/* This (and MCE) can be simplified with mtmsrd L=1 */
/* Clear MSR_RI before setting SRR0 and SRR1. */
@@ -1090,8 +1091,11 @@ hmi_exception_after_realmode:
b   tramp_real_hmi_exception
 
 EXC_COMMON_BEGIN(hmi_exception_common)
-EXCEPTION_COMMON(PACA_EXGEN, 0xe60, hmi_exception_common, handle_hmi_exception,
-ret_from_except, FINISH_NAP;ADD_NVGPRS;ADD_RECONCILE;RUNLATCH_ON)
+EXCEPTION_COMMON(PACA_EXGEN, 0xe60, hmi_exception_common,
+   FINISH_NAP;ADD_NVGPRS;ADD_RECONCILE;RUNLATCH_ON)
+   addir3,r1,STACK_FRAME_OVERHEAD
+   bl  handle_hmi_exception
+   b   ret_from_except
 
 EXC_REAL_OOL_MASKABLE_HV(h_doorbell, 0xe80, 0x20, IRQS_DISABLED)
 EXC_VIRT_OOL_MASKABLE_HV(h_doorbell, 0x4e80, 0x20, 0xe80, IRQS_DISABLED)
@@ -1386,9 +1390,10 @@ EXC_COMMON_BEGIN(soft_nmi_common)
mr  r10,r1
ld  r1,PACAEMERGSP(r13)
subir1,r1,INT_FRAME_SIZE
-   EXCEPTION_COMMON_NORET_STACK(PACA_EXGEN, 0x900,
-   system_reset, 

[PATCH 09/17] powerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for consistency with others

2019-02-04 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 25 
 arch/powerpc/kernel/exceptions-64s.S | 12 ++--
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
index c78d9b1bf22d..35efb46c6f5f 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -237,7 +237,7 @@
  */
 #define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\
SET_SCRATCH0(r13);  /* save r13 */  \
-   EXCEPTION_PROLOG_0(area);   \
+   EXCEPTION_PROLOG_0 area ;   \
EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
EXCEPTION_PROLOG_2_VIRT label, hsrr
 
@@ -301,13 +301,14 @@ BEGIN_FTR_SECTION_NESTED(943)				\
std ra,offset(r13); \
 END_FTR_SECTION_NESTED(ftr,ftr,943)
 
-#define EXCEPTION_PROLOG_0(area)   \
-   GET_PACA(r13);  \
-   std r9,area+EX_R9(r13); /* save r9 */   \
-   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR); \
-   HMT_MEDIUM; \
-   std r10,area+EX_R10(r13);   /* save r10 - r12 */\
+.macro EXCEPTION_PROLOG_0 area
+   GET_PACA(r13)
+   std r9,\area\()+EX_R9(r13)  /* save r9 */
+   OPT_GET_SPR(r9, SPRN_PPR, CPU_FTR_HAS_PPR)
+   HMT_MEDIUM
+   std r10,\area\()+EX_R10(r13)/* save r10 - r12 */
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
+.endm
 
 .macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask
OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR)
@@ -351,7 +352,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec)  \
SET_SCRATCH0(r13);  /* save r13 */  \
-   EXCEPTION_PROLOG_0(area);   \
+   EXCEPTION_PROLOG_0 area ;   \
EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
EXCEPTION_PROLOG_2_REAL label, hsrr, 1
 
@@ -420,7 +421,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 /* Do not enable RI */
 #define EXCEPTION_PROLOG_NORI(area, label, hsrr, kvm, vec) \
-   EXCEPTION_PROLOG_0(area);   \
+   EXCEPTION_PROLOG_0 area ;   \
EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
EXCEPTION_PROLOG_2_REAL label, hsrr, 0
 
@@ -568,7 +569,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 /* Version of above for when we have to branch out-of-line */
 #define __OOL_EXCEPTION(vec, label, hdlr)  \
SET_SCRATCH0(r13);  \
-   EXCEPTION_PROLOG_0(PACA_EXGEN); \
+   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
b hdlr
 
 #define STD_EXCEPTION_OOL(vec, label)  \
@@ -599,7 +600,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define __MASKABLE_EXCEPTION(vec, label, hsrr, kvm, bitmask)   \
SET_SCRATCH0(r13);/* save r13 */\
-   EXCEPTION_PROLOG_0(PACA_EXGEN); \
+   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
EXCEPTION_PROLOG_1 hsrr, PACA_EXGEN, kvm, vec, bitmask ;\
EXCEPTION_PROLOG_2_REAL label, hsrr, 1
 
@@ -619,7 +620,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define __MASKABLE_RELON_EXCEPTION(vec, label, hsrr, kvm, bitmask) \
SET_SCRATCH0(r13);/* save r13 */\
-   EXCEPTION_PROLOG_0(PACA_EXGEN); \
+   EXCEPTION_PROLOG_0 PACA_EXGEN ; \
EXCEPTION_PROLOG_1 hsrr, PACA_EXGEN, kvm, vec, bitmask ;\
EXCEPTION_PROLOG_2_VIRT label, hsrr
 
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index ebec428f8791..d6316725a43b 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -100,7 +100,7 @@ EXC_VIRT_NONE(0x4000, 0x100)
 
 EXC_REAL_BEGIN(system_reset, 0x100, 0x100)
SET_SCRATCH0(r13)
-   EXCEPTION_PROLOG_0(PACA_EXNMI)
+   EXCEPTION_PROLOG_0 PACA_EXNMI
 
/* This is EXCEPTION_PROLOG_1 with the idle feature section added */
OPT_SAVE_REG_TO_PACA(PACA_EXNMI+EX_PPR, r9, CPU_FTR_HAS_PPR)
@@ -251,7 +251,7 @@ EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
 * vector
 */
SET_SCRATCH0(r13) 

[PATCH 08/17] powerpc/64s/exception: KVM handler can set the HSRR trap bit

2019-02-04 Thread Nicholas Piggin
Move the KVM trap HSRR bit into the KVM handler, where it can be
conditionally applied when the hsrr parameter is set.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 5 +
 arch/powerpc/include/asm/head-64.h   | 7 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index cecef7166a0c..c78d9b1bf22d 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -453,7 +453,12 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
ld  r10,\area+EX_R10(r13)
std r12,HSTATE_SCRATCH0(r13)
sldir12,r9,32
+   /* HSRR variants have the 0x2 bit added to their trap number */
+   .if \hsrr
+   ori r12,r12,(\n + 0x2)
+   .else
ori r12,r12,(\n)
+   .endif
/* This reloads r9 before branching to kvmppc_interrupt */
__BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt)
 
diff --git a/arch/powerpc/include/asm/head-64.h 
b/arch/powerpc/include/asm/head-64.h
index 518d9758b41e..bdd67a26e959 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -393,16 +393,13 @@ end_##sname:
TRAMP_KVM_BEGIN(do_kvm_##n);\
KVM_HANDLER area, EXC_STD, n, 1
 
-/*
- * HV variant exceptions get the 0x2 bit added to their trap number.
- */
 #define TRAMP_KVM_HV(area, n)  \
TRAMP_KVM_BEGIN(do_kvm_H##n);   \
-   KVM_HANDLER area, EXC_HV, n + 0x2, 0
+   KVM_HANDLER area, EXC_HV, n, 0
 
 #define TRAMP_KVM_HV_SKIP(area, n) \
TRAMP_KVM_BEGIN(do_kvm_H##n);   \
-   KVM_HANDLER area, EXC_HV, n + 0x2, 1
+   KVM_HANDLER area, EXC_HV, n, 1
 
 #define EXC_COMMON(name, realvec, hdlr)
\
EXC_COMMON_BEGIN(name); \
-- 
2.18.0



[PATCH 07/17] powerpc/64s/exception: merge KVM handler and skip variants

2019-02-04 Thread Nicholas Piggin
Conditionally expand the skip case if it is specified.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 28 +---
 arch/powerpc/include/asm/head-64.h   |  8 +++
 arch/powerpc/kernel/exceptions-64s.S |  2 +-
 3 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index adfbc5a0f267..cecef7166a0c 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -435,26 +435,17 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
.endif
 .endm
 
-.macro KVM_HANDLER area, hsrr, n
+.macro KVM_HANDLER area, hsrr, n, skip
+   .if \skip
+   cmpwi   r10,KVM_GUEST_MODE_SKIP
+   beq 89f
+   .else
BEGIN_FTR_SECTION_NESTED(947)
ld  r10,\area+EX_CFAR(r13)
std r10,HSTATE_CFAR(r13)
END_FTR_SECTION_NESTED(CPU_FTR_CFAR,CPU_FTR_CFAR,947)
-   BEGIN_FTR_SECTION_NESTED(948)
-   ld  r10,\area+EX_PPR(r13)
-   std r10,HSTATE_PPR(r13)
-   END_FTR_SECTION_NESTED(CPU_FTR_HAS_PPR,CPU_FTR_HAS_PPR,948)
-   ld  r10,\area+EX_R10(r13)
-   std r12,HSTATE_SCRATCH0(r13)
-   sldir12,r9,32
-   ori r12,r12,(\n)
-   /* This reloads r9 before branching to kvmppc_interrupt */
-   __BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt)
-.endm
+   .endif
 
-.macro KVM_HANDLER_SKIP area, hsrr, n
-   cmpwi   r10,KVM_GUEST_MODE_SKIP
-   beq 89f
BEGIN_FTR_SECTION_NESTED(948)
ld  r10,\area+EX_PPR(r13)
std r10,HSTATE_PPR(r13)
@@ -465,6 +456,8 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
ori r12,r12,(\n)
/* This reloads r9 before branching to kvmppc_interrupt */
__BRANCH_TO_KVM_EXIT(\area, kvmppc_interrupt)
+
+   .if \skip
 89:mtocrf  0x80,r9
ld  r9,\area+EX_R9(r13)
ld  r10,\area+EX_R10(r13)
@@ -473,14 +466,13 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
.else
b   kvmppc_skip_interrupt
.endif
+   .endif
 .endm
 
 #else
 .macro KVMTEST hsrr, n
 .endm
-.macro KVM_HANDLER area, hsrr, n
-.endm
-.macro KVM_HANDLER_SKIP area, hsrr, n
+.macro KVM_HANDLER area, hsrr, n, skip
 .endm
 #endif
 
diff --git a/arch/powerpc/include/asm/head-64.h 
b/arch/powerpc/include/asm/head-64.h
index 4767d6c7b8fa..518d9758b41e 100644
--- a/arch/powerpc/include/asm/head-64.h
+++ b/arch/powerpc/include/asm/head-64.h
@@ -387,22 +387,22 @@ end_##sname:
 
 #define TRAMP_KVM(area, n) \
TRAMP_KVM_BEGIN(do_kvm_##n);\
-   KVM_HANDLER area, EXC_STD, n
+   KVM_HANDLER area, EXC_STD, n, 0
 
 #define TRAMP_KVM_SKIP(area, n)
\
TRAMP_KVM_BEGIN(do_kvm_##n);\
-   KVM_HANDLER_SKIP area, EXC_STD, n
+   KVM_HANDLER area, EXC_STD, n, 1
 
 /*
  * HV variant exceptions get the 0x2 bit added to their trap number.
  */
 #define TRAMP_KVM_HV(area, n)  \
TRAMP_KVM_BEGIN(do_kvm_H##n);   \
-   KVM_HANDLER area, EXC_HV, n + 0x2
+   KVM_HANDLER area, EXC_HV, n + 0x2, 0
 
 #define TRAMP_KVM_HV_SKIP(area, n) \
TRAMP_KVM_BEGIN(do_kvm_H##n);   \
-   KVM_HANDLER_SKIP area, EXC_HV, n + 0x2
+   KVM_HANDLER area, EXC_HV, n + 0x2, 1
 
 #define EXC_COMMON(name, realvec, hdlr)
\
EXC_COMMON_BEGIN(name); \
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index fd6c4748cfa2..ebec428f8791 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -983,7 +983,7 @@ TRAMP_KVM_BEGIN(do_kvm_0xc00)
SET_SCRATCH0(r10)
std r9,PACA_EXGEN+EX_R9(r13)
mfcrr9
-   KVM_HANDLER PACA_EXGEN, EXC_STD, 0xc00
+   KVM_HANDLER PACA_EXGEN, EXC_STD, 0xc00, 0
 #endif
 
 
-- 
2.18.0



[PATCH 06/17] powerpc/64s/exception: consolidate maskable and non-maskable prologs

2019-02-04 Thread Nicholas Piggin
Conditionally expand the soft-masking test if a mask is passed in.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 113 +--
 arch/powerpc/kernel/exceptions-64s.S |   8 +-
 2 files changed, 49 insertions(+), 72 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 62502c8f6b18..adfbc5a0f267 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -238,7 +238,7 @@
 #define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
-   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ;   \
+   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;\
EXCEPTION_PROLOG_2_VIRT label, hsrr
 
 /* Exception register prefixes */
@@ -309,73 +309,50 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,area+EX_R10(r13);   /* save r10 - r12 */\
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
 
-#define __EXCEPTION_PROLOG_1_PRE(area) \
-   OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR); \
-   OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);  \
-   INTERRUPT_TO_KERNEL;\
-   SAVE_CTR(r10, area);\
+.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask
+   OPT_SAVE_REG_TO_PACA(\area\()+EX_PPR, r9, CPU_FTR_HAS_PPR)
+   OPT_SAVE_REG_TO_PACA(\area\()+EX_CFAR, r10, CPU_FTR_CFAR)
+   INTERRUPT_TO_KERNEL
+   SAVE_CTR(r10, \area\())
mfcrr9
-
-#define __EXCEPTION_PROLOG_1_POST(area)
\
-   std r11,area+EX_R11(r13);   \
-   std r12,area+EX_R12(r13);   \
-   GET_SCRATCH0(r10);  \
-   std r10,area+EX_R13(r13)
-
-/*
- * This version of the EXCEPTION_PROLOG_1 will carry
- * addition parameter called "bitmask" to support
- * checking of the interrupt maskable level.
- * Intended to be used in MASKABLE_EXCPETION_* macros.
- */
-.macro MASKABLE_EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask
-   __EXCEPTION_PROLOG_1_PRE(\area\())
.if \kvm
KVMTEST \hsrr \vec
.endif
-
-   lbz r10,PACAIRQSOFTMASK(r13)
-   andi.   r10,r10,\bitmask
-   /* This associates vector numbers with bits in paca->irq_happened */
-   .if \vec == 0x500 || \vec == 0xea0
-   li  r10,PACA_IRQ_EE
-   .elseif \vec == 0x900 || \vec == 0xea0
-   li  r10,PACA_IRQ_DEC
-   .elseif \vec == 0xa00 || \vec == 0xe80
-   li  r10,PACA_IRQ_DBELL
-   .elseif \vec == 0xe60
-   li  r10,PACA_IRQ_HMI
-   .elseif \vec == 0xf00
-   li  r10,PACA_IRQ_PMI
-   .else
-   .abort "Bad maskable vector"
+   .if \bitmask
+   lbz r10,PACAIRQSOFTMASK(r13)
+   andi.   r10,r10,\bitmask
+   /* Associate vector numbers with bits in paca->irq_happened */
+   .if \vec == 0x500 || \vec == 0xea0
+   li  r10,PACA_IRQ_EE
+   .elseif \vec == 0x900 || \vec == 0xea0
+   li  r10,PACA_IRQ_DEC
+   .elseif \vec == 0xa00 || \vec == 0xe80
+   li  r10,PACA_IRQ_DBELL
+   .elseif \vec == 0xe60
+   li  r10,PACA_IRQ_HMI
+   .elseif \vec == 0xf00
+   li  r10,PACA_IRQ_PMI
+   .else
+   .abort "Bad maskable vector"
+   .endif
+
+   .if \hsrr
+   bne masked_Hinterrupt
+   .else
+   bne masked_interrupt
+   .endif
.endif
 
-   .if \hsrr
-   bne masked_Hinterrupt
-   .else
-   bne masked_interrupt
-   .endif
-
-   __EXCEPTION_PROLOG_1_POST(\area\())
-.endm
-
-/*
- * This version of the EXCEPTION_PROLOG_1 is intended
- * to be used in STD_EXCEPTION* macros
- */
-.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec
-   __EXCEPTION_PROLOG_1_PRE(\area\())
-   .if \kvm
-   KVMTEST \hsrr \vec
-   .endif
-   __EXCEPTION_PROLOG_1_POST(\area\())
+   std r11,\area\()+EX_R11(r13)
+   std r12,\area\()+EX_R12(r13)
+   GET_SCRATCH0(r10)
+   std r10,\area\()+EX_R13(r13)
 .endm
 
 #define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec)  \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
-   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ;   \
+   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, 0 ;   

[PATCH 05/17] powerpc/64s/exception: remove the "extra" macro parameter

2019-02-04 Thread Nicholas Piggin
Rather than pass in the soft-masking and KVM tests via a macro that is
passed to another macro to expand, switch to using gas macros and
conditionally expand the soft-masking and KVM tests.

The system reset with its idle test is open coded as it is a one-off.
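
A minimal sketch of the conversion (hypothetical macro and test names,
not the kernel's exact ones): in the old style the caller passes the
test in by name as the "extra" parameter, while a gas macro can take a
flag and expand the test inline only when it is set:

```
/* Old style: the test is passed in by name as the "extra" parameter */
#define EXCEPTION_PROLOG_X(area, extra, vec)    \
        extra(vec);                             \
        /* ... rest of the prolog ... */

/* New style: a gas macro expands the test conditionally */
.macro EXCEPTION_PROLOG_X area, kvm, vec
        .if \kvm
        KVMTEST \vec
        .endif
        /* ... rest of the prolog ... */
.endm
```

With the flag form, callers pass 1 or 0 instead of a test-macro name,
and the assembler emits the test only where it is requested, so the
generated code is unchanged.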

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 158 ++-
 arch/powerpc/kernel/exceptions-64s.S |  65 ++
 2 files changed, 107 insertions(+), 116 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 7f6358654263..62502c8f6b18 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -235,10 +235,10 @@
  * rfid. Save CTR in case we're CONFIG_RELOCATABLE, in which case
  * EXCEPTION_PROLOG_2_VIRT will be using CTR.
  */
-#define EXCEPTION_RELON_PROLOG(area, label, hsrr, extra, vec)  \
+#define EXCEPTION_RELON_PROLOG(area, label, hsrr, kvm, vec)\
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
-   EXCEPTION_PROLOG_1(area, extra, vec);   \
+   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ;   \
EXCEPTION_PROLOG_2_VIRT label, hsrr
 
 /* Exception register prefixes */
@@ -325,31 +325,58 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 /*
  * This version of the EXCEPTION_PROLOG_1 will carry
  * addition parameter called "bitmask" to support
- * checking of the interrupt maskable level in the SOFTEN_TEST.
+ * checking of the interrupt maskable level.
  * Intended to be used in MASKABLE_EXCPETION_* macros.
  */
-#define MASKABLE_EXCEPTION_PROLOG_1(area, extra, vec, bitmask) 
\
-   __EXCEPTION_PROLOG_1_PRE(area); \
-   extra(vec, bitmask);\
-   __EXCEPTION_PROLOG_1_POST(area)
+.macro MASKABLE_EXCEPTION_PROLOG_1 hsrr, area, kvm, vec, bitmask
+   __EXCEPTION_PROLOG_1_PRE(\area\())
+   .if \kvm
+   KVMTEST \hsrr \vec
+   .endif
+
+   lbz r10,PACAIRQSOFTMASK(r13)
+   andi.   r10,r10,\bitmask
+   /* This associates vector numbers with bits in paca->irq_happened */
+   .if \vec == 0x500 || \vec == 0xea0
+   li  r10,PACA_IRQ_EE
+   .elseif \vec == 0x900 || \vec == 0xea0
+   li  r10,PACA_IRQ_DEC
+   .elseif \vec == 0xa00 || \vec == 0xe80
+   li  r10,PACA_IRQ_DBELL
+   .elseif \vec == 0xe60
+   li  r10,PACA_IRQ_HMI
+   .elseif \vec == 0xf00
+   li  r10,PACA_IRQ_PMI
+   .else
+   .abort "Bad maskable vector"
+   .endif
+
+   .if \hsrr
+   bne masked_Hinterrupt
+   .else
+   bne masked_interrupt
+   .endif
+
+   __EXCEPTION_PROLOG_1_POST(\area\())
+.endm
 
 /*
  * This version of the EXCEPTION_PROLOG_1 is intended
  * to be used in STD_EXCEPTION* macros
  */
-#define _EXCEPTION_PROLOG_1(area, extra, vec)  \
-   __EXCEPTION_PROLOG_1_PRE(area); \
-   extra(vec); \
-   __EXCEPTION_PROLOG_1_POST(area)
-
-#define EXCEPTION_PROLOG_1(area, extra, vec)   \
-   _EXCEPTION_PROLOG_1(area, extra, vec)
+.macro EXCEPTION_PROLOG_1 hsrr, area, kvm, vec
+   __EXCEPTION_PROLOG_1_PRE(\area\())
+   .if \kvm
+   KVMTEST \hsrr \vec
+   .endif
+   __EXCEPTION_PROLOG_1_POST(\area\())
+.endm
 
-#define EXCEPTION_PROLOG(area, label, h, extra, vec)   \
+#define EXCEPTION_PROLOG(area, label, hsrr, kvm, vec)  \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
-   EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2_REAL label, h, 1
+   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ;   \
+   EXCEPTION_PROLOG_2_REAL label, hsrr, 1
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -415,10 +442,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #endif
 
 /* Do not enable RI */
-#define EXCEPTION_PROLOG_NORI(area, label, h, extra, vec)  \
+#define EXCEPTION_PROLOG_NORI(area, label, hsrr, kvm, vec) \
EXCEPTION_PROLOG_0(area);   \
-   EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2_REAL label, h, 0
+   EXCEPTION_PROLOG_1 hsrr, area, kvm, vec ;   \
+   EXCEPTION_PROLOG_2_REAL label, hsrr, 0
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 .macro KVMTEST hsrr, n
@@ -480,8 +507,6 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 .endm
 #endif
 
-#define NOTEST(n)
-
 #define EXCEPTION_PROLOG_COMMON_1()   \
std  

[PATCH 04/17] powerpc/64s/exception: move and tidy EXCEPTION_PROLOG_2 variants

2019-02-04 Thread Nicholas Piggin
- Re-name the macros to _REAL and _VIRT suffixes rather than no and
  _RELON suffix.

- Move the macro definitions together in the file.

- Move RELOCATABLE ifdef inside the _VIRT macro.

Further consolidation between variants does not buy much here.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 87 
 arch/powerpc/kernel/exceptions-64s.S |  6 +-
 2 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index aa19c95e7cfa..7f6358654263 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -174,8 +174,33 @@
ori reg,reg,(ABS_ADDR(label))@l;\
addis   reg,reg,(ABS_ADDR(label))@h
 
+.macro EXCEPTION_PROLOG_2_REAL label, hsrr, set_ri
+   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
+   .if ! \set_ri
+   xorir10,r10,MSR_RI  /* Clear MSR_RI */
+   .endif
+   .if \hsrr
+   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
+   .else
+   mfspr   r11,SPRN_SRR0   /* save SRR0 */
+   .endif
+   LOAD_HANDLER(r12, \label\())
+   .if \hsrr
+   mtspr   SPRN_HSRR0,r12
+   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
+   mtspr   SPRN_HSRR1,r10
+   HRFI_TO_KERNEL
+   .else
+   mtspr   SPRN_SRR0,r12
+   mfspr   r12,SPRN_SRR1   /* and SRR1 */
+   mtspr   SPRN_SRR1,r10
+   RFI_TO_KERNEL
+   .endif
+   b   .   /* prevent speculative execution */
+.endm
+
+.macro EXCEPTION_PROLOG_2_VIRT label, hsrr
 #ifdef CONFIG_RELOCATABLE
-.macro EXCEPTION_PROLOG_2_RELON label, hsrr
.if \hsrr
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
.else
@@ -191,10 +216,7 @@
li  r10,MSR_RI
mtmsrd  r10,1   /* Set RI (EE=0) */
bctr
-.endm
 #else
-/* If not relocatable, we can jump directly -- and save messing with LR */
-.macro EXCEPTION_PROLOG_2_RELON label, hsrr
.if \hsrr
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
@@ -205,19 +227,19 @@
li  r10,MSR_RI
mtmsrd  r10,1   /* Set RI (EE=0) */
b   label
-.endm
 #endif
+.endm
 
 /*
  * As EXCEPTION_PROLOG(), except we've already got relocation on so no need to
- * rfid. Save LR in case we're CONFIG_RELOCATABLE, in which case
- * EXCEPTION_PROLOG_2_RELON will be using LR.
+ * rfid. Save CTR in case we're CONFIG_RELOCATABLE, in which case
+ * EXCEPTION_PROLOG_2_VIRT will be using CTR.
  */
 #define EXCEPTION_RELON_PROLOG(area, label, hsrr, extra, vec)  \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2_RELON label, hsrr
+   EXCEPTION_PROLOG_2_VIRT label, hsrr
 
 /* Exception register prefixes */
 #define EXC_HV 1
@@ -323,36 +345,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
_EXCEPTION_PROLOG_1(area, extra, vec)
 
-.macro EXCEPTION_PROLOG_2 label, hsrr, set_ri
-   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
-   .if ! \set_ri
-   xorir10,r10,MSR_RI  /* Clear MSR_RI */
-   .endif
-   .if \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   .endif
-   LOAD_HANDLER(r12,\label\())
-   .if \hsrr
-   mtspr   SPRN_HSRR0,r12
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   HRFI_TO_KERNEL
-   .else
-   mtspr   SPRN_SRR0,r12
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   RFI_TO_KERNEL
-   .endif
-   b   .   /* prevent speculative execution */
-.endm
-
 #define EXCEPTION_PROLOG(area, label, h, extra, vec)   \
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2 label, h, 1
+   EXCEPTION_PROLOG_2_REAL label, h, 1
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -421,7 +418,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_NORI(area, label, h, extra, vec)  \
EXCEPTION_PROLOG_0(area);   \
EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2 label, h, 0
+   EXCEPTION_PROLOG_2_REAL label, h, 0
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 .macro KVMTEST hsrr, n
@@ -577,14 +574,14 @@ 

[PATCH 03/17] powerpc/64s/exception: consolidate EXCEPTION_PROLOG_2 with _NORI variant

2019-02-04 Thread Nicholas Piggin
Switch to a gas macro that conditionally expands the RI clearing
instruction.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 43 ++--
 arch/powerpc/kernel/exceptions-64s.S |  6 ++--
 2 files changed, 14 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 230328724314..aa19c95e7cfa 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -323,32 +323,11 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
_EXCEPTION_PROLOG_1(area, extra, vec)
 
-.macro EXCEPTION_PROLOG_2 label, hsrr
-   ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
-   .if \hsrr
-   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
-   .else
-   mfspr   r11,SPRN_SRR0   /* save SRR0 */
-   .endif
-   LOAD_HANDLER(r12,\label\())
-   .if \hsrr
-   mtspr   SPRN_HSRR0,r12
-   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
-   mtspr   SPRN_HSRR1,r10
-   HRFI_TO_KERNEL
-   .else
-   mtspr   SPRN_SRR0,r12
-   mfspr   r12,SPRN_SRR1   /* and SRR1 */
-   mtspr   SPRN_SRR1,r10
-   RFI_TO_KERNEL
-   .endif
-   b   .   /* prevent speculative execution */
-.endm
-
-/* _NORI variant keeps MSR_RI clear */
-.macro EXCEPTION_PROLOG_2_NORI label, hsrr
+.macro EXCEPTION_PROLOG_2 label, hsrr, set_ri
ld  r10,PACAKMSR(r13)   /* get MSR value for kernel */
+   .if ! \set_ri
xorir10,r10,MSR_RI  /* Clear MSR_RI */
+   .endif
.if \hsrr
mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
.else
@@ -373,7 +352,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2 label, h
+   EXCEPTION_PROLOG_2 label, h, 1
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 /*
@@ -442,7 +421,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_NORI(area, label, h, extra, vec)  \
EXCEPTION_PROLOG_0(area);   \
EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2_NORI label, h
+   EXCEPTION_PROLOG_2 label, h, 0
 
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 .macro KVMTEST hsrr, n
@@ -598,14 +577,14 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define STD_EXCEPTION_OOL(vec, label)  \
EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_PR, vec);\
-   EXCEPTION_PROLOG_2 label, EXC_STD
+   EXCEPTION_PROLOG_2 label, EXC_STD, 1
 
 #define STD_EXCEPTION_HV(loc, vec, label)  \
EXCEPTION_PROLOG(PACA_EXGEN, label, EXC_HV, KVMTEST_HV, vec)
 
 #define STD_EXCEPTION_HV_OOL(vec, label)   \
EXCEPTION_PROLOG_1(PACA_EXGEN, KVMTEST_HV, vec);\
-   EXCEPTION_PROLOG_2 label, EXC_HV
+   EXCEPTION_PROLOG_2 label, EXC_HV, 1
 
 #define STD_RELON_EXCEPTION(loc, vec, label)   \
/* No guest interrupts come through here */ \
@@ -669,21 +648,21 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
SET_SCRATCH0(r13);/* save r13 */\
EXCEPTION_PROLOG_0(PACA_EXGEN); \
MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, extra, vec, bitmask);   \
-   EXCEPTION_PROLOG_2 label, h
+   EXCEPTION_PROLOG_2 label, h, 1
 
 #define MASKABLE_EXCEPTION(vec, label, bitmask)
\
__MASKABLE_EXCEPTION(vec, label, EXC_STD, SOFTEN_TEST_PR, bitmask)
 
 #define MASKABLE_EXCEPTION_OOL(vec, label, bitmask)\
MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec, bitmask);\
-   EXCEPTION_PROLOG_2 label, EXC_STD
+   EXCEPTION_PROLOG_2 label, EXC_STD, 1
 
 #define MASKABLE_EXCEPTION_HV(vec, label, bitmask) \
__MASKABLE_EXCEPTION(vec, label, EXC_HV, SOFTEN_TEST_HV, bitmask)
 
 #define MASKABLE_EXCEPTION_HV_OOL(vec, label, bitmask) \
MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec, bitmask);\
-   EXCEPTION_PROLOG_2 label, EXC_HV
+   EXCEPTION_PROLOG_2 label, EXC_HV, 1
 
 #define __MASKABLE_RELON_EXCEPTION(vec, label, h, extra, bitmask)  \
SET_SCRATCH0(r13);/* save r13 */\
@@ -696,7 +675,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define MASKABLE_RELON_EXCEPTION_OOL(vec, label, bitmask)  \
MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_PR, vec, 
bitmask);\
-   EXCEPTION_PROLOG_2 label, EXC_STD
+   EXCEPTION_PROLOG_2 label, EXC_STD, 1
 
 #define 

[PATCH 02/17] powerpc/64s/exception: remove H concatenation for EXC_HV variants

2019-02-04 Thread Nicholas Piggin
Replace all instances of this with gas macros that test the hsrr
parameter and use the appropriate register names / labels.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 333 +--
 arch/powerpc/include/asm/head-64.h   |   8 +-
 arch/powerpc/kernel/exceptions-64s.S |  85 +++---
 3 files changed, 247 insertions(+), 179 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index f78ff225cb64..230328724314 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -67,6 +67,8 @@
  */
 #define EX_R3  EX_DAR
 
+#ifdef __ASSEMBLY__
+
 #define STF_ENTRY_BARRIER_SLOT \
STF_ENTRY_BARRIER_FIXUP_SECTION;\
nop;\
@@ -148,38 +150,6 @@
hrfid;  \
b   hrfi_flush_fallback
 
-#ifdef CONFIG_RELOCATABLE
-#define __EXCEPTION_PROLOG_2_RELON(label, h)   \
-   mfspr   r11,SPRN_##h##SRR0; /* save SRR0 */ \
-   LOAD_HANDLER(r12,label);\
-   mtctr   r12;\
-   mfspr   r12,SPRN_##h##SRR1; /* and SRR1 */  \
-   li  r10,MSR_RI; \
-   mtmsrd  r10,1;  /* Set RI (EE=0) */ \
-   bctr;
-#else
-/* If not relocatable, we can jump directly -- and save messing with LR */
-#define __EXCEPTION_PROLOG_2_RELON(label, h)   \
-   mfspr   r11,SPRN_##h##SRR0; /* save SRR0 */ \
-   mfspr   r12,SPRN_##h##SRR1; /* and SRR1 */  \
-   li  r10,MSR_RI; \
-   mtmsrd  r10,1;  /* Set RI (EE=0) */ \
-   b   label;
-#endif
-#define EXCEPTION_PROLOG_2_RELON(label, h) \
-   __EXCEPTION_PROLOG_2_RELON(label, h)
-
-/*
- * As EXCEPTION_PROLOG(), except we've already got relocation on so no need to
- * rfid. Save LR in case we're CONFIG_RELOCATABLE, in which case
- * EXCEPTION_PROLOG_2_RELON will be using LR.
- */
-#define EXCEPTION_RELON_PROLOG(area, label, h, extra, vec) \
-   SET_SCRATCH0(r13);  /* save r13 */  \
-   EXCEPTION_PROLOG_0(area);   \
-   EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2_RELON(label, h)
-
 /*
  * We're short on space and time in the exception prolog, so we can't
  * use the normal LOAD_REG_IMMEDIATE macro to load the address of label.
@@ -204,9 +174,54 @@
ori reg,reg,(ABS_ADDR(label))@l;\
addis   reg,reg,(ABS_ADDR(label))@h
 
+#ifdef CONFIG_RELOCATABLE
+.macro EXCEPTION_PROLOG_2_RELON label, hsrr
+   .if \hsrr
+   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
+   .else
+   mfspr   r11,SPRN_SRR0   /* save SRR0 */
+   .endif
+   LOAD_HANDLER(r12, \label\())
+   mtctr   r12
+   .if \hsrr
+   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
+   .else
+   mfspr   r12,SPRN_SRR1   /* and SRR1 */
+   .endif
+   li  r10,MSR_RI
+   mtmsrd  r10,1   /* Set RI (EE=0) */
+   bctr
+.endm
+#else
+/* If not relocatable, we can jump directly -- and save messing with LR */
+.macro EXCEPTION_PROLOG_2_RELON label, hsrr
+   .if \hsrr
+   mfspr   r11,SPRN_HSRR0  /* save HSRR0 */
+   mfspr   r12,SPRN_HSRR1  /* and HSRR1 */
+   .else
+   mfspr   r11,SPRN_SRR0   /* save SRR0 */
+   mfspr   r12,SPRN_SRR1   /* and SRR1 */
+   .endif
+   li  r10,MSR_RI
+   mtmsrd  r10,1   /* Set RI (EE=0) */
+   b   label
+.endm
+#endif
+
+/*
+ * As EXCEPTION_PROLOG(), except we've already got relocation on so no need to
+ * rfid. Save LR in case we're CONFIG_RELOCATABLE, in which case
+ * EXCEPTION_PROLOG_2_RELON will be using LR.
+ */
+#define EXCEPTION_RELON_PROLOG(area, label, hsrr, extra, vec)  \
+   SET_SCRATCH0(r13);  /* save r13 */  \
+   EXCEPTION_PROLOG_0(area);   \
+   EXCEPTION_PROLOG_1(area, extra, vec);   \
+   EXCEPTION_PROLOG_2_RELON label, hsrr
+
 /* Exception register prefixes */
-#define EXC_HV H
-#define EXC_STD
+#define EXC_HV 1
+#define EXC_STD0
 
 #if defined(CONFIG_RELOCATABLE)
 /*
@@ -308,43 +323,57 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
_EXCEPTION_PROLOG_1(area, extra, vec)
 

[PATCH 01/17] powerpc/64s/exception: fix some line wrap and semicolon inconsistencies in macros

2019-02-04 Thread Nicholas Piggin
By convention, all lines should be separated by semicolons. The last
line should have neither a semicolon nor a line wrap. Small cleanup
before we begin.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/exception-64s.h | 36 ++---
 arch/powerpc/include/asm/head-64.h   | 68 
 2 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 3b4767ed3ec5..f78ff225cb64 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -189,11 +189,11 @@
  */
 #define LOAD_HANDLER(reg, label)   \
ld  reg,PACAKBASE(r13); /* get high part of  */   \
-   ori reg,reg,FIXED_SYMBOL_ABS_ADDR(label);
+   ori reg,reg,FIXED_SYMBOL_ABS_ADDR(label)
 
 #define __LOAD_HANDLER(reg, label) \
ld  reg,PACAKBASE(r13); \
-   ori reg,reg,(ABS_ADDR(label))@l;
+   ori reg,reg,(ABS_ADDR(label))@l
 
 /*
  * Branches from unrelocated code (e.g., interrupts) to labels outside
@@ -202,7 +202,7 @@
 #define __LOAD_FAR_HANDLER(reg, label) \
ld  reg,PACAKBASE(r13); \
ori reg,reg,(ABS_ADDR(label))@l;\
-   addis   reg,reg,(ABS_ADDR(label))@h;
+   addis   reg,reg,(ABS_ADDR(label))@h
 
 /* Exception register prefixes */
 #define EXC_HV H
@@ -277,7 +277,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);  \
INTERRUPT_TO_KERNEL;\
SAVE_CTR(r10, area);\
-   mfcrr9;
+   mfcrr9
 
 #define __EXCEPTION_PROLOG_1_POST(area)
\
std r11,area+EX_R11(r13);   \
@@ -294,7 +294,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define MASKABLE_EXCEPTION_PROLOG_1(area, extra, vec, bitmask) 
\
__EXCEPTION_PROLOG_1_PRE(area); \
extra(vec, bitmask);\
-   __EXCEPTION_PROLOG_1_POST(area);
+   __EXCEPTION_PROLOG_1_POST(area)
 
 /*
  * This version of the EXCEPTION_PROLOG_1 is intended
@@ -303,7 +303,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define _EXCEPTION_PROLOG_1(area, extra, vec)  \
__EXCEPTION_PROLOG_1_PRE(area); \
extra(vec); \
-   __EXCEPTION_PROLOG_1_POST(area);
+   __EXCEPTION_PROLOG_1_POST(area)
 
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
_EXCEPTION_PROLOG_1(area, extra, vec)
@@ -311,7 +311,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define __EXCEPTION_PROLOG_2(label, h) \
ld  r10,PACAKMSR(r13);  /* get MSR value for kernel */  \
mfspr   r11,SPRN_##h##SRR0; /* save SRR0 */ \
-   LOAD_HANDLER(r12,label) \
+   LOAD_HANDLER(r12,label);\
mtspr   SPRN_##h##SRR0,r12; \
mfspr   r12,SPRN_##h##SRR1; /* and SRR1 */  \
mtspr   SPRN_##h##SRR1,r10; \
@@ -325,7 +325,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
ld  r10,PACAKMSR(r13);  /* get MSR value for kernel */  \
xorir10,r10,MSR_RI; /* Clear MSR_RI */  \
mfspr   r11,SPRN_##h##SRR0; /* save SRR0 */ \
-   LOAD_HANDLER(r12,label) \
+   LOAD_HANDLER(r12,label);\
mtspr   SPRN_##h##SRR0,r12; \
mfspr   r12,SPRN_##h##SRR1; /* and SRR1 */  \
mtspr   SPRN_##h##SRR1,r10; \
@@ -339,7 +339,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
SET_SCRATCH0(r13);  /* save r13 */  \
EXCEPTION_PROLOG_0(area);   \
EXCEPTION_PROLOG_1(area, extra, vec);   \
-   EXCEPTION_PROLOG_2(label, h);
+   EXCEPTION_PROLOG_2(label, h)
 
 #define __KVMTEST(h, n)
\
lbz r10,HSTATE_IN_GUEST(r13);   \
@@ -413,7 +413,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define EXCEPTION_PROLOG_NORI(area, label, h, extra, vec)  \
EXCEPTION_PROLOG_0(area);   \

[PATCH 00/17] powerpc/64s: tidy and gasify exception handler code, round 1

2019-02-04 Thread Nicholas Piggin
My previous big patch was received about as well as can be expected.
That is to say I'll assume everybody loved it, so we have to get there
a bit more incrementally.

Each patch in this first round was verified (with several configs) to
not change any generated code, as a small step toward improving things.

The end result is that head-64.h is only used for fixed section
layout code, exception-64s.h is only used for some paca layout and
speculation control sequences, and exception-64s.S contains all the
actual code for interrupt handlers in a bit nicer form.

There is quite a way to go yet, but hopefully this is an improvement
already, and the good thing about generated code not changing with
this series is that backports are easy to verify. Once we start code
changes, we'll want to minimise the number of releases they are
spread over.

Thanks,
Nick

Nicholas Piggin (17):
  powerpc/64s/exception: fix some line wrap and semicolon
inconsistencies in macros
  powerpc/64s/exception: remove H concatenation for EXC_HV variants
  powerpc/64s/exception: consolidate EXCEPTION_PROLOG_2 with _NORI
variant
  powerpc/64s/exception: move and tidy EXCEPTION_PROLOG_2 variants
  powerpc/64s/exception: remove the "extra" macro parameter
  powerpc/64s/exception: consolidate maskable and non-maskable prologs
  powerpc/64s/exception: merge KVM handler and skip variants
  powerpc/64s/exception: KVM handler can set the HSRR trap bit
  powerpc/64s/exception: Make EXCEPTION_PROLOG_0 a gas macro for
consistency with others
  powerpc/64s/exception: Move EXCEPTION_COMMON handler and return
branches into callers
  powerpc/64s/exception: Move EXCEPTION_COMMON additions into callers
  powerpc/64s/exception: unwind exception-64s.h macros
  powerpc/64s/exception: move EXCEPTION_PROLOG_2* to a more logical
place
  powerpc/64s/exception: remove STD_EXCEPTION_COMMON variants
  powerpc/64s/exception: move KVM related code together
  powerpc/64s/exception: move exception-64s.h code to exception-64s.S
where it is used
  powerpc/64s/exception: move head-64.h code to exception-64s.S where it
is used

 arch/powerpc/include/asm/exception-64s.h | 585 +-
 arch/powerpc/include/asm/head-64.h   | 204 +
 arch/powerpc/kernel/exceptions-64s.S | 932 ---
 3 files changed, 822 insertions(+), 899 deletions(-)

-- 
2.18.0



[PATCH] soc: fsl: dpio: Use after free in dpaa2_dpio_remove()

2019-02-04 Thread Dan Carpenter
The dpaa2_io_down(priv->io) call frees "priv->io", so I've shifted the
code around a little bit to avoid the use-after-free.

Fixes: 991e873223e9 ("soc: fsl: dpio: use a cpumask to identify which cpus are unused")
Signed-off-by: Dan Carpenter 
---
 drivers/soc/fsl/dpio/dpio-driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/dpio/dpio-driver.c 
b/drivers/soc/fsl/dpio/dpio-driver.c
index 2d4af32a0dec..a28799b62d53 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -220,12 +220,12 @@ static int dpaa2_dpio_remove(struct fsl_mc_device 
*dpio_dev)
 
	dev = &dpio_dev->dev;
priv = dev_get_drvdata(dev);
+   cpu = dpaa2_io_get_cpu(priv->io);
 
dpaa2_io_down(priv->io);
 
dpio_teardown_irqs(dpio_dev);
 
-   cpu = dpaa2_io_get_cpu(priv->io);
cpumask_set_cpu(cpu, cpus_unused_mask);
 
err = dpio_open(dpio_dev->mc_io, 0, dpio_dev->obj_desc.id,
-- 
2.17.1
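The rule the patch enforces — read everything you still need from an object before the call that frees it — can be sketched with toy types (hypothetical stand-ins, not the real dpaa2 API):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the dpaa2 objects -- hypothetical, not the real API. */
struct toy_io { int cpu; };
struct toy_priv { struct toy_io *io; };

static int toy_io_get_cpu(const struct toy_io *io) { return io->cpu; }

/* Frees the object, the way dpaa2_io_down(priv->io) does. */
static void toy_io_down(struct toy_io *io) { free(io); }

/* Correct teardown order: capture the cpu BEFORE the call that frees it. */
static int toy_remove(struct toy_priv *priv)
{
	int cpu = toy_io_get_cpu(priv->io);	/* read first... */

	toy_io_down(priv->io);			/* ...then free */
	priv->io = NULL;

	return cpu;				/* still valid: it is a copy */
}
```

The same ordering bug with the calls swapped would read freed memory, which is exactly what the patch above avoids by hoisting dpaa2_io_get_cpu().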



Re: use generic DMA mapping code in powerpc V4

2019-02-04 Thread Christoph Hellwig
On Mon, Feb 04, 2019 at 01:13:54PM +0100, Christian Zigotzky wrote:
>>> Results: The X1000 and X5000 boot but unfortunately the P.A. Semi Ethernet
>>> doesn't work.
>> Are there any interesting messages in the boot log?  Can you send me
>> the dmesg?
>>
> Here you are: http://www.xenosoft.de/dmesg_X1000_with_DMA_updates.txt

It seems like the pasemi driver fails to set a DMA mask, but seems
otherwise 64-bit DMA capable.  The old PPC code didn't verify the
dma mask during the map operations, but the x86-derived generic
code does.

This patch just sets the DMA mask.

Olof: does this look ok?  The DMA device doesn't seem to be directly
bound by the net driver, but it isn't really used by anything else in
tree either...
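What the x86-derived generic code now checks on every map — that each bus address fits within the device's DMA mask — amounts to a simple range test. A userspace sketch of that check (the macro mirrors DMA_BIT_MASK() from linux/dma-mapping.h; the helper name is made up):

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as DMA_BIT_MASK() in linux/dma-mapping.h. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* The per-mapping check: the whole buffer's bus address range
 * must fit under the device's DMA mask. */
static int fits_dma_mask(uint64_t mask, uint64_t addr, uint64_t size)
{
	return size != 0 && addr + size - 1 <= mask;
}
```

A driver that never calls dma_set_mask() is left with whatever default mask the core assigned, and the generic map path then rejects addresses above it — hence the one-line fix below.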

diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c 
b/drivers/net/ethernet/pasemi/pasemi_mac.c
index d21041554507..d98bd447c536 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1716,6 +1716,7 @@ pasemi_mac_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
err = -ENODEV;
goto out;
}
+   dma_set_mask(&mac->dma_pdev->dev, DMA_BIT_MASK(32));
 
mac->iob_pdev = pci_get_device(PCI_VENDOR_ID_PASEMI, 0xa001, NULL);
if (!mac->iob_pdev) {


Re: [PATCH] powerpc/64s: Remove MSR_RI optimisation in system_call_exit()

2019-02-04 Thread Michael Ellerman
Nicholas Piggin  writes:

> Michael Ellerman's on January 17, 2019 9:35 pm:
>> Currently in system_call_exit() we have an optimisation where we
>> disable MSR_RI (recoverable interrupt) and MSR_EE (external interrupt
>> enable) in a single mtmsrd instruction.
>> 
>> Unfortunately this will no longer work with THREAD_INFO_IN_TASK,
>> because then the load of TI_FLAGS might fault and faulting with MSR_RI
>> clear is treated as an unrecoverable exception which leads to a
>> panic().
>> 
>> So change the code to only clear MSR_EE prior to loading TI_FLAGS,
>> leaving the clear of MSR_RI until later. We have some latitude in
>> where we do the clear of MSR_RI. A bit of experimentation has shown that
>> this location gives the least slow down.
>> 
>> This still causes a noticeable slow down in our null_syscall
>> performance. On a Power9 DD2.2:
>> 
>>                Before       After       Delta  Delta %
>>            955 cycles  999 cycles      -44     -4.6%
>> 
>> On the plus side this does simplify the code somewhat, because we
>> don't have to reenable MSR_RI on the restore_math() or
>> syscall_exit_work() paths which was necessitated previously by the
>> optimisation.
>> 
>> Signed-off-by: Michael Ellerman 
>
> Reviewed-by: Nicholas Piggin 
>
> But only because spectre and meltdown broke my spirit.



Thanks for reviewing it anyway.

cheers


Re: use generic DMA mapping code in powerpc V4

2019-02-04 Thread Christian Zigotzky

On 04 February 2019 at 08:56AM, Christoph Hellwig wrote:

On Sun, Feb 03, 2019 at 05:49:02PM +0100, Christian Zigotzky wrote:

OK, next step: b50f42f0fe12965ead395c76bcb6a14f00cdf65b (powerpc/dma: use
the dma_direct mapping routines)

git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout b50f42f0fe12965ead395c76bcb6a14f00cdf65b

Results: The X1000 and X5000 boot but unfortunately the P.A. Semi Ethernet
doesn't work.

Are there any interesting messages in the boot log?  Can you send me
the dmesg?


Here you are: http://www.xenosoft.de/dmesg_X1000_with_DMA_updates.txt

-- Christian



Re: [PATCH 06/19] KVM: PPC: Book3S HV: add a GET_ESB_FD control to the XIVE native device

2019-02-04 Thread Cédric Le Goater
On 2/4/19 5:45 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:18PM +0100, Cédric Le Goater wrote:
>> This will let the guest create a memory mapping to expose the ESB MMIO
>> regions used to control the interrupt sources, to trigger events, to
>> EOI or to turn off the sources.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/uapi/asm/kvm.h   |  4 ++
>>  arch/powerpc/kvm/book3s_xive_native.c | 97 +++
>>  2 files changed, 101 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h 
>> b/arch/powerpc/include/uapi/asm/kvm.h
>> index 8c876c166ef2..6bb61ba141c2 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -675,4 +675,8 @@ struct kvm_ppc_cpu_char {
>>  #define  KVM_XICS_PRESENTED (1ULL << 43)
>>  #define  KVM_XICS_QUEUED(1ULL << 44)
>>  
>> +/* POWER9 XIVE Native Interrupt Controller */
>> +#define KVM_DEV_XIVE_GRP_CTRL   1
>> +#define   KVM_DEV_XIVE_GET_ESB_FD   1
> 
> Introducing a new FD for ESB and TIMA seems overkill.  Can't you get
> to both with an mmap() directly on the xive device fd?  Using the
> offset to distinguish which one to map, obviously.

The page offset would define some sort of user API. It seems feasible.
But I am not sure this would be practical in the future if we need to 
tune the length.

The TIMA has two pages that can be exposed at guest level for interrupt
management: the OS and the USER page. That should be OK.

But we might want to map only portions of the interrupt ESB space, for 
PCI passthrough for instance as Paul proposed. I am still looking at that.
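If a single device fd were used, the fault handler would demultiplex on the page offset. A userspace model of one possible layout (the offsets here are hypothetical, not a proposed ABI) is:

```c
#include <assert.h>

/* Hypothetical page layout for a single device fd (not a proposed ABI):
 * pgoff 0,1 = TIMA OS/USER pages, then two ESB pages per IRQ. */
enum { MAP_TIMA_OS, MAP_TIMA_USER, MAP_ESB_BASE };

struct target { int is_esb; unsigned long irq; int is_eoi; };

/* Decode a fault's page offset into the backing page to insert. */
static struct target decode_pgoff(unsigned long pgoff)
{
	struct target t = { 0, 0, 0 };

	if (pgoff >= MAP_ESB_BASE) {
		unsigned long esb = pgoff - MAP_ESB_BASE;

		t.is_esb = 1;
		t.irq = esb / 2;	/* two ESB pages per source */
		t.is_eoi = esb % 2;	/* even: trigger, odd: EOI/mgmt */
	} else {
		t.irq = pgoff;		/* 0 = OS page, 1 = USER page */
	}
	return t;
}
```

The length-tuning concern above is visible here: once such offsets become ABI, growing any region means renumbering everything after it.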

Thanks,

C.

>>  #endif /* __LINUX_KVM_POWERPC_H */
>> diff --git a/arch/powerpc/kvm/book3s_xive_native.c 
>> b/arch/powerpc/kvm/book3s_xive_native.c
>> index 115143e76c45..e20081f0c8d4 100644
>> --- a/arch/powerpc/kvm/book3s_xive_native.c
>> +++ b/arch/powerpc/kvm/book3s_xive_native.c
>> @@ -153,6 +153,85 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device 
>> *dev,
>>  return rc;
>>  }
>>  
>> +static int xive_native_esb_fault(struct vm_fault *vmf)
>> +{
>> +struct vm_area_struct *vma = vmf->vma;
>> +struct kvmppc_xive *xive = vma->vm_file->private_data;
>> +struct kvmppc_xive_src_block *sb;
>> +struct kvmppc_xive_irq_state *state;
>> +struct xive_irq_data *xd;
>> +u32 hw_num;
>> +u16 src;
>> +u64 page;
>> +unsigned long irq;
>> +
>> +/*
>> + * Linux/KVM uses a two pages ESB setting, one for trigger and
>> + * one for EOI
>> + */
>> +irq = vmf->pgoff / 2;
>> +
>> +sb = kvmppc_xive_find_source(xive, irq, &src);
>> +if (!sb) {
>> +pr_err("%s: source %lx not found !\n", __func__, irq);
>> +return VM_FAULT_SIGBUS;
>> +}
>> +
>> +state = &sb->irq_state[src];
>> +kvmppc_xive_select_irq(state, &hw_num, &xd);
>> +
>> +arch_spin_lock(&sb->lock);
>> +
>> +/*
>> + * first/even page is for trigger
>> + * second/odd page is for EOI and management.
>> + */
>> +page = vmf->pgoff % 2 ? xd->eoi_page : xd->trig_page;
>> +arch_spin_unlock(&sb->lock);
>> +
>> +if (!page) {
>> +pr_err("%s: accessing invalid ESB page for source %lx !\n",
>> +   __func__, irq);
>> +return VM_FAULT_SIGBUS;
>> +}
>> +
>> +vmf_insert_pfn(vma, vmf->address, page >> PAGE_SHIFT);
>> +return VM_FAULT_NOPAGE;
>> +}
>> +
>> +static const struct vm_operations_struct xive_native_esb_vmops = {
>> +.fault = xive_native_esb_fault,
>> +};
>> +
>> +static int xive_native_esb_mmap(struct file *file, struct vm_area_struct 
>> *vma)
>> +{
>> +/* There are two ESB pages (trigger and EOI) per IRQ */
>> +if (vma_pages(vma) + vma->vm_pgoff > KVMPPC_XIVE_NR_IRQS * 2)
>> +return -EINVAL;
>> +
>> +vma->vm_flags |= VM_IO | VM_PFNMAP;
>> +vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> +vma->vm_ops = &xive_native_esb_vmops;
>> +return 0;
>> +}
>> +
>> +static const struct file_operations xive_native_esb_fops = {
>> +.mmap = xive_native_esb_mmap,
>> +};
>> +
>> +static int kvmppc_xive_native_get_esb_fd(struct kvmppc_xive *xive, u64 addr)
>> +{
>> +u64 __user *ubufp = (u64 __user *) addr;
>> +int ret;
>> +
>> +ret = anon_inode_getfd("[xive-esb]", &xive_native_esb_fops, xive,
>> +O_RDWR | O_CLOEXEC);
>> +if (ret < 0)
>> +return ret;
>> +
>> +return put_user(ret, ubufp);
>> +}
>> +
>>  static int kvmppc_xive_native_set_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>>  {
>> @@ -162,12 +241,30 @@ static int kvmppc_xive_native_set_attr(struct 
>> kvm_device *dev,
>>  static int kvmppc_xive_native_get_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>>  {
>> +struct kvmppc_xive *xive = dev->private;
>> +
>> +switch (attr->group) {
>> +case KVM_DEV_XIVE_GRP_CTRL:

Re: [PATCH 05/19] KVM: PPC: Book3S HV: add a new KVM device for the XIVE native exploitation mode

2019-02-04 Thread Cédric Le Goater
On 2/4/19 5:25 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:43:17PM +0100, Cédric Le Goater wrote:
>> This is the basic framework for the new KVM device supporting the XIVE
>> native exploitation mode. The user interface exposes a new capability
>> and a new KVM device to be used by QEMU.
>>
>> Internally, the interface to the new KVM device is protected with a
>> new interrupt mode: KVMPPC_IRQ_XIVE.
>>
>> Signed-off-by: Cédric Le Goater 
>> ---
>>  arch/powerpc/include/asm/kvm_host.h   |   2 +
>>  arch/powerpc/include/asm/kvm_ppc.h|  21 ++
>>  arch/powerpc/kvm/book3s_xive.h|   3 +
>>  include/uapi/linux/kvm.h  |   3 +
>>  arch/powerpc/kvm/book3s.c |   7 +-
>>  arch/powerpc/kvm/book3s_xive_native.c | 332 ++
>>  arch/powerpc/kvm/powerpc.c|  30 +++
>>  arch/powerpc/kvm/Makefile |   2 +-
>>  8 files changed, 398 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/powerpc/kvm/book3s_xive_native.c
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h 
>> b/arch/powerpc/include/asm/kvm_host.h
>> index 0f98f00da2ea..c522e8274ad9 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -220,6 +220,7 @@ extern struct kvm_device_ops kvm_xics_ops;
>>  struct kvmppc_xive;
>>  struct kvmppc_xive_vcpu;
>>  extern struct kvm_device_ops kvm_xive_ops;
>> +extern struct kvm_device_ops kvm_xive_native_ops;
>>  
>>  struct kvmppc_passthru_irqmap;
>>  
>> @@ -446,6 +447,7 @@ struct kvmppc_passthru_irqmap {
>>  #define KVMPPC_IRQ_DEFAULT  0
>>  #define KVMPPC_IRQ_MPIC 1
>>  #define KVMPPC_IRQ_XICS 2 /* Includes a XIVE option */
>> +#define KVMPPC_IRQ_XIVE 3 /* XIVE native exploitation mode */
>>  
>>  #define MMIO_HPTE_CACHE_SIZE4
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index eb0d79f0ca45..1bb313f238fe 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -591,6 +591,18 @@ extern int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, 
>> u64 icpval);
>>  extern int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, u32 irq,
>> int level, bool line_status);
>>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>> +
>> +static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>> +{
>> +return vcpu->arch.irq_type == KVMPPC_IRQ_XIVE;
>> +}
>> +
>> +extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>> +struct kvm_vcpu *vcpu, u32 cpu);
>> +extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
>> +extern void kvmppc_xive_native_init_module(void);
>> +extern void kvmppc_xive_native_exit_module(void);
>> +
>>  #else
>>  static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
>> u32 priority) { return -1; }
>> @@ -614,6 +626,15 @@ static inline int kvmppc_xive_set_icp(struct kvm_vcpu 
>> *vcpu, u64 icpval) { retur
>>  static inline int kvmppc_xive_set_irq(struct kvm *kvm, int irq_source_id, 
>> u32 irq,
>>int level, bool line_status) { return 
>> -ENODEV; }
>>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>> +
>> +static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>> +{ return 0; }
>> +static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>> +  struct kvm_vcpu *vcpu, u32 
>> cpu) { return -EBUSY; }
>> +static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { 
>> }
>> +static inline void kvmppc_xive_native_init_module(void) { }
>> +static inline void kvmppc_xive_native_exit_module(void) { }
>> +
>>  #endif /* CONFIG_KVM_XIVE */
>>  
>>  /*
>> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
>> index 10c4aa5cd010..5f22415520b4 100644
>> --- a/arch/powerpc/kvm/book3s_xive.h
>> +++ b/arch/powerpc/kvm/book3s_xive.h
>> @@ -12,6 +12,9 @@
>>  #ifdef CONFIG_KVM_XICS
>>  #include "book3s_xics.h"
>>  
>> +#define KVMPPC_XIVE_FIRST_IRQ   0
>> +#define KVMPPC_XIVE_NR_IRQS KVMPPC_XICS_NR_IRQS
>> +
>>  /*
>>   * State for one guest irq source.
>>   *
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6d4ea4b6c922..52bf74a1616e 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_ARM_VM_IPA_SIZE 165
>>  #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
>>  #define KVM_CAP_HYPERV_CPUID 167
>> +#define KVM_CAP_PPC_IRQ_XIVE 168
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>> @@ -1211,6 +1212,8 @@ enum kvm_device_type {
>>  #define KVM_DEV_TYPE_ARM_VGIC_V3KVM_DEV_TYPE_ARM_VGIC_V3
>>  KVM_DEV_TYPE_ARM_VGIC_ITS,
>>  #define KVM_DEV_TYPE_ARM_VGIC_ITS   KVM_DEV_TYPE_ARM_VGIC_ITS
>> +KVM_DEV_TYPE_XIVE,
>> 

Re: [PATCH v15 00/13] powerpc: Switch to CONFIG_THREAD_INFO_IN_TASK

2019-02-04 Thread Michael Ellerman
Christophe Leroy  writes:

> The purpose of this serie is to activate CONFIG_THREAD_INFO_IN_TASK which
> moves the thread_info into task_struct.

Hi Christophe,

I've taken this series and split some of the patches up a bit more.

I'll just run it through some tests and then post my version.

cheers


> Moving thread_info into task_struct has the following advantages:
> - It protects thread_info from corruption in the case of stack
> overflows.
> - Its address is harder to determine if stack addresses are
> leaked, making a number of attacks more difficult.
>
> Changes in v15:
>  - switched patch 1 and 2.
>  - resync patch 1 with linux/next. As memblock modifications are now fully
> merged in the linux-mm tree, this patch becomes void as soon as linux-mm
> gets merged into the powerpc/merge branch
>  - Fixed build failure on 64le due to call to 
> __save_stack_trace_tsk_reliable() (patch 5)
>  - Taken the renaming of THREAD_INFO to TASK_STACK out of the preparation 
> patch to ease review (hence new patch 6)
>  - Fixed one place where r11 (physical address of stack) was used instead of 
> r1 to locate
>  thread_info, inducing a bug when switching to r2 which is virtual address of 
> current (patch 7)
>  - Keeping physical address of current in r2 until MMU translation is 
> reactivated (patch 11)
>
> Changes in v14 (ie since v13):
>  - Added in front a fixup patch which conflicts with this serie
>  - Added a patch for using try_get_task_stack()/put_task_stack() in stack 
> walkers.
>  - Fixed compilation failure in the preparation patch (by moving the 
> modification
>  of klp_init_thread_info() to the following patch)
>
> Changes since v12:
>  - Patch 1: Taken comment from Mike (re-introduced the 'panic' in case 
> memblock allocation fails in setup_64.c
>  - Patch 1: Added alloc_stack() function in setup_32.c to also panic in case 
> of allocation failure.
>
> Changes since v11:
>  - Rebased on 81775f5563fa ("Automatic merge of branches 'master', 'next' and 
> 'fixes' into merge")
>  - Added a first patch to change memblock allocs to functions returning 
> virtual addrs. This removes
>the memset() which were the only remaining stuff in irq_ctx_init() and 
> exc_lvl_ctx_init() at the end.
>  - dropping irq_ctx_init() and exc_lvl_ctx_init() in patch 5 (powerpc: 
> Activate CONFIG_THREAD_INFO_IN_TASK)
>  - A few cosmetic changes in commit log and code.
>
> Changes since v10:
>  - Rebased on 21622a0d2023 ("Automatic merge of branches 'master', 'next' and 
> 'fixes' into merge")
>   ==> Fixed conflict in setup_32.S
>
> Changes since v9:
>  - Rebased on 183cbf93be88 ("Automatic merge of branches 'master', 'next' and 
> 'fixes' into merge")
>   ==> Fixed conflict on xmon
>
> Changes since v8:
>  - Rebased on e589b79e40d9 ("Automatic merge of branches 'master', 'next' and 
> 'fixes' into merge")
>   ==> Main impact was conflicts due to commit 9a8dd708d547 ("memblock: rename 
> memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*")
>
> Changes since v7:
>  - Rebased on fb6c6ce7907d ("Automatic merge of branches 'master', 'next' and 
> 'fixes' into merge")
>
> Changes since v6:
>  - Fixed validate_sp() to exclude NULL sp in 'regain entire stack space' 
> patch (early crash with CONFIG_KMEMLEAK)
>
> Changes since v5:
>  - Fixed livepatch_sp setup by using end_of_stack() instead of hardcoding
>  - Fixed PPC_BPF_LOAD_CPU() macro
>
> Changes since v4:
>  - Fixed a build failure on 32bits SMP when include/generated/asm-offsets.h 
> is not
>  already existing, was due to spaces instead of a tab in the Makefile
>
> Changes since RFC v3: (based on Nick's review)
>  - Renamed task_size.h to task_size_user64.h to better relate to what it 
> contains.
>  - Handling of the isolation of thread_info cpu field inside CONFIG_SMP 
> #ifdefs moved to a separate patch.
>  - Removed CURRENT_THREAD_INFO macro completely.
>  - Added a guard in asm/smp.h to avoid build failure before _TASK_CPU is 
> defined.
>  - Added a patch at the end to rename 'tp' pointers to 'sp' pointers
>  - Renamed 'tp' into 'sp' pointers in preparation patch when relevant
>  - Fixed a few commit logs
>  - Fixed checkpatch report.
>
> Changes since RFC v2:
>  - Removed the modification of names in asm-offsets
>  - Created a rule in arch/powerpc/Makefile to append the offset of 
> current->cpu in CFLAGS
>  - Modified asm/smp.h to use the offset set in CFLAGS
>  - Squashed the renaming of THREAD_INFO to TASK_STACK in the preparation patch
>  - Moved the modification of current_pt_regs in the patch activating 
> CONFIG_THREAD_INFO_IN_TASK
>
> Changes since RFC v1:
>  - Removed the first patch which was modifying header inclusion order in timer
>  - Modified some names in asm-offsets to avoid conflicts when including 
> asm-offsets in C files
>  - Modified asm/smp.h to avoid having to include linux/sched.h (using 
> asm-offsets instead)
>  - Moved some changes from the activation patch to the preparation patch.
>
> Christophe Leroy (13):
>   powerpc/irq: use memblock 

Re: [PATCH v2] powerpc: drop page_is_ram() and walk_system_ram_range()

2019-02-04 Thread Michael Ellerman
Christophe Leroy  writes:

> Since commit c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
> it is possible to use the generic walk_system_ram_range() and
> the generic page_is_ram().
>
> To enable the use of walk_system_ram_range() by the IBM EHEA
> ethernet driver, the generic function has to be exported.

I'm not sure if we have a policy on that, but I suspect we'd rather not
add a new export on all arches unless we need to. Especially seeing as
the only user is the EHEA code which is heavily in maintenance mode.

I'll put the export in powerpc code and make sure that builds.

> As powerpc was the only (last?) user of CONFIG_ARCH_HAS_WALK_MEMORY,
> the #ifdef around the generic walk_system_ram_range() has become
> useless and can be dropped.

Yes it was the only user:

a99824f327c7 ("[POWERPC] Add arch-specific walk_memory_remove() for 64-bit 
powerpc")

I'll update the changelog.

cheers


> Fixes: c40dd2f76644 ("powerpc: Add System RAM to /proc/iomem")
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/Kconfig|  3 ---
>  arch/powerpc/include/asm/page.h |  1 -
>  arch/powerpc/mm/mem.c   | 33 -
>  kernel/resource.c   |  5 +
>  4 files changed, 1 insertion(+), 41 deletions(-)
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 2890d36eb531..f92e6754edf1 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -478,9 +478,6 @@ config ARCH_CPU_PROBE_RELEASE
>  config ARCH_ENABLE_MEMORY_HOTPLUG
>   def_bool y
>  
> -config ARCH_HAS_WALK_MEMORY
> - def_bool y
> -
>  config ARCH_ENABLE_MEMORY_HOTREMOVE
>   def_bool y
>  
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index 5c5ea2413413..aa4497175bd3 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -326,7 +326,6 @@ struct page;
>  extern void clear_user_page(void *page, unsigned long vaddr, struct page 
> *pg);
>  extern void copy_user_page(void *to, void *from, unsigned long vaddr,
>   struct page *p);
> -extern int page_is_ram(unsigned long pfn);
>  extern int devmem_is_allowed(unsigned long pfn);
>  
>  #ifdef CONFIG_PPC_SMLPAR
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 33cc6f676fa6..fa9916c2c662 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -80,11 +80,6 @@ static inline pte_t *virt_to_kpte(unsigned long vaddr)
>  #define TOP_ZONE ZONE_NORMAL
>  #endif
>  
> -int page_is_ram(unsigned long pfn)
> -{
> - return memblock_is_memory(__pfn_to_phys(pfn));
> -}
> -
>  pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
> unsigned long size, pgprot_t vma_prot)
>  {
> @@ -176,34 +171,6 @@ int __meminit arch_remove_memory(int nid, u64 start, u64 
> size,
>  #endif
>  #endif /* CONFIG_MEMORY_HOTPLUG */
>  
> -/*
> - * walk_memory_resource() needs to make sure there is no holes in a given
> - * memory range.  PPC64 does not maintain the memory layout in /proc/iomem.
> - * Instead it maintains it in memblock.memory structures.  Walk through the
> - * memory regions, find holes and callback for contiguous regions.
> - */
> -int
> -walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
> - void *arg, int (*func)(unsigned long, unsigned long, void *))
> -{
> - struct memblock_region *reg;
> - unsigned long end_pfn = start_pfn + nr_pages;
> - unsigned long tstart, tend;
> - int ret = -1;
> -
> - for_each_memblock(memory, reg) {
> - tstart = max(start_pfn, memblock_region_memory_base_pfn(reg));
> - tend = min(end_pfn, memblock_region_memory_end_pfn(reg));
> - if (tstart >= tend)
> - continue;
> - ret = (*func)(tstart, tend - tstart, arg);
> - if (ret)
> - break;
> - }
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(walk_system_ram_range);
> -
>  #ifndef CONFIG_NEED_MULTIPLE_NODES
>  void __init mem_topology_setup(void)
>  {
> diff --git a/kernel/resource.c b/kernel/resource.c
> index 915c02e8e5dd..2e1636041508 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -448,8 +448,6 @@ int walk_mem_res(u64 start, u64 end, void *arg,
>arg, func);
>  }
>  
> -#if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
> -
>  /*
>   * This function calls the @func callback against all memory ranges of type
>   * System RAM which are marked as IORESOURCE_SYSTEM_RAM and IORESOUCE_BUSY.
> @@ -480,8 +478,7 @@ int walk_system_ram_range(unsigned long start_pfn, 
> unsigned long nr_pages,
>   }
>   return ret;
>  }
> -
> -#endif
> +EXPORT_SYMBOL_GPL(walk_system_ram_range);
>  
>  static int __is_ram(unsigned long pfn, unsigned long nr_pages, void *arg)
>  {
> -- 
> 2.13.3
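The powerpc walk_system_ram_range() being deleted above is just an interval-intersection loop over memblock regions; its clipping logic can be modelled in isolation (toy region struct, not the kernel types):

```c
#include <assert.h>

struct region { unsigned long base_pfn, end_pfn; };

/* Callback used below: accumulate the number of pages visited. */
static int sum_pages(unsigned long start, unsigned long nr_pages, void *arg)
{
	*(unsigned long *)arg += nr_pages;
	return 0;
}

/* Clip [start_pfn, end_pfn) against each region and call func on the
 * overlap -- the same loop the removed powerpc version performed over
 * memblock.memory. */
static int walk_regions(const struct region *regs, int nr,
			unsigned long start_pfn, unsigned long end_pfn,
			void *arg,
			int (*func)(unsigned long, unsigned long, void *))
{
	int ret = -1;

	for (int i = 0; i < nr; i++) {
		unsigned long tstart = start_pfn > regs[i].base_pfn ?
				       start_pfn : regs[i].base_pfn;
		unsigned long tend = end_pfn < regs[i].end_pfn ?
				     end_pfn : regs[i].end_pfn;

		if (tstart >= tend)
			continue;
		ret = func(tstart, tend - tstart, arg);
		if (ret)
			break;
	}
	return ret;
}
```

Note the -1 default: as in the removed code, a range with no overlapping region reports failure, which is why callers like the EHEA driver can rely on it to detect holes.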


Re: [PATCH 03/19] KVM: PPC: Book3S HV: check the IRQ controller type

2019-02-04 Thread Cédric Le Goater
On 2/4/19 1:50 AM, David Gibson wrote:
> On Wed, Jan 23, 2019 at 05:24:13PM +0100, Cédric Le Goater wrote:
>> On 1/22/19 5:56 AM, Paul Mackerras wrote:
>>> On Mon, Jan 07, 2019 at 07:43:15PM +0100, Cédric Le Goater wrote:
 We will have different KVM devices for interrupts, one for the
 XICS-over-XIVE mode and one for the XIVE native exploitation
 mode. Let's add some checks to make sure we are not mixing the
 interfaces in KVM.

 Signed-off-by: Cédric Le Goater 
 ---
  arch/powerpc/kvm/book3s_xive.c | 6 ++
  1 file changed, 6 insertions(+)

 diff --git a/arch/powerpc/kvm/book3s_xive.c 
 b/arch/powerpc/kvm/book3s_xive.c
 index f78d002f0fe0..8a4fa45f07f8 100644
 --- a/arch/powerpc/kvm/book3s_xive.c
 +++ b/arch/powerpc/kvm/book3s_xive.c
 @@ -819,6 +819,9 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu)
  {
struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
  
 +  if (!kvmppc_xics_enabled(vcpu))
 +  return -EPERM;
 +
if (!xc)
return 0;
  
 @@ -835,6 +838,9 @@ int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 
 icpval)
u8 cppr, mfrr;
u32 xisr;
  
 +  if (!kvmppc_xics_enabled(vcpu))
 +  return -EPERM;
 +
if (!xc || !xive)
return -ENOENT;
>>>
>>> I can't see how these new checks could ever trigger in the code as it
>>> stands.  Is there a way at present? 
>>
>> It would require some custom QEMU doing silly things : create the XICS 
>> KVM device, and then call kvm_get_one_reg(KVM_REG_PPC_ICP_STATE) or 
>> kvm_set_one_reg(icp->cs, KVM_REG_PPC_ICP_STATE) without connecting the
>> vCPU to its presenter. 
>>
>> Today, you get a ENOENT.
> 
> TBH, ENOENT seems fine to me.
> 
>>> Do following patches ever add a path where the new checks could trigger, 
>>> or is this just an excess of caution? 
>>
>> With the following patches, QEMU could to do something even more silly,
>> which is to mix the interrupt mode interfaces : create a KVM XICS device
>> and call KVM CPU ioctls of the KVM XIVE device, or the opposite.
> 
> AFAICT, like above, that won't really differ from calling the XIVE CPU
> ioctl()s when no irqchip is set up at all, and should be covered by
> just a !xive check.

we can drop that patch. It does not bring much.

Thanks,

C.

> 
>>
>>> (Your patch description should ideally have answered these questions > for 
>>> me.)
>>
>> Yes. I also think that I introduced this patch to early in the series.
>> It make more sense when the XICS and the XIVE KVM devices are available.  
>>
>> Thanks,
>>
>> C.
>>
> 



Applied "ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe" to the asoc tree

2019-02-04 Thread Mark Brown
The patch

   ASoC: fsl-asoc-card: fix object reference leaks in fsl_asoc_card_probe

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 11907e9d3533648615db08140e3045b829d2c141 Mon Sep 17 00:00:00 2001
From: wen yang 
Date: Sat, 2 Feb 2019 14:53:16 +
Subject: [PATCH] ASoC: fsl-asoc-card: fix object reference leaks in
 fsl_asoc_card_probe

The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.

Signed-off-by: Wen Yang 
Cc: Timur Tabi 
Cc: Nicolin Chen 
Cc: Xiubo Li 
Cc: Fabio Estevam 
Cc: Liam Girdwood 
Cc: Mark Brown 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: alsa-de...@alsa-project.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl-asoc-card.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index 81f2fe2c6d23..60f87a0d99f4 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -689,6 +689,7 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 asrc_fail:
of_node_put(asrc_np);
of_node_put(codec_np);
+   put_device(&asrc_pdev->dev);
 fail:
of_node_put(cpu_np);
 
-- 
2.20.1



[PATCH v3 1/2] dt-bindings: soc: fsl: Document Qixis FPGA usage

2019-02-04 Thread Pankaj Bansal
An FPGA-based system controller, called “Qixis”, manages
several critical system features, including:
• Reset sequencing
• Power supply configuration
• Board configuration
• Hardware configuration

The qixis registers are accessible over one or more system-specific
interfaces, typically I2C, JTAG or an embedded processor.

Signed-off-by: Pankaj Bansal 
---

Notes:
V3:
- Added boardname based compatible field in bindings
- Added bindings for MMIO based FPGA
V2:
- No change

 .../bindings/soc/fsl/qixis_ctrl.txt  | 53 ++
 1 file changed, 53 insertions(+)

diff --git a/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt 
b/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
new file mode 100644
index ..5d510df14be8
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
@@ -0,0 +1,53 @@
+* QIXIS FPGA block
+
+An FPGA-based system controller, called “Qixis”,
+manages several critical system features, including:
+• Configuration switch monitoring
+• Power on/off sequencing
+• Reset sequencing
+• Power supply configuration
+• Board configuration
+• Hardware configuration
+• Background power data collection (DCM)
+• Fault monitoring
+• RCW bypass SRAM (replace flash RCW with internal RCW) (NOR only)
+• Dedicated functional validation blocks (POSt/IRS, triggered event, and so on)
+• I2C master for remote board control even with no DUT available
+
+The qixis registers are accessible over one or more system-specific interfaces,
+typically I2C, JTAG or an embedded processor.
+
+FPGA connected to I2C:
+Required properties:
+
+ - compatible: should be a board-specific string followed by a string
+   indicating the type of FPGA.  Example:
+   "fsl,<board>-fpga", "fsl,fpga-qixis-i2c"
+ - reg : i2c address of the qixis device.
+
+Example (LX2160A-QDS):
+   /* The FPGA node */
+fpga@66 {
+   compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
+   reg = <0x66>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+   };
+
+* Freescale on-board FPGA
+
+This is the memory-mapped registers for on board FPGA.
+
+Required properties:
+- compatible: should be a board-specific string followed by a string
+  indicating the type of FPGA.  Example:
+   "fsl,<board>-fpga", "fsl,fpga-qixis"
+- reg: should contain the address and the length of the FPGA register set.
+
+Example (LS2080A-RDB):
+
+cpld@3,0 {
+compatible = "fsl,ls2080ardb-fpga", "fsl,fpga-qixis";
+reg = <0x3 0 0x1>;
+};
+
-- 
2.17.1



[PATCH v3 0/2] add qixis driver

2019-02-04 Thread Pankaj Bansal
The FPGA on LX2160AQDS/LX2160ARDB is connected on the I2C bus, so add a
qixis driver, which is basically an I2C client driver, to control the FPGA.

Also add a platform driver for the MMIO-based FPGA, like the one available
on LS2088ARDB/LS2088AQDS.

This driver is essential to control MDIO mux multiplexing.
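One driver can serve both boards because regmap hides the transport behind the same read/write interface — which is why patch 2/2 selects both REGMAP_I2C and REGMAP_MMIO. A toy model of that indirection (made-up register offsets, not the real Qixis map):

```c
#include <assert.h>
#include <stdint.h>

/* Made-up Qixis register offsets for illustration; the real map is
 * board-specific. */
#define QIXIS_ID	0x00
#define QIXIS_VER	0x01

/* Toy regmap: the driver only ever calls read/write on it, so the same
 * driver code works whether the backend is regmap_i2c or regmap_mmio. */
struct toy_regmap { uint8_t regs[256]; };

static int toy_regmap_read(struct toy_regmap *map, unsigned int reg,
			   unsigned int *val)
{
	if (reg >= sizeof(map->regs))
		return -1;	/* out of range, like regmap's -EINVAL */
	*val = map->regs[reg];
	return 0;
}
```

Consumers such as the MDIO mux code then only need the regmap handle, never the underlying bus.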

This driver is dependent on below patches:
https://www.mail-archive.com/netdev@vger.kernel.org/msg281274.html

Cc: Varun Sethi 

---
Notes:
V2:
- https://patchwork.kernel.org/cover/10788341/
V1:
- https://patchwork.kernel.org/cover/10627297/

Pankaj Bansal (2):
  dt-bindings: soc: fsl: Document Qixis FPGA usage
  drivers: soc: fsl: add qixis driver

 .../bindings/soc/fsl/qixis_ctrl.txt   |  53 +
 drivers/soc/fsl/Kconfig   |  11 +
 drivers/soc/fsl/Makefile  |   1 +
 drivers/soc/fsl/qixis_ctrl.c  | 207 ++
 4 files changed, 272 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/fsl/qixis_ctrl.txt
 create mode 100644 drivers/soc/fsl/qixis_ctrl.c

-- 
2.17.1



[PATCH v3 2/2] drivers: soc: fsl: add qixis driver

2019-02-04 Thread Pankaj Bansal
The FPGA on LX2160AQDS/LX2160ARDB is connected on an I2C bus, so add a
qixis driver, which is basically an I2C client driver to control the FPGA.

Also add a platform driver for the MMIO-based FPGA, like the one available
on LS2088ARDB/LS2088AQDS.

Signed-off-by: Wang Dongsheng 
Signed-off-by: Pankaj Bansal 
---

Notes:
V3:
- Add MMIO based FPGA driver
V2:
- Modify the driver to not create platform devices corresponding to
  subnodes, because the subnodes are not actual devices.
- Use mdio_mux_regmap_init/mdio_mux_regmap_uninit
- Remove the header file from the include folder, as no qixis API is called
  from outside
- Add regmap_exit in driver's remove function
Dependencies:
- https://www.mail-archive.com/netdev@vger.kernel.org/msg281274.html

 drivers/soc/fsl/Kconfig  |  11 ++
 drivers/soc/fsl/Makefile |   1 +
 drivers/soc/fsl/qixis_ctrl.c | 207 +
 3 files changed, 219 insertions(+)

diff --git a/drivers/soc/fsl/Kconfig b/drivers/soc/fsl/Kconfig
index 8f80e8bbf29e..75993be04e42 100644
--- a/drivers/soc/fsl/Kconfig
+++ b/drivers/soc/fsl/Kconfig
@@ -28,4 +28,15 @@ config FSL_MC_DPIO
  other DPAA2 objects. This driver does not expose the DPIO
  objects individually, but groups them under a service layer
  API.
+
+config FSL_QIXIS
+   tristate "QIXIS system controller driver"
+   depends on OF
+   select REGMAP_I2C
+   select REGMAP_MMIO
+   default n
+   help
+ Say y here to enable the QIXIS system controller API. The qixis
+ driver provides FPGA functions to control the system.
+
 endmenu
diff --git a/drivers/soc/fsl/Makefile b/drivers/soc/fsl/Makefile
index 803ef1bfb5ff..47e0cfc66ca4 100644
--- a/drivers/soc/fsl/Makefile
+++ b/drivers/soc/fsl/Makefile
@@ -5,5 +5,6 @@
 obj-$(CONFIG_FSL_DPAA) += qbman/
 obj-$(CONFIG_QUICC_ENGINE) += qe/
 obj-$(CONFIG_CPM)  += qe/
+obj-$(CONFIG_FSL_QIXIS)+= qixis_ctrl.o
 obj-$(CONFIG_FSL_GUTS) += guts.o
 obj-$(CONFIG_FSL_MC_DPIO)  += dpio/
diff --git a/drivers/soc/fsl/qixis_ctrl.c b/drivers/soc/fsl/qixis_ctrl.c
new file mode 100644
index ..36a3e1abc465
--- /dev/null
+++ b/drivers/soc/fsl/qixis_ctrl.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/* Freescale QIXIS system controller driver.
+ *
+ * Copyright 2015 Freescale Semiconductor, Inc.
+ * Copyright 2018-2019 NXP
+ */
+
+#include <linux/err.h>
+#include <linux/i2c.h>
+#include <linux/kernel.h>
+#include <linux/mdio-mux.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/regmap.h>
+
+/* QIXIS MAP */
+struct fsl_qixis_regs {
+   u8  id; /* Identification Registers */
+   u8  version;/* Version Register */
+   u8  qixis_ver;  /* QIXIS Version Register */
+   u8  reserved1[0x1f];
+};
+
+struct mdio_mux_data {
+   void *data;
+   struct list_head link;
+};
+
+struct qixis_priv {
+   struct regmap   *regmap;
+   struct list_head mdio_mux_list;
+};
+
+static struct regmap_config qixis_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+};
+
+static int fsl_qixis_mdio_mux_init(struct device *dev, struct qixis_priv *priv)
+{
+   struct device_node *child;
+   struct mdio_mux_data *mux_data;
+   int ret;
+
+   INIT_LIST_HEAD(&priv->mdio_mux_list);
+   for_each_child_of_node(dev->of_node, child) {
+   if (!of_node_name_prefix(child, "mdio-mux"))
+   continue;
+
+   mux_data = devm_kzalloc(dev, sizeof(struct mdio_mux_data),
+   GFP_KERNEL);
+   if (!mux_data)
+   return -ENOMEM;
+   ret = mdio_mux_regmap_init(dev, child, &mux_data->data);
+   if (ret)
+   return ret;
+   list_add(&mux_data->link, &priv->mdio_mux_list);
+   }
+
+   return 0;
+}
+
+static int fsl_qixis_mdio_mux_uninit(struct qixis_priv *priv)
+{
+   struct list_head *pos;
+   struct mdio_mux_data *mux_data;
+
+   list_for_each(pos, &priv->mdio_mux_list) {
+   mux_data = list_entry(pos, struct mdio_mux_data, link);
+   mdio_mux_regmap_uninit(mux_data->data);
+   }
+
+   return 0;
+}
+
+static int fsl_qixis_probe(struct platform_device *pdev)
+{
+   static struct fsl_qixis_regs __iomem *qixis;
+   struct qixis_priv *priv;
+   int ret;
+   u32 qver;
+
+   qixis = of_iomap(pdev->dev.of_node, 0);
+   if (IS_ERR_OR_NULL(qixis)) {
+   pr_err("%s: Could not map qixis registers\n", __func__);
+   return -ENODEV;
+   }
+
+   priv = devm_kzalloc(&pdev->dev, sizeof(struct qixis_priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   priv->regmap = devm_regmap_init_mmio(&pdev->dev, qixis,
+&qixis_regmap_config);
+   
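
[The archived diff is cut off above.] The init/uninit pair in this patch keeps
one bookkeeping node per registered mdio-mux on a linked list, so that remove()
can walk the list and unregister everything. A minimal userspace sketch of that
pattern follows; it uses a toy singly linked list rather than the kernel's
<linux/list.h>, and none of the names below are kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per registered mux, like struct mdio_mux_data above. */
struct mux_node {
	int handle;             /* stands in for the mux's opaque data */
	struct mux_node *next;
};

struct priv {
	struct mux_node *head;  /* stands in for priv->mdio_mux_list */
};

static int registered;          /* count of currently registered muxes */

/* Register one mux and remember it, like the loop body in
 * fsl_qixis_mdio_mux_init(). */
static int mux_register(struct priv *p, int handle)
{
	struct mux_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->handle = handle;
	n->next = p->head;      /* list_add(): push onto the list head */
	p->head = n;
	registered++;
	return 0;
}

/* Walk the list and tear everything down, like
 * fsl_qixis_mdio_mux_uninit(). */
static void mux_unregister_all(struct priv *p)
{
	struct mux_node *n = p->head;

	while (n) {             /* list_for_each(): visit every node */
		struct mux_node *next = n->next;

		registered--;
		free(n);
		n = next;
	}
	p->head = NULL;
}

int registered_count(void)
{
	return registered;
}
```

The point of the list is that mdio_mux_regmap_init() hands back an opaque
pointer per mux, and there may be several muxes behind one FPGA, so the driver
must retain all of them until remove time.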

Re: [PATCH v2 10/21] memblock: refactor internal allocation functions

2019-02-04 Thread Michael Ellerman
Mike Rapoport  writes:
> On Sun, Feb 03, 2019 at 08:39:20PM +1100, Michael Ellerman wrote:
>> Mike Rapoport  writes:
>> > Currently, memblock has several internal functions with overlapping
>> > functionality. They all call memblock_find_in_range_node() to find free
>> > memory and then reserve the allocated range and mark it with kmemleak.
>> > However, there is difference in the allocation constraints and in fallback
>> > strategies.
...
>> 
>> This is causing problems on some of my machines.
...
>> 
>> On some of my other systems it does that, and then panics because it
>> can't allocate anything at all:
>> 
>> [0.00] numa:   NODE_DATA [mem 0x7ffcaee80-0x7ffcb3fff]
>> [0.00] numa:   NODE_DATA [mem 0x7ffc99d00-0x7ffc9ee7f]
>> [0.00] numa: NODE_DATA(1) on node 0
>> [0.00] Kernel panic - not syncing: Cannot allocate 20864 bytes for 
>> node 16 data
>> [0.00] CPU: 0 PID: 0 Comm: swapper Not tainted 
>> 5.0.0-rc4-gccN-next-20190201-gdc4c899 #1
>> [0.00] Call Trace:
>> [0.00] [c11cfca0] [c0c11044] dump_stack+0xe8/0x164 
>> (unreliable)
>> [0.00] [c11cfcf0] [c00fdd6c] panic+0x17c/0x3e0
>> [0.00] [c11cfd90] [c0f61bc8] initmem_init+0x128/0x260
>> [0.00] [c11cfe60] [c0f57940] setup_arch+0x398/0x418
>> [0.00] [c11cfee0] [c0f50a94] start_kernel+0xa0/0x684
>> [0.00] [c11cff90] [c000af70] 
>> start_here_common+0x1c/0x52c
>> [0.00] Rebooting in 180 seconds..
>> 
>> 
>> So there's something going wrong there, I haven't had time to dig into
>> it though (Sunday night here).
>
> Yeah, I've misplaced 'nid' and 'MEMBLOCK_ALLOC_ACCESSIBLE' in
> memblock_phys_alloc_try_nid() :(
>
> Can you please check if the below patch fixes the issue on your systems?

Yes it does, thanks.

Tested-by: Michael Ellerman 

cheers


> From 5875b7440e985ce551e6da3cb28aa8e9af697e10 Mon Sep 17 00:00:00 2001
> From: Mike Rapoport 
> Date: Sun, 3 Feb 2019 13:35:42 +0200
> Subject: [PATCH] memblock: fix parameter order in
>  memblock_phys_alloc_try_nid()
>
> The refactoring of internal memblock allocation functions used wrong order
> of parameters in memblock_alloc_range_nid() call from
> memblock_phys_alloc_try_nid().
> Fix it.
>
> Signed-off-by: Mike Rapoport 
> ---
>  mm/memblock.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index e047933..0151a5b 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1402,8 +1402,8 @@ phys_addr_t __init 
> memblock_phys_alloc_range(phys_addr_t size,
>  
>  phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t 
> align, int nid)
>  {
> - return memblock_alloc_range_nid(size, align, 0, nid,
> - MEMBLOCK_ALLOC_ACCESSIBLE);
> + return memblock_alloc_range_nid(size, align, 0,
> + MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>  }
>  
>  /**
> -- 
> 2.7.4
>
>
> -- 
> Sincerely yours,
> Mike.
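
As an aside, the bug fixed above is easy to make because the `end` sentinel and
`nid` are both plain integers, so transposing them still compiles without a
warning. A minimal userspace illustration of the hazard follows; the function
names and the address-encoding trick are invented for the example (in the real
kernel, MEMBLOCK_ALLOC_ACCESSIBLE is defined as 0, which the sketch mimics).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Sentinel meaning "no upper address limit", like MEMBLOCK_ALLOC_ACCESSIBLE. */
#define ALLOC_ACCESSIBLE ((phys_addr_t)0)

/* Toy allocator: ignores everything except 'nid' and encodes it into the
 * returned address, so a caller can observe which node it was asked for. */
static phys_addr_t alloc_range_nid(phys_addr_t size, phys_addr_t align,
				   phys_addr_t start, phys_addr_t end, int nid)
{
	(void)size; (void)align; (void)start; (void)end;
	return 0x1000u + (phys_addr_t)nid;
}

/* Buggy wrapper: 'nid' and the end sentinel are transposed -- the exact
 * shape of the bug in memblock_phys_alloc_try_nid() before the fix. */
phys_addr_t alloc_try_nid_buggy(phys_addr_t size, phys_addr_t align, int nid)
{
	return alloc_range_nid(size, align, 0, (phys_addr_t)nid,
			       (int)ALLOC_ACCESSIBLE);
}

/* Fixed wrapper, matching the parameter order in the patch. */
phys_addr_t alloc_try_nid_fixed(phys_addr_t size, phys_addr_t align, int nid)
{
	return alloc_range_nid(size, align, 0, ALLOC_ACCESSIBLE, nid);
}
```

With the buggy wrapper the requested node is silently replaced by node 0 (and
the node number is misused as an address limit), which is consistent with the
"Cannot allocate ... for node 16" panic reported above.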


Re: [RFC PATCH] powerpc: fix get_arch_dma_ops() for NTB devices

2019-02-04 Thread Christoph Hellwig
On Wed, Jan 30, 2019 at 11:58:40PM +1100, Michael Ellerman wrote:
> Alexander Fomichev  writes:
> 
> > get_dma_ops() falls into arch-dependant get_arch_dma_ops(), which
> > historically returns NULL on PowerPC. Therefore dma_set_mask() fails.
> > This affects Switchtec (and probably other) NTB devices, that they fail
> > to initialize.
> 
> What's an NTB device?
> 
> drivers/ntb I assume?
> 
> So it's a PCI device of some sort, but presumably the device you're
> calling dma_set_mask() on is an NTB device not a PCI device?
> 
> But then it works if you tell it to use the PCI DMA ops?
> 
> At the very least the code should be checking for the NTB bus type and
> only returning the PCI ops in that specific case, not for all devices.

Can you provide the context?  E.g. the patch and the rest of the commit
log.  This all looks rather odd to me.