Re: [PATCH 1/2] powerpc: string: implement optimized memset variants

2017-04-04 Thread PrasannaKumar Muralidharan
On 30 March 2017 at 12:46, Naveen N. Rao wrote:
> Also, with a simple module to memset64() a 1GB vmalloc'ed buffer, here
> are the results:
> generic:    0.245315533 seconds time elapsed  ( +- 1.83% )
> optimized:  0.169282701 seconds time elapsed  ( +- 1.96% )

Wondering what makes gcc not produce efficient assembly code. Can
you please post the disassembly of the C implementation of memset64?
Just for information.

Thanks,
Prasanna


Re: [RFC PATCH] powerpc/mm/hugetlb: Add support for 1G huge pages

2017-04-04 Thread Anshuman Khandual
On 04/04/2017 07:33 PM, Aneesh Kumar K.V wrote:
> This patch adds support for gigantic pages in ppc64. We also update
> the gigantic_page_supported() helper such that the arch can override it.

Seems like only radix-based 1GB is considered a gigantic page in this
implementation. What about the existing 16GB page support? IIUC they
are still supported currently as gigantic pages (as defined in generic
HugeTLB) if the platform gives us reserved memory areas during boot.
Can you explain how this is going to be different?

> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/hugetlb.h | 9 +
>  arch/powerpc/mm/hugetlbpage.c| 7 +--
>  arch/powerpc/platforms/Kconfig.cputype   | 1 +
>  mm/hugetlb.c | 4 
>  4 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> index cd366596..a994d069fdaf 100644
> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
> @@ -50,4 +50,13 @@ static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
>   else
>   return entry;
>  }
> +
> +#define gigantic_page_supported gigantic_page_supported
> +static inline bool gigantic_page_supported(void)
> +{
> + if (radix_enabled())
> + return true;
> + return false;
> +}

POWER8 (non-radix MMU) cannot have 16GB gigantic HugeTLB pages?

> +
>  #endif
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index a4f33de4008e..80f6d2ed551a 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -763,8 +763,11 @@ static int __init add_huge_page_size(unsigned long long size)
>* Hash: 16M and 16G
>*/
>   if (radix_enabled()) {
> - if (mmu_psize != MMU_PAGE_2M)
> - return -EINVAL;
> + if (mmu_psize != MMU_PAGE_2M) {
> + if (cpu_has_feature(CPU_FTR_POWER9_DD1) ||
> + (mmu_psize != MMU_PAGE_1G))
> + return -EINVAL;
> + }

The comment above this code block needs to be updated as well for
this new page size addition. I understand that this code block
was added to protect against wrong device-tree-supplied page size
values, but I am wondering: don't we require such a check for normal
(non-HugeTLB) page sizes as well? But anyway, that's a different topic.
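
Incidentally, the nested guard in the quoted hunk flattens to a single
condition — allow 2M always, and 1G only on non-DD1 — which can be checked
quickly (dd1/psize here are stand-ins for the kernel's cpu_has_feature() and
mmu_psize tests, not real kernel code):

```python
def nested(dd1, psize):
    # shape of the patched code: reject unless the size is permitted
    if psize != "2M":
        if dd1 or psize != "1G":
            return "EINVAL"
    return "ok"

def flat(dd1, psize):
    # equivalent single-condition form
    if psize != "2M" and (dd1 or psize != "1G"):
        return "EINVAL"
    return "ok"

# exhaustive equivalence check over the interesting cases
for dd1 in (False, True):
    for psize in ("2M", "1G", "64K"):
        assert nested(dd1, psize) == flat(dd1, psize)
```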



[PATCH V2] powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizes

2017-04-04 Thread Anshuman Khandual
This just adds user-space-exported ABI definitions for the 2MB, 16MB,
1GB and 16GB non-default huge page sizes to be used with the mmap()
system call.

Signed-off-by: Anshuman Khandual 
---
These defined values will be used along with MAP_HUGETLB while calling
the mmap() system call if the desired HugeTLB page size is not the default
one. This follows similar definitions present in x86.

arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_2MB    (21 << MAP_HUGE_SHIFT)
arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_1GB    (30 << MAP_HUGE_SHIFT)

Changes in V2:
- Added definitions for 2MB and 1GB HugeTLB pages per Aneesh

 arch/powerpc/include/uapi/asm/mman.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba..3eb788c 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -29,4 +29,9 @@
 #define MAP_STACK  0x2 /* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */
 
+#define MAP_HUGE_2MB   (21 << MAP_HUGE_SHIFT)  /* 2MB HugeTLB Page */
+#define MAP_HUGE_16MB  (24 << MAP_HUGE_SHIFT)  /* 16MB HugeTLB Page */
+#define MAP_HUGE_1GB   (30 << MAP_HUGE_SHIFT)  /* 1GB HugeTLB Page */
+#define MAP_HUGE_16GB  (34 << MAP_HUGE_SHIFT)  /* 16GB HugeTLB Page */
+
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
-- 
1.8.5.2



Re: [PATCH v2] KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs

2017-04-04 Thread Paul Mackerras
On Tue, Apr 04, 2017 at 12:05:03PM +0200, Thomas Huth wrote:
> According to the PowerISA 2.07, mtspr and mfspr should not always
> generate an illegal instruction exception when being used with an
> undefined SPR, but rather treat the instruction as a NOP or inject a
> privilege exception in some cases, too - depending on the SPR number.
> Also turn the printk here into a ratelimited print statement, so that
> the guest cannot flood the dmesg log of the host by issuing lots of
> illegal mtspr/mfspr instructions here.
> 
> Signed-off-by: Thomas Huth 
> ---
>  v2:
>  - Inject illegal instruction program interrupt instead of emulation
>assist interrupt (according to the last programming note in section
>6.5.9 of Book III of the PowerISA v2.07)
> 
>  arch/powerpc/kvm/book3s_emulate.c | 26 ++
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
> index 8359752..bf4181e 100644
> --- a/arch/powerpc/kvm/book3s_emulate.c
> +++ b/arch/powerpc/kvm/book3s_emulate.c
> @@ -503,10 +503,14 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
>   break;
>  unprivileged:
>   default:
> - printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
> -#ifndef DEBUG_SPR
> - emulated = EMULATE_FAIL;
> -#endif
> + pr_info_ratelimited("KVM: invalid SPR write: %d\n", sprn);
> + if (sprn & 0x10) {
> + if (kvmppc_get_msr(vcpu) & MSR_PR)
> + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
> + } else {
> + if ((kvmppc_get_msr(vcpu) & MSR_PR) || sprn == 0)
> + kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
> + }
>   break;

In the cases where we generate an interrupt, we are now returning
EMULATE_DONE, which means that kvmppc_emulate_instruction() will
advance the PC by 4 after this function returns.  Since
kvmppc_core_queue_program() injects the interrupt straight away, this
means that the guest will resume execution at 0x704 rather than
0x700.

Paul.
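
The sprn & 0x10 test in the quoted hunk follows the ISA convention that SPR
numbers with that bit set are privileged. A small model of the patched
decision logic (hypothetical function name; this mirrors the code above, not
a real kernel API):

```python
def unknown_spr_write_action(sprn, problem_state):
    """What the patched mtspr emulation does for an unknown SPR.

    problem_state corresponds to MSR_PR being set (the guest is in
    user mode).  Returns the interrupt queued, or "nop".
    """
    if sprn & 0x10:
        # privileged SPR number: privileged-instruction program
        # interrupt from userspace, otherwise a no-op
        return "privileged-program" if problem_state else "nop"
    # non-privileged SPR number: illegal-instruction program interrupt
    # from userspace or for SPR 0; otherwise treated as a no-op
    if problem_state or sprn == 0:
        return "illegal-program"
    return "nop"

assert unknown_spr_write_action(0x10, True) == "privileged-program"
assert unknown_spr_write_action(0x10, False) == "nop"
assert unknown_spr_write_action(0x05, True) == "illegal-program"
assert unknown_spr_write_action(0x05, False) == "nop"
assert unknown_spr_write_action(0, False) == "illegal-program"
```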


Re: [PATCH v6 01/11] powerpc/powernv: Data structure and macros definitions

2017-04-04 Thread Madhavan Srinivasan



On Tuesday 04 April 2017 07:18 AM, Daniel Axtens wrote:

Hi,


+#define IMC_MAX_CHIPS  32
+#define IMC_MAX_PMUS   32
+#define IMC_MAX_PMU_NAME_LEN   256

I've noticed this is used as both the maximum length for event names and
event value strings. Would another name suit better?


This is used in the value-string length comparison also. So yes, I will
change the name to something that suits better.

Thanks for the review,
Maddy




+
+#define IMC_NEST_MAX_PAGES 16
+
+#define IMC_DTB_COMPAT "ibm,opal-in-memory-counters"
+#define IMC_DTB_NEST_COMPAT"ibm,imc-counters-nest"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+   u32 chip_id;
+   u64 pbase;
+   u64 vbase[IMC_NEST_MAX_PAGES];
+   u64 size;
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+   char *ev_name;
+   char *ev_value;
+};
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will
+ * hold the pmu functions and attrs for each imc pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct imc_pmu {
+   struct pmu pmu;
+   int domain;
+   const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST1
+#define IMC_DOMAIN_UNKNOWN -1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
--
2.7.4




Re: [PATCH kernel] powerpc/iommu: Do not call PageTransHuge() on tail pages

2017-04-04 Thread Aneesh Kumar K.V



On Wednesday 05 April 2017 08:29 AM, Alexey Kardashevskiy wrote:

On 04/04/17 19:26, Aneesh Kumar K.V wrote:

Alexey Kardashevskiy  writes:


The CMA pages migration code does not support compound pages at
the moment, so it performs a few tests before proceeding to actual page
migration.

One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
it should be called on head pages. Since we also test for PageCompound(),
and it contains PageTail(), we can simply move PageCompound() in front
of PageTransHuge() and therefore avoid possible VM_BUG_ON_PAGE.

Signed-off-by: Alexey Kardashevskiy 
---

Some actual POWER8 systems do crash on that BUG_ON.
---
 arch/powerpc/mm/mmu_context_iommu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index 497130c5c742..ba7fccf993b3 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -81,7 +81,7 @@ struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
gfp_t gfp_mask = GFP_USER;
struct page *new_page;

-   if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
+   if (PageCompound(page) || PageHuge(page) || PageTransHuge(page))



A check for compound page should be sufficient here, because a
Huge/TransHuge page is also marked compound.



But PageCompound() calls PageTail(), so PageTail() will be called on a THP
page, which is a BUG_ON in PageTransHuge() but not in PageCompound() -
this inconsistency is bothering me. Doesn't this BUG_ON tell us that we
should not be calling PageTail() on _any_ page?

In other words, should I get a head page (via compound_head()) first and
only then test the head page if it is thp/huge (as you suggested in a chat)?





I was suggesting to replace that if () condition with just

/* We don't handle hugetlb/THP pages yet */
if (PageCompound(page)) {

}

-aneesh
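
The implication being relied on here - hugetlb and THP pages are always
compound, so a single PageCompound() check (evaluated first, before any
tail page can reach PageTransHuge()) subsumes the other two tests - can be
sketched as a toy model (simplified flags; the real state lives in struct
page):

```python
class Page:
    """Toy stand-in for struct page with just the flags we care about."""
    def __init__(self, head=False, tail=False, huge=False, thp=False):
        self.head, self.tail, self.huge, self.thp = head, tail, huge, thp

def page_compound(p):
    # PageCompound() is true for both head and tail pages
    return p.head or p.tail

normal = Page()
hugetlb_head = Page(head=True, huge=True)
thp_head = Page(head=True, thp=True)
thp_tail = Page(tail=True)

# every hugetlb/THP page is compound, so one check covers all of them,
# and no tail page survives past it
for p in (hugetlb_head, thp_head, thp_tail):
    assert page_compound(p)
assert not page_compound(normal)
```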



Re: [PATCH 1/5] crypto/nx: Rename nx842_powernv_function as icswx function

2017-04-04 Thread Haren Myneni
On 04/04/2017 04:11 AM, Michael Ellerman wrote:
> Haren Myneni  writes:
> 
>> [PATCH 1/5] crypto/nx: Rename nx842_powernv_function as icswx function
>>
>> nx842_powernv_function points to nx842_icswx_function and
>> will point to the VAS function which will be added later for
>> P9 NX support.
> 
> I know it's nit-picking but can you give it a better name while you're
> there.
> 
> I was thinking it should be called "send" or something, but it actually
> synchronously sends and waits for a request.
> 
> So perhaps just nx842_exec(), for "execute a request", and then you can
> have nx842_exec_icswx() and nx842_exec_vas().
> 
> cheers
> 

Michael,

Thanks for the review.

nx842_powernv_function() was the name used before, so I just picked a similar name.
But I will make the changes in the next version.

Haren 



Re: [PATCH kernel] powerpc/iommu: Do not call PageTransHuge() on tail pages

2017-04-04 Thread Alexey Kardashevskiy
On 04/04/17 19:26, Aneesh Kumar K.V wrote:
> Alexey Kardashevskiy  writes:
> 
>> The CMA pages migration code does not support compound pages at
>> the moment, so it performs a few tests before proceeding to actual page
>> migration.
>>
>> One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
>> it should be called on head pages. Since we also test for PageCompound(),
>> and it contains PageTail(), we can simply move PageCompound() in front
>> of PageTransHuge() and therefore avoid possible VM_BUG_ON_PAGE.
>>
>> Signed-off-by: Alexey Kardashevskiy 
>> ---
>>
>> Some actual POWER8 systems do crash on that BUG_ON.
>> ---
>>  arch/powerpc/mm/mmu_context_iommu.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
>> index 497130c5c742..ba7fccf993b3 100644
>> --- a/arch/powerpc/mm/mmu_context_iommu.c
>> +++ b/arch/powerpc/mm/mmu_context_iommu.c
>> @@ -81,7 +81,7 @@ struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
>>  gfp_t gfp_mask = GFP_USER;
>>  struct page *new_page;
>>
>> -if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +if (PageCompound(page) || PageHuge(page) || PageTransHuge(page))
> 
> 
> A check for compound page should be sufficient here, because a
> Huge/TransHuge page is also marked compound.


But PageCompound() calls PageTail(), so PageTail() will be called on a THP
page, which is a BUG_ON in PageTransHuge() but not in PageCompound() -
this inconsistency is bothering me. Doesn't this BUG_ON tell us that we
should not be calling PageTail() on _any_ page?

In other words, should I get a head page (via compound_head()) first and
only then test the head page if it is thp/huge (as you suggested in a chat)?


> If we want to indicate that
> we don't handle hugetlb and THP pages, we can write that as a comment ?
> 
> 
> 
>>  return NULL;
>>
>>  if (PageHighMem(page))
>> @@ -100,7 +100,7 @@ static int mm_iommu_move_page_from_cma(struct page *page)
>>  LIST_HEAD(cma_migrate_pages);
>>
>>  /* Ignore huge pages for now */
>> -if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
>> +if (PageCompound(page) || PageHuge(page) || PageTransHuge(page))
>>  return -EBUSY;
>>
>>  lru_add_drain();
>> -- 
>> 2.11.0
> 


-- 
Alexey


[RFC PATCH 3/3] powerpc/pseries: Always enable SMP when building pseries

2017-04-04 Thread Michael Ellerman
The pseries platform supports Power4 and later CPUs, all of which are
multithreaded and/or multicore.

In practice no one ever builds an SMP=n kernel for these machines. So, as
we did for powernv, have the pseries platform imply SMP=y.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 30ec04f1c67c..913c54e23eea 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -17,9 +17,10 @@ config PPC_PSERIES
select PPC_UDBG_16550
select PPC_NATIVE
select PPC_DOORBELL
-   select HOTPLUG_CPU if SMP
+   select HOTPLUG_CPU
select ARCH_RANDOM
select PPC_DOORBELL
+   select FORCE_SMP
default y
 
 config PPC_SPLPAR
-- 
2.7.4



[RFC PATCH 2/3] powerpc/powernv: Always enable SMP when building powernv

2017-04-04 Thread Michael Ellerman
The powernv platform supports Power7 and later CPUs, all of which are
multithreaded and multicore.

As such we never build an SMP=n kernel for those machines, other than
possibly for debugging or running in a simulator.

In the debugging case we can get a similar effect by booting with
nr_cpus=1, or there's always the option of building a custom kernel with
SMP hacked out.

For running in simulators, the code size reduction from building without
SMP is not particularly important; what matters is the number of
instructions executed. A quick test shows that an SMP=y kernel takes ~6%
more instructions to boot to a shell. Booting with nr_cpus=1 recovers
about half that deficit.

On the flip side, keeping the SMP=n kernel building can be a pain at
times. And although we've mostly kept it building in recent years, no
one is regularly testing that the SMP=n kernel actually boots and works
well on these machines.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/powernv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 3a07e4dcf97c..bd8d41d3a1b3 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -19,6 +19,7 @@ config PPC_POWERNV
select CPU_FREQ_GOV_ONDEMAND
select CPU_FREQ_GOV_CONSERVATIVE
select PPC_DOORBELL
+   select FORCE_SMP
default y
 
 config OPAL_PRD
-- 
2.7.4



[RFC PATCH 1/3] powerpc: Allow platforms to force-enable CONFIG_SMP

2017-04-04 Thread Michael Ellerman
Of the 64-bit Book3S platforms, only powermac supports booting on an
actual non-SMP system. The other platforms can be built with SMP
disabled, but it doesn't make a lot of sense given that the CPUs they
support are all multicore or multithreaded.

So give platforms the option of forcing SMP=y.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/Kconfig.cputype | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 99b0ae8acb78..5c011e4baf0b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -371,9 +371,15 @@ config PPC_PERF_CTRS
help
  This enables the powerpc-specific perf_event back-end.
 
+config FORCE_SMP
+   # Allow platforms to force SMP=y by selecting this
+   bool
+   default n
+   select SMP
+
 config SMP
depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x
-   bool "Symmetric multi-processing support"
+   bool "Symmetric multi-processing support" if !FORCE_SMP
---help---
  This enables support for systems with more than one CPU. If you have
  a system with only one CPU, say N. If you have a system with more
-- 
2.7.4



Re: POWER4 - who has one?

2017-04-04 Thread luigi burdo
Hi Michael,

The PowerPC 970 is classified as POWER4 too, because it is a pure derivative.

If needed, I have a 970MP.


Ciao

Luigi


From: Linuxppc-dev on behalf of Michael Ellerman
Sent: Tuesday, 4 April 2017 15:20
To: linuxppc-dev@
Subject: POWER4 - who has one?

Hi folks,

Quick quiz, who still has a POWER4?

And if so are you running mainline on it?

cheers


Re: POWER4 - who has one?

2017-04-04 Thread Andreas Schwab
On Apr 04 2017, Michael Ellerman  wrote:

> Quick quiz, who still has a POWER4?

Does a G5 qualify?

> And if so are you running mainline on it?

Always following latest -rc.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH v4 04/11] VAS: Define vas_init() and vas_exit()

2017-04-04 Thread Sukadev Bhattiprolu
Sukadev Bhattiprolu [sukadevatlinux.vnet.ibm.com] wrote:
> Implement vas_init() and vas_exit() functions for a new VAS module.
> This VAS module is essentially a library for other device drivers
> and kernel users of the NX coprocessors like NX-842 and NX-GZIP.
> In the future this will be extended to add support for user space
> to access the NX coprocessors.

Add "depends on PPC_64K_PAGES" to the Kconfig for VAS.
---

From ff6fb584282363f6917fd956ccac05822d1912d7 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Thu, 12 Jan 2017 02:16:10 -0500
Subject: [PATCH v4 04/11] VAS: Define vas_init() and vas_exit()

Implement vas_init() and vas_exit() functions for a new VAS module.
This VAS module is essentially a library for other device drivers
and kernel users of the NX coprocessors like NX-842 and NX-GZIP.
In the future this will be extended to add support for user space
to access the NX coprocessors.

VAS is currently only supported with 64K page size.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v4]:
- [Michael Neuling] Fix some accidental deletions; fix help text
  in Kconfig; change vas_initialized to a function; move from
  drivers/misc to arch/powerpc/kernel
- Drop the vas_window_reset() interface. It is not needed as
  window will be initialized before each use.
- Add a "depends on PPC_64K_PAGES"
Changelog[v3]:
- Zero vas_instances memory on allocation
- [Haren Myneni] Fix description in Kconfig
Changelog[v2]:
- Get HVWC, UWC and window address parameters from device tree.
---
 arch/powerpc/platforms/powernv/Kconfig  |  14 +++
 arch/powerpc/platforms/powernv/Makefile |   1 +
 arch/powerpc/platforms/powernv/vas-window.c |  19 
 arch/powerpc/platforms/powernv/vas.c| 145 
 arch/powerpc/platforms/powernv/vas.h|   3 +
 5 files changed, 182 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/vas-window.c
 create mode 100644 arch/powerpc/platforms/powernv/vas.c

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 3a07e4d..34c344c 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -27,3 +27,17 @@ config OPAL_PRD
help
  This enables the opal-prd driver, a facility to run processor
  recovery diagnostics on OpenPower machines
+
+config VAS
+   tristate "IBM Virtual Accelerator Switchboard (VAS)"
+   depends on PPC_POWERNV && PPC_64K_PAGES
+   default n
+   help
+ This enables support for IBM Virtual Accelerator Switchboard (VAS).
+
+ VAS allows accelerators in co-processors like NX-GZIP and NX-842
+ to be accessible to kernel subsystems.
+
+ VAS adapters are found in POWER9 based systems.
+
+ If unsure, say N.
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..ebef20b 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)   += opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS)  += opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD) += opal-prd.o
+obj-$(CONFIG_VAS)  += vas.o vas-window.o
diff --git a/arch/powerpc/platforms/powernv/vas-window.c b/arch/powerpc/platforms/powernv/vas-window.c
new file mode 100644
index 000..6156fbe
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas-window.c
@@ -0,0 +1,19 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+
+#include "vas.h"
+
+/* stub for now */
+int vas_win_close(struct vas_window *window)
+{
+   return -1;
+}
diff --git a/arch/powerpc/platforms/powernv/vas.c b/arch/powerpc/platforms/powernv/vas.c
new file mode 100644
index 000..9bf8f57
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/vas.c
@@ -0,0 +1,145 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vas.h"
+
+static bool init_done;
+int vas_num_instances;
+struct vas_instance *vas_instances;
+
+static int init_vas_instance(struct device_node *dn,
+   struct vas_instance *vinst)
+{
+   int rc;
+   const __be32 *p;
+   u64 values[6];
+
+   ida_init(&vinst->ida);
+   mutex_init(&vinst->mutex);

Re: [PATCH v4 02/11] VAS: Define macros, register fields and structures

2017-04-04 Thread Sukadev Bhattiprolu
Sukadev Bhattiprolu [sukadevatlinux.vnet.ibm.com] wrote:
> Define macros for the VAS hardware registers and bit-fields as well
> as a couple of data structures needed by the VAS driver.
> 
> Signed-off-by: Sukadev Bhattiprolu 



> +++ b/arch/powerpc/platforms/powernv/vas.h
> @@ -0,0 +1,387 @@
> +/*
> + * Copyright 2016 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _VAS_H
> +#define _VAS_H
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_PPC_4K_PAGES
> +#error "TODO: Compute RMA/Paste-address for 4K pages."
> +#else
> +#ifndef CONFIG_PPC_64K_PAGES
> +#error "Unexpected Page size."
> +#endif
> +#endif

The above "#error" breaks kbuild with some configs. Here is the updated
patch with the block removed. Instead, [PATCH 4] now includes a "depends
on PPC_64K_PAGES".

Thanks,

Sukadev


From 45629fa7a233eb35483c0941fb4f0509d369b001 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Thu, 10 Nov 2016 16:51:17 -0500
Subject: [PATCH v4 02/11] VAS: Define macros, register fields and structures

Define macros for the VAS hardware registers and bit-fields as well
as a couple of data structures needed by the VAS driver.

Signed-off-by: Sukadev Bhattiprolu 
---
Changelog[v4]
- [Michael Neuling] Move VAS code to arch/powerpc; Reorg vas.h and
  vas-internal.h to kernel and uapi versions; rather than creating
  separate properties for window context/address entries in device
  tree, combine them into "reg" properties; drop ->hwirq and irq_port
  fields from vas_window as they are only needed with user space
  windows.
- Drop the error check for CONFIG_PPC_4K_PAGES. Instead in a
  follow-on patch add a "depends on CONFIG_PPC_64K_PAGES".

Changelog[v3]
- Rename winctx->pid to winctx->pidr to reflect that its a value
  from the PID register (SPRN_PID), not the linux process id.
- Make it easier to split header into kernel/user parts
- To keep user interface simple, use macros rather than enum for
  the threshold-control modes.
- Add a pid field to struct vas_window - needed for user space
  send windows.

Changelog[v2]
- Add an overview of VAS in vas-internal.h
- Get window context parameters from device tree and drop
  unnecessary macros.
---
 arch/powerpc/include/asm/vas.h   |  34 
 arch/powerpc/include/uapi/asm/vas.h  |  25 +++
 arch/powerpc/platforms/powernv/vas.h | 379 +++
 3 files changed, 438 insertions(+)
 create mode 100644 arch/powerpc/include/asm/vas.h
 create mode 100644 arch/powerpc/include/uapi/asm/vas.h
 create mode 100644 arch/powerpc/platforms/powernv/vas.h

diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
new file mode 100644
index 000..e2575d5
--- /dev/null
+++ b/arch/powerpc/include/asm/vas.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _MISC_VAS_H
+#define _MISC_VAS_H
+
+#include 
+
+/*
+ * Min and max FIFO sizes are based on Version 1.05 Section 3.1.4.25
+ * (Local FIFO Size Register) of the VAS workbook.
+ */
+#define VAS_RX_FIFO_SIZE_MIN   (1 << 10)   /* 1KB */
+#define VAS_RX_FIFO_SIZE_MAX   (8 << 20)   /* 8MB */
+
+/*
+ * Co-processor Engine type.
+ */
+enum vas_cop_type {
+   VAS_COP_TYPE_FAULT,
+   VAS_COP_TYPE_842,
+   VAS_COP_TYPE_842_HIPRI,
+   VAS_COP_TYPE_GZIP,
+   VAS_COP_TYPE_GZIP_HIPRI,
+   VAS_COP_TYPE_MAX,
+};
+
+#endif /* _MISC_VAS_H */
diff --git a/arch/powerpc/include/uapi/asm/vas.h b/arch/powerpc/include/uapi/asm/vas.h
new file mode 100644
index 000..ddfe046
--- /dev/null
+++ b/arch/powerpc/include/uapi/asm/vas.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright 2016 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_MISC_VAS_H
+#define _UAPI_MISC_VAS_H
+
+/*
+ * Threshold Control Mode: Have paste operation fail if the number of
+ * requests in receive FIFO exceeds a threshold.
+ *
+ * NOTE: No special error code yet if paste is rejected because of these
+ *  limits. So users can't distinguish between this and other errors.
+ */
+#define VAS_THRESH_DISABLED0
+#define 

Re: POWER4 - who has one?

2017-04-04 Thread Denis Kirjanov
On 4/4/17, Michael Ellerman  wrote:
> Hi folks,
>
> Quick quiz, who still has a POWER4?
>
> And if so are you running mainline on it?

Not the same thing, but I have a box with two 970MPs.

>
> cheers
>


Re: [PATCH guest kernel] vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility check

2017-04-04 Thread Alex Williamson
On Tue, 4 Apr 2017 20:12:45 +1000
Alexey Kardashevskiy  wrote:

> On 25/03/17 23:25, Alexey Kardashevskiy wrote:
> > On 25/03/17 07:29, Alex Williamson wrote:  
> >> On Fri, 24 Mar 2017 17:44:06 +1100
> >> Alexey Kardashevskiy  wrote:
> >>  
> >>> The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
> >>> VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
> >>> picks the v2.
> >>>
> >>> Normally the userspace would create a container, attach an IOMMU group
> >>> to it and only then set the IOMMU type (which would normally be v2).
> >>>
> >>> However a specific IOMMU group may not support v2, in other words
> >>> it may not implement set_window/unset_window/take_ownership/
> >>> release_ownership and such a group should not be attached to
> >>> a v2 container.
> >>>
> >>> This adds extra checks that a new group can do what the selected IOMMU
> >>> type suggests. The userspace can then test the return value from
> >>> ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
> >>> VFIO_SPAPR_TCE_IOMMU.
> >>>
> >>> Signed-off-by: Alexey Kardashevskiy 
> >>> ---
> >>>
> >>> This is one of the patches needed to do nested VFIO - for either
> >>> second level guest or DPDK running in a guest.
> >>> ---
> >>>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 
> >>>  1 file changed, 8 insertions(+)  
> >>
> >> I'm not sure I understand why you're labeling this "guest kernel", is a  
> > 
> > 
> > That is my script :)
> >   
> >> VM the only case where we can have combinations that only a subset of
> >> the groups might support v2?
> > 
> > A powernv (non-virtualized; it runs HV KVM) host provides v2-capable
> > groups, which are all the same, while a pseries host (which normally runs as a
> > guest but can do nested KVM as well - this is called PR KVM) can do only
> > v1 (after this patch; without it - no vfio at all).
> >   
> >> What terrible things happen when such a
> >> combination is created?  
> > 
> > There is no mixture at the moment, I just needed a way to tell userspace
> > that a group cannot do v2.
> >   
> >> The fix itself seems sane, but I'm trying to
> >> figure out whether it should be marked for stable, should go in for
> >> v4.11, or be queued for v4.12.  Thanks,  
> > 
> > No need for stable.  
> 
> 
> So what is the next step with this patch?

Unless there are objections or further comments, I'll put this in my
next branch for v4.12, probably this week.  Thanks,

Alex

> >>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>> index cf3de91fbfe7..a7d811524092 100644
> >>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> >>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> >>> @@ -1335,8 +1335,16 @@ static int tce_iommu_attach_group(void *iommu_data,
> >>>  
> >>>   if (!table_group->ops || !table_group->ops->take_ownership ||
> >>>   !table_group->ops->release_ownership) {
> >>> + if (container->v2) {
> >>> + ret = -EPERM;
> >>> + goto unlock_exit;
> >>> + }
> >>>   ret = tce_iommu_take_ownership(container, table_group);
> >>>   } else {
> >>> + if (!container->v2) {
> >>> + ret = -EPERM;
> >>> + goto unlock_exit;
> >>> + }
> >>>   ret = tce_iommu_take_ownership_ddw(container, table_group);
> >>>   if (!tce_groups_attached(container) && !container->tables[0])
> >>>   container->def_window_pending = true;  
> >>  
> > 
> >   
> 
> 



Re: [PATCH 06/12] powerpc/xive: Native exploitation of the XIVE interrupt controller

2017-04-04 Thread Benjamin Herrenschmidt
On Tue, 2017-04-04 at 23:03 +1000, Michael Ellerman wrote:
> 
> >  14 files changed, 2186 insertions(+), 12 deletions(-)
> 
> I'm not going to review this in one go, given it's 10:30pm already.

Well, good - I was about to send v2 (well, tomorrow morning actually),
hoping it was going to be final since nobody else had reviewed it :-)

> +extern void __iomem *xive_tm_area;
> 
> I think Paul already commented on "tm" being an overly used acronym.

He asked me to spell it out in a comment which I did in v2. I haven't
changed the name of the variable though which percolates through the
KVM bits etc... I could rename it (painfully) to use "tma" instead
(Thread Management Area).

> > +extern u32 xive_tm_offset;
> > +
> > +/*
> > + * Per-irq data (irq_get_handler_data for normal IRQs), IPIs
> > + * have it stored in the xive_cpu structure. We also cache
> > + * for normal interrupts the current target CPU.
> > + */
> > +struct xive_irq_data {
> > +   /* Setup by backend */
> > +   u64 flags;
> > +#define XIVE_IRQ_FLAG_STORE_EOI0x01
> > +#define XIVE_IRQ_FLAG_LSI  0x02
> > +#define XIVE_IRQ_FLAG_SHIFT_BUG0x04
> > +#define XIVE_IRQ_FLAG_MASK_FW  0x08
> > +#define XIVE_IRQ_FLAG_EOI_FW   0x10
> 
> I don't love that style, prefer them just prior to the struct.

I much prefer having the definitions next to the variable they apply
to but if you feel strongly about it, I will move them.
 
> > +   u64 eoi_page;
> > +   void __iomem *eoi_mmio;
> > +   u64 trig_page;
> > +   void __iomem *trig_mmio;
> > +   u32 esb_shift;
> > +   int src_chip;
> 
> Why not space out the members like you do in xive_q below, I think
> that looks better given you have the long __iomem lines.

Ok.

> > +
> > +   /* Setup/used by frontend */
> > +   int target;
> > +   bool saved_p;
> > +};
> > +#define XIVE_INVALID_CHIP_ID   -1
> > +
> > +/* A queue tracking structure in a CPU */
> > +struct xive_q {
> > +   __be32  *qpage;
> > +   u32 msk;
> > +   u32 idx;
> > +   u32 toggle;
> > +   u64 eoi_phys;
> > +   void __iomem*eoi_mmio;
> > +   u32 esc_irq;
> > +   atomic_tcount;
> > +   atomic_tpending_count;
> > +};
> > +
> > +/*
> > + * "magic" ESB MMIO offsets
> 
> What's an ESB?

Well, the problem here is that if I start answering that one along with
a chunk of the rest of your questions, I basically end up writing a
summary of the XIVE specification in comments, which would probably
take 2 or 3 pages ;-)

I don't know where to start there or rather how far to go. I could
spell out the acronyms but it's not necessarily that useful.

Another problem with XIVE is that everything has 2 names ! The original
design came with (rather sane) names but the "architects" later on
renamed everything into weird stuff. For example, the HW name of an
event queue descriptor is "EQD". The "architecture" name is "END"
(Event Notification Descriptor I *think*).

Sadly, because we have docs mixing & matching both, I ended up
accidentally making a bit of a mess myself, though I've generally
favored the HW names (EQ vs. END, VP (Virtual Processor) vs. NVT
(Notification Virtual Target), etc.)

> If here you put:
> 
> #define pr_fmt(fmt) "xive: " fmt
> 
> Then you can drop the prefix from every pr_xxx() in the whole file.

Yup. I live in the past obviously :-)

> > +/*
> > + * A "disabled" interrupt should never fire, to catch problems
> > + * we set its logical number to this
> > + */
> > +#define XIVE_BAD_IRQ   0x7fffffff
> 
> Can it be anything? How about 0x7fbadbad ?

It can be anything as long as we never assign that number to an
interrupt. So we have to limit the IRQ numbers to that value. Talking
of which I need to make sure I enforce the limitation on the numbers
today.

> > +#define XIVE_MAX_IRQ   (XIVE_BAD_IRQ - 1)
> > +
> > +/* An invalid CPU target */
> > +#define XIVE_INVALID_TARGET(-1)
> > +
> > +static u32 xive_read_eq(struct xive_q *q, u8 prio, bool just_peek)
> 
> Can it have a doc comment? And tell me what an EQ is?

I added a description in v2.

> > +{
> > +   u32 cur;
> > +
> > +   if (!q->qpage)
> > +   return 0;
> 
> A newline or ..
> 
> > +   cur = be32_to_cpup(q->qpage + q->idx);
> > +   if ((cur >> 31) == q->toggle)
> > +   return 0;
> 
> .. two wouldn't hurt here.
> 
> > +   if (!just_peek) {
> > +   q->idx = (q->idx + 1) & q->msk;
> > +   if (q->idx == 0)
> > +   q->toggle ^= 1;
> > +   }
> > +   return cur & 0x7fffffff;
> 
> Is that XIVE_BAD_IRQ ?

No. This is a mask. The top bit is the toggle valid bit, we mask it out
on the way back. Will add a comment.

> > +}
> > +
> > +static u32 xive_scan_interrupts(struct xive_cpu *xc, bool
> > just_peek)
> > +{
> > +   u32 hirq = 0;
> 
> Is that a hwirq or something different?

not sure why I called it hirq ... it's what comes out of the queue
which is a 
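The toggle-bit queue read described in this exchange can be modeled in plain user-space C. Everything below (`toy_q`, `toy_push`, the generation handling) is an illustrative stand-in, not the kernel's actual structures; only the control flow of `xive_read_eq()` is mirrored:

```c
#include <stdint.h>

/* Illustrative stand-in for struct xive_q: a power-of-two ring of 32-bit
 * entries whose top bit is the "toggle" valid bit discussed above. */
struct toy_q {
	uint32_t qpage[8];
	uint32_t msk;		/* ring size - 1 */
	uint32_t idx;		/* consumer index */
	uint32_t toggle;	/* flips each time idx wraps to 0 */
};

/* Hypothetical producer: writes its current generation bit into the top
 * bit of each slot and flips the generation when it wraps. */
static void toy_push(struct toy_q *q, uint32_t *widx, uint32_t *wgen, uint32_t irq)
{
	q->qpage[*widx] = (*wgen << 31) | (irq & 0x7fffffff);
	*widx = (*widx + 1) & q->msk;
	if (*widx == 0)
		*wgen ^= 1;
}

/* Mirrors the xive_read_eq() logic: a slot is empty while its top bit
 * still equals the consumer's toggle; the low 31 bits are the irq number,
 * masked out "on the way back" exactly as described above. */
static uint32_t toy_read_eq(struct toy_q *q, int just_peek)
{
	uint32_t cur = q->qpage[q->idx];

	if ((cur >> 31) == q->toggle)
		return 0;
	if (!just_peek) {
		q->idx = (q->idx + 1) & q->msk;
		if (q->idx == 0)
			q->toggle ^= 1;
	}
	return cur & 0x7fffffff;
}
```

Because emptiness is detected by comparing the generation bit instead of a shared producer index, the consumer never has to read producer state — one reason this layout suits an interrupt event queue.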

Re: [RFC PATCH 1/7] mm/hugetlb/migration: Use set_huge_pte_at instead of set_pte_at

2017-04-04 Thread Aneesh Kumar K.V


The patch series is not yet sent to linux-mm. Once I get feedback on the 
approach used, I will resend this to linux-mm.


Also if there is sufficient interest we could also get nohash hugetlb 
migration to work. But I avoided doing that in this series, because of 
my inability to test the changes.


-aneesh



[RFC PATCH 7/7] powerpc/hugetlb: Enable hugetlb migration for ppc64

2017-04-04 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/Kconfig.cputype | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 382c3dd86d6d..c0ca27521679 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -350,6 +350,11 @@ config PPC_RADIX_MMU
  is only implemented by IBM Power9 CPUs, if you don't have one of them
  you can probably disable this.
 
+config ARCH_ENABLE_HUGEPAGE_MIGRATION
+   def_bool y
+   depends on PPC_BOOK3S_64 && HUGETLB_PAGE && MIGRATION
+
+
 config PPC_MMU_NOHASH
def_bool y
depends on !PPC_STD_MMU
-- 
2.7.4



[RFC PATCH 3/7] mm/hugetlb: export hugetlb_entry_migration helper

2017-04-04 Thread Aneesh Kumar K.V
We will be using this later from the ppc64 code. Change the return type to bool.

Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/hugetlb.h | 1 +
 mm/hugetlb.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b857fc8cc2ec..fddf6cf403d5 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -126,6 +126,7 @@ int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
unsigned long address, unsigned long end, pgprot_t newprot);
 
+bool is_hugetlb_entry_migration(pte_t pte);
 #else /* !CONFIG_HUGETLB_PAGE */
 
 static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2c090189f314..34ec2ab62215 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3189,7 +3189,7 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma,
update_mmu_cache(vma, address, ptep);
 }
 
-static int is_hugetlb_entry_migration(pte_t pte)
+bool is_hugetlb_entry_migration(pte_t pte)
 {
swp_entry_t swp;
 
-- 
2.7.4



[RFC PATCH 4/7] mm/follow_page_mask: Add support for hugepage directory entry

2017-04-04 Thread Aneesh Kumar K.V
The default implementation prints a warning and returns NULL. We will add
ppc64 support in later patches.

Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/hugetlb.h |  3 +++
 mm/gup.c| 33 +
 mm/hugetlb.c|  8 
 3 files changed, 44 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index fddf6cf403d5..d3a4be0022d8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -117,6 +117,9 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long 
addr);
 int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep);
 struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
  int write);
+struct page *follow_huge_pd(struct vm_area_struct *vma,
+   unsigned long address, hugepd_t hpd,
+   int flags, int pdshift);
 struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
diff --git a/mm/gup.c b/mm/gup.c
index 73d46f9f7b81..0e18fd5f65b4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -226,6 +226,14 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
return page;
return no_page_table(vma, flags);
}
+   if (is_hugepd(__hugepd(pmd_val(*pmd)))) {
+   page = follow_huge_pd(vma, address,
+ __hugepd(pmd_val(*pmd)), flags,
+ PMD_SHIFT);
+   if (page)
+   return page;
+   return no_page_table(vma, flags);
+   }
if (pmd_devmap(*pmd)) {
ptl = pmd_lock(mm, pmd);
page = follow_devmap_pmd(vma, address, pmd, flags);
@@ -292,6 +300,14 @@ static struct page *follow_pud_mask(struct vm_area_struct *vma,
return page;
return no_page_table(vma, flags);
}
+   if (is_hugepd(__hugepd(pud_val(*pud)))) {
+   page = follow_huge_pd(vma, address,
+ __hugepd(pud_val(*pud)), flags,
+ PUD_SHIFT);
+   if (page)
+   return page;
+   return no_page_table(vma, flags);
+   }
if (pud_devmap(*pud)) {
ptl = pud_lock(mm, pud);
page = follow_devmap_pud(vma, address, pud, flags);
@@ -311,6 +327,7 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
unsigned int flags, unsigned int *page_mask)
 {
p4d_t *p4d;
+   struct page *page;
 
p4d = p4d_offset(pgdp, address);
if (p4d_none(*p4d))
@@ -319,6 +336,14 @@ static struct page *follow_p4d_mask(struct vm_area_struct *vma,
if (unlikely(p4d_bad(*p4d)))
return no_page_table(vma, flags);
 
+   if (is_hugepd(__hugepd(p4d_val(*p4d)))) {
+   page = follow_huge_pd(vma, address,
+ __hugepd(p4d_val(*p4d)), flags,
+ P4D_SHIFT);
+   if (page)
+   return page;
+   return no_page_table(vma, flags);
+   }
return follow_pud_mask(vma, address, p4d, flags, page_mask);
 }
 
@@ -357,6 +382,14 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
return no_page_table(vma, flags);
 
+   if (is_hugepd(__hugepd(pgd_val(*pgd)))) {
+   page = follow_huge_pd(vma, address,
+ __hugepd(pgd_val(*pgd)), flags,
+ PGDIR_SHIFT);
+   if (page)
+   return page;
+   return no_page_table(vma, flags);
+   }
return follow_p4d_mask(vma, address, pgd, flags, page_mask);
 }
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 34ec2ab62215..b02faa1079bd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4650,6 +4650,14 @@ follow_huge_addr(struct mm_struct *mm, unsigned long address,
 }
 
 struct page * __weak
+follow_huge_pd(struct vm_area_struct *vma,
+  unsigned long address, hugepd_t hpd, int flags, int pdshift)
+{
+   WARN(1, "hugepd follow called with no support for hugepage directory format\n");
+   return NULL;
+}
+
+struct page * __weak
 follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int flags)
 {
-- 
2.7.4



[RFC PATCH 5/7] mm/follow_page_mask: Add support for hugetlb pgd entries.

2017-04-04 Thread Aneesh Kumar K.V
ppc64 supports pgd hugetlb entries. Add code to handle hugetlb pgd entries in
follow_page_mask so that ppc64 can switch to it to handle hugetlb entries.

Signed-off-by: Aneesh Kumar K.V 
---
 include/linux/hugetlb.h | 3 +++
 mm/gup.c| 7 +++
 mm/hugetlb.c| 9 +
 3 files changed, 19 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d3a4be0022d8..04b73a9c8b4b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -124,6 +124,9 @@ struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pmd_t *pmd, int flags);
 struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
+struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
+pgd_t *pgd, int flags);
+
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
diff --git a/mm/gup.c b/mm/gup.c
index 0e18fd5f65b4..74a25e33dddb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -382,6 +382,13 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
return no_page_table(vma, flags);
 
+   if (pgd_huge(*pgd)) {
+   page = follow_huge_pgd(mm, address, pgd, flags);
+   if (page)
+   return page;
+   return no_page_table(vma, flags);
+   }
+
if (is_hugepd(__hugepd(pgd_val(*pgd)))) {
page = follow_huge_pd(vma, address,
  __hugepd(pgd_val(*pgd)), flags,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b02faa1079bd..eb39a7496de7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4702,6 +4702,15 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
 }
 
+struct page * __weak
+follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags)
+{
+   if (flags & FOLL_GET)
+   return NULL;
+
+   return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
+}
+
 #ifdef CONFIG_MEMORY_FAILURE
 
 /*
-- 
2.7.4



[RFC PATCH 6/7] powerpc/hugetlb: Add code to support to follow huge page directory entries

2017-04-04 Thread Aneesh Kumar K.V
Add follow_huge_pd implementation for ppc64.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 80f6d2ed551a..9d66d4f810aa 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -618,6 +620,10 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 }
 
 /*
+ * 64 bit book3s use generic follow_page_mask
+ */
+#ifndef CONFIG_PPC_BOOK3S_64
+/*
  * We are holding mmap_sem, so a parallel huge page collapse cannot run.
  * To prevent hugepage split, disable irq.
  */
@@ -673,6 +679,42 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
return NULL;
 }
 
+#else /* !CONFIG_PPC_BOOK3S_64 */
+
+struct page *follow_huge_pd(struct vm_area_struct *vma,
+   unsigned long address, hugepd_t hpd,
+   int flags, int pdshift)
+{
+   pte_t *ptep;
+   spinlock_t *ptl;
+   struct page *page = NULL;
+   unsigned long mask;
+   int shift = hugepd_shift(hpd);
+   struct mm_struct *mm = vma->vm_mm;
+
+retry:
ptl = &mm->page_table_lock;
+   spin_lock(ptl);
+
+   ptep = hugepte_offset(hpd, address, pdshift);
+   if (pte_present(*ptep)) {
+   mask = (1UL << shift) - 1;
+   page = pte_page(*ptep);
+   page += ((address & mask) >> PAGE_SHIFT);
+   if (flags & FOLL_GET)
+   get_page(page);
+   } else {
+   if (is_hugetlb_entry_migration(*ptep)) {
+   spin_unlock(ptl);
+   __migration_entry_wait(mm, ptep, ptl);
+   goto retry;
+   }
+   }
+   spin_unlock(ptl);
+   return page;
+}
+#endif
+
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
  unsigned long sz)
 {
-- 
2.7.4
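The retry shape of follow_huge_pd() above — if the entry is a migration entry, drop the lock, wait for migration to finish, then re-walk from the top — can be sketched with a toy model. All names below are illustrative stand-ins, not kernel APIs, and the locking is elided:

```c
#include <stdbool.h>

/* Toy "pte" that may transiently hold a migration entry instead of a
 * present mapping. */
struct toy_pte {
	bool present;
	int page;	/* stand-in for the struct page result */
};

static int waits;	/* counts simulated migration-entry waits */

/* Pretend the migration completed while we slept on it. */
static void toy_migration_entry_wait(struct toy_pte *pte)
{
	waits++;
	pte->present = true;
}

/* Same control flow as the patch's retry loop: a non-present entry means
 * wait for migration, then restart the walk rather than fail the lookup. */
static int toy_follow(struct toy_pte *pte)
{
retry:
	if (pte->present)
		return pte->page;
	toy_migration_entry_wait(pte);
	goto retry;
}
```

The point of the loop is that a page undergoing migration is a temporary condition: the walker blocks until the new mapping is installed instead of returning NULL to the caller.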



[RFC PATCH 2/7] mm/follow_page_mask: Split follow_page_mask to smaller functions.

2017-04-04 Thread Aneesh Kumar K.V
Makes the code easier to read. No functional changes in this patch.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/gup.c | 148 +++
 1 file changed, 91 insertions(+), 57 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 04aa405350dc..73d46f9f7b81 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -208,68 +208,16 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
return no_page_table(vma, flags);
 }
 
-/**
- * follow_page_mask - look up a page descriptor from a user-virtual address
- * @vma: vm_area_struct mapping @address
- * @address: virtual address to look up
- * @flags: flags modifying lookup behaviour
- * @page_mask: on output, *page_mask is set according to the size of the page
- *
- * @flags can have FOLL_ flags set, defined in 
- *
- * Returns the mapped (struct page *), %NULL if no mapping exists, or
- * an error pointer if there is a mapping to something not represented
- * by a page descriptor (see also vm_normal_page()).
- */
-struct page *follow_page_mask(struct vm_area_struct *vma,
- unsigned long address, unsigned int flags,
- unsigned int *page_mask)
+static struct page *follow_pmd_mask(struct vm_area_struct *vma,
+   unsigned long address, pud_t *pudp,
+   unsigned int flags, unsigned int *page_mask)
 {
-   pgd_t *pgd;
-   p4d_t *p4d;
-   pud_t *pud;
pmd_t *pmd;
spinlock_t *ptl;
struct page *page;
struct mm_struct *mm = vma->vm_mm;
 
-   *page_mask = 0;
-
-   page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
-   if (!IS_ERR(page)) {
-   BUG_ON(flags & FOLL_GET);
-   return page;
-   }
-
-   pgd = pgd_offset(mm, address);
-   if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
-   return no_page_table(vma, flags);
-   p4d = p4d_offset(pgd, address);
-   if (p4d_none(*p4d))
-   return no_page_table(vma, flags);
-   BUILD_BUG_ON(p4d_huge(*p4d));
-   if (unlikely(p4d_bad(*p4d)))
-   return no_page_table(vma, flags);
-   pud = pud_offset(p4d, address);
-   if (pud_none(*pud))
-   return no_page_table(vma, flags);
-   if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
-   page = follow_huge_pud(mm, address, pud, flags);
-   if (page)
-   return page;
-   return no_page_table(vma, flags);
-   }
-   if (pud_devmap(*pud)) {
-   ptl = pud_lock(mm, pud);
-   page = follow_devmap_pud(vma, address, pud, flags);
-   spin_unlock(ptl);
-   if (page)
-   return page;
-   }
-   if (unlikely(pud_bad(*pud)))
-   return no_page_table(vma, flags);
-
-   pmd = pmd_offset(pud, address);
+   pmd = pmd_offset(pudp, address);
if (pmd_none(*pmd))
return no_page_table(vma, flags);
if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
@@ -319,13 +267,99 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
return ret ? ERR_PTR(ret) :
follow_page_pte(vma, address, pmd, flags);
}
-
page = follow_trans_huge_pmd(vma, address, pmd, flags);
spin_unlock(ptl);
*page_mask = HPAGE_PMD_NR - 1;
return page;
 }
 
+
+static struct page *follow_pud_mask(struct vm_area_struct *vma,
+   unsigned long address, p4d_t *p4dp,
+   unsigned int flags, unsigned int *page_mask)
+{
+   pud_t *pud;
+   spinlock_t *ptl;
+   struct page *page;
+   struct mm_struct *mm = vma->vm_mm;
+
+   pud = pud_offset(p4dp, address);
+   if (pud_none(*pud))
+   return no_page_table(vma, flags);
+   if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
+   page = follow_huge_pud(mm, address, pud, flags);
+   if (page)
+   return page;
+   return no_page_table(vma, flags);
+   }
+   if (pud_devmap(*pud)) {
+   ptl = pud_lock(mm, pud);
+   page = follow_devmap_pud(vma, address, pud, flags);
+   spin_unlock(ptl);
+   if (page)
+   return page;
+   }
+   if (unlikely(pud_bad(*pud)))
+   return no_page_table(vma, flags);
+
+   return follow_pmd_mask(vma, address, pud, flags, page_mask);
+}
+
+
+static struct page *follow_p4d_mask(struct vm_area_struct *vma,
+   unsigned long address, pgd_t *pgdp,
+   unsigned int flags, unsigned int *page_mask)
+{
+   p4d_t *p4d;
+
+   p4d = p4d_offset(pgdp, address);
+   if (p4d_none(*p4d))
+   return no_page_table(vma, flags);
+   

[RFC PATCH 1/7] mm/hugetlb/migration: Use set_huge_pte_at instead of set_pte_at

2017-04-04 Thread Aneesh Kumar K.V
The right interface to use to set a hugetlb pte entry is set_huge_pte_at. Use
that instead of set_pte_at.

Signed-off-by: Aneesh Kumar K.V 
---
 mm/migrate.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 9a0897a14d37..4c272ac6fe53 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -224,25 +224,26 @@ static int remove_migration_pte(struct page *page, struct vm_area_struct *vma,
if (is_write_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
 
+   flush_dcache_page(new);
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(new)) {
pte = pte_mkhuge(pte);
pte = arch_make_huge_pte(pte, vma, new, 0);
-   }
-#endif
-   flush_dcache_page(new);
-   set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
-
-   if (PageHuge(new)) {
+   set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
if (PageAnon(new))
hugepage_add_anon_rmap(new, vma, pvmw.address);
else
page_dup_rmap(new, true);
-   } else if (PageAnon(new))
-   page_add_anon_rmap(new, vma, pvmw.address, false);
-   else
-   page_add_file_rmap(new, false);
+   } else
+#endif
+   {
+   set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
 
+   if (PageAnon(new))
+   page_add_anon_rmap(new, vma, pvmw.address, false);
+   else
+   page_add_file_rmap(new, false);
+   }
if (vma->vm_flags & VM_LOCKED && !PageTransCompound(new))
mlock_vma_page(new);
 
-- 
2.7.4



[RFC PATCH] powerpc/mm/hugetlb: Add support for 1G huge pages

2017-04-04 Thread Aneesh Kumar K.V
This patch adds support for gigantic pages on ppc64. We also update the
gigantic_page_supported() helper so that the arch can override it.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 9 +
 arch/powerpc/mm/hugetlbpage.c| 7 +--
 arch/powerpc/platforms/Kconfig.cputype   | 1 +
 mm/hugetlb.c | 4 
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index cd366596..a994d069fdaf 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -50,4 +50,13 @@ static inline pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
else
return entry;
 }
+
+#define gigantic_page_supported gigantic_page_supported
+static inline bool gigantic_page_supported(void)
+{
+   if (radix_enabled())
+   return true;
+   return false;
+}
+
 #endif
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index a4f33de4008e..80f6d2ed551a 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -763,8 +763,11 @@ static int __init add_huge_page_size(unsigned long long size)
 * Hash: 16M and 16G
 */
if (radix_enabled()) {
-   if (mmu_psize != MMU_PAGE_2M)
-   return -EINVAL;
+   if (mmu_psize != MMU_PAGE_2M) {
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1) ||
+   (mmu_psize != MMU_PAGE_1G))
+   return -EINVAL;
+   }
} else {
if (mmu_psize != MMU_PAGE_16M && mmu_psize != MMU_PAGE_16G)
return -EINVAL;
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index a7c0c1fafe68..382c3dd86d6d 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -343,6 +343,7 @@ config PPC_STD_MMU_64
 config PPC_RADIX_MMU
bool "Radix MMU Support"
depends on PPC_BOOK3S_64
+   select ARCH_HAS_GIGANTIC_PAGE
default y
help
  Enable support for the Power ISA 3.0 Radix style MMU. Currently this
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3d0aab9ee80d..2c090189f314 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1158,7 +1158,11 @@ static int alloc_fresh_gigantic_page(struct hstate *h,
return 0;
 }
 
+#ifndef gigantic_page_supported
 static inline bool gigantic_page_supported(void) { return true; }
+#define gigantic_page_supported gigantic_page_supported
+#endif
+
 #else
 static inline bool gigantic_page_supported(void) { return false; }
static inline void free_gigantic_page(struct page *page, unsigned int order) { }
-- 
2.7.4
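The #ifndef hunk in mm/hugetlb.c above is the usual "arch override" pattern: the arch header defines both the function and a same-named macro, and generic code only supplies its fallback when no such macro is visible. A minimal user-space sketch of the same pattern — the radix_enabled() stand-in is invented purely for illustration:

```c
#include <stdbool.h>

/* Stand-in for the arch condition; always false in this sketch. */
static bool fake_radix_enabled(void) { return false; }

/* "Arch header": define the function plus a macro of the same name,
 * as the book3s/64/hugetlb.h hunk above does. */
static inline bool gigantic_page_supported(void)
{
	return fake_radix_enabled();
}
#define gigantic_page_supported gigantic_page_supported

/* "Generic code": the always-true fallback is compiled only when no
 * arch override macro is visible. */
#ifndef gigantic_page_supported
static inline bool gigantic_page_supported(void) { return true; }
#define gigantic_page_supported gigantic_page_supported
#endif
```

The self-referential `#define name name` is inert at expansion time; it exists only so the generic `#ifndef` can detect that an override was provided.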



Re: [PATCH] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB

2017-04-04 Thread Anshuman Khandual
On 04/04/2017 02:03 PM, Aneesh Kumar K.V wrote:
> 
> 
> On Tuesday 04 April 2017 11:33 AM, Anshuman Khandual wrote:
>> This just adds user space exported ABI definitions for both 16MB and
>> 16GB non default huge page sizes to be used with mmap() system call.
>>
>> Signed-off-by: Anshuman Khandual 
>> ---
>> These defined values will be used along with MAP_HUGETLB while calling
>> mmap() system call if the desired HugeTLB page size is not the default
>> one. Follows similar definitions present in x86.
>>
>> arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_2MB (21 << MAP_HUGE_SHIFT)
>> arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)
>>
>>  arch/powerpc/include/uapi/asm/mman.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/powerpc/include/uapi/asm/mman.h
>> b/arch/powerpc/include/uapi/asm/mman.h
>> index 03c06ba..e78980b 100644
>> --- a/arch/powerpc/include/uapi/asm/mman.h
>> +++ b/arch/powerpc/include/uapi/asm/mman.h
>> @@ -29,4 +29,7 @@
>>  #define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
>>  #define MAP_HUGETLB 0x40000 /* create a huge page mapping */
>>
>> +#define MAP_HUGE_16MB (24 << MAP_HUGE_SHIFT) /* 16MB HugeTLB Page */
>> +#define MAP_HUGE_16GB (34 << MAP_HUGE_SHIFT) /* 16GB HugeTLB Page */
>> +
>>  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
>>
> 
> I am doing a similar patch as part of 1G and hugetlb migration series.
> Can you add 2M and 1G #defines also so that i can drop the patch from my
> series and pick this ?

Sure, will just have to add the two lines from x86 code :)
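For reference, the MAP_HUGE_* values discussed above encode log2 of the page size into the high bits of the mmap() flags word. The sketch below assumes the kernel's MAP_HUGE_SHIFT of 26 and a 6-bit mask; the TOY_ names are illustrative, and UL literals are used so the 16GB encoding does not overflow a 32-bit int:

```c
/* Hedged sketch of the flag encoding, not the uapi header itself. */
#define TOY_MAP_HUGE_SHIFT	26
#define TOY_MAP_HUGE_MASK	0x3fUL

#define TOY_MAP_HUGE_16MB	(24UL << TOY_MAP_HUGE_SHIFT)	/* log2(16M) = 24 */
#define TOY_MAP_HUGE_16GB	(34UL << TOY_MAP_HUGE_SHIFT)	/* log2(16G) = 34 */

/* How a kernel-side consumer recovers the requested page size from the
 * flags word: extract the 6-bit log2 field and shift 1 by it. */
static unsigned long toy_huge_page_size(unsigned long flags)
{
	return 1UL << ((flags >> TOY_MAP_HUGE_SHIFT) & TOY_MAP_HUGE_MASK);
}
```

A caller would OR one such value into the flags alongside MAP_HUGETLB; note that with the plain int arithmetic of the uapi definitions, userspace must make sure 34 << 26 is evaluated in a type wider than 32 bits.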



Re: [PATCH 02/12] powerpc: Sync opal-api.h

2017-04-04 Thread Benjamin Herrenschmidt
On Tue, 2017-04-04 at 22:20 +1000, Michael Ellerman wrote:
> Benjamin Herrenschmidt  writes:
> 
> ...
> 
> Give me some change log !

Well, the subject says it all :-) Sync the API with the latest OPAL :-)

> > Signed-off-by: Benjamin Herrenschmidt 
> > ---
> >  arch/powerpc/include/asm/opal-api.h| 302
> > -
> 
> It looks like you've just copied it over in its entirety, including
> lots of unused cruft.
> 
> Please just give me the XIVE bits you need.

Why ? It's a lot easier in the long run to have the file actually in
sync between the two projects no ?

Cheers,
Ben.



Re: [7/7] crypto: caam/qi - add ablkcipher and authenc algorithms

2017-04-04 Thread Laurentiu Tudor
Hi Michael,

Just a couple of basic things to check:
  - was the dtb updated to the newest?
  - is the qman node present? This should be easily visible in 
/proc/device-tree/soc@ffe00/qman@318000.

---
Best Regards, Laurentiu

On 04/04/2017 08:03 AM, Michael Ellerman wrote:
> Horia Geantă  writes:
>
>> Add support to submit ablkcipher and authenc algorithms
>> via the QI backend:
>> -ablkcipher:
>> cbc({aes,des,des3_ede})
>> ctr(aes), rfc3686(ctr(aes))
>> xts(aes)
>> -authenc:
>> authenc(hmac(md5),cbc({aes,des,des3_ede}))
>> authenc(hmac(sha*),cbc({aes,des,des3_ede}))
>>
>> caam/qi being a new driver, let's wait some time to settle down without
>> interfering with existing caam/jr driver.
>> Accordingly, for now all caam/qi algorithms (caamalg_qi module) are
>> marked to be of lower priority than caam/jr ones (caamalg module).
>>
>> Signed-off-by: Vakul Garg 
>> Signed-off-by: Alex Porosanu 
>> Signed-off-by: Horia Geantă 
>> ---
>>   drivers/crypto/caam/Kconfig|   20 +-
>>   drivers/crypto/caam/Makefile   |1 +
>>   drivers/crypto/caam/caamalg.c  |9 +-
>>   drivers/crypto/caam/caamalg_desc.c |   77 +-
>>   drivers/crypto/caam/caamalg_desc.h |   15 +-
>>   drivers/crypto/caam/caamalg_qi.c   | 2387 
>> 
>>   drivers/crypto/caam/sg_sw_qm.h |  108 ++
>>   7 files changed, 2601 insertions(+), 16 deletions(-)
>>   create mode 100644 drivers/crypto/caam/caamalg_qi.c
>>   create mode 100644 drivers/crypto/caam/sg_sw_qm.h
>
>
> This appears to be blowing up my Freescale (NXP) P5020DS board:
>
>Unable to handle kernel paging request for data at address 0x0020
>Faulting instruction address: 0xc04393e4
>Oops: Kernel access of bad area, sig: 11 [#1]
>SMP NR_CPUS=24
>CoreNet Generic
>Modules linked in:
>CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc3-compiler_gcc-4.6.3-00046-gb189817cf789 #5
>task: c000f70c task.stack: c000f70c8000
>NIP: c04393e4 LR: c04aeba0 CTR: c04fa7d8
>REGS: c000f70cb160 TRAP: 0300   Not tainted (4.11.0-rc3-compiler_gcc-4.6.3-00046-gb189817cf789)
>MSR: 80029000   CR: 24adbe48  XER: 2000
>DEAR: 0020 ESR:  SOFTE: 1
>GPR00: c06feba0 c000f70cb3e0 c0e6 
>GPR04: 0001  c0e0b290 0003
>GPR08: 0004 c0ea5280 0004 0004
>GPR12: 88adbe22 c0003fff5000 c0ba3518 880088090fa8
>GPR16: 1000 c0ba3500 c000f72c68d8 0004
>GPR20: c0ea5280 c0ba34e8 0020 0004
>GPR24: c0eab7c0  c000f7fc8818 c0eb
>GPR28: c000f786cc00 c0eab780 f786cc00 c0eab7c0
>NIP [c04393e4] .gen_pool_alloc+0x0/0xc
>LR [c04aeba0] .qman_alloc_cgrid_range+0x24/0x54
>Call Trace:
>[c000f70cb3e0] [c0504054] .platform_device_register_full+0x12c/0x150 (unreliable)
>[c000f70cb460] [c06feba0] .caam_qi_init+0x158/0x63c
>[c000f70cb5f0] [c06fc64c] .caam_probe+0x8b8/0x1830
>[c000f70cb740] [c0503288] .platform_drv_probe+0x60/0xac
>[c000f70cb7c0] [c0501194] .driver_probe_device+0x248/0x344
>[c000f70cb870] [c05013b4] .__driver_attach+0x124/0x128
>[c000f70cb900] [c04fed90] .bus_for_each_dev+0x80/0xcc
>[c000f70cb9a0] [c0500858] .driver_attach+0x24/0x38
>[c000f70cba10] [c050043c] .bus_add_driver+0x14c/0x29c
>[c000f70cbab0] [c0501d64] .driver_register+0x8c/0x154
>[c000f70cbb30] [c0503000] .__platform_driver_register+0x48/0x5c
>[c000f70cbba0] [c0c7f798] .caam_driver_init+0x1c/0x30
>[c000f70cbc10] [c0001904] .do_one_initcall+0x60/0x1a8
>[c000f70cbcf0] [c0c35f30] .kernel_init_freeable+0x248/0x334
>[c000f70cbdb0] [c00020fc] .kernel_init+0x1c/0xf20
>[c000f70cbe30] [c9bc] .ret_from_kernel_thread+0x58/0x9c
>Instruction dump:
>eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3860 4bb0
>7ce53b78 4b0c 7f67db78 4b24  e8c30028 4bfffd30 fbe1fff8
>---[ end trace 9f61087068991b02 ]---
>
>
> home:linux-next(4)(I)> git bisect log
> ...
> git bisect bad b189817cf7894e03fd3700acd923221d3007259e
> # first bad commit: [b189817cf7894e03fd3700acd923221d3007259e] crypto: caam/qi - add ablkcipher and authenc algorithms
>
>
> The oops is saying gen_pool_alloc() was called with a NULL pointer, so
> it seems qm_cgralloc is NULL:
>
> static int qman_alloc_range(struct gen_pool *p, u32 *result, u32 cnt)
> {
>   unsigned long addr;

POWER4 - who has one?

2017-04-04 Thread Michael Ellerman
Hi folks,

Quick quiz, who still has a POWER4?

And if so are you running mainline on it?

cheers


Re: [PATCH 06/12] powerpc/xive: Native exploitation of the XIVE interrupt controller

2017-04-04 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

> The XIVE interrupt controller is the new interrupt controller
> found in POWER9. It supports advanced virtualization capabilities
> among other things.
>
> Currently we use a set of firmware calls that simulate the old
> "XICS" interrupt controller but this is fairly inefficient.
>
> This adds the framework for using XIVE along with a native
> backend which OPAL for configuration. Later, a backend allowing
   ^
   calls?

> the use in a KVM or PowerVM guest will also be provided.
>
> This disables some fast path for interrupts in KVM when XIVE is
> enabled as these rely on the firmware emulation code which is no
> longer available when the XIVE is used natively by Linux.
>
> A latter patch will make KVM also directly exploit the XIVE, thus
> recovering the lost performance (and more).
>
> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/include/asm/xive.h  |  116 +++
>  arch/powerpc/include/asm/xmon.h  |2 +
>  arch/powerpc/platforms/powernv/Kconfig   |2 +
>  arch/powerpc/platforms/powernv/setup.c   |   15 +-
>  arch/powerpc/platforms/powernv/smp.c |   39 +-
>  arch/powerpc/sysdev/Kconfig  |1 +
>  arch/powerpc/sysdev/Makefile |1 +
>  arch/powerpc/sysdev/xive/Kconfig |7 +
>  arch/powerpc/sysdev/xive/Makefile|4 +
>  arch/powerpc/sysdev/xive/common.c| 1175 ++
>  arch/powerpc/sysdev/xive/native.c|  604 +++
>  arch/powerpc/sysdev/xive/xive-internal.h |   51 ++
>  arch/powerpc/sysdev/xive/xive-regs.h |   88 +++
>  arch/powerpc/xmon/xmon.c |   93 ++-
>  14 files changed, 2186 insertions(+), 12 deletions(-)

I'm not going to review this in one go, given it's 10:30pm already.

So just a few things that hit me straight away.

> diff --git a/arch/powerpc/include/asm/xive.h b/arch/powerpc/include/asm/xive.h
> new file mode 100644
> index 000..b1604b73
> --- /dev/null
> +++ b/arch/powerpc/include/asm/xive.h
> @@ -0,0 +1,116 @@

Copyright missing.

> +#ifndef _ASM_POWERPC_XIVE_H
> +#define _ASM_POWERPC_XIVE_H
> +
> +#define XIVE_INVALID_VP  0xffffffff
> +
> +#ifdef CONFIG_PPC_XIVE
> +
> +extern void __iomem *xive_tm_area;

I think Paul already commented on "tm" being an overly used acronym.

> +extern u32 xive_tm_offset;
> +
> +/*
> + * Per-irq data (irq_get_handler_data for normal IRQs), IPIs
> + * have it stored in the xive_cpu structure. We also cache
> + * for normal interrupts the current target CPU.
> + */
> +struct xive_irq_data {
> + /* Setup by backend */
> + u64 flags;
> +#define XIVE_IRQ_FLAG_STORE_EOI  0x01
> +#define XIVE_IRQ_FLAG_LSI0x02
> +#define XIVE_IRQ_FLAG_SHIFT_BUG  0x04
> +#define XIVE_IRQ_FLAG_MASK_FW0x08
> +#define XIVE_IRQ_FLAG_EOI_FW 0x10

I don't love that style, prefer them just prior to the struct.

> + u64 eoi_page;
> + void __iomem *eoi_mmio;
> + u64 trig_page;
> + void __iomem *trig_mmio;
> + u32 esb_shift;
> + int src_chip;

Why not space out the members like you do in xive_q below, I think that
looks better given you have the long __iomem lines.

> +
> + /* Setup/used by frontend */
> + int target;
> + bool saved_p;
> +};
> +#define XIVE_INVALID_CHIP_ID -1
> +
> +/* A queue tracking structure in a CPU */
> +struct xive_q {
> + __be32  *qpage;
> + u32 msk;
> + u32 idx;
> + u32 toggle;
> + u64 eoi_phys;
> + void __iomem*eoi_mmio;
> + u32 esc_irq;
> + atomic_tcount;
> + atomic_tpending_count;
> +};
> +
> +/*
> + * "magic" ESB MMIO offsets

What's an ESB?

> + */
> +#define XIVE_ESB_GET 0x800
> +#define XIVE_ESB_SET_PQ_00   0xc00
> +#define XIVE_ESB_SET_PQ_01   0xd00
> +#define XIVE_ESB_SET_PQ_10   0xe00
> +#define XIVE_ESB_SET_PQ_11   0xf00
> +#define XIVE_ESB_MASKXIVE_ESB_SET_PQ_01
> +
> +extern bool __xive_enabled;
> +
> +static inline bool xive_enabled(void) { return __xive_enabled; }
> +
> +extern bool xive_native_init(void);
> +extern void xive_smp_probe(void);
> +extern int  xive_smp_prepare_cpu(unsigned int cpu);
> +extern void xive_smp_setup_cpu(void);
> +extern void xive_smp_disable_cpu(void);
> +extern void xive_kexec_teardown_cpu(int secondary);
> +extern void xive_shutdown(void);
> +extern void xive_flush_interrupt(void);
> +
> +/* xmon hook */
> +extern void xmon_xive_do_dump(int cpu);
> +
> +/* APIs used by KVM */
> +extern u32 xive_native_default_eq_shift(void);
> +extern u32 xive_native_alloc_vp_block(u32 max_vcpus);
> +extern void xive_native_free_vp_block(u32 vp_base);
> +extern int xive_native_populate_irq_data(u32 hw_irq,
> +  struct xive_irq_data *data);
> +extern 

Re: [PATCH] tty/hvc_console: fix console lock ordering with spinlock

2017-04-04 Thread Denis Kirjanov
On 4/4/17, Michael Ellerman  wrote:
> Denis Kirjanov  writes:
>
>> hvc_remove() takes a spin lock first then acquires the console
>> semaphore. This situation can easily lead to a deadlock scenario
>> where we call scheduler with spin lock held.
>
> Have you actually hit the deadlock? Because that code's been like that
> for years and I've never received a bug report.

Nope, I didn't. I've found the bug in the code while looking at the
lockdep output

>
>> diff --git a/drivers/tty/hvc/hvc_console.c
>> b/drivers/tty/hvc/hvc_console.c
>> index b19ae36..a8d3991 100644
>> --- a/drivers/tty/hvc/hvc_console.c
>> +++ b/drivers/tty/hvc/hvc_console.c
>> @@ -920,17 +920,17 @@ int hvc_remove(struct hvc_struct *hp)
>>
>>  tty = tty_port_tty_get(&hp->port);
>>
>> +console_lock();
>>  spin_lock_irqsave(&hp->lock, flags);
>>  if (hp->index < MAX_NR_HVC_CONSOLES) {
>> -console_lock();
>>  vtermnos[hp->index] = -1;
>>  cons_ops[hp->index] = NULL;
>> -console_unlock();
>>  }
>>
>>  /* Don't whack hp->irq because tty_hangup() will need to free the irq.
>> */
>>
>> spin_unlock_irqrestore(&hp->lock, flags);
>> +console_unlock();
>
> I get that you're trying to do the minimal change, but I don't think the
> result is ideal. If this isn't a console hvc then we take both locks but
> do nothing.
>
> So what about:
>
>   if (hp->index < MAX_NR_HVC_CONSOLES) {
>           console_lock();
>           spin_lock_irqsave(&hp->lock, flags);
>           vtermnos[hp->index] = -1;
>           cons_ops[hp->index] = NULL;
>           spin_unlock_irqrestore(&hp->lock, flags);
>           console_unlock();
>   }
Are you sure that we don't corrupt hp->index between hvc_poll() in
interrupt context and hvc_remove()?

>
> cheers
>


[bug report] ibmvnic: Cleanup failure path in ibmvnic_open

2017-04-04 Thread Dan Carpenter
[  This patch changed the code from using multiple come-from label names
   to using a single err label.  Both are terrible ways to do error
   handling.

   Come-From Labels:

   Come-from labels look like this:

        foo = alloc();
        if (!foo)
                goto alloc_failed;

   The "alloc_failed" name is infuriating to look at because you can
   easily see that alloc_failed because you literally just read that on
   the line before.  It looks like it is useful information but it's a
   trick.  What we want to know is what does the goto do?  The label
   name should be verb based.

   One Err Style Error Handling:

   This is the most bug prone way of handling errors.  You can easily
   see that if you count the number of bugs from static checkers.  These
   labels try to handle every possible condition which is more difficult
   than handling a specific error state.  Plus, the label names for
   these are always vague and annoying.

   This patch introduces a classic One Err Bug.

   Kernel Style Error Handling:

   It's best if you use a string of error labels that each unwind a
   specific thing.  Never free something that hasn't been allocated.
   Choose verb based labels like:

        err_release_bar:
                release_bar(bar);
        err_free_foo:
                kfree(foo);

   When you're reviewing code that uses this style of error handling you
   only need to track the most recently allocated thing.  It works like
   this:

        foo = alloc();
        if (!foo)
                return -ENOMEM;
        bar = get_bar();
        if (!bar)
                goto err_free_foo;

   With this code you don't have to scroll up and down the function.
   It's obviously correct just from looking at those 6 lines.

- dan ]

Hello Nathan Fontenot,

The patch 1b8955ee5f6c: "ibmvnic: Cleanup failure path in
ibmvnic_open" from Mar 30, 2017, leads to the following static
checker warning:

drivers/net/ethernet/ibm/ibmvnic.c:672 ibmvnic_open()
error: we previously assumed 'adapter->napi' could be null (see line 
626)

drivers/net/ethernet/ibm/ibmvnic.c
   593  static int ibmvnic_open(struct net_device *netdev)
   594  {
   595  struct ibmvnic_adapter *adapter = netdev_priv(netdev);
   596  struct device *dev = &adapter->vdev->dev;
   597  union ibmvnic_crq crq;
   598  int rc = 0;
   599  int i;
   600  
   601  if (adapter->is_closed) {
   602  rc = ibmvnic_init(adapter);
   603  if (rc)
   604  return rc;
   605  }
   606  
   607  rc = ibmvnic_login(netdev);
   608  if (rc)
   609  return rc;
^^
See?  We didn't clean up from ibmvnic_init() so this is buggy already.
It should be goto uninit;  The One Err Style error handling is too
complicated so people just give up.  But Kernel Style error handling
methodically writes itself.

   610  
   611  rc = netif_set_real_num_tx_queues(netdev, 
adapter->req_tx_queues);
   612  if (rc) {
   613  dev_err(dev, "failed to set the number of tx queues\n");
   614  return -1;
   615  }
   616  
   617  rc = init_sub_crq_irqs(adapter);
   618  if (rc) {
   619  dev_err(dev, "failed to initialize sub crq irqs\n");
   620  return -1;
   621  }
   622  
   623  adapter->map_id = 1;
   624  adapter->napi = kcalloc(adapter->req_rx_queues,
   625  sizeof(struct napi_struct), GFP_KERNEL);
   626  if (!adapter->napi)
   627  goto ibmvnic_open_fail;
^^
If we hit this goto then we will Oops.

   628  for (i = 0; i < adapter->req_rx_queues; i++) {
   629  netif_napi_add(netdev, &adapter->napi[i], ibmvnic_poll,
   630 NAPI_POLL_WEIGHT);
   631  napi_enable(>napi[i]);
   632  }
   633  
   634  send_map_query(adapter);
   635  
   636  rc = init_rx_pools(netdev);
   637  if (rc)
   638  goto ibmvnic_open_fail;
   639  
   640  rc = init_tx_pools(netdev);
   641  if (rc)
   642  goto ibmvnic_open_fail;
   643  
   644  rc = init_bounce_buffer(netdev);
   645  if (rc)
   646  goto ibmvnic_open_fail;
   647  
   648  replenish_pools(adapter);
   649  
   650  /* We're ready to receive frames, enable the sub-crq interrupts 
and
   651   * set the logical link state to up
   652   */
   653  for (i = 0; i < adapter->req_rx_queues; i++)
   654  enable_scrq_irq(adapter, adapter->rx_scrq[i]);
   655  
   656  for (i = 0; i < adapter->req_tx_queues; i++)
   657  enable_scrq_irq(adapter, adapter->tx_scrq[i]);
   658  

Re: [PATCH 02/12] powerpc: Sync opal-api.h

2017-04-04 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

...

Give me some change log !

> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/include/asm/opal-api.h| 302 
> -

It looks like you've just copied it over in its entirety, including lots
of unused cruft.

Please just give me the XIVE bits you need.

cheers


Re: [PATCH 1/2] powerpc: string: implement optimized memset variants

2017-04-04 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> (generic) is with Matt's arch-independent patches applied. Profiling 
> indicates that most of the overhead is actually with the lzo 
> decompression...
>
> Also, with a simple module to memset64() a 1GB vmalloc'ed buffer, here 
> are the results:
> generic:  0.245315533 seconds time elapsed( +-  1.83% )
> optimized:0.169282701 seconds time elapsed( +-  1.96% )

Great, that's pretty conclusive.

I'm pretty sure I can take these 2 patches independently of Matt's
series, they just won't be used by much until his series goes in, so
I'll do that unless someone yells.

cheers


Re: [PATCH] tty/hvc_console: fix console lock ordering with spinlock

2017-04-04 Thread Michael Ellerman
Denis Kirjanov  writes:

> hvc_remove() takes a spin lock first then acquires the console
> semaphore. This situation can easily lead to a deadlock scenario
> where we call scheduler with spin lock held.

Have you actually hit the deadlock? Because that code's been like that
for years and I've never received a bug report.

> diff --git a/drivers/tty/hvc/hvc_console.c b/drivers/tty/hvc/hvc_console.c
> index b19ae36..a8d3991 100644
> --- a/drivers/tty/hvc/hvc_console.c
> +++ b/drivers/tty/hvc/hvc_console.c
> @@ -920,17 +920,17 @@ int hvc_remove(struct hvc_struct *hp)
>  
>   tty = tty_port_tty_get(&hp->port);
>  
> + console_lock();
>   spin_lock_irqsave(&hp->lock, flags);
>   if (hp->index < MAX_NR_HVC_CONSOLES) {
> - console_lock();
>   vtermnos[hp->index] = -1;
>   cons_ops[hp->index] = NULL;
> - console_unlock();
>   }
>  
>   /* Don't whack hp->irq because tty_hangup() will need to free the irq. 
> */
>  
>   spin_unlock_irqrestore(&hp->lock, flags);
> + console_unlock();

I get that you're trying to do the minimal change, but I don't think the
result is ideal. If this isn't a console hvc then we take both locks but
do nothing.

So what about:

	if (hp->index < MAX_NR_HVC_CONSOLES) {
		console_lock();
		spin_lock_irqsave(&hp->lock, flags);
		vtermnos[hp->index] = -1;
		cons_ops[hp->index] = NULL;
		spin_unlock_irqrestore(&hp->lock, flags);
		console_unlock();
	}

cheers


Re: [PATCH 5/5] crypto/nx: Add P9 NX specific error codes for 842 engine

2017-04-04 Thread Michael Ellerman
Haren Myneni  writes:

> [PATCH 5/5] crypto/nx: Add P9 NX specific error codes for 842 engine
>
> This patch adds changes for checking P9 specific 842 engine
> error codes. These errors are reported in co-processor status
> block (CSB) for failures.

But you just enabled support on P9 in patch 4.

So you should reorder patch 4 and 5. ie. add the P9 error handling
before you enable P9 support.

cheers


Re: [PATCH 3/5] crypto/nx: Create nx842_delete_coproc function

2017-04-04 Thread Michael Ellerman
Haren Myneni  writes:

> [PATCH 3/5] crypto/nx: Create nx842_delete_coproc function
>
> Move deleting coprocessor info upon exit or failure to
> nx842_delete_coproc().

Naming again, this deletes *all* the coprocs, so the name should be
plural.

cheers


Re: [PATCH 2/5] crypto/nx: Create nx842_cfg_crb function

2017-04-04 Thread Michael Ellerman
Haren Myneni  writes:

> [PATCH 2/5] crypto/nx: Create nx842_cfg_crb function
>
> Configure CRB is moved to nx842_cfg_crb() so that it can be
> used for icswx function and VAS function which will be added
> later.

Buy a vowel! :)

nx842_configure_crb() is fine.

cheers


Re: [PATCH 1/5] crypto/nx: Rename nx842_powernv_function as icswx function

2017-04-04 Thread Michael Ellerman
Haren Myneni  writes:

> [PATCH 1/5] crypto/nx: Rename nx842_powernv_function as icswx function
>
> nx842_powernv_function points to nx842_icswx_function and
> will point to the VAS function which will be added later for
> P9 NX support.

I know it's nit-picking but can you give it a better name while you're
there.

I was thinking it should be called "send" or something, but it actually
synchronously sends and waits for a request.

So perhaps just nx842_exec(), for "execute a request", and then you can
have nx842_exec_icswx() and nx842_exec_vas().

cheers


Re: [PATCH 4/5] crypto/nx: Add P9 NX support for 842 compression engine.

2017-04-04 Thread Michael Ellerman
Hi Haren,

A few comments ...

Haren Myneni  writes:

> diff --git a/arch/powerpc/include/asm/vas.h b/arch/powerpc/include/asm/vas.h
> index 4e5a470..7315621 100644
> --- a/arch/powerpc/include/asm/vas.h
> +++ b/arch/powerpc/include/asm/vas.h
> @@ -19,6 +19,8 @@
>  #define VAS_RX_FIFO_SIZE_MIN (1 << 10)   /* 1KB */
>  #define VAS_RX_FIFO_SIZE_MAX (8 << 20)   /* 8MB */
>  
> +#define is_vas_available()   (cpu_has_feature(CPU_FTR_ARCH_300))

You shouldn't need that, it should all come from the device tree.

> diff --git a/drivers/crypto/nx/Kconfig b/drivers/crypto/nx/Kconfig
> index ad7552a..4ad7fdb 100644
> --- a/drivers/crypto/nx/Kconfig
> +++ b/drivers/crypto/nx/Kconfig
> @@ -38,6 +38,7 @@ config CRYPTO_DEV_NX_COMPRESS_PSERIES
>  config CRYPTO_DEV_NX_COMPRESS_POWERNV
>   tristate "Compression acceleration support on PowerNV platform"
>   depends on PPC_POWERNV
> + select VAS

Don't select symbols that are user visible. 

I'm not sure we actually want CONFIG_VAS to be user visible, but
currently it is so this should be 'depends on VAS'.

> diff --git a/drivers/crypto/nx/nx-842-powernv.c 
> b/drivers/crypto/nx/nx-842-powernv.c
> index 8737e90..66efd39 100644
> --- a/drivers/crypto/nx/nx-842-powernv.c
> +++ b/drivers/crypto/nx/nx-842-powernv.c
> @@ -554,6 +662,164 @@ static int nx842_powernv_decompress(const unsigned char 
> *in, unsigned int inlen,
> wmem, CCW_FC_842_DECOMP_CRC);
>  }
>  
> +
> +static int __init vas_cfg_coproc_info(struct device_node *dn, int chip_id,
> + int vasid, int ct)
> +{
> + struct vas_window *rxwin, *txwin = NULL;
> + struct vas_rx_win_attr rxattr;
> + struct vas_tx_win_attr txattr;
> + struct nx842_coproc *coproc;
> + u32 lpid, pid, tid;
> + u64 rx_fifo;
> + int ret;
> +#define RX_FIFO_SIZE 0x8000

Where's that come from?

> + if (of_property_read_u64(dn, "rx-fifo-address", (void *)&rx_fifo)) {
> + pr_err("ibm,nx-842: Missing rx-fifo-address property\n");

The driver already declares pr_fmt(), so do you need the prefixes on
these pr_err()s ?

> + return -EINVAL;
> + }
> +
> + if (of_property_read_u32(dn, "lpid", )) {
> + pr_err("ibm,nx-842: Missing lpid property\n");
> + return -EINVAL;
> + }
> +
> + if (of_property_read_u32(dn, "pid", )) {
> + pr_err("ibm,nx-842: Missing pid property\n");
> + return -EINVAL;
> + }
> +
> + if (of_property_read_u32(dn, "tid", )) {
> + pr_err("ibm,nx-842: Missing tid property\n");
> + return -EINVAL;
> + }
> +
> + vas_init_rx_win_attr(&rxattr, ct);
> + rxattr.rx_fifo = (void *)rx_fifo;
> + rxattr.rx_fifo_size = RX_FIFO_SIZE;
> + rxattr.lnotify_lpid = lpid;
> + rxattr.lnotify_pid = pid;
> + rxattr.lnotify_tid = tid;
> + rxattr.wcreds_max = 64;
> +
> + /*
> +  * Open a VAS receive window which is used to configure RxFIFO
> +  * for NX.
> +  */
> + rxwin = vas_rx_win_open(vasid, ct, &rxattr);
> + if (IS_ERR(rxwin)) {
> + pr_err("ibm,nx-842: setting RxFIFO with VAS failed: %ld\n",
> + PTR_ERR(rxwin));
> + return PTR_ERR(rxwin);
> + }
> +
> + /*
> +  * Kernel requests will be high priority. So open send
> +  * windows only for high priority RxFIFO entries.
> +  */
> + if (ct == VAS_COP_TYPE_842_HIPRI) {

This if body looks like it should be a separate function to me.

> + vas_init_tx_win_attr(&txattr, ct);
> + txattr.lpid = 0;/* lpid is 0 for kernel requests */
> + txattr.pid = mfspr(SPRN_PID);
> +
> + /*
> +  * Open a VAS send window which is used to send request to NX.
> +  */
> + txwin = vas_tx_win_open(vasid, ct, &txattr);
> + if (IS_ERR(txwin)) {
> + pr_err("ibm,nx-842: Can not open TX window: %ld\n",
> + PTR_ERR(txwin));
> + ret = PTR_ERR(txwin);
> + goto err_out;
> + }
> + }
> +
> + coproc = kmalloc(sizeof(*coproc), GFP_KERNEL);
> + if (!coproc) {
> + ret = -ENOMEM;
> + goto err_out;
> + }

The error handling would be simpler if you did that earlier, before you
open the RX/TX windows.

> + coproc->chip_id = chip_id;
> + coproc->vas.rxwin = rxwin;
> + coproc->vas.txwin = txwin;
> +
> + INIT_LIST_HEAD(>list);
> + list_add(&coproc->list, &nx842_coprocs);

That duplicates logic in the non-vas probe, so ideally would be shared
or in a helper.

> +
> + return 0;
> +
> +err_out:
> + if (txwin)
> + vas_win_close(txwin);
> +
> + vas_win_close(rxwin);
> +
> + return ret;
> +}
> +
> +
> +static int __init nx842_powernv_probe_vas(struct device_node *dn)
> +{
> + struct device_node *nxdn, *np;

There's too many device nodes 

Re: [PATCH guest kernel] vfio/powerpc/spapr_tce: Enforce IOMMU type compatibility check

2017-04-04 Thread Alexey Kardashevskiy
On 25/03/17 23:25, Alexey Kardashevskiy wrote:
> On 25/03/17 07:29, Alex Williamson wrote:
>> On Fri, 24 Mar 2017 17:44:06 +1100
>> Alexey Kardashevskiy  wrote:
>>
>>> The existing SPAPR TCE driver advertises both VFIO_SPAPR_TCE_IOMMU and
>>> VFIO_SPAPR_TCE_v2_IOMMU types to the userspace and the userspace usually
>>> picks the v2.
>>>
>>> Normally the userspace would create a container, attach an IOMMU group
>>> to it and only then set the IOMMU type (which would normally be v2).
>>>
>>> However a specific IOMMU group may not support v2, in other words
>>> it may not implement set_window/unset_window/take_ownership/
>>> release_ownership and such a group should not be attached to
>>> a v2 container.
>>>
>>> This adds extra checks that a new group can do what the selected IOMMU
>>> type suggests. The userspace can then test the return value from
>>> ioctl(VFIO_SET_IOMMU, VFIO_SPAPR_TCE_v2_IOMMU) and try
>>> VFIO_SPAPR_TCE_IOMMU.
>>>
>>> Signed-off-by: Alexey Kardashevskiy 
>>> ---
>>>
>>> This is one of the patches needed to do nested VFIO - for either
>>> second level guest or DPDK running in a guest.
>>> ---
>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 8 
>>>  1 file changed, 8 insertions(+)
>>
>> I'm not sure I understand why you're labeling this "guest kernel", is a
> 
> 
> That is my script :)
> 
>> VM the only case where we can have combinations that only a subset of
>> the groups might support v2?  
> 
> powernv (non-virtualized, and it runs HV KVM) host provides v2-capable
> groups, they are all the same, and a pseries host (which normally runs as a
> guest but it can do nested KVM as well - it is called PR KVM) can do only
> v1 (after this patch, without it - no vfio at all).
> 
>> What terrible things happen when such a
>> combination is created?
> 
> There is no mixture at the moment, I just needed a way to tell userspace
> that a group cannot do v2.
> 
>> The fix itself seems sane, but I'm trying to
>> figure out whether it should be marked for stable, should go in for
>> v4.11, or be queued for v4.12.  Thanks,
> 
> No need for stable.


So what is the next step with this patch?


> 
> 
>>
>> Alex
>>
>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
>>> b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> index cf3de91fbfe7..a7d811524092 100644
>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>> @@ -1335,8 +1335,16 @@ static int tce_iommu_attach_group(void *iommu_data,
>>>  
>>> if (!table_group->ops || !table_group->ops->take_ownership ||
>>> !table_group->ops->release_ownership) {
>>> +   if (container->v2) {
>>> +   ret = -EPERM;
>>> +   goto unlock_exit;
>>> +   }
>>> ret = tce_iommu_take_ownership(container, table_group);
>>> } else {
>>> +   if (!container->v2) {
>>> +   ret = -EPERM;
>>> +   goto unlock_exit;
>>> +   }
>>> ret = tce_iommu_take_ownership_ddw(container, table_group);
>>> if (!tce_groups_attached(container) && !container->tables[0])
>>> container->def_window_pending = true;
>>
> 
> 


-- 
Alexey


[PATCH v2] KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs

2017-04-04 Thread Thomas Huth
According to the PowerISA 2.07, mtspr and mfspr should not always
generate an illegal instruction exception when being used with an
undefined SPR, but rather treat the instruction as a NOP or inject a
privilege exception in some cases, too - depending on the SPR number.
Also turn the printk here into a ratelimited print statement, so that
the guest can not flood the dmesg log of the host by issuing lots of
illegal mtspr/mfspr instructions here.

Signed-off-by: Thomas Huth 
---
 v2:
 - Inject illegal instruction program interrupt instead of emulation
   assist interrupt (according to the last programming note in section
   6.5.9 of Book III of the PowerISA v2.07)

 arch/powerpc/kvm/book3s_emulate.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_emulate.c 
b/arch/powerpc/kvm/book3s_emulate.c
index 8359752..bf4181e 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -503,10 +503,14 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, 
int sprn, ulong spr_val)
break;
 unprivileged:
default:
-   printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
-#ifndef DEBUG_SPR
-   emulated = EMULATE_FAIL;
-#endif
+   pr_info_ratelimited("KVM: invalid SPR write: %d\n", sprn);
+   if (sprn & 0x10) {
+   if (kvmppc_get_msr(vcpu) & MSR_PR)
+   kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
+   } else {
+   if ((kvmppc_get_msr(vcpu) & MSR_PR) || sprn == 0)
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   }
break;
}
 
@@ -648,10 +652,16 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, 
int sprn, ulong *spr_val
break;
default:
 unprivileged:
-   printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
-#ifndef DEBUG_SPR
-   emulated = EMULATE_FAIL;
-#endif
+   pr_info_ratelimited("KVM: invalid SPR read: %d\n", sprn);
+   if (sprn & 0x10) {
+   if (kvmppc_get_msr(vcpu) & MSR_PR)
+   kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
+   } else {
+   if ((kvmppc_get_msr(vcpu) & MSR_PR) || sprn == 0 ||
+   sprn == 4 || sprn == 5 || sprn == 6)
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   }
+
break;
}
 
-- 
1.8.3.1



Re: [PATCH] raid6/altivec: adding vpermxor implementation for raid6 Q syndrome

2017-04-04 Thread Michael Ellerman
Daniel Axtens  writes:

>> In that function, the flow is:
>>  pagefault_disable();
>>  enable_kernel_altivec();
>>  
>>  pagefault_enable();
>>
>> There are a few things that it would be nice (but by no means essential)
>> to find out:
>>  - what is the difference between pagefault and prempt enable/disable
>>  - is it required to disable altivec after the end of the function or
>>can we leave that enabled?
>
> Answering my own question here, dc4fbba11e46 ("powerpc: Create
> disable_kernel_{fp,altivec,vsx,spe}()") adds the disable_ function, and
> it's a no-op except under debug conditions. So it should stay.

Yeah enabling altivec for use in the kernel requires saving the
userspace altivec state first (so we don't clobber it).

But once we've enabled it in the kernel, we can just leave it enabled
until we return to userspace and save the cost of disabling it. There's
a small risk that leaving altivec enabled allows some other kernel code
to use altivec when it shouldn't, hence the debug option to make
disable_kernel_altivec() actually disable it.

cheers


Re: [PATCH] powerpc/misc: fix exported functions that reference the TOC

2017-04-04 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

> On Mon, 2017-04-03 at 23:29 +1000, Michael Ellerman wrote:
>> The other option would be just to make a rule that anything EXPORT'ed
>> must use _GLOBAL_TOC().
>
> Can we enforce that somewhat at build time ?

Yeah I had a quick look at doing that last night but didn't see anything
obvious.

So yes we could, but perhaps not easily - EXPORT_SYMBOL() is generic
code so changing that could be a pain.

cheers


Re: [PATCH kernel] powerpc/iommu: Do not call PageTransHuge() on tail pages

2017-04-04 Thread Aneesh Kumar K.V
Alexey Kardashevskiy  writes:

> The CMA pages migration code does not support compound pages at
> the moment so it performs few tests before proceeding to actual page
> migration.
>
> One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
> it should be called on head pages. Since we also test for PageCompound(),
> and it contains PageTail(), we can simply move PageCompound() in front
> of PageTransHuge() and therefore avoid possible VM_BUG_ON_PAGE.
>
> Signed-off-by: Alexey Kardashevskiy 
> ---
>
> Some of actual POWER8 systems do crash on that BUG_ON.
> ---
>  arch/powerpc/mm/mmu_context_iommu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
> b/arch/powerpc/mm/mmu_context_iommu.c
> index 497130c5c742..ba7fccf993b3 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -81,7 +81,7 @@ struct page *new_iommu_non_cma_page(struct page *page, 
> unsigned long private,
>   gfp_t gfp_mask = GFP_USER;
>   struct page *new_page;
>
> - if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> + if (PageCompound(page) || PageHuge(page) || PageTransHuge(page))


A check for compound pages should be sufficient here, because a
Huge/TransHuge page is also marked compound. If we want to indicate that
we don't handle hugetlb and THP pages, we can write that as a comment?



>   return NULL;
>
>   if (PageHighMem(page))
> @@ -100,7 +100,7 @@ static int mm_iommu_move_page_from_cma(struct page *page)
>   LIST_HEAD(cma_migrate_pages);
>
>   /* Ignore huge pages for now */
> - if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> + if (PageCompound(page) || PageHuge(page) || PageTransHuge(page))
>   return -EBUSY;
>
>   lru_add_drain();
> -- 
> 2.11.0



Re: [PATCH] powerpc/mm: Add missing global TLBI if cxl is active

2017-04-04 Thread Aneesh Kumar K.V
Frederic Barrat  writes:

> Commit 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl") converted local
> TLBIs to global if the cxl driver is active. It is necessary because
> the CAPP snoops invalidations to forward them to the PSL on the cxl
> adapter.
> However one path was apparently forgotten. native_flush_hash_range()
> still sends local TLBIs, as found out the hard way recently.
>
> This patch fixes it by following the same logic as previously: if the
> cxl driver is active, the local TLBIs are 'upgraded' to global.
>
> Fixes: 4c6d9acce1f4 ("powerpc/mm: Add hooks for cxl")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Frederic Barrat 

Reviewed-by: Aneesh Kumar K.V 

> ---
>  arch/powerpc/mm/hash_native_64.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/hash_native_64.c 
> b/arch/powerpc/mm/hash_native_64.c
> index cc33260..65bb8f3 100644
> --- a/arch/powerpc/mm/hash_native_64.c
> +++ b/arch/powerpc/mm/hash_native_64.c
> @@ -638,6 +638,10 @@ static void native_flush_hash_range(unsigned long 
> number, int local)
>   unsigned long psize = batch->psize;
>   int ssize = batch->ssize;
>   int i;
> + unsigned int use_local;
> +
> + use_local = local && mmu_has_feature(MMU_FTR_TLBIEL) &&
> + mmu_psize_defs[psize].tlbiel && !cxl_ctx_in_use();
>
>   local_irq_save(flags);
>
> @@ -667,8 +671,7 @@ static void native_flush_hash_range(unsigned long number, 
> int local)
>   } pte_iterate_hashed_end();
>   }
>
> - if (mmu_has_feature(MMU_FTR_TLBIEL) &&
> - mmu_psize_defs[psize].tlbiel && local) {
> + if (use_local) {
>   asm volatile("ptesync":::"memory");
>   for (i = 0; i < number; i++) {
>   vpn = batch->vpn[i];
> -- 
> 2.9.3



Re: [PATCH] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB

2017-04-04 Thread Aneesh Kumar K.V



On Tuesday 04 April 2017 11:33 AM, Anshuman Khandual wrote:

This just adds user space exported ABI definitions for both 16MB and
16GB non default huge page sizes to be used with mmap() system call.

Signed-off-by: Anshuman Khandual 
---
These defined values will be used along with MAP_HUGETLB while calling
mmap() system call if the desired HugeTLB page size is not the default
one. Follows similar definitions present in x86.

arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_2MB(21 << MAP_HUGE_SHIFT)
arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_1GB(30 << MAP_HUGE_SHIFT)

 arch/powerpc/include/uapi/asm/mman.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/mman.h 
b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba..e78980b 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -29,4 +29,7 @@
 #define MAP_STACK  0x2 /* give out an address that is best 
suited for process/thread stacks */
 #define MAP_HUGETLB0x4 /* create a huge page mapping */

+#define MAP_HUGE_16MB  (24 << MAP_HUGE_SHIFT)/* 16MB HugeTLB Page */
+#define MAP_HUGE_16GB  (34 << MAP_HUGE_SHIFT)/* 16GB HugeTLB Page */
+
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */



I am doing a similar patch as part of the 1G and hugetlb migration
series. Can you add 2M and 1G #defines also, so that I can drop the
patch from my series and pick this one?


-aneesh



Re: [PATCH] KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs

2017-04-04 Thread Thomas Huth
On 04.04.2017 08:25, Paul Mackerras wrote:
> On Mon, Apr 03, 2017 at 01:23:15PM +0200, Thomas Huth wrote:
>> According to the PowerISA 2.07, mtspr and mfspr should not generate
>> an illegal instruction exception when being used with an undefined SPR,
>> but rather treat the instruction as a NOP, inject a privilege exception
>> or an emulation assistance exception - depending on the SPR number.
> 
> The emulation assist interrupt is a hypervisor interrupt, so the guest
> would not be expecting to receive it.  On a real machine, the
> hypervisor would synthesize an illegal instruction type program
> interrupt as described in the last programming note in section 6.5.9
> of Book III of Power ISA v2.07B.  Since we are the hypervisor here, we
> should synthesize a program interrupt rather than an emulation assist
> interrupt.

Ah, right, we're doing this in other spots, too, so a PROGILL is indeed
more consistent here. I'll send a v2 ...

 Thomas



Re: [PATCH] KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs

2017-04-04 Thread Paul Mackerras
On Mon, Apr 03, 2017 at 01:23:15PM +0200, Thomas Huth wrote:
> According to the PowerISA 2.07, mtspr and mfspr should not generate
> an illegal instruction exception when being used with an undefined SPR,
> but rather treat the instruction as a NOP, inject a privilege exception
> or an emulation assistance exception - depending on the SPR number.

The emulation assist interrupt is a hypervisor interrupt, so the guest
would not be expecting to receive it.  On a real machine, the
hypervisor would synthesize an illegal instruction type program
interrupt as described in the last programming note in section 6.5.9
of Book III of Power ISA v2.07B.  Since we are the hypervisor here, we
should synthesize a program interrupt rather than an emulation assist
interrupt.

> Also turn the printk here into a ratelimited print statement, so that
> the guest can not flood the dmesg log of the host by issueing lots of
> illegal mtspr/mfspr instruction here.

Good idea.

Paul.


Re: [PATCH] KVM: PPC: Book3S PR: Do not always inject facility unavailable exceptions

2017-04-04 Thread Paul Mackerras
On Mon, Apr 03, 2017 at 01:28:34PM +0200, Thomas Huth wrote:
> KVM should not inject a facility unavailable exception into the guest
> when it tries to execute a mtspr/mfspr instruction for an SPR that
> is unavailable, and the vCPU is *not* running in PRoblem state.
> 
> It's right that we inject an exception when the vCPU is in PR mode, since
> chapter "6.2.10 Facility Status and Control Register" of the PowerISA
> v2.07 says that "When the FSCR makes a facility unavailable, attempted
> usage of the facility in *problem state* is treated as follows: [...]
> Access of an SPR using mfspr/mtspr causes a Facility Unavailable
> interrupt". But if the guest vCPU is not in PR mode, we should follow
> the behavior that is described in chapter "4.4.4 Move To/From System
> Register Instructions" instead and treat the instruction as a NOP.

This doesn't seem quite right.  My reading of the ISA is that the FSCR
bit for a facility being 0 doesn't prevent privileged code from
accessing the facility, so we shouldn't be treating mfspr/mtspr as
NOP.  Instead we should set the facility's bit in the shadow
FSCR and re-execute the instruction (remembering of course to clear
the FSCR bit when we go back to emulated problem state).

For TM it's a bit different as the MSR[TM] bit does prevent privileged
code from accessing TM registers and instructions, so for TM we should
be delivering a facility unavailable interrupt even when the guest is
in emulated privileged state.

So I don't see any case where mfspr/mtspr should be treated as a NOP
in response to a facility unavailable interrupt.

Paul.


[PATCH] powerpc/hugetlb: Add ABI defines for MAP_HUGE_16MB and MAP_HUGE_16GB

2017-04-04 Thread Anshuman Khandual
This just adds user space exported ABI definitions for both 16MB and
16GB non default huge page sizes to be used with mmap() system call.

Signed-off-by: Anshuman Khandual 
---
These defined values will be used along with MAP_HUGETLB while calling
mmap() system call if the desired HugeTLB page size is not the default
one. Follows similar definitions present in x86.

arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_2MB	(21 << MAP_HUGE_SHIFT)
arch/x86/include/uapi/asm/mman.h:#define MAP_HUGE_1GB	(30 << MAP_HUGE_SHIFT)

 arch/powerpc/include/uapi/asm/mman.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba..e78980b 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -29,4 +29,7 @@
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
 
+#define MAP_HUGE_16MB  (24 << MAP_HUGE_SHIFT)  /* 16MB HugeTLB Page */
+#define MAP_HUGE_16GB  (34 << MAP_HUGE_SHIFT)  /* 16GB HugeTLB Page */
+
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
-- 
1.8.5.2