Re: [PATCH] powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format

2016-11-14 Thread Paul Mackerras
On Mon, Nov 14, 2016 at 11:26:00AM +1100, Balbir Singh wrote:
> 
> 
> On 11/11/16 16:55, Paul Mackerras wrote:
> > This changes the way that we support the new ISA v3.00 HPTE format.
> > Instead of adapting everything that uses HPTE values to handle either
> > the old format or the new format, depending on which CPU we are on,
> > we now convert explicitly between old and new formats if necessary
> > in the low-level routines that actually access HPTEs in memory.
> > This limits the amount of code that needs to know about the new
> > format and makes the conversions explicit.  This is OK because the
> > old format contains all the information that is in the new format.
...
> > +   if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> > +   hpte_r = hpte_old_to_new_r(hpte_v, hpte_r);
> > +   hpte_v = hpte_old_to_new_v(hpte_v);
> 
> I don't think it's called out, but it seems like we have a dependency where
> hpte_old_to_new_r MUST be called prior to hpte_old_to_new_v, since we need
> the v bit to be extracted and moved to the _r bit. I suspect one way to avoid
> that dependency is to pass the ssize_field or to do both conversions at once.

There is no dependency between the functions.  The functions are pure
functions.  If you are overwriting the input values with the output
values, then yes, you need to call the functions with the inputs, not
the outputs.  I would like to think that even programmers of average
skill can see that

a = foo(a, b);
b = bar(b);

computes something different from

b = bar(b);
a = foo(a, b);

If someone is going to be so careless as to overlook that difference,
then I don't believe they would bother to read a comment either.  (I
know for sure that people don't read comments, as witnessed by the KVM
bug I found recently.)
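
A minimal sketch of the ordering point, reusing the patch's helpers
(signatures taken from the hunk quoted above):

    /* pure functions: the results depend only on the arguments */
    unsigned long hpte_old_to_new_v(unsigned long v);
    unsigned long hpte_old_to_new_r(unsigned long v, unsigned long r);

    /* overwriting in place: convert r while the old v is still live */
    hpte_r = hpte_old_to_new_r(hpte_v, hpte_r); /* consumes old hpte_v */
    hpte_v = hpte_old_to_new_v(hpte_v);         /* now safe to clobber */

Swapping the two assignments would feed the already-converted hpte_v
into hpte_old_to_new_r; that is a caller bug, not a dependency between
the functions.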

Paul.


Re: [PATCH v7 2/5] mm: remove x86-only restriction of movable_node

2016-11-14 Thread Aneesh Kumar K.V
Reza Arbab  writes:

> In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
> option"), the memblock allocation direction is changed to bottom-up and
> then back to top-down like this:
>
> 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
> 2. memblock_set_bottom_up(false), called by x86's numa_init().
>
> Even though (1) occurs in generic mm code, it is wrapped by #ifdef
> CONFIG_MOVABLE_NODE, which depends on X86_64.
>
> This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
> things will be unbalanced. (1) will happen for them, but (2) will not.
>
> This toggle was added in the first place because x86 has a delay between
> adding memblocks and marking them as hotpluggable. Since other arches do
> this marking either immediately or not at all, they do not require the
> bottom-up toggle.
>
> So, resolve things by moving (1) from cmdline_parse_movable_node() to
> x86's setup_arch(), immediately after the movable_node parameter has
> been parsed.


Considering that we can now mark memblocks hotpluggable, do we need to
enable bottom-up allocation for ppc64 also?


>
> Signed-off-by: Reza Arbab 
> ---
>  Documentation/kernel-parameters.txt |  2 +-
>  arch/x86/kernel/setup.c | 24 

-aneesh
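
The move being discussed amounts to roughly the following (a sketch,
not the patch itself; placement in x86's setup_arch() right after
early parameter parsing is assumed, and movable_node_is_enabled() is
the existing memblock.h helper):

    #ifdef CONFIG_MOVABLE_NODE
        /* previously done unconditionally in cmdline_parse_movable_node() */
        if (movable_node_is_enabled())
            memblock_set_bottom_up(true);
    #endif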



Re: [PATCH v1 3/3] powerpc: fix node_possible_map limitations

2016-11-14 Thread Michael Ellerman
Can you make the subject a bit more descriptive?

Currently this workaround prevents node hotplug, so it's required that we
remove it to support that, IIUIC.

Balbir Singh  writes:
> We've fixed the memory hotplug issue with memcg, hence
> this workaround should not be required.
>
> Fixes: commit 3af229f2071f
> ("powerpc/numa: Reset node_possible_map to only node_online_map")

I don't think Fixes is right here; that commit wasn't buggy, it was just
a workaround for the code at that time.

Just say "This is a revert of commit 3af229f2071f ("powerpc/numa: Reset
node_possible_map to only node_online_map")".

Otherwise LGTM to go via mm.

Acked-by: Michael Ellerman 

cheers


[powerpc v6 3/3] Enable storage keys for radix - user mode execution

2016-11-14 Thread Balbir Singh
ISA 3 defines a new encoded access authority that allows instruction
access prevention in privileged mode while allowing normal access
from problem state. This patch just enables the IAMR (Instruction Authority
Mask Register); enabling the AMR would require more work.

I've tested this with a buggy driver and a simple payload. The payload
is specific to the build I've tested.

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/pgtable-radix.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 7aa104d..1a3ea06 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -341,6 +341,26 @@ static void radix_init_amor(void)
mtspr(SPRN_AMOR, amor);
 }
 
+/*
+ * For radix page tables we set up the IAMR value as follows:
+ * IAMR = 0100...00 (key 0 is set to 1)
+ * AMR, UAMR, UAMOR are not affected
+ */
+static void radix_init_iamr(void)
+{
+   unsigned long iamr_mask = 0x4000000000000000;
+   unsigned long iamr;
+
+   /*
+* The IAMR should be set to 0 in DD1
+*/
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1))
+   return;
+
+   iamr = iamr_mask;
+   mtspr(SPRN_IAMR, iamr);
+}
+
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
@@ -400,6 +420,7 @@ void __init radix__early_init_mmu(void)
radix_init_amor();
}
 
+   radix_init_iamr();
radix_init_pgtable();
 }
 
@@ -417,6 +438,7 @@ void radix__early_init_mmu_secondary(void)
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
radix_init_amor();
}
+   radix_init_iamr();
 }
 
 void radix__mmu_cleanup_all(void)
-- 
2.5.5



[powerpc v6 2/3] Detect instruction fetch denied and report

2016-11-14 Thread Balbir Singh
ISA 3 allows for prevention of instruction fetch and execution
from user mode pages. If such an error occurs, SRR1 bit 35
reports the error. We catch and report the error in do_page_fault().

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/fault.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d0b137d..d498e40 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -390,6 +390,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long 
address,
 #endif /* CONFIG_8xx */
 
if (is_exec) {
+
+   /*
+* An execution fault + no execute ?
+*/
+   if (regs->msr & SRR1_ISI_N_OR_G)
+   goto bad_area;
+
/*
 * Allow execution from readable areas if the MMU does not
 * provide separate controls over reading and executing.
@@ -404,6 +411,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long 
address,
(cpu_has_feature(CPU_FTR_NOEXECUTE) ||
 !(vma->vm_flags & (VM_READ | VM_WRITE
goto bad_area;
+
 #ifdef CONFIG_PPC_STD_MMU
/*
 * protfault should only happen due to us
-- 
2.5.5



[powerpc v6 1/3] Setup AMOR in HV mode

2016-11-14 Thread Balbir Singh
AMOR should be set up in HV mode; we set it up once
and let the generic kernel handle the IAMR. This patch is
groundwork for enabling storage keys, as defined in ISA 3,
in a following patch. We don't set up AMOR in DD1, since we
can't set up the IAMR in DD1 (the bits have to be 0). If we set up
AMOR, some other code could potentially try to set the IAMR
(a guest kernel, for example).

Reported-by: Aneesh Kumar K.V 
Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/pgtable-radix.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index ed7bddc..7aa104d 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -320,6 +320,27 @@ static void update_hid_for_radix(void)
cpu_relax();
 }
 
+/*
+ * In HV mode, we init AMOR so that the hypervisor
+ * and guest can set up the IAMR, enable key 0 and set
+ * it to 1.
+ * AMOR = 1100...00 (mask for key 0 is 11)
+ */
+static void radix_init_amor(void)
+{
+   unsigned long amor_mask = 0xc000000000000000;
+   unsigned long amor;
+
+   /*
+* The AMOR bits are unused in DD1
+*/
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1))
+   return;
+
+   amor = amor_mask;
+   mtspr(SPRN_AMOR, amor);
+}
+
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
@@ -376,6 +397,7 @@ void __init radix__early_init_mmu(void)
lpcr = mfspr(SPRN_LPCR);
mtspr(SPRN_LPCR, lpcr | LPCR_UPRT | LPCR_HR);
radix_init_partition_table();
+   radix_init_amor();
}
 
radix_init_pgtable();
@@ -393,6 +415,7 @@ void radix__early_init_mmu_secondary(void)
 
mtspr(SPRN_PTCR,
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
+   radix_init_amor();
}
 }
 
-- 
2.5.5



[powerpc v6 0/3] Enable storage keys for radix

2016-11-14 Thread Balbir Singh
The first patch sets up AMOR in hypervisor mode. AMOR
needs to be set up before the IAMR (details of AMOR/IAMR in
each patch). The second patch enables detection of exceptions
generated due to instruction fetch violations and OOPSes
the task. The third patch enables the IAMR for
both hypervisor and guest kernels.

The IAMR, in radix mode, prevents the kernel from executing
code from user mode pages.

I've tested this patch series with a sample hack and
payload.

Chris Smart helped with the series, reviewing and
providing valuable feedback.

Changelog from previous post
  Implement review comments and suggestions

Balbir Singh (3):
  powerpc:Setup AMOR in HV mode
  powerpc/mm/radix:Detect instruction fetch denied and report
  powerpc:Enable storage keys for radix - user mode execution

 arch/powerpc/mm/fault.c |  8 
 arch/powerpc/mm/pgtable-radix.c | 45 +
 2 files changed, 53 insertions(+)

-- 
2.5.5



Re: [PATCH 3/3] powerpc/fsl/dts: add FMan node for t1042d4rdb

2016-11-14 Thread Scott Wood
On Fri, 2016-11-11 at 17:53 +0200, Madalin Bucur wrote:
> Signed-off-by: Madalin Bucur 
> ---
>  arch/powerpc/boot/dts/fsl/t1042d4rdb.dts | 47
> 
>  1 file changed, 47 insertions(+)
> 
> diff --git a/arch/powerpc/boot/dts/fsl/t1042d4rdb.dts
> b/arch/powerpc/boot/dts/fsl/t1042d4rdb.dts
> index 2a5a90d..8c0c318 100644
> --- a/arch/powerpc/boot/dts/fsl/t1042d4rdb.dts
> +++ b/arch/powerpc/boot/dts/fsl/t1042d4rdb.dts
> @@ -48,6 +48,53 @@
>   "fsl,deepsleep-cpld";
>   };
>   };
> + soc: soc@ffe00 {

Please leave a blank line between nodes, especially here at the top level.

-Scott



Re: powerpc64: Enable CONFIG_E500 and CONFIG_PPC_E500MC for e5500/e6500

2016-11-14 Thread Scott Wood
On Fri, 2016-10-07 at 11:00 +0200, David Engraf wrote:
> Am 27.09.2016 um 01:08 schrieb Scott Wood:
> > 
> > On Mon, 2016-09-26 at 10:48 +0200, David Engraf wrote:
> > > 
> > > Am 25.09.2016 um 08:20 schrieb Scott Wood:
> > > > 
> > > > 
> > > > On Mon, Aug 22, 2016 at 04:46:43PM +0200, David Engraf wrote:
> > > > > 
> > > > > 
> > > > > The PowerPC e5500/e6500 architecture is based on the e500mc core.
> > > > > Enable
> > > > > CONFIG_E500 and CONFIG_PPC_E500MC when e5500/e6500 is used.
> > > > > 
> > > > > This will also fix using CONFIG_PPC_QEMU_E500 on PPC64.
> > > > > 
> > > > > Signed-off-by: David Engraf 
> > > > > ---
> > > > >  arch/powerpc/platforms/Kconfig.cputype | 6 --
> > > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/arch/powerpc/platforms/Kconfig.cputype
> > > > > b/arch/powerpc/platforms/Kconfig.cputype
> > > > > index f32edec..0382da7 100644
> > > > > --- a/arch/powerpc/platforms/Kconfig.cputype
> > > > > +++ b/arch/powerpc/platforms/Kconfig.cputype
> > > > > @@ -125,11 +125,13 @@ config POWER8_CPU
> > > > > 
> > > > >  config E5500_CPU
> > > > >   bool "Freescale e5500"
> > > > > - depends on E500
> > > > > + select E500
> > > > > + select PPC_E500MC
> > > > > 
> > > > >  config E6500_CPU
> > > > >   bool "Freescale e6500"
> > > > > - depends on E500
> > > > > + select E500
> > > > > + select PPC_E500MC
> > > > These config symbols are for setting -mcpu.  Kernels built with
> > > > CONFIG_GENERIC_CPU should also work on e5500/e6500.
> > > I don't think so.
> > I do think so.  It's what you get when you run "make
> > corenet64_smp_defconfig"
> > and that kernel works on e5500/e6500.
> > 
> > > 
> > >  At least on QEMU it is not working because e5500/e6500
> > > is based on the e500mc core and the option CONFIG_PPC_E500MC also
> > > controls the cpu features (check cputable.h).
> > Again, this is only a problem when you have CONFIG_PPC_QEMU_E500 without
> > CONFIG_CORENET_GENERIC, and the fix for that is to have
> > CONFIG_PPC_QEMU_E500
> > select CONFIG_E500 (and you need to manually turn on CONFIG_PPC_E500MC if
> > applicable, since CONFIG_PPC_QEMU_E500 can also be used with e500v2).
> > 
> > I wouldn't be opposed to also adding "select PPC_E500MC if PPC64" to
> > CONFIG_PPC_QEMU_E500.
> Please find attached the new version, setting E500 and PPC_E500MC on 64 
> bit for review.

Could you send as a standalone patch (not an attachment) with changelog and
signoff so I can apply it?

-Scott



Re: [PATCH 3/3] soc: fsl: make guts driver explicitly non-modular

2016-11-14 Thread Scott Wood
On Sun, 2016-11-13 at 14:03 -0500, Paul Gortmaker wrote:
> The Kconfig currently controlling compilation of this code is:
> 
> drivers/soc/fsl/Kconfig:config FSL_GUTS
> drivers/soc/fsl/Kconfig:bool
> 
> ...meaning that it currently is not being built as a module by anyone.
> 
> Lets remove the modular code that is essentially orphaned, so that
> when reading the driver there is no doubt it is builtin-only.
> 
> We explicitly disallow a driver unbind, since that doesn't have a
> sensible use case anyway, and it allows us to drop the ".remove"
> code for non-modular drivers.
> 
> Since the code was already not using module_init, the init ordering
> remains unchanged with this commit.
> 
> Also note that MODULE_DEVICE_TABLE is a no-op for non-modular code.
> 
> Cc: Scott Wood 
> Cc: Yangbo Lu 
> Cc: Arnd Bergmann 
> Cc: Ulf Hansson 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-arm-ker...@lists.infradead.org
> Signed-off-by: Paul Gortmaker 

Acked-by: Scott Wood 

-Scott



Re: [PATCH v2 2/4] powerpc/perf: update attribute_group data structure

2016-11-14 Thread kbuild test robot
Hi Madhavan,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.9-rc5 next-20161114]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Madhavan-Srinivasan/powerpc-perf-factor-out-the-event-format-field/20161115-041335
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-pasemi_defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

Note: the linux-review/Madhavan-Srinivasan/powerpc-perf-factor-out-the-event-format-field/20161115-041335
HEAD 51c82e12f03b6f83703ad810dfa72a7bf2205983 builds fine.
It only hurts bisectibility.

All errors (new ones prefixed by >>):

   arch/powerpc/perf/power9-pmu.c: In function 'init_power9_pmu':
>> arch/powerpc/perf/power9-pmu.c:294:5: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
 if (rc)
^
   cc1: all warnings being treated as errors

vim +/rc +294 arch/powerpc/perf/power9-pmu.c

8c002dbd Madhavan Srinivasan 2016-06-26  288    return -ENODEV;
8c002dbd Madhavan Srinivasan 2016-06-26  289
4a3d1bb1 Madhavan Srinivasan 2016-11-14  290    if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
4a3d1bb1 Madhavan Srinivasan 2016-11-14  291            rc = register_power_pmu(&power9_isa207_pmu);
4a3d1bb1 Madhavan Srinivasan 2016-11-14  292    }
4a3d1bb1 Madhavan Srinivasan 2016-11-14  293
8c002dbd Madhavan Srinivasan 2016-06-26 @294    if (rc)
8c002dbd Madhavan Srinivasan 2016-06-26  295            return rc;
8c002dbd Madhavan Srinivasan 2016-06-26  296
8c002dbd Madhavan Srinivasan 2016-06-26  297    /* Tell userspace that EBB is supported */

:: The code at line 294 was first introduced by commit
:: 8c002dbd05eecbb2933e9668da9614b33c7a97d2 powerpc/perf: Power9 PMU support

:: TO: Madhavan Srinivasan 
:: CC: Michael Ellerman 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] T4240RBD: add device tree entry for W83793

2016-11-14 Thread Scott Wood
On Mon, 2016-11-14 at 13:28 +0100, Florian Larysch wrote:
> The T4240RDB contains a W83793 hardware monitoring chip. Add a device
> tree entry to make the driver attach to it, as the i2c-mpc bus driver
> dropped support for class-based instantiation of devices a long time
> ago.
> 
> Signed-off-by: Florian Larysch 
> ---
>  arch/powerpc/boot/dts/fsl/t4240rdb.dts | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> index cc0a264..b35eea1 100644
> --- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> +++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
> @@ -142,6 +142,10 @@
>   reg = <0x68>;
>   interrupts = <0x1 0x1 0 0>;
>   };
> + hwmon@2f {
> + compatible = "winbond,w83793";
> + reg = <0x2f>;
> + };
>   };
>  
>   sdhc@114000 {

Please (in a separate patch) add this to
Documentation/devicetree/bindings/i2c/trivial-devices.txt and CC\
devicet...@vger.kernel.org

Also, please keep the i2c nodes sorted by address.

-Scott



[PATCH] powerpc/64: Fix setting of AIL in hypervisor mode

2016-11-14 Thread Benjamin Herrenschmidt
Commit d3cbff1b5 "powerpc: Put exception configuration in a common place"
broke the setting of the AIL bit (which enables taking exceptions with
the MMU still on) on all processors, moving it incorrectly to a function
called only on the boot CPU. This was correct for the guest case but
not when running in hypervisor mode.

This fixes it by partially reverting that commit, putting the setting
back in cpu_ready_for_interrupts(), which is called on every CPU.

Signed-off-by: Benjamin Herrenschmidt 
Fixes: d3cbff1b5 ("powerpc: Put exception configuration in a common place")
CC: sta...@vger.kernel.org # v4.8+
---
 arch/powerpc/kernel/setup_64.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 7ac8e6e..ac75224 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -226,17 +226,25 @@ static void __init configure_exceptions(void)
if (firmware_has_feature(FW_FEATURE_OPAL))
opal_configure_cores();
 
-   /* Enable AIL if supported, and we are in hypervisor mode */
-   if (early_cpu_has_feature(CPU_FTR_HVMODE) &&
-   early_cpu_has_feature(CPU_FTR_ARCH_207S)) {
-   unsigned long lpcr = mfspr(SPRN_LPCR);
-   mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3);
-   }
+   /* AIL on native is done in cpu_ready_for_interrupts */
}
 }
 
 static void cpu_ready_for_interrupts(void)
 {
+   /*
+* Enable AIL if supported, and we are in hypervisor mode. This
+* is called once for every processor.
+*
+* If we are not in hypervisor mode the job is done once for
+* the whole partition in configure_exceptions().
+*/
+   if (early_cpu_has_feature(CPU_FTR_HVMODE) &&
+   early_cpu_has_feature(CPU_FTR_ARCH_207S)) {
+   unsigned long lpcr = mfspr(SPRN_LPCR);
+   mtspr(SPRN_LPCR, lpcr | LPCR_AIL_3);
+   }
+
/* Set IR and DR in PACA MSR */
get_paca()->kernel_msr = MSR_KERNEL;
 }



linux-next: manual merge of the akpm tree with the powerpc tree

2016-11-14 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  arch/powerpc/kernel/module_64.c

between commit:

  9f751b82b491 ("powerpc/module: Add support for R_PPC64_REL32 relocations")

from the powerpc tree and patch:

  "powerpc: factor out relocation code in module_64.c"

from the akpm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/kernel/module_64.c
index bb1807184bad,61baad036639..
--- a/arch/powerpc/kernel/module_64.c
+++ b/arch/powerpc/kernel/module_64.c
@@@ -507,6 -507,181 +507,186 @@@ static int restore_r2(u32 *instruction
return 1;
  }
  
+ static int elf64_apply_relocate_add_item(const Elf64_Shdr *sechdrs,
+const char *strtab,
+const Elf64_Rela *rela,
+const Elf64_Sym *sym,
+unsigned long *location,
+unsigned long value,
+unsigned long my_r2,
+const char *obj_name,
+struct module *me)
+ {
+   switch (ELF64_R_TYPE(rela->r_info)) {
+   case R_PPC64_ADDR32:
+   /* Simply set it */
+   *(u32 *)location = value;
+   break;
+ 
+   case R_PPC64_ADDR64:
+   /* Simply set it */
+   *(unsigned long *)location = value;
+   break;
+ 
+   case R_PPC64_TOC:
+   *(unsigned long *)location = my_r2;
+   break;
+ 
+   case R_PPC64_TOC16:
+   /* Subtract TOC pointer */
+   value -= my_r2;
+   if (value + 0x8000 > 0xffff) {
+   pr_err("%s: bad TOC16 relocation (0x%lx)\n",
+  obj_name, value);
+   return -ENOEXEC;
+   }
+   *((uint16_t *) location)
+   = (*((uint16_t *) location) & ~0xffff)
+   | (value & 0xffff);
+   break;
+ 
+   case R_PPC64_TOC16_LO:
+   /* Subtract TOC pointer */
+   value -= my_r2;
+   *((uint16_t *) location)
+   = (*((uint16_t *) location) & ~0xffff)
+   | (value & 0xffff);
+   break;
+ 
+   case R_PPC64_TOC16_DS:
+   /* Subtract TOC pointer */
+   value -= my_r2;
+   if ((value & 3) != 0 || value + 0x8000 > 0xffff) {
+   pr_err("%s: bad TOC16_DS relocation (0x%lx)\n",
+  obj_name, value);
+   return -ENOEXEC;
+   }
+   *((uint16_t *) location)
+   = (*((uint16_t *) location) & ~0xfffc)
+   | (value & 0xfffc);
+   break;
+ 
+   case R_PPC64_TOC16_LO_DS:
+   /* Subtract TOC pointer */
+   value -= my_r2;
+   if ((value & 3) != 0) {
+   pr_err("%s: bad TOC16_LO_DS relocation (0x%lx)\n",
+  obj_name, value);
+   return -ENOEXEC;
+   }
+   *((uint16_t *) location)
+   = (*((uint16_t *) location) & ~0xfffc)
+   | (value & 0xfffc);
+   break;
+ 
+   case R_PPC64_TOC16_HA:
+   /* Subtract TOC pointer */
+   value -= my_r2;
+   value = ((value + 0x8000) >> 16);
+   *((uint16_t *) location)
+   = (*((uint16_t *) location) & ~0xffff)
+   | (value & 0xffff);
+   break;
+ 
+   case R_PPC_REL24:
+   /* FIXME: Handle weak symbols here --RR */
+   if (sym->st_shndx == SHN_UNDEF) {
+   /* External: go via stub */
+   value = stub_for_addr(sechdrs, value, me);
+   if (!value)
+   return -ENOENT;
+   if (!restore_r2((u32 *)location + 1, me))
+   return -ENOEXEC;
+ 
+   squash_toc_save_inst(strtab + sym->st_name, value);
+   } else
+   value += local_entry_offset(sym);
+ 
+   /* Convert value to relative */
+   value -= (unsigned long)location;
+   if (value + 0x2000000 > 0x3ffffff || (value & 3) != 0) {
+   pr_err("%s: REL24 %li out of range!\n",
+ 

[PATCH 3/3] powerpc/pseries: Disable IBMEBUS on little endian builds

2016-11-14 Thread Michael Ellerman
The IBMEBUS code supports the GX bus found on Power7 and earlier CPUs.
On Power8 it has been replaced, and so we have no need for it.

We don't actually have a config symbol for Power8 vs Power7 etc., but
we only support booting little endian on Power8 or later, so use that as
a reasonable approximation.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index fbf2e4477f88..e1c280a95d58 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -134,7 +134,7 @@ config IBMVIO
default y
 
 config IBMEBUS
-   depends on PPC_PSERIES
+   depends on PPC_PSERIES && !CPU_LITTLE_ENDIAN
bool "Support for GX bus based adapters"
help
  Bus device driver for GX bus based adapters.
-- 
2.7.4



[PATCH 2/3] powerpc/pseries: Move ibmebus.c into platforms pseries

2016-11-14 Thread Michael Ellerman
ibmebus.c is pseries only code, so move it in there.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/Makefile | 1 -
 arch/powerpc/platforms/Kconfig   | 6 --
 arch/powerpc/platforms/pseries/Kconfig   | 6 ++
 arch/powerpc/platforms/pseries/Makefile  | 1 +
 arch/powerpc/{kernel => platforms/pseries}/ibmebus.c | 0
 5 files changed, 7 insertions(+), 7 deletions(-)
 rename arch/powerpc/{kernel => platforms/pseries}/ibmebus.c (100%)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 9c57ebf61e4d..26b5a5e02e69 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -58,7 +58,6 @@ obj-$(CONFIG_PPC_RTAS)+= rtas.o rtas-rtc.o 
$(rtaspci-y-y)
 obj-$(CONFIG_PPC_RTAS_DAEMON)  += rtasd.o
 obj-$(CONFIG_RTAS_FLASH)   += rtas_flash.o
 obj-$(CONFIG_RTAS_PROC)+= rtas-proc.o
-obj-$(CONFIG_IBMEBUS)   += ibmebus.o
 obj-$(CONFIG_EEH)  += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
  eeh_driver.o eeh_event.o eeh_sysfs.o
 obj-$(CONFIG_GENERIC_TBSYNC)   += smp-tbsync.o
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index eae86c35e4c6..7e3a2ebba29b 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -168,12 +168,6 @@ config MPIC_BROKEN_REGREAD
  well, but enabling it uses about 8KB of memory to keep copies
  of the register contents in software.
 
-config IBMEBUS
-   depends on PPC_PSERIES
-   bool "Support for GX bus based adapters"
-   help
- Bus device driver for GX bus based adapters.
-
 config EEH
bool
depends on (PPC_POWERNV || PPC_PSERIES) && PCI
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index f7d78b81951d..fbf2e4477f88 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -132,3 +132,9 @@ config IBMVIO
depends on PPC_PSERIES
bool
default y
+
+config IBMEBUS
+   depends on PPC_PSERIES
+   bool "Support for GX bus based adapters"
+   help
+ Bus device driver for GX bus based adapters.
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 85ba00233fb0..942fe116a8ba 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_DTL) += dtl.o
 obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
 obj-$(CONFIG_LPARCFG)  += lparcfg.o
 obj-$(CONFIG_IBMVIO)   += vio.o
+obj-$(CONFIG_IBMEBUS)  += ibmebus.o
 
 ifeq ($(CONFIG_PPC_PSERIES),y)
 obj-$(CONFIG_SUSPEND)  += suspend.o
diff --git a/arch/powerpc/kernel/ibmebus.c 
b/arch/powerpc/platforms/pseries/ibmebus.c
similarity index 100%
rename from arch/powerpc/kernel/ibmebus.c
rename to arch/powerpc/platforms/pseries/ibmebus.c
-- 
2.7.4



[PATCH 1/3] powerpc/pseries: Move vio.c into platforms pseries

2016-11-14 Thread Michael Ellerman
vio.c is pseries only code, so move it in there.

Signed-off-by: Michael Ellerman 
---
 arch/powerpc/kernel/Makefile | 1 -
 arch/powerpc/platforms/Kconfig   | 5 -
 arch/powerpc/platforms/pseries/Kconfig   | 5 +
 arch/powerpc/platforms/pseries/Makefile  | 1 +
 arch/powerpc/{kernel => platforms/pseries}/vio.c | 0
 5 files changed, 6 insertions(+), 6 deletions(-)
 rename arch/powerpc/{kernel => platforms/pseries}/vio.c (100%)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1925341dbb9c..9c57ebf61e4d 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -58,7 +58,6 @@ obj-$(CONFIG_PPC_RTAS)+= rtas.o rtas-rtc.o 
$(rtaspci-y-y)
 obj-$(CONFIG_PPC_RTAS_DAEMON)  += rtasd.o
 obj-$(CONFIG_RTAS_FLASH)   += rtas_flash.o
 obj-$(CONFIG_RTAS_PROC)+= rtas-proc.o
-obj-$(CONFIG_IBMVIO)   += vio.o
 obj-$(CONFIG_IBMEBUS)   += ibmebus.o
 obj-$(CONFIG_EEH)  += eeh.o eeh_pe.o eeh_dev.o eeh_cache.o \
  eeh_driver.o eeh_event.o eeh_sysfs.o
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index fbdae8377b71..eae86c35e4c6 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -168,11 +168,6 @@ config MPIC_BROKEN_REGREAD
  well, but enabling it uses about 8KB of memory to keep copies
  of the register contents in software.
 
-config IBMVIO
-   depends on PPC_PSERIES
-   bool
-   default y
-
 config IBMEBUS
depends on PPC_PSERIES
bool "Support for GX bus based adapters"
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index bec90fb30425..f7d78b81951d 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -127,3 +127,8 @@ config HV_PERF_CTRS
  systems. 24x7 is available on Power 8 systems.
 
   If unsure, select Y.
+
+config IBMVIO
+   depends on PPC_PSERIES
+   bool
+   default y
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index fedc2ccf029d..85ba00233fb0 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_CMM) += cmm.o
 obj-$(CONFIG_DTL)  += dtl.o
 obj-$(CONFIG_IO_EVENT_IRQ) += io_event_irq.o
 obj-$(CONFIG_LPARCFG)  += lparcfg.o
+obj-$(CONFIG_IBMVIO)   += vio.o
 
 ifeq ($(CONFIG_PPC_PSERIES),y)
 obj-$(CONFIG_SUSPEND)  += suspend.o
diff --git a/arch/powerpc/kernel/vio.c b/arch/powerpc/platforms/pseries/vio.c
similarity index 100%
rename from arch/powerpc/kernel/vio.c
rename to arch/powerpc/platforms/pseries/vio.c
-- 
2.7.4



Re: [PATCH] ext4: ext4_mb_mark_free_simple: Fix integer value truncation

2016-11-14 Thread Theodore Ts'o
On Thu, Nov 03, 2016 at 01:32:33PM -0600, Andreas Dilger wrote:
> On Nov 3, 2016, at 3:14 AM, Chandan Rajendra  
> wrote:
> > 
> > 'border' variable is set to a value of 2 times the block size of the
> > underlying filesystem. With 64k block size, the resulting value won't
> > fit into a 16-bit variable. Hence this commit changes the data type of
> > 'border' to 'unsigned int'.
> > 
> > Signed-off-by: Chandan Rajendra 
> 
> Reviewed-by: Andreas Dilger 

Applied, with a change in the commit summary:

ext4: fix mballoc breakage with 64k block size

Many thanks!!

- Ted


Re: [RFC PATCH v7 0/3] PCI: Introduce a way to enforce all MMIO BARs not to share PAGE_SIZE

2016-11-14 Thread Yongji Xie

Hi Bjorn,


Kindly ping. What do you think of the way to fix the bug where a
resource's size is changed when using resource_alignment? Thanks.


On 2016/10/26 14:53, Yongji Xie wrote:

This series introduces a way for the PCI resource allocator to force
MMIO BARs not to share a page. This makes sense for the VFIO
driver, because the current VFIO implementation disallows mmapping
sub-page (size < PAGE_SIZE) MMIO BARs, which may share the same page
with other BARs, for security reasons. Thus, we have to handle MMIO
access to these BARs in QEMU emulation rather than in the guest, which
causes some performance loss.

In our solution, we try to make use of the existing code path of the
resource_alignment kernel parameter and add a macro to set a default
alignment for it. Thus we can define this macro by default on some
arches which may easily hit the performance issue because of their
64K page size.

In this series, patch 1,2 fixed bugs of using resource_alignment;
patch 3 adds a macro to set the default alignment of all MMIO BARs.
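
The idea in patch 3 looks roughly like this (a sketch only; the macro
name and helper are illustrative assumptions, not the series verbatim):

    /* arch/powerpc/include/asm/pci.h: arch opts in to page-aligned BARs */
    #define PCIBIOS_DEFAULT_ALIGNMENT   PAGE_SIZE

    /* generic PCI code: floor for the requested BAR alignment, which
     * the resource_alignment= kernel parameter can still override */
    static resource_size_t pci_default_alignment(void)
    {
    #ifdef PCIBIOS_DEFAULT_ALIGNMENT
        return PCIBIOS_DEFAULT_ALIGNMENT;
    #else
        return 0;
    #endif
    }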

Changelog v7:
- Rebased against v4.9-rc2
- Drop two merged patches
- Rework the patch which fixes a bug where a resource's size is changed when
   using resource_alignment
- Add a patch that fixes a bug for IOV BARs when using resource_alignment

Changelog v6:
- Remove the option "noresize@" of resource_alignment

Changelog v5:
- Rebased against v4.8-rc6
- Drop the patch that forbade disabling memory decoding in
   pci_reassigndev_resource_alignment()

Changelog v4:
- Rebased against v4.8-rc1
- Drop one irrelevant patch
- Drop the patch that added a wildcard to resource_alignment to enforce
   the alignment of all MMIO BARs to be at least PAGE_SIZE
- Change the format of option "noresize" of resource_alignment
- Code style improvements

Changelog v3:
- Ignore enforced alignment to fixed BARs
- Fix an issue where memory decoding is disabled when reassigning the alignment
- Only enable default alignment on PowerNV platform

Changelog v2:
- Ignore enforced alignment to VF BARs on pci_reassigndev_resource_alignment()

Yongji Xie (3):
   PCI: Ignore requested alignment for IOV BARs
   PCI: Restore resource's size if we expand it by using resource_alignment
   PCI: Add a macro to set default alignment for all PCI devices

  arch/powerpc/include/asm/pci.h |4 
  drivers/pci/pci.c  |6 +-
  drivers/pci/setup-bus.c|   19 +++
  include/linux/pci.h|1 +
  4 files changed, 29 insertions(+), 1 deletion(-)





Re: [powerpc v5 0/3] Enable IAMR storage keys for radix

2016-11-14 Thread Michael Ellerman
Balbir Singh  writes:

> The first patch sets up AMOR in hypervisor mode. AMOR
> needs to be set up before the IAMR (details of AMOR/IAMR in
> each patch). The second patch enables detection of exceptions
> generated due to instruction fetch violations and OOPSes
> the task. The third patch enables the IAMR for
> both hypervisor and guest kernels.
>
> I've tested this patch series with a sample hack and
> payload.
>
> Chris Smart helped with the series, reviewing and
> providing valuable feedback
>
> Changelog
>   Remove __init annotation for iamr and amor init
>
> Balbir Singh (3):
>   Setup AMOR in HV mode
>   Enable storage keys for radix - user mode execution

These should be "powerpc/mm/radix: ...".

>   Detect instruction fetch denied and report

And that should be "powerpc/mm: ..."

cheers


Re: [powerpc v5 2/3] Detect instruction fetch denied and report

2016-11-14 Thread Michael Ellerman
Balbir Singh  writes:

> ISA 3 allows for prevention of instruction fetch and execution
> from user mode pages. If such an error occurs, SRR1 bit 35
> reports the error. We catch and report the error in do_page_fault().
>
> Signed-off-by: Balbir Singh 
> ---
>  arch/powerpc/mm/fault.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index d0b137d..1e7ff7b 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -404,6 +404,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long 
> address,
>   (cpu_has_feature(CPU_FTR_NOEXECUTE) ||
>!(vma->vm_flags & (VM_READ | VM_WRITE
>   goto bad_area;
> +
> + if (regs->msr & SRR1_ISI_N_OR_G)
> + goto bad_area;

Can you move that check above the more complicated check? It shouldn't
change anything in practice, but it makes it easier to follow the code
because the easy cases can be discarded first.
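
i.e. something like this (a sketch of the suggested ordering against
the hunk above):

    if (is_exec) {
        /* cheap, self-contained case first: fetch from a page the
         * MMU marked no-execute or guarded */
        if (regs->msr & SRR1_ISI_N_OR_G)
            goto bad_area;

        /* then the more involved VM_EXEC / CPU_FTR_NOEXECUTE check */
        if (!(vma->vm_flags & VM_EXEC) &&
            (cpu_has_feature(CPU_FTR_NOEXECUTE) ||
             !(vma->vm_flags & (VM_READ | VM_WRITE))))
            goto bad_area;
    }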

cheers


Re: [powerpc v4 1/3] Setup AMOR in HV mode

2016-11-14 Thread Michael Ellerman
Balbir Singh  writes:

> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index ed7bddc..7343573 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -320,6 +320,25 @@ static void update_hid_for_radix(void)
>   cpu_relax();
>  }
>  
> +/*
> + * In HV mode, we init AMOR so that the hypervisor
> + * and guest can set up the IAMR, enable key 0 and set
> + * it to 1.
> + * AMOR = 1100...00 (mask for key 0 is 11)
> + */
> +static void __init radix_init_amor(void)

This can't be __init because you call it from the secondary version,
which can run after init due to CPU hotplug.

> +{
> + unsigned long amor_mask = 0xc000000000000000;
> + unsigned long amor = mfspr(SPRN_AMOR);

You don't use the amor value.

> + if (cpu_has_feature(CPU_FTR_POWER9_DD1))
> + return;

This needs a comment explaining why we're not doing it on DD1.

> +
> + amor = amor_mask;
> +
> + mtspr(SPRN_AMOR, amor);

Just move the constant directly.



cheers


Re: [PATCH v6 3/3] powerpc/pseries: Implement indexed-count hotplug memory remove

2016-11-14 Thread Michael Roth
Quoting Nathan Fontenot (2016-10-18 12:21:06)
> From: Sahil Mehta 
> 
> Indexed-count remove for memory hotplug guarantees that a contiguous block
> of <count> lmbs beginning at a specified <drc index> will be unassigned (NOT
> that <count> lmbs will be removed). Because of QEMU's per-DIMM memory
> management, the removal of a contiguous block of memory currently
> requires a series of individual calls. Indexed-count remove reduces
> this series into a single call.
> 
> Signed-off-by: Sahil Mehta 
> Signed-off-by: Nathan Fontenot 
> ---
> v2: -use u32s drc_index and count instead of u32 ic[]
>  in dlpar_memory
> v3: -add logic to handle invalid drc_index input
> v4: -none
> v5: -Update for() loop to start at start_index
> v6: -none
> 
>  arch/powerpc/platforms/pseries/hotplug-memory.c |   90 
> +++
>  1 file changed, 90 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
> b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index badc66d..19ad081 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -558,6 +558,92 @@ static int dlpar_memory_remove_by_index(u32 drc_index, 
> struct property *prop)
> return rc;
>  }
> 
> +static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index,
> +struct property *prop)
> +{
> +   struct of_drconf_cell *lmbs;
> +   u32 num_lmbs, *p;
> +   int i, rc, start_lmb_found;
> +   int lmbs_available = 0, start_index = 0, end_index;
> +
> +   pr_info("Attempting to hot-remove %u LMB(s) at %x\n",
> +   lmbs_to_remove, drc_index);
> +
> +   if (lmbs_to_remove == 0)
> +   return -EINVAL;
> +
> +   p = prop->value;
> +   num_lmbs = *p++;
> +   lmbs = (struct of_drconf_cell *)p;
> +   start_lmb_found = 0;
> +
> +   /* Navigate to drc_index */
> +   while (start_index < num_lmbs) {
> +   if (lmbs[start_index].drc_index == drc_index) {
> +   start_lmb_found = 1;
> +   break;
> +   }
> +
> +   start_index++;
> +   }
> +
> +   if (!start_lmb_found)
> +   return -EINVAL;
> +
> +   end_index = start_index + lmbs_to_remove;
> +
> +   /* Validate that there are enough LMBs to satisfy the request */
> +   for (i = start_index; i < end_index; i++) {
> +   if (lmbs[i].flags & DRCONF_MEM_RESERVED)
> +   break;
> +
> +   lmbs_available++;
> +   }
> +
> +   if (lmbs_available < lmbs_to_remove)
> +   return -EINVAL;
> +
> +   for (i = start_index; i < end_index; i++) {
> +   if (!(lmbs[i].flags & DRCONF_MEM_ASSIGNED))
> +   continue;
> +
> +   rc = dlpar_remove_lmb(&lmbs[i]);

dlpar_remove_lmb() currently does both offlining of the memory as well
as releasing the LMB back to the platform, but the specification for
hotplug notifications has the following verbage regarding
indexed-count/count identifiers:

'When using “drc count” or “drc count indexed” as the Hotplug Identifier,
the OS should take steps to verify the entirety of the request can be
satisfied before proceeding with the hotplug / unplug operations. If
only a partial count can be satisfied, the OS should ignore the entirety
of the request. If the OS cannot determine this beforehand, it should
satisfy the hotplug / unplug request for as many of the requested
resources as possible, and attempt to revert to the original OS / DRC
state.'

So doing the dlpar_remove->dlpar_add in case of failure is in line with
the spec, but it should only be done as a last resort. To me that
suggests that we should be attempting to offline all the LMBs
beforehand, and only after that's successful should we begin attempting
to release LMBs back to the platform. Should we consider introducing
that logic in the patchset? Or maybe as a follow-up?
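
A sketch of that two-phase flow (the helpers are hypothetical; it
assumes dlpar_remove_lmb() were split into separate offline and
release steps):

    /* phase 1: verify the entire request by offlining everything */
    for (i = start_index; i < end_index; i++) {
        rc = dlpar_offline_lmb(&lmbs[i]);   /* offline only */
        if (rc)
            goto re_online; /* revert what was offlined, bail out */
    }

    /* phase 2: only now release the LMBs back to the platform */
    for (i = start_index; i < end_index; i++)
        dlpar_release_lmb(&lmbs[i]);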

> +   if (rc)
> +   break;
> +
> +   lmbs[i].reserved = 1;
> +   }
> +
> +   if (rc) {
> +   pr_err("Memory indexed-count-remove failed, adding any 
> removed LMBs\n");
> +
> +   for (i = start_index; i < end_index; i++) {
> +   if (!lmbs[i].reserved)
> +   continue;
> +
> +   rc = dlpar_add_lmb(&lmbs[i]);
> +   if (rc)
> +   pr_err("Failed to add LMB, drc index %x\n",
> +  be32_to_cpu(lmbs[i].drc_index));
> +
> +   lmbs[i].reserved = 0;
> +   }
> +   rc = -EINVAL;
> +   } else {
> +   for (i = start_index; i < end_index; i++) {
> +   if (!lmbs[i].reserved)
> +   continue;
> +
> +   pr_info("Memory at %llx (drc i

Re: [PATCH 4/7] powerpc/64: tool to check head sections location sanity

2016-11-14 Thread Michael Ellerman
Nicholas Piggin  writes:

> diff --git a/arch/powerpc/Makefile.postlink b/arch/powerpc/Makefile.postlink
> index 1725e64..b8fe12b 100644
> --- a/arch/powerpc/Makefile.postlink
> +++ b/arch/powerpc/Makefile.postlink
> @@ -24,6 +27,9 @@ endif
>  
>  vmlinux: FORCE
>   @true
> +ifeq ($(CONFIG_PPC64),y)

You can just use:

ifdef CONFIG_PPC64

> + $(call cmd,head_check)
> +endif
>  ifeq ($(CONFIG_RELOCATABLE),y)
>   $(call if_changed,relocs_check)
>  endif
> @@ -32,6 +38,7 @@ endif
>   @true
>  
>  clean:
> + rm -f .tmp_symbols.txt
>   @true

We shouldn't need the true anymore should we?

> diff --git a/arch/powerpc/tools/head_check.sh 
> b/arch/powerpc/tools/head_check.sh
> new file mode 100755
> index 000..9635fe7
> --- /dev/null
> +++ b/arch/powerpc/tools/head_check.sh
> @@ -0,0 +1,69 @@
> +#!/bin/sh

We run this explicitly via $(CONFIG_SHELL), so having a shebang here is
redundant and also a little confusing. I added "-x" here, to turn on
tracing, but it doesn't take effect, so I think it's better to just drop
the line. If anyone wants to run it manually they can just pass it to sh.

> +# Copyright © 2016 IBM Corporation
> +
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License
> +# as published by the Free Software Foundation; either version
> +# 2 of the License, or (at your option) any later version.
> +
> +# This script checks the head of a vmlinux for linker stubs that
> +# break our placement of fixed-location code for 64-bit.
> +
> +# based on relocs_check.pl
> +# Copyright © 2009 IBM Corporation
> +
> +# READ THIS
> +#
> +# If the build dies here, it's likely code in head_64.S or nearby is
> +# referencing labels it can't reach, which results in the linker inserting
> +# stubs without the assembler's knowledge. This can move code around in ways
> +# that break the fixed location placement stuff (head-64.h). To debug,
> +# disassemble the vmlinux and look for branch stubs (long_branch, plt_branch
> +# etc) in the fixed section region (0 - 0x8000ish). Check what places are
> +# calling those stubs.
> +#
> +# Linker stubs use the TOC pointer, so even if fixed section code could
> +# tolerate them being inserted into head code, they can't be allowed in low
> +# level entry code (boot, interrupt vectors, etc) until r2 is set up. This
> +# could cause the kernel to die in early boot.

Can you add:

# Turn this on if you want more debug output:
# set -x

> +
> +if [ $# -lt 2 ]; then
> + echo "$0 [path to nm] [path to vmlinux]" 1>&2
> + exit 1
> +fi
> +
> +# Have Kbuild supply the path to nm so we handle cross compilation.
> +nm="$1"
> +vmlinux="$2"
> +
> +nm "$vmlinux" | grep -e " T _stext$" -e " t start_first_256B$" -e " a 
> text_start$" -e " t start_text$" -m4 > .tmp_symbols.txt

You don't use $nm there.

> +
> +
> +vma=$(cat .tmp_symbols.txt | grep " T _stext$" | cut -d' ' -f1)
> +
> +expected_start_head_addr=$vma
> +
> +start_head_addr=$(cat .tmp_symbols.txt | grep " t start_first_256B$" | cut 
> -d' ' -f1)
> +
> +if [ "$start_head_addr" != "$expected_start_head_addr" ]; then
> + echo "ERROR: head code starts at $start_head_addr, should be 0"
> + echo "ERROR: see comments in arch/powerpc/tools/head_check.sh"

This is blowing up for me with ppc64e_defconfig.

It says:

ERROR: start_text address is c0001100, should be c200


cheers


[PATCH v1 3/3] powerpc: fix node_possible_map limitations

2016-11-14 Thread Balbir Singh
We've fixed the memory hotplug issue with memcg, hence
this workaround should not be required.

Fixes: commit 3af229f2071f
("powerpc/numa: Reset node_possible_map to only node_online_map")

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/numa.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a51c188..ca8c2ab 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -916,13 +916,6 @@ void __init initmem_init(void)
 
memblock_dump_all();
 
-   /*
-* Reduce the possible NUMA nodes to the online NUMA nodes,
-* since we do not support node hotplug. This ensures that  we
-* lower the maximum NUMA node ID to what is actually present.
-*/
-   nodes_and(node_possible_map, node_possible_map, node_online_map);
-
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 
-- 
2.5.5



[PATCH v1 2/3] Move from all possible nodes to online nodes

2016-11-14 Thread Balbir Singh
Move routines that do operations on all nodes to
operate on just the online nodes. Most of the changes are
very obvious (like the ones related to the per-node
soft limit tree).

Signed-off-by: Balbir Singh 
---
 mm/memcontrol.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5585fce..cc49fa2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -497,7 +497,7 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup 
*memcg)
struct mem_cgroup_per_node *mz;
int nid;
 
-   for_each_node(nid) {
+   for_each_online_node(nid) {
mz = mem_cgroup_nodeinfo(memcg, nid);
mctz = soft_limit_tree_node(nid);
mem_cgroup_remove_exceeded(mz, mctz);
@@ -895,7 +895,7 @@ static void invalidate_reclaim_iterators(struct mem_cgroup 
*dead_memcg)
int i;
 
while ((memcg = parent_mem_cgroup(memcg))) {
-   for_each_node(nid) {
+   for_each_online_node(nid) {
mz = mem_cgroup_nodeinfo(memcg, nid);
for (i = 0; i <= DEF_PRIORITY; i++) {
iter = &mz->iter[i];
@@ -4146,7 +4146,7 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
int node;
 
memcg_wb_domain_exit(memcg);
-   for_each_node(node)
+   for_each_online_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->stat);
kfree(memcg);
@@ -4175,7 +4175,7 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
if (!memcg->stat)
goto fail;
 
-   for_each_node(node)
+   for_each_online_node(node)
if (alloc_mem_cgroup_per_node_info(memcg, node))
goto fail;
 
@@ -5774,11 +5774,21 @@ __setup("cgroup.memory=", cgroup_memory);
 static void memcg_node_offline(int node)
 {
struct mem_cgroup *memcg;
+   struct mem_cgroup_tree_per_node *rtpn;
+   struct mem_cgroup_tree_per_node *mctz;
+   struct mem_cgroup_per_node *mz;
 
if (node < 0)
return;
 
+   rtpn = soft_limit_tree.rb_tree_per_node[node];
+   kfree(rtpn);
+
for_each_mem_cgroup(memcg) {
+   mz = mem_cgroup_nodeinfo(memcg, node);
+   mctz = soft_limit_tree_node(node);
+   mem_cgroup_remove_exceeded(mz, mctz);
+
free_mem_cgroup_per_node_info(memcg, node);
mem_cgroup_may_update_nodemask(memcg);
}
@@ -5787,10 +5797,18 @@ static void memcg_node_offline(int node)
 static void memcg_node_online(int node)
 {
struct mem_cgroup *memcg;
+   struct mem_cgroup_tree_per_node *rtpn;
 
if (node < 0)
return;
 
+   rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL,
+   node_online(node) ? node : NUMA_NO_NODE);
+
+   rtpn->rb_root = RB_ROOT;
+   spin_lock_init(&rtpn->lock);
+   soft_limit_tree.rb_tree_per_node[node] = rtpn;
+
for_each_mem_cgroup(memcg) {
alloc_mem_cgroup_per_node_info(memcg, node);
mem_cgroup_may_update_nodemask(memcg);
@@ -5854,7 +5872,7 @@ static int __init mem_cgroup_init(void)
INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
  drain_local_stock);
 
-   for_each_node(node) {
+   for_each_online_node(node) {
struct mem_cgroup_tree_per_node *rtpn;
 
rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL,
-- 
2.5.5



[PATCH v1 1/3] Add basic infrastructure for memcg hotplug support

2016-11-14 Thread Balbir Singh
The lack of hotplug support makes us allocate all memory
upfront for per-node data structures. With a large number
of cgroups this can be an overhead. PPC64 actually limits
n_possible nodes to n_online to avoid some of this overhead.

This patch adds the basic notifiers to listen to hotplug
events and does the allocation and freeing of those structures
per cgroup. We walk every cgroup per event; it's a trade-off
of allocating upfront vs allocating on demand and freeing
on offline.

Signed-off-by: Balbir Singh 
---
 mm/memcontrol.c | 68 ++---
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 91dfc7c..5585fce 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include 
 #include 
@@ -1342,6 +1343,10 @@ int mem_cgroup_select_victim_node(struct mem_cgroup 
*memcg)
 {
return 0;
 }
+
+static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
+{
+}
 #endif
 
 static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
@@ -4115,14 +4120,7 @@ static int alloc_mem_cgroup_per_node_info(struct 
mem_cgroup *memcg, int node)
 {
struct mem_cgroup_per_node *pn;
int tmp = node;
-   /*
-* This routine is called against possible nodes.
-* But it's BUG to call kmalloc() against offline node.
-*
-* TODO: this routine can waste much memory for nodes which will
-*   never be onlined. It's better to use memory hotplug callback
-*   function.
-*/
+
if (!node_state(node, N_NORMAL_MEMORY))
tmp = -1;
pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
@@ -5773,6 +5771,59 @@ static int __init cgroup_memory(char *s)
 }
 __setup("cgroup.memory=", cgroup_memory);
 
+static void memcg_node_offline(int node)
+{
+   struct mem_cgroup *memcg;
+
+   if (node < 0)
+   return;
+
+   for_each_mem_cgroup(memcg) {
+   free_mem_cgroup_per_node_info(memcg, node);
+   mem_cgroup_may_update_nodemask(memcg);
+   }
+}
+
+static void memcg_node_online(int node)
+{
+   struct mem_cgroup *memcg;
+
+   if (node < 0)
+   return;
+
+   for_each_mem_cgroup(memcg) {
+   alloc_mem_cgroup_per_node_info(memcg, node);
+   mem_cgroup_may_update_nodemask(memcg);
+   }
+}
+
+static int memcg_memory_hotplug_callback(struct notifier_block *self,
+   unsigned long action, void *arg)
+{
+   struct memory_notify *marg = arg;
+   int node = marg->status_change_nid;
+
+   switch (action) {
+   case MEM_GOING_OFFLINE:
+   case MEM_CANCEL_ONLINE:
+   memcg_node_offline(node);
+   break;
+   case MEM_GOING_ONLINE:
+   case MEM_CANCEL_OFFLINE:
+   memcg_node_online(node);
+   break;
+   case MEM_ONLINE:
+   case MEM_OFFLINE:
+   break;
+   }
+   return NOTIFY_OK;
+}
+
+static struct notifier_block memcg_memory_hotplug_nb __meminitdata = {
+   .notifier_call = memcg_memory_hotplug_callback,
+   .priority = IPC_CALLBACK_PRI,
+};
+
 /*
  * subsys_initcall() for memory controller.
  *
@@ -5797,6 +5848,7 @@ static int __init mem_cgroup_init(void)
 #endif
 
hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
+   register_hotmemory_notifier(&memcg_memory_hotplug_nb);
 
for_each_possible_cpu(cpu)
INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
-- 
2.5.5



[v1 0/3] Support memory cgroup hotplug

2016-11-14 Thread Balbir Singh
In the absence of hotplug we use extra memory proportional to
(possible_nodes - online_nodes) * number_of_cgroups. PPC64 has a patch
to avoid this large consumption with a large number of cgroups. This series
adds hotplug support to memory cgroups and reverts the commit that
limited possible nodes to online nodes.

Cc: Tejun Heo 
Cc: Andrew Morton 

I've tested these patches under a VM with two nodes and movable
nodes enabled. I've offlined nodes and checked that the system
and cgroups with tasks deep in the hierarchy continue to work
fine.

Balbir Singh (3):
  Add basic infrastructure for memcg hotplug support
  Move from all possible nodes to online nodes
  powerpc: fix node_possible_map limitations

 arch/powerpc/mm/numa.c |  7 
 mm/memcontrol.c| 96 +++---
 2 files changed, 83 insertions(+), 20 deletions(-)

-- 
2.5.5



Re: [PATCH v7 4/5] of/fdt: mark hotpluggable memory

2016-11-14 Thread Balbir Singh


On 15/11/16 09:02, Reza Arbab wrote:
> When movable nodes are enabled, any node containing only hotpluggable
> memory is made movable at boot time.
> 
> On x86, hotpluggable memory is discovered by parsing the ACPI SRAT,
> making corresponding calls to memblock_mark_hotplug().
> 
> If we introduce a dt property to describe memory as hotpluggable,
> configs supporting early fdt may then also do this marking and use
> movable nodes.
> 
> Signed-off-by: Reza Arbab 
> Tested-by: Balbir Singh 

Also

Acked-by: Balbir Singh 


Re: [PATCH v7 2/5] mm: remove x86-only restriction of movable_node

2016-11-14 Thread Balbir Singh


On 15/11/16 09:02, Reza Arbab wrote:
> In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
> option"), the memblock allocation direction is changed to bottom-up and
> then back to top-down like this:
> 
> 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
> 2. memblock_set_bottom_up(false), called by x86's numa_init().
> 
> Even though (1) occurs in generic mm code, it is wrapped by #ifdef
> CONFIG_MOVABLE_NODE, which depends on X86_64.
> 
> This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
> things will be unbalanced. (1) will happen for them, but (2) will not.
> 
> This toggle was added in the first place because x86 has a delay between
> adding memblocks and marking them as hotpluggable. Since other arches do
> this marking either immediately or not at all, they do not require the
> bottom-up toggle.
> 
> So, resolve things by moving (1) from cmdline_parse_movable_node() to
> x86's setup_arch(), immediately after the movable_node parameter has
> been parsed.
> 
> Signed-off-by: Reza Arbab 


Acked-by: Balbir Singh 


Re: [PATCH 1/2] usb: dwc2: add amcc,dwc-otg support

2016-11-14 Thread John Youn
On 11/11/2016 3:12 PM, Christian Lamparter wrote:
> On Friday, November 11, 2016 2:20:42 PM CET John Youn wrote:
>> On 11/11/2016 2:05 PM, Christian Lamparter wrote:
>>> On Friday, November 11, 2016 1:22:16 PM CET John Youn wrote:
 On 11/11/2016 12:59 PM, Christian Lamparter wrote:
> This patch adds support for the "amcc,usb-otg" device
> which is found in the PowerPC Canyonlands' dts.
>
> The device definition was added by:
> commit c89b3458d8cc ("powerpc/44x: Add USB DWC DTS entry to Canyonlands 
> board")'
> but without any driver support as the dwc2 driver wasn't
> available at that time.
>
> Note: The system can't use the generic "snps,dwc2" compatible
> because of the special ahbcfg configuration. The default
> GAHBCFG_HBSTLEN_INCR4 of snps,dwc2 can cause a system hang
> when USB and SATA are used concurrently.

 I don't want to add any more of these param structures to the driver
 unless really necessary. We're trying to remove usage of them in favor
 of using auto-detected defaults and device properties to override
 them.
>>> Ok, thanks. I think that would work. I've attached an updated patch.
>>> Can it be applied/queued now? Or do you want me to resent it later?
>>>
 The AHB Burst is actually one of the ones we were going to do next
 because our platform also doesn't work well with INCR4. In fact I'm
 thinking of making the default INCR.
>>> Is it actually possible to change the default still? This would
>>> require re-evaluating all existing archs/platforms that use
>>> "snps,dwc2" for INCR16 compatibility.
>>
>> INCR, not INCR16, but you're right, so we may not change it even
>> though INCR is usually the right choice over INCR4.
> What about making a device-tree property?

Yes, that's what I meant. I'll send a change for this shortly.

> 
> Recommended properties:
>  - g-ahb-bursts : specifies the AHB burst length, should be one of
>    "single", "INCRx", "INCR4", "INCR8", or "INCR16". If not specified,
>    the safer but inefficient "INCR4" is used. The optimal setting is
>    "INCRx".
> 
> Would this work? If so, I can make a patch over the weekend.
>> Anyways, with the binding, can't you just set the compatible string to
>> snps,dwc2?
> 
> Ah, let me explain. I had a discussion with Mark Rutland and Rob Herring
> a while back about device-tree bindings.
> 
> They made it very clear to me, that they don't want any generic "catch all
> compatible" strings:
> 
> "Bindings should be for hardware (either specific device models, or for
> classes), and not for Linux drivers. The latter is subject to arbitrary
> changes while the former is not, as old hardware continues to exist and
> does not change while drivers get completely reworked." [0]
> 
> Furthermore, this is an existing binding in kernel's canyonlands.dts [1]
> and this binding can't be easily changed. Rob Herring explained this in
> the context of the "basic-mmio-gpio" patch [2] when I was editing the dts
> to make them work with the changes I made:
> 
> "You can't remove the old drivers as they are needed to work with 
> old dtbs, so there is no gain.
> 
> You would need to match on existing compatibles such as
> moxa,moxart-gpio and provide a match data struct that has all the info
> you are adding here (e.g. data register offset). Then additionally you
> could add "basic-mmio-gpio" (I would drop "basic" part) and the
> additional data associated with it. But it has to be new properties,
> not changing properties. Changing the reg values doesn't work."
> 
> So, for this to work with the existing canyonlands.dts, I need to have
> the "amcc,dwc-otg" compatible string.

Ok, if that's the case. But I'm still a bit confused as to what driver
was working with it before, since the binding was not defined for dwc2.

> 
> Of course, it would be great to hear from Rob Herring and/or Mark Rutland
> about this case.
> 
> Regards,
> Christian
> 
> [0] 
> [1] 
> 
> [2] 
> 
>  
>>>
>>> From what I can tell based would be:
>>> bcm11351, bcm21664, bcm23550, exynos3250, stm32f429, rk3xxx,
>>> stratix10, meson-gxbb, rt3050 and some Altera FPGAs.
>>>
 If that's all you need then a devicetree binding should be enough
 right?
>>> Yes. The device is working fine so far.
>>>
>>> Regards,
>>> Christian
>>>
>>> ---
>>> From 70dd4be016b89655a56bc8260f04683b50f07644 Mon Sep 17 00:00:00 2001
>>> From: Christian Lamparter 
>>> Date: Sun, 6 Nov 2016 00:39:24 +0100
>>> Subject: [PATCH] usb: dwc2: add amcc,dwc-otg support
>>>
>>> This patch adds support for the "amcc,dwc-otg" device
>>> which is found in the PowerPC Canyonlands' dts.
>>>
>>> The device definition was added by:
>>> commit c89b3458d8cc ("powerpc/44x: Add USB DWC DTS entry to Canyonlands board")
>>> but without any

Re: [PATCH 2/2] usb: dwc2: fixes host_dma logic

2016-11-14 Thread John Youn
On 11/11/2016 12:59 PM, Christian Lamparter wrote:
> This patch moves the host_dma initialization
> before dwc2_set_param_dma_desc_enable and
> dwc2_set_param_dma_desc_fs_enable, since both
> functions need it.
> 
> Fixes: 1205489cee75bf39 ("usb: dwc2: Get host DMA device properties")

This should probably be omitted since it's only in Felipe's
testing/next.

Otherwise looks good.

Acked-by: John Youn 

Regards,
John


> 
> Cc: John Youn 
> Cc: Felipe Balbi 
> Signed-off-by: Christian Lamparter 
> ---
>  drivers/usb/dwc2/params.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c
> index 5d822c5..222a83c 100644
> --- a/drivers/usb/dwc2/params.c
> +++ b/drivers/usb/dwc2/params.c
> @@ -1157,9 +1157,6 @@ static void dwc2_set_parameters(struct dwc2_hsotg 
> *hsotg,
>   bool dma_capable = !(hw->arch == GHWCFG2_SLAVE_ONLY_ARCH);
>  
>   dwc2_set_param_otg_cap(hsotg, params->otg_cap);
> - dwc2_set_param_dma_desc_enable(hsotg, params->dma_desc_enable);
> - dwc2_set_param_dma_desc_fs_enable(hsotg, params->dma_desc_fs_enable);
> -
>   if ((hsotg->dr_mode == USB_DR_MODE_HOST) ||
>   (hsotg->dr_mode == USB_DR_MODE_OTG)) {
>   bool disable;
> @@ -1174,6 +1171,8 @@ static void dwc2_set_parameters(struct dwc2_hsotg 
> *hsotg,
>   !disable, false,
>   dma_capable);
>   }
> + dwc2_set_param_dma_desc_enable(hsotg, params->dma_desc_enable);
> + dwc2_set_param_dma_desc_fs_enable(hsotg, params->dma_desc_fs_enable);
>  
>   dwc2_set_param_host_support_fs_ls_low_power(hsotg,
>   params->host_support_fs_ls_low_power);
> 


[PATCH v7 4/5] of/fdt: mark hotpluggable memory

2016-11-14 Thread Reza Arbab
When movable nodes are enabled, any node containing only hotpluggable
memory is made movable at boot time.

On x86, hotpluggable memory is discovered by parsing the ACPI SRAT,
making corresponding calls to memblock_mark_hotplug().

If we introduce a dt property to describe memory as hotpluggable,
configs supporting early fdt may then also do this marking and use
movable nodes.

Signed-off-by: Reza Arbab 
Tested-by: Balbir Singh 
---
 drivers/of/fdt.c   | 19 +++
 include/linux/of_fdt.h |  1 +
 mm/Kconfig |  2 +-
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c89d5d2..c9b5cac 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1015,6 +1015,7 @@ int __init early_init_dt_scan_memory(unsigned long node, 
const char *uname,
const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
const __be32 *reg, *endp;
int l;
+   bool hotpluggable;
 
/* We are scanning "memory" nodes only */
if (type == NULL) {
@@ -1034,6 +1035,7 @@ int __init early_init_dt_scan_memory(unsigned long node, 
const char *uname,
return 0;
 
endp = reg + (l / sizeof(__be32));
+   hotpluggable = of_get_flat_dt_prop(node, "hotpluggable", NULL);
 
pr_debug("memory scan node %s, reg size %d,\n", uname, l);
 
@@ -1049,6 +1051,13 @@ int __init early_init_dt_scan_memory(unsigned long node, 
const char *uname,
(unsigned long long)size);
 
early_init_dt_add_memory_arch(base, size);
+
+   if (!hotpluggable)
+   continue;
+
+   if (early_init_dt_mark_hotplug_memory_arch(base, size))
+   pr_warn("failed to mark hotplug range 0x%llx - 0x%llx\n",
+   base, base + size);
}
 
return 0;
@@ -1146,6 +1155,11 @@ void __init __weak early_init_dt_add_memory_arch(u64 
base, u64 size)
memblock_add(base, size);
 }
 
+int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
+{
+   return memblock_mark_hotplug(base, size);
+}
+
 int __init __weak early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
 {
@@ -1168,6 +1182,11 @@ void __init __weak early_init_dt_add_memory_arch(u64 
base, u64 size)
WARN_ON(1);
 }
 
+int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
+{
+   return -ENOSYS;
+}
+
 int __init __weak early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
 {
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 4341f32..271b3fd 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -71,6 +71,7 @@ extern int early_init_dt_scan_memory(unsigned long node, 
const char *uname,
 extern void early_init_fdt_scan_reserved_mem(void);
 extern void early_init_fdt_reserve_self(void);
 extern void early_init_dt_add_memory_arch(u64 base, u64 size);
+extern int early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size);
 extern int early_init_dt_reserve_memory_arch(phys_addr_t base, phys_addr_t 
size,
 bool no_map);
 extern void * early_init_dt_alloc_memory_arch(u64 size, u64 align);
diff --git a/mm/Kconfig b/mm/Kconfig
index 061b46b..33a9b06 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -153,7 +153,7 @@ config MOVABLE_NODE
bool "Enable to assign a node which has only movable memory"
depends on HAVE_MEMBLOCK
depends on NO_BOOTMEM
-   depends on X86_64 || MEMORY_HOTPLUG
+   depends on X86_64 || OF_EARLY_FLATTREE || MEMORY_HOTPLUG
depends on NUMA
default n
help
-- 
1.8.3.1



[PATCH v7 1/5] powerpc/mm: allow memory hotplug into a memoryless node

2016-11-14 Thread Reza Arbab
Remove the check which prevents us from hotplugging into an empty node.

The original commit b226e4621245 ("[PATCH] powerpc: don't add memory to
empty node/zone") states that this was intended to be a temporary measure.
It is a workaround for an oops which no longer occurs.

Signed-off-by: Reza Arbab 
Reviewed-by: Aneesh Kumar K.V 
Acked-by: Balbir Singh 
Acked-by: Michael Ellerman 
Cc: Nathan Fontenot 
Cc: Bharata B Rao 
---
 arch/powerpc/mm/numa.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a51c188..0cb6bd8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1085,7 +1085,7 @@ static int hot_add_node_scn_to_nid(unsigned long scn_addr)
 int hot_add_scn_to_nid(unsigned long scn_addr)
 {
struct device_node *memory = NULL;
-   int nid, found = 0;
+   int nid;
 
if (!numa_enabled || (min_common_depth < 0))
return first_online_node;
@@ -1101,17 +1101,6 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
if (nid < 0 || !node_online(nid))
nid = first_online_node;
 
-   if (NODE_DATA(nid)->node_spanned_pages)
-   return nid;
-
-   for_each_online_node(nid) {
-   if (NODE_DATA(nid)->node_spanned_pages) {
-   found = 1;
-   break;
-   }
-   }
-
-   BUG_ON(!found);
return nid;
 }
 
-- 
1.8.3.1



[PATCH v7 2/5] mm: remove x86-only restriction of movable_node

2016-11-14 Thread Reza Arbab
In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
option"), the memblock allocation direction is changed to bottom-up and
then back to top-down like this:

1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
2. memblock_set_bottom_up(false), called by x86's numa_init().

Even though (1) occurs in generic mm code, it is wrapped by #ifdef
CONFIG_MOVABLE_NODE, which depends on X86_64.

This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
things will be unbalanced. (1) will happen for them, but (2) will not.

This toggle was added in the first place because x86 has a delay between
adding memblocks and marking them as hotpluggable. Since other arches do
this marking either immediately or not at all, they do not require the
bottom-up toggle.

So, resolve things by moving (1) from cmdline_parse_movable_node() to
x86's setup_arch(), immediately after the movable_node parameter has
been parsed.
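
In other words, the intended x86 ordering becomes (a minimal sketch,
assuming the standard memblock API; not part of the diff below):

	parse_early_param();                  /* parses "movable_node" */
	if (movable_node_is_enabled())
		memblock_set_bottom_up(true); /* keep early allocations near the kernel */
	/*
	 * Later, x86's numa_init() parses the SRAT, marks hotpluggable
	 * ranges, and calls memblock_set_bottom_up(false) to restore
	 * the top-down default.
	 */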

Signed-off-by: Reza Arbab 
---
 Documentation/kernel-parameters.txt |  2 +-
 arch/x86/kernel/setup.c | 24 
 mm/memory_hotplug.c | 20 
 3 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 37babf9..adcccd5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2401,7 +2401,7 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
that the amount of memory usable for all allocations
is not too small.
 
-   movable_node[KNL,X86] Boot-time switch to enable the effects
+   movable_node[KNL] Boot-time switch to enable the effects
of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
 
MTD_Partition=  [MTD]
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9c337b0..4cfba94 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -985,6 +985,30 @@ void __init setup_arch(char **cmdline_p)
 
parse_early_param();
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+   /*
+* Memory used by the kernel cannot be hot-removed because Linux
+* cannot migrate the kernel pages. When memory hotplug is
+* enabled, we should prevent memblock from allocating memory
+* for the kernel.
+*
+* ACPI SRAT records all hotpluggable memory ranges. But before
+* SRAT is parsed, we don't know about it.
+*
+* The kernel image is loaded into memory at very early time. We
+* cannot prevent this anyway. So on NUMA system, we set any
+* node the kernel resides in as un-hotpluggable.
+*
+* Since on modern servers, one node could have double-digit
+* gigabytes memory, we can assume the memory around the kernel
+* image is also un-hotpluggable. So before SRAT is parsed, just
+* allocate memory near the kernel image to try the best to keep
+* the kernel away from hotpluggable memory.
+*/
+   if (movable_node_is_enabled())
+   memblock_set_bottom_up(true);
+#endif
+
x86_report_nx();
 
/* after early param, so could get panic from serial */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cad4b91..e43142c1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1727,26 +1727,6 @@ static bool can_offline_normal(struct zone *zone, 
unsigned long nr_pages)
 static int __init cmdline_parse_movable_node(char *p)
 {
 #ifdef CONFIG_MOVABLE_NODE
-   /*
-* Memory used by the kernel cannot be hot-removed because Linux
-* cannot migrate the kernel pages. When memory hotplug is
-* enabled, we should prevent memblock from allocating memory
-* for the kernel.
-*
-* ACPI SRAT records all hotpluggable memory ranges. But before
-* SRAT is parsed, we don't know about it.
-*
-* The kernel image is loaded into memory at very early time. We
-* cannot prevent this anyway. So on NUMA system, we set any
-* node the kernel resides in as un-hotpluggable.
-*
-* Since on modern servers, one node could have double-digit
-* gigabytes memory, we can assume the memory around the kernel
-* image is also un-hotpluggable. So before SRAT is parsed, just
-* allocate memory near the kernel image to try the best to keep
-* the kernel away from hotpluggable memory.
-*/
-   memblock_set_bottom_up(true);
movable_node_enabled = true;
 #else
pr_warn("movable_node option not supported\n");
-- 
1.8.3.1



[PATCH v7 3/5] mm: enable CONFIG_MOVABLE_NODE on non-x86 arches

2016-11-14 Thread Reza Arbab
To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of
the following must be true:

1. This config has the capability to identify movable nodes at boot.
   Right now, only x86 can do this.

2. Our config supports memory hotplug, which means that a movable node
   can be created by hotplugging all of its memory into ZONE_MOVABLE.

Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently
recognizes (1), but not (2).

Signed-off-by: Reza Arbab 
Reviewed-by: Aneesh Kumar K.V 
Acked-by: Balbir Singh 
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 86e3e0e..061b46b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -153,7 +153,7 @@ config MOVABLE_NODE
bool "Enable to assign a node which has only movable memory"
depends on HAVE_MEMBLOCK
depends on NO_BOOTMEM
-   depends on X86_64
+   depends on X86_64 || MEMORY_HOTPLUG
depends on NUMA
default n
help
-- 
1.8.3.1



[PATCH v7 0/5] enable movable nodes on non-x86 configs

2016-11-14 Thread Reza Arbab
This patchset allows more configs to make use of movable nodes. When
CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such
nodes into the system:

1. Discover movable nodes at boot. Currently this is only possible on
   x86, but we will enable configs supporting fdt to do the same.

2. Hotplug and online all of a node's memory using online_movable. This
   is already possible on any config supporting memory hotplug, not
   just x86, but the Kconfig doesn't say so. We will fix that.

We'll also remove some cruft on power which would prevent (2).

/* changelog */

v7:
* Fix error when !CONFIG_HAVE_MEMBLOCK, found by the kbuild test robot.

* Remove the prefix of "linux,hotpluggable". Document the property's purpose.

v6:
* 
http://lkml.kernel.org/r/1478562276-25539-1-git-send-email-ar...@linux.vnet.ibm.com

* Add a patch enabling the fdt to describe hotpluggable memory.

v5:
* 
http://lkml.kernel.org/r/1477339089-5455-1-git-send-email-ar...@linux.vnet.ibm.com

* Drop the patches which recognize the "status" property of dt memory
  nodes. Firmware can set the size of "linux,usable-memory" to zero instead.

v4:
* 
http://lkml.kernel.org/r/1475778995-1420-1-git-send-email-ar...@linux.vnet.ibm.com

* Rename of_fdt_is_available() to of_fdt_device_is_available().
  Rename of_flat_dt_is_available() to of_flat_dt_device_is_available().

* Instead of restoring top-down allocation, ensure it never goes
  bottom-up in the first place, by making movable_node arch-specific.

* Use MEMORY_HOTPLUG instead of PPC64 in the mm/Kconfig patch.

v3:
* 
http://lkml.kernel.org/r/1474828616-16608-1-git-send-email-ar...@linux.vnet.ibm.com

* Use Rob Herring's suggestions to improve the node availability check.

* More verbose commit log in the patch enabling CONFIG_MOVABLE_NODE.

* Add a patch to restore top-down allocation the way x86 does.

v2:
* 
http://lkml.kernel.org/r/1473883618-14998-1-git-send-email-ar...@linux.vnet.ibm.com

* Use the "status" property of standard dt memory nodes instead of
  introducing a new "ibm,hotplug-aperture" compatible id.

* Remove the patch which explicitly creates a memoryless node. This set
  no longer has any bearing on whether the pgdat is created at boot or
  at the time of memory addition.

v1:
* 
http://lkml.kernel.org/r/1470680843-28702-1-git-send-email-ar...@linux.vnet.ibm.com

Reza Arbab (5):
  powerpc/mm: allow memory hotplug into a memoryless node
  mm: remove x86-only restriction of movable_node
  mm: enable CONFIG_MOVABLE_NODE on non-x86 arches
  of/fdt: mark hotpluggable memory
  dt: add documentation of "hotpluggable" memory property

 Documentation/devicetree/booting-without-of.txt |  7 +++
 Documentation/kernel-parameters.txt |  2 +-
 arch/powerpc/mm/numa.c  | 13 +
 arch/x86/kernel/setup.c | 24 
 drivers/of/fdt.c| 19 +++
 include/linux/of_fdt.h  |  1 +
 mm/Kconfig  |  2 +-
 mm/memory_hotplug.c | 20 
 8 files changed, 54 insertions(+), 34 deletions(-)

-- 
1.8.3.1



[PATCH v7 5/5] dt: add documentation of "hotpluggable" memory property

2016-11-14 Thread Reza Arbab
Summarize the "hotpluggable" property of dt memory nodes.
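
For illustration, a memory node carrying the property might look like
this (hypothetical address and size; only the "hotpluggable" line is
new):

	memory@100000000 {
		device_type = "memory";
		reg = <0x1 0x00000000 0x0 0x80000000>;
		hotpluggable;
	};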

Signed-off-by: Reza Arbab 
---
 Documentation/devicetree/booting-without-of.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/booting-without-of.txt 
b/Documentation/devicetree/booting-without-of.txt
index 3f1437f..280d283 100644
--- a/Documentation/devicetree/booting-without-of.txt
+++ b/Documentation/devicetree/booting-without-of.txt
@@ -974,6 +974,13 @@ compatibility.
   4Gb. Some vendors prefer splitting those ranges into smaller
   segments, but the kernel doesn't care.
 
+  Additional properties:
+
+- hotpluggable : The presence of this property provides an explicit
+  hint to the operating system that this memory may potentially be
+  removed later. The kernel can take this into consideration when
+  doing nonmovable allocations and when laying out memory zones.
+
   e) The /chosen node
 
   This node is a bit "special". Normally, that's where Open Firmware
-- 
1.8.3.1



Re: [PATCH -next] powernv: cpufreq: Fix uninitialized lpstate_idx in gpstates_timer_handler

2016-11-14 Thread Rafael J. Wysocki
On Monday, November 14, 2016 05:29:27 PM Akshay Adiga wrote:
> lpstate_idx remains uninitialized in the case when elapsed_time
> is greater than MAX_RAMP_DOWN_TIME. At the end of ramp-down, the
> global pstate should be equal to the local pstate.
> 
> Fixes: 20b15b766354 ("cpufreq: powernv: Use PMCR to verify global and
> local pstate")
> Reported-by: Stephen Rothwell 
> Signed-off-by: Akshay Adiga 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index c82304b..c5c5bc3 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -624,6 +624,7 @@ void gpstate_timer_handler(unsigned long data)
>  
>   if (gpstates->elapsed_time > MAX_RAMP_DOWN_TIME) {
>   gpstate_idx = pstate_to_idx(freq_data.pstate_id);
> + lpstate_idx = gpstate_idx;
>   reset_gpstates(policy);
>   gpstates->highest_lpstate_idx = gpstate_idx;
>   } else {
> 

Applied.

Thanks,
Rafael



[PATCH v2] Fix loading of module radeonfb on PowerMac

2016-11-14 Thread Mathieu Malaterre
When the Linux kernel is built with the following (the typical kernel
shipped with the Debian installer):

CONFIG_FB_OF=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_FB_RADEON=m

The offb driver takes precedence over the radeonfb module. It is then
impossible to load the module; the reported error is:

[   96.551486] radeonfb :00:10.0: enabling device (0006 -> 0007)
[   96.551526] radeonfb :00:10.0: BAR 0: can't reserve [mem 
0x9800-0x9fff pref]
[   96.551531] radeonfb (:00:10.0): cannot request region 0.
[   96.551545] radeonfb: probe of :00:10.0 failed with error -16

This patch reproduces the behavior of the radeon module, so as to make
it possible to load radeonfb when offb is loaded first.

It should be noted that `offb_destroy` is never called, which explains
the need to skip error detection on the radeonfb side.

Signed-off-by: Mathieu Malaterre 
Link: https://bugs.debian.org/826629#57
Link: https://bugzilla.kernel.org/show_bug.cgi?id=119741
Suggested-by: Lennart Sorensen 
---
 drivers/video/fbdev/aty/radeon_base.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/video/fbdev/aty/radeon_base.c 
b/drivers/video/fbdev/aty/radeon_base.c
index 218339a..84d634b 100644
--- a/drivers/video/fbdev/aty/radeon_base.c
+++ b/drivers/video/fbdev/aty/radeon_base.c
@@ -2259,6 +2259,22 @@ static struct bin_attribute edid2_attr = {
.read   = radeon_show_edid2,
 };
 
+static int radeon_kick_out_firmware_fb(struct pci_dev *pdev)
+{
+   struct apertures_struct *ap;
+
+   ap = alloc_apertures(1);
+   if (!ap)
+   return -ENOMEM;
+
+   ap->ranges[0].base = pci_resource_start(pdev, 0);
+   ap->ranges[0].size = pci_resource_len(pdev, 0);
+
+   remove_conflicting_framebuffers(ap, KBUILD_MODNAME, false);
+   kfree(ap);
+
+   return 0;
+}
 
 static int radeonfb_pci_register(struct pci_dev *pdev,
 const struct pci_device_id *ent)
@@ -2314,19 +2330,27 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
rinfo->fb_base_phys = pci_resource_start (pdev, 0);
rinfo->mmio_base_phys = pci_resource_start (pdev, 2);
 
+   ret = radeon_kick_out_firmware_fb(pdev);
+   if (ret)
+   return ret;
+
/* request the mem regions */
ret = pci_request_region(pdev, 0, "radeonfb framebuffer");
if (ret < 0) {
printk( KERN_ERR "radeonfb (%s): cannot request region 0.\n",
pci_name(rinfo->pdev));
+#ifndef CONFIG_PPC
goto err_release_fb;
+#endif
}
 
ret = pci_request_region(pdev, 2, "radeonfb mmio");
if (ret < 0) {
printk( KERN_ERR "radeonfb (%s): cannot request region 2.\n",
pci_name(rinfo->pdev));
+#ifndef CONFIG_PPC
goto err_release_pci0;
+#endif
}
 
/* map the regions */
@@ -2511,10 +2535,12 @@ static int radeonfb_pci_register(struct pci_dev *pdev,
iounmap(rinfo->mmio_base);
 err_release_pci2:
pci_release_region(pdev, 2);
+#ifndef CONFIG_PPC
 err_release_pci0:
pci_release_region(pdev, 0);
 err_release_fb:
 framebuffer_release(info);
+#endif
 err_disable:
 err_out:
return ret;
-- 
2.1.4



Re: [PATCH v6 4/4] of/fdt: mark hotpluggable memory

2016-11-14 Thread Reza Arbab

On Mon, Nov 14, 2016 at 10:59:43PM +1100, Michael Ellerman wrote:

So I'm not opposed to this, but it is a little vague.

What does the "hotpluggable" property really mean?

Is it just a hint to the operating system? (which may or may not be
Linux).

Or is it a direction, "this memory must be able to be hotunplugged"?

I think you're intending the former, ie. a hint, which is probably OK.
But it needs to be documented clearly.


Yes, you've got it right. It's just a hint, not a mandate.

I'm about to send v7 which adds a description of "hotpluggable" in the 
documentation. Hopefully I've explained it well enough there.


--
Reza Arbab



Re: [PATCH] ps3_gelic: fix spelling mistake in debug message

2016-11-14 Thread David Miller
From: Colin King 
Date: Sat, 12 Nov 2016 17:20:30 +

> From: Colin Ian King 
> 
> Trivial fix to spelling mistake "unmached" to "unmatched" in
> debug message.
> 
> Signed-off-by: Colin Ian King 

Applied.


[PATCH v2 3/4] powerpc/perf: PowerISA v3.0 raw event format encoding

2016-11-14 Thread Madhavan Srinivasan
Update the PowerISA v3.0 raw event encoding format
information and add support for it in power9-pmu.c.

Signed-off-by: Madhavan Srinivasan 
---
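
Note: the format attributes added below are exported through sysfs
(under /sys/bus/event_source/devices/cpu/format/) and let perf compose
raw events field by field. A usage sketch, with hypothetical field
values chosen only to illustrate the interface:

	# cat /sys/bus/event_source/devices/cpu/format/pmc
	config:16-19
	# perf stat -e cpu/pmcxsel=0xf0,pmc=1/ -a sleep 1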
 arch/powerpc/perf/power9-pmu.c | 134 +
 1 file changed, 134 insertions(+)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index d1782fd644e9..928d0e739ed4 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -16,6 +16,78 @@
 #include "isa207-common.h"
 
 /*
+ * Raw event encoding for PowerISA v3.0:
+ *
+ *        60        56        52        48        44        40        36        32
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   | | [ ]                       [ ] [      thresh_cmp     ]   [  thresh_ctl   ]
+ *   | |  |                         |                                     |
+ *   | |  *- IFM (Linux)            |       thresh start/stop OR FAB match -*
+ *   | *- BHRB (Linux)              *sm
+ *   *- EBB (Linux)
+ *
+ *        28        24        20        16        12         8         4         0
+ * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
+ *   [   ] [  sample ]   [cache]   [ pmc ]   [unit ]   []    m   [    pmcxsel    ]
+ *     |        |           |                            |    |
+ *     |        |           |                            |    *- mark
+ *     |        |           *- L1/L2/L3 cache_sel        |
+ *     |        |                                        |
+ *     |        *- sampling mode for marked events       *- combine
+ *     |
+ *     *- thresh_sel
+ *
+ * Below uses IBM bit numbering.
+ *
+ * MMCR1[x:y] = unit    (PMCxUNIT)
+ * MMCR1[24]   = pmc1combine[0]
+ * MMCR1[25]   = pmc1combine[1]
+ * MMCR1[26]   = pmc2combine[0]
+ * MMCR1[27]   = pmc2combine[1]
+ * MMCR1[28]   = pmc3combine[0]
+ * MMCR1[29]   = pmc3combine[1]
+ * MMCR1[30]   = pmc4combine[0]
+ * MMCR1[31]   = pmc4combine[1]
+ *
+ * if pmc == 3 and unit == 0 and pmcxsel[0:6] == 0b0101011
+ * # PM_MRK_FAB_RSP_MATCH
+ * MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else if pmc == 4 and unit == 0xf and pmcxsel[0:6] == 0b0101001
+ * # PM_MRK_FAB_RSP_MATCH_CYC
+ * MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
+ * else
+ * MMCRA[48:55] = thresh_ctl   (THRESH START/END)
+ *
+ * if thresh_sel:
+ * MMCRA[45:47] = thresh_sel
+ *
+ * if thresh_cmp:
+ * MMCRA[9:11] = thresh_cmp[0:2]
+ * MMCRA[12:18] = thresh_cmp[3:9]
+ *
+ * if unit == 6 or unit == 7
+ * MMCRC[53:55] = cache_sel[1:3]  (L2EVENT_SEL)
+ * else if unit == 8 or unit == 9:
+ * if cache_sel[0] == 0: # L3 bank
+ * MMCRC[47:49] = cache_sel[1:3]  (L3EVENT_SEL0)
+ * else if cache_sel[0] == 1:
+ * MMCRC[50:51] = cache_sel[2:3]  (L3EVENT_SEL1)
+ * else if cache_sel[1]: # L1 event
+ * MMCR1[16] = cache_sel[2]
+ * MMCR1[17] = cache_sel[3]
+ *
+ * if mark:
+ * MMCRA[63]= 1(SAMPLE_ENABLE)
+ * MMCRA[57:59] = sample[0:2]  (RAND_SAMP_ELIG)
+ * MMCRA[61:62] = sample[3:4]  (RAND_SAMP_MODE)
+ *
+ * if EBB and BHRB:
+ * MMCRA[32:33] = IFM
+ *
+ * MMCRA[SDAR_MODE]  = sm
+ */
+
+/*
  * Some power9 event codes.
  */
 #define EVENT(_name, _code)_name = _code,
@@ -99,6 +171,48 @@ static const struct attribute_group 
*power9_isa207_pmu_attr_groups[] = {
NULL,
 };
 
+PMU_FORMAT_ATTR(event, "config:0-51");
+PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
+PMU_FORMAT_ATTR(mark,  "config:8");
+PMU_FORMAT_ATTR(combine,   "config:10-11");
+PMU_FORMAT_ATTR(unit,  "config:12-15");
+PMU_FORMAT_ATTR(pmc,   "config:16-19");
+PMU_FORMAT_ATTR(cache_sel, "config:20-23");
+PMU_FORMAT_ATTR(sample_mode,   "config:24-28");
+PMU_FORMAT_ATTR(thresh_sel,"config:29-31");
+PMU_FORMAT_ATTR(thresh_stop,   "config:32-35");
+PMU_FORMAT_ATTR(thresh_start,  "config:36-39");
+PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
+PMU_FORMAT_ATTR(sdar_mode, "config:50-51");
+
+static struct attribute *power9_pmu_format_attr[] = {
+   &format_attr_event.attr,
+   &format_attr_pmcxsel.attr,
+   &format_attr_mark.attr,
+   &format_attr_combine.attr,
+   &format_attr_unit.attr,
+   &format_attr_pmc.attr,
+   &format_attr_cache_sel.attr,
+   &format_attr_sample_mode.attr,
+   &format_attr_thresh_sel.attr,
+   &format_attr_thresh_stop.attr,
+   &format_attr_thresh_start.attr,
+   &format_attr_thresh_cmp.attr,
+   &format_attr_sdar_mode.attr,
+   NULL,
+};
+
+static struct attribute_group power9_pmu_format_group = {
+   .name = "format",
+   .attrs = power9_pmu_format_attr,
+};
+
+static const struct attribute_group *power9_pmu_attr_groups[] = {
+   &power9_pmu_format_group,
+   &power9_pmu_events_group,
+   NULL,
+};
+
 static int power9_generic_events[] = {
[PERF_COUNT_HW_CPU_CYCLES] = 

[PATCH v2 1/4] powerpc/perf: factor out the event format field

2016-11-14 Thread Madhavan Srinivasan
Factor out the format field structure for PowerISA v2.07.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/isa207-common.c | 34 ++
 arch/powerpc/perf/power8-pmu.c| 39 ---
 arch/powerpc/perf/power9-pmu.c| 39 ---
 3 files changed, 42 insertions(+), 70 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index 6143c99f3ec5..2a2040ea5f99 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -12,6 +12,40 @@
  */
 #include "isa207-common.h"
 
+PMU_FORMAT_ATTR(event, "config:0-49");
+PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
+PMU_FORMAT_ATTR(mark,  "config:8");
+PMU_FORMAT_ATTR(combine,   "config:11");
+PMU_FORMAT_ATTR(unit,  "config:12-15");
+PMU_FORMAT_ATTR(pmc,   "config:16-19");
+PMU_FORMAT_ATTR(cache_sel, "config:20-23");
+PMU_FORMAT_ATTR(sample_mode,   "config:24-28");
+PMU_FORMAT_ATTR(thresh_sel,"config:29-31");
+PMU_FORMAT_ATTR(thresh_stop,   "config:32-35");
+PMU_FORMAT_ATTR(thresh_start,  "config:36-39");
+PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
+
+struct attribute *isa207_pmu_format_attr[] = {
+   &format_attr_event.attr,
+   &format_attr_pmcxsel.attr,
+   &format_attr_mark.attr,
+   &format_attr_combine.attr,
+   &format_attr_unit.attr,
+   &format_attr_pmc.attr,
+   &format_attr_cache_sel.attr,
+   &format_attr_sample_mode.attr,
+   &format_attr_thresh_sel.attr,
+   &format_attr_thresh_stop.attr,
+   &format_attr_thresh_start.attr,
+   &format_attr_thresh_cmp.attr,
+   NULL,
+};
+
+struct attribute_group isa207_pmu_format_group = {
+   .name = "format",
+   .attrs = isa207_pmu_format_attr,
+};
+
 static inline bool event_is_fab_match(u64 event)
 {
/* Only check pmc, unit and pmcxsel, ignore the edge bit (0) */
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ab830d106ec5..d07186382f3a 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -30,6 +30,9 @@ enum {
 #definePOWER8_MMCRA_IFM2   0x8000UL
 #definePOWER8_MMCRA_IFM3   0xC000UL
 
+/* PowerISA v2.07 format attribute structure */
+extern struct attribute_group isa207_pmu_format_group;
+
 /* Table of alternatives, sorted by column 0 */
 static const unsigned int event_alternatives[][MAX_ALT] = {
{ PM_MRK_ST_CMPL,   PM_MRK_ST_CMPL_ALT },
@@ -175,42 +178,8 @@ static struct attribute_group power8_pmu_events_group = {
.attrs = power8_events_attr,
 };
 
-PMU_FORMAT_ATTR(event, "config:0-49");
-PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
-PMU_FORMAT_ATTR(mark,  "config:8");
-PMU_FORMAT_ATTR(combine,   "config:11");
-PMU_FORMAT_ATTR(unit,  "config:12-15");
-PMU_FORMAT_ATTR(pmc,   "config:16-19");
-PMU_FORMAT_ATTR(cache_sel, "config:20-23");
-PMU_FORMAT_ATTR(sample_mode,   "config:24-28");
-PMU_FORMAT_ATTR(thresh_sel,"config:29-31");
-PMU_FORMAT_ATTR(thresh_stop,   "config:32-35");
-PMU_FORMAT_ATTR(thresh_start,  "config:36-39");
-PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
-
-static struct attribute *power8_pmu_format_attr[] = {
-   &format_attr_event.attr,
-   &format_attr_pmcxsel.attr,
-   &format_attr_mark.attr,
-   &format_attr_combine.attr,
-   &format_attr_unit.attr,
-   &format_attr_pmc.attr,
-   &format_attr_cache_sel.attr,
-   &format_attr_sample_mode.attr,
-   &format_attr_thresh_sel.attr,
-   &format_attr_thresh_stop.attr,
-   &format_attr_thresh_start.attr,
-   &format_attr_thresh_cmp.attr,
-   NULL,
-};
-
-static struct attribute_group power8_pmu_format_group = {
-   .name = "format",
-   .attrs = power8_pmu_format_attr,
-};
-
 static const struct attribute_group *power8_pmu_attr_groups[] = {
-   &power8_pmu_format_group,
+   &isa207_pmu_format_group,
&power8_pmu_events_group,
NULL,
 };
diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 8e9a81967ff8..443511b18bc5 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -31,6 +31,9 @@ enum {
 #define POWER9_MMCRA_IFM2  0x8000UL
 #define POWER9_MMCRA_IFM3  0xC000UL
 
+/* PowerISA v2.07 format attribute structure */
+extern struct attribute_group isa207_pmu_format_group;
+
 GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-frontend,PM_ICT_NOSLOT_CYC);
 GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
@@ -90,42 +93,8 @@ static struct attribute_group power9_pmu_events_group = {
.attrs = power9_events_attr,
 };
 
-PMU_FORMAT_ATTR(event, "config:0-49");
-PMU_FORMAT_ATTR(pmcxsel,   "config:0-7");
-PMU_FORMAT_ATTR(

[PATCH v2 4/4] powerpc/perf: macros for PowerISA v3.0 format encoding

2016-11-14 Thread Madhavan Srinivasan
Add macros and constants to support the PowerISA v3.0 raw event
encoding format. A couple of functions are added, since some of the
bit fields, like PMCxCOMB and THRESH_CMP, have a different width and
location within MMCR* in PowerISA v3.0.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/isa207-common.c | 90 ---
 arch/powerpc/perf/isa207-common.h | 27 +++-
 2 files changed, 109 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index 2a2040ea5f99..e747bbf06661 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -55,6 +55,81 @@ static inline bool event_is_fab_match(u64 event)
return (event == 0x30056 || event == 0x4f052);
 }
 
+static bool is_event_valid(u64 event)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+   (cpu_has_feature(CPU_FTR_POWER9_DD1)) &&
+   (event & ~EVENT_VALID_MASK))
+   return false;
+   else if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+   (event & ~ISA300_EVENT_VALID_MASK))
+   return false;
+   else if (event & ~EVENT_VALID_MASK)
+   return false;
+
+   return true;
+}
+
+static u64 mmcra_sdar_mode(u64 event)
+{
+   u64 sm;
+
+   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+  (cpu_has_feature(CPU_FTR_POWER9_DD1))) {
+   goto sm_tlb;
+   } else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+   sm = (event >> ISA300_SDAR_MODE_SHIFT) & ISA300_SDAR_MODE_MASK;
+   if (sm)
+   return sm<> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+   else if (cpu_has_feature(CPU_FTR_ARCH_300))
+   combine = (event >> ISA300_EVENT_COMBINE_SHIFT) & 
ISA300_EVENT_COMBINE_MASK;
+   else
+   combine = (event >> EVENT_COMBINE_SHIFT) & EVENT_COMBINE_MASK;
+
+   return combine;
+}
+
+static unsigned long combine_shift(unsigned long pmc)
+{
+   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+  (cpu_has_feature(CPU_FTR_POWER9_DD1)))
+   goto comb_shift;
+   else if (cpu_has_feature(CPU_FTR_ARCH_300))
+   return ISA300_MMCR1_COMBINE_SHIFT(pmc);
+   else
+   goto comb_shift;
+
+comb_shift:
+   return MMCR1_COMBINE_SHIFT(pmc);
+}
+
 int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
 {
unsigned int unit, pmc, cache, ebb;
@@ -62,7 +137,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, 
unsigned long *valp)
 
mask = value = 0;
 
-   if (event & ~EVENT_VALID_MASK)
+   if (!is_event_valid(event))
return -1;
 
pmc   = (event >> EVENT_PMC_SHIFT)& EVENT_PMC_MASK;
@@ -189,15 +264,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
pmc_inuse |= 1 << pmc;
}
 
-   /* In continuous sampling mode, update SDAR on TLB miss */
-   mmcra = MMCRA_SDAR_MODE_TLB;
-   mmcr1 = mmcr2 = 0;
+   mmcra = mmcr1 = mmcr2 = 0;
 
/* Second pass: assign PMCs, set all MMCR1 fields */
for (i = 0; i < n_ev; ++i) {
pmc = (event[i] >> EVENT_PMC_SHIFT) & EVENT_PMC_MASK;
unit= (event[i] >> EVENT_UNIT_SHIFT) & EVENT_UNIT_MASK;
-   combine = (event[i] >> EVENT_COMBINE_SHIFT) & 
EVENT_COMBINE_MASK;
+   combine = combine_from_event(event[i]);
psel=  event[i] & EVENT_PSEL_MASK;
 
if (!pmc) {
@@ -211,10 +284,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
 
if (pmc <= 4) {
mmcr1 |= unit << MMCR1_UNIT_SHIFT(pmc);
-   mmcr1 |= combine << MMCR1_COMBINE_SHIFT(pmc);
+   mmcr1 |= combine << combine_shift(pmc);
mmcr1 |= psel << MMCR1_PMCSEL_SHIFT(pmc);
}
 
+   /* In continuous sampling mode, update SDAR on TLB miss */
+   mmcra |= mmcra_sdar_mode(event[i]);
+
if (event[i] & EVENT_IS_L1) {
cache = event[i] >> EVENT_CACHE_SEL_SHIFT;
mmcr1 |= (cache & 1) << MMCR1_IC_QUAL_SHIFT;
@@ -245,7 +321,7 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
val = (event[i] >> EVENT_THR_SEL_SHIFT) & 
EVENT_THR_SEL_MASK;
mmcra |= val << MMCRA_THR_SEL_SHIFT;
val = (event[i] >> EVENT_THR_CMP_SHIFT) & 
EVENT_THR_CMP_MASK;
-   mmcra |= val << MMCRA_THR_CMP_SHIFT;
+   mmcra |= thresh_cmp_val(val);
}
 
if (event[i] & EVENT_WANTS_BHRB) {
diff --git a/arch/powerpc/perf/isa207-common.h 
b/arch/powerpc/perf/isa207-common.h
index 4d0a4e5017c2..0a240635cf48 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -134,6 +134,24 @@
 PERF_SAMPLE_BRANCH_K

[PATCH v2 2/4] powerpc/perf: update attribute_group data structure

2016-11-14 Thread Madhavan Srinivasan
Rename the power_pmu and attribute_group variables that
support PowerISA v2.07. Add a cpu feature flag check to pick
the PowerISA v2.07 format structures where appropriate.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/perf/power9-pmu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/perf/power9-pmu.c b/arch/powerpc/perf/power9-pmu.c
index 443511b18bc5..d1782fd644e9 100644
--- a/arch/powerpc/perf/power9-pmu.c
+++ b/arch/powerpc/perf/power9-pmu.c
@@ -93,7 +93,7 @@ static struct attribute_group power9_pmu_events_group = {
.attrs = power9_events_attr,
 };
 
-static const struct attribute_group *power9_pmu_attr_groups[] = {
+static const struct attribute_group *power9_isa207_pmu_attr_groups[] = {
&isa207_pmu_format_group,
&power9_pmu_events_group,
NULL,
@@ -260,7 +260,7 @@ static int 
power9_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
 
 #undef C
 
-static struct power_pmu power9_pmu = {
+static struct power_pmu power9_isa207_pmu = {
.name   = "POWER9",
.n_counter  = MAX_PMU_COUNTERS,
.add_fields = ISA207_ADD_FIELDS,
@@ -274,7 +274,7 @@ static struct power_pmu power9_pmu = {
.n_generic  = ARRAY_SIZE(power9_generic_events),
.generic_events = power9_generic_events,
.cache_events   = &power9_cache_events,
-   .attr_groups= power9_pmu_attr_groups,
+   .attr_groups= power9_isa207_pmu_attr_groups,
.bhrb_nr= 32,
 };
 
@@ -287,7 +287,10 @@ static int __init init_power9_pmu(void)
strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power9"))
return -ENODEV;
 
-   rc = register_power_pmu(&power9_pmu);
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+   rc = register_power_pmu(&power9_isa207_pmu);
+   }
+
if (rc)
return rc;
 
-- 
2.7.4



[PATCH v2 0/4] Support PowerISA v3.0 PMU Raw event format

2016-11-14 Thread Madhavan Srinivasan
Patchset to factor out the PowerISA v2.07 PMU raw event
format encoding and add support for the PowerISA v3.0 PMU
raw event format encoding.

Changelog (since v1):
1) Initialized the "mmcra" variable to avoid compile-time errors
2) Made changes to the commit messages

Madhavan Srinivasan (4):
  powerpc/perf: factor out the event format field
  powerpc/perf: update attribute_group data structure
  powerpc/perf: PowerISA v3.0 raw event format encoding
  powerpc/perf: macros for PowerISA v3.0 format encoding

 arch/powerpc/perf/isa207-common.c | 124 +++---
 arch/powerpc/perf/isa207-common.h |  27 -
 arch/powerpc/perf/power8-pmu.c|  39 ++--
 arch/powerpc/perf/power9-pmu.c| 112 +-
 4 files changed, 256 insertions(+), 46 deletions(-)

-- 
2.7.4



[PATCH v3 01/11] powerpc: Add #defs for paca->soft_enabled flags

2016-11-14 Thread Madhavan Srinivasan
Two #defines, IRQ_DISABLE_MASK_NONE and IRQ_DISABLE_MASK_LINUX,
are added to be used when updating paca->soft_enabled.
Replace the hardcoded values used when updating
paca->soft_enabled with the IRQ_DISABLE_MASK_* #defines.
No logic change.
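
The change is mechanical; for illustration (values taken from the
hunks below):

	/* before: magic numbers */
	local_paca->soft_enabled = 0;      /* irqs soft-disabled */
	arch_local_irq_restore(1);         /* irqs enabled */

	/* after: named flags, no logic change */
	local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;
	arch_local_irq_restore(IRQ_DISABLE_MASK_NONE);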

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h |  2 +-
 arch/powerpc/include/asm/hw_irq.h| 21 ++---
 arch/powerpc/include/asm/irqflags.h  |  6 +++---
 arch/powerpc/include/asm/kvm_ppc.h   |  2 +-
 arch/powerpc/kernel/entry_64.S   | 16 
 arch/powerpc/kernel/exceptions-64e.S |  6 +++---
 arch/powerpc/kernel/head_64.S|  5 +++--
 arch/powerpc/kernel/idle_book3e.S|  3 ++-
 arch/powerpc/kernel/idle_power4.S|  3 ++-
 arch/powerpc/kernel/irq.c|  9 +
 arch/powerpc/kernel/process.c|  3 ++-
 arch/powerpc/kernel/setup_64.c   |  3 +++
 arch/powerpc/kernel/time.c   |  2 +-
 arch/powerpc/mm/hugetlbpage.c|  2 +-
 arch/powerpc/perf/core-book3s.c  |  2 +-
 15 files changed, 50 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 84d49b197c32..53ba672fe9dc 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -403,7 +403,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 
 #define __SOFTEN_TEST(h, vec)  \
lbz r10,PACASOFTIRQEN(r13); \
-   cmpwi   r10,0;  \
+   cmpwi   r10,IRQ_DISABLE_MASK_LINUX; \
li  r10,SOFTEN_VALUE_##vec; \
beq masked_##h##interrupt
 
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index eba60416536e..05b81bca15e9 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -27,6 +27,12 @@
 #define PACA_IRQ_EE_EDGE   0x10 /* BookE only */
 #define PACA_IRQ_HMI   0x20
 
+/*
+ * flags for paca->soft_enabled
+ */
+#define IRQ_DISABLE_MASK_NONE  1
+#define IRQ_DISABLE_MASK_LINUX 0
+
 #endif /* CONFIG_PPC64 */
 
 #ifndef __ASSEMBLY__
@@ -58,9 +64,10 @@ static inline unsigned long arch_local_irq_disable(void)
unsigned long flags, zero;
 
asm volatile(
-   "li %1,0; lbz %0,%2(13); stb %1,%2(13)"
+   "li %1,%3; lbz %0,%2(13); stb %1,%2(13)"
: "=r" (flags), "=&r" (zero)
-   : "i" (offsetof(struct paca_struct, soft_enabled))
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+ "i" (IRQ_DISABLE_MASK_LINUX)
: "memory");
 
return flags;
@@ -70,7 +77,7 @@ extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
 {
-   arch_local_irq_restore(1);
+   arch_local_irq_restore(IRQ_DISABLE_MASK_NONE);
 }
 
 static inline unsigned long arch_local_irq_save(void)
@@ -80,7 +87,7 @@ static inline unsigned long arch_local_irq_save(void)
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
 {
-   return flags == 0;
+   return flags == IRQ_DISABLE_MASK_LINUX;
 }
 
 static inline bool arch_irqs_disabled(void)
@@ -100,9 +107,9 @@ static inline bool arch_irqs_disabled(void)
u8 _was_enabled;\
__hard_irq_disable();   \
_was_enabled = local_paca->soft_enabled;\
-   local_paca->soft_enabled = 0;   \
+   local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;\
local_paca->irq_happened |= PACA_IRQ_HARD_DIS;  \
-   if (_was_enabled)   \
+   if (_was_enabled == IRQ_DISABLE_MASK_NONE)  \
trace_hardirqs_off();   \
 } while(0)
 
@@ -125,7 +132,7 @@ static inline void may_hard_irq_enable(void)
 
 static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
 {
-   return !regs->softe;
+   return (regs->softe == IRQ_DISABLE_MASK_LINUX);
 }
 
 extern bool prep_irq_for_idle(void);
diff --git a/arch/powerpc/include/asm/irqflags.h 
b/arch/powerpc/include/asm/irqflags.h
index f2149066fe5d..d0ed2a7d7d10 100644
--- a/arch/powerpc/include/asm/irqflags.h
+++ b/arch/powerpc/include/asm/irqflags.h
@@ -48,8 +48,8 @@
 #define RECONCILE_IRQ_STATE(__rA, __rB)\
lbz __rA,PACASOFTIRQEN(r13);\
lbz __rB,PACAIRQHAPPENED(r13);  \
-   cmpwi   cr0,__rA,0; \
-   li  __rA,0; \
+   cmpwi   cr0,__rA,IRQ_DISABLE_MASK_LINUX;\
+   li  __rA,IRQ_DISABLE_MASK_LINUX;\
ori __rB,__rB,PACA_IRQ_HARD_DIS;\
stb __rB,PACAIRQHAPPENED(r13);  \
beq 44f;  

[PATCH v3 06/11] powerpc: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_*

2016-11-14 Thread Madhavan Srinivasan
Currently we use both EXCEPTION_PROLOG_1 and __EXCEPTION_PROLOG_1
in the MASKABLE_* macros. As a cleanup, this patch makes MASKABLE_*
use only __EXCEPTION_PROLOG_1. There is no logic change.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 53ba672fe9dc..bcd5a7f7dafe 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -440,7 +440,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
EXC_STD, SOFTEN_TEST_PR)
 
 #define MASKABLE_EXCEPTION_PSERIES_OOL(vec, label) \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec);\
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_PR, vec);  \
EXCEPTION_PROLOG_PSERIES_1(label, EXC_STD)
 
 #define MASKABLE_EXCEPTION_HV(loc, vec, label) \
@@ -448,7 +448,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
EXC_HV, SOFTEN_TEST_HV)
 
 #define MASKABLE_EXCEPTION_HV_OOL(vec, label)  \
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);\
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_TEST_HV, vec);  \
EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)
 
 #define __MASKABLE_RELON_EXCEPTION_PSERIES(vec, label, h, extra)   \
@@ -469,7 +469,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
  EXC_HV, SOFTEN_NOTEST_HV)
 
 #define MASKABLE_RELON_EXCEPTION_HV_OOL(vec, label)\
-   EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);  \
+   __EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_HV, vec);\
EXCEPTION_PROLOG_PSERIES_1(label, EXC_HV)
 
 /*
-- 
2.7.4



[PATCH v3 04/11] powerpc: Add soft_enabled manipulation functions

2016-11-14 Thread Madhavan Srinivasan
Add new soft_enabled_* manipulation functions and implement
the arch_local_* functions using the soft_enabled_* wrappers.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 32 ++--
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 88f6a8e2b5e3..c292ef4b4bc5 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -62,21 +62,7 @@ static inline notrace void soft_enabled_set(unsigned long 
enable)
: "memory");
 }
 
-static inline notrace unsigned long soft_enabled_set_return(unsigned long 
enable)
-{
-   unsigned long flags;
-
-   asm volatile(
-   "lbz %0,%1(13); stb %2,%1(13)"
-   : "=r" (flags)
-   : "i" (offsetof(struct paca_struct, soft_enabled)),\
- "r" (enable)
-   : "memory");
-
-   return flags;
-}
-
-static inline unsigned long arch_local_save_flags(void)
+static inline notrace unsigned long soft_enabled_return(void)
 {
unsigned long flags;
 
@@ -88,20 +74,30 @@ static inline unsigned long arch_local_save_flags(void)
return flags;
 }
 
-static inline unsigned long arch_local_irq_disable(void)
+static inline notrace unsigned long soft_enabled_set_return(unsigned long 
enable)
 {
unsigned long flags, zero;
 
asm volatile(
-   "li %1,%3; lbz %0,%2(13); stb %1,%2(13)"
+   "mr %1,%3; lbz %0,%2(13); stb %1,%2(13)"
: "=r" (flags), "=&r" (zero)
: "i" (offsetof(struct paca_struct, soft_enabled)),\
- "i" (IRQ_DISABLE_MASK_LINUX)
+ "r" (enable)
: "memory");
 
return flags;
 }
 
+static inline unsigned long arch_local_save_flags(void)
+{
+   return soft_enabled_return();
+}
+
+static inline unsigned long arch_local_irq_disable(void)
+{
+   return soft_enabled_set_return(IRQ_DISABLE_MASK_LINUX);
+}
+
 extern void arch_local_irq_restore(unsigned long);
 
 static inline void arch_local_irq_enable(void)
-- 
2.7.4



[PATCH v3 11/11] powerpc: rewrite local_t using soft_irq

2016-11-14 Thread Madhavan Srinivasan
Local atomic operations are fast and highly reentrant per-CPU counters,
used for percpu variable updates. Local atomic operations only guarantee
variable modification atomicity wrt the CPU which owns the data, and
they need to be executed in a preemption-safe way.

Here is the design of this patch. Since local_* operations
only need to be atomic with respect to interrupts (IIUC), we have two
options: either replay the "op" if interrupted, or replay the interrupt
after the "op". The initial patchset posted was based on implementing
local_* operations using CR5, which replays the "op". That patchset had
issues when rewinding an address pointer into an array, which made the
slow path really slow. Since the CR5-based implementation proposed using
__ex_table to find the rewind address, this raised concerns about the
size of __ex_table and vmlinux.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

But this patch uses powerpc_local_irq_pmu_save() to soft-disable
interrupts (including PMIs). After finishing the "op",
powerpc_local_irq_pmu_restore() is called, and any interrupts that
occurred in between are replayed.

This patch rewrites the current local_* functions to use these
soft-disable helpers.
Base flow for each function is

{
powerpc_local_irq_pmu_save(flags)
load
..
store
powerpc_local_irq_pmu_restore(flags)
}

The reason for this approach is that currently the l[w/d]arx/st[w/d]cx.
instruction pair is used for local_* operations, which is heavy
on cycle count, and it doesn't support a local variant. So, to
see whether the new implementation helps, a modified
version of Rusty's benchmark code was run on local_t.

https://lkml.org/lkml/2008/12/16/450

Modifications to Rusty's benchmark code:
- Executed only local_t test

Here are the values with the patch.

Time in ns per iteration

Local_t         Without Patch   With Patch

_inc                 28              8
_add                 28              8
_read                 3              3
_add_return          28              7

Currently only asm/local.h has been rewritten, and the
change has been tested only on PPC64 (pseries guest)
and a PPC64 (LE) host.
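
For reference, a minimal usage sketch of the local_t API these helpers
back, following Documentation/local_ops.txt (the caller is hypothetical
and not part of this patch):

	#include <linux/percpu.h>
	#include <asm/local.h>

	static DEFINE_PER_CPU(local_t, nr_events);

	static void count_event(void)
	{
		/* get_cpu_var() disables preemption around the update */
		local_inc(&get_cpu_var(nr_events));
		put_cpu_var(nr_events);
	}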

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/local.h | 201 +++
 1 file changed, 201 insertions(+)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index b8da91363864..7d117c07b0b1 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -3,6 +3,9 @@
 
 #include 
 #include 
+#include 
+
+#include 
 
 typedef struct
 {
@@ -14,6 +17,202 @@ typedef struct
 #define local_read(l)  atomic_long_read(&(l)->a)
 #define local_set(l,i) atomic_long_set(&(l)->a, (i))
 
+#ifdef CONFIG_PPC64
+
+static __inline__ void local_add(long i, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   powerpc_local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   add %0,%1,%0\n"
+   PPC_STL" %0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (i), "r" (&(l->a.counter)));
+   powerpc_local_irq_pmu_restore(flags);
+}
+
+static __inline__ void local_sub(long i, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   powerpc_local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   subf%0,%1,%0\n"
+   PPC_STL" %0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (i), "r" (&(l->a.counter)));
+   powerpc_local_irq_pmu_restore(flags);
+}
+
+static __inline__ long local_add_return(long a, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   powerpc_local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   add %0,%1,%0\n"
+   PPC_STL "%0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (a), "r" (&(l->a.counter))
+   : "memory");
+   powerpc_local_irq_pmu_restore(flags);
+
+   return t;
+}
+
+#define local_add_negative(a, l)   (local_add_return((a), (l)) < 0)
+
+static __inline__ long local_sub_return(long a, local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   powerpc_local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%2)\n\
+   subf%0,%1,%0\n"
+   PPC_STL "%0,0(%2)\n"
+   : "=&r" (t)
+   : "r" (a), "r" (&(l->a.counter))
+   : "memory");
+   powerpc_local_irq_pmu_restore(flags);
+
+   return t;
+}
+
+static __inline__ long local_inc_return(local_t *l)
+{
+   long t;
+   unsigned long flags;
+
+   powerpc_local_irq_pmu_save(flags);
+   __asm__ __volatile__(
+   PPC_LL" %0,0(%1)\n\
+   addic   %0,%0,1\n"
+   PPC_STL "%0,0(%1)\n"
+   : "=&r" (t)
+   : "r" (&(l->a.counter))
+   : "xer", "memory");
+   powerpc_local_irq_pmu_restore(flags);
+
+   return t;
+}
+
+/*
+ * local_inc_and_test - increment and test
+ * @l: pointer of typ

[PATCH v3 09/11] powerpc:Add new kconfig IRQ_DEBUG_SUPPORT

2016-11-14 Thread Madhavan Srinivasan
A new Kconfig option, "CONFIG_IRQ_DEBUG_SUPPORT", is added so that
WARN_ON()s can alert on invalid transitions. Also, the code under
CONFIG_TRACE_IRQFLAGS in arch_local_irq_restore() is moved under the
new Kconfig option.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/Kconfig  | 4 
 arch/powerpc/kernel/irq.c | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 65fba4c34cd7..d19eee83b864 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -46,6 +46,10 @@ config TRACE_IRQFLAGS_SUPPORT
bool
default y
 
+config IRQ_DEBUG_SUPPORT
+   bool
+   default n
+
 config LOCKDEP_SUPPORT
bool
default y
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index c030ccd0a7aa..299b55071612 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -263,7 +263,7 @@ notrace void arch_local_irq_restore(unsigned long en)
 */
if (unlikely(irq_happened != PACA_IRQ_HARD_DIS))
__hard_irq_disable();
-#ifdef CONFIG_TRACE_IRQFLAGS
+#ifdef CONFIG_IRQ_DEBUG_SUPPORT
else {
/*
 * We should already be hard disabled here. We had bugs
@@ -274,7 +274,7 @@ notrace void arch_local_irq_restore(unsigned long en)
if (WARN_ON(mfmsr() & MSR_EE))
__hard_irq_disable();
}
-#endif /* CONFIG_TRACE_IRQFLAGS */
+#endif /* CONFIG_IRQ_DEBUG_SUPPORT */
 
soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 
-- 
2.7.4



[PATCH v3 10/11] powerpc: Add new set of soft_enabled_ functions

2016-11-14 Thread Madhavan Srinivasan
To support disabling and enabling of irqs along with PMIs, a set of
new powerpc_local_irq_pmu_save() and powerpc_local_irq_pmu_restore()
functions is added. powerpc_local_irq_pmu_save() is implemented
by adding a new soft_enabled manipulation function,
soft_enabled_or_return(). raw_local_irq_pmu_* macros are provided to
back these powerpc_local_irq_pmu_* functions, which include
trace_hardirqs_on|off() to match what we have in include/linux/irqflags.h.
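
A usage sketch (hypothetical caller, not part of this patch): protect
per-CPU state that a PMI handler may also touch:

	unsigned long flags;

	powerpc_local_irq_pmu_save(flags);    /* soft-disables irqs and PMIs */
	/* ... read-modify-write the shared per-CPU state ... */
	powerpc_local_irq_pmu_restore(flags); /* replays anything masked */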

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 62 ++-
 arch/powerpc/kernel/irq.c |  4 +++
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index ac9a17ea1d66..7be90b73a943 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -90,6 +90,20 @@ static inline notrace unsigned long 
soft_enabled_set_return(unsigned long enable
return flags;
 }
 
+static inline notrace unsigned long soft_enabled_or_return(unsigned long 
enable)
+{
+   unsigned long flags, zero;
+
+   asm volatile(
+   "mr %1,%3; lbz %0,%2(13); or %1,%0,%1; stb %1,%2(13)"
+   : "=r" (flags), "=&r"(zero)
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+"r" (enable)
+   : "memory");
+
+   return flags;
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
return soft_enabled_return();
@@ -114,7 +128,7 @@ static inline unsigned long arch_local_irq_save(void)
 
 static inline bool arch_irqs_disabled_flags(unsigned long flags)
 {
-   return flags == IRQ_DISABLE_MASK_LINUX;
+   return flags & IRQ_DISABLE_MASK_LINUX;
 }
 
 static inline bool arch_irqs_disabled(void)
@@ -122,6 +136,52 @@ static inline bool arch_irqs_disabled(void)
return arch_irqs_disabled_flags(arch_local_save_flags());
 }
 
+/*
+ * To support disabling and enabling of irq with PMI, set of
+ * new powerpc_local_irq_pmu_save() and powerpc_local_irq_pmu_restore()
+ * functions are added. These macros are implemented using generic
+ * linux local_irq_* code from include/linux/irqflags.h.
+ */
+#define raw_local_irq_pmu_save(flags)  \
+   do {\
+   typecheck(unsigned long, flags);\
+   flags = soft_enabled_or_return(IRQ_DISABLE_MASK_LINUX | \
+   IRQ_DISABLE_MASK_PMU);  \
+   } while(0)
+
+#define raw_local_irq_pmu_restore(flags)   \
+   do {\
+   typecheck(unsigned long, flags);\
+   arch_local_irq_restore(flags);  \
+   } while(0)
+
+#ifdef CONFIG_TRACE_IRQFLAGS
+#define powerpc_local_irq_pmu_save(flags)  \
+do {   \
+   raw_local_irq_pmu_save(flags);  \
+   trace_hardirqs_off();   \
+   } while(0)
+#define powerpc_local_irq_pmu_restore(flags)   \
+   do {\
+   if (raw_irqs_disabled_flags(flags)) {   \
+   raw_local_irq_pmu_restore(flags);   \
+   trace_hardirqs_off();   \
+   } else {\
+   trace_hardirqs_on();\
+   raw_local_irq_pmu_restore(flags);   \
+   }   \
+   } while(0)
+#else
+#define powerpc_local_irq_pmu_save(flags)  \
+   do {\
+   raw_local_irq_pmu_save(flags);  \
+   } while(0)
+#define powerpc_local_irq_pmu_restore(flags)   \
+   do {\
+   raw_local_irq_pmu_restore(flags);   \
+   } while (0)
+#endif  /* CONFIG_TRACE_IRQFLAGS */
+
 #ifdef CONFIG_PPC_BOOK3E
 #define __hard_irq_enable()asm volatile("wrteei 1" : : : "memory")
 #define __hard_irq_disable()   asm volatile("wrteei 0" : : : "memory")
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 299b55071612..9010f996e238 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -227,6 +227,10 @@ notrace void arch_local_irq_restore(unsigned long en)
unsigned char irq_happened;
unsigned int replay;
 
+#ifdef CONFIG_IRQ_DEBUG_SUPPORT
+   WARN_ON(en & local_paca->soft_enabled & ~IRQ_DISABLE_MASK_LINUX);
+#endif
+
/* Write the new soft-enabled value */
soft_enabled_set(en);
 
-- 
2.7.4



[PATCH v3 08/11] powerpc: Add support to mask perf interrupts and replay them

2016-11-14 Thread Madhavan Srinivasan
A new bit mask field, "IRQ_DISABLE_MASK_PMU", is introduced to support
the masking of PMIs.

A couple of new irq #defines, "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*",
are added for use in the exception code to check for PMI interrupts.

In the masked_interrupt handler, for PMIs we clear MSR[EE]
and return. In __check_irq_replay(), the PMI interrupt is replayed
by calling the performance_monitor_common handler.
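
The resulting mask-and-replay flow, summarized (based on the hunks
below):

	/*
	 * 1. A PMI fires while IRQ_DISABLE_MASK_PMU is set in
	 *    paca->soft_enabled.
	 * 2. masked_interrupt records PACA_IRQ_PMI in paca->irq_happened,
	 *    clears MSR[EE] and returns without running the handler.
	 * 3. Once interrupts are re-enabled, __check_irq_replay() sees
	 *    PACA_IRQ_PMI and returns vector 0xf00.
	 * 4. restore_check_irq_replay then branches to
	 *    performance_monitor_exception to run the deferred handler.
	 */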

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h |  5 +
 arch/powerpc/include/asm/hw_irq.h|  2 ++
 arch/powerpc/kernel/entry_64.S   |  5 +
 arch/powerpc/kernel/exceptions-64s.S |  6 --
 arch/powerpc/kernel/irq.c| 25 -
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index ac0b8a25d9e6..8033f3950970 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -422,6 +422,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define SOFTEN_VALUE_0xe80 PACA_IRQ_DBELL
 #define SOFTEN_VALUE_0xe60 PACA_IRQ_HMI
 #define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
+#define SOFTEN_VALUE_0xf00 PACA_IRQ_PMI
 
 #define __SOFTEN_TEST(h, vec, bitmask) \
lbz r10,PACASOFTIRQEN(r13); \
@@ -486,6 +487,10 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label,   \
  EXC_STD, SOFTEN_NOTEST_PR, bitmask)
 
+#define MASKABLE_RELON_EXCEPTION_PSERIES_OOL(vec, label, bitmask)  \
+   MASKABLE_EXCEPTION_PROLOG_1(PACA_EXGEN, SOFTEN_NOTEST_PR, vec, bitmask);\
+   EXCEPTION_PROLOG_PSERIES_1(label, EXC_STD);
+
 #define MASKABLE_RELON_EXCEPTION_HV(loc, vec, label, bitmask)  \
_MASKABLE_RELON_EXCEPTION_PSERIES(vec, label,   \
  EXC_HV, SOFTEN_NOTEST_HV, bitmask)
diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 8359fbf83376..ac9a17ea1d66 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -26,12 +26,14 @@
 #define PACA_IRQ_DEC   0x08 /* Or FIT */
 #define PACA_IRQ_EE_EDGE   0x10 /* BookE only */
 #define PACA_IRQ_HMI   0x20
+#define PACA_IRQ_PMI   0x40
 
 /*
  * flags for paca->soft_enabled
  */
 #define IRQ_DISABLE_MASK_NONE  0
 #define IRQ_DISABLE_MASK_LINUX 1
+#define IRQ_DISABLE_MASK_PMU   2
 
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index f3afa0b9332d..d021f7de79bd 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -931,6 +931,11 @@ restore_check_irq_replay:
addir3,r1,STACK_FRAME_OVERHEAD;
bl  do_IRQ
b   ret_from_except
+1: cmpwi   cr0,r3,0xf00
+   bne 1f
+   addir3,r1,STACK_FRAME_OVERHEAD;
+   bl  performance_monitor_exception
+   b   ret_from_except
 1: cmpwi   cr0,r3,0xe60
bne 1f
addir3,r1,STACK_FRAME_OVERHEAD;
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index adddaa8258b9..3718f1d13707 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1036,8 +1036,8 @@ EXC_REAL_NONE(0xec0, 0xf00)
 EXC_VIRT_NONE(0x4ec0, 0x4f00)
 
 
-EXC_REAL_OOL(performance_monitor, 0xf00, 0xf20)
-EXC_VIRT_OOL(performance_monitor, 0x4f00, 0x4f20, 0xf00)
+EXC_REAL_OOL_MASKABLE(performance_monitor, 0xf00, 0xf20, IRQ_DISABLE_MASK_PMU)
+EXC_VIRT_OOL_MASKABLE(performance_monitor, 0x4f00, 0x4f20, 0xf00, IRQ_DISABLE_MASK_PMU)
 TRAMP_KVM(PACA_EXGEN, 0xf00)
 EXC_COMMON_ASYNC(performance_monitor_common, 0xf00, performance_monitor_exception)
 
@@ -1575,6 +1575,8 @@ _GLOBAL(__replay_interrupt)
beq decrementer_common
cmpwi   r3,0x500
beq hardware_interrupt_common
+   cmpwi   r3,0xf00
+   beq performance_monitor_common
 BEGIN_FTR_SECTION
cmpwi   r3,0xe80
beq h_doorbell_common
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5a995183dafb..c030ccd0a7aa 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -169,6 +169,27 @@ notrace unsigned int __check_irq_replay(void)
if ((happened & PACA_IRQ_DEC) || decrementer_check_overflow())
return 0x900;
 
+   /*
+* In masked_handler() for PMI, we disable MSR[EE] and return.
+* Replay it here.
+*
+* After this point, PMIs could still be disabled in certain
+* scenarios like this one.
+*
+* local_irq_disable();
+* powerpc_irq_pmu_save();
+* powerpc_irq_pmu_restore();
+* local_irq_restore();
+*
+* Even though powerpc_irq_pmu

[PATCH v3 07/11] powerpc: Add support to take additional parameter in MASKABLE_* macro

2016-11-14 Thread Madhavan Srinivasan
To support the addition of "bitmask" to the MASKABLE_* macros,
factor out the EXCEPTION_PROLOG_1 macro.

Currently soft_enabled is used as the flag to determine
the interrupt state. This patch extends soft_enabled
to be used as a mask instead of a flag.

Make explicit the interrupt masking supported
by a given interrupt handler. The patch correspondingly
extends the MASKABLE_* macros with an additional parameter.
The "bitmask" parameter is passed to the SOFTEN_TEST macro to decide
on masking the interrupt.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/exception-64s.h | 94 
 arch/powerpc/include/asm/head-64.h   | 40 +++---
 arch/powerpc/include/asm/irqflags.h  |  4 +-
 arch/powerpc/kernel/entry_64.S   |  4 +-
 arch/powerpc/kernel/exceptions-64e.S |  6 +-
 arch/powerpc/kernel/exceptions-64s.S | 32 ++-
 6 files changed, 104 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index bcd5a7f7dafe..ac0b8a25d9e6 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -166,18 +166,40 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
std r10,area+EX_R10(r13);   /* save r10 - r12 */\
OPT_GET_SPR(r10, SPRN_CFAR, CPU_FTR_CFAR)
 
-#define __EXCEPTION_PROLOG_1(area, extra, vec) \
+#define __EXCEPTION_PROLOG_1_PRE(area) \
OPT_SAVE_REG_TO_PACA(area+EX_PPR, r9, CPU_FTR_HAS_PPR); \
OPT_SAVE_REG_TO_PACA(area+EX_CFAR, r10, CPU_FTR_CFAR);  \
SAVE_CTR(r10, area);\
-   mfcrr9; \
-   extra(vec); \
+   mfcrr9;
+
+#define __EXCEPTION_PROLOG_1_POST(area)
\
std r11,area+EX_R11(r13);   \
std r12,area+EX_R12(r13);   \
GET_SCRATCH0(r10);  \
std r10,area+EX_R13(r13)
+
+/*
+ * This version of the EXCEPTION_PROLOG_1 carries an
+ * additional parameter called "bitmask" to support
+ * checking of the interrupt maskable level in SOFTEN_TEST.
+ * Intended to be used in the MASKABLE_EXCEPTION_* macros.
+ */
+#define MASKABLE_EXCEPTION_PROLOG_1(area, extra, vec, bitmask) 
\
+   __EXCEPTION_PROLOG_1_PRE(area); \
+   extra(vec, bitmask);\
+   __EXCEPTION_PROLOG_1_POST(area);
+
+/*
+ * This version of the EXCEPTION_PROLOG_1 is intended
+ * to be used in STD_EXCEPTION* macros
+ */
+#define _EXCEPTION_PROLOG_1(area, extra, vec)  \
+   __EXCEPTION_PROLOG_1_PRE(area); \
+   extra(vec); \
+   __EXCEPTION_PROLOG_1_POST(area);
+
 #define EXCEPTION_PROLOG_1(area, extra, vec)   \
-   __EXCEPTION_PROLOG_1(area, extra, vec)
+   _EXCEPTION_PROLOG_1(area, extra, vec)
 
 #define __EXCEPTION_PROLOG_PSERIES_1(label, h) \
ld  r10,PACAKMSR(r13);  /* get MSR value for kernel */  \
@@ -401,21 +423,21 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
 #define SOFTEN_VALUE_0xe60 PACA_IRQ_HMI
 #define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
 
-#define __SOFTEN_TEST(h, vec)  \
+#define __SOFTEN_TEST(h, vec, bitmask) \
lbz r10,PACASOFTIRQEN(r13); \
-   cmpwi   r10,IRQ_DISABLE_MASK_LINUX; \
+   andi.   r10,r10,bitmask;\
li  r10,SOFTEN_VALUE_##vec; \
-   beq masked_##h##interrupt
+   bne masked_##h##interrupt
 
-#define _SOFTEN_TEST(h, vec)   __SOFTEN_TEST(h, vec)
+#define _SOFTEN_TEST(h, vec, bitmask)  __SOFTEN_TEST(h, vec, bitmask)
 
-#define SOFTEN_TEST_PR(vec)\
+#define SOFTEN_TEST_PR(vec, bitmask)   \
KVMTEST(EXC_STD, vec);  \
-   _SOFTEN_TEST(EXC_STD, vec)
+   _SOFTEN_TEST(EXC_STD, vec, bitmask)
 
-#define SOFTEN_TEST_HV(vec)\
+#define SOFTEN_TEST_HV(vec, bitmask)   \
KVMTEST(EXC_HV, vec);   \
-   _SOFTEN_TEST(EXC_HV, vec)
+   _SOFTEN_TEST(EXC_HV, vec, bitmask)
 
 #define KVMTEST_PR(vec)
\
KVMTEST(EXC_STD, vec)
@@ -423,53 +445,53 

[PATCH v3 05/11] powerpc: reverse the soft_enable logic

2016-11-14 Thread Madhavan Srinivasan
"paca->soft_enabled" is used as a flag to mask some of interrupts.
Currently supported flags values and their details:

soft_enabledMSR[EE]

0   0   Disabled (PMI and HMI not masked)
1   1   Enabled

"paca->soft_enabled" is initialized to 1 to make the interripts as
enabled. arch_local_irq_disable() will toggle the value when interrupts
needs to disbled. At this point, the interrupts are not actually disabled,
instead, interrupt vector has code to check for the flag and mask it when it 
occurs.
By "mask it", it update interrupt paca->irq_happened and return.
arch_local_irq_restore() is called to re-enable interrupts, which checks and
replays interrupts if any occured.

Now, as mentioned, current logic doesnot mask "performance monitoring 
interrupts"
and PMIs are implemented as NMI. But this patchset depends on local_irq_*
for a successful local_* update. Meaning, mask all possible interrupts during
local_* update and replay them after the update.

So the idea here is to reserve the "paca->soft_enabled" logic. New values and
details:

soft_enabledMSR[EE]

1   0   Disabled  (PMI and HMI not masked)
0   1   Enabled

Reason for the this change is to create foundation for a third mask value "0x2"
for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is
set to a value "3", PMI interrupts are mask and when set to a value
of "1", PMI are not mask.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 4 ++--
 arch/powerpc/kernel/entry_64.S| 5 ++---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index c292ef4b4bc5..8359fbf83376 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -30,8 +30,8 @@
 /*
  * flags for paca->soft_enabled
  */
-#define IRQ_DISABLE_MASK_NONE  1
-#define IRQ_DISABLE_MASK_LINUX 0
+#define IRQ_DISABLE_MASK_NONE  0
+#define IRQ_DISABLE_MASK_LINUX 1
 
 #endif /* CONFIG_PPC64 */
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 8e347ffca14e..7ef3064ddde1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -132,8 +132,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
 */
 #if defined(CONFIG_TRACE_IRQFLAGS) && defined(CONFIG_BUG)
lbz r10,PACASOFTIRQEN(r13)
-   xorir10,r10,IRQ_DISABLE_MASK_NONE
-1: tdnei   r10,0
+1: tdnei   r10,IRQ_DISABLE_MASK_NONE
EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif
 
@@ -1010,7 +1009,7 @@ _GLOBAL(enter_rtas)
 * check it with the asm equivalent of WARN_ON
 */
lbz r0,PACASOFTIRQEN(r13)
-1: tdnei   r0,IRQ_DISABLE_MASK_LINUX
+1: tdeqi   r0,IRQ_DISABLE_MASK_NONE
EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,BUGFLAG_WARNING
 #endif

-- 
2.7.4



[PATCH v3 03/11] powerpc: Use soft_enabled_set api to update paca->soft_enabled

2016-11-14 Thread Madhavan Srinivasan
Force use of the soft_enabled_set() wrapper to update paca->soft_enabled
wherever possible. Also add a new wrapper function, soft_enabled_set_return(),
to force paca->soft_enabled updates that also need the previous value.

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h  | 14 ++
 arch/powerpc/include/asm/kvm_ppc.h |  2 +-
 arch/powerpc/kernel/irq.c  |  2 +-
 arch/powerpc/kernel/setup_64.c |  4 ++--
 arch/powerpc/kernel/time.c |  6 +++---
 5 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index ab1e6da7825c..88f6a8e2b5e3 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -62,6 +62,20 @@ static inline notrace void soft_enabled_set(unsigned long enable)
: "memory");
 }
 
+static inline notrace unsigned long soft_enabled_set_return(unsigned long enable)
+{
+   unsigned long flags;
+
+   asm volatile(
+   "lbz %0,%1(13); stb %2,%1(13)"
+   : "=r" (flags)
+   : "i" (offsetof(struct paca_struct, soft_enabled)),\
+ "r" (enable)
+   : "memory");
+
+   return flags;
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
unsigned long flags;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 7d8d2b40c6a8..2263d2443d94 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -735,7 +735,7 @@ static inline void kvmppc_fix_ee_before_entry(void)
 
/* Only need to enable IRQs by hard enabling them after this */
local_paca->irq_happened = 0;
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_NONE;
+   soft_enabled_set(IRQ_DISABLE_MASK_NONE);
 #endif
 }
 
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 204fa51cdc9e..5a995183dafb 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -337,7 +337,7 @@ bool prep_irq_for_idle(void)
 * of entering the low power state.
 */
local_paca->irq_happened &= ~PACA_IRQ_HARD_DIS;
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_NONE;
+   soft_enabled_set(IRQ_DISABLE_MASK_NONE);
 
/* Tell the caller to enter the low power state */
return true;
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index f31930b9bfc1..f0f882166dcc 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -197,7 +197,7 @@ static void __init fixup_boot_paca(void)
/* Allow percpu accesses to work until we setup percpu data */
get_paca()->data_offset = 0;
/* Mark interrupts disabled in PACA */
-   get_paca()->soft_enabled = IRQ_DISABLE_MASK_LINUX;
+   soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 }
 
 static void __init configure_exceptions(void)
@@ -334,7 +334,7 @@ void __init early_setup(unsigned long dt_ptr)
 void early_setup_secondary(void)
 {
/* Mark interrupts disabled in PACA */
-   get_paca()->soft_enabled = 0;
+   soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 
/* Initialize the hash table or TLB handling */
early_init_mmu_secondary();
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 334144e300ca..42a39a2cea0a 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -260,7 +260,7 @@ static u64 scan_dispatch_log(u64 stop_tb)
 void accumulate_stolen_time(void)
 {
u64 sst, ust;
-   u8 save_soft_enabled = local_paca->soft_enabled;
+   unsigned long save_soft_enabled;
struct cpu_accounting_data *acct = &local_paca->accounting;
 
/* We are called early in the exception entry, before
@@ -269,7 +269,7 @@ void accumulate_stolen_time(void)
 * needs to reflect that so various debug stuff doesn't
 * complain
 */
-   local_paca->soft_enabled = IRQ_DISABLE_MASK_LINUX;
+   save_soft_enabled = soft_enabled_set_return(IRQ_DISABLE_MASK_LINUX);
 
sst = scan_dispatch_log(acct->starttime_user);
ust = scan_dispatch_log(acct->starttime);
@@ -277,7 +277,7 @@ void accumulate_stolen_time(void)
acct->user_time -= ust;
local_paca->stolen_time += ust + sst;
 
-   local_paca->soft_enabled = save_soft_enabled;
+   soft_enabled_set(save_soft_enabled);
 }
 
 static inline u64 calculate_stolen_time(u64 stop_tb)
-- 
2.7.4



[PATCH v3 02/11] powerpc: move set_soft_enabled() and rename

2016-11-14 Thread Madhavan Srinivasan
Move set_soft_enabled() from powerpc/kernel/irq.c to
asm/hw_irq.h, to force updates to paca->soft_enabled to be
done via this access function. Add a "memory" clobber
as a hint to the compiler, since paca->soft_enabled memory is
the target here.

Renaming it soft_enabled_set() makes the
namespace work better, as a prefix rather than a postfix,
when new soft_enabled manipulation functions are introduced.

Reviewed-by: Nicholas Piggin 
Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/hw_irq.h | 15 +++
 arch/powerpc/kernel/irq.c | 12 +++-
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index 05b81bca15e9..ab1e6da7825c 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -47,6 +47,21 @@ extern void unknown_exception(struct pt_regs *regs);
 #ifdef CONFIG_PPC64
 #include 
 
+/*
+ * TODO:
+ * Currently none of the soft_enabled modification helpers have clobbers
+ * for modifying the r13->soft_enabled memory itself. Secondly, they only
+ * include the "memory" clobber as a hint. Ideally, if all the accesses to
+ * soft_enabled went via these helpers, we could avoid the "memory" clobber.
+ * The former could be taken care of by having the location in the constraints.
+ */
+static inline notrace void soft_enabled_set(unsigned long enable)
+{
+   __asm__ __volatile__("stb %0,%1(13)"
+   : : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled))
+   : "memory");
+}
+
 static inline unsigned long arch_local_save_flags(void)
 {
unsigned long flags;
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index c6f1e13ff441..204fa51cdc9e 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -108,12 +108,6 @@ static inline notrace unsigned long get_irq_happened(void)
return happened;
 }
 
-static inline notrace void set_soft_enabled(unsigned long enable)
-{
-   __asm__ __volatile__("stb %0,%1(13)"
-   : : "r" (enable), "i" (offsetof(struct paca_struct, soft_enabled)));
-}
-
 static inline notrace int decrementer_check_overflow(void)
 {
u64 now = get_tb_or_rtc();
@@ -213,7 +207,7 @@ notrace void arch_local_irq_restore(unsigned long en)
unsigned int replay;
 
/* Write the new soft-enabled value */
-   set_soft_enabled(en);
+   soft_enabled_set(en);
if (en == IRQ_DISABLE_MASK_LINUX)
return;
/*
@@ -259,7 +253,7 @@ notrace void arch_local_irq_restore(unsigned long en)
}
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
-   set_soft_enabled(IRQ_DISABLE_MASK_LINUX);
+   soft_enabled_set(IRQ_DISABLE_MASK_LINUX);
 
/*
 * Check if anything needs to be re-emitted. We haven't
@@ -269,7 +263,7 @@ notrace void arch_local_irq_restore(unsigned long en)
replay = __check_irq_replay();
 
/* We can soft-enable now */
-   set_soft_enabled(IRQ_DISABLE_MASK_NONE);
+   soft_enabled_set(IRQ_DISABLE_MASK_NONE);
 
/*
 * And replay if we have to. This will return with interrupts
-- 
2.7.4



[PATCH v3 00/11] powerpc: "paca->soft_enabled" based local atomic operation implementation

2016-11-14 Thread Madhavan Srinivasan
Local atomic operations are fast and highly reentrant per-CPU counters,
used for percpu variable updates. Local atomic operations only guarantee
variable modification atomicity wrt the CPU which owns the data, and
they need to be executed in a preemption-safe way.
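
As a concrete example of the kind of usage being targeted (generic local_t
API; the counter name here is illustrative only):

	#include <linux/percpu.h>
	#include <asm/local.h>

	static DEFINE_PER_CPU(local_t, nr_events);

	/* may run in process, softirq, irq or PMI context */
	static void record_event(void)
	{
		local_inc(this_cpu_ptr(&nr_events));
	}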

Here is the design of the patchset. Since local_* operations
only need to be atomic with respect to interrupts (IIUC), we have two
options: either replay the "op" if interrupted, or replay the interrupt
after the "op". The initial patchset posted was based on implementing
local_* operations using CR5, which replays the "op". That patchset had
issues when rewinding the address pointer from an array, which made the
slow path really slow. Since the CR5-based implementation proposed using
__ex_table to find the rewind address, it raised concerns about the size
of __ex_table and vmlinux.

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

This patchset instead uses Benjamin Herrenschmidt's suggestion of using
arch_local_irq_disable() to soft-disable interrupts (including PMIs).
After finishing the "op", arch_local_irq_restore() is called, and any
interrupts that occurred in between are replayed.

The current paca->soft_enabled logic is reversed, and the MASKABLE_EXCEPTION_*
macros are extended to support this feature.

The patches rewrite the current local_* functions to use arch_local_irq_disable().
The base flow for each function is:

 {
powerpc_local_irq_pmu_save(flags)
load
..
store
powerpc_local_irq_pmu_restore(flags)
 }
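
Concretely, local_add_return() ends up looking something like this (a sketch
of the final patch in the series, assuming local_t carries a plain long
member v):

	static __inline__ long local_add_return(long a, local_t *l)
	{
		long t;
		unsigned long flags;

		powerpc_local_irq_pmu_save(flags);
		t = l->v + a;	/* plain load ... */
		l->v = t;	/* ... and store; a masked PMI replays on restore */
		powerpc_local_irq_pmu_restore(flags);

		return t;
	}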

The reason for the approach is that currently the l[w/d]arx/st[w/d]cx.
instruction pair is used for local_* operations, which is heavy
on cycle count and does not support a local variant. So to
see whether the new implementation helps, we used a modified
version of Rusty's benchmark code on local_t.

https://lkml.org/lkml/2008/12/16/450
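
For reference, the "Without Patch" numbers below time the existing ll/sc
implementation, which for the 64-bit case is essentially (a simplified
sketch of arch/powerpc/include/asm/local.h):

	static __inline__ long local_add_return(long a, local_t *l)
	{
		long t;

		__asm__ __volatile__(
	"1:	ldarx	%0,0,%2\n\
		add	%0,%1,%0\n\
		stdcx.	%0,0,%2\n\
		bne-	1b"
		: "=&r" (t)
		: "r" (a), "r" (&(l->a.counter))
		: "cc", "memory");

		return t;
	}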

Modifications to Rusty's benchmark code:
 - Executed only local_t test

Here are the values with the patch.

Time in ns per iteration

Local_t         Without Patch   With Patch

_inc            28              8
_add            28              8
_read           3               3
_add_return     28              7

Currently only asm/local.h has been rewritten, and the entire change has
been tested only on PPC64 (pseries guest) and a PPC64 LE host. ppc64e_*
is only compile tested.

The first five are cleanup patches which lay the foundation
to make things easier. The fifth patch in the patchset reverses the
current soft_enabled logic, and its commit message details the reason and
need for this change. The sixth and seventh patches refactor the
__EXCEPTION_PROLOG_1 code to support the addition of a new parameter to the
MASKABLE_* macros. The new parameter gives the possible mask for the
interrupt. The rest of the patches add support for maskable PMIs and the
implementation of local_t using powerpc_local_irq_pmu_*().

Since the patchset is experimental, testing has been done only on pseries
and powernv platforms. The patchset is only compile tested for Book3e.

Other suggestions from Nick (planned to be handled via a separate follow-up
patchset):
1)Rename the soft_enabled to soft_disable_mask
2)builtin_constants for the soft_enabled manipulation functions
3)Update the proper clobber for "r13->soft_enabled" updates and add barriers()
  to caller functions

Changelog v2:
Rebased to latest upstream

Changelog v1:
1)squashed patches 1/2 together and 8/9/10 together for readability
2)Created a separate patch for the kconfig changes
3)Moved the new mask value commit to patch 11.
4)Renamed local_irq_pmu_*() to powerpc_irq_pmu_*() to avoid
  namespace clashes with the generic kernel local_irq_*() functions
5)Renamed __EXCEPTION_PROLOG_1 macro to MASKABLE_EXCEPTION_PROLOG_1 macro
6)Made changes to commit messages
7)Add more comments to codes

Changelog RFC v5:
1)Implemented new set of soft_enabled manipulation functions
2)rewritten arch_local_irq_* functions to use the new soft_enabled_*()
3)Add WARN_ON to identify invalid soft_enabled transitions
4)Added powerpc_local_irq_pmu_save() and powerpc_local_irq_pmu_restore() to
  support masking of irqs (with PMI).
5)Added local_irq_pmu_*()s macros with trace_hardirqs_on|off() to match
  include/linux/irqflags.h

Changelog RFC v4:
1)Fix build breaks in in ppc64e_defconfig compilation
2)Merged PMI replay code with the exception vector changes patch
3)Renamed the new API to set PMI mask bit as suggested
4)Modified the current arch_local_save and the new API function call to
  "OR" and store the value to ->soft_enabled instead of a plain store.
5)Updated the check in arch_local_irq_restore() to always check for
  greater than or equal to the _LINUX mask bit.
6)Updated the commit messages.

Changelog RFC v3:
1)Squashed PMI masked interrupt patch and replay patch together
2)Have created a new patch which includes a new Kconfig and set_irq_set_mask()
3)Fixed the compilation issue with IRQ_DISABLE_MASK_* macros in book3e_*

Changelog RFC v2:
1)Renamed IRQ_DISABLE

Re: [PATCH net-next v7 03/10] dpaa_eth: add option to use one buffer pool set

2016-11-14 Thread David Miller
From: Madalin-Cristian Bucur 
Date: Mon, 14 Nov 2016 10:25:13 +

> I've introduced this Kconfig option as a backwards compatible option, to
> be able to run comparative tests between the independent buffer pool setup
> and the previous common buffer pool setup. There are not so many reasons
> to use the same buffer pool besides "having the old setup", the memory
> saving is marginal, in all other aspects the separate buffer pools setup
> fares better.
> 
> I'll remove this patch from the next submission. Should anyone care for
> this I can add an entry to the feature backlog to add runtime support but
> it will be quite low in priority.

If it's a debugging feature then that's certainly how this should be
handled.


Re: [RESEND PATCH] cxl: Fix coredump generation when cxl_get_fd() is used

2016-11-14 Thread Matthew R. Ochs
> On Nov 14, 2016, at 2:58 AM, Frederic Barrat  
> wrote:
> 
> If a process dumps core while owning a cxl file descriptor obtained
> from an AFU driver (e.g. cxlflash) through the cxl_get_fd() API, the
> following error occurs:
> 
> [  868.027591] Unable to handle kernel paging request for data at address ...
> [  868.027778] Faulting instruction address: 0xc035edb0
> cpu 0x8c: Vector: 300 (Data Access) at [c03c688275e0]
>pc: c035edb0: elf_core_dump+0xd60/0x1300
>lr: c035ed80: elf_core_dump+0xd30/0x1300
>sp: c03c68827860
>   msr: 90019033
>   dar: c
> dsisr: 4000
> current = 0xc03c6878
> paca= 0xc1b73200   softe: 0irq_happened: 0x01
>pid   = 46725, comm = hxesurelock
> enter ? for help
> [c03c68827a60] c036948c do_coredump+0xcec/0x11e0
> [c03c68827c20] c00ce9e0 get_signal+0x540/0x7b0
> [c03c68827d10] c0017354 do_signal+0x54/0x2b0
> [c03c68827e00] c001777c do_notify_resume+0xbc/0xd0
> [c03c68827e30] c0009838 ret_from_except_lite+0x64/0x68
> --- Exception: 300 (Data Access) at 3fff98ad2918
> 
> The root cause is that the address_space structure for the file
> doesn't define a 'host' member.
> 
> When cxl allocates a file descriptor, it's using the anonymous inode
> to back the file, but allocates a private address_space for each
> context. The private address_space allows to track memory allocation
> for each context. cxl doesn't define the 'host' member of the address
> space, i.e. the inode. We don't want to define it as the anonymous
> inode, since there's no longer a 1-to-1 relation between address_space
> and inode.
> 
> To fix it, instead of using the anonymous inode, we introduce a simple
> pseudo filesystem so that cxl can allocate its own inodes. So we now
> have one inode for each file and address_space. The pseudo filesystem
> is only mounted on the first allocation of a file descriptor by
> cxl_get_fd().
> 
> Tested with cxlflash.
> 
> Signed-off-by: Frederic Barrat 

Reviewed-by: Matthew R. Ochs 
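
For readers following the thread, the pseudo-filesystem approach boils down
to something like the sketch below (names such as CXL_PSEUDO_FS_MAGIC and
cxl_fs_mount are illustrative, not necessarily what the patch uses):

	static struct dentry *cxl_fs_mount(struct file_system_type *fs_type,
					   int flags, const char *dev_name,
					   void *data)
	{
		return mount_pseudo(fs_type, "cxl:", NULL, NULL,
				    CXL_PSEUDO_FS_MAGIC);
	}

	static struct file_system_type cxl_fs_type = {
		.name		= "cxl",
		.owner		= THIS_MODULE,
		.mount		= cxl_fs_mount,
		.kill_sb	= kill_anon_super,
	};

Each cxl_get_fd() call can then allocate its own inode from this superblock
(e.g. with alloc_anon_inode()), so every private address_space gets a real
'host'.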



Re: [PATCH] powerpc/mm: Batch tlb flush when invalidating pte entries

2016-11-14 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> This will improve the task exit case, by batching tlb invalidates.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/radix.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
> b/arch/powerpc/include/asm/book3s/64/radix.h
> index aec6e8ee6e27..e8b4f39e9fab 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -147,10 +147,16 @@ static inline unsigned long radix__pte_update(struct 
> mm_struct *mm,
>* new value of pte
>*/
>   new_pte = (old_pte | set) & ~clr;
> - psize = radix_get_mmu_psize(pg_sz);
> - radix__flush_tlb_page_psize(mm, addr, psize);
> -
> - __radix_pte_update(ptep, 0, new_pte);
> + /*
> +  * If we are trying to clear the pte, we can skip
> +  * the below sequence and batch the tlb flush. The
> +  * tlb flush batching is done by mmu gather code
> +  */
> + if (new_pte) {
> + psize = radix_get_mmu_psize(pg_sz);
> + radix__flush_tlb_page_psize(mm, addr, psize);
> + __radix_pte_update(ptep, 0, new_pte);
> + }
>   } else
>   old_pte = __radix_pte_update(ptep, clr, set);
>   asm volatile("ptesync" : : : "memory");

We can also avoid the ptesync I guess. BTW, for the transition from V=0 to
a valid pte, we are good without this patch because that is done via
set_pte_at().

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index e8b4f39e9fab..83c77323a769 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -142,7 +142,6 @@ static inline unsigned long radix__pte_update(struct 
mm_struct *mm,
unsigned long new_pte;
 
old_pte = __radix_pte_update(ptep, ~0, 0);
-   asm volatile("ptesync" : : : "memory");
/*
 * new value of pte
 */
@@ -153,6 +152,7 @@ static inline unsigned long radix__pte_update(struct 
mm_struct *mm,
 * tlb flush batching is done by mmu gather code
 */
if (new_pte) {
+   asm volatile("ptesync" : : : "memory");
psize = radix_get_mmu_psize(pg_sz);
radix__flush_tlb_page_psize(mm, addr, psize);
__radix_pte_update(ptep, 0, new_pte);



[PATCH] powerpc/mm: Batch tlb flush when invalidating pte entries

2016-11-14 Thread Aneesh Kumar K.V
This will improve the task exit case, by batching tlb invalidates.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/radix.h | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index aec6e8ee6e27..e8b4f39e9fab 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -147,10 +147,16 @@ static inline unsigned long radix__pte_update(struct 
mm_struct *mm,
 * new value of pte
 */
new_pte = (old_pte | set) & ~clr;
-   psize = radix_get_mmu_psize(pg_sz);
-   radix__flush_tlb_page_psize(mm, addr, psize);
-
-   __radix_pte_update(ptep, 0, new_pte);
+   /*
+* If we are trying to clear the pte, we can skip
+* the below sequence and batch the tlb flush. The
+* tlb flush batching is done by mmu gather code
+*/
+   if (new_pte) {
+   psize = radix_get_mmu_psize(pg_sz);
+   radix__flush_tlb_page_psize(mm, addr, psize);
+   __radix_pte_update(ptep, 0, new_pte);
+   }
} else
old_pte = __radix_pte_update(ptep, clr, set);
asm volatile("ptesync" : : : "memory");
-- 
2.10.2



[PATCH v2 1/2] hugetlb: Change the function prototype to take vma_area_struct as arg

2016-11-14 Thread Aneesh Kumar K.V
This helps us find the hugetlb page size which we need to use on some
archs like ppc64 for tlb flushes. This also makes the interface consistent
with the other hugetlb functions.
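
With the vma in hand, the size falls out directly, e.g. (a sketch; see patch
2/2 for the actual use):

	unsigned long pg_sz = huge_page_size(hstate_vma(vma));
	int psize = radix_get_mmu_psize(pg_sz);	/* for the radix tlb flush */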

Signed-off-by: Aneesh Kumar K.V 
---
NOTE: This series is dependent on another series posted here.
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-November/150948.html

 arch/arm/include/asm/hugetlb-3level.h|  8 
 arch/arm64/include/asm/hugetlb.h |  4 ++--
 arch/arm64/mm/hugetlbpage.c  |  7 +--
 arch/ia64/include/asm/hugetlb.h  |  8 
 arch/metag/include/asm/hugetlb.h |  8 
 arch/mips/include/asm/hugetlb.h  |  7 ---
 arch/parisc/include/asm/hugetlb.h|  4 ++--
 arch/parisc/mm/hugetlbpage.c |  6 --
 arch/powerpc/include/asm/book3s/32/pgtable.h |  4 ++--
 arch/powerpc/include/asm/book3s/64/hugetlb.h |  4 ++--
 arch/powerpc/include/asm/hugetlb.h   |  6 +++---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  4 ++--
 arch/powerpc/include/asm/nohash/64/pgtable.h |  4 ++--
 arch/s390/include/asm/hugetlb.h  | 12 ++--
 arch/s390/mm/hugetlbpage.c   |  3 ++-
 arch/sh/include/asm/hugetlb.h|  8 
 arch/sparc/include/asm/hugetlb.h |  6 +++---
 arch/sparc/mm/hugetlbpage.c  |  3 ++-
 arch/tile/include/asm/hugetlb.h  |  8 
 arch/x86/include/asm/hugetlb.h   |  8 
 mm/hugetlb.c |  6 +++---
 21 files changed, 68 insertions(+), 60 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h 
b/arch/arm/include/asm/hugetlb-3level.h
index d4014fbe5ea3..b71839e1786f 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -49,16 +49,16 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
ptep_clear_flush(vma, addr, ptep);
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+static inline void huge_ptep_set_wrprotect(struct vm_area_struct *vma,
   unsigned long addr, pte_t *ptep)
 {
-   ptep_set_wrprotect(mm, addr, ptep);
+   ptep_set_wrprotect(vma->vm_mm, addr, ptep);
 }
 
-static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+static inline pte_t huge_ptep_get_and_clear(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
 {
-   return ptep_get_and_clear(mm, addr, ptep);
+   return ptep_get_and_clear(vma->vm_mm, addr, ptep);
 }
 
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index bbc1e35aa601..4e54d4b58d3e 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -76,9 +76,9 @@ extern void set_huge_pte_at(struct mm_struct *mm, unsigned 
long addr,
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
-extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+extern pte_t huge_ptep_get_and_clear(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep);
-extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
+extern void huge_ptep_set_wrprotect(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep);
 extern void huge_ptep_clear_flush(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep);
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 2e49bd252fe7..5c8903433cd9 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -197,10 +197,11 @@ pte_t arch_make_huge_pte(pte_t entry, struct 
vm_area_struct *vma,
return entry;
 }
 
-pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
+pte_t huge_ptep_get_and_clear(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep)
 {
pte_t pte;
+   struct mm_struct *mm = vma->vm_mm;
 
if (pte_cont(*ptep)) {
int ncontig, i;
@@ -263,9 +264,11 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
}
 }
 
-void huge_ptep_set_wrprotect(struct mm_struct *mm,
+void huge_ptep_set_wrprotect(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep)
 {
+   struct mm_struct *mm = vma->vm_mm;
+
if (pte_cont(*ptep)) {
int ncontig, i;
pte_t *cpte;
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index ef65f026b11e..eb1c1d674200 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -26,10 +26,10 @@ static inline void set_huge_pte_at(struct mm_struct *mm, 
unsi

[PATCH v2 2/2] powerpc/mm/hugetlb: Switch hugetlb update to use pte_update

2016-11-14 Thread Aneesh Kumar K.V
Now that we have updated the hugetlb functions to take a vm_area_struct, and
can derive the huge page size from the vma, switch the pte update to use the
generic functions.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 34 +++-
 arch/powerpc/include/asm/hugetlb.h   |  2 +-
 2 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 80fa0c828413..0a6db2086140 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -31,36 +31,18 @@ static inline int hstate_get_psize(struct hstate *hstate)
}
 }
 
-static inline unsigned long huge_pte_update(struct mm_struct *mm, unsigned 
long addr,
+static inline unsigned long huge_pte_update(struct vm_area_struct *vma, 
unsigned long addr,
pte_t *ptep, unsigned long clr,
unsigned long set)
 {
-   if (radix_enabled()) {
-   unsigned long old_pte;
+   unsigned long pg_sz;
 
-   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+   VM_WARN_ON(!is_vm_hugetlb_page(vma));
+   pg_sz = huge_page_size(hstate_vma(vma));
 
-   unsigned long new_pte;
-
-   old_pte = __radix_pte_update(ptep, ~0, 0);
-   asm volatile("ptesync" : : : "memory");
-   /*
-* new value of pte
-*/
-   new_pte = (old_pte | set) & ~clr;
-   /*
-* For now let's do heavy pid flush
-* radix__flush_tlb_page_psize(mm, addr, 
mmu_virtual_psize);
-*/
-   radix__flush_tlb_mm(mm);
-
-   __radix_pte_update(ptep, 0, new_pte);
-   } else
-   old_pte = __radix_pte_update(ptep, clr, set);
-   asm volatile("ptesync" : : : "memory");
-   return old_pte;
-   }
-   return hash__pte_update(mm, addr, ptep, clr, set, true);
+   if (radix_enabled())
+   return radix__pte_update(vma->vm_mm, addr, ptep, clr, set, 
pg_sz);
+   return hash__pte_update(vma->vm_mm, addr, ptep, clr, set, true);
 }
 
 static inline void huge_ptep_set_wrprotect(struct vm_area_struct *vma,
@@ -69,7 +51,7 @@ static inline void huge_ptep_set_wrprotect(struct 
vm_area_struct *vma,
if ((pte_raw(*ptep) & cpu_to_be64(_PAGE_WRITE)) == 0)
return;
 
-   huge_pte_update(vma->vm_mm, addr, ptep, _PAGE_WRITE, 0);
+   huge_pte_update(vma, addr, ptep, _PAGE_WRITE, 0);
 }
 
 #endif
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index bb1bf23d6f90..f0731dff76c2 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -136,7 +136,7 @@ static inline pte_t huge_ptep_get_and_clear(struct 
vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_PPC64
-   return __pte(huge_pte_update(vma->vm_mm, addr, ptep, ~0UL, 0));
+   return __pte(huge_pte_update(vma, addr, ptep, ~0UL, 0));
 #else
return __pte(pte_update(ptep, ~0UL, 0));
 #endif
-- 
2.10.2



[PATCH v2 4/4] powerpc/mm: update pte_update to not do full mm tlb flush

2016-11-14 Thread Aneesh Kumar K.V
When we are updating a pte, we just need to flush the tlb mapping for
that pte. Right now we do a full mm flush because we don't track the page
size. Update the interface to track the page size and use that to
do the right tlb flush.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 16 ++--
 arch/powerpc/include/asm/book3s/64/radix.h   | 19 ---
 arch/powerpc/mm/pgtable-radix.c  |  2 +-
 3 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index ef2eef1ba99a..09869ad37aba 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -301,12 +301,16 @@ extern unsigned long pci_io_base;
 
 static inline unsigned long pte_update(struct mm_struct *mm, unsigned long 
addr,
   pte_t *ptep, unsigned long clr,
-  unsigned long set, int huge)
+  unsigned long set,
+  unsigned long pg_sz)
 {
+   bool huge = (pg_sz != PAGE_SIZE);
+
if (radix_enabled())
-   return radix__pte_update(mm, addr, ptep, clr, set, huge);
+   return radix__pte_update(mm, addr, ptep, clr, set, pg_sz);
return hash__pte_update(mm, addr, ptep, clr, set, huge);
 }
+
 /*
  * For hash even if we have _PAGE_ACCESSED = 0, we do a pte_update.
  * We currently remove entries from the hashtable regardless of whether
@@ -324,7 +328,7 @@ static inline int __ptep_test_and_clear_young(struct 
mm_struct *mm,
 
if ((pte_raw(*ptep) & cpu_to_be64(_PAGE_ACCESSED | H_PAGE_HASHPTE)) == 
0)
return 0;
-   old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, 0);
+   old = pte_update(mm, addr, ptep, _PAGE_ACCESSED, 0, PAGE_SIZE);
return (old & _PAGE_ACCESSED) != 0;
 }
 
@@ -343,21 +347,21 @@ static inline void ptep_set_wrprotect(struct mm_struct 
*mm, unsigned long addr,
if ((pte_raw(*ptep) & cpu_to_be64(_PAGE_WRITE)) == 0)
return;
 
-   pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0);
+   pte_update(mm, addr, ptep, _PAGE_WRITE, 0, PAGE_SIZE);
 }
 
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
 {
-   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, 0);
+   unsigned long old = pte_update(mm, addr, ptep, ~0UL, 0, PAGE_SIZE);
return __pte(old);
 }
 
 static inline void pte_clear(struct mm_struct *mm, unsigned long addr,
 pte_t * ptep)
 {
-   pte_update(mm, addr, ptep, ~0UL, 0, 0);
+   pte_update(mm, addr, ptep, ~0UL, 0, PAGE_SIZE);
 }
 
 static inline int pte_write(pte_t pte)
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 279b2f68e00f..aec6e8ee6e27 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -129,15 +129,16 @@ static inline unsigned long __radix_pte_update(pte_t 
*ptep, unsigned long clr,
 
 
 static inline unsigned long radix__pte_update(struct mm_struct *mm,
-   unsigned long addr,
-   pte_t *ptep, unsigned long clr,
-   unsigned long set,
-   int huge)
+ unsigned long addr,
+ pte_t *ptep, unsigned long clr,
+ unsigned long set,
+ unsigned long pg_sz)
 {
unsigned long old_pte;
 
if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
 
+   int psize;
unsigned long new_pte;
 
old_pte = __radix_pte_update(ptep, ~0, 0);
@@ -146,18 +147,14 @@ static inline unsigned long radix__pte_update(struct 
mm_struct *mm,
 * new value of pte
 */
new_pte = (old_pte | set) & ~clr;
-
-   /*
-* For now let's do heavy pid flush
-* radix__flush_tlb_page_psize(mm, addr, mmu_virtual_psize);
-*/
-   radix__flush_tlb_mm(mm);
+   psize = radix_get_mmu_psize(pg_sz);
+   radix__flush_tlb_page_psize(mm, addr, psize);
 
__radix_pte_update(ptep, 0, new_pte);
} else
old_pte = __radix_pte_update(ptep, clr, set);
asm volatile("ptesync" : : : "memory");
-   if (!huge)
+   if (pg_sz == PAGE_SIZE)
assert_pte_locked(mm, addr);
 
return old_pte;
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 6b1ffc449158..735be6821

[PATCH v2 3/4] powerpc/mm/hugetlb: Switch hugetlb update to use huge_pte_update

2016-11-14 Thread Aneesh Kumar K.V
We want to switch pte_update to use a va-based tlb flush. In order to do that
we need to track the page size. With hugetlb we currently don't have the page
size available in these functions. Hence switch hugetlb to use separate
functions for the update. In a later patch we will update the hugetlb
functions to take a vm_area_struct, from which we can derive the page size.
After that we will switch this back to use pte_update.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 42 
 arch/powerpc/include/asm/book3s/64/pgtable.h |  9 --
 arch/powerpc/include/asm/hugetlb.h   |  2 +-
 3 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index d9c283f95e05..9a64f356a8e8 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -30,4 +30,46 @@ static inline int hstate_get_psize(struct hstate *hstate)
return mmu_virtual_psize;
}
 }
+
+static inline unsigned long huge_pte_update(struct mm_struct *mm, unsigned 
long addr,
+   pte_t *ptep, unsigned long clr,
+   unsigned long set)
+{
+   if (radix_enabled()) {
+   unsigned long old_pte;
+
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+
+   unsigned long new_pte;
+
+   old_pte = __radix_pte_update(ptep, ~0, 0);
+   asm volatile("ptesync" : : : "memory");
+   /*
+* new value of pte
+*/
+   new_pte = (old_pte | set) & ~clr;
+   /*
+* For now let's do heavy pid flush
+* radix__flush_tlb_page_psize(mm, addr, 
mmu_virtual_psize);
+*/
+   radix__flush_tlb_mm(mm);
+
+   __radix_pte_update(ptep, 0, new_pte);
+   } else
+   old_pte = __radix_pte_update(ptep, clr, set);
+   asm volatile("ptesync" : : : "memory");
+   return old_pte;
+   }
+   return hash__pte_update(mm, addr, ptep, clr, set, true);
+}
+
+static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
+  unsigned long addr, pte_t *ptep)
+{
+   if ((pte_raw(*ptep) & cpu_to_be64(_PAGE_WRITE)) == 0)
+   return;
+
+   huge_pte_update(mm, addr, ptep, _PAGE_WRITE, 0);
+}
+
 #endif
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 46d739457d68..ef2eef1ba99a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -346,15 +346,6 @@ static inline void ptep_set_wrprotect(struct mm_struct 
*mm, unsigned long addr,
pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 0);
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   if ((pte_raw(*ptep) & cpu_to_be64(_PAGE_WRITE)) == 0)
-   return;
-
-   pte_update(mm, addr, ptep, _PAGE_WRITE, 0, 1);
-}
-
 #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep)
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index c03e0a3dd4d8..058d6311de87 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -136,7 +136,7 @@ static inline pte_t huge_ptep_get_and_clear(struct 
mm_struct *mm,
unsigned long addr, pte_t *ptep)
 {
 #ifdef CONFIG_PPC64
-   return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1));
+   return __pte(huge_pte_update(mm, addr, ptep, ~0UL, 0));
 #else
return __pte(pte_update(ptep, ~0UL, 0));
 #endif
-- 
2.10.2



[PATCH v2 2/4] powerpc/mm: Rename hugetlb-radix.h to hugetlb.h

2016-11-14 Thread Aneesh Kumar K.V
We will start moving some book3s specific hugetlb functions there.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/{hugetlb-radix.h => hugetlb.h} | 8 ++--
 arch/powerpc/include/asm/hugetlb.h| 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)
 rename arch/powerpc/include/asm/book3s/64/{hugetlb-radix.h => hugetlb.h} (78%)

diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb-radix.h 
b/arch/powerpc/include/asm/book3s/64/hugetlb.h
similarity index 78%
rename from arch/powerpc/include/asm/book3s/64/hugetlb-radix.h
rename to arch/powerpc/include/asm/book3s/64/hugetlb.h
index c45189aa7476..d9c283f95e05 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -1,5 +1,5 @@
-#ifndef _ASM_POWERPC_BOOK3S_64_HUGETLB_RADIX_H
-#define _ASM_POWERPC_BOOK3S_64_HUGETLB_RADIX_H
+#ifndef _ASM_POWERPC_BOOK3S_64_HUGETLB_H
+#define _ASM_POWERPC_BOOK3S_64_HUGETLB_H
 /*
  * For radix we want generic code to handle hugetlb. But then if we want
  * both hash and radix to be enabled together we need to workaround the
@@ -21,6 +21,10 @@ static inline int hstate_get_psize(struct hstate *hstate)
return MMU_PAGE_2M;
else if (shift == mmu_psize_defs[MMU_PAGE_1G].shift)
return MMU_PAGE_1G;
+   else if (shift == mmu_psize_defs[MMU_PAGE_16M].shift)
+   return MMU_PAGE_16M;
+   else if (shift == mmu_psize_defs[MMU_PAGE_16G].shift)
+   return MMU_PAGE_16G;
else {
WARN(1, "Wrong huge page shift\n");
return mmu_virtual_psize;
diff --git a/arch/powerpc/include/asm/hugetlb.h 
b/arch/powerpc/include/asm/hugetlb.h
index c5517f463ec7..c03e0a3dd4d8 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -9,7 +9,7 @@ extern struct kmem_cache *hugepte_cache;
 
 #ifdef CONFIG_PPC_BOOK3S_64
 
-#include 
+#include 
 /*
  * This should work for other subarchs too. But right now we use the
  * new format only for 64bit book3s
-- 
2.10.2



[PATCH v2 1/4] powerpc/mm: update ptep_set_access_flag to not do full mm tlb flush

2016-11-14 Thread Aneesh Kumar K.V
When we are updating a pte, we just need to flush the tlb mapping for
that pte. Right now we do a full mm flush because we don't track the page
size. Update the interface to track the page size and use that to
do the right tlb flush.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/32/pgtable.h |  4 +++-
 arch/powerpc/include/asm/book3s/64/pgtable.h |  7 +--
 arch/powerpc/include/asm/book3s/64/radix.h   | 14 +++---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  4 +++-
 arch/powerpc/include/asm/nohash/64/pgtable.h |  4 +++-
 arch/powerpc/mm/pgtable-book3s64.c   |  3 ++-
 arch/powerpc/mm/pgtable-radix.c  | 16 
 arch/powerpc/mm/pgtable.c| 10 --
 arch/powerpc/mm/tlb-radix.c  | 15 ---
 9 files changed, 47 insertions(+), 30 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 6b8b2d57fdc8..0713626e9189 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -224,7 +224,9 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct 
*mm,
 
 
 static inline void __ptep_set_access_flags(struct mm_struct *mm,
-  pte_t *ptep, pte_t entry)
+  pte_t *ptep, pte_t entry,
+  unsigned long address,
+  unsigned long pg_sz)
 {
unsigned long set = pte_val(entry) &
(_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_RW | _PAGE_EXEC);
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 86870c11917b..46d739457d68 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -580,10 +580,13 @@ static inline bool check_pte_access(unsigned long access, 
unsigned long ptev)
  */
 
 static inline void __ptep_set_access_flags(struct mm_struct *mm,
-  pte_t *ptep, pte_t entry)
+  pte_t *ptep, pte_t entry,
+  unsigned long address,
+  unsigned long pg_sz)
 {
if (radix_enabled())
-   return radix__ptep_set_access_flags(mm, ptep, entry);
+   return radix__ptep_set_access_flags(mm, ptep, entry,
+   address, pg_sz);
return hash__ptep_set_access_flags(ptep, entry);
 }
 
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 2a46dea8e1b1..279b2f68e00f 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -110,6 +110,7 @@
 #define RADIX_PUD_TABLE_SIZE   (sizeof(pud_t) << RADIX_PUD_INDEX_SIZE)
 #define RADIX_PGD_TABLE_SIZE   (sizeof(pgd_t) << RADIX_PGD_INDEX_SIZE)
 
+extern int radix_get_mmu_psize(unsigned long pg_sz);
 static inline unsigned long __radix_pte_update(pte_t *ptep, unsigned long clr,
   unsigned long set)
 {
@@ -167,7 +168,9 @@ static inline unsigned long radix__pte_update(struct 
mm_struct *mm,
  * function doesn't need to invalidate tlb.
  */
 static inline void radix__ptep_set_access_flags(struct mm_struct *mm,
-   pte_t *ptep, pte_t entry)
+   pte_t *ptep, pte_t entry,
+   unsigned long address,
+   unsigned long pg_sz)
 {
 
unsigned long set = pte_val(entry) & (_PAGE_DIRTY | _PAGE_ACCESSED |
@@ -175,6 +178,7 @@ static inline void radix__ptep_set_access_flags(struct 
mm_struct *mm,
 
if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
 
+   int psize;
unsigned long old_pte, new_pte;
 
old_pte = __radix_pte_update(ptep, ~0, 0);
@@ -183,12 +187,8 @@ static inline void radix__ptep_set_access_flags(struct 
mm_struct *mm,
 * new value of pte
 */
new_pte = old_pte | set;
-
-   /*
-* For now let's do heavy pid flush
-* radix__flush_tlb_page_psize(mm, addr, mmu_virtual_psize);
-*/
-   radix__flush_tlb_mm(mm);
+   psize = radix_get_mmu_psize(pg_sz);
+   radix__flush_tlb_page_psize(mm, address, psize);
 
__radix_pte_update(ptep, 0, new_pte);
} else
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h 
b/arch/powerpc/include/asm/nohash/32/pgtable.h
index c219ef7be53b..24ee66bf7223 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -268,7 +268,9 @@ static inline void huge_

Re: [powerpc v4 3/3] Enable storage keys for radix - user mode execution

2016-11-14 Thread kbuild test robot
Hi Balbir,

[auto build test WARNING on powerpc/master]

url:
https://github.com/0day-ci/linux/commits/Balbir-Singh/Enable-IAMR-storage-keys-for-radix/20161114-133434
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git master
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

>> WARNING: vmlinux.o(.text+0x645f0): Section mismatch in reference from the 
>> function .radix__early_init_mmu_secondary() to the function 
>> .init.text:.radix_init_iamr()
   The function .radix__early_init_mmu_secondary() references
   the function __init .radix_init_iamr().
   This is often because .radix__early_init_mmu_secondary lacks a __init
   annotation or the annotation of .radix_init_iamr is wrong.
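
One plausible fix, going by the warning text (a sketch only; either
direction resolves the mismatch):

-static void __init radix_init_iamr(void)
+static void radix_init_iamr(void)

i.e. drop the __init annotation, since radix__early_init_mmu_secondary()
runs during secondary CPU bringup, potentially after init memory has been
freed.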

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH] powerpc: fpr save/restore function cleanups

2016-11-14 Thread Michael Ellerman
Nicholas Piggin  writes:

> On Tue, 1 Nov 2016 15:41:12 +1100
> Nicholas Piggin  wrote:
>
>> On Tue,  1 Nov 2016 15:22:19 +1100
>> Nicholas Piggin  wrote:
>> 
>> > The powerpc64 linker generates fpr save/restore functions on-demand,
>> > placing them in the .sfpr section. So remove the explicitly coded ones
>> > from the 64 build.
>> > 
>> > Have 32-bit put save/restore functions into .sfpr section rather than
>> > .text, to match 64-bit.
>> > 
>> > And explicitly have the linker script place the section rather than
>> > leaving it as orphan.
>> > 
>> > Signed-off-by: Nicholas Piggin 
>> > ---
>> > 
>> > I tested this with 64-bit optimize-for-size build with modules,
>> > and that works okay.  
>> 
>> No I didn't, it's broken. I'll send an update shortly.
>
> Working (hopefully) patch this time. I also removed the 32-bit changes
> which aren't really necessary:
>
> The powerpc64 linker generates fpr save/restore functions on-demand,
> placing them in the .sfpr section. Module linking (because it's a
> "non-final" link) requires --save-restore-funcs for this.
>
> Remove the explicitly coded save/restore functions from the 64 build.
>
> And explicitly have the linker script place the section rather than
> leaving it as orphan.

>
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/Makefile |   4 +
>  arch/powerpc/boot/Makefile|   3 +-
>  arch/powerpc/boot/crtsavres.S |   8 +-
>  arch/powerpc/kernel/vmlinux.lds.S |   6 +
>  arch/powerpc/lib/Makefile |   5 +-
>  arch/powerpc/lib/crtsavres.S  | 238 
> +-
>  6 files changed, 22 insertions(+), 242 deletions(-)
>
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index fe76cfe..8ea7c9e 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -179,7 +179,11 @@ else
>  CHECKFLAGS   += -D__LITTLE_ENDIAN__
>  endif
>  
> +ifeq ($(CONFIG_PPC32),y)
>  KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
> +else
> +KBUILD_LDFLAGS_MODULE += --save-restore-funcs

As discussed offline, this option is reasonably new, added in 2014 and
appearing in binutils 2.25.

So it's too new to require, we can use it if it's there, but we still
need a fallback for when it's not.
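
Something along these lines might work (a sketch, assuming the generic
Kbuild ld-option helper is usable at this point in the Makefile):

ifeq ($(call ld-option, --save-restore-funcs),)
KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
else
KBUILD_LDFLAGS_MODULE += --save-restore-funcs
endif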

cheers


[PATCH] T4240RDB: add device tree entry for W83793

2016-11-14 Thread Florian Larysch
The T4240RDB contains a W83793 hardware monitoring chip. Add a device
tree entry to make the driver attach to it, as the i2c-mpc bus driver
dropped support for class-based instantiation of devices a long time
ago.

Signed-off-by: Florian Larysch 
---
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/boot/dts/fsl/t4240rdb.dts 
b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
index cc0a264..b35eea1 100644
--- a/arch/powerpc/boot/dts/fsl/t4240rdb.dts
+++ b/arch/powerpc/boot/dts/fsl/t4240rdb.dts
@@ -142,6 +142,10 @@
reg = <0x68>;
interrupts = <0x1 0x1 0 0>;
};
+   hwmon@2f {
+   compatible = "winbond,w83793";
+   reg = <0x2f>;
+   };
};
 
sdhc@114000 {
-- 
2.10.2



Re: powerpc/pseries: Use H_CLEAR_HPT to clear MMU hash table during kexec

2016-11-14 Thread Michael Ellerman
On Sat, 2016-01-10 at 10:41:56 UTC, Anton Blanchard wrote:
> From: Anton Blanchard 
> 
> An hcall was recently added that does exactly what we need
> during kexec - it clears the entire MMU hash table, ignoring any
> VRMA mappings.
> 
> Try it and fall back to the old method if we get a failure.
> 
> On a POWER8 box with 5TB of memory, this reduces the time it takes to
> kexec a new kernel from 4 minutes to 1 minute.
> 
> Signed-off-by: Anton Blanchard 
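
The try-then-fallback flow described amounts to roughly this (a sketch with
an assumed name for the fallback helper; plpar_hcall_norets() is the usual
hcall wrapper):

	static void pseries_lpar_hpt_clear(void)
	{
		long rc;

		rc = plpar_hcall_norets(H_CLEAR_HPT);
		if (rc != H_SUCCESS)
			manual_hpte_clear_all();	/* old slot-by-slot path */
	}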

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5246adec59458b5d325b8e1462ea9e

cheers


Re: tools/testing/selftests/powerpc: Add Anton's null_syscall benchmark to the selftests

2016-11-14 Thread Michael Ellerman
On Tue, 2016-27-09 at 14:10:16 UTC, Rui Teng wrote:
> From: Anton Blanchard 
> 
> Pull in a version of Anton's null_syscall benchmark:
> http://ozlabs.org/~anton/junkcode/null_syscall.c
> into tools/testing/selftests/powerpc/benchmarks.
> 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Anton Blanchard 
> Signed-off-by: Rui Teng 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d8db9bc55a316d97d0acdf6ccc9f43

cheers


Re: [RESEND] tools-powerpc: Return false instead of -1

2016-11-14 Thread Michael Ellerman
On Wed, 2016-09-11 at 08:55:04 UTC, Andrew Shadura wrote:
> From: Peter Senna Tschudin 
> 
> Returning a negative value from a boolean function seems to have the
> undesired effect of returning true. require_paranoia_below() is a
> boolean function, but the variable used to store the return value is an
> integer, receiving -1 or 0. This patch converts rc to bool, replacing -1
> with false and 0 with true.
> 
> This issue was found by the following Coccinelle semantic patch:
> 
> @@
> identifier f, ret;
> constant C;
> typedef bool;
> @@
> bool f (...){
> <+...
> ret = -C;
> ...
> * return ret;
> ...+>
> }
> 
> 
> Signed-off-by: Peter Senna Tschudin 
> Signed-off-by: Andrew Shadura 
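
The underlying C rule in miniature: any nonzero value converts to true when
returned as bool, so a -1 "error" reads as success (minimal standalone
example):

	#include <stdbool.h>

	static bool broken(void)
	{
		int rc = -1;	/* meant as failure ... */
		return rc;	/* ... but converts to true */
	}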

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/0e27d27e0d6d92342797e0d37738c2

cheers


Re: Removed dependency on IDE_GD_ATA if ADB_PMU_LED_DISK is selected.

2016-11-14 Thread Michael Ellerman
On Sun, 2016-18-09 at 11:08:45 UTC, Elimar Riesebieter wrote:
> We can use the front LED of PowerBooks/iBooks as a disk activity indicator
> without the deprecated IDE_GD_ATA.
> 
> Signed-off-by: Elimar Riesebieter 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/0e865a80c1358db9311c411c4763b9

cheers


Re: powerpc/64s: reduce exception alignment

2016-11-14 Thread Michael Ellerman
On Thu, 2016-13-10 at 03:43:52 UTC, Nicholas Piggin wrote:
> Exception handlers are aligned to 128 bytes (the L1 cache line size) on 64s,
> which is overkill. Aligning each handler that way can reduce the icache
> footprint of any individual exception path. However, taken as a whole, the
> expansion in icache footprint seems likely to be counter-productive and
> cause more total misses.
> 
> Create IFETCH_ALIGN_SHIFT/BYTES, which should give optimal ifetch alignment
> with a much more reasonable footprint. This saves 1792 bytes from head_64.o
> text with an allmodconfig build.
> 
> Other subarchitectures should define appropriate IFETCH_ALIGN_SHIFT
> values if this becomes more widely used.
> 
> Cc: Anton Blanchard 
> Signed-off-by: Nicholas Piggin 
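
A sketch of the described approach (the values are illustrative
assumptions, not necessarily the patch's):

/* Expose the instruction-fetch block size per subarch and align
 * exception entry code to that, rather than to a full 128-byte
 * L1 cache line. */
#define IFETCH_ALIGN_SHIFT	4	/* assumed: 16-byte ifetch blocks */
#define IFETCH_ALIGN_BYTES	(1 << IFETCH_ALIGN_SHIFT)

/* assembly users would then write:  .align IFETCH_ALIGN_SHIFT  */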

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f4329f2ecb149282fdfdd8830a936a

cheers


Re: powerpc: Remove suspect CONFIG_PPC_BOOK3E #ifdefs in nohash/64/pgtable.h

2016-11-14 Thread Michael Ellerman
On Thu, 2016-25-08 at 06:31:10 UTC, Rui Teng wrote:
> There are three #ifdef CONFIG_PPC_BOOK3E sections in nohash/64/pgtable.h.
> There should be no possible configuration which uses nohash/64/pgtable.h
> but does not also enable CONFIG_PPC_BOOK3E.
> 
> Suggested-by: Michael Ellerman 
> Signed-off-by: Rui Teng 
> Reviewed-by: Aneesh Kumar K.V 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fda0440d82d38e603cd713e997fdfe

cheers


Re: powerpc: make _ASM_NOKPROBE_SYMBOL a noop when KPROBES not defined

2016-11-14 Thread Michael Ellerman
On Thu, 2016-13-10 at 02:07:14 UTC, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/c0a5149105ab5f76f7c5a8fc2eb3d7

cheers


Re: [v2] powerpc: EX_TABLE macro for exception tables

2016-11-14 Thread Michael Ellerman
On Thu, 2016-03-11 at 05:43:12 UTC, Michael Ellerman wrote:
> From: Nicholas Piggin 
> 
> This macro is taken from s390, and allows more flexibility in
> changing exception table format.
> 
> mpe: Put it in ppc_asm.h and only define one version using stringify_in_c().
> Add some empty definitions and headers to keep the selftests happy.
> Add some missing .previouses in fsl_rio.c and tsi108_pci.c.
> 
> Signed-off-by: Nicholas Piggin 
> Signed-off-by: Michael Ellerman 
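
For reference, an s390-style EX_TABLE macro along the lines described
looks like this (a sketch, shown in the relative-offset form introduced
by the follow-up patches, not necessarily the exact merged definition).
stringify_in_c() lets C and asm share one definition:

/* Record a (faulting instruction, fixup) pair in __ex_table as
 * 32-bit self-relative offsets. */
#define EX_TABLE(_fault, _fixup)			\
	stringify_in_c(.section __ex_table,"a";)	\
	stringify_in_c(.balign 4;)			\
	stringify_in_c(.long (_fault) - . ;)		\
	stringify_in_c(.long (_fixup) - . ;)		\
	stringify_in_c(.previous)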

Applied to powerpc next.

https://git.kernel.org/powerpc/c/24bfa6a9e0d4fe414dfc4ad06c93e1

cheers


Re: [v2] powernv: Simplify searching for compatible device nodes

2016-11-14 Thread Michael Ellerman
On Thu, 2016-11-08 at 00:32:40 UTC, Jack Miller wrote:
> This condenses the OPAL node searching into a single function that finds
> all compatible nodes for ipmi, flash, and prd, instead of searching only
> the children of ibm,opal, similar to how opal-i2c nodes are found.
> 
> Signed-off-by: Jack Miller 
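
The described consolidation amounts to something like this (a sketch;
the helper name is an assumption):

#include <linux/init.h>
#include <linux/of.h>
#include <linux/of_platform.h>

/* Walk every node compatible with the given string, wherever it sits
 * in the tree, instead of iterating only ibm,opal's children. */
static void __init opal_pdev_init(const char *compatible)
{
	struct device_node *np;

	for_each_compatible_node(np, NULL, compatible)
		of_platform_device_create(np, NULL, NULL);
}

/* e.g.: opal_pdev_init("ibm,opal-ipmi");
 *       opal_pdev_init("ibm,opal-flash");
 *       opal_pdev_init("ibm,opal-prd");  */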

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9e4f51bdaf880208869aa001ee94a4

cheers


Re: [v2] powerpc/64: option to force run-at-load to test relocation

2016-11-14 Thread Michael Ellerman
On Fri, 2016-14-10 at 07:31:33 UTC, Nicholas Piggin wrote:
> This adds a config option that can help exercise the case when
> the kernel is not running at PAGE_OFFSET.
> 
> Signed-off-by: Nicholas Piggin 
> Reviewed-by: Balbir Singh 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/70839d207792aa348f013c733e8853

cheers


Re: i2c_powermac: shut up lockdep warning

2016-11-14 Thread Michael Ellerman
On Wed, 2016-21-09 at 11:34:58 UTC, Denis Kirjanov wrote:
> It's unclear why lockdep shows the following warning, but adding a
> lockdep class to struct pmac_i2c_bus solves it:
> 
> [   20.507795] ======================================================
> [   20.507796] [ INFO: possible circular locking dependency detected ]
> [   20.507800] 4.8.0-rc7-00037-gd2ffb01 #21 Not tainted
> [   20.507801] ------------------------------------------------------
> [   20.507803] swapper/0/1 is trying to acquire lock:
> [   20.507818]  (&bus->mutex){+.+.+.}, at: [] .pmac_i2c_open+0x30/0x100
> [   20.507819]
> [   20.507819] but task is already holding lock:
> [   20.507829]  (&policy->rwsem){+.+.+.}, at: [] .cpufreq_online+0x1ac/0x9d0
> [   20.507830]
> [   20.507830] which lock already depends on the new lock.
> [   20.507830]
> [   20.507832]
> [   20.507832] the existing dependency chain (in reverse order) is:
> [   20.507837]
> [   20.507837] -> #4 (&policy->rwsem){+.+.+.}:
> [   20.507844]    [] .down_write+0x6c/0x110
> [   20.507849]    [] .cpufreq_online+0x1ac/0x9d0
> [   20.507855]    [] .subsys_interface_register+0xb8/0x110
> [   20.507860]    [] .cpufreq_register_driver+0x1d0/0x250
> [   20.507866]    [] .g5_cpufreq_init+0x9cc/0xa28
> [   20.507872]    [] .do_one_initcall+0x5c/0x1d0
> [   20.507878]    [] .kernel_init_freeable+0x1ac/0x28c
> [   20.507883]    [] .kernel_init+0x1c/0x140
> [   20.507887]    [] .ret_from_kernel_thread+0x58/0x64
> [   20.507894]
> [   20.507894] -> #3 (subsys mutex#2){+.+.+.}:
> [   20.507899]    [] .mutex_lock_nested+0xa8/0x590
> [   20.507903]    [] .bus_probe_device+0x44/0xe0
> [   20.507907]    [] .device_add+0x508/0x730
> [   20.507911]    [] .register_cpu+0x118/0x190
> [   20.507916]    [] .topology_init+0x148/0x248
> [   20.507921]    [] .do_one_initcall+0x5c/0x1d0
> [   20.507925]    [] .kernel_init_freeable+0x1ac/0x28c
> [   20.507929]    [] .kernel_init+0x1c/0x140
> [   20.507934]    [] .ret_from_kernel_thread+0x58/0x64
> [   20.507939]
> [   20.507939] -> #2 (cpu_add_remove_lock){+.+.+.}:
> [   20.507944]    [] .mutex_lock_nested+0xa8/0x590
> [   20.507950]    [] .register_cpu_notifier+0x2c/0x70
> [   20.507955]    [] .spawn_ksoftirqd+0x18/0x4c
> [   20.507959]    [] .do_one_initcall+0x5c/0x1d0
> [   20.507964]    [] .kernel_init_freeable+0xb0/0x28c
> [   20.507968]    [] .kernel_init+0x1c/0x140
> [   20.507972]    [] .ret_from_kernel_thread+0x58/0x64
> [   20.507978]
> [   20.507978] -> #1 (&host->mutex){+.+.+.}:
> [   20.507982]    [] .mutex_lock_nested+0xa8/0x590
> [   20.507987]    [] .kw_i2c_open+0x18/0x30
> [   20.507991]    [] .pmac_i2c_open+0x94/0x100
> [   20.507995]    [] .smp_core99_probe+0x260/0x410
> [   20.507999]    [] .smp_prepare_cpus+0x280/0x2ac
> [   20.508003]    [] .kernel_init_freeable+0x88/0x28c
> [   20.508008]    [] .kernel_init+0x1c/0x140
> [   20.508012]    [] .ret_from_kernel_thread+0x58/0x64
> [   20.508018]
> [   20.508018] -> #0 (&bus->mutex){+.+.+.}:
> [   20.508023]    [] .lock_acquire+0x84/0x100
> [   20.508027]    [] .mutex_lock_nested+0xa8/0x590
> [   20.508032]    [] .pmac_i2c_open+0x30/0x100
> [   20.508037]    [] .pmac_i2c_do_begin+0x34/0x120
> [   20.508040]    [] .pmf_call_one+0x50/0xd0
> [   20.508045]    [] .g5_pfunc_switch_volt+0x2c/0xc0
> [   20.508050]    [] .g5_pfunc_switch_freq+0x1cc/0x1f0
> [   20.508054]    [] .g5_cpufreq_target+0x2c/0x40
> [   20.508058]    [] .__cpufreq_driver_target+0x23c/0x840
> [   20.508062]    [] .cpufreq_gov_performance_limits+0x18/0x30
> [   20.508067]    [] .cpufreq_start_governor+0xac/0x100
> [   20.508071]    [] .cpufreq_set_policy+0x208/0x260
> [   20.508076]    [] .cpufreq_init_policy+0x6c/0xb0
> [   20.508081]    [] .cpufreq_online+0x250/0x9d0
> [   20.508085]    [] .subsys_interface_register+0xb8/0x110
> [   20.508090]    [] .cpufreq_register_driver+0x1d0/0x250
> [   20.508094]    [] .g5_cpufreq_init+0x9cc/0xa28
> [   20.508099]    [] .do_one_initcall+0x5c/0x1d0
> [   20.508103]    [] .kernel_init_freeable+0x1ac/0x28c
> [   20.508107]    [] .kernel_init+0x1c/0x140
> [   20.508112]    [] .ret_from_kernel_thread+0x58/0x64
> [   20.508113]
> [   20.508113] other info that might help us debug this:
> [   20.508113]
> [   20.508121] Chain exists of:
> [   20.508121]   &bus->mutex --> subsys mutex#2 --> &policy->rwsem
> [   20.508121]
> [   20.508123]  Possible unsafe locking scenario:
> [   20.508123]
> [   20.508124]        CPU0                    CPU1
> [   20.508125]        ----                    ----
> [   20.508128]   lock(&policy->rwsem);
> [   20.508132]                                lock(subsys mutex#2);
> [   20.508135]                                lock(&policy->rwsem);
> [   20.508138]   lock(&bus->mutex);
> [   20.508139]
> [   20.508139]  *** DEADLOCK ***
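
The fix amounts to keying the per-bus mutexes with their own lockdep
class, roughly like this (a sketch; the struct and field names are
assumptions):

#include <linux/lockdep.h>
#include <linux/mutex.h>

/* One lock class key shared by all pmac_i2c_bus mutexes, distinct
 * from every other mutex class lockdep tracks. */
static struct lock_class_key pmac_i2c_lock_key;

static void pmac_i2c_bus_init_lock(struct pmac_i2c_bus *bus)
{
	mutex_init(&bus->mutex);
	lockdep_set_class(&bus->mutex, &pmac_i2c_lock_key);
}
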

Re: powerpc/configs: Drop REISERFS from pseries & powernv

2016-11-14 Thread Michael Ellerman
On Thu, 2016-03-11 at 07:18:07 UTC, Michael Ellerman wrote:
> No one uses reiserfs much these days, or is likely to in future. So drop
> it from pseries and powernv defconfigs to save time and space. It's
> still enabled in ppc64_defconfig so we get some build coverage.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/7a53ef5ebe80bc15f70d060dc45484

cheers


Re: [v2] powerpc/hash64: Be more careful when generating tlbiel

2016-11-14 Thread Michael Ellerman
On Wed, 2016-19-10 at 05:53:25 UTC, Michael Ellerman wrote:
> From: Balbir Singh 
> 
> In ISA v2.05, the tlbiel instruction takes two arguments, RB and L:
> 
> tlbiel RB,L
> 
> +---------+---------+----+---------+---------+---------+----+
> |   31    |    /    | L  |    /    |   RB    |   274   | /  |
> | 31 - 26 | 25 - 22 | 21 | 20 - 16 | 15 - 11 |  10 - 1 | 0  |
> +---------+---------+----+---------+---------+---------+----+
> 
> In ISA v2.06 tlbiel takes only one argument, RB:
> 
> tlbiel RB
> 
> +---------+---------+---------+---------+---------+----+
> |   31    |    /    |    /    |   RB    |   274   | /  |
> | 31 - 26 | 25 - 21 | 20 - 16 | 15 - 11 |  10 - 1 | 0  |
> +---------+---------+---------+---------+---------+----+
> 
> And in ISA v3.00 tlbiel takes five arguments:
> 
> tlbiel RB,RS,RIC,PRS,R
> 
> +---------+---------+----+---------+----+----+---------+---------+----+
> |   31    |   RS    | /  |   RIC   |PRS | R  |   RB    |   274   | /  |
> | 31 - 26 | 25 - 21 | 20 | 19 - 18 | 17 | 16 | 15 - 11 |  10 - 1 | 0  |
> +---------+---------+----+---------+----+----+---------+---------+----+
> 
> However the assembler also accepts "tlbiel RB", and generates
> "tlbiel RB,r0,0,0,0".
> 
> As you can see above the L field from the v2.05 encoding overlaps with the
> reserved field of the v2.06 encoding, and the low bit of the RS field of the
> v3.00 encoding.
> 
> Currently in __tlbiel() we generate two tlbiel instructions manually using
> hex constants. In the first case, for MMU_PAGE_4K, we generate "tlbiel RB,0",
> which is safe in all cases, because the L bit is zero.
> 
> However in the default case we generate "tlbiel RB,1", therefore setting
> bit 21 to 1.
> 
> This is not an actual bug on v2.06 processors, because the CPU ignores the
> value of the reserved field. However software is supposed to encode the
> reserved fields as zero to enable forward compatibility.
> 
> On v3.00 processors setting bit 21 to 1 and no other bits of RS means we
> are using r1 for the value of RS.
> 
> Although it's not obvious, the code sets the IS field (bits 10-11) to 0 (by
> omission), and L=1, in the va value, which is passed as RB. We also pass
> R=0 in the instruction.
> 
> The combination of IS=0, L=1 and R=0 means the value of RS is not used, so
> even on ISA v3.00 there is no actual bug.
> 
> We should still fix it, as setting a reserved bit on v2.06 is naughty, and
> we are only avoiding a bug on v3.00 by accident rather than design. Use
> ASM_FTR_IFSET() to generate the single argument form on ISA v2.06 and
> later, and the two argument form on pre v2.06.
> 
> Although there may be very old toolchains which don't understand tlbiel,
> we have other code in the tree which has been using tlbiel for over five
> years, and no one has reported any build failures, so just let the
> assembler generate the instructions.
> 
> Signed-off-by: Balbir Singh 
> [mpe: Rewrite change log, use IFSET instead of IFCLR]
> Signed-off-by: Michael Ellerman 
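
The resulting sequence inside __tlbiel() is roughly this (a sketch; see
the merged commit for the exact form):

	/* v2.06 and later get the one-argument encoding; older CPUs
	 * the explicit two-argument "tlbiel RB,1". */
	asm volatile(ASM_FTR_IFSET("tlbiel %0", "tlbiel %0,1", %1)
		     : : "r" (va), "i" (CPU_FTR_ARCH_206)
		     : "memory");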

Applied to powerpc next.

https://git.kernel.org/powerpc/c/f923efbcfdbaa4391874eeda676b08

cheers


Re: powerpc/book3s64: Always build for power4 or later

2016-11-14 Thread Michael Ellerman
On Fri, 2016-21-10 at 00:01:51 UTC, Michael Ellerman wrote:
> When we're not compiling for a specific CPU, ie. none of the
> CONFIG_POWERx_CPU options are set, and CONFIG_GENERIC_CPU *is* set, we
> currently don't pass any -mcpu option to the compiler. This means the
> compiler builds for a "generic" Power CPU.
> 
> But back in 2014 we dropped support for pre power4 CPUs in commit
> 468a33028edd ("powerpc: Drop support for pre-POWER4 cpus").
> 
> Given that, there's no point in building the kernel to run on pre power4
> cpus. So update the flags we pass to the compiler when
> CONFIG_GENERIC_CPU is set, to specify -mcpu=power4.
> 
> Signed-off-by: Michael Ellerman 
> Acked-by: Balbir Singh 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/3a849815a13695ad0e32f6f01c9054

cheers


Re: powerpc/asm: Allow including ppc_asm.h in asm files

2016-11-14 Thread Michael Ellerman
On Fri, 2016-04-11 at 02:39:43 UTC, Michael Ellerman wrote:
> There's no reason to #error if we include ppc_asm.h in asm files; the
> ifdef already prevents any problems.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/e3f2c6c39371c0e67c7fd6ebcb8b74

cheers


Re: powerpc/module: Add support for R_PPC64_REL32 relocations

2016-11-14 Thread Michael Ellerman
On Thu, 2016-03-11 at 04:39:17 UTC, Michael Ellerman wrote:
> We haven't seen these before, but the soon-to-be-merged relative
> exception table support causes them to be generated.
> 
> Signed-off-by: Michael Ellerman 
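
Handling the new relocation is a one-case addition to the module
loader's relocation switch, roughly (a sketch of the case body inside
apply_relocate_add()):

	case R_PPC64_REL32:
		/* 32-bit PC-relative: sym + addend - location */
		*(u32 *)location = value - (unsigned long)location;
		break;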

Applied to powerpc next.

https://git.kernel.org/powerpc/c/9f751b82b491d06c6438066b511d44

cheers


Re: [4/4] powerpc/pci: fix device reference leaks

2016-11-14 Thread Michael Ellerman
On Tue, 2016-01-11 at 15:26:03 UTC, Johan Hovold wrote:
> Make sure to drop any device reference taken by vio_find_node() when
> adding and removing virtual I/O slots.
> 
> Fixes: 5eeb8c63a38f ("[PATCH] PCI Hotplug: rpaphp: Move VIO...")
> Signed-off-by: Johan Hovold 
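
The leak pattern being fixed looks roughly like this (a sketch; the
surrounding code is elided):

	/* vio_find_node() takes a reference on the embedded struct
	 * device, so every successful lookup must be paired with a
	 * put_device() once the caller is done with it. */
	struct vio_dev *vdev = vio_find_node(dn);

	if (!vdev)
		return -EINVAL;

	/* ... use vdev ... */

	put_device(&vdev->dev);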

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/99e5cde5eae78bef95bfe7c16ccda8

cheers


Re: [3/4] powerpc/vio: clarify vio_find_node reference counting

2016-11-14 Thread Michael Ellerman
On Tue, 2016-01-11 at 15:26:02 UTC, Johan Hovold wrote:
> Add comment clarifying that vio_find_node() takes a reference to the
> embedded struct device which needs to be dropped after use.
> 
> Signed-off-by: Johan Hovold 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e8cfb7e7c3b2be8a4c2241b5da2ae6

cheers


Re: [3/3] powerpc: build-time sort exception table

2016-11-14 Thread Michael Ellerman
On Thu, 2016-13-10 at 05:42:55 UTC, Nicholas Piggin wrote:
> Signed-off-by: Nicholas Piggin 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5b9ff027859868efd63cdbbff5d301

cheers


Re: [2/4] ibmebus: fix further device reference leaks

2016-11-14 Thread Michael Ellerman
On Tue, 2016-01-11 at 15:26:01 UTC, Johan Hovold wrote:
> Make sure to drop any reference taken by bus_find_device() when creating
> devices during init and driver registration.
> 
> Fixes: 55347cc9962f ("[POWERPC] ibmebus: Add device creation...)
> Signed-off-by: Johan Hovold 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/815a7141c4d1b11610dccb7fcbb386

cheers


Re: [2/3] powerpc: relative exception tables

2016-11-14 Thread Michael Ellerman
On Thu, 2016-13-10 at 05:42:54 UTC, Nicholas Piggin wrote:
> This halves the exception table size on 64-bit builds, and it
> allows build-time sorting of exception tables to work on
> relocated kernels.
> 
> Signed-off-by: Nicholas Piggin 
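
A relative exception table entry along the lines described (a sketch
mirroring the s390-style layout, not necessarily the exact merged
definition):

/* 32-bit self-relative offsets instead of two 64-bit absolute
 * addresses: half the size, and still valid after relocation. */
struct exception_table_entry {
	int insn;	/* offset to the faulting instruction */
	int fixup;	/* offset to the fixup handler */
};

static inline unsigned long extable_fixup(const struct exception_table_entry *x)
{
	return (unsigned long)&x->fixup + x->fixup;
}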

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/61a92f703120daf7ed25e046275aa8

cheers


Re: [2/2] selftests/powerpc: Fail load_unaligned_zeropad on miscompare

2016-11-14 Thread Michael Ellerman
On Thu, 2016-03-11 at 04:41:01 UTC, Michael Ellerman wrote:
> If the result returned by load_unaligned_zeropad() doesn't match what we
> expect, we should fail the test!
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/997e200182347d2cc7e37bc43eaafe

cheers


Re: [2/2] Revert "powerpc: Load Monitor Register Support"

2016-11-14 Thread Michael Ellerman
On Mon, 2016-31-10 at 02:19:39 UTC, Michael Neuling wrote:
> Load monitored is no longer supported on POWER9, so let's remove the
> code.
> 
> This reverts commit bd3ea317fddfd0f2044f94bed294b90c4bc8e69e.
> 
> Signed-off-by: Michael Neuling 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/29a969b764817c1dce819c2bc8c00a

cheers


Re: [1/4] ibmebus: fix device reference leaks in sysfs interface

2016-11-14 Thread Michael Ellerman
On Tue, 2016-01-11 at 15:26:00 UTC, Johan Hovold wrote:
> Make sure to drop any reference taken by bus_find_device() in the sysfs
> callbacks that are used to create and destroy devices based on
> device-tree entries.
> 
> Fixes: 6bccf755ff53 ("[POWERPC] ibmebus: dynamic addition/removal...)
> Signed-off-by: Johan Hovold 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fe0f3168169f7c34c29b0cf0c489f1

cheers


Re: [1/2] selftests/powerpc: Abort load_unaligned_zeropad on unhandled SEGV

2016-11-14 Thread Michael Ellerman
On Thu, 2016-03-11 at 04:41:00 UTC, Michael Ellerman wrote:
> If the load unaligned zeropad test takes a SEGV which can't be handled,
> we increment segv_error, print the offending NIP and then return without
> taking any further action. In almost all cases this means we'll just
> take the SEGV again, and loop eternally spamming the console.
> 
> Instead just abort(), it's a fatal error in the test. The test harness
> will notice that the child died and print a nice message for us.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/06236f4efb926aa433e2cb3e36e846

cheers


Re: [1/2] Revert "selftests/powerpc: Load Monitor Register Tests"

2016-11-14 Thread Michael Ellerman
On Mon, 2016-31-10 at 02:19:38 UTC, Michael Neuling wrote:
> Load monitored won't be supported in POWER9, so PPC_FEATURE2_ARCH_3_00
> (in HWCAP2) will no longer imply Load monitor support.
> 
> These Load monitored tests are enabled by PPC_FEATURE2_ARCH_3_00, so
> they are now bogus and need to be removed.
> 
> This reverts commit 16c19a2e983346c547501795aadffde1977b058d.
> 
> Signed-off-by: Michael Neuling 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7c65856b7e0d12f2be2adae7a75a11

cheers


[powerpc v5 3/3] Enable storage keys for radix - user mode execution

2016-11-14 Thread Balbir Singh
ISA 3 defines a new encoded access authority that allows instruction
access prevention in privileged mode while allowing normal access
in problem state. This patch just enables the IAMR (Instruction
Authority Mask Register); enabling the AMR would require more work.

I've tested this with a buggy driver and a simple payload. The payload
is specific to the build I've tested.

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/pgtable-radix.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 7c21a52..ae0d701 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -339,6 +339,24 @@ static void radix_init_amor(void)
mtspr(SPRN_AMOR, amor);
 }
 
+/*
+ * For the radix page tables we set up, the IAMR value is as follows:
+ * IAMR = 0100...00 (key 0 is set to 1)
+ * AMR, UAMR and UAMOR are not affected.
+ */
+static void radix_init_iamr(void)
+{
+   unsigned long iamr_mask = 0x4000000000000000;
+   unsigned long iamr = mfspr(SPRN_IAMR);
+
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1))
+   return;
+
+   iamr = iamr_mask;
+
+   mtspr(SPRN_IAMR, iamr);
+}
+
 void __init radix__early_init_mmu(void)
 {
unsigned long lpcr;
@@ -398,6 +416,7 @@ void __init radix__early_init_mmu(void)
radix_init_amor();
}
 
+   radix_init_iamr();
radix_init_pgtable();
 }
 
@@ -415,6 +434,7 @@ void radix__early_init_mmu_secondary(void)
  __pa(partition_tb) | (PATB_SIZE_SHIFT - 12));
radix_init_amor();
}
+   radix_init_iamr();
 }
 
 void radix__mmu_cleanup_all(void)
-- 
2.5.5



[powerpc v5 2/3] Detect instruction fetch denied and report

2016-11-14 Thread Balbir Singh
ISA 3 allows for the prevention of instruction fetch and execution
of user mode pages. If such a fault occurs, SRR1 bit 35
reports the error. We catch and report the error in do_page_fault().

Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/fault.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index d0b137d..1e7ff7b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -404,6 +404,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
(cpu_has_feature(CPU_FTR_NOEXECUTE) ||
 !(vma->vm_flags & (VM_READ | VM_WRITE
goto bad_area;
+
+   if (regs->msr & SRR1_ISI_N_OR_G)
+   goto bad_area;
+
 #ifdef CONFIG_PPC_STD_MMU
/*
 * protfault should only happen due to us
-- 
2.5.5


