Re: [PATCH] powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU

2017-02-21 Thread Aneesh Kumar K.V



On Wednesday 22 February 2017 11:54 AM, Balbir Singh wrote:

On Wed, Feb 22, 2017 at 10:42:02AM +0530, Aneesh Kumar K.V wrote:

We will set LPCR with the correct value for radix during init. This makes sure we
start with a sanitized value of LPCR. In case of kexec, CPUs can have an LPCR
value based on the previous translation mode we were running.

Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
Cc: sta...@vger.kernel.org # v4.9+
Acked-by: Michael Neuling 
Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/kernel/cpu_setup_power.S | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 917188615bf5..7fe8c79e6937 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -101,6 +101,8 @@ _GLOBAL(__setup_cpu_power9)
mfspr   r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
or  r3, r3, r4
+   LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
+   andc    r3, r3, r4
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9
@@ -122,6 +124,8 @@ _GLOBAL(__restore_cpu_power9)
mfspr   r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
or  r3, r3, r4
+   LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
+   andc    r3, r3, r4
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9

My previous comment mentions GTSE, but really we should be clearing
LPCR to 0 and setting it to sane values in __init_LPCR



IIUC we do want to inherit values from firmware/skiboot. Hence the
explicit usage of mfspr/or. What we want to clear here are the values we
updated/changed based on the translation mode.

-aneesh



Re: [PATCH 0/2] Allow configurable stack size (especially 32k on PPC64)

2017-02-21 Thread Michael Ellerman
Hamish Martin  writes:
> This patch series adds the ability to configure the THREAD_SHIFT value and
> thereby alter the stack size on powerpc systems. We are particularly interested
> in configuring for a 32k stack on PPC64.
...
>
> For instance, for a 70-frame stack, the architecture overhead just for the
> stack frames is:
>70 * 16 bytes = 1120 bytes for PPC32, and
>70 * 112 bytes = 7840 bytes for PPC64.
> So a simple doubling of the PPC32 stack size leaves us with a shortfall of
> 5600 bytes (7840 - (2 * 1120)). In the example the stack frame overhead for
> PPC32 is 1120/8192 = 13.7% of the stack space, whereas for PPC64 it is
> 7840/16384 = 47.8% of the space.
>
> The aim of this series is to provide the ability for users to configure for
> larger stacks without altering the defaults in a way that would impact
> existing users. However, given the inequity between the PPC32 and PPC64
> stacks when taking into account the respective minimum stack frame sizes, we
> believe consideration should be given to having a larger default. We would
> appreciate any input or opinions on this issue.

Thanks for the detailed explanation.

The patches look fine, so I don't see any reason why we wouldn't merge
this. I might make the config option depend on EXPERT, but that's just
cosmetic.


You're right about the difference in stack overhead between 32 & 64-bit.
But I guess on the other hand we've been using 16K stacks on 64-bit for
over 15 years, and although we have had some reports of stack overflow
they're not a common problem.

cheers


Re: [PATCH] powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU

2017-02-21 Thread Balbir Singh
On Wed, Feb 22, 2017 at 10:42:02AM +0530, Aneesh Kumar K.V wrote:
> We will set LPCR with the correct value for radix during init. This makes sure we
> start with a sanitized value of LPCR. In case of kexec, CPUs can have an LPCR
> value based on the previous translation mode we were running.
> 
> Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
> Cc: sta...@vger.kernel.org # v4.9+
> Acked-by: Michael Neuling 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/kernel/cpu_setup_power.S | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index 917188615bf5..7fe8c79e6937 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -101,6 +101,8 @@ _GLOBAL(__setup_cpu_power9)
>   mfspr   r3,SPRN_LPCR
>   LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
>   or  r3, r3, r4
> + LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
> + andc    r3, r3, r4
>   bl  __init_LPCR
>   bl  __init_HFSCR
>   bl  __init_tlb_power9
> @@ -122,6 +124,8 @@ _GLOBAL(__restore_cpu_power9)
>   mfspr   r3,SPRN_LPCR
>   LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
>   or  r3, r3, r4
> + LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
> + andc    r3, r3, r4
>   bl  __init_LPCR
>   bl  __init_HFSCR
>   bl  __init_tlb_power9

My previous comment mentions GTSE, but really we should be clearing
LPCR to 0 and setting it to sane values in __init_LPCR

Balbir


Re: [PATCH] powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU

2017-02-21 Thread Aneesh Kumar K.V



On Wednesday 22 February 2017 11:46 AM, Balbir Singh wrote:

On Wed, Feb 22, 2017 at 10:42:02AM +0530, Aneesh Kumar K.V wrote:

We will set LPCR with the correct value for radix during init. This makes sure we
start with a sanitized value of LPCR. In case of kexec, CPUs can have an LPCR
value based on the previous translation mode we were running.

Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
Cc: sta...@vger.kernel.org # v4.9+
Acked-by: Michael Neuling 
Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/kernel/cpu_setup_power.S | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 917188615bf5..7fe8c79e6937 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -101,6 +101,8 @@ _GLOBAL(__setup_cpu_power9)
mfspr   r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
or  r3, r3, r4
+   LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
+   andc    r3, r3, r4
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9
@@ -122,6 +124,8 @@ _GLOBAL(__restore_cpu_power9)
mfspr   r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
or  r3, r3, r4
+   LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
+   andc    r3, r3, r4
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9

What about LPCR_GTSE and other bits?




That is set by the hypervisor for the guest. We don't set that and expect to
inherit it from a previous run on bare metal. Also, setting that in the
context of LPID 0 shouldn't have any impact.

-aneesh



Re: [PATCH] powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU

2017-02-21 Thread Balbir Singh
On Wed, Feb 22, 2017 at 10:42:02AM +0530, Aneesh Kumar K.V wrote:
> We will set LPCR with the correct value for radix during init. This makes sure we
> start with a sanitized value of LPCR. In case of kexec, CPUs can have an LPCR
> value based on the previous translation mode we were running.
> 
> Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
> Cc: sta...@vger.kernel.org # v4.9+
> Acked-by: Michael Neuling 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/kernel/cpu_setup_power.S | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
> index 917188615bf5..7fe8c79e6937 100644
> --- a/arch/powerpc/kernel/cpu_setup_power.S
> +++ b/arch/powerpc/kernel/cpu_setup_power.S
> @@ -101,6 +101,8 @@ _GLOBAL(__setup_cpu_power9)
>   mfspr   r3,SPRN_LPCR
>   LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
>   or  r3, r3, r4
> + LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
> + andc    r3, r3, r4
>   bl  __init_LPCR
>   bl  __init_HFSCR
>   bl  __init_tlb_power9
> @@ -122,6 +124,8 @@ _GLOBAL(__restore_cpu_power9)
>   mfspr   r3,SPRN_LPCR
>   LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
>   or  r3, r3, r4
> + LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
> + andc    r3, r3, r4
>   bl  __init_LPCR
>   bl  __init_HFSCR
>   bl  __init_tlb_power9

What about LPCR_GTSE and other bits?

Balbir Singh.


Re: [PATCH] powerpc/mm: Add translation mode information in /proc/cpuinfo

2017-02-21 Thread Aneesh Kumar K.V



On Wednesday 22 February 2017 11:15 AM, Michael Ellerman wrote:

"Aneesh Kumar K.V"  writes:


With this, /proc/cpuinfo on powernv and pseries reports:

timebase: 51200
platform: PowerNV
model   : 8247-22L
machine : PowerNV 8247-22L
firmware: OPAL
translation : Hash

Signed-off-by: Aneesh Kumar K.V 
---
  arch/powerpc/platforms/powernv/setup.c | 4 ++++
  arch/powerpc/platforms/pseries/setup.c | 4 ++++
  2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index d50c7d99baaf..d38571e289bb 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -95,6 +95,10 @@ static void pnv_show_cpuinfo(struct seq_file *m)
else
seq_printf(m, "firmware\t: BML\n");
of_node_put(root);
+   if (radix_enabled())
+   seq_printf(m, "translation\t: Radix\n");
+   else
+   seq_printf(m, "translation\t: Hash\n");
  }

Can we just call it "MMU" ?

I don't think it's entirely clear what "translation" means here if you
don't already know.

cheers

I avoided using MMU because it will confuse hardware guys. Radix is not
really the full definition of a memory management unit, but rather the
translation mode used by the memory management unit. But I don't have a
strong opinion on this.

Do you want me to send an updated patch? Or can you update it when you
apply it to your tree?


-aneesh



Re: [PATCH] powerpc/mm: Add translation mode information in /proc/cpuinfo

2017-02-21 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:

> With this, /proc/cpuinfo on powernv and pseries reports:
>
> timebase: 51200
> platform: PowerNV
> model   : 8247-22L
> machine : PowerNV 8247-22L
> firmware: OPAL
> translation : Hash
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/platforms/powernv/setup.c | 4 ++++
>  arch/powerpc/platforms/pseries/setup.c | 4 ++++
>  2 files changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
> index d50c7d99baaf..d38571e289bb 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -95,6 +95,10 @@ static void pnv_show_cpuinfo(struct seq_file *m)
>   else
>   seq_printf(m, "firmware\t: BML\n");
>   of_node_put(root);
> + if (radix_enabled())
> + seq_printf(m, "translation\t: Radix\n");
> + else
> + seq_printf(m, "translation\t: Hash\n");
>  }

Can we just call it "MMU" ?

I don't think it's entirely clear what "translation" means here if you
don't already know.

cheers


Re: next-20170217 boot on POWER8 LPAR : WARNING @kernel/jump_label.c:287

2017-02-21 Thread Michael Ellerman
Jason Baron  writes:

> On 02/20/2017 10:05 PM, Sachin Sant wrote:
>>
>>> On 20-Feb-2017, at 8:27 PM, Jason Baron wrote:
>>>
>>> Hi,
>>>
>>> On 02/19/2017 09:07 AM, Sachin Sant wrote:
 While booting next-20170217 on a POWER8 LPAR following
 warning is displayed.

 Reverting the following commit helps boot cleanly.
 commit 3821fd35b5 :  jump_label: Reduce the size of struct static_key

 [   11.393008] [ cut here ]
 [   11.393031] WARNING: CPU: 5 PID: 2890 at kernel/jump_label.c:287
 static_key_set_entries.isra.10+0x3c/0x50
>>>
>>> Thanks for the report. So this is saying that the jump_entry table is
>>> not at least 4-byte aligned. I wonder if this fixes it up?
>>>
>>
>> Yes. With this patch the warning is gone.
>
> Hi,
>
> Thanks for testing. We probably need something like the following to 
> make sure we don't hit this on other arches. Steve - I will send 4 
> separate patches for this to get arch maintainers' acks for this?

What's the 4 byte alignment requirement from?

On 64-bit our JUMP_ENTRY_TYPE is 8 bytes, should we be aligning to 8
bytes?

> diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
> index 9a287e0ac8b1..f870a85bac46 100644
> --- a/arch/powerpc/include/asm/jump_label.h
> +++ b/arch/powerpc/include/asm/jump_label.h
> @@ -24,6 +24,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
>  asm_volatile_goto("1:\n\t"
>   "nop # arch_static_branch\n\t"
>   ".pushsection __jump_table,  \"aw\"\n\t"
> +".balign 4 \n\t"

Can you line those up vertically?

(That may just be an email artifact)

>   JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
>   ".popsection \n\t"
>   : :  "i" (&((char *)key)[branch]) : : l_yes);
> @@ -38,6 +39,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
>  asm_volatile_goto("1:\n\t"
>   "b %l[l_yes] # arch_static_branch_jump\n\t"
>   ".pushsection __jump_table,  \"aw\"\n\t"
> +".balign 4 \n\t"
>   JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
>   ".popsection \n\t"
>   : :  "i" (&((char *)key)[branch]) : : l_yes);
> @@ -63,6 +65,7 @@ struct jump_entry {
>   #define ARCH_STATIC_BRANCH(LABEL, KEY) \
>   1098:  nop;\
>  .pushsection __jump_table, "aw";\
> +   .balign 4;  \
>  FTR_ENTRY_LONG 1098b, LABEL, KEY;   \
>  .popsection
>   #endif

Otherwise that looks fine assuming 4 bytes is the correct alignment.

cheers


[PATCH] powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU

2017-02-21 Thread Aneesh Kumar K.V
We will set LPCR with the correct value for radix during init. This makes sure we
start with a sanitized value of LPCR. In case of kexec, CPUs can have an LPCR
value based on the previous translation mode we were running.

Fixes: fe036a0605d60 ("powerpc/64/kexec: Fix MMU cleanup on radix")
Cc: sta...@vger.kernel.org # v4.9+
Acked-by: Michael Neuling 
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/kernel/cpu_setup_power.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 917188615bf5..7fe8c79e6937 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -101,6 +101,8 @@ _GLOBAL(__setup_cpu_power9)
mfspr   r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
or  r3, r3, r4
+   LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
+   andc    r3, r3, r4
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9
@@ -122,6 +124,8 @@ _GLOBAL(__restore_cpu_power9)
mfspr   r3,SPRN_LPCR
LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE)
or  r3, r3, r4
+   LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
+   andc    r3, r3, r4
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9
-- 
2.7.4



[PATCH kernel] powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested

2017-02-21 Thread Alexey Kardashevskiy
The IODA2 specification says that a 64-bit DMA address cannot use the top 4 bits
(3 are reserved and one is a "TVE select"); the bottom page_shift bits
cannot be used for multilevel table addressing either.

The existing IODA2 table allocation code aligns the minimum TCE table
size to PAGE_SIZE, so in the case of 64K system pages and 4K IOMMU pages
we have 64 - 4 - 12 = 48 bits. Since a 64K page stores 8192 TCEs, i.e. needs
13 bits, the maximum number of levels is 48/13 = 3, so we physically
cannot address more and EEH happens on DMA accesses.

This adds a check that fails gracefully if too many levels are requested.

It is still possible to have 5 levels in the case of 4K system page size.

Signed-off-by: Alexey Kardashevskiy 
---

The alternative would be allocating TCE table levels as big as PAGE_SIZE but
only using parts of them; however, this would complicate the bits of code
responsible for the overall amount of memory used for the TCE table.

Or kmem_cache_create() could be used to allocate as big TCE table levels
as we really need but that API does not seem to support NUMA nodes.

In reality, even 3 levels give us way too much addressable memory.
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 24fa2de2a0af..1e92ec954321 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2631,6 +2631,9 @@ static long pnv_pci_ioda2_table_alloc_pages(int nid, __u64 bus_offset,
level_shift = entries_shift + 3;
level_shift = max_t(unsigned, level_shift, PAGE_SHIFT);
 
+   if ((level_shift - 3) * levels + page_shift >= 60)
+   return -EINVAL;
+
/* Allocate TCE table */
addr = pnv_pci_ioda2_table_do_alloc_pages(nid, level_shift,
levels, tce_table_size, &offset, &total_allocated);
-- 
2.11.0



Re: [PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change

2017-02-21 Thread Gavin Shan
On Wed, Feb 22, 2017 at 02:05:15PM +1100, Alexey Kardashevskiy wrote:
>On 22/02/17 10:28, Gavin Shan wrote:
>> On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:

[The subsequent discussion isn't related to the patch itself anymore]

>> One thing would be improved in future, which isn't relevant to
>> this patch if my understanding is correct enough: The TCE table for
>> DMA32 space created during system boot is destroyed when VFIO takes
>> the ownership. The same TCE table (same level, page size, window size
>> etc) is created and associated to the PE again. Some CPU cycles would
>> be saved if the original table is picked up without creating a new one.
>
>It is not necessarily the same levels or window size; it could be something
>different. Also, carrying a table over will just make the code a bit more
>complicated, and it is complicated enough already - we need to consider
>every possible case of IOMMU table sharing.
>

Right after the host boots up and VFIO isn't involved yet, each PE is associated
with a DMA32 space (0 - 2G) and the IO page size is 4KB. If the whole (window)
size, IO page size or levels are changed after the PE is released from guest
to host, that doesn't seem right, as the device (including its TCE table)
needs to be restored to its previous state. Or are we talking about a
different DMA space (TCE tables)?

Regarding the possibility of sharing IOMMU tables, I don't quite understand.
Do you mean the situation of a multi-function adapter, where some functions
are passed to the guest and the rest are owned by the host? I don't see how
that works from the DMA path. Would you please explain a bit?

>
>> The involved function is pnv_pci_ioda2_create_table(). Its primary work
>> is to allocate pages from buddy.
>
>It allocates pages via alloc_pages_node(), not buddy.
>

The page allocator, maybe? It's fetching pages from the PCP (Per-CPU Pages)
lists or the buddy freelist depending on the requested size.

>> It's usually fast if there are enough
>> free pages. Otherwise, it would be relatively slow. It also has the risk
>> to fail the allocation. I guess it's not bad to save CPU cycles in this
>> critical (maybe hot?) path.
>
>It is not a critical path - it happens on a guest (re)boot only.
>

My point is: it sounds nice if the guest needs less time to (re)boot. I
don't know how much time could be saved though.

Thanks,
Gavin



Re: [PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change

2017-02-21 Thread David Gibson
On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:
> On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
> our case), a device needs an iommu_table pointer set via
> set_iommu_table_base().
> 
> The codeflow is:
> - pnv_pci_ioda2_setup_dma_pe()
>   - pnv_pci_ioda2_setup_default_config()
>   - pnv_ioda_setup_bus_dma() [1]
> 
> pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
> pnv_pci_ioda2_setup_default_config() does default DMA setup,
> pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
> PEs as bus PEs except NPU), walks through all underlying buses and
> devices, adds all devices to an IOMMU group and sets iommu_table.
> 
> On IODA2, when VFIO is used, it takes ownership over a PE which means it
> removes all tables and creates new ones (with a possibility of sharing
> them among PEs). So when the ownership is returned from VFIO to
> the kernel, the iommu_table pointer written to a device at [1] is
> stale and needs an update.
> 
> This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
> (in fact re-adds as it used to be there a while ago for different
> reasons) to tell the helper if a device needs to be added to
> an IOMMU group with an iommu_table update or just the latter.
> 
> This calls pnv_ioda_setup_bus_dma(..., false) from
> pnv_ioda2_release_ownership() so when the ownership is restored,
> 32bit DMA can work again for a device. This does the same thing
> on obtaining ownership as the iommu_table pointer is stale at this point
> anyway and it is safer to have NULL there.
> 
> We did not hit this earlier as all tested devices in recent years were
> only using 64bit DMA; the rare exception is the MPT3 SAS adapter
> which uses both 32bit and 64bit DMA access and it has not been tested
> with VFIO much.
> 
> Cc: Gavin Shan 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
> 
> If this is applied before "powerpc/powernv/npu: Remove dead iommu code",
> there will be a minor conflict.
> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 51ec0dc1dfde..f5a2421bf164 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -1774,17 +1774,20 @@ static u64 pnv_pci_ioda_dma_get_required_mask(struct pci_dev *pdev)
>  }
>  
>  static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
> -struct pci_bus *bus)
> +struct pci_bus *bus,
> +bool add_to_group)
>  {
>   struct pci_dev *dev;
>  
>   list_for_each_entry(dev, &bus->devices, bus_list) {
>   set_iommu_table_base(&dev->dev, pe->table_group.tables[0]);
>   set_dma_offset(&dev->dev, pe->tce_bypass_base);
> - iommu_add_device(&dev->dev);
> + if (add_to_group)
> + iommu_add_device(&dev->dev);
>  
>   if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
> - pnv_ioda_setup_bus_dma(pe, dev->subordinate);
> + pnv_ioda_setup_bus_dma(pe, dev->subordinate,
> + add_to_group);
>   }
>  }
>  
> @@ -2190,7 +2193,7 @@ static void pnv_pci_ioda1_setup_dma_pe(struct pnv_phb *phb,
>   set_iommu_table_base(&pe->pdev->dev, tbl);
>   iommu_add_device(&pe->pdev->dev);
>   } else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
> - pnv_ioda_setup_bus_dma(pe, pe->pbus);
> + pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
>  
>   return;
>   fail:
> @@ -2425,6 +2428,8 @@ static void pnv_ioda2_take_ownership(struct iommu_table_group *table_group)
>  
>   pnv_pci_ioda2_set_bypass(pe, false);
>   pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> + if (pe->pbus)
> + pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
>   pnv_ioda2_table_free(tbl);
>  }
>  
> @@ -2434,6 +2439,8 @@ static void pnv_ioda2_release_ownership(struct iommu_table_group *table_group)
>   table_group);
>  
>   pnv_pci_ioda2_setup_default_config(pe);
> + if (pe->pbus)
> + pnv_ioda_setup_bus_dma(pe, pe->pbus, false);
>  }
>  
>  static struct iommu_table_group_ops pnv_pci_ioda2_ops = {
> @@ -2725,7 +2732,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
>   return;
>  
>   if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
> - pnv_ioda_setup_bus_dma(pe, pe->pbus);
> + pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
>  }
>  
>  #ifdef CONFIG_PCI_MSI

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _o

Re: [PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change

2017-02-21 Thread Alexey Kardashevskiy
On 22/02/17 10:28, Gavin Shan wrote:
> On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:
>> On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
>> our case), a device needs an iommu_table pointer set via
>> set_iommu_table_base().
>>
>> The codeflow is:
>> - pnv_pci_ioda2_setup_dma_pe()
>>  - pnv_pci_ioda2_setup_default_config()
>>  - pnv_ioda_setup_bus_dma() [1]
>>
>> pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
>> pnv_pci_ioda2_setup_default_config() does default DMA setup,
>> pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
>> PEs are bus PEs except NPU), walks through all underlying buses and
>> devices, adds all devices to an IOMMU group and sets iommu_table.
>>
>> On IODA2, when VFIO is used, it takes ownership over a PE which means it
>> removes all tables and creates new ones (with a possibility of sharing
>> them among PEs). So when the ownership is returned from VFIO to
>> the kernel, the iommu_table pointer written to a device at [1] is
>> stale and needs an update.
>>
>> This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
>> (in fact re-adds as it used to be there a while ago for different
>> reasons) to tell the helper if a device needs to be added to
>> an IOMMU group with an iommu_table update or just the latter.
>>
>> This calls pnv_ioda_setup_bus_dma(..., false) from
>> pnv_ioda2_release_ownership() so when the ownership is restored,
>> 32bit DMA can work again for a device. This does the same thing
>> on obtaining ownership as the iommu_table pointer is stale at this point
>> anyway and it is safer to have NULL there.
>>
>> We did not hit this earlier as all tested devices in recent years were
>> only using 64bit DMA; the rare exception is the MPT3 SAS adapter
>> which uses both 32bit and 64bit DMA access and it has not been tested
>> with VFIO much.
>>
>> Cc: Gavin Shan 
>> Signed-off-by: Alexey Kardashevskiy 
> 
> Acked-by: Gavin Shan 

Thanks!

> One thing would be improved in future, which isn't relevant to
> this patch if my understanding is correct enough: The TCE table for
> DMA32 space created during system boot is destroyed when VFIO takes
> the ownership. The same TCE table (same level, page size, window size
> etc) is created and associated to the PE again. Some CPU cycles would
> be saved if the original table is picked up without creating a new one.

It is not necessarily the same levels or window size; it could be something
different. Also, carrying a table over will just make the code a bit more
complicated, and it is complicated enough already - we need to consider
every possible case of IOMMU table sharing.


> The involved function is pnv_pci_ioda2_create_table(). Its primary work
> is to allocate pages from buddy.

It allocates pages via alloc_pages_node(), not buddy.

> It's usually fast if there are enough
> free pages. Otherwise, it would be relatively slow. It also has the risk
> to fail the allocation. I guess it's not bad to save CPU cycles in this
> critical (maybe hot?) path.

It is not a critical path - it happens on a guest (re)boot only.


-- 
Alexey


Re: PowerPC build fail

2017-02-21 Thread Andrew Donnellan

On 22/02/17 12:22, Tobin C. Harding wrote:

The current (2bfe01e) torvalds git tree fails to build on powerpc64. Build
machine is virtualized.

- Build error
arch/powerpc/kernel/time.c: In function ‘running_clock’:
arch/powerpc/kernel/time.c:712:25: error: implicit declaration of function ‘cputime_to_nsecs’
   return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);


Already reported, see 
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/154433.html


See patch at http://patchwork.ozlabs.org/patch/730616/


Andrew

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH] usercopy: Don't test 64-bit get/put_user() on 32-bit powerpc

2017-02-21 Thread Michael Ellerman
Kees Cook  writes:

> On Sat, Feb 18, 2017 at 1:33 AM, Michael Ellerman  wrote:
>> Add PPC32 to the opt-out list, otherwise it breaks the build.
>>
>> Signed-off-by: Michael Ellerman 
>> ---
>>  lib/test_user_copy.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/lib/test_user_copy.c b/lib/test_user_copy.c
>> index 4a79f2c1cd6e..6f335a3d4ae2 100644
>> --- a/lib/test_user_copy.c
>> +++ b/lib/test_user_copy.c
>> @@ -37,6 +37,7 @@
>> !defined(CONFIG_MICROBLAZE) &&  \
>> !defined(CONFIG_MN10300) && \
>> !defined(CONFIG_NIOS2) &&   \
>> +   !defined(CONFIG_PPC32) &&   \
>> !defined(CONFIG_SUPERH))
>>  # define TEST_U64
>>  #endif
>
> I'm fine to add this, but I'm curious why it fails? ppc uaccess.h has:
>
> #define get_user(x, ptr) \
> __get_user_check((x), (ptr), sizeof(*(ptr)))
>
> #define __get_user_check(x, ptr, size)  \
> ({  \
> ...
> __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \
>
> #define __get_user_size(x, ptr, size, retval)   \
> do {\
> ...
> case 8: __get_user_asm2(x, ptr, retval);  break;\
>
> #ifdef __powerpc64__
> #define __get_user_asm2(x, addr, err)   \
> __get_user_asm(x, addr, err, "ld")
> #else /* __powerpc64__ */
> #define __get_user_asm2(x, addr, err)   \
> __asm__ __volatile__(   \
> ...
>
> It looks like __get_user_asm2() was explicitly designed for handling
> 64-bit get_user()?

Hmm, quite.

It's definitely failing:

  ERROR: "__get_user_bad" [lib/test_user_copy.ko] undefined!


I suspect it's because get_user() goes via __get_user_check(), which
uses an unsigned long as a temporary:

  #define __get_user_check(x, ptr, size)
\
  ({\
long __gu_err = -EFAULT;\
unsigned long  __gu_val = 0;\
const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \
might_fault();  \
if (access_ok(VERIFY_READ, __gu_addr, (size)))  \
__get_user_size(__gu_val, __gu_addr, (size), __gu_err); \
(x) = (__force __typeof__(*(ptr)))__gu_val; 
\
__gu_err;   \
  })


And that trips the check in __get_user_size():

  #define __get_user_size(x, ptr, size, retval) \
  do {  \
retval = 0; \
__chk_user_ptr(ptr);\
if (size > sizeof(x))   \
(x) = __get_user_bad(); \



Which I can confirm just by changing that case to call
__get_user_bad_target() instead of __get_user_bad_size():

  ERROR: "__get_user_bad_target" [lib/test_user_copy.ko] undefined!


So despite having __get_user_asm2() defined for 32-bit it doesn't
actually work unless you call __get_user_size() directly.


For now if you don't mind merging this that would be good, so the build
stops breaking.

Then we can think about whether we fix it for ppc32 or just rip it out
entirely.

cheers


PowerPC build fail

2017-02-21 Thread Tobin C. Harding
The current (2bfe01e) torvalds git tree fails to build on powerpc64. Build
machine is virtualized.

- Build error
arch/powerpc/kernel/time.c: In function ‘running_clock’:
arch/powerpc/kernel/time.c:712:25: error: implicit declaration of function ‘cputime_to_nsecs’
   return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
  ^
  
$ cat .config | grep CONFIG_VIRT_CPU
# CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set

$ cat .config | grep CONFIG_PPC_PSERIES
CONFIG_PPC_PSERIES=y

Looking into it I found there are compile-time guards on
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE preventing cputime_to_nsecs from being
defined.

Removing the compile-time guard allows the build to proceed; surely
this is not the correct solution though.

diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 99b5418..15482cb 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -16,7 +16,7 @@
 #ifndef __POWERPC_CPUTIME_H
 #define __POWERPC_CPUTIME_H

-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+

 #include 
 #include 
@@ -53,5 +53,5 @@ void arch_vtime_task_switch(struct task_struct *tsk);
 #endif

 #endif /* __KERNEL__ */
-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
 #endif /* __POWERPC_CPUTIME_H */
diff --git a/include/linux/cputime.h b/include/linux/cputime.h
index a691dc4..f730c14 100644
--- a/include/linux/cputime.h
+++ b/include/linux/cputime.h
@@ -1,13 +1,13 @@
 #ifndef __LINUX_CPUTIME_H
 #define __LINUX_CPUTIME_H

-#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+
 #include 

 #ifndef cputime_to_nsecs
-# define cputime_to_nsecs(__ct)\
+#define cputime_to_nsecs(__ct) \
(cputime_to_usecs(__ct) * NSEC_PER_USEC)
 #endif

-#endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
+
 #endif /* __LINUX_CPUTIME_H */


If you think this can be fixed with the *little* kernel dev knowledge
I have, please point me at a starting place and I will dig into it.

thanks,
Tobin.


Re: [PATCH kernel] powerpc/powernv/npu: Remove dead iommu code

2017-02-21 Thread David Gibson
On Tue, Feb 21, 2017 at 01:40:20PM +1100, Alexey Kardashevskiy wrote:
> PNV_IODA_PE_DEV is only used for NPU devices (emulated PCI bridges
> representing NVLink). These are added to IOMMU groups with corresponding
> NVIDIA devices after all non-NPU PEs are setup; a special helper -
> pnv_pci_ioda_setup_iommu_api() - handles this in pnv_pci_ioda_fixup().
> 
> The pnv_pci_ioda2_setup_dma_pe() helper sets up DMA for a PE. It is called
> for VFs (so it does not handle NPU case) and PCI bridges but only
> IODA1 and IODA2 types. An NPU bridge has its own type id (PNV_PHB_NPU)
> so pnv_pci_ioda2_setup_dma_pe() cannot be called on NPU and therefore
> (pe->flags & PNV_IODA_PE_DEV) is always "false".
> 
> This removes not used iommu_add_device(). This should not cause any
> behavioral change.
> 
> Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 5fcae29107e1..51ec0dc1dfde 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2724,9 +2724,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
> *phb,
>   if (rc)
>   return;
>  
> - if (pe->flags & PNV_IODA_PE_DEV)
> - iommu_add_device(&pe->pdev->dev);
> - else if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
> + if (pe->flags & (PNV_IODA_PE_BUS | PNV_IODA_PE_BUS_ALL))
>   pnv_ioda_setup_bus_dma(pe, pe->pbus);
>  }
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH] powerpc/pseries: advertise Hot Plug Event support to firmware

2017-02-21 Thread David Gibson
On Mon, Feb 20, 2017 at 07:12:18PM -0600, Michael Roth wrote:
> With the inclusion of:
> 
>   powerpc/pseries: Implement indexed-count hotplug memory remove
>   powerpc/pseries: Implement indexed-count hotplug memory add
> 
> we now have complete handling of the RTAS hotplug event format
> as described by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".
> 
> This capability is indicated by byte 6, bit 5 of architecture
> option vector 5, and allows for greater control over cpu/memory/pci
> hot plug/unplug operations.
> 
> Existing pseries kernels will utilize this capability based on the
> existence of the /event-sources/hot-plug-events DT property, so we
> only need to advertise it via CAS and do not need a corresponding
> FW_FEATURE_* value to test for.
> 
> Cc: Michael Ellerman 
> Cc: Nathan Fontenot 
> Cc: David Gibson 
> Signed-off-by: Michael Roth 

Reviewed-by: David Gibson 

> ---
>  arch/powerpc/include/asm/prom.h | 1 +
>  arch/powerpc/kernel/prom_init.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
> index 2c8001c..4a90634 100644
> --- a/arch/powerpc/include/asm/prom.h
> +++ b/arch/powerpc/include/asm/prom.h
> @@ -153,6 +153,7 @@ struct of_drconf_cell {
>  #define OV5_XCMO 0x0440  /* Page Coalescing */
>  #define OV5_TYPE1_AFFINITY   0x0580  /* Type 1 NUMA affinity */
>  #define OV5_PRRN 0x0540  /* Platform Resource Reassignment */
> +#define OV5_HP_EVT   0x0604  /* Hot Plug Event support */
>  #define OV5_RESIZE_HPT   0x0601  /* Hash Page Table resizing */
>  #define OV5_PFO_HW_RNG   0x1180  /* PFO Random Number Generator 
> */
>  #define OV5_PFO_HW_842   0x1140  /* PFO Compression Accelerator 
> */
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index f3c8799..1a835e7 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -839,7 +839,7 @@ struct ibm_arch_vec __cacheline_aligned 
> ibm_architecture_vec = {
>   0,
>  #endif
>   .associativity = OV5_FEAT(OV5_TYPE1_AFFINITY) | 
> OV5_FEAT(OV5_PRRN),
> - .bin_opts = OV5_FEAT(OV5_RESIZE_HPT),
> + .bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT),
>   .micro_checkpoint = 0,
>   .reserved0 = 0,
>   .max_cpus = cpu_to_be32(NR_CPUS),   /* number of cores 
> supported */

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
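For reference, the OV5_* values quoted above pack a byte index into the high byte and a bit mask into the low byte; the kernel splits them with its OV5_INDX()/OV5_FEAT() helpers in prom.h. A minimal sketch using the two constants from the patch (0x0604 is byte 6, mask 0x04, i.e. "bit 5" in MSB-first bit numbering):

```c
#include <assert.h>

/* Option-vector-5 constant encoding as used in the quoted patch:
 * high byte = byte index into the vector, low byte = bit mask
 * within that byte. These mirror the kernel's OV5_INDX()/OV5_FEAT(). */
#define OV5_INDX(v) ((v) >> 8)
#define OV5_FEAT(v) ((v) & 0xff)

#define OV5_HP_EVT     0x0604  /* Hot Plug Event support */
#define OV5_RESIZE_HPT 0x0601  /* Hash Page Table resizing */
```

Since both features live in byte 6, their masks can be OR-ed together when building the vector, which is exactly what the one-line change to .bin_opts does.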


Re: [PATCH] powerpc/mm: Add translation mode information in /proc/cpuinfo

2017-02-21 Thread Balbir Singh
On Sun, Feb 19, 2017 at 03:47:49PM +0530, Aneesh Kumar K.V wrote:
> With this we have on powernv and pseries /proc/cpuinfo reporting
> 
> timebase: 51200
> platform: PowerNV
> model   : 8247-22L
> machine : PowerNV 8247-22L
> firmware: OPAL
> translation : Hash
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---

Acked-by: Balbir Singh 


Re: [PATCH V3 10/10] powerpc/mm/slice: Update slice mask printing to use bitmap printing.

2017-02-21 Thread Balbir Singh
On Sun, Feb 19, 2017 at 03:37:17PM +0530, Aneesh Kumar K.V wrote:
> We now get output like below which is much better.
> 
> [0.935306]  good_mask low_slice: 0-15
> [0.935360]  good_mask high_slice: 0-511
> 
> Compared to
> 
> [0.953414]  good_mask: - 1.
> 
> I also fixed an error with slice_dbg printing.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---

Yep, like this much better

Acked-by: Balbir Singh 


Re: [PATCH V3 06/10] powerpc/mm: Remove redundant TASK_SIZE_USER64 checks

2017-02-21 Thread Balbir Singh
On Sun, Feb 19, 2017 at 03:37:13PM +0530, Aneesh Kumar K.V wrote:
> The check against VSID range is implied when we check task size against
> hash and radix pgtable range[1], because we make sure page table range cannot
> exceed vsid range.
> 
> [1] BUILD_BUG_ON(TASK_SIZE_USER64 > H_PGTABLE_RANGE);
> BUILD_BUG_ON(TASK_SIZE_USER64 > RADIX_PGTABLE_RANGE);
> 
> The check for smaller task size is also removed here, because the follow up
> patch will support a tasksize smaller than pgtable range.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---

Acked-by: Balbir Singh 


Re: [PATCH V3 01/10] powerpc/mm/slice: Convert slice_mask high slice to a bitmap

2017-02-21 Thread Balbir Singh
On Tue, Feb 21, 2017 at 12:26:15PM +0530, Aneesh Kumar K.V wrote:
> 
> 
> On Tuesday 21 February 2017 10:13 AM, Balbir Singh wrote:
> > On Sun, Feb 19, 2017 at 03:37:08PM +0530, Aneesh Kumar K.V wrote:
> > > In followup patch we want to increase the va range which will result
> > > in us requiring high_slices to have more than 64 bits. To enable this
> > > convert high_slices to a bitmap. We keep the number of bits the same in this
> > > patch and later change it to a higher value.
> > > 
> > > Signed-off-by: Aneesh Kumar K.V 
> > > ---
> > For consistency it would be nice to have low_slices represented similarly
> > as well.
> 
> That will need converting lot of low_slices update to bitmap api with no
> real benefit

Consistency and not having our own bitmap implementation :) Anyway,
I am not too held up on it.

> > Acked-by: Balbir Singh 
> > 
> 
> -aneesh
> 


Re: [PATCH V3 04/10] powerpc/mm/hash: Support 68 bit VA

2017-02-21 Thread Balbir Singh
On Sun, Feb 19, 2017 at 03:37:11PM +0530, Aneesh Kumar K.V wrote:
> In order to support a large effective address range (512TB), we want to increase
> the virtual address bits to 68. But we do have platforms like p4 and p5 that
> can only do 65-bit VA. We support those platforms by limiting context bits on
> them to 16.
> 
> The protovsid -> vsid conversion is verified to work with both 65 and 68 bit
> VA values. I also documented the restrictions in a table format as part of
> code comments.
> 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  /*
>   * This should be computed such that protovosid * vsid_mulitplier
>   * doesn't overflow 64 bits. It should also be co-prime to vsid_modulus
> + * We also need to make sure that number of bits in divisor is less
> + * than twice the number of protovsid bits for our modulus optimization to
> + * work.

Could you please explain why?

I am also beginning to wonder if we need the modulus optimization.
On my compiler (a * b) % (2^n - 1) generated reasonable code with just
one multiplication and a few shifts and boolean operations.

> + * The below table shows the current values used.
> + *
> + * |-------+------------+----------------+------------+--------------|
> + * |       | Prime Bits | VSID_BITS_65VA | Total Bits | 2* VSID_BITS |
> + * |-------+------------+----------------+------------+--------------|
> + * | 1T    | 24         | 25             | 49         | 50           |
> + * |-------+------------+----------------+------------+--------------|
> + * | 256MB | 24         | 37             | 61         | 74           |
> + * |-------+------------+----------------+------------+--------------|
> + *
> + * |-------+------------+----------------+------------+--------------|
> + * |       | Prime Bits | VSID_BITS_68VA | Total Bits | 2* VSID_BITS |
> + * |-------+------------+----------------+------------+--------------|
> + * | 1T    | 24         | 28             | 52         | 56           |
> + * |-------+------------+----------------+------------+--------------|
> + * | 256MB | 24         | 40             | 64         | 80           |
> + * |-------+------------+----------------+------------+--------------|
> + *
>   */
>  #define VSID_MULTIPLIER_256M ASM_CONST(12538073) /* 24-bit prime */
> -#define VSID_BITS_256M   (CONTEXT_BITS + ESID_BITS)
> +#define VSID_BITS_256M   (VA_BITS - SID_SHIFT)
>  #define VSID_MODULUS_256M((1UL<<VSID_BITS_256M)-1)
> +#define VSID_BITS_65_256M(65 - SID_SHIFT)
>  
>  #define VSID_MULTIPLIER_1T   ASM_CONST(12538073) /* 24-bit prime */
> -#define VSID_BITS_1T (CONTEXT_BITS + ESID_BITS_1T)
> +#define VSID_BITS_1T (VA_BITS - SID_SHIFT_1T)
>  #define VSID_MODULUS_1T  ((1UL<<VSID_BITS_1T)-1)
> -
> +#define VSID_BITS_65_1T  (65 - SID_SHIFT_1T)
>  
>  #define USER_VSID_RANGE  (1UL << (ESID_BITS + SID_SHIFT))
>  
> -/*
> - * This macro generates asm code to compute the VSID scramble
> - * function.  Used in slb_allocate() and do_stab_bolted.  The function
> - * computed is: (protovsid*VSID_MULTIPLIER) % VSID_MODULUS
> - *
> - *   rt = register containing the proto-VSID and into which the
> - *   VSID will be stored
> - *   rx = scratch register (clobbered)
> - *
> - *   - rt and rx must be different registers
> - *   - The answer will end up in the low VSID_BITS bits of rt.  The higher
> - * bits may contain other garbage, so you may need to mask the
> - * result.
> - */
> -#define ASM_VSID_SCRAMBLE(rt, rx, size)  
> \
> - lis rx,VSID_MULTIPLIER_##size@h;\
> - ori rx,rx,VSID_MULTIPLIER_##size@l; \
> - mulld   rt,rt,rx;   /* rt = rt * MULTIPLIER */  \
> - \
> - srdirx,rt,VSID_BITS_##size; \
> - clrldi  rt,rt,(64-VSID_BITS_##size);\
> - add rt,rt,rx;   /* add high and low bits */ \
> - /* NOTE: explanation based on VSID_BITS_##size = 36 \
> -  * Now, r3 == VSID (mod 2^36-1), and lies between 0 and \
> -  * 2^36-1+2^28-1.  That in particular means that if r3 >=   \
> -  * 2^36-1, then r3+1 has the 2^36 bit set.  So, if r3+1 has \
> -  * the bit clear, r3 already has the answer we want, if it  \
> -  * doesn't, the answer is the low 36 bits of r3+1.  So in all   \
> -  * cases the answer is the low 36 bits of (r3 + ((r3+1) >> 36))*/\

I see that this comment is lost in the new definition, could we please
retain it.

> - addirx,rt,1;\
> - srdirx,rx,VSID_BITS_##size; /* extract 2^VSID_BITS bit */   \
> - add rt,rt,rx
> -
>  /* 4 bits per slice and we have one slice per 1TB */
>  #define SLICE_ARRAY_SIZE  (H_PGTABLE_RANGE >> 41)
>  
> @@ -629,7 +632,7 @@ static inline void subpage_pr
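On the modulus question raised above: the divide-free reduction that the removed ASM_VSID_SCRAMBLE comment describes can be checked with a short C sketch. This is a demonstration only, reusing the 24-bit multiplier from the quoted patch; it is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Compute (proto * mult) % (2^bits - 1) without a divide, using the
 * shift-and-add folding described in the removed ASM_VSID_SCRAMBLE
 * comment: x = hi*2^bits + lo, and hi + lo == x mod (2^bits - 1). */
static uint64_t scramble(uint64_t proto, unsigned bits, uint64_t mult)
{
        uint64_t mask = (1ULL << bits) - 1;
        uint64_t v = proto * mult;             /* must not overflow 64 bits */
        uint64_t r = (v >> bits) + (v & mask); /* fold high bits into low */

        /* if the sum reached 2^bits, fold the carry back in once more */
        return (r + ((r + 1) >> bits)) & mask;
}
```

Note the result may come out as 2^bits - 1 instead of 0 in an edge case; both represent the same residue mod (2^bits - 1).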

Re: [PATCH kernel] powerpc/powernv/ioda2: Update iommu table base on ownership change

2017-02-21 Thread Gavin Shan
On Tue, Feb 21, 2017 at 01:41:31PM +1100, Alexey Kardashevskiy wrote:
>On POWERNV platform, in order to do DMA via IOMMU (i.e. 32bit DMA in
>our case), a device needs an iommu_table pointer set via
>set_iommu_table_base().
>
>The codeflow is:
>- pnv_pci_ioda2_setup_dma_pe()
>   - pnv_pci_ioda2_setup_default_config()
>   - pnv_ioda_setup_bus_dma() [1]
>
>pnv_pci_ioda2_setup_dma_pe() creates IOMMU groups,
>pnv_pci_ioda2_setup_default_config() does default DMA setup,
>pnv_ioda_setup_bus_dma() takes a bus PE (on IODA2, all physical function
>PEs as bus PEs except NPU), walks through all underlying buses and
>devices, adds all devices to an IOMMU group and sets iommu_table.
>
>On IODA2, when VFIO is used, it takes ownership over a PE which means it
>removes all tables and creates new ones (with a possibility of sharing
>them among PEs). So when the ownership is returned from VFIO to
>the kernel, the iommu_table pointer written to a device at [1] is
>stale and needs an update.
>
>This adds an "add_to_group" parameter to pnv_ioda_setup_bus_dma()
>(in fact re-adds as it used to be there a while ago for different
>reasons) to tell the helper if a device needs to be added to
>an IOMMU group with an iommu_table update or just the latter.
>
>This calls pnv_ioda_setup_bus_dma(..., false) from
>pnv_ioda2_release_ownership() so when the ownership is restored,
>32bit DMA can work again for a device. This does the same thing
>on obtaining ownership as the iommu_table point is stale at this point
>anyway and it is safer to have NULL there.
>
>We did not hit this earlier as all tested devices in recent years were
>only using 64bit DMA; the rare exception for this is MPT3 SAS adapter
>which uses both 32bit and 64bit DMA access and it has not been tested
>with VFIO much.
>
>Cc: Gavin Shan 
>Signed-off-by: Alexey Kardashevskiy 

Acked-by: Gavin Shan 

One thing would be improved in future, which isn't relevant to
this patch if my understanding is correct enough: The TCE table for
DMA32 space created during system boot is destroyed when VFIO takes
the ownership. The same TCE table (same level, page size, window size
etc) is created and associated to the PE again. Some CPU cycles would
be saved if the original table is picked up without creating a new one.

The involved function is pnv_pci_ioda2_create_table(). Its primary work
is to allocate pages from buddy. It's usually fast if there are enough
free pages. Otherwise, it would be relatively slow. It also has the risk
to fail the allocation. I guess it's not bad to save CPU cycles in this
critical (maybe hot?) path.

Thanks,
Gavin 



Re: [PATCH kernel] powerpc/powernv/npu: Remove dead iommu code

2017-02-21 Thread Gavin Shan
On Tue, Feb 21, 2017 at 01:40:20PM +1100, Alexey Kardashevskiy wrote:
>PNV_IODA_PE_DEV is only used for NPU devices (emulated PCI bridges
>representing NVLink). These are added to IOMMU groups with corresponding
>NVIDIA devices after all non-NPU PEs are setup; a special helper -
>pnv_pci_ioda_setup_iommu_api() - handles this in pnv_pci_ioda_fixup().
>
>The pnv_pci_ioda2_setup_dma_pe() helper sets up DMA for a PE. It is called
>for VFs (so it does not handle NPU case) and PCI bridges but only
>IODA1 and IODA2 types. An NPU bridge has its own type id (PNV_PHB_NPU)
>so pnv_pci_ioda2_setup_dma_pe() cannot be called on NPU and therefore
>(pe->flags & PNV_IODA_PE_DEV) is always "false".
>
>This removes not used iommu_add_device(). This should not cause any
>behavioral change.
>
>Signed-off-by: Alexey Kardashevskiy 

Acked-by: Gavin Shan 



Re: [PATCH 0/2] Allow configurable stack size (especially 32k on PPC64)

2017-02-21 Thread Benjamin Herrenschmidt
On Tue, 2017-02-21 at 13:51 +0100, Gabriel Paubert wrote:
> For now it has only been used for little-endian kernel and
> applications,
> but according to messages that I have seen on the list, switching the
> kernel to Elf V2 should be possible.

I don't think the toolchain "supports" ELFv2 on BE. It sort-of seems to
work but it's not supported.

Cheers,
Ben.



Re: [PATCH kernel] powerpc/powernv: Fix it_ops::get() callback to return in cpu endian

2017-02-21 Thread Gavin Shan
On Tue, Feb 21, 2017 at 01:38:54PM +1100, Alexey Kardashevskiy wrote:
>The iommu_table_ops callbacks are declared CPU endian as they take and
>return "unsigned long"; underlying hardware tables are big-endian.
>
>However get() was missing be64_to_cpu(), this adds the missing conversion.
>
>The only caller of this is crash dump at arch/powerpc/kernel/iommu.c,
>iommu_table_clear() which only compares TCE to zero so this change
>should not cause behavioral change.
>
>Signed-off-by: Alexey Kardashevskiy 

Acked-by: Gavin Shan 
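To illustrate the conversion being added: a TCE entry is stored big-endian in the hardware table, so reading it on a little-endian CPU requires a byte swap, which is the role be64_to_cpu() plays in the fix. A portable byte-by-byte sketch (demo code, not the kernel helper):

```c
#include <assert.h>
#include <stdint.h>

/* Read a big-endian 64-bit value (as stored in hardware TCE tables)
 * into native CPU endianness, byte by byte so the demo works on any
 * host. This mimics what be64_to_cpu() does in the quoted fix. */
static uint64_t be64_to_cpu_demo(const void *p)
{
        const uint8_t *b = p;
        uint64_t v = 0;
        int i;

        for (i = 0; i < 8; i++)
                v = (v << 8) | b[i];
        return v;
}
```

A raw pointer dereference would instead return the byte-swapped value on little-endian hosts, which is exactly the bug get() had before the conversion was added.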



Re: linux-next: manual merge of the tip tree with the powerpc tree

2017-02-21 Thread Stephen Rothwell
Hi all,

On Fri, 17 Feb 2017 12:48:43 +1100 Stephen Rothwell  wrote:
>
> Today's linux-next merge of the tip tree got a conflict in:
> 
>   arch/powerpc/kernel/asm-offsets.c
> 
> between commit:
> 
>   454656155110 ("powerpc/asm: Use OFFSET macro in asm-offsets.c")
> 
> from the powerpc tree and commit:
> 
>   8c8b73c4811f ("sched/cputime, powerpc: Prepare accounting structure for 
> cputime flush on tick")
> 
> from the tip tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/powerpc/kernel/asm-offsets.c
> index d918338b54b0,9e8e771f8acb..
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@@ -208,48 -213,58 +208,48 @@@ int main(void
>   #endif /* CONFIG_PPC_BOOK3E */
>   
>   #ifdef CONFIG_PPC_STD_MMU_64
>  -DEFINE(PACASLBCACHE, offsetof(struct paca_struct, slb_cache));
>  -DEFINE(PACASLBCACHEPTR, offsetof(struct paca_struct, slb_cache_ptr));
>  -DEFINE(PACAVMALLOCSLLP, offsetof(struct paca_struct, vmalloc_sllp));
>  +OFFSET(PACASLBCACHE, paca_struct, slb_cache);
>  +OFFSET(PACASLBCACHEPTR, paca_struct, slb_cache_ptr);
>  +OFFSET(PACAVMALLOCSLLP, paca_struct, vmalloc_sllp);
>   #ifdef CONFIG_PPC_MM_SLICES
>  -DEFINE(MMUPSIZESLLP, offsetof(struct mmu_psize_def, sllp));
>  +OFFSET(MMUPSIZESLLP, mmu_psize_def, sllp);
>   #else
>  -DEFINE(PACACONTEXTSLLP, offsetof(struct paca_struct, mm_ctx_sllp));
>  +OFFSET(PACACONTEXTSLLP, paca_struct, mm_ctx_sllp);
>   #endif /* CONFIG_PPC_MM_SLICES */
>  -DEFINE(PACA_EXGEN, offsetof(struct paca_struct, exgen));
>  -DEFINE(PACA_EXMC, offsetof(struct paca_struct, exmc));
>  -DEFINE(PACA_EXSLB, offsetof(struct paca_struct, exslb));
>  -DEFINE(PACALPPACAPTR, offsetof(struct paca_struct, lppaca_ptr));
>  -DEFINE(PACA_SLBSHADOWPTR, offsetof(struct paca_struct, slb_shadow_ptr));
>  -DEFINE(SLBSHADOW_STACKVSID,
>  -   offsetof(struct slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid));
>  -DEFINE(SLBSHADOW_STACKESID,
>  -   offsetof(struct slb_shadow, save_area[SLB_NUM_BOLTED - 1].esid));
>  -DEFINE(SLBSHADOW_SAVEAREA, offsetof(struct slb_shadow, save_area));
>  -DEFINE(LPPACA_PMCINUSE, offsetof(struct lppaca, pmcregs_in_use));
>  -DEFINE(LPPACA_DTLIDX, offsetof(struct lppaca, dtl_idx));
>  -DEFINE(LPPACA_YIELDCOUNT, offsetof(struct lppaca, yield_count));
>  -DEFINE(PACA_DTL_RIDX, offsetof(struct paca_struct, dtl_ridx));
>  +OFFSET(PACA_EXGEN, paca_struct, exgen);
>  +OFFSET(PACA_EXMC, paca_struct, exmc);
>  +OFFSET(PACA_EXSLB, paca_struct, exslb);
>  +OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
>  +OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
>  +OFFSET(SLBSHADOW_STACKVSID, slb_shadow, save_area[SLB_NUM_BOLTED - 
> 1].vsid);
>  +OFFSET(SLBSHADOW_STACKESID, slb_shadow, save_area[SLB_NUM_BOLTED - 
> 1].esid);
>  +OFFSET(SLBSHADOW_SAVEAREA, slb_shadow, save_area);
>  +OFFSET(LPPACA_PMCINUSE, lppaca, pmcregs_in_use);
>  +OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx);
>  +OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count);
>  +OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx);
>   #endif /* CONFIG_PPC_STD_MMU_64 */
>  -DEFINE(PACAEMERGSP, offsetof(struct paca_struct, emergency_sp));
>  +OFFSET(PACAEMERGSP, paca_struct, emergency_sp);
>   #ifdef CONFIG_PPC_BOOK3S_64
>  -DEFINE(PACAMCEMERGSP, offsetof(struct paca_struct, mc_emergency_sp));
>  -DEFINE(PACA_IN_MCE, offsetof(struct paca_struct, in_mce));
>  -#endif
>  -DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id));
>  -DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state));
>  -DEFINE(PACA_DSCR_DEFAULT, offsetof(struct paca_struct, dscr_default));
>  -DEFINE(ACCOUNT_STARTTIME,
>  -   offsetof(struct paca_struct, accounting.starttime));
>  -DEFINE(ACCOUNT_STARTTIME_USER,
>  -   offsetof(struct paca_struct, accounting.starttime_user));
>  -DEFINE(ACCOUNT_USER_TIME,
>  -   offsetof(struct paca_struct, accounting.utime));
>  -DEFINE(ACCOUNT_SYSTEM_TIME,
>  -   offsetof(struct paca_struct, accounting.stime));
>  -DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
>  -DEFINE(PACA_NAPSTATELOST, offsetof(struct paca_struct, nap_state_lost));
>  -DEFINE(PACA_SPRG_VDSO, offsetof(struct paca_struct, sprg_vdso));
>  +OFFSET(PACAMCEMERGSP, paca_struct, mc_emergency_sp);
>  +OFFSET(PACA_IN_MCE, paca_struct, in_mce);
>  +#endif
>  +OFFSET(PACAHWCPUID, paca_struct, hw_cpu_id);
>  +OFFSET(PACAKEXECSTATE, paca_struct, kexec_state);
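For context on the conversion shown in the conflict above: OFFSET() is shorthand for the long-standing DEFINE(sym, offsetof(struct ..., member)) pattern in asm-offsets.c. A minimal standalone sketch of the idea (the demo struct is made up; the kernel's DEFINE() emits a symbol for the assembler rather than a C constant):

```c
#include <assert.h>
#include <stddef.h>

/* Demo stand-ins for the kernel's asm-offsets machinery: the real
 * DEFINE() emits an assembler symbol, here it just binds a constant. */
#define DEFINE(sym, val)      enum { sym = (val) }
#define OFFSET(sym, str, mem) DEFINE(sym, offsetof(struct str, mem))

struct demo_paca {            /* hypothetical structure for the demo */
        long slb_cache_ptr;
        long kexec_state;
};

/* Same shape as the OFFSET(PACAKEXECSTATE, ...) lines in the diff. */
OFFSET(DEMO_KEXECSTATE, demo_paca, kexec_state);
```

The shorthand removes the repeated "offsetof(struct ..." boilerplate, which is why the OFFSET() side of the conflict reads so much shorter.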

Re: [PATCH v5 13/15] livepatch: change to a per-task consistency model

2017-02-21 Thread Josh Poimboeuf
On Fri, Feb 17, 2017 at 09:51:29AM +0100, Miroslav Benes wrote:
> On Thu, 16 Feb 2017, Josh Poimboeuf wrote:
> > What do you think about the following?  I tried to put the logic in
> > klp_complete_transition(), so the module_put()'s would be in one place.
> > But it was too messy, so I put it in klp_cancel_transition() instead.
> >
> > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > index e96346e..bd1c1fd 100644
> > --- a/kernel/livepatch/transition.c
> > +++ b/kernel/livepatch/transition.c
> > @@ -121,8 +121,28 @@ static void klp_complete_transition(void)
> >   */
> >  void klp_cancel_transition(void)
> >  {
> > +   bool immediate_func = false;
> > +
> > klp_target_state = !klp_target_state;
> > klp_complete_transition();
> > +
> > +   if (klp_target_state == KLP_PATCHED)
> > +   return;
> 
> This is not needed, I think. We call klp_cancel_transition() in 
> __klp_enable_patch() only. klp_target_state is set to KLP_PATCHED there 
> (through klp_init_transition()) and negated here. We know it must be 
> KLP_UNPATCHED.

Yeah, I was trying to hedge against the possibility of future code
calling this function in the disable path.  But that probably won't
happen and it would probably be cleaner to just add a warning if
klp_target_state isn't KLP_PATCHED.

> Moreover, due to klp_complete_transition() klp_target_state is always 
> KLP_UNDEFINED after it.
> 
> > +
> > +   /*
> > +* In the enable error path, even immediate patches can be safely
> > +* removed because the transition hasn't been started yet.
> > +*
> > +* klp_complete_transition() doesn't have a module_put() for immediate
> > +* patches, so do it here.
> > +*/
> > +   klp_for_each_object(klp_transition_patch, obj)
> > +   klp_for_each_func(obj, func)
> > +   if (func->immediate)
> > +   immediate_func = true;
> > +
> > +   if (klp_transition_patch->immediate || immediate_func)
> > +   module_put(klp_transition_patch->mod);
> 
> Almost correct. The only problem is that klp_transition_patch is NULL at 
> this point. klp_complete_transition() does that and it should stay there 
> in my opinion to keep it simple.
> 
> So we could either move all this to __klp_enable_patch(), where patch 
> variable is defined, or we could store klp_transition_patch to a local 
> variable here in klp_cancel_transition() before klp_complete_transition() 
> is called. That should be safe. I like the latter more, because it keeps 
> the code in klp_cancel_transition() where it belongs.

Good points.  Here's another try:

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index e96346e..a23c63c 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -121,8 +121,31 @@ static void klp_complete_transition(void)
  */
 void klp_cancel_transition(void)
 {
-   klp_target_state = !klp_target_state;
+   struct klp_patch *patch = klp_transition_patch;
+   struct klp_object *obj;
+   struct klp_func *func;
+   bool immediate_func = false;
+
+   if (WARN_ON_ONCE(klp_target_state != KLP_PATCHED))
+   return;
+
+   klp_target_state = KLP_UNPATCHED;
klp_complete_transition();
+
+   /*
+* In the enable error path, even immediate patches can be safely
+* removed because the transition hasn't been started yet.
+*
+* klp_complete_transition() doesn't have a module_put() for immediate
+* patches, so do it here.
+*/
+   klp_for_each_object(patch, obj)
+   klp_for_each_func(obj, func)
+   if (func->immediate)
+   immediate_func = true;
+
+   if (patch->immediate || immediate_func)
+   module_put(patch->mod);
 }
 
 /*


Re: [RFC PATCH 4/9] powerpc/4xx: Create 4xx pseudo-platform in platforms/4xx

2017-02-21 Thread Arnd Bergmann
On Mon, Feb 20, 2017 at 3:34 AM, Nicholas Piggin  wrote:
> On Fri, 17 Feb 2017 17:32:14 +1100
> Michael Ellerman  wrote:
>
>> We have a lot of code in sysdev for supporting 4xx, ie. either 40x or
>> 44x. Instead it would be cleaner if it was all in platforms/4xx.
>>
>> This is slightly odd in that we don't actually define any machines in
>> the 4xx platform, as is usual for a platform directory. But still it
>> seems like a better result to have all this related code in a directory
>> by itself.
>
> What about the other things in sysdev that support multiple platforms?

Some of them have subsystem specific directories in drivers these
days, e.g. drivers/pci/host and drivers/irqchip. Some others that are
shared with ARM or ARM64 platforms are already being moved to
drivers/soc/

> Why not just put the new 4xx subdirectory under sysdev?

arch/powerpc/platforms/40x/ only has four small C files, you could also
move everything to platforms/4xx/ instead.

 Arnd


Re: [PATCH] cxl: Enable PCI device ID for future IBM CXL adapter

2017-02-21 Thread Uma Krishnan

On 2/19/2017 4:54 AM, Andrew Donnellan wrote:

On 17/02/17 14:45, Uma Krishnan wrote:

From: "Matthew R. Ochs" 

Add support for a future IBM Coherent Accelerator (CXL) device
with an ID of 0x0623.

Signed-off-by: Matthew R. Ochs 
Signed-off-by: Uma Krishnan 


Is this a CAIA 1 or CAIA 2 device?



CAIA 1 device



[PATCH v3 2/2] powerpc: ftrace_64: split further based on -mprofile-kernel

2017-02-21 Thread Naveen N. Rao
Split ftrace_64.S further retaining the core ftrace 64-bit aspects
in ftrace_64.S and moving ftrace_caller() and ftrace_graph_caller() into
separate files based on -mprofile-kernel. The livepatch routines are all
now contained within the mprofile file.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/trace/Makefile |   5 +
 arch/powerpc/kernel/trace/ftrace_64.S  | 308 +
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 274 ++
 arch/powerpc/kernel/trace/ftrace_64_pg.S   |  68 ++
 4 files changed, 348 insertions(+), 307 deletions(-)
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64_mprofile.S
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64_pg.S

diff --git a/arch/powerpc/kernel/trace/Makefile 
b/arch/powerpc/kernel/trace/Makefile
index 5f5a35254a9b..729dffc5f7bc 100644
--- a/arch/powerpc/kernel/trace/Makefile
+++ b/arch/powerpc/kernel/trace/Makefile
@@ -11,6 +11,11 @@ endif
 
 obj32-$(CONFIG_FUNCTION_TRACER)+= ftrace_32.o
 obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64.o
+ifdef CONFIG_MPROFILE_KERNEL
+obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_mprofile.o
+else
+obj64-$(CONFIG_FUNCTION_TRACER)+= ftrace_64_pg.o
+endif
 obj-$(CONFIG_DYNAMIC_FTRACE)   += ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)+= ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)  += ftrace.o
diff --git a/arch/powerpc/kernel/trace/ftrace_64.S 
b/arch/powerpc/kernel/trace/ftrace_64.S
index df47433b8e13..e5ccea19821e 100644
--- a/arch/powerpc/kernel/trace/ftrace_64.S
+++ b/arch/powerpc/kernel/trace/ftrace_64.S
@@ -23,236 +23,7 @@ EXPORT_SYMBOL(_mcount)
mtlrr0
bctr
 
-#ifndef CC_USING_MPROFILE_KERNEL
-_GLOBAL_TOC(ftrace_caller)
-   /* Taken from output of objdump from lib64/glibc */
-   mflrr3
-   ld  r11, 0(r1)
-   stdur1, -112(r1)
-   std r3, 128(r1)
-   ld  r4, 16(r11)
-   subir3, r3, MCOUNT_INSN_SIZE
-.globl ftrace_call
-ftrace_call:
-   bl  ftrace_stub
-   nop
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-.globl ftrace_graph_call
-ftrace_graph_call:
-   b   ftrace_graph_stub
-_GLOBAL(ftrace_graph_stub)
-#endif
-   ld  r0, 128(r1)
-   mtlrr0
-   addir1, r1, 112
-
-#else /* CC_USING_MPROFILE_KERNEL */
-/*
- *
- * ftrace_caller() is the function that replaces _mcount() when ftrace is
- * active.
- *
- * We arrive here after a function A calls function B, and we are the trace
- * function for B. When we enter r1 points to A's stack frame, B has not yet
- * had a chance to allocate one yet.
- *
- * Additionally r2 may point either to the TOC for A, or B, depending on
- * whether B did a TOC setup sequence before calling us.
- *
- * On entry the LR points back to the _mcount() call site, and r0 holds the
- * saved LR as it was on entry to B, ie. the original return address at the
- * call site in A.
- *
- * Our job is to save the register state into a struct pt_regs (on the stack)
- * and then arrange for the ftrace function to be called.
- */
-_GLOBAL(ftrace_caller)
-   /* Save the original return address in A's stack frame */
-   std r0,LRSAVE(r1)
-
-   /* Create our stack frame + pt_regs */
-   stdur1,-SWITCH_FRAME_SIZE(r1)
-
-   /* Save all gprs to pt_regs */
-   SAVE_8GPRS(0,r1)
-   SAVE_8GPRS(8,r1)
-   SAVE_8GPRS(16,r1)
-   SAVE_8GPRS(24,r1)
-
-   /* Load special regs for save below */
-   mfmsr   r8
-   mfctr   r9
-   mfxer   r10
-   mfcrr11
-
-   /* Get the _mcount() call site out of LR */
-   mflrr7
-   /* Save it as pt_regs->nip & pt_regs->link */
-   std r7, _NIP(r1)
-   std r7, _LINK(r1)
-
-   /* Save callee's TOC in the ABI compliant location */
-   std r2, 24(r1)
-   ld  r2,PACATOC(r13) /* get kernel TOC in r2 */
-
-   addis   r3,r2,function_trace_op@toc@ha
-   addir3,r3,function_trace_op@toc@l
-   ld  r5,0(r3)
-
-#ifdef CONFIG_LIVEPATCH
-   mr  r14,r7  /* remember old NIP */
-#endif
-   /* Calculate ip from nip-4 into r3 for call below */
-   subir3, r7, MCOUNT_INSN_SIZE
-
-   /* Put the original return address in r4 as parent_ip */
-   mr  r4, r0
-
-   /* Save special regs */
-   std r8, _MSR(r1)
-   std r9, _CTR(r1)
-   std r10, _XER(r1)
-   std r11, _CCR(r1)
-
-   /* Load &pt_regs in r6 for call below */
-   addir6, r1 ,STACK_FRAME_OVERHEAD
-
-   /* ftrace_call(r3, r4, r5, r6) */
-.globl ftrace_call
-ftrace_call:
-   bl  ftrace_stub
-   nop
-
-   /* Load ctr with the possibly modified NIP */
-   ld  r3, _NIP(r1)
-   mtctr   r3
-#ifdef CONFIG_LIVEPATCH
-   cmpdr14,r3  /* has NIP been altered? */
-#endif
-
-   /* Restore gprs */
-   REST_8GPRS(0,r1)
-   REST_8GPRS(8,r1)
-   R

[PATCH v3 1/2] powerpc: split ftrace bits into a separate file

2017-02-21 Thread Naveen N. Rao
entry_*.S now includes a lot more than just kernel entry/exit code. As a
first step at cleaning this up, let's split out the ftrace bits into
separate files. Also move all related tracing code into a new trace/
subdirectory.

No functional changes.

Suggested-by: Michael Ellerman 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/Makefile  |   9 +-
 arch/powerpc/kernel/entry_32.S| 107 ---
 arch/powerpc/kernel/entry_64.S| 380 -
 arch/powerpc/kernel/trace/Makefile|  24 ++
 arch/powerpc/kernel/{ => trace}/ftrace.c  |   0
 arch/powerpc/kernel/trace/ftrace_32.S | 118 
 arch/powerpc/kernel/trace/ftrace_64.S | 391 ++
 arch/powerpc/kernel/{ => trace}/trace_clock.c |   0
 8 files changed, 534 insertions(+), 495 deletions(-)
 create mode 100644 arch/powerpc/kernel/trace/Makefile
 rename arch/powerpc/kernel/{ => trace}/ftrace.c (100%)
 create mode 100644 arch/powerpc/kernel/trace/ftrace_32.S
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64.S
 rename arch/powerpc/kernel/{ => trace}/trace_clock.c (100%)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a048b37b9b27..e1a04237f09d 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -29,8 +29,6 @@ CFLAGS_REMOVE_cputable.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_prom_init.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_btext.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_prom.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
-# do not trace tracer code
-CFLAGS_REMOVE_ftrace.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 # timers used by tracing
 CFLAGS_REMOVE_time.o = -mno-sched-epilog $(CC_FLAGS_FTRACE)
 endif
@@ -122,10 +120,7 @@ obj64-$(CONFIG_AUDIT)  += compat_audit.o
 
 obj-$(CONFIG_PPC_IO_WORKAROUNDS)   += io-workarounds.o
 
-obj-$(CONFIG_DYNAMIC_FTRACE)   += ftrace.o
-obj-$(CONFIG_FUNCTION_GRAPH_TRACER)+= ftrace.o
-obj-$(CONFIG_FTRACE_SYSCALLS)  += ftrace.o
-obj-$(CONFIG_TRACING)  += trace_clock.o
+obj-y  += trace/
 
 ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y  += iomap.o
@@ -146,8 +141,6 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvm_emul.o
 # Disable GCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
-GCOV_PROFILE_ftrace.o := n
-UBSAN_SANITIZE_ftrace.o := n
 GCOV_PROFILE_machine_kexec_64.o := n
 UBSAN_SANITIZE_machine_kexec_64.o := n
 GCOV_PROFILE_machine_kexec_32.o := n
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index f3e4fc1c1b4d..5a2a13bd2193 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -1319,109 +1318,3 @@ machine_check_in_rtas:
/* XXX load up BATs and panic */
 
 #endif /* CONFIG_PPC_RTAS */
-
-#ifdef CONFIG_FUNCTION_TRACER
-#ifdef CONFIG_DYNAMIC_FTRACE
-_GLOBAL(mcount)
-_GLOBAL(_mcount)
-   /*
-* It is required that _mcount on PPC32 must preserve the
-* link register. But we have r0 to play with. We use r0
-* to push the return address back to the caller of mcount
-* into the ctr register, restore the link register and
-* then jump back using the ctr register.
-*/
-   mflrr0
-   mtctr   r0
-   lwz r0, 4(r1)
-   mtlrr0
-   bctr
-
-_GLOBAL(ftrace_caller)
-   MCOUNT_SAVE_FRAME
-   /* r3 ends up with link register */
-   subir3, r3, MCOUNT_INSN_SIZE
-.globl ftrace_call
-ftrace_call:
-   bl  ftrace_stub
-   nop
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-.globl ftrace_graph_call
-ftrace_graph_call:
-   b   ftrace_graph_stub
-_GLOBAL(ftrace_graph_stub)
-#endif
-   MCOUNT_RESTORE_FRAME
-   /* old link register ends up in ctr reg */
-   bctr
-#else
-_GLOBAL(mcount)
-_GLOBAL(_mcount)
-
-   MCOUNT_SAVE_FRAME
-
-   subir3, r3, MCOUNT_INSN_SIZE
-   LOAD_REG_ADDR(r5, ftrace_trace_function)
-   lwz r5,0(r5)
-
-   mtctr   r5
-   bctrl
-   nop
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-   b   ftrace_graph_caller
-#endif
-   MCOUNT_RESTORE_FRAME
-   bctr
-#endif
-EXPORT_SYMBOL(_mcount)
-
-_GLOBAL(ftrace_stub)
-   blr
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-_GLOBAL(ftrace_graph_caller)
-   /* load r4 with local address */
-   lwz r4, 44(r1)
-   subir4, r4, MCOUNT_INSN_SIZE
-
-   /* Grab the LR out of the caller stack frame */
-   lwz r3,52(r1)
-
-   bl  prepare_ftrace_return
-   nop
-
-/*
- * prepare_ftrace_return gives us the address we divert to.
- * Change the LR in the callers stack frame to this.
- */
-   stw r3,52(r1)
-
-   MCOUNT_RESTORE_FRAME
-   /* old link register ends up in c

[PATCH v3 0/2] powerpc: split ftrace bits into separate files

2017-02-21 Thread Naveen N. Rao
v2: https://patchwork.ozlabs.org/patch/697787/

Michael,
Sorry this took as long as it did, but here's a take at v3. This series
conflicts with the KPROBES_ON_FTRACE patchset, but I'm posting this so
as to get feedback. I will rework these patches as needed.

Thanks,
Naveen

Naveen N. Rao (2):
  powerpc: split ftrace bits into a separate file
  powerpc: ftrace_64: split further based on -mprofile-kernel

 arch/powerpc/kernel/Makefile   |   9 +-
 arch/powerpc/kernel/entry_32.S | 107 ---
 arch/powerpc/kernel/entry_64.S | 380 -
 arch/powerpc/kernel/trace/Makefile |  29 ++
 arch/powerpc/kernel/{ => trace}/ftrace.c   |   0
 arch/powerpc/kernel/trace/ftrace_32.S  | 118 
 arch/powerpc/kernel/trace/ftrace_64.S  |  85 ++
 arch/powerpc/kernel/trace/ftrace_64_mprofile.S | 274 ++
 arch/powerpc/kernel/trace/ftrace_64_pg.S   |  68 +
 arch/powerpc/kernel/{ => trace}/trace_clock.c  |   0
 10 files changed, 575 insertions(+), 495 deletions(-)
 create mode 100644 arch/powerpc/kernel/trace/Makefile
 rename arch/powerpc/kernel/{ => trace}/ftrace.c (100%)
 create mode 100644 arch/powerpc/kernel/trace/ftrace_32.S
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64.S
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64_mprofile.S
 create mode 100644 arch/powerpc/kernel/trace/ftrace_64_pg.S
 rename arch/powerpc/kernel/{ => trace}/trace_clock.c (100%)

-- 
2.11.0



Re: [PATCH 0/9] QorIQ DPAA 1 updates

2017-02-21 Thread David Miller
From: Madalin Bucur 
Date: Tue, 21 Feb 2017 13:52:45 +0200

> This patch set introduces a series of fixes and features to the DPAA 1
> drivers. Besides activating hardware Rx checksum offloading, four traffic
> classes are added for Tx traffic prioritisation.

The net-next tree is closed, please resubmit this after the merge window when
the net-next tree opens back up.

Thanks.


Re: [PATCH] powerpc/fadump: set an upper limit for boot memory size

2017-02-21 Thread Hari Bathini

Hi Michael,


On Friday 17 February 2017 11:54 AM, Michael Ellerman wrote:

Hari Bathini  writes:


diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index de7d39a..d5107f4 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -222,6 +222,18 @@ static inline unsigned long fadump_calculate_reserve_size(void)
&size, &base);
if (ret == 0 && size > 0) {
fw_dump.reserve_bootvar = (unsigned long)size;
+   /*
+* Adjust if the boot memory size specified is above
+* the upper limit.
+*/
+   if (fw_dump.reserve_bootvar >
+   (memblock_end_of_DRAM() / MAX_BOOT_MEM_RATIO)) {

Using memblock_end_of_DRAM() doesn't take into account the fact that you
might have holes in your memory layout.

Possibly on PowerVM that never happens, but I don't think we should
write the code to assume that, if possible.



I think memblock_phys_mem_size() can fill in here.

In the same file, memblock_end_of_DRAM() is also used when nothing
is specified through cmdline. Let me also change that and respin..

Thanks
Hari



[RFC PATCH] memory-hotplug: Use dev_online for memhp_auto_offline

2017-02-21 Thread Nathan Fontenot
Commit 31bc3858e "add automatic onlining policy for the newly added memory"
provides the capability to have added memory automatically onlined
during add, but this appears to be slightly broken.

The current implementation uses walk_memory_range() to call
online_memory_block, which uses memory_block_change_state() to online
the memory. Instead I think we should be calling device_online()
for the memory block in online_memory_block. This would online
the memory (the memory bus online routine memory_subsys_online()
called from device_online calls memory_block_change_state()) and
properly update the device struct offline flag.

As a result of the current implementation, attempting to remove
a memory block after adding it using auto online fails. This is
because doing a remove, for instance
'echo offline > /sys/devices/system/memory/memoryXXX/state', uses
device_offline() which checks the dev->offline flag.

There is a workaround in that a user could online the memory or have
a udev rule to online the memory by using the sysfs interface. The
sysfs interface to online memory goes through device_online() which
should update the dev->offline flag. I'm not sure that having kernel
memory hotplug rely on userspace actions is the correct way to go.

I have tried reading through the email threads from when the original patch
was submitted and could not determine if this is the expected behavior.
The problem with the current behavior was found when trying to update
memory hotplug on powerpc to use auto online.

-Nathan Fontenot
---
 drivers/base/memory.c  |2 +-
 include/linux/memory.h |3 ---
 mm/memory_hotplug.c|2 +-
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 8ab8ea1..ede46f3 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -249,7 +249,7 @@ static bool pages_correctly_reserved(unsigned long start_pfn)
return ret;
 }
 
-int memory_block_change_state(struct memory_block *mem,
+static int memory_block_change_state(struct memory_block *mem,
unsigned long to_state, unsigned long from_state_req)
 {
int ret = 0;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 093607f..b723a68 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -109,9 +109,6 @@ static inline int memory_isolate_notify(unsigned long val, void *v)
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
 extern int register_new_memory(int, struct mem_section *);
-extern int memory_block_change_state(struct memory_block *mem,
-unsigned long to_state,
-unsigned long from_state_req);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern int unregister_memory_section(struct mem_section *);
 #endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e43142c1..6f7a289 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1329,7 +1329,7 @@ int zone_for_memory(int nid, u64 start, u64 size, int zone_default,
 
 static int online_memory_block(struct memory_block *mem, void *arg)
 {
-   return memory_block_change_state(mem, MEM_ONLINE, MEM_OFFLINE);
+   return device_online(&mem->dev);
 }
 
 /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */



Re: [PATCH v7 2/4] powerpc/pseries: Revert 'Auto-online hotplugged memory'

2017-02-21 Thread Nathan Fontenot
On 02/20/2017 07:02 PM, Michael Ellerman wrote:
> Nathan Fontenot  writes:
> 
>> On 02/15/2017 10:34 PM, Michael Ellerman wrote:
>>> Nathan Fontenot  writes:
>>>
 Revert the patch to auto-online hotplugged memory, commit
 id ec999072442a. Using the auto-online capability does online added
 memory but does not update the associated device struct to
 indicate that the memory is online. The result of this is that
 memoryXX/online file in sysfs still reports the memory as being offline.
>>>
>>> Isn't that just a bug in the auto-online code?
>>
>> After digging through the code some more and reading some of the email
>> chain when the auto-online feature was submitted I can't decide if this
>> is a bug or if this is by design. The fact that they only other users
>> of this appear to be balloon drivers (hv and xen) makes me think this
>> may be by design.
> 
> Have we asked the original authors? I don't see them on Cc?

Not yet. I'm working on a patch to use device_online() for doing
auto online of memory. I will send this out as an RFC with an explanation
of what I'm seeing and ask why it was done the way it was.

> 
>> Changing the auto-online capability to call device_offline() instead
>> would appear to also require changes to the hv and xen balloon
>> drivers for the new behavior.
> 
> OK, if that's the case then that's going to make life tricky.

Yep. I'll ask about this with the RFC I send out.

-Nathan

> 
>>> If I'm reading it right it's calling online_memory_block(). If that
>>> doesn't cause the memory_block to be online that would puzzle me.
>>
>> The memory is online and usable when the dlpar operation completes. I
>> was mistaken in my original note though, the state file in sysfs does report
>> the memory as being online. The underlying issue is that the device struct
>> does not get updated (dev->offline) when using the auto-online capability.
>> The result is that trying to remove a LMB a second time fails when we call
>> device_offline() which checks the dev->offline flag and returns failure.
> 
> That still sounds like a bug to me. We asked the core to "auto online"
> the added memory, but the dev is still offline? But maybe there's some
> subtlety.
> 
>> I think reverting the patch to use the auto-online capability may be the
>> way to go. This would restore the code so that we call device_online and
>> device_offline for add and remove respectively, and not rely on what the 
>> auto-online code is doing.
>>
>> Thoughts?
> 
> It's not great, but given we need to backport it to v4.8, yeah I think
> we'll have to go with a revert.
> 
> But we should also pursue fixing the auto online logic.
> 
>>> Also commit ec999072442a went into v4.8, so is memory hotplug broken
>>> since then? If so we need to backport this or whatever fix we come up.
>>
>> Yes, we need to backport whatever fix we do.
> 
> Right. I'll queue it up.
> 
> cheers
> 



Re: next-20170217 boot on POWER8 LPAR : WARNING @kernel/jump_label.c:287

2017-02-21 Thread Jason Baron

On 02/20/2017 10:05 PM, Sachin Sant wrote:



On 20-Feb-2017, at 8:27 PM, Jason Baron  wrote:

Hi,

On 02/19/2017 09:07 AM, Sachin Sant wrote:

While booting next-20170217 on a POWER8 LPAR following
warning is displayed.

Reverting the following commit helps boot cleanly.
commit 3821fd35b5 :  jump_label: Reduce the size of struct static_key

[   11.393008] [ cut here ]
[   11.393031] WARNING: CPU: 5 PID: 2890 at kernel/jump_label.c:287 static_key_set_entries.isra.10+0x3c/0x50


Thanks for the report. So this is saying that the jump_entry table is
not at least 4-byte aligned. I wonder if this fixes it up?



Yes. With this patch the warning is gone.



Hi,

Thanks for testing. We probably need something like the following to
make sure we don't hit this on other arches. Steve - should I send 4
separate patches for this to get arch maintainers' acks?


Thanks,

-Jason

diff --git a/arch/arm/include/asm/jump_label.h b/arch/arm/include/asm/jump_label.h
index 34f7b6980d21..9720c2f1850b 100644
--- a/arch/arm/include/asm/jump_label.h
+++ b/arch/arm/include/asm/jump_label.h
@@ -13,6 +13,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran

asm_volatile_goto("1:\n\t"
 WASM(nop) "\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4 \n\t"
 ".word 1b, %l[l_yes], %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
@@ -27,6 +28,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool

asm_volatile_goto("1:\n\t"
 WASM(b) " %l[l_yes]\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4 \n\t"
 ".word 1b, %l[l_yes], %c0\n\t"
 ".popsection\n\t"
 : :  "i" (&((char *)key)[branch]) :  : l_yes);
diff --git a/arch/mips/include/asm/jump_label.h b/arch/mips/include/asm/jump_label.h
index e77672539e8e..51ce97dda3cc 100644
--- a/arch/mips/include/asm/jump_label.h
+++ b/arch/mips/include/asm/jump_label.h
@@ -31,6 +31,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran

asm_volatile_goto("1:\t" NOP_INSN "\n\t"
"nop\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4 \n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
: :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -45,6 +46,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool

asm_volatile_goto("1:\tj %l[l_yes]\n\t"
"nop\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4 \n\t"
WORD_INSN " 1b, %l[l_yes], %0\n\t"
".popsection\n\t"
: :  "i" (&((char *)key)[branch]) : : l_yes);
diff --git a/arch/powerpc/include/asm/jump_label.h b/arch/powerpc/include/asm/jump_label.h
index 9a287e0ac8b1..f870a85bac46 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -24,6 +24,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran

asm_volatile_goto("1:\n\t"
 "nop # arch_static_branch\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4 \n\t"
 JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -38,6 +39,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool

asm_volatile_goto("1:\n\t"
 "b %l[l_yes] # arch_static_branch_jump\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
+".balign 4 \n\t"
 JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
@@ -63,6 +65,7 @@ struct jump_entry {
 #define ARCH_STATIC_BRANCH(LABEL, KEY) \
 1098:  nop;\
.pushsection __jump_table, "aw";\
+   .balign 4;  \
FTR_ENTRY_LONG 1098b, LABEL, KEY;   \
.popsection
 #endif
diff --git a/arch/tile/include/asm/jump_label.h b/arch/tile/include/asm/jump_label.h
index cde7573f397b..c9f6125c41ef 100644
--- a/arch/tile/include/asm/jump_label.h
+++ b/arch/tile/include/asm/jump_label.h
@@ -25,6 +25,7 @@ static __always_inline bool arch_static_branch(struct static_key *key,

asm_volatile_goto("1:\n\t"
"nop" "\n\t"
".pushsection __jump_table,  \"aw\"\n\t"
+   ".balign 4 \n\t"
".quad 1b, %l[l_yes], %0 + %1 \n\t"
".popsection\n\t"
: :  "i" (key), "i" (br

[PATCH v2 5/5] powerpc: kprobes: prefer ftrace when probing function entry

2017-02-21 Thread Naveen N. Rao
KPROBES_ON_FTRACE avoids much of the overhead with regular kprobes as it
eliminates the need for a trap, as well as the need to emulate or
single-step instructions.

Though OPTPROBES provides us with similar performance, we have limited
optprobes trampoline slots. As such, when asked to probe at a function
entry, default to using the ftrace infrastructure.

With:
# cd /sys/kernel/debug/tracing
# echo 'p _do_fork' > kprobe_events

before patch:
# cat ../kprobes/list
c00daf08  k  _do_fork+0x8[DISABLED]
c0044fc0  k  kretprobe_trampoline+0x0[OPTIMIZED]

and after patch:
# cat ../kprobes/list
c00d074c  k  _do_fork+0xc[DISABLED][FTRACE]
c00412b0  k  kretprobe_trampoline+0x0[OPTIMIZED]

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index b78b274e1d6e..23d19678a56f 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -49,8 +49,21 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 #ifdef PPC64_ELF_ABI_v2
/* PPC64 ABIv2 needs local entry point */
addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-   if (addr && !offset)
-   addr = (kprobe_opcode_t *)ppc_function_entry(addr);
+   if (addr && !offset) {
+#ifdef CONFIG_KPROBES_ON_FTRACE
+   unsigned long faddr;
+   /*
+* Per livepatch.h, ftrace location is always within the first
+* 16 bytes of a function on powerpc with -mprofile-kernel.
+*/
+   faddr = ftrace_location_range((unsigned long)addr,
+ (unsigned long)addr + 16);
+   if (faddr)
+   addr = (kprobe_opcode_t *)faddr;
+   else
+#endif
+   addr = (kprobe_opcode_t *)ppc_function_entry(addr);
+   }
 #elif defined(PPC64_ELF_ABI_v1)
/*
 * 64bit powerpc ABIv1 uses function descriptors:
-- 
2.11.0



[PATCH v2 4/5] powerpc: kprobes: add support for KPROBES_ON_FTRACE

2017-02-21 Thread Naveen N. Rao
Allow kprobes to be placed on ftrace _mcount() call sites. This
optimization avoids the use of a trap, by riding on ftrace
infrastructure.

This depends on HAVE_DYNAMIC_FTRACE_WITH_REGS which depends on
MPROFILE_KERNEL, which is only currently enabled on powerpc64le with
newer toolchains.

Based on the x86 code by Masami.

Signed-off-by: Naveen N. Rao 
---
 .../debug/kprobes-on-ftrace/arch-support.txt   |   2 +-
 arch/powerpc/Kconfig   |   1 +
 arch/powerpc/include/asm/kprobes.h |  10 ++
 arch/powerpc/kernel/Makefile   |   3 +
 arch/powerpc/kernel/kprobes-ftrace.c   | 104 +
 arch/powerpc/kernel/kprobes.c  |   8 +-
 6 files changed, 126 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kernel/kprobes-ftrace.c

diff --git a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
index 40f44d041fb4..930430c6aef6 100644
--- a/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
+++ b/Documentation/features/debug/kprobes-on-ftrace/arch-support.txt
@@ -27,7 +27,7 @@
 |   nios2: | TODO |
 |openrisc: | TODO |
 |  parisc: | TODO |
-| powerpc: | TODO |
+| powerpc: |  ok  |
 |s390: | TODO |
 |   score: | TODO |
 |  sh: | TODO |
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5e7aaa9976e2..c2fdd895816c 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -101,6 +101,7 @@ config PPC
	select HAVE_EFFICIENT_UNALIGNED_ACCESS if !(CPU_LITTLE_ENDIAN && POWER7_CPU)
select HAVE_KPROBES
select HAVE_OPTPROBES if PPC64
+   select HAVE_KPROBES_ON_FTRACE
select HAVE_ARCH_KGDB
select HAVE_KRETPROBES
select HAVE_ARCH_TRACEHOOK
diff --git a/arch/powerpc/include/asm/kprobes.h b/arch/powerpc/include/asm/kprobes.h
index ab5bd200bb48..a8a3c04ba248 100644
--- a/arch/powerpc/include/asm/kprobes.h
+++ b/arch/powerpc/include/asm/kprobes.h
@@ -100,6 +100,16 @@ extern int kprobe_exceptions_notify(struct notifier_block *self,
 extern int kprobe_fault_handler(struct pt_regs *regs, int trapnr);
 extern int kprobe_handler(struct pt_regs *regs);
 extern int kprobe_post_handler(struct pt_regs *regs);
+#ifdef CONFIG_KPROBES_ON_FTRACE
+extern int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
+  struct kprobe_ctlblk *kcb);
+#else
+static inline int skip_singlestep(struct kprobe *p, struct pt_regs *regs,
+ struct kprobe_ctlblk *kcb)
+{
+   return 0;
+}
+#endif
 #else
 static inline int kprobe_handler(struct pt_regs *regs) { return 0; }
 static inline int kprobe_post_handler(struct pt_regs *regs) { return 0; }
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a048b37b9b27..88b21427ccc7 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -101,6 +101,7 @@ obj-$(CONFIG_BOOTX_TEXT)+= btext.o
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_KPROBES)  += kprobes.o
 obj-$(CONFIG_OPTPROBES)+= optprobes.o optprobes_head.o
+obj-$(CONFIG_KPROBES_ON_FTRACE)+= kprobes-ftrace.o
 obj-$(CONFIG_UPROBES)  += uprobes.o
 obj-$(CONFIG_PPC_UDBG_16550)   += legacy_serial.o udbg_16550.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
@@ -154,6 +155,8 @@ GCOV_PROFILE_machine_kexec_32.o := n
 UBSAN_SANITIZE_machine_kexec_32.o := n
 GCOV_PROFILE_kprobes.o := n
 UBSAN_SANITIZE_kprobes.o := n
+GCOV_PROFILE_kprobes-ftrace.o := n
+UBSAN_SANITIZE_kprobes-ftrace.o := n
 UBSAN_SANITIZE_vdso.o := n
 
 extra-$(CONFIG_PPC_FPU)+= fpu.o
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
new file mode 100644
index ..6c089d9757c9
--- /dev/null
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -0,0 +1,104 @@
+/*
+ * Dynamic Ftrace based Kprobes Optimization
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) Hitachi Ltd., 2012
+ * Copyright 2016 Naveen N. Rao 
+ *   IBM Corporation
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static nokprobe_inline
+int __skip_singlestep(struct k

[PATCH v2 3/5] kprobes: Skip preparing optprobe if the probe is ftrace-based

2017-02-21 Thread Naveen N. Rao
From: Masami Hiramatsu 

Skip preparing an optprobe if the probe is ftrace-based, since it must
not be optimized anyway (it is already optimized via ftrace).

Tested-by: Naveen N. Rao 
Signed-off-by: Masami Hiramatsu 
---
 kernel/kprobes.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index fbc7a70ff33e..7e93500dede7 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -708,13 +708,20 @@ static void kill_optimized_kprobe(struct kprobe *p)
arch_remove_optimized_kprobe(op);
 }
 
+static inline
+void __prepare_optimized_kprobe(struct optimized_kprobe *op, struct kprobe *p)
+{
+   if (!kprobe_ftrace(p))
+   arch_prepare_optimized_kprobe(op, p);
+}
+
 /* Try to prepare optimized instructions */
 static void prepare_optimized_kprobe(struct kprobe *p)
 {
struct optimized_kprobe *op;
 
op = container_of(p, struct optimized_kprobe, kp);
-   arch_prepare_optimized_kprobe(op, p);
+   __prepare_optimized_kprobe(op, p);
 }
 
 /* Allocate new optimized_kprobe and try to prepare optimized instructions */
@@ -728,7 +735,7 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
 
INIT_LIST_HEAD(&op->list);
op->kp.addr = p->addr;
-   arch_prepare_optimized_kprobe(op, p);
+   __prepare_optimized_kprobe(op, p);
 
return &op->kp;
 }
-- 
2.11.0



[PATCH v2 2/5] powerpc: ftrace: restore LR from pt_regs

2017-02-21 Thread Naveen N. Rao
Pass the real LR to the ftrace handler. This is needed for
KPROBES_ON_FTRACE for the pre handlers.

Also, with KPROBES_ON_FTRACE, the link register may be updated by the
pre handlers or by a registered kretprobe. Honor the updated LR by restoring
it from pt_regs, rather than from the stack save area.

Live patch and function graph continue to work fine with this change.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/entry_64.S | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 8fd8718722a1..744b2f91444a 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1248,9 +1248,10 @@ _GLOBAL(ftrace_caller)
 
/* Get the _mcount() call site out of LR */
mflrr7
-   /* Save it as pt_regs->nip & pt_regs->link */
+   /* Save it as pt_regs->nip */
std r7, _NIP(r1)
-   std r7, _LINK(r1)
+   /* Save the real LR in pt_regs->link */
+   std r0, _LINK(r1)
 
/* Save callee's TOC in the ABI compliant location */
std r2, 24(r1)
@@ -1297,16 +1298,16 @@ ftrace_call:
REST_8GPRS(16,r1)
REST_8GPRS(24,r1)
 
+   /* Restore possibly modified LR */
+   ld  r0, _LINK(r1)
+   mtlrr0
+
/* Restore callee's TOC */
ld  r2, 24(r1)
 
/* Pop our stack frame */
addi r1, r1, SWITCH_FRAME_SIZE
 
-   /* Restore original LR for return to B */
-   ld  r0, LRSAVE(r1)
-   mtlrr0
-
 #ifdef CONFIG_LIVEPATCH
 /* Based on the cmpd above, if the NIP was altered handle livepatch */
bne-livepatch_handler
-- 
2.11.0



[PATCH v2 1/5] powerpc: ftrace: minor cleanup

2017-02-21 Thread Naveen N. Rao
Move the stack setup and teardown code to the ftrace_graph_caller().
This way, we don't incur the cost of setting it up unless function graph
is enabled for this function.

Also, remove the extraneous LR restore code after the function graph
stub. LR has previously been restored and neither livepatch_handler()
nor ftrace_graph_caller() return back here.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/entry_64.S | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6432d4bf08c8..8fd8718722a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -1313,16 +1313,12 @@ ftrace_call:
 #endif
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-   stdur1, -112(r1)
 .globl ftrace_graph_call
 ftrace_graph_call:
b   ftrace_graph_stub
 _GLOBAL(ftrace_graph_stub)
-   addir1, r1, 112
 #endif
 
-   ld  r0,LRSAVE(r1)   /* restore callee's lr at _mcount site */
-   mtlrr0
bctr/* jump after _mcount site */
 #endif /* CC_USING_MPROFILE_KERNEL */
 
@@ -1446,6 +1442,7 @@ _GLOBAL(ftrace_stub)
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 #ifndef CC_USING_MPROFILE_KERNEL
 _GLOBAL(ftrace_graph_caller)
+   stdur1, -112(r1)
/* load r4 with local address */
ld  r4, 128(r1)
subir4, r4, MCOUNT_INSN_SIZE
@@ -1471,6 +1468,7 @@ _GLOBAL(ftrace_graph_caller)
 
 #else /* CC_USING_MPROFILE_KERNEL */
 _GLOBAL(ftrace_graph_caller)
+   stdur1, -112(r1)
/* with -mprofile-kernel, parameter regs are still alive at _mcount */
std r10, 104(r1)
std r9, 96(r1)
-- 
2.11.0



Re: [PATCH] powerpc/pseries: advertise Hot Plug Event support to firmware

2017-02-21 Thread Nathan Fontenot
On 02/20/2017 07:12 PM, Michael Roth wrote:
> With the inclusion of:
> 
>   powerpc/pseries: Implement indexed-count hotplug memory remove
>   powerpc/pseries: Implement indexed-count hotplug memory add
> 
> we now have complete handling of the RTAS hotplug event format
> as described by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".
> 
> This capability is indicated by byte 6, bit 5 of architecture
> option vector 5, and allows for greater control over cpu/memory/pci
> hot plug/unplug operations.
> 
> Existing pseries kernels will utilize this capability based on the
> existence of the /event-sources/hot-plug-events DT property, so we
> only need to advertise it via CAS and do not need a corresponding
> FW_FEATURE_* value to test for.
> 
> Cc: Michael Ellerman 
> Cc: Nathan Fontenot 
> Cc: David Gibson 
> Signed-off-by: Michael Roth 

Reviewed-by: Nathan Fontenot 

> ---
>  arch/powerpc/include/asm/prom.h | 1 +
>  arch/powerpc/kernel/prom_init.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h
> index 2c8001c..4a90634 100644
> --- a/arch/powerpc/include/asm/prom.h
> +++ b/arch/powerpc/include/asm/prom.h
> @@ -153,6 +153,7 @@ struct of_drconf_cell {
>  #define OV5_XCMO 0x0440  /* Page Coalescing */
>  #define OV5_TYPE1_AFFINITY   0x0580  /* Type 1 NUMA affinity */
>  #define OV5_PRRN 0x0540  /* Platform Resource Reassignment */
> +#define OV5_HP_EVT   0x0604  /* Hot Plug Event support */
>  #define OV5_RESIZE_HPT   0x0601  /* Hash Page Table resizing */
> #define OV5_PFO_HW_RNG   0x1180  /* PFO Random Number Generator */
> #define OV5_PFO_HW_842   0x1140  /* PFO Compression Accelerator */
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index f3c8799..1a835e7 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -839,7 +839,7 @@ struct ibm_arch_vec __cacheline_aligned ibm_architecture_vec = {
>   0,
>  #endif
>   .associativity = OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN),
> - .bin_opts = OV5_FEAT(OV5_RESIZE_HPT),
> + .bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT),
>   .micro_checkpoint = 0,
>   .reserved0 = 0,
>   .max_cpus = cpu_to_be32(NR_CPUS),   /* number of cores supported */
> 



Re: [PowerPC] 4.10.0 fails to build on BE config

2017-02-21 Thread Frederic Weisbecker
On Tue, Feb 21, 2017 at 12:55:35PM +0530, abdul wrote:
> Hi,
> 
> Today's mainline build breaks on Power6 and Power7 (all BE config) with
> these build errors
> 
> arch/powerpc/kernel/time.c: In function ‘running_clock’:
> arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function ‘cputime_to_nsecs’ [-Werror=implicit-function-declaration]
> return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
> ^
> cc1: some warnings being treated as errors
> make[1]: *** [arch/powerpc/kernel/time.o] Error 1

Thanks for reporting this!
I just sent a fix.


[PATCH] powerpc: Remove leftover cputime_to_nsecs call causing build error

2017-02-21 Thread Frederic Weisbecker
This type conversion is a leftover that got ignored during the kcpustat
conversion to nanosecs, resulting in build breakage with configs having
CONFIG_NO_HZ_FULL=y.

arch/powerpc/kernel/time.c: In function 'running_clock':
arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function 'cputime_to_nsecs' [-Werror=implicit-function-declaration]
  return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);

All we need is to remove it.

Reported-by: Abdul Haleem 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Oliver O'Halloran 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Frederic Weisbecker 
---
 arch/powerpc/kernel/time.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 14e4855..bc84a8d 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -709,7 +709,7 @@ unsigned long long running_clock(void)
 * time and on a host which doesn't do any virtualisation TB *should* 
equal
 * VTB so it makes no difference anyway.
 */
-   return local_clock() - cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
+   return local_clock() - kcpustat_this_cpu->cpustat[CPUTIME_STEAL];
 }
 #endif
 
-- 
2.7.4



[PATCH] powerpc: optprobes: fix TOC handling in optprobes trampoline

2017-02-21 Thread Naveen N. Rao
Optprobes on powerpc is limited to the kernel text area. We decided to also
optimize kretprobe_trampoline since that is also in the kernel text area.
However, we failed to take into consideration the fact that the same
trampoline is also used to catch function returns from kernel modules.
As an example:

$ sudo modprobe kobject-example
$ sudo bash -c "echo 'r foo_show+8' > 
/sys/kernel/debug/tracing/kprobe_events"
$ sudo bash -c "echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable"
$ sudo cat /sys/kernel/debug/kprobes/list
c0041350  k  kretprobe_trampoline+0x0[OPTIMIZED]
d0e00200  r  foo_show+0x8  kobject_example
$ cat /sys/kernel/kobject_example/foo
Segmentation fault

With the below (trimmed) splat in dmesg:

[70646.248029] Unable to handle kernel paging request for data at address 
0xfec4
[70646.248730] Faulting instruction address: 0xc0041540
[70646.249210] Oops: Kernel access of bad area, sig: 11 [#1]
[snip]
[70646.259635] NIP [c0041540] optimized_callback+0x70/0xe0
[70646.259962] LR [c0041e60] optinsn_slot+0xf8/0x1
[70646.260268] Call Trace:
[70646.260583] [c000c7327850] [c0289af4] 
alloc_set_pte+0x1c4/0x860 (unreliable)
[70646.260910] [c000c7327890] [c0041e60] 
optinsn_slot+0xf8/0x1
[70646.261223] --- interrupt: 700 at 0xc000c7327a80
   LR = kretprobe_trampoline+0x0/0x10
[70646.261849] [c000c7327ba0] [c03a30d4] 
sysfs_kf_seq_show+0x104/0x1d0
[70646.262135] [c000c7327bf0] [c03a0bb4] 
kernfs_seq_show+0x44/0x60
[70646.264211] [c000c7327c10] [c0330578] seq_read+0xf8/0x560
[70646.265142] [c000c7327cb0] [c03a1e64] 
kernfs_fop_read+0x194/0x260
[70646.266070] [c000c7327d00] [c02f9954] __vfs_read+0x44/0x1a0
[70646.266977] [c000c7327d90] [c02fb4cc] vfs_read+0xbc/0x1b0
[70646.267860] [c000c7327de0] [c02fd138] SyS_read+0x68/0x110
[70646.268701] [c000c7327e30] [c000b8e0] system_call+0x38/0xfc
[snip]

Fix this by loading up the kernel TOC before calling into the kernel.
The original TOC gets restored as part of the usual pt_regs restore.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes_head.S | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/optprobes_head.S b/arch/powerpc/kernel/optprobes_head.S
index 53e429b5a29d..bf28188f308c 100644
--- a/arch/powerpc/kernel/optprobes_head.S
+++ b/arch/powerpc/kernel/optprobes_head.S
@@ -65,6 +65,13 @@ optprobe_template_entry:
mfdsisr r5
std r5,_DSISR(r1)
 
+   /*
+* We may get here from a module, so load the kernel TOC in r2.
+* The original TOC gets restored when pt_regs is restored
+*  further below.
+*/
+   ld  r2,PACATOC(r13)
+
.global optprobe_template_op_address
 optprobe_template_op_address:
/*
-- 
2.11.0



Re: [PATCH v3 3/3] powerpc/xmon: add debugfs entry for xmon

2017-02-21 Thread Guilherme G. Piccoli
On 02/21/2017 03:35 AM, Pan Xinhui wrote:
> 
> 
> On 2017/2/21 09:58, Guilherme G. Piccoli wrote:
>> Currently the xmon debugger is set only via kernel boot command-line.
>> It's disabled by default, and can be enabled with "xmon=on" on the
>> command-line. Also, xmon may be accessed via sysrq mechanism.
>> But we cannot enable/disable xmon in runtime, it needs kernel reload.
>>
>> This patch introduces a debugfs entry for xmon, allowing user to query
>> its current state and change it if desired. Basically, the "xmon" file
>> to read from/write to is under the debugfs mount point, on powerpc
>> directory. It's a simple attribute, value 0 meaning xmon is disabled
>> and value 1 the opposite. Writing these states to the file will take
>> immediate effect in the debugger.
>>
>> Signed-off-by: Guilherme G. Piccoli 
>> ---
>> v3: logic improved based in the changes made on patch 1.
>>
>> v2: dropped the custom parser by using simple attributes [mpe suggestion].
>>
>>
>>  arch/powerpc/xmon/xmon.c | 31 +++
>>  1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
>> index f1fcfa8..764ea62 100644
>> --- a/arch/powerpc/xmon/xmon.c
>> +++ b/arch/powerpc/xmon/xmon.c
>> @@ -29,6 +29,10 @@
>>  #include 
>>  #include 
>>
>> +#ifdef CONFIG_DEBUG_FS
>> +#include 
>> +#endif
>> +
>>  #include 
>>  #include 
>>  #include 
>> @@ -3275,6 +3279,33 @@ static int __init setup_xmon_sysrq(void)
>>  device_initcall(setup_xmon_sysrq);
>>  #endif /* CONFIG_MAGIC_SYSRQ */
>>
>> +#ifdef CONFIG_DEBUG_FS
>> +static int xmon_dbgfs_set(void *data, u64 val)
>> +{
>> +xmon_off = !val;
> xmon_off is really 'off' :)
> There is xmon_on
> 

OK, this is *really* odd.
I sent the wrong version of the patch ={

I'm _sorry_, thanks for noticing, Xinhui!

@Michael, my complete fault. I tested the real "v3" of this patch, but
for some reason (aka my disorganization) I sent the old v2. In the v3, I
used "xmon_on = !!val".
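For reference, a minimal user-space model of what the corrected v3 handlers do. This is an illustrative sketch, not the actual xmon code: in v2 the handlers referenced "xmon_off", a variable that no longer exists after patch 1 renamed it; v3 tracks "xmon_on" directly. xmon_init() is stubbed out here; the real one enables or disables the debugger.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the corrected v3 debugfs attribute logic. */
static int xmon_on;

static void xmon_init(int enable)
{
    (void)enable;   /* stub: the real xmon_init() (un)registers the debugger */
}

static int xmon_dbgfs_set(void *data, unsigned long long val)
{
    (void)data;
    xmon_on = !!val;    /* any non-zero write enables xmon, 0 disables it */
    xmon_init(xmon_on);
    return 0;
}

static int xmon_dbgfs_get(void *data, unsigned long long *val)
{
    (void)data;
    *val = xmon_on;
    return 0;
}
```

Reading the attribute back then reports exactly the enabled/disabled state, which is what DEFINE_SIMPLE_ATTRIBUTE exposes through the debugfs file.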

How about if I resend the whole series tomorrow, correcting this and
changing patch 2 in order to keep the automatic show of backtrace
always? Or do you prefer to fix it yourself?

Thanks, and again, I apologize for the mess.


Guilherme



>> +xmon_init(!xmon_off);
>> +
>> +return 0;
>> +}
>> +
>> +static int xmon_dbgfs_get(void *data, u64 *val)
>> +{
>> +*val = !xmon_off;
>> +return 0;
>> +}
>> +
>> +DEFINE_SIMPLE_ATTRIBUTE(xmon_dbgfs_ops, xmon_dbgfs_get,
>> +xmon_dbgfs_set, "%llu\n");
>> +
>> +static int __init setup_xmon_dbgfs(void)
>> +{
>> +debugfs_create_file("xmon", 0600, powerpc_debugfs_root, NULL,
>> +&xmon_dbgfs_ops);
>> +return 0;
>> +}
>> +device_initcall(setup_xmon_dbgfs);
>> +#endif /* CONFIG_DEBUG_FS */
>> +
>>  static int xmon_early __initdata;
>>
>>  static int __init early_parse_xmon(char *p)
>>



Re: [PATCH 1/1] powerpc/xmon: Dump memory in native endian format.

2017-02-21 Thread Douglas Miller

On 02/20/2017 11:01 PM, Michael Ellerman wrote:

Douglas Miller  writes:


diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 9c0e17c..6249975 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2334,9 +2338,49 @@ static void dump_pacas(void)
  }
  #endif
  
+static void dump_by_size(unsigned long addr, long count, int size)
+{
+   unsigned char temp[16];
+   int i, j;
+   u64 val;
+
+   /*
+* 'count' was aligned 16. If that changes, the following
+* must also change to accommodate other values for 'count'.
+*/
+   for (i = 0; i < count; i += 16, addr += 16) {
+   printf(REG, addr);
+
+   if (mread(addr, temp, 16) != 16) {
+   printf("Faulted reading %d bytes from 0x"REG"\n", 16, addr);
+   return;
+   }
+
+   for (j = 0; j < 16; j += size) {
+   putchar(' ');
+   switch (size) {
+   case 1: val = temp[j]; break;
+   case 2: val = *(u16 *)&temp[j]; break;
+   case 4: val = *(u32 *)&temp[j]; break;
+   case 8: val = *(u64 *)&temp[j]; break;
+   default: val = 0;
+   }
+
+   printf("%0*lx", size * 2, val);
+   }
+   printf("  |");
+   for (j = 0; j < 16; ++j) {
+   val = temp[j];
+   putchar(' ' <= val && val <= '~' ? val : '.');
+   }
+   printf("|\n");

I know the ascii dump looks nice, but I think it's misleading. Which is
why I omitted it from my version.

eg.

0:mon> d $__kstrtab_init_task
c0c03ebe 696e69745f746173 6b006d6d755f6665  |init_task.mmu_fe|
c0c03ece 61747572655f6b65 7973006370755f66  |ature_keys.cpu_f|
c0c03ede 6561747572655f6b 657973006375725f  |eature_keys.cur_|
c0c03eee 6370755f73706563 00766972715f746f  |cpu_spec.virq_to|

0:mon> d8 $__kstrtab_init_task
c0c03ebe 7361745f74696e69 65665f756d6d006b  |init_task.mmu_fe|
c0c03ece 656b5f6572757461 665f757063007379  |ature_keys.cpu_f|
c0c03ede 6b5f657275746165 5f72756300737965  |eature_keys.cur_|
c0c03eee 636570735f757063 6f745f7172697600  |cpu_spec.virq_to|


That second dump says at c0c03ebe there is a byte with the value
0x73, which prints as 'i' - but that's false.


So I've dropped the ascii printing for now because I want to sneak this
in to v4.11.

If you want to send a follow-up patch to do the ascii byte reversed that
would be nice.

cheers

I would disagree; anything printed as bytes should be in only one order 
- the order in which it exists in memory. I maintain that the ASCII dump 
is printed correctly. The purpose of an ASCII dump like this is to show 
what is in memory, and ASCII in memory has only one order.
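A small endian-neutral illustration of the point being argued: the byte array itself (what the ASCII column reflects) has one fixed order, while a native u64 load (what "d8" prints) maps either the first or the last byte to the low-order digits depending on host endianness. load_u64 below is a stand-in for xmon's *(u64 *)&temp[j] cast, not the actual xmon code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native-endian load of 8 bytes, like xmon's *(u64 *) cast in dump_by_size(). */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));   /* byte order depends on the host CPU */
    return v;
}
```

On a big-endian host the low byte of the loaded value is the last byte in memory; on little-endian it is the first. The bytes in memory never move either way, which is the basis of the argument above.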


Thanks,
Doug



Re: [PATCH 2/3] powerpc/xmon: drop the nobt option from xmon plus minor fixes

2017-02-21 Thread Guilherme G. Piccoli
On 02/21/2017 02:33 AM, Michael Ellerman wrote:
> Michael Ellerman  writes:
>> "Guilherme G. Piccoli"  writes:
> ...
>>
>> Imagine you're debugging a machine and you drop into xmon to check
>> something, then drop out again.
>>
>> Then you go away and leave the box, and it crashes into xmon, but xmon
>> doesn't print a backtrace because you've already been in xmon. Usually
>> you can just get on the console and hit 't', but sometimes the machine
>> crashes so hard that xmon doesn't take input - in which case you now
>> have no backtrace. :sadface:
> 
> OK I read your patch wrong, the above won't happen.
> 
> But I still don't think we need any of the auto back trace suppression.

Just to clarify: do you think we should always print the backtrace
automatically? I can change it and resend the patch if you want.

Thanks,


Guilherme

> 
> cheers
> 



Re: [PATCH 2/3] powerpc/xmon: drop the nobt option from xmon plus minor fixes

2017-02-21 Thread Guilherme G. Piccoli
On 02/21/2017 02:16 AM, Michael Ellerman wrote:
> "Guilherme G. Piccoli"  writes:
>> Subject: Re: [PATCH 2/3] powerpc/xmon: drop the nobt option from xmon plus 
>> minor fixes
> 
> In future please use the same version number for all patches of a
> series.
> 
> ie. This should include a v2, like the rest of the patches in the series.
> 
> It confuses the tools to have "v2 1/3" "2/3" "v2 3/3".
> 
> I realise that might seem a little odd when a patch is new to the
> series, but the version is the version *of the series*, not the
> individual patches.
> 
> For a new patch you can just add after the change log:
> 
> ---
> v2: New for v2 of the series.
> 
> 
> For example.

Sure, thanks for the hint and for explaining very well how I should
proceed! Unfortunately... as you probably already noticed, I'm only
seeing this after sending the v2 of the series heheh
Sorry, next time I'll follow your suggestion (TBH I thought of it before
sending this, but I was more inclined to mess with the series numbering
heheh)

> 
>> The xmon parameter nobt was added long time ago, by commit 26c8af5f01df
>> ("[POWERPC] print backtrace when entering xmon"). The problem that time
>> was that during a crash in a machine with USB keyboard, xmon wouldn't
>> respond to commands from the keyboard, so printing the backtrace wouldn't
>> be possible.
>>
>> Idea then was to show automatically the backtrace on xmon crash for the
>> first time it's invoked (if it recovers, next time xmon won't show
>> backtrace automatically). The nobt parameter was added _only_ to prevent
>> this automatic trace show. Seems long time ago USB keyboards didn't work
>> that well!
>>
>> We don't need it anymore, but the feature of auto showing the backtrace
>> on first crash seems interesting (imagine a case of auto-reboot script),
>> so this patch keeps the functionality, yet removes the nobt parameter.
> 
> 
> I'm going to take this as-is, because I want to get it in for v4.11.
> 
> But I don't think we need the auto back trace logic at all. If anything
> it's an anti-feature IMHO.
> 
> Imagine you're debugging a machine and you drop into xmon to check
> something, then drop out again.
> 
> Then you go away and leave the box, and it crashes into xmon, but xmon
> doesn't print a backtrace because you've already been in xmon. Usually
> you can just get on the console and hit 't', but sometimes the machine
> crashes so hard that xmon doesn't take input - in which case you now
> have no backtrace. :sadface:
> 
> So I'll send a follow-up patch to remove the auto backtrace stuff
> completely and see if anyone objects.
> 

OK, guess you noticed in your next message I kept the trace
behavior...let's discuss there =)

Cheers,


Guilherme
> cheers
> 



Re: [PATCH 0/2] powerpc: kretprobe updates

2017-02-21 Thread Masami Hiramatsu
On Mon, 20 Feb 2017 15:20:24 +0530
"Naveen N. Rao"  wrote:

> On 2017/02/19 01:42PM, Masami Hiramatsu wrote:
> > On Fri, 17 Feb 2017 17:42:54 -0300
> > Arnaldo Carvalho de Melo  wrote:
> > 
> > > Em Fri, Feb 17, 2017 at 07:44:33PM +0900, Masami Hiramatsu escreveu:
> > > > On Thu, 16 Feb 2017 13:47:37 +0530
> > > > "Naveen N. Rao"  wrote:
> > > > 
> > > > > I am posting the powerpc bits in the same thread so as to keep these
> > > > > changes together. I am not sure how this should be taken upstream as
> > > > > there are at least three different trees involved: one for the core
> > > > > kprobes infrastructure, one for powerpc and one for perf.
> > > 
> > > > Hmm, could you make these (and other related) patches and
> > > > other series in one series? Or wait for the other series
> > > > are merged correctly.
> > > 
> > > Well, patches like these should be done in a way that the tooling parts
> > > can deal with kernels with or without the kernel changes, so that older
> > > tools work with new kernels and new tools work with older kernels.
> > > 
> > > "work" as in the previous behaviour is kept when a new tool deals with
> > > an older kernel and an older tool would warn the user that what it needs
> > > is not present in that kernel.
> > > 
> > > Is this the case? I just looked briefly at the patch commit logs.
> > 
> > Thanks Arnaldo,
> > 
> > Naveen, I think this one and your previous series are incompatible
> > with older kernel. So those should be merged in one series and
> > at least (1) update ftrace's README special file to show explicitly
> > which can accept text+offset style for kretprobes, and 
> 
> Sure - do you mean Documentation/trace/kprobetrace.txt? And, do you want 
> me to include kernel version where this changed?

No, I meant /sys/kernel/debug/tracing/README. For some reason, perf
probe already parses it in util/probe-file.c.
Please see commit 180b20616ce57e93eb692170c793be94c456b1e2 and 
864256255597aad86abcecbe6c53da8852ded15b

Thank you,

> 
> > (2) update
> > perf probe side to ensure that (and fallback to previous logic if not).
> 
> Sure. I am trying out an approach and will post it as soon as it's 
> ready.
> 
> Thanks!
> - Naveen
> 


-- 
Masami Hiramatsu 


Re: [PATCH 0/2] powerpc: kretprobe updates

2017-02-21 Thread Masami Hiramatsu
On Mon, 20 Feb 2017 17:13:05 +0530
"Naveen N. Rao"  wrote:

> On 2017/02/17 05:42PM, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Feb 17, 2017 at 07:44:33PM +0900, Masami Hiramatsu escreveu:
> > > On Thu, 16 Feb 2017 13:47:37 +0530
> > > "Naveen N. Rao"  wrote:
> > > 
> > > > I am posting the powerpc bits in the same thread so as to keep these
> > > > changes together. I am not sure how this should be taken upstream as
> > > > there are at least three different trees involved: one for the core
> > > > kprobes infrastructure, one for powerpc and one for perf.
> > 
> > > Hmm, could you make these (and other related) patches and
> > > other series in one series? Or wait for the other series
> > > are merged correctly.
> > 
> > Well, patches like these should be done in a way that the tooling parts
> > can deal with kernels with or without the kernel changes, so that older
> > tools work with new kernels and new tools work with older kernels.
> 
> Does the below work?

Sorry, no that is not what we meant. Please see my previous reply.

> 
> The idea is to just prefer the real function names when probing 
> functions that do not have duplicate names. Offset from _text only if we 
> detect that a function name has multiple entries. In this manner,
> existing perf will continue to work with newer kernels and vice-versa.

Even with this, the latter will fail on an older kernel (instead of
putting the kretprobe on the first symbol). Instead, we'll check the
ftrace README on the current kernel, and if it supports the sym+offset
style for kretprobes, we'll use the _text+offset event definition. If
not, we continue to use the older style.
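A rough user-space sketch of the capability check described here: scan the text of /sys/kernel/debug/tracing/README for a line advertising symbol+offset kretprobe placement, and only then emit sym+offset retprobe definitions. The exact marker string is an assumption for illustration; the real parsing lives in perf's util/probe-file.c.

```c
#include <assert.h>
#include <string.h>

/* Returns non-zero if the README text advertises kretprobe placement
 * with a symbol+offset, i.e. the newer kernel behaviour.  The marker
 * string here is hypothetical. */
static int readme_supports_kretprobe_offset(const char *readme)
{
    return strstr(readme, "place (kretprobe)") != NULL;
}
```

A caller would then pick between "r:probe/x sym+off" and the older "r:probe/x sym" event syntax based on the result, keeping newer perf compatible with older kernels.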

Thank you,

> 
> Before this patch:
> 
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D _do_fork
> p:probe/_do_fork _text+857288
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D _do_fork%return
> r:probe/_do_fork _text+857288
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D read_mem
> p:probe/read_mem _text+10019704
> p:probe/read_mem_1 _text+6228424
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D read_mem%return
> r:probe/read_mem _text+10019704
> r:probe/read_mem_1 _text+6228424
> 
> 
> With the below patch (lightly tested):
> 
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D _do_fork
> p:probe/_do_fork _do_fork+8
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D _do_fork%return
> r:probe/_do_fork _do_fork+8
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D read_mem
> p:probe/read_mem _text+10019704
> p:probe/read_mem_1 _text+6228424
> naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -D read_mem%return
> r:probe/read_mem _text+10019704
> r:probe/read_mem_1 _text+6228424
> 
> Signed-off-by: Naveen N. Rao 
> ---
>  tools/perf/util/machine.h |  7 +
>  tools/perf/util/map.c | 41 +
>  tools/perf/util/map.h | 13 ++
>  tools/perf/util/probe-event.c | 18 +
>  tools/perf/util/symbol.c  | 60 +++
>  tools/perf/util/symbol.h  |  2 ++
>  6 files changed, 141 insertions(+)
> 
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index 354de6e56109..277ffb1e0896 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -203,6 +203,13 @@ struct symbol *machine__find_kernel_function_by_name(struct machine *machine,
>   return map_groups__find_function_by_name(&machine->kmaps, name, mapp);
>  }
>  
> +static inline
> +unsigned int machine__find_kernel_function_count_by_name(struct machine *machine,
> +  const char *name)
> +{
> + return map_groups__find_function_count_by_name(&machine->kmaps, name);
> +}
> +
>  struct map *machine__findnew_module_map(struct machine *machine, u64 start,
>   const char *filename);
>  int arch__fix_module_text_start(u64 *start, const char *name);
> diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
> index 4f9a71c63026..b75b35dc54ad 100644
> --- a/tools/perf/util/map.c
> +++ b/tools/perf/util/map.c
> @@ -349,6 +349,17 @@ struct symbol *map__find_symbol_by_name(struct map *map, const char *name)
>   return dso__find_symbol_by_name(map->dso, map->type, name);
>  }
>  
> +unsigned int map__find_symbol_count_by_name(struct map *map, const char *name)
> +{
> + if (map__load(map) < 0)
> + return 0;
> +
> + if (!dso__sorted_by_name(map->dso, map->type))
> + dso__sort_by_name(map->dso, map->type);
> +
> + return dso__find_symbol_count_by_name(map->dso, map->type, name);
> +}
> +
>  struct map *map__clone(struct map *from)
>  {
>   struct map *map = memdup(from, sizeof(*map));
> @@ -593,6 +604,29 @@ struct symbol *maps__find_symbol_by_name(struct maps *maps, const char *name,
>   return sym;
>  }
>  
> +unsigned int maps__find_symbol_count_by_name(struct maps *maps, const char *name)
> +{
> + struct symbol *sym;
> + struct 

Re: [PATCH 0/2] Allow configurable stack size (especially 32k on PPC64)

2017-02-21 Thread Gabriel Paubert
On Tue, Feb 21, 2017 at 09:24:38AM +1300, Hamish Martin wrote:
> This patch series adds the ability to configure the THREAD_SHIFT value and
> thereby alter the stack size on powerpc systems. We are particularly 
> interested
> in configuring for a 32k stack on PPC64.
> 
> Using an NXP T2081 (e6500 PPC64 cores) we are observing stack overflows as a
> result of applying a DTS overlay containing some I2C devices. Our scenario is
> an ethernet switch chassis with plug-in cards. The I2C is driven from the 
> T2081
> through a PCA9548 mux on the main board. When we detect insertion of the 
> plugin
> card we schedule work for a call to of_overlay_create() to install a DTS
> overlay for the plugin board. This DTS overlay contains a further PCA9548 mux
> with more devices hanging off it including a PCA9539 GPIO expander. The
> ultimate installed I2C tree is:
> 
> T2081 --- PCA9548 MUX --- PCA9548 MUX --- PCA9539 GPIO Expander
> 
> When we install the overlay the devices described in the overlay are probed 
> and
> we see a large number of stack frames used as a result. If this is coupled 
> with
> an interrupt happening that requires moderate to high stack use we observe
> stack corruption. Here is an example long stack (from a 4.10-rc8 kernel) that
> does not show corruption but does demonstrate the length and frame sizes
> involved.
> 
> DepthSize   Location(72 entries)
> -   
>   0)13872 128   .__raise_softirq_irqoff+0x1c/0x130
>   1)13744 144   .raise_softirq+0x30/0x70
>   2)13600 112   .invoke_rcu_core+0x54/0x70
>   3)13488 336   .rcu_check_callbacks+0x294/0xde0
>   4)13152 128   .update_process_times+0x40/0x90
>   5)13024 144   .tick_sched_handle.isra.16+0x40/0xb0
>   6)12880 144   .tick_sched_timer+0x6c/0xe0
>   7)12736 272   .__hrtimer_run_queues+0x1a0/0x4b0
>   8)12464 208   .hrtimer_interrupt+0xe8/0x2a0
>   9)12256 160   .__timer_interrupt+0xdc/0x330
>  10)12096 160   .timer_interrupt+0x138/0x190
>  11)11936 752   exc_0x900_common+0xe0/0xe4
>  12)11184 128   .ftrace_ops_no_ops+0x11c/0x230
>  13)11056 176   .ftrace_ops_test.isra.12+0x30/0x50
>  14)10880 160   .ftrace_ops_no_ops+0xd4/0x230
>  15)10720 112   ftrace_call+0x4/0x8
>  16)10608 176   .lock_timer_base+0x3c/0xf0
>  17)10432 144   .try_to_del_timer_sync+0x2c/0x90
>  18)10288 128   .del_timer_sync+0x60/0x80
>  19)10160 256   .schedule_timeout+0x1fc/0x490
>  20) 9904 208   .i2c_wait+0x238/0x290
>  21) 9696 256   .mpc_xfer+0x4e4/0x570
>  22) 9440 208   .__i2c_transfer+0x158/0x6d0
>  23) 9232 192   .pca954x_reg_write+0x70/0x110
>  24) 9040 160   .__i2c_mux_master_xfer+0xb4/0xf0
>  25) 8880 208   .__i2c_transfer+0x158/0x6d0
>  26) 8672 192   .pca954x_reg_write+0x70/0x110
>  27) 8480 144   .pca954x_select_chan+0x68/0xa0
>  28) 8336 160   .__i2c_mux_master_xfer+0x64/0xf0
>  29) 8176 208   .__i2c_transfer+0x158/0x6d0
>  30) 7968 144   .i2c_transfer+0x98/0x130
>  31) 7824 320   .i2c_smbus_xfer_emulated+0x168/0x600
>  32) 7504 208   .i2c_smbus_xfer+0x1c0/0x5d0
>  33) 7296 192   .i2c_smbus_write_byte_data+0x50/0x70
>  34) 7104 144   .pca953x_write_single+0x6c/0xe0
>  35) 6960 192   .pca953x_gpio_direction_output+0xa4/0x160
>  36) 6768 160   ._gpiod_direction_output_raw+0xec/0x460
>  37) 6608 160   .gpiod_hog+0x98/0x250
>  38) 6448 176   .of_gpiochip_add+0xdc/0x1c0
>  39) 6272 256   .gpiochip_add_data+0x4f4/0x8c0
>  40) 6016 144   .devm_gpiochip_add_data+0x64/0xf0
>  41) 5872 208   .pca953x_probe+0x2b4/0x5f0
>  42) 5664 144   .i2c_device_probe+0x224/0x2e0
>  43) 5520 160   .really_probe+0x244/0x380
>  44) 5360 160   .bus_for_each_drv+0x94/0x100
>  45) 5200 160   .__device_attach+0x118/0x160
>  46) 5040 144   .bus_probe_device+0xe8/0x100
>  47) 4896 208   .device_add+0x500/0x6c0
>  48) 4688 144   .i2c_new_device+0x1f8/0x240
>  49) 4544 256   .of_i2c_register_device+0x160/0x280
>  50) 4288 192   .i2c_register_adapter+0x238/0x630
>  51) 4096 208   .i2c_mux_add_adapter+0x3f8/0x540
>  52) 3888 192   .pca954x_probe+0x234/0x370
>  53) 3696 144   .i2c_device_probe+0x224/0x2e0
>  54) 3552 160   .really_probe+0x244/0x380
>  55) 3392 160   .bus_for_each_drv+0x94/0x100
>  56) 3232 160   .__device_attach+0x118/0x160
>  57) 3072 144   .bus_probe_device+0xe8/0x100
>  58) 2928 208   .device_add+0x500/0x6c0
>  59) 2720 144   .i2c_new_device+0x1f8/0x240
>  60) 2576 256   .of_i2c_register_device+0x160/0x280
>  61) 2320 144   .of_i2c_notify+0x12c/0x1d0
>  62) 2176 160   .notifier_call_chain+0x8c/0x100
>  63) 2016 160   .__blocking_not
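The quantity this series makes configurable is a simple power-of-two shift: the kernel stack is 2^THREAD_SHIFT bytes, so the 32k stack discussed here corresponds to THREAD_SHIFT = 15, versus 16k for the existing 64-bit default of 14. A sketch of that arithmetic:

```c
#include <assert.h>

/* THREAD_SIZE as derived from THREAD_SHIFT in the kernel headers:
 * a stack of 2^shift bytes. */
static unsigned long thread_size(unsigned int thread_shift)
{
    return 1UL << thread_shift;
}
```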

Re: [PowerPC] 4.10.0 fails to build on BE config

2017-02-21 Thread Sachin Sant

> On 21-Feb-2017, at 4:39 PM, Oliver O'Halloran  wrote:
> 
> On Tue, Feb 21, 2017 at 6:25 PM, abdul  wrote:
>> Hi,
>> 
>> Today's mainline build, breaks on Power6 and Power7 (all BE config) with
>> these build errors
>> 
>> arch/powerpc/kernel/time.c: In function ‘running_clock’:
>> arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function
>> ‘cputime_to_nsecs’ [-Werror=implicit-function-declaration]
>> return local_clock() -
>> cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
>> ^
>> cc1: some warnings being treated as errors
>> make[1]: *** [arch/powerpc/kernel/time.o] Error 1
>> 
>> 
>> Regard's
>> Abdul Haleem
>> IBM Linux Technology Center.
> 
> Hi Abdul,
> 
> Are there any extra patches in your tree? I briefly tried to reproduce
> this, but in my local tree this line:

I think the failure reported here is against the linux-next tree.
Abdul, can you confirm?

With today’s linux-next tree I can recreate this build failure

  CC  arch/powerpc/kernel/sysfs.o
  CC  arch/powerpc/kernel/cacheinfo.o
  CC  arch/powerpc/kernel/time.o
arch/powerpc/kernel/time.c: In function 'running_clock':
arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function 
'cputime_to_nsecs' [-Werror=implicit-function-declaration]
  return local_clock() - 
cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
  ^
cc1: some warnings being treated as errors
make[1]: *** [arch/powerpc/kernel/time.o] Error 1
make: *** [arch/powerpc/kernel] Error 2

This is with big endian config. I have attached the config file.
The system has gcc version 4.8.5

Thanks
- Sachin



config_be_next_20170221
Description: Binary data


[PATCH 9/9] dpaa_eth: enable multiple Tx traffic classes

2017-02-21 Thread Madalin Bucur
From: Camelia Groza 

Implement the setup_tc ndo to configure prioritised Tx traffic classes.
Priorities range from 0 (lowest) to 3 (highest). The driver assigns
NR_CPUS queues to each traffic class.

Signed-off-by: Camelia Groza 
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ac75d09..1b3ea38 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -342,6 +342,41 @@ static void dpaa_get_stats64(struct net_device *net_dev,
}
 }
 
+static int dpaa_setup_tc(struct net_device *net_dev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc)
+{
+   struct dpaa_priv *priv = netdev_priv(net_dev);
+   int i;
+
+   if (tc->type != TC_SETUP_MQPRIO)
+   return -EINVAL;
+
+   if (tc->tc == priv->num_tc)
+   return 0;
+
+   if (!tc->tc) {
+   netdev_reset_tc(net_dev);
+   goto out;
+   }
+
+   if (tc->tc > DPAA_TC_NUM) {
+   netdev_err(net_dev, "Too many traffic classes: max %d supported.\n",
+  DPAA_TC_NUM);
+   return -EINVAL;
+   }
+
+   netdev_set_num_tc(net_dev, tc->tc);
+
+   for (i = 0; i < tc->tc; i++)
+   netdev_set_tc_queue(net_dev, i, DPAA_TC_TXQ_NUM,
+   i * DPAA_TC_TXQ_NUM);
+
+out:
+   priv->num_tc = tc->tc ? tc->tc : 1;
+   netif_set_real_num_tx_queues(net_dev, priv->num_tc * DPAA_TC_TXQ_NUM);
+   return 0;
+}
+
 static struct mac_device *dpaa_mac_dev_get(struct platform_device *pdev)
 {
struct platform_device *of_dev;
@@ -2417,6 +2452,7 @@ static const struct net_device_ops dpaa_ops = {
.ndo_validate_addr = eth_validate_addr,
.ndo_set_rx_mode = dpaa_set_rx_mode,
.ndo_do_ioctl = dpaa_ioctl,
+   .ndo_setup_tc = dpaa_setup_tc,
 };
 
 static int dpaa_napi_add(struct net_device *net_dev)
-- 
2.1.0



[PATCH 8/9] dpaa_eth: add four prioritised Tx traffic classes

2017-02-21 Thread Madalin Bucur
From: Camelia Groza 

Each traffic class corresponds to a WQ priority level. The number of Tx
netdev queues and frame queues is increased to NR_CPUS queues for each
traffic class. In addition, the priority of the Rx, Error and Conf queues
is lowered but their order is maintained.

By default, only one traffic class is enabled, only the low priority Tx
queues are used and only the corresponding netdev queues are advertised.

Signed-off-by: Camelia Groza 
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 43 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h |  8 -
 2 files changed, 42 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ae64cdb..ac75d09 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -565,16 +565,18 @@ static void dpaa_bps_free(struct dpaa_priv *priv)
 
 /* Use multiple WQs for FQ assignment:
  * - Tx Confirmation queues go to WQ1.
- * - Rx Error and Tx Error queues go to WQ2 (giving them a better chance
- *   to be scheduled, in case there are many more FQs in WQ3).
- * - Rx Default and Tx queues go to WQ3 (no differentiation between
- *   Rx and Tx traffic).
+ * - Rx Error and Tx Error queues go to WQ5 (giving them a better chance
+ *   to be scheduled, in case there are many more FQs in WQ6).
+ * - Rx Default goes to WQ6.
+ * - Tx queues go to different WQs depending on their priority. Equal
+ *   chunks of NR_CPUS queues go to WQ6 (lowest priority), WQ2, WQ1 and
+ *   WQ0 (highest priority).
  * This ensures that Tx-confirmed buffers are timely released. In particular,
  * it avoids congestion on the Tx Confirm FQs, which can pile up PFDRs if they
  * are greatly outnumbered by other FQs in the system, while
  * dequeue scheduling is round-robin.
  */
-static inline void dpaa_assign_wq(struct dpaa_fq *fq)
+static inline void dpaa_assign_wq(struct dpaa_fq *fq, int idx)
 {
switch (fq->fq_type) {
case FQ_TYPE_TX_CONFIRM:
@@ -583,11 +585,33 @@ static inline void dpaa_assign_wq(struct dpaa_fq *fq)
break;
case FQ_TYPE_RX_ERROR:
case FQ_TYPE_TX_ERROR:
-   fq->wq = 2;
+   fq->wq = 5;
break;
case FQ_TYPE_RX_DEFAULT:
+   fq->wq = 6;
+   break;
case FQ_TYPE_TX:
-   fq->wq = 3;
+   switch (idx / DPAA_TC_TXQ_NUM) {
+   case 0:
+   /* Low priority (best effort) */
+   fq->wq = 6;
+   break;
+   case 1:
+   /* Medium priority */
+   fq->wq = 2;
+   break;
+   case 2:
+   /* High priority */
+   fq->wq = 1;
+   break;
+   case 3:
+   /* Very high priority */
+   fq->wq = 0;
+   break;
+   default:
+   WARN(1, "Too many TX FQs: more than %d!\n",
+DPAA_ETH_TXQ_NUM);
+   }
break;
default:
WARN(1, "Invalid FQ type %d for FQID %d!\n",
@@ -615,7 +639,7 @@ static struct dpaa_fq *dpaa_fq_alloc(struct device *dev,
}
 
for (i = 0; i < count; i++)
-   dpaa_assign_wq(dpaa_fq + i);
+   dpaa_assign_wq(dpaa_fq + i, i);
 
return dpaa_fq;
 }
@@ -2683,6 +2707,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
memset(percpu_priv, 0, sizeof(*percpu_priv));
}
 
+   priv->num_tc = 1;
+   netif_set_real_num_tx_queues(net_dev, priv->num_tc * DPAA_TC_TXQ_NUM);
+
/* Initialize NAPI */
err = dpaa_napi_add(net_dev);
if (err < 0)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 1f9aebf..9941a78 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -39,7 +39,12 @@
 #include "mac.h"
 #include "dpaa_eth_trace.h"
 
-#define DPAA_ETH_TXQ_NUM   NR_CPUS
+/* Number of prioritised traffic classes */
+#define DPAA_TC_NUM		4
+/* Number of Tx queues per traffic class */
+#define DPAA_TC_TXQ_NUM		NR_CPUS
+/* Total number of Tx queues */
+#define DPAA_ETH_TXQ_NUM	(DPAA_TC_NUM * DPAA_TC_TXQ_NUM)
 
 #define DPAA_BPS_NUM 3 /* number of bpools per interface */
 
@@ -152,6 +157,7 @@ struct dpaa_priv {
u16 channel;
struct list_head dpaa_fq_list;
 
+   u8 num_tc;
u32 msg_enable; /* net_device message level */
 
struct {
-- 
2.1.0
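The idx-to-WQ mapping introduced by dpaa_assign_wq() above can be modelled in a few lines of plain C. This is a hypothetical standalone sketch (tx_idx_to_wq and the fixed DPAA_TC_TXQ_NUM value are illustration only, not driver code): equal chunks of DPAA_TC_TXQ_NUM Tx queues land on WQ6 (lowest priority), WQ2, WQ1 and WQ0 (highest priority).

```c
#include <assert.h>

#define DPAA_TC_TXQ_NUM 8   /* stand-in for NR_CPUS; Tx queues per class */
#define DPAA_TC_NUM     4   /* prioritised traffic classes */

/* Mirrors the FQ_TYPE_TX switch in dpaa_assign_wq(). */
static int tx_idx_to_wq(int idx)
{
    switch (idx / DPAA_TC_TXQ_NUM) {
    case 0:  return 6;  /* low priority (best effort) */
    case 1:  return 2;  /* medium priority */
    case 2:  return 1;  /* high priority */
    case 3:  return 0;  /* very high priority */
    default: return -1; /* too many Tx FQs */
    }
}
```

Note the non-monotonic WQ numbers: WQ6 is the lowest-priority work queue in the channel, so the first chunk of queues deliberately maps to the *largest* WQ index.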



[PATCH 7/9] dpaa_eth: do not ignore port api return value

2017-02-21 Thread Madalin Bucur
Reported-by: Dan Carpenter 

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 65 +-
 1 file changed, 43 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index a7a595c..ae64cdb 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -1063,9 +1063,9 @@ static int dpaa_fq_free(struct device *dev, struct 
list_head *list)
return err;
 }
 
-static void dpaa_eth_init_tx_port(struct fman_port *port, struct dpaa_fq *errq,
- struct dpaa_fq *defq,
- struct dpaa_buffer_layout *buf_layout)
+static int dpaa_eth_init_tx_port(struct fman_port *port, struct dpaa_fq *errq,
+struct dpaa_fq *defq,
+struct dpaa_buffer_layout *buf_layout)
 {
struct fman_buffer_prefix_content buf_prefix_content;
struct fman_port_params params;
@@ -1084,23 +1084,29 @@ static void dpaa_eth_init_tx_port(struct fman_port 
*port, struct dpaa_fq *errq,
params.specific_params.non_rx_params.dflt_fqid = defq->fqid;
 
	err = fman_port_config(port, &params);
-   if (err)
+   if (err) {
pr_err("%s: fman_port_config failed\n", __func__);
+   return err;
+   }
 
err = fman_port_cfg_buf_prefix_content(port, &buf_prefix_content);
-   if (err)
+   if (err) {
pr_err("%s: fman_port_cfg_buf_prefix_content failed\n",
   __func__);
+   return err;
+   }
 
err = fman_port_init(port);
if (err)
pr_err("%s: fm_port_init failed\n", __func__);
+
+   return err;
 }
 
-static void dpaa_eth_init_rx_port(struct fman_port *port, struct dpaa_bp **bps,
- size_t count, struct dpaa_fq *errq,
- struct dpaa_fq *defq,
- struct dpaa_buffer_layout *buf_layout)
+static int dpaa_eth_init_rx_port(struct fman_port *port, struct dpaa_bp **bps,
+size_t count, struct dpaa_fq *errq,
+struct dpaa_fq *defq,
+struct dpaa_buffer_layout *buf_layout)
 {
struct fman_buffer_prefix_content buf_prefix_content;
struct fman_port_rx_params *rx_p;
@@ -1128,32 +1134,44 @@ static void dpaa_eth_init_rx_port(struct fman_port 
*port, struct dpaa_bp **bps,
}
 
	err = fman_port_config(port, &params);
-   if (err)
+   if (err) {
pr_err("%s: fman_port_config failed\n", __func__);
+   return err;
+   }
 
err = fman_port_cfg_buf_prefix_content(port, &buf_prefix_content);
-   if (err)
+   if (err) {
pr_err("%s: fman_port_cfg_buf_prefix_content failed\n",
   __func__);
+   return err;
+   }
 
err = fman_port_init(port);
if (err)
pr_err("%s: fm_port_init failed\n", __func__);
+
+   return err;
 }
 
-static void dpaa_eth_init_ports(struct mac_device *mac_dev,
-   struct dpaa_bp **bps, size_t count,
-   struct fm_port_fqs *port_fqs,
-   struct dpaa_buffer_layout *buf_layout,
-   struct device *dev)
+static int dpaa_eth_init_ports(struct mac_device *mac_dev,
+  struct dpaa_bp **bps, size_t count,
+  struct fm_port_fqs *port_fqs,
+  struct dpaa_buffer_layout *buf_layout,
+  struct device *dev)
 {
struct fman_port *rxport = mac_dev->port[RX];
struct fman_port *txport = mac_dev->port[TX];
+   int err;
 
-   dpaa_eth_init_tx_port(txport, port_fqs->tx_errq,
- port_fqs->tx_defq, &buf_layout[TX]);
-   dpaa_eth_init_rx_port(rxport, bps, count, port_fqs->rx_errq,
- port_fqs->rx_defq, &buf_layout[RX]);
+   err = dpaa_eth_init_tx_port(txport, port_fqs->tx_errq,
+   port_fqs->tx_defq, &buf_layout[TX]);
+   if (err)
+   return err;
+
+   err = dpaa_eth_init_rx_port(rxport, bps, count, port_fqs->rx_errq,
+   port_fqs->rx_defq, &buf_layout[RX]);
+
+   return err;
 }
 
 static int dpaa_bman_release(const struct dpaa_bp *dpaa_bp,
@@ -2649,8 +2667,10 @@ static int dpaa_eth_probe(struct platform_device *pdev)
priv->rx_headroom = dpaa_get_headroom(&priv->buf_layout[RX]);
 
/* All real interfaces need their ports initialized */
-   dpaa_eth_init_ports(mac_dev, dpaa_bps, DPAA_BPS_NUM, &port_fqs,
-   &priv->buf_layout[0], dev);
+   err = dpaa

[PATCH 6/9] dpaa_eth: enable Rx checksum offload

2017-02-21 Thread Madalin Bucur
Use the FMan HW parser L4CV flag to offload Rx checksumming.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 29 --
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index e19181f..a7a595c 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -137,6 +137,13 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms");
 /* L4 Type field: TCP */
 #define FM_L4_PARSE_RESULT_TCP 0x20
 
+/* FD status field indicating whether the FM Parser has attempted to validate
+ * the L4 csum of the frame.
+ * Note that having this bit set doesn't necessarily imply that the checksum
+ * is valid. One would have to check the parse results to find that out.
+ */
+#define FM_FD_STAT_L4CV 0x00000004
+
 #define DPAA_SGT_MAX_ENTRIES 16 /* maximum number of entries in SG Table */
#define DPAA_BUFF_RELEASE_MAX 8 /* maximum number of buffers released at once */
 
@@ -235,6 +242,7 @@ static int dpaa_netdev_init(struct net_device *net_dev,
 * For conformity, we'll still declare GSO explicitly.
 */
net_dev->features |= NETIF_F_GSO;
+   net_dev->features |= NETIF_F_RXCSUM;
 
net_dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
/* we do not want shared skbs on TX */
@@ -1526,6 +1534,23 @@ static struct sk_buff *dpaa_cleanup_tx_fd(const struct 
dpaa_priv *priv,
return skb;
 }
 
+static u8 rx_csum_offload(const struct dpaa_priv *priv, const struct qm_fd *fd)
+{
+   /* The parser has run and performed L4 checksum validation.
+* We know there were no parser errors (and implicitly no
+* L4 csum error), otherwise we wouldn't be here.
+*/
+   if ((priv->net_dev->features & NETIF_F_RXCSUM) &&
+   (be32_to_cpu(fd->status) & FM_FD_STAT_L4CV))
+   return CHECKSUM_UNNECESSARY;
+
+   /* We're here because either the parser didn't run or the L4 checksum
+* was not verified. This may include the case of a UDP frame with
+* checksum zero or an L4 proto other than TCP/UDP
+*/
+   return CHECKSUM_NONE;
+}
+
 /* Build a linear skb around the received buffer.
  * We are guaranteed there is enough room at the end of the data buffer to
  * accommodate the shared info area of the skb.
@@ -1556,7 +1581,7 @@ static struct sk_buff *contig_fd_to_skb(const struct 
dpaa_priv *priv,
skb_reserve(skb, fd_off);
skb_put(skb, qm_fd_get_length(fd));
 
-   skb->ip_summed = CHECKSUM_NONE;
+   skb->ip_summed = rx_csum_offload(priv, fd);
 
return skb;
 
@@ -1616,7 +1641,7 @@ static struct sk_buff *sg_fd_to_skb(const struct 
dpaa_priv *priv,
if (WARN_ON(unlikely(!skb)))
goto free_buffers;
 
-   skb->ip_summed = CHECKSUM_NONE;
+   skb->ip_summed = rx_csum_offload(priv, fd);
 
/* Make sure forwarded skbs will have enough space
 * on Tx, if extra headers are added.
-- 
2.1.0
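The decision made by rx_csum_offload() above is easy to model in userspace. A minimal sketch, with ntohl() standing in for be32_to_cpu() and simplified stand-ins for the kernel's feature flag and ip_summed values (rx_csum and the simplified NETIF_F_RXCSUM bit are assumptions for illustration):

```c
#include <stdint.h>
#include <arpa/inet.h>   /* ntohl()/htonl() as be32_to_cpu()/cpu_to_be32() */

#define FM_FD_STAT_L4CV  0x00000004  /* parser attempted L4 csum validation */
#define NETIF_F_RXCSUM   (1u << 0)   /* simplified feature flag */

enum { CHECKSUM_NONE = 0, CHECKSUM_UNNECESSARY = 1 };

/* CHECKSUM_UNNECESSARY only when the feature is enabled and the
 * big-endian FD status carries L4CV; otherwise CHECKSUM_NONE. */
static int rx_csum(uint32_t features, uint32_t fd_status_be)
{
    if ((features & NETIF_F_RXCSUM) &&
        (ntohl(fd_status_be) & FM_FD_STAT_L4CV))
        return CHECKSUM_UNNECESSARY;
    return CHECKSUM_NONE;
}
```

As the patch comment notes, L4CV only says the parser *attempted* validation; a parser error would never reach this path, which is why L4CV alone is enough here.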



[PATCH 5/9] dpaa_eth: remove redundant initialization

2017-02-21 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index e2ca107..e19181f 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2093,7 +2093,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
dma_addr_t addr = qm_fd_addr(fd);
enum qm_fd_format fd_format;
struct net_device *net_dev;
-   u32 fd_status = fd->status;
+   u32 fd_status;
struct dpaa_bp *dpaa_bp;
struct dpaa_priv *priv;
unsigned int skb_len;
-- 
2.1.0



[PATCH 4/9] fsl/fman: enlarge FIFO to allow for the 5th port

2017-02-21 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/fman.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c 
b/drivers/net/ethernet/freescale/fman/fman.c
index d755930..4aefe24 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -1212,7 +1212,7 @@ static int fill_soc_specific_params(struct 
fman_state_struct *state)
state->max_num_of_open_dmas = 32;
state->fm_port_num_of_cg= 256;
state->num_of_rx_ports  = 6;
-   state->total_fifo_size  = 122 * 1024;
+   state->total_fifo_size  = 136 * 1024;
break;
 
case 2:
-- 
2.1.0



[PATCH 3/9] fsl/fman: remove wrong free

2017-02-21 Thread Madalin Bucur
Reported-by: Dan Carpenter 

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/fman_port.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c 
b/drivers/net/ethernet/freescale/fman/fman_port.c
index f314348..57bf44f 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1312,7 +1312,7 @@ int fman_port_config(struct fman_port *port, struct 
fman_port_params *params)
/* Allocate the FM driver's parameters structure */
port->cfg = kzalloc(sizeof(*port->cfg), GFP_KERNEL);
if (!port->cfg)
-   goto err_params;
+   return -EINVAL;
 
/* Initialize FM port parameters which will be kept by the driver */
port->port_type = port->dts_params.type;
@@ -1393,8 +1393,6 @@ int fman_port_config(struct fman_port *port, struct 
fman_port_params *params)
 
 err_port_cfg:
kfree(port->cfg);
-err_params:
-   kfree(port);
return -EINVAL;
 }
 EXPORT_SYMBOL(fman_port_config);
-- 
2.1.0



[PATCH 2/9] fsl/fman: set HW parser as BMI next engine

2017-02-21 Thread Madalin Bucur
Enable the HW parser for all DPAA interfaces.

Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/fman.c  | 21 
 drivers/net/ethernet/freescale/fman/fman_port.c | 72 +++--
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c 
b/drivers/net/ethernet/freescale/fman/fman.c
index f60845f..d755930 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -59,6 +59,7 @@
 #define DMA_OFFSET 0x000C2000
 #define FPM_OFFSET 0x000C3000
 #define IMEM_OFFSET0x000C4000
+#define HWP_OFFSET 0x000C7000
 #define CGP_OFFSET 0x000DB000
 
 /* Exceptions bit map */
@@ -218,6 +219,9 @@
 
 #define QMI_GS_HALT_NOT_BUSY   0x00000002
 
+/* HWP defines */
+#define HWP_RPIMAC_PEN 0x00000001
+
 /* IRAM defines */
 #define IRAM_IADD_AIE  0x80000000
 #define IRAM_READY 0x80000000
@@ -475,6 +479,12 @@ struct fman_dma_regs {
u32 res00e0[0x400 - 56];
 };
 
+struct fman_hwp_regs {
+	u32 res0000[0x844 / 4];		/* 0x000..0x843 */
+	u32 fmprrpimac;	/* FM Parser Internal memory access control */
+	u32 res[(0x1000 - 0x848) / 4];	/* 0x848..0xFFF */
+};
+
 /* Structure that holds current FMan state.
  * Used for saving run time information.
  */
@@ -606,6 +616,7 @@ struct fman {
struct fman_bmi_regs __iomem *bmi_regs;
struct fman_qmi_regs __iomem *qmi_regs;
struct fman_dma_regs __iomem *dma_regs;
+   struct fman_hwp_regs __iomem *hwp_regs;
fman_exceptions_cb *exception_cb;
fman_bus_error_cb *bus_error_cb;
/* Spinlock for FMan use */
@@ -999,6 +1010,12 @@ static void qmi_init(struct fman_qmi_regs __iomem *qmi_rg,
iowrite32be(tmp_reg, &qmi_rg->fmqm_ien);
 }
 
+static void hwp_init(struct fman_hwp_regs __iomem *hwp_rg)
+{
+   /* enable HW Parser */
+   iowrite32be(HWP_RPIMAC_PEN, &hwp_rg->fmprrpimac);
+}
+
 static int enable(struct fman *fman, struct fman_cfg *cfg)
 {
u32 cfg_reg = 0;
@@ -1793,6 +1810,7 @@ static int fman_config(struct fman *fman)
fman->bmi_regs = base_addr + BMI_OFFSET;
fman->qmi_regs = base_addr + QMI_OFFSET;
fman->dma_regs = base_addr + DMA_OFFSET;
+   fman->hwp_regs = base_addr + HWP_OFFSET;
fman->base_addr = base_addr;
 
spin_lock_init(&fman->spinlock);
@@ -2062,6 +2080,9 @@ static int fman_init(struct fman *fman)
/* Init QMI Registers */
qmi_init(fman->qmi_regs, fman->cfg);
 
+   /* Init HW Parser */
+   hwp_init(fman->hwp_regs);
+
err = enable(fman, cfg);
if (err != 0)
return err;
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c 
b/drivers/net/ethernet/freescale/fman/fman_port.c
index 9f3bb50..f314348 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -62,6 +62,7 @@
 
 #define BMI_PORT_REGS_OFFSET   0
 #define QMI_PORT_REGS_OFFSET   0x400
+#define HWP_PORT_REGS_OFFSET   0x800
 
 /* Default values */
 #define DFLT_PORT_BUFFER_PREFIX_CONTEXT_DATA_ALIGN \
@@ -182,7 +183,7 @@
 #define NIA_ENG_BMI			0x00500000
 #define NIA_ENG_QMI_ENQ			0x00540000
 #define NIA_ENG_QMI_DEQ			0x00580000
-
+#define NIA_ENG_HWP			0x00440000
 #define NIA_BMI_AC_ENQ_FRAME		0x00000002
 #define NIA_BMI_AC_TX_RELEASE		0x000002C0
 #define NIA_BMI_AC_RELEASE		0x000000C0
@@ -317,6 +318,19 @@ struct fman_port_qmi_regs {
u32 fmqm_pndcc; /* PortID n Dequeue Confirm Counter */
 };
 
+#define HWP_HXS_COUNT 16
+#define HWP_HXS_PHE_REPORT 0x00000800
+#define HWP_HXS_PCAC_PSTAT 0x00000100
+#define HWP_HXS_PCAC_PSTOP 0x00000001
+struct fman_port_hwp_regs {
+   struct {
+   u32 ssa; /* Soft Sequence Attachment */
+   u32 lcv; /* Line-up Enable Confirmation Mask */
+   } pmda[HWP_HXS_COUNT]; /* Parse Memory Direct Access Registers */
+   u32 reserved080[(0x3f8 - 0x080) / 4]; /* (0x080-0x3f7) */
+   u32 fmpr_pcac; /* Configuration Access Control */
+};
+
 /* QMI dequeue prefetch modes */
 enum fman_port_deq_prefetch {
FMAN_PORT_DEQ_NO_PREFETCH, /* No prefetch mode */
@@ -436,6 +450,7 @@ struct fman_port {
 
union fman_port_bmi_regs __iomem *bmi_regs;
struct fman_port_qmi_regs __iomem *qmi_regs;
+   struct fman_port_hwp_regs __iomem *hwp_regs;
 
struct fman_sp_buffer_offsets buffer_offsets;
 
@@ -521,9 +536,12 @@ static int init_bmi_rx(struct fman_port *port)
/* NIA */
tmp = (u32)cfg->rx_fd_bits << BMI_NEXT_ENG_FD_BITS_S

[PATCH 1/9] fsl/fman: parse result data is big endian

2017-02-21 Thread Madalin Bucur
Signed-off-by: Madalin Bucur 
---
 drivers/net/ethernet/freescale/fman/fman.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.h 
b/drivers/net/ethernet/freescale/fman/fman.h
index 57aae8d..f53e147 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -134,14 +134,14 @@ enum fman_exceptions {
 struct fman_prs_result {
u8 lpid;/* Logical port id */
u8 shimr;   /* Shim header result  */
-   u16 l2r;/* Layer 2 result */
-   u16 l3r;/* Layer 3 result */
+   __be16 l2r; /* Layer 2 result */
+   __be16 l3r; /* Layer 3 result */
u8 l4r; /* Layer 4 result */
u8 cplan;   /* Classification plan id */
-   u16 nxthdr; /* Next Header  */
-   u16 cksum;  /* Running-sum */
+   __be16 nxthdr;  /* Next Header  */
+   __be16 cksum;   /* Running-sum */
/* Flags&fragment-offset field of the last IP-header */
-   u16 flags_frag_off;
+   __be16 flags_frag_off;
/* Routing type field of a IPV6 routing extension header */
u8 route_type;
/* Routing Extension Header Present; last bit is IP valid */
-- 
2.1.0
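The __be16 annotations above matter because FMan writes the parse result in big-endian order regardless of host endianness. A standalone userspace sketch of the required accessor pattern (the struct fragment and prs_l3r helper are hypothetical, modelling only two of the fields):

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs() as a stand-in for be16_to_cpu() */

/* Two fields of the parse result, laid out as the hardware writes them. */
struct prs_result_fragment {
    uint16_t l2r;   /* __be16 in the driver after this patch */
    uint16_t l3r;
};

/* Correct accessor: always convert from big endian to host order. */
static uint16_t prs_l3r(const struct prs_result_fragment *p)
{
    return ntohs(p->l3r);
}
```

Reading l3r directly on a little-endian host would byte-swap the value; going through the conversion gives the same answer on every host, which is exactly what the sparse __be16 annotations enforce at build time.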



[PATCH 0/9] QorIQ DPAA 1 updates

2017-02-21 Thread Madalin Bucur
This patch set introduces a series of fixes and features to the DPAA 1
drivers. Besides activating hardware Rx checksum offloading, four traffic
classes are added for Tx traffic prioritisation.

Camelia Groza (2):
  dpaa_eth: add four prioritised Tx traffic classes
  dpaa_eth: enable multiple Tx traffic classes

Madalin Bucur (7):
  fsl/fman: parse result data is big endian
  fsl/fman: set HW parser as BMI next engine
  fsl/fman: remove wrong free
  fsl/fman: enlarge FIFO to allow for the 5th port
  dpaa_eth: remove redundant initialization
  dpaa_eth: enable Rx checksum offload
  dpaa_eth: do not ignore port api return value

 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c  | 175 +++-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h  |   8 +-
 drivers/net/ethernet/freescale/fman/fman.c  |  23 +++-
 drivers/net/ethernet/freescale/fman/fman.h  |  10 +-
 drivers/net/ethernet/freescale/fman/fman_port.c |  76 +-
 5 files changed, 246 insertions(+), 46 deletions(-)

-- 
2.1.0



Re: [PowerPC] 4.10.0 fails to build on BE config

2017-02-21 Thread Oliver O'Halloran
On Tue, Feb 21, 2017 at 6:25 PM, abdul  wrote:
> Hi,
>
> Today's mainline build, breaks on Power6 and Power7 (all BE config) with
> these build errors
>
> arch/powerpc/kernel/time.c: In function ‘running_clock’:
> arch/powerpc/kernel/time.c:712:2: error: implicit declaration of function
> ‘cputime_to_nsecs’ [-Werror=implicit-function-declaration]
> return local_clock() -
> cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
> ^
> cc1: some warnings being treated as errors
> make[1]: *** [arch/powerpc/kernel/time.o] Error 1
>
>
> Regard's
> Abdul Haleem
> IBM Linux Technology Center.

Hi Abdul,

Are there any extra patches in your tree? I briefly tried to reproduce
this, but in my local tree this line:

> return local_clock() - 
> cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);

Is at time.c:692 rather than time.c:712

Oliver


Re: [PowerPC] 4.10.0 fails to build on BE config

2017-02-21 Thread Michael Ellerman
abdul  writes:

> Hi,
>
> Today's mainline build, breaks on Power6 and Power7 (all BE config) with 
> these build errors
>
> arch/powerpc/kernel/time.c: In function ‘running_clock’:
> arch/powerpc/kernel/time.c:712:2: error: implicit declaration of 
> function ‘cputime_to_nsecs’ [-Werror=implicit-function-declaration]
> return local_clock() - 
> cputime_to_nsecs(kcpustat_this_cpu->cpustat[CPUTIME_STEAL]);
> ^
> cc1: some warnings being treated as errors
> make[1]: *** [arch/powerpc/kernel/time.o] Error 1

What exact config is that?

And what compiler are you using?

I build ~250 configs with two different compilers multiple times a day,
so I'm curious how we failed to notice this.

cheers


[PATCH 5/5] powerpc/mm: Move hash specific pte bits to be top bits of RPN

2017-02-21 Thread Aneesh Kumar K.V
We don't support the full 57 bits of physical address and hence can overload
the top bits of RPN as hash specific pte bits.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 18 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 19 ---
 2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index af3c88624d3a..205c04df9cf3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -6,20 +6,14 @@
  * Common bits between 4K and 64K pages in a linux-style PTE.
  * Additional bits may be defined in pgtable-hash64-*.h
  *
- * Note: We only support user read/write permissions. Supervisor always
- * have full read/write to pages above PAGE_OFFSET (pages below that
- * always use the user access permissions).
- *
- * We could create separate kernel read-only if we used the 3 PP bits
- * combinations that newer processors provide but we currently don't.
  */
-#define H_PAGE_BUSY		_RPAGE_SW1 /* software: PTE & hash are busy */
+#define H_PAGE_BUSY		_RPAGE_RPN45 /* software: PTE & hash are busy */
 #define H_PTE_NONE_MASK		_PAGE_HPTEFLAGS
-#define H_PAGE_F_GIX_SHIFT	57
-/* (7ul << 57) HPTE index within HPTEG */
-#define H_PAGE_F_GIX		(_RPAGE_RSV2 | _RPAGE_RSV3 | _RPAGE_RSV4)
-#define H_PAGE_F_SECOND		_RPAGE_RSV1 /* HPTE is in 2ndary HPTEG */
-#define H_PAGE_HASHPTE		_RPAGE_SW0  /* PTE has associated HPTE */
+#define H_PAGE_F_GIX_SHIFT	53
+/* (7ul << 53) HPTE index within HPTEG */
+#define H_PAGE_F_GIX		(_RPAGE_RPN44 | _RPAGE_RPN43 | _RPAGE_RPN42)
+#define H_PAGE_F_SECOND		_RPAGE_RPN41	/* HPTE is in 2ndary HPTEG */
+#define H_PAGE_HASHPTE		_RPAGE_RPN40	/* PTE has associated HPTE */
 /*
  * Max physical address bit we will use for now.
  *
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 0ea69c91520b..3fd9e46e44c5 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -30,16 +30,29 @@
 #define _RPAGE_RSV2		0x0800000000000000UL
 #define _RPAGE_RSV3		0x0400000000000000UL
 #define _RPAGE_RSV4		0x0200000000000000UL
+
+#define _PAGE_PTE		0x4000000000000000UL	/* distinguishes PTEs from pointers */
+#define _PAGE_PRESENT		0x8000000000000000UL	/* pte contains a translation */
+
+/*
+ * Top and bottom bits of RPN which can be used by hash
+ * translation mode, because we expect them to be zero
+ * otherwise.
+ */
 #define _RPAGE_RPN0		0x01000
 #define _RPAGE_RPN1		0x02000
+#define _RPAGE_RPN45		0x0100000000000000UL
+#define _RPAGE_RPN44		0x0080000000000000UL
+#define _RPAGE_RPN43		0x0040000000000000UL
+#define _RPAGE_RPN42		0x0020000000000000UL
+#define _RPAGE_RPN41		0x0010000000000000UL
+#define _RPAGE_RPN40		0x0008000000000000UL
+
 /* Max physical address bit as per radix table */
 #define _RPAGE_PA_MAX  57
 
 #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
-
-#define _PAGE_PTE		0x4000000000000000UL	/* distinguishes PTEs from pointers */
-#define _PAGE_PRESENT		0x8000000000000000UL	/* pte contains a translation */
 /*
  * Drivers request for cache inhibited pte mapping using _PAGE_NO_CACHE
  * Instead of fixing all of them, add an alternate define which
-- 
2.7.4
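The comment "(7ul << 53) HPTE index within HPTEG" above can be sanity-checked in standalone C. A quick sketch using the bit values from the patch (written out in full 64-bit form), verifying that H_PAGE_F_GIX is exactly the 3-bit field starting at H_PAGE_F_GIX_SHIFT:

```c
#include <stdint.h>

/* PTE bit values from the patch, in full 64-bit form. */
#define H_PAGE_F_GIX_SHIFT 53
#define _RPAGE_RPN44 0x0080000000000000ULL
#define _RPAGE_RPN43 0x0040000000000000ULL
#define _RPAGE_RPN42 0x0020000000000000ULL

/* The 3-bit HPTE group index field within the PTE. */
#define H_PAGE_F_GIX (_RPAGE_RPN44 | _RPAGE_RPN43 | _RPAGE_RPN42)
```

The three RPN bits are adjacent powers of two, so their OR is the contiguous mask 7 shifted up by 53 — exactly what the hash fault path relies on when it stores the HPTE slot number in the PTE.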



[PATCH 4/5] powerpc/mm/radix: Make max pfn bits a variable

2017-02-21 Thread Aneesh Kumar K.V
This makes the max physical address bits a variable so that the hash and radix
translation modes can choose what value to use. In this patch we also switch the
radix translation mode to use 57 bits. This makes it resilient to future changes
to the max pfn supported by platforms.

This patch is split from the previous one to make the review easier.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h| 18 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 28 +---
 arch/powerpc/include/asm/book3s/64/radix.h   |  4 
 arch/powerpc/mm/hash_utils_64.c  |  1 +
 arch/powerpc/mm/pgtable-radix.c  |  1 +
 arch/powerpc/mm/pgtable_64.c |  3 +++
 6 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index ec2828b1db07..af3c88624d3a 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -20,6 +20,24 @@
 #define H_PAGE_F_GIX   (_RPAGE_RSV2 | _RPAGE_RSV3 | _RPAGE_RSV4)
 #define H_PAGE_F_SECOND	_RPAGE_RSV1 /* HPTE is in 2ndary HPTEG */
 #define H_PAGE_HASHPTE _RPAGE_SW0  /* PTE has associated HPTE */
+/*
+ * Max physical address bit we will use for now.
+ *
+ * This is mostly a hardware limitation and for now Power9 has
+ * a 51 bit limit.
+ *
+ * This is different from the number of physical bit required to address
+ * the last byte of memory. That is defined by MAX_PHYSMEM_BITS.
+ * MAX_PHYSMEM_BITS is a linux limitation imposed by the maximum
+ * number of sections we can support (SECTIONS_SHIFT).
+ *
+ * This is different from Radix page table limitation and
+ * should always be less than that. The limit is done such that
+ * we can overload the bits between _RPAGE_PA_MAX and H_PAGE_PA_MAX
+ * for hash linux page table specific bits.
+ */
+#define H_PAGE_PA_MAX  51
+#define H_PTE_RPN_MASK (((1UL << H_PAGE_PA_MAX) - 1) & (PAGE_MASK))
 
 #ifdef CONFIG_PPC_64K_PAGES
 #include 
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index f1381f85c00a..0ea69c91520b 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -34,23 +34,6 @@
 #define _RPAGE_RPN1		0x02000
 /* Max physical address bit as per radix table */
 #define _RPAGE_PA_MAX  57
-/*
- * Max physical address bit we will use for now.
- *
- * This is mostly a hardware limitation and for now Power9 has
- * a 51 bit limit.
- *
- * This is different from the number of physical bit required to address
- * the last byte of memory. That is defined by MAX_PHYSMEM_BITS.
- * MAX_PHYSMEM_BITS is a linux limitation imposed by the maximum
- * number of sections we can support (SECTIONS_SHIFT).
- *
- * This is different from Radix page table limitation above and
- * should always be less than that. The limit is done such that
- * we can overload the bits between _RPAGE_PA_MAX and _PAGE_PA_MAX
- * for hash linux page table specific bits.
- */
-#define _PAGE_PA_MAX   51
 
 #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
@@ -64,12 +47,6 @@
  */
 #define _PAGE_NO_CACHE _PAGE_TOLERANT
 /*
- * We support _RPAGE_PA_MAX bit real address in pte. On the linux side
- * we are limited by _PAGE_PA_MAX. Clear everything above _PAGE_PA_MAX
- * every thing below PAGE_SHIFT;
- */
-#define PTE_RPN_MASK   (((1UL << _PAGE_PA_MAX) - 1) & (PAGE_MASK))
-/*
  * set of bits not changed in pmd_modify. Even though we have hash specific bits
  * in here, on radix we expect them to be zero.
  */
@@ -174,6 +151,11 @@
 
 #ifndef __ASSEMBLY__
 /*
+ * based on max physical address bit that we want to encode in page table
+ */
+extern unsigned long __pte_rpn_mask;
+#define PTE_RPN_MASK __pte_rpn_mask
+/*
  * page table defines
  */
 extern unsigned long __pte_index_size;
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h 
b/arch/powerpc/include/asm/book3s/64/radix.h
index 96b94d2b4432..d4ab838b97b2 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -24,6 +24,10 @@
 
 /* An empty PTE can still have a R or C writeback */
 #define RADIX_PTE_NONE_MASK	(_PAGE_DIRTY | _PAGE_ACCESSED)
+/*
+ * Clear everything above _RPAGE_PA_MAX every thing below PAGE_SHIFT
+ */
+#define RADIX_PTE_RPN_MASK	(((1UL << _RPAGE_PA_MAX) - 1) & (PAGE_MASK))
 
 /* Bits to set in a RPMD/RPUD/RPGD */
 #define RADIX_PMD_VAL_BITS	(0x8000000000000000UL | RADIX_PTE_INDEX_SIZE)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 67e19a0821be..edcca1628bbf 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -955,6 +955,7 @@ void __init hash__early_init

[PATCH 3/5] powerpc/mm: Lower the max real address to 51 bits

2017-02-21 Thread Aneesh Kumar K.V
The max value supported by the hardware is a 51 bit address. The radix page
table defines a slot of 57 bits for future expansion. We restrict the value
supported by the linux kernel to 51 bits, so that we can use the bits between
51 and 57 for storing hash linux page table bits. This is done in the next
patch.

This will free up software page table bits to be used for features that are
needed for both hash and radix. The current hash linux page table format
doesn't have any free software bits. Moving the hash linux page table specific
bits to the top of the RPN field frees up software bits for other purposes.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index dec136742cd7..f1381f85c00a 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -32,6 +32,25 @@
 #define _RPAGE_RSV4		0x0200000000000000UL
 #define _RPAGE_RPN0		0x01000
 #define _RPAGE_RPN1		0x02000
+/* Max physical address bit as per radix table */
+#define _RPAGE_PA_MAX  57
+/*
+ * Max physical address bit we will use for now.
+ *
+ * This is mostly a hardware limitation and for now Power9 has
+ * a 51 bit limit.
+ *
+ * This is different from the number of physical bit required to address
+ * the last byte of memory. That is defined by MAX_PHYSMEM_BITS.
+ * MAX_PHYSMEM_BITS is a linux limitation imposed by the maximum
+ * number of sections we can support (SECTIONS_SHIFT).
+ *
+ * This is different from Radix page table limitation above and
+ * should always be less than that. The limit is done such that
+ * we can overload the bits between _RPAGE_PA_MAX and _PAGE_PA_MAX
+ * for hash linux page table specific bits.
+ */
+#define _PAGE_PA_MAX   51
 
 #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
@@ -45,10 +64,11 @@
  */
 #define _PAGE_NO_CACHE _PAGE_TOLERANT
 /*
- * We support 57 bit real address in pte. Clear everything above 57, and
+ * We support _RPAGE_PA_MAX bit real address in pte. On the linux side
+ * we are limited by _PAGE_PA_MAX. Clear everything above _PAGE_PA_MAX
  * every thing below PAGE_SHIFT;
  */
-#define PTE_RPN_MASK   (((1UL << 57) - 1) & (PAGE_MASK))
+#define PTE_RPN_MASK   (((1UL << _PAGE_PA_MAX) - 1) & (PAGE_MASK))
 /*
  * set of bits not changed in pmd_modify. Even though we have hash specific bits
  * in here, on radix we expect them to be zero.
-- 
2.7.4
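The PTE_RPN_MASK formula introduced above ("clear everything above _PAGE_PA_MAX and everything below PAGE_SHIFT") can be checked in isolation. A standalone sketch, assuming 64K pages (PAGE_SHIFT of 16 is an assumption for the example, not taken from the patch):

```c
#include <stdint.h>

#define PAGE_SHIFT   16                              /* assumed: 64K pages */
#define PAGE_MASK    (~((1ULL << PAGE_SHIFT) - 1))
#define _PAGE_PA_MAX 51

/* Keep only real-address bits: below _PAGE_PA_MAX, at or above PAGE_SHIFT. */
#define PTE_RPN_MASK (((1ULL << _PAGE_PA_MAX) - 1) & PAGE_MASK)
```

With these values the mask comes out to 0x0007FFFFFFFF0000: bit 51 and above are clear (reserved for the hash PTE bits moved there by the next patch), and the low 16 bits are clear because the page frame is page aligned.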



[PATCH 2/5] powerpc/mm: Express everything based on Radix page table defines

2017-02-21 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h | 4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index b39f0b86405e..7be54f9590a3 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -10,8 +10,8 @@
  * 64k aligned address free up few of the lower bits of RPN for us
  * We steal that here. For more details look at pte_pfn/pfn_pte()
  */
-#define H_PAGE_COMBO	0x00001000 /* this is a combo 4k page */
-#define H_PAGE_4K_PFN	0x00002000 /* PFN is for a single 4k page */
+#define H_PAGE_COMBO	_RPAGE_RPN0 /* this is a combo 4k page */
+#define H_PAGE_4K_PFN	_RPAGE_RPN1 /* PFN is for a single 4k page */
 /*
  * We need to differentiate between explicit huge page and THP huge
  * page, since THP huge page also need to track real subpage details
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 8ae9e0be3a6f..dec136742cd7 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -30,6 +30,8 @@
 #define _RPAGE_RSV2		0x0800000000000000UL
 #define _RPAGE_RSV3		0x0400000000000000UL
 #define _RPAGE_RSV4		0x0200000000000000UL
+#define _RPAGE_RPN0		0x01000
+#define _RPAGE_RPN1		0x02000
 
 #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
-- 
2.7.4
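The hash-64k.h comment above says 64K alignment frees up the low RPN bits that H_PAGE_COMBO and H_PAGE_4K_PFN steal. A small standalone check of that claim (bits_free_for_64k is a hypothetical helper for illustration):

```c
#include <stdint.h>

#define PAGE_SHIFT_64K 16                 /* 64K pages */
#define _RPAGE_RPN0    0x01000ULL
#define _RPAGE_RPN1    0x02000ULL

/* A bit can be stolen from the RPN iff it lies below the 64K page
 * boundary: a 64K-aligned physical address always has it clear. */
static int bits_free_for_64k(uint64_t bit)
{
    uint64_t page_mask = ~((1ULL << PAGE_SHIFT_64K) - 1);
    return (bit & page_mask) == 0;
}
```

Both stolen bits sit at positions 12 and 13, below the 64K boundary at bit 16, so any 64K-aligned frame address leaves them zero and free for software use.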



[PATCH 1/5] powerpc/mm: Conditional defines of pte bits are messy

2017-02-21 Thread Aneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 6a55bbe91556..8ae9e0be3a6f 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -31,11 +31,7 @@
 #define _RPAGE_RSV3		0x0400000000000000UL
 #define _RPAGE_RSV4		0x0200000000000000UL
 
-#ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SOFT_DIRTY	_RPAGE_SW3 /* software: software dirty tracking */
-#else
-#define _PAGE_SOFT_DIRTY   0x0
-#endif
 #define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
 
 #define _PAGE_PTE		0x4000000000000000UL	/* distinguishes PTEs from pointers */
-- 
2.7.4