Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Michael Ellerman
Kees Cook  writes:

> On Fri, Jul 8, 2016 at 1:41 PM, Kees Cook  wrote:
>> So, as found already, the position in the usercopy check needs to be
>> bumped down by red_left_pad, which is what Michael's fix does, so I'll
>> include it in the next version.
>
> Actually, after some offline chats, I think this is better, since it
> makes sure the ptr doesn't end up somewhere weird before we start the
> calculations. This leaves the pointer as-is, but explicitly handles
> the redzone on the offset instead, with no wrapping, etc:
>
>  	/* Find offset within object. */
>  	offset = (ptr - page_address(page)) % s->size;
>
> +	/* Adjust for redzone and reject if within the redzone. */
> +	if (s->flags & SLAB_RED_ZONE) {
> +		if (offset < s->red_left_pad)
> +			return s->name;
> +		offset -= s->red_left_pad;
> +	}
> +
>  	/* Allow address range falling entirely within object size. */
>  	if (offset <= s->object_size && n <= s->object_size - offset)
>  		return NULL;

That fixes the case for me in kstrndup(), which allows the system to boot.
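
For readers following along, the adjusted check can be modelled in userspace. This is only a sketch: struct cache_model and copy_within_object() are hypothetical names standing in for the few struct kmem_cache fields the hunk touches, and the function returns a bool where the kernel code returns s->name or NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct kmem_cache fields used by the check. */
struct cache_model {
	size_t size;          /* full stride of one slot, padding included */
	size_t object_size;   /* usable bytes in the object */
	size_t red_left_pad;  /* left redzone bytes in front of the object */
	bool red_zone;        /* models s->flags & SLAB_RED_ZONE */
};

/*
 * Return true if a copy of n bytes that starts slab_off bytes into the
 * slab page stays entirely inside one object, false otherwise.
 */
bool copy_within_object(const struct cache_model *s, size_t slab_off, size_t n)
{
	size_t offset;

	assert(s->size != 0);

	/* Find offset within the slot. */
	offset = slab_off % s->size;

	/* Adjust for the redzone; reject pointers that land inside it. */
	if (s->red_zone) {
		if (offset < s->red_left_pad)
			return false;
		offset -= s->red_left_pad;
	}

	/* Allow a range falling entirely within the object size. */
	return offset <= s->object_size && n <= s->object_size - offset;
}
```

Because the redzone is subtracted only after the "offset < red_left_pad" test, the unsigned subtraction cannot wrap, which is the point of handling it on the offset rather than on the pointer.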

I then get two hits, which may or may not be valid:

[2.309556] usercopy: kernel memory overwrite attempt detected to d3510028 (kernfs_node_cache) (64 bytes)
[2.309995] CPU: 7 PID: 2241 Comm: wait-for-root Not tainted 4.7.0-rc3-00099-g97872fc89d41 #64
[2.310480] Call Trace:
[2.310556] [c001f4773bf0] [c09bdbe8] dump_stack+0xb0/0xf0 (unreliable)
[2.311016] [c001f4773c30] [c029cf44] __check_object_size+0x74/0x320
[2.311472] [c001f4773cb0] [c005d4d0] copy_from_user+0x60/0xd4
[2.311873] [c001f4773cf0] [c08b38f4] __get_filter+0x74/0x160
[2.312230] [c001f4773d30] [c08b408c] sk_attach_filter+0x2c/0xc0
[2.312596] [c001f4773d60] [c0871c34] sock_setsockopt+0x954/0xc00
[2.313021] [c001f4773dd0] [c086ac44] SyS_setsockopt+0x134/0x150
[2.313380] [c001f4773e30] [c0009260] system_call+0x38/0x108
[2.317045] usercopy: kernel memory overwrite attempt detected to d3530028 (kernfs_node_cache) (64 bytes)
[2.317297] CPU: 10 PID: 2242 Comm: wait-for-root Not tainted 4.7.0-rc3-00099-g97872fc89d41 #64
[2.317475] Call Trace:
[2.317511] [c001f471fbf0] [c09bdbe8] dump_stack+0xb0/0xf0 (unreliable)
[2.317689] [c001f471fc30] [c029cf44] __check_object_size+0x74/0x320
[2.317861] [c001f471fcb0] [c005d4d0] copy_from_user+0x60/0xd4
[2.318011] [c001f471fcf0] [c08b38f4] __get_filter+0x74/0x160
[2.318165] [c001f471fd30] [c08b408c] sk_attach_filter+0x2c/0xc0
[2.318313] [c001f471fd60] [c0871c34] sock_setsockopt+0x954/0xc00
[2.318485] [c001f471fdd0] [c086ac44] SyS_setsockopt+0x134/0x150
[2.318632] [c001f471fe30] [c0009260] system_call+0x38/0x108


With:

# zgrep SLUB /proc/config.gz
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
CONFIG_SLUB_DEBUG_ON=y
# CONFIG_SLUB_STATS is not set

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v1] Specific requirement of type casting for 64-bit architectures.

2016-07-08 Thread Arvind Yadav
-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. It will work on 32-bit architectures
But data can be loss on 64-bit architectures if 'qe_muram_alloc' will
return greater then MAX value of 'unsigned int'.

-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'unsigned int'
into a function, It will through this compilation warning.

"
 include/linux/err.h:21:49: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned 
long)-MAX_ERRNO)
 ^
 include/linux/compiler.h:170:42: note: in definition of macro ‘unlikely’
 # define unlikely(x) __builtin_expect(!!(x), 0)
"

-Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'unsigned int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

Signed-off-by: Arvind Yadav 
---
 drivers/soc/fsl/qe/ucc_fast.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/soc/fsl/qe/ucc_fast.c b/drivers/soc/fsl/qe/ucc_fast.c
index a768931..208b198 100644
--- a/drivers/soc/fsl/qe/ucc_fast.c
+++ b/drivers/soc/fsl/qe/ucc_fast.c
@@ -141,6 +141,7 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
struct ucc_fast __iomem *uf_regs;
u32 gumr;
int ret;
+   unsigned long ret_muram;
 
if (!uf_info)
return -EINVAL;
@@ -265,28 +266,34 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
gumr |= uf_info->mode;
out_be32(&uf_regs->gumr, gumr);
 
-   /* Allocate memory for Tx Virtual Fifo */
-   uccf->ucc_fast_tx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
+   ret_muram =
+   qe_muram_alloc(uf_info->utfs,
+   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
+
+   if (IS_ERR_VALUE(ret_muram)) {
printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
__func__);
uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
+   } else {
+   /* Allocate memory for Tx Virtual Fifo */
+   uccf->ucc_fast_tx_virtual_fifo_base_offset = (u32)ret_muram;
}
 
-   /* Allocate memory for Rx Virtual Fifo */
-   uccf->ucc_fast_rx_virtual_fifo_base_offset =
+   ret_muram =
qe_muram_alloc(uf_info->urfs +
   UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
+   if (IS_ERR_VALUE(ret_muram)) {
printk(KERN_ERR "%s: cannot allocate MURAM for RX FIFO\n",
__func__);
uccf->ucc_fast_rx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
+   } else {
+   /* Allocate memory for Rx Virtual Fifo */
+   uccf->ucc_fast_rx_virtual_fifo_base_offset = (u32)ret_muram;
}
 
/* Set Virtual Fifo registers */
-- 
1.9.1

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-08 Thread Rik van Riel
On Fri, 2016-07-08 at 19:22 -0700, Laura Abbott wrote:
> 
> Even with the SLUB fixup I'm still seeing this blow up on my arm64
> system. This is a
> Fedora rawhide kernel + the patches
> 
> [0.666700] usercopy: kernel memory exposure attempt detected from
> fc0008b4dd58 () (8 bytes)
> [0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted:
> GW   4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1
> [0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS
> 1.1.0 Nov 24 2015
> [0.666744] Call trace:
> [0.666756] [] dump_backtrace+0x0/0x1e8
> [0.666765] [] show_stack+0x24/0x30
> [0.666775] [] dump_stack+0xa4/0xe0
> [0.666785] [] __check_object_size+0x6c/0x230
> [0.666795] [] create_elf_tables+0x74/0x420
> [0.666805] [] load_elf_binary+0x828/0xb70
> [0.666814] [] search_binary_handler+0xb4/0x240
> [0.666823] [] do_execveat_common+0x63c/0x950
> [0.666832] [] do_execve+0x3c/0x50
> [0.666841] []
> call_usermodehelper_exec_async+0xe8/0x148
> [0.666850] [] ret_from_fork+0x10/0x50
> 
> This happens on every call to execve. This seems to be the first
> copy_to_user in
> create_elf_tables. I didn't get a chance to debug and I'm going out
> of town
> all of next week so all I have is the report unfortunately. config
> attached.

That's odd, this should be copying a piece of kernel data (not text)
to userspace.

from fs/binfmt_elf.c:

	const char *k_platform = ELF_PLATFORM;

...
		size_t len = strlen(k_platform) + 1;

		u_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
		if (__copy_to_user(u_platform, k_platform, len))
			return -EFAULT;

from arch/arm/include/asm/elf.h:

#define ELF_PLATFORM_SIZE 8
#define ELF_PLATFORM (elf_platform)

extern char elf_platform[];

from arch/arm/kernel/setup.c:

char elf_platform[ELF_PLATFORM_SIZE];
EXPORT_SYMBOL(elf_platform);

...

snprintf(elf_platform, ELF_PLATFORM_SIZE, "%s%c",
 list->elf_name, ENDIANNESS);

How does that end up in the .text section of the
image, instead of in one of the various data sections?

What kind of linker oddity is going on with ARM?
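
For context, a report like the one above would come from a range test between the copied buffer and the span the kernel treats as text. The following is a minimal userspace sketch of such an overlap test, not the code from the patch series: range_overlaps() and its parameters are hypothetical names. Under this kind of test, any data the linker script places between the two boundary symbols would be flagged as text, whatever section it lives in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [ptr, ptr + n) overlaps the half-open range [low, high). */
bool range_overlaps(uintptr_t ptr, size_t n, uintptr_t low, uintptr_t high)
{
	uintptr_t check_low = ptr;
	uintptr_t check_high = ptr + n;

	/* Entirely above or entirely below the range: no overlap. */
	if (check_low >= high || check_high <= low)
		return false;

	return true;
}
```

With low and high set to the start and end of the text span, an 8-byte platform string sitting anywhere inside that span is rejected, which matches the "(8 bytes)" in the report.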

--  
All Rights Reversed.

Re: [PATCH v1] Specific requirement of type casting for 64-bit architectures.

2016-07-08 Thread Guenter Roeck

On 07/08/2016 02:44 PM, Arvind Yadav wrote:

I would really suggest reading section 14 of Documentation/SubmittingPatches
and following the guidance it provides.

For the subject line: The subsystem/driver is still not listed,
and I am quite sure that this is not v1 of this patch.
It also does not describe the patch, much less concisely.


-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. It will work on 32-bit architectures
But data can be loss on 64-bit architectures if 'qe_muram_alloc' will
return greater then MAX value of 'unsigned int'.


Try to rephrase this to make it more readable.


-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'unsigned int'
into a function, It will through this compilation warning.



What is wrong is that the return value from the allocator function is truncated
to 32 bits, and that the resulting value is then used as the argument to
IS_ERR_VALUE().


"
  include/linux/err.h:21:49: warning: cast to pointer from integer of different 
size [-Wint-to-pointer-cast]
  #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned 
long)-MAX_ERRNO)
  ^
  include/linux/compiler.h:170:42: note: in definition of macro ‘unlikely’
  # define unlikely(x) __builtin_expect(!!(x), 0)
"

-Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'unsigned int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.


While this may be true, the description of this patch should be about
this patch, not about the rest of the kernel.


However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.



What does that have to do with this patch ?

Again, the problem here is that an unsigned long is assigned to a u32, and that
the u32 is then used as the parameter to IS_ERR_VALUE. This is wrong and needs to
be fixed. Describe what is wrong and needs to be fixed, not what can be wrong
elsewhere in the kernel.
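
The truncation problem can be reproduced in userspace. This is only a sketch under stated assumptions: uint64_t models a 64-bit 'unsigned long', model_is_err_value() mirrors the comparison in the IS_ERR_VALUE() macro quoted in the warning, and model_alloc_fail() is a hypothetical allocator that always fails with -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Models IS_ERR_VALUE() for a 64-bit 'unsigned long'. */
bool model_is_err_value(uint64_t x)
{
	return x >= (uint64_t)(-MODEL_MAX_ERRNO);
}

/* Hypothetical allocator stand-in that always fails with -ENOMEM. */
uint64_t model_alloc_fail(void)
{
	return (uint64_t)(-ENOMEM);
}

/* Buggy pattern: store into a 32-bit field first, then test the field. */
bool detects_error_via_u32(void)
{
	/* Only the low 32 bits of the error value survive the assignment. */
	uint32_t field = (uint32_t)model_alloc_fail();

	/* The field is zero-extended, so it no longer looks like an error. */
	return model_is_err_value(field);
}

/* Fixed pattern: test the full-width value before narrowing it. */
bool detects_error_via_full_width(void)
{
	uint64_t offset = model_alloc_fail();

	return model_is_err_value(offset);
}
```

In this model the 32-bit path misses the failure while the full-width path catches it, and casting the 32-bit field back up to the wider type cannot restore the lost upper bits.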


Signed-off-by: Arvind Yadav 
---


Here is where one would normally expect a change log.


  drivers/soc/fsl/qe/ucc_fast.c | 21 ++---
  1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/soc/fsl/qe/ucc_fast.c b/drivers/soc/fsl/qe/ucc_fast.c
index a768931..208b198 100644
--- a/drivers/soc/fsl/qe/ucc_fast.c
+++ b/drivers/soc/fsl/qe/ucc_fast.c
@@ -141,6 +141,7 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
struct ucc_fast __iomem *uf_regs;
u32 gumr;
int ret;
+   unsigned long ret_muram;



Kind of an unfortunate variable name. A simple "offset" might be a better 
choice.


if (!uf_info)
return -EINVAL;
@@ -265,28 +266,34 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
gumr |= uf_info->mode;
out_be32(&uf_regs->gumr, gumr);

-   /* Allocate memory for Tx Virtual Fifo */
-   uccf->ucc_fast_tx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
+   ret_muram =
+   qe_muram_alloc(uf_info->utfs,
+   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);


While minor, this introduces a checkpatch CHECK message.


+

This added empty line is an unnecessary whitespace change and does not add any 
value.


+   if (IS_ERR_VALUE(ret_muram)) {
printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
__func__);
uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
+   } else {
+   /* Allocate memory for Tx Virtual Fifo */


Why did you move the comment here ? The code below does not allocate anything.


+   uccf->ucc_fast_tx_virtual_fifo_base_offset = (u32)ret_muram;
}


checkpatch will rightfully tell you that else after return is generally not 
useful.
Also, the typecast is not necessary.
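
One possible shape for the code after these review comments, as a userspace sketch rather than the driver itself: struct fifo_model, fifo_model_init(), alloc_ok() and alloc_enomem() are hypothetical names, uint64_t stands in for a 64-bit 'unsigned long', and the open-coded comparison stands in for IS_ERR_VALUE().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's private structure. */
struct fifo_model {
	uint32_t tx_fifo_base_offset;
};

/* Hypothetical allocators: one succeeds, one fails with -ENOMEM. */
uint64_t alloc_ok(void)
{
	return 0x100;
}

uint64_t alloc_enomem(void)
{
	return (uint64_t)(-ENOMEM);
}

int fifo_model_init(struct fifo_model *f, uint64_t (*alloc)(void))
{
	uint64_t offset;

	/* Allocate memory for the Tx FIFO; keep the full-width result. */
	offset = alloc();
	if (offset >= (uint64_t)(-4095)) {  /* models IS_ERR_VALUE(offset) */
		f->tx_fifo_base_offset = 0;
		return -ENOMEM;
	}

	/* No else after return, and no cast: the conversion is implicit. */
	f->tx_fifo_base_offset = offset;
	return 0;
}
```

The error check runs on the wide local, and the narrowing assignment happens only once the value is known to be a valid offset.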



-   /* Allocate memory for Rx Virtual Fifo */
-   uccf->ucc_fast_rx_virtual_fifo_base_offset =
+   ret_muram =
qe_muram_alloc(uf_info->urfs +
   UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
+   if (IS_ERR_VALUE(ret_muram)) {
printk(KERN_ERR "%s: cannot allocate MURAM for RX FIFO\n",
__func__);

Re: Need proper type casting before assignment, Remove compilation Warning.

2016-07-08 Thread arvind Yadav


As per your concern, I have submitted one more patch with some changes.
Please review it.


Thanks,

On Friday 08 July 2016 09:03 PM, Guenter Roeck wrote:

On Thu, Jul 07, 2016 at 10:31:11PM +0530, Arvind Yadav wrote:

-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. These variable are 'unsigned int'.
So before assginment need a proper type casting.

Are they ? In the upstream kernel, they seem to be "u32".

Yes, I have changed it as per your suggestion.



-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'int'
into a function that takes an 'unsigned long' argument.This happens
to work because the type is sign-extended on 64-bit architectures
before it gets converted into an unsigned type.


Not really sure I understand if/how this applies to the patch in question.
I don't see an int passed to IS_ERR_VALUE(), I only see u32.

-Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'unsigned int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

-Passing an 'unsigned short' or 'unsigned int'argument into
IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers
and types that are wider than 'unsigned long'.


What does this have to do with this patch ?

Specific requirement of type casting for 64-bit architectures.

-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. It will work on 32-bit architectures
But data can be loss on 64-bit architectures if 'qe_muram_alloc' will return
greater then MAX value of 'unsigned int'.


-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'unsigned int'
into a function, It will through this compilation warning.
"
 include/linux/err.h:21:49: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= 
(unsigned long)-MAX_ERRNO)

 ^
 include/linux/compiler.h:170:42: note: in definition of macro ‘unlikely’
 # define unlikely(x) __builtin_expect(!!(x), 0)
"

-Any user will get compilation warning for that do not pass an
unsigned long' argument.


Sure, but that doesn't mean that typecasting the parameter to unsigned long
does any good (other than hiding the real bug).

Your subject line still does not list the affected subsystem and/or driver.
Documentation/SubmittingPatches might give some hints about proper subject
lines, and looking at other patches applied to the same file(s) might help
as well.

Also, if you want someone to review your patches, it helps to Cc: that
someone.

Thanks for your suggestion.

Signed-off-by: Arvind Yadav 
---
  drivers/soc/fsl/qe/ucc_fast.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/qe/ucc_fast.c b/drivers/soc/fsl/qe/ucc_fast.c
index a768931..98eed25 100644
--- a/drivers/soc/fsl/qe/ucc_fast.c
+++ b/drivers/soc/fsl/qe/ucc_fast.c
@@ -267,8 +267,10 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
  
  	/* Allocate memory for Tx Virtual Fifo */

uccf->ucc_fast_tx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
+   (unsigned int)qe_muram_alloc(uf_info->utfs,

I don't see the point of this typecast.


+   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
+   if (IS_ERR_VALUE(
+   (unsigned long)uccf->ucc_fast_tx_virtual_fifo_base_offset)) {

If sizeof(u32) == sizeof(unsigned long), this patch does not have an effect.
If sizeof(u32) < sizeof(unsigned long), it does not change anything, and the
resulting code is as wrong as it was before.


printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
__func__);
uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
@@ -278,10 +280,11 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
  
  	/* Allocate memory for Rx Virtual Fifo */

uccf->ucc_fast_rx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->urfs +
+   (unsigned int)qe_muram_alloc(uf_info->urfs +
   UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
+   if (IS_ERR_VALUE(

Re: t1040 IFC flash driver Extended Chip Select

2016-07-08 Thread Scott Wood
On 07/07/2016 06:48 PM, Daniel Walker wrote:
> On 07/07/2016 03:37 PM, Scott Wood wrote:
>> On 07/07/2016 05:01 PM, Daniel Walker wrote:
>>> On 07/07/2016 02:59 PM, Scott Wood wrote:
>>>> On 07/07/2016 04:49 PM, Daniel Walker wrote:
>>>>> On 07/07/2016 02:23 PM, Scott Wood wrote:
>>>>>> I suspect that add the usage of cspr_ext into the driver would fix the
>>>>>> issue we have. It reads like you would find that acceptable ?
>>>>>> What specifically is the problem you're having?  Is it that CSPR_EXT is
>>>>>> not getting written to, and thus the device does not appear at the
>>>>>> address that it should?
>>>>>>
>>>>>> Or is the driver matching incorrectly?  The only way the driver's lack
>>>>>> of using CSPR_EXT to match would be a problem would be if you have
>>>>>> multiple chipselects with the same address in the lower 32 bits, and
>>>>>> only CSPR_EXT distinguishing them.  Since you proposed a device tree
>>>>>> binding that assumes all devices have the same CSPR_EXT, I doubt that's
>>>>>> the case, so I doubt adding CSPR_EXT matching to the driver will solve
>>>>>> your problem.
>>>>>>
>>>>>> -Scott
>>>>>>
>>>>> I didn't do the debug on this. From my perspective it's either flash
>>>>> works, or it doesn't work. We need the code below for it to work,
>>>> Adding CSPR_EXT matching to the driver will not accomplish the same
>>>> thing as that code.

>>> So from u-boot perspective, the values in the device tree under "ranges"
>>> or parts of it, are place into the cspr and cspr_ext ? Is that how it's
>>> suppose to work ?
>> U-Boot writes values that are hardcoded in the board config header.
>> These values (as well as the area covered by the IFC LAW) need to match
>> the address in the device tree, but U-Boot doesn't get them from the
>> device tree.
>>
> 
> I was suggesting the values it writes are the same as the ones inside 
> the device tree. So we could have both csrp and csrp_ext written from 
> the driver and the values would
> come from the ranges property.

There's more to CSPR than just the address.  The driver should either be
able to assume that all of CSPR/CSOR has been correctly initialized, or
it should assume none of that has been initialized -- which again,
requires the attribute information to be in the device tree.  If you're
doing something in between, then that's a board quirk rather than a
general solution.

-Scott

Re: [PATCH] rpaphp: fix slot registration for multiple slots under a PHB

2016-07-08 Thread Nathan Fontenot
On 07/08/2016 06:19 PM, Tyrel Datwyler wrote:
> PowerVM seems to only ever provide a single hotplug slot per PHB.
> The underlying slot hotplug registration code assumed multiple slots,
> but the actual implementation is broken for multiple slots. This went
> unnoticed for years due to the nature of PowerVM as mentioned
> previously. Under qemu/kvm the hotplug slot model aligns more with
> x86 where multiple slots are presented under a single PHB. As seen
> in the following each additional slot after the first fails to
> register due to each slot always being compared against the first
> child node of the PHB in the device tree.
> 
> [6.492291] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
> [6.492569] rpaphp: Slot [Slot 0] registered
> [6.492577] rpaphp: pci_hp_register failed with error -16
> [6.493082] rpaphp: pci_hp_register failed with error -16
> [6.493138] rpaphp: pci_hp_register failed with error -16
> [6.493161] rpaphp: pci_hp_register failed with error -16
> 
> The registration logic is fixed so that each slot is compared
> against the existing child devices of the PHB in the device tree to
> determine present slots vs empty slots.
> 
> [   38.481750] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
> [   38.482004] rpaphp: Slot [C0] registered
> [   38.482127] rpaphp: Slot [C1] registered
> [   38.482241] rpaphp: Slot [C2] registered
> [   38.482356] rpaphp: Slot [C3] registered
> [   38.482495] rpaphp: Slot [C4] registered
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/pci/hotplug/rpaphp_slot.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/rpaphp_slot.c 
> b/drivers/pci/hotplug/rpaphp_slot.c
> index 6937c72..c90fa8d 100644
> --- a/drivers/pci/hotplug/rpaphp_slot.c
> +++ b/drivers/pci/hotplug/rpaphp_slot.c
> @@ -117,8 +117,10 @@ EXPORT_SYMBOL_GPL(rpaphp_deregister_slot);
>  int rpaphp_register_slot(struct slot *slot)
>  {
>   struct hotplug_slot *php_slot = slot->hotplug_slot;
> + struct device_node *child;
> + u32 my_index;
>   int retval;
> - int slotno;
> + int slotno = -1;
> 
>   dbg("%s registering slot:path[%s] index[%x], name[%s] pdomain[%x] 
> type[%d]\n",
>   __func__, slot->dn->full_name, slot->index, slot->name,
> @@ -130,10 +132,15 @@ int rpaphp_register_slot(struct slot *slot)
>   return -EAGAIN;
>   }
> 
> - if (slot->dn->child)
> - slotno = PCI_SLOT(PCI_DN(slot->dn->child)->devfn);
> - else
> - slotno = -1;
> + for_each_child_of_node(slot->dn, child) {
> + retval = of_property_read_u32(child, "my,ibm-drc-index", 
&my_index);

Shouldn't this be reading ibm,my-drc-index instead of my,ibm-drc-index?

-Nathan

> + if (my_index == slot->index) {
> + slotno = PCI_SLOT(PCI_DN(child)->devfn);
> + of_node_put(child);
> + break;
> + }
> + }
> +
>   retval = pci_hp_register(php_slot, slot->bus, slotno, slot->name);
>   if (retval) {
>   err("pci_hp_register failed with error %d\n", retval);
> 
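
The matching loop in the patch can be modelled in userspace as follows. This is a sketch only: struct child_model and find_slotno() are hypothetical names, and the has_drc_index flag models whether reading the index property succeeded. The patch as posted stores the return value of of_property_read_u32() but compares my_index regardless, so as far as I can tell a failed read would compare an uninitialized value. The sketch skips such children instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PHB child node in the device tree. */
struct child_model {
	int has_drc_index;   /* 0 models a failed property read */
	uint32_t drc_index;  /* value of the index property when present */
	int slotno;          /* models PCI_SLOT(PCI_DN(child)->devfn) */
};

/*
 * Return the slot number of the child whose index matches slot_index,
 * or -1 to signal an empty slot when no child matches.
 */
int find_slotno(const struct child_model *children, size_t count,
		uint32_t slot_index)
{
	size_t i;

	for (i = 0; i < count; i++) {
		/* Skip children where the property could not be read. */
		if (!children[i].has_drc_index)
			continue;

		if (children[i].drc_index == slot_index)
			return children[i].slotno;
	}

	return -1;
}
```

Each slot is compared against every child rather than only the first one, so an occupied slot resolves to its own device and an empty slot falls through to -1.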

[PATCH] rpaphp: fix slot registration for multiple slots under a PHB

2016-07-08 Thread Tyrel Datwyler
PowerVM seems to only ever provide a single hotplug slot per PHB.
The underlying slot hotplug registration code assumed multiple slots,
but the actual implementation is broken for multiple slots. This went
unnoticed for years due to the nature of PowerVM as mentioned
previously. Under qemu/kvm the hotplug slot model aligns more with
x86 where multiple slots are presented under a single PHB. As seen
in the following each additional slot after the first fails to
register due to each slot always being compared against the first
child node of the PHB in the device tree.

[6.492291] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
[6.492569] rpaphp: Slot [Slot 0] registered
[6.492577] rpaphp: pci_hp_register failed with error -16
[6.493082] rpaphp: pci_hp_register failed with error -16
[6.493138] rpaphp: pci_hp_register failed with error -16
[6.493161] rpaphp: pci_hp_register failed with error -16

The registration logic is fixed so that each slot is compared
against the existing child devices of the PHB in the device tree to
determine present slots vs empty slots.

[   38.481750] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
[   38.482004] rpaphp: Slot [C0] registered
[   38.482127] rpaphp: Slot [C1] registered
[   38.482241] rpaphp: Slot [C2] registered
[   38.482356] rpaphp: Slot [C3] registered
[   38.482495] rpaphp: Slot [C4] registered

Signed-off-by: Tyrel Datwyler 
---
 drivers/pci/hotplug/rpaphp_slot.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/hotplug/rpaphp_slot.c 
b/drivers/pci/hotplug/rpaphp_slot.c
index 6937c72..c90fa8d 100644
--- a/drivers/pci/hotplug/rpaphp_slot.c
+++ b/drivers/pci/hotplug/rpaphp_slot.c
@@ -117,8 +117,10 @@ EXPORT_SYMBOL_GPL(rpaphp_deregister_slot);
 int rpaphp_register_slot(struct slot *slot)
 {
struct hotplug_slot *php_slot = slot->hotplug_slot;
+   struct device_node *child;
+   u32 my_index;
int retval;
-   int slotno;
+   int slotno = -1;
 
dbg("%s registering slot:path[%s] index[%x], name[%s] pdomain[%x] 
type[%d]\n",
__func__, slot->dn->full_name, slot->index, slot->name,
@@ -130,10 +132,15 @@ int rpaphp_register_slot(struct slot *slot)
return -EAGAIN;
}
 
-   if (slot->dn->child)
-   slotno = PCI_SLOT(PCI_DN(slot->dn->child)->devfn);
-   else
-   slotno = -1;
+   for_each_child_of_node(slot->dn, child) {
+   retval = of_property_read_u32(child, "my,ibm-drc-index", 
&my_index);
+   if (my_index == slot->index) {
+   slotno = PCI_SLOT(PCI_DN(child)->devfn);
+   of_node_put(child);
+   break;
+   }
+   }
+
retval = pci_hp_register(php_slot, slot->bus, slotno, slot->name);
if (retval) {
err("pci_hp_register failed with error %d\n", retval);
-- 
2.7.4

Re: Need proper type casting before assignment, Remove compilation Warning.

2016-07-08 Thread arvind Yadav

As per your concern, I have changed and submitted one more patch.

This answers all of your questions:
-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. It will work on 32-bit architectures
But data can be loss on 64-bit architectures if 'qe_muram_alloc' will return
greater then MAX value of 'unsigned int'.


-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'unsigned int'
into a function, It will through this compilation warning.
"
 include/linux/err.h:21:49: warning: cast to pointer from integer of 
different size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= 
(unsigned long)-MAX_ERRNO)

 ^
 include/linux/compiler.h:170:42: note: in definition of macro ‘unlikely’
 # define unlikely(x) __builtin_expect(!!(x), 0)
"

-Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'unsigned int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.


Thanks,
Arvind Yadav

On Friday 08 July 2016 09:03 PM, Guenter Roeck wrote:

On Thu, Jul 07, 2016 at 10:31:11PM +0530, Arvind Yadav wrote:

-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. These variable are 'unsigned int'.
So before assginment need a proper type casting.

Are they ? In the upstream kernel, they seem to be "u32".
-Yes, I have changed it as per your suggestion.

-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'int'
into a function that takes an 'unsigned long' argument.This happens
to work because the type is sign-extended on 64-bit architectures
before it gets converted into an unsigned type.


Not really sure I understand if/how this applies to the patch in question.
I don't see an int passed to IS_ERR_VALUE(), I only see u32.


-Passing an 'unsigned short' or 'unsigned int'argument into
IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers
and types that are wider than 'unsigned long'.


What does this have to do with this patch ?


-Any user will get compilation warning for that do not pass an
unsigned long' argument.


Sure, but that doesn't mean that typecasting the parameter to unsigned long
does any good (other than hiding the real bug).

Your subject line still does not list the affected subsystem and/or driver.
Documentation/SubmittingPatches might give some hints about proper subject
lines, and looking at other patches applied to the same file(s) might help
as well.

Also, if you want someone to review your patches, it helps to Cc: that
someone.


Signed-off-by: Arvind Yadav 
---
  drivers/soc/fsl/qe/ucc_fast.c | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/qe/ucc_fast.c b/drivers/soc/fsl/qe/ucc_fast.c
index a768931..98eed25 100644
--- a/drivers/soc/fsl/qe/ucc_fast.c
+++ b/drivers/soc/fsl/qe/ucc_fast.c
@@ -267,8 +267,10 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
  
  	/* Allocate memory for Tx Virtual Fifo */

uccf->ucc_fast_tx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
+   (unsigned int)qe_muram_alloc(uf_info->utfs,

I don't see the point of this typecast.


+   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
+   if (IS_ERR_VALUE(
+   (unsigned long)uccf->ucc_fast_tx_virtual_fifo_base_offset)) {

If sizeof(u32) == sizeof(unsigned long), this patch does not have an effect.
If sizeof(u32) < sizeof(unsigned long), it does not change anything, and the
resulting code is as wrong as it was before.


printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
__func__);
uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
@@ -278,10 +280,11 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct 
ucc_fast_private ** ucc
  
  	/* Allocate memory for Rx Virtual Fifo */

uccf->ucc_fast_rx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->urfs +
+   (unsigned int)qe_muram_alloc(uf_info->urfs +
   UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
+   if (IS_ERR_VALUE(
+   (unsigned 

[PATCH v1] Specific requirement of type casting for 64-bit architectures.

2016-07-08 Thread Arvind Yadav
-Return type of 'qe_muram_alloc' is 'unsigned long', That Was trying to
assigned in ucc_fast_tx_virtual_fifo_base_offset and
ucc_fast_rx_virtual_fifo_base_offset. It will work on 32-bit architectures
But data can be loss on 64-bit architectures if 'qe_muram_alloc' will
return greater then MAX value of 'unsigned int'.

-Passing value in IS_ERR_VALUE() is wrong, as they pass an 'unsigned int'
into a function, It will through this compilation warning.

"
 include/linux/err.h:21:49: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
 ^
 include/linux/compiler.h:170:42: note: in definition of macro ‘unlikely’
 # define unlikely(x) __builtin_expect(!!(x), 0)
"

-Most users of IS_ERR_VALUE() in the kernel are wrong, as they
pass an 'unsigned int' into a function that takes an 'unsigned long'
argument. This happens to work because the type is sign-extended
on 64-bit architectures before it gets converted into an
unsigned type.

However, anything that passes an 'unsigned short' or 'unsigned int'
argument into IS_ERR_VALUE() is guaranteed to be broken, as are
8-bit integers and types that are wider than 'unsigned long'.

Signed-off-by: Arvind Yadav 
---
 drivers/soc/fsl/qe/ucc_fast.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/soc/fsl/qe/ucc_fast.c b/drivers/soc/fsl/qe/ucc_fast.c
index a768931..208b198 100644
--- a/drivers/soc/fsl/qe/ucc_fast.c
+++ b/drivers/soc/fsl/qe/ucc_fast.c
@@ -141,6 +141,7 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct ucc_fast_private ** ucc
struct ucc_fast __iomem *uf_regs;
u32 gumr;
int ret;
+   unsigned long ret_muram;
 
if (!uf_info)
return -EINVAL;
@@ -265,28 +266,34 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct ucc_fast_private ** ucc
gumr |= uf_info->mode;
out_be32(&uf_regs->gumr, gumr);
 
-   /* Allocate memory for Tx Virtual Fifo */
-   uccf->ucc_fast_tx_virtual_fifo_base_offset =
-   qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
+   ret_muram =
+   qe_muram_alloc(uf_info->utfs,
+   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
+
+   if (IS_ERR_VALUE(ret_muram)) {
printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
__func__);
uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
+   } else {
+   /* Allocate memory for Tx Virtual Fifo */
+   uccf->ucc_fast_tx_virtual_fifo_base_offset = (u32)ret_muram;
}
 
-   /* Allocate memory for Rx Virtual Fifo */
-   uccf->ucc_fast_rx_virtual_fifo_base_offset =
+   ret_muram =
qe_muram_alloc(uf_info->urfs +
   UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
   UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
-   if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
+   if (IS_ERR_VALUE(ret_muram)) {
printk(KERN_ERR "%s: cannot allocate MURAM for RX FIFO\n",
__func__);
uccf->ucc_fast_rx_virtual_fifo_base_offset = 0;
ucc_fast_free(uccf);
return -ENOMEM;
+   } else {
+   /* Allocate memory for Rx Virtual Fifo */
+   uccf->ucc_fast_rx_virtual_fifo_base_offset = (u32)ret_muram;
}
 
/* Set Virtual Fifo registers */
-- 
1.9.1
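
The truncation problem this patch is aimed at can be sketched in userspace C.
This is not kernel code: sketch_is_err_value() is a stand-in for
IS_ERR_VALUE() without the unlikely() wrapper and the pointer cast, and
sketch_failing_alloc() is a hypothetical allocator that always fails with
-ENOMEM returned as an 'unsigned long'.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Stand-in for IS_ERR_VALUE(): true for the top 4095 values of unsigned long. */
static bool sketch_is_err_value(unsigned long x)
{
	return x >= (unsigned long)(-SKETCH_MAX_ERRNO);
}

/* Hypothetical allocator that always fails with -ENOMEM. */
static unsigned long sketch_failing_alloc(void)
{
	return (unsigned long)(-(long)ENOMEM);
}

/* Check the full-width return value first, as the patch does with ret_muram. */
static bool check_before_truncation(void)
{
	unsigned long ret = sketch_failing_alloc();

	return sketch_is_err_value(ret);
}

/* Store into a 32-bit field first, then widen and check, as the old code did. */
static bool check_after_truncation(void)
{
	uint32_t offset = (uint32_t)sketch_failing_alloc();

	return sketch_is_err_value((unsigned long)offset);
}
```

Where 'unsigned long' is 64 bits wide, check_after_truncation() returns false
because the upper 32 bits of the error value are gone before the test; where
it is 32 bits wide, both helpers return true.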

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3 2/2] cpufreq: qoriq: Don't look at clock implementation details

2016-07-08 Thread Scott Wood
On Thu, 2016-07-07 at 19:26 -0700, Michael Turquette wrote:
> Quoting Scott Wood (2016-07-06 21:13:23)
> > 
> > On Wed, 2016-07-06 at 18:30 -0700, Michael Turquette wrote:
> > > 
> > > Quoting Scott Wood (2016-06-15 23:21:25)
> > > > 
> > > > 
> > > > -static struct device_node *cpu_to_clk_node(int cpu)
> > > > +static struct clk *cpu_to_clk(int cpu)
> > > >  {
> > > > -   struct device_node *np, *clk_np;
> > > > +   struct device_node *np;
> > > > +   struct clk *clk;
> > > >  
> > > > if (!cpu_present(cpu))
> > > > return NULL;
> > > > @@ -112,37 +80,28 @@ static struct device_node *cpu_to_clk_node(int
> > > > cpu)
> > > > if (!np)
> > > > return NULL;
> > > >  
> > > > -   clk_np = of_parse_phandle(np, "clocks", 0);
> > > > -   if (!clk_np)
> > > > -   return NULL;
> > > > -
> > > > +   clk = of_clk_get(np, 0);
> > > Why not use devm_clk_get here?
> > devm_clk_get() is a wrapper around clk_get() which is not the same as
> > of_clk_get().  What device would you pass to devm_clk_get(), and what name
> > would you pass?
> I'm fuzzy on whether or not you get a struct device from a cpufreq
> driver. If so, then that would be the one to use. I would hope that
> cpufreq drivers model cpus as devices, but I'm really not sure without
> looking into the code.

It's not the cpufreq code that provides it, but get_cpu_device() could be
used.

Do you have any comments on the first patch of this set?

-Scott


Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Kees Cook
On Fri, Jul 8, 2016 at 1:41 PM, Kees Cook  wrote:
> On Fri, Jul 8, 2016 at 12:20 PM, Christoph Lameter  wrote:
>> On Fri, 8 Jul 2016, Kees Cook wrote:
>>
>>> Is check_valid_pointer() making sure the pointer is within the usable
>>> size? It seemed like it was checking that it was within the slub
>>> object (checks against s->size, wants it above base after moving
>>> pointer to include redzone, etc).
>>
>> check_valid_pointer verifies that a pointer is pointing to the start of an
>> object. It is used to verify the internal pointers that SLUB uses and
>> should not be modified to do anything different.
>
> Yup, no worries -- I won't touch it. :) I just wanted to verify my
> understanding.
>
> And after playing a bit more, I see that the only thing to the left is
> padding and redzone. SLUB layout, from what I saw:
>
> offset: what's there
> ---
> start: padding, redzone
> red_left_pad: object itself
> inuse: rest of metadata
> size: start of next slub object
>
> (and object_size == inuse - red_left_pad)
>
> i.e. a pointer must be between red_left_pad and inuse, which is the
> same as pointer - red_left_pad being less than object_size.
>
> So, as found already, the position in the usercopy check needs to be
> bumped down by red_left_pad, which is what Michael's fix does, so I'll
> include it in the next version.

Actually, after some offline chats, I think this is better, since it
makes sure the ptr doesn't end up somewhere weird before we start the
calculations. This leaves the pointer as-is, but explicitly handles
the redzone on the offset instead, with no wrapping, etc:

 	/* Find offset within object. */
 	offset = (ptr - page_address(page)) % s->size;

+	/* Adjust for redzone and reject if within the redzone. */
+	if (s->flags & SLAB_RED_ZONE) {
+		if (offset < s->red_left_pad)
+			return s->name;
+		offset -= s->red_left_pad;
+	}
+
 	/* Allow address range falling entirely within object size. */
 	if (offset <= s->object_size && n <= s->object_size - offset)
 		return NULL;
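
For readers who want to poke at the arithmetic outside the kernel, here is a
minimal userspace sketch of the same check. The struct, the flag value
SKETCH_RED_ZONE and the function name are hypothetical stand-ins, not the
kernel's definitions, and the function returns a bool instead of the cache
name.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RED_ZONE 0x1UL	/* stand-in flag, not the kernel's value */

struct cache_sketch {
	size_t size;		/* whole slot, including debug metadata */
	size_t object_size;	/* usable object size */
	size_t red_left_pad;	/* left redzone in front of the object */
	unsigned long flags;
};

/* page_off is the byte offset of the pointer from the start of the slab page. */
static bool copy_allowed(const struct cache_sketch *s, size_t page_off, size_t n)
{
	/* Find offset within the slot. */
	size_t offset = page_off % s->size;

	/* Adjust for redzone and reject if within the redzone. */
	if (s->flags & SKETCH_RED_ZONE) {
		if (offset < s->red_left_pad)
			return false;
		offset -= s->red_left_pad;
	}

	/* Allow address range falling entirely within object size. */
	return offset <= s->object_size && n <= s->object_size - offset;
}
```

Because the redzone test happens before the subtraction, the unsigned offset
never wraps.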

-Kees


-- 
Kees Cook
Chrome OS & Brillo Security

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-08 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Fri, Jul 8, 2016 at 1:46 AM, Ingo Molnar  wrote:
> >
> > Could you please try to find some syscall workload that does many small user
> > copies and thus exercises this code path aggressively?
> 
> Any stat()-heavy path will hit cp_new_stat() very heavily. Think the
> usual kind of "traverse the whole tree looking for something". "git
> diff" will do it, just checking that everything is up-to-date.
> 
> That said, other things tend to dominate.

So I think a cached 'find /usr >/dev/null' might be a good one as well:

 triton:~/tip> strace -c find /usr >/dev/null
 % time     seconds  usecs/call     calls    errors syscall
 ------ ----------- ----------- --------- --------- ----------------
  47.09    0.006518           0    254697           newfstatat
  26.20    0.003627           0    254795           getdents
  14.45    0.002000           0   1147411           fcntl
   7.33    0.001014           0    509811           close
   3.28    0.000454           0    128220         1 openat
   1.52    0.000210           0    128230           fstat
   0.27    0.16               0     12810           write
   0.00    0.00               0        10           read

 triton:~/tip> perf stat --repeat 3 -e cycles:u,cycles:k,cycles find /usr >/dev/null

 Performance counter stats for 'find /usr' (3 runs):

     1,594,437,143      cycles:u                  ( +-  2.76% )
     2,570,544,009      cycles:k                  ( +-  2.50% )
     4,164,981,152      cycles                    ( +-  2.59% )

       0.929883686 seconds time elapsed           ( +-  2.57% )

... and it's dominated by kernel overhead, with a fair amount of memcpy
overhead as well:

   1.22%  find [kernel.kallsyms]   [k] copy_user_enhanced_fast_string

But maybe there are simple shell commands that are even more user-memcpy
intense?

Thanks,

Ingo

Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Kees Cook
On Fri, Jul 8, 2016 at 12:20 PM, Christoph Lameter  wrote:
> On Fri, 8 Jul 2016, Kees Cook wrote:
>
>> Is check_valid_pointer() making sure the pointer is within the usable
>> size? It seemed like it was checking that it was within the slub
>> object (checks against s->size, wants it above base after moving
>> pointer to include redzone, etc).
>
> check_valid_pointer verifies that a pointer is pointing to the start of an
> object. It is used to verify the internal pointers that SLUB uses and
> should not be modified to do anything different.

Yup, no worries -- I won't touch it. :) I just wanted to verify my
understanding.

And after playing a bit more, I see that the only thing to the left is
padding and redzone. SLUB layout, from what I saw:

offset: what's there
---
start: padding, redzone
red_left_pad: object itself
inuse: rest of metadata
size: start of next slub object

(and object_size == inuse - red_left_pad)

i.e. a pointer must be between red_left_pad and inuse, which is the
same as pointer - red_left_pad being less than object_size.
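
The layout listed above can be turned into a small userspace sketch that
classifies an offset within a slot. The struct and enum names are hypothetical,
and the layout is taken only from the description in this message, not from
the SLUB source.

```c
#include <stddef.h>

enum slot_region {
	REGION_LEFT_PAD,	/* padding and redzone before the object */
	REGION_OBJECT,		/* the object itself */
	REGION_METADATA		/* rest of metadata up to the next slot */
};

struct slot_layout {
	size_t red_left_pad;	/* offset where the object starts */
	size_t inuse;		/* offset where the object ends */
	size_t size;		/* offset where the next slot starts */
};

/* Map a byte offset from the start of the slab page to a region of its slot. */
static enum slot_region classify_offset(const struct slot_layout *l, size_t offset)
{
	offset %= l->size;
	if (offset < l->red_left_pad)
		return REGION_LEFT_PAD;
	if (offset < l->inuse)
		return REGION_OBJECT;
	return REGION_METADATA;
}

/* object_size == inuse - red_left_pad, per the note above. */
static size_t object_size_of(const struct slot_layout *l)
{
	return l->inuse - l->red_left_pad;
}
```

With made-up values red_left_pad = 16, inuse = 80 and size = 128, offsets 16
through 79 of every slot land in the object and object_size_of() gives 64.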

So, as found already, the position in the usercopy check needs to be
bumped down by red_left_pad, which is what Michael's fix does, so I'll
include it in the next version.

Thanks!

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Christoph Lameter
On Fri, 8 Jul 2016, Kees Cook wrote:

> Is check_valid_pointer() making sure the pointer is within the usable
> size? It seemed like it was checking that it was within the slub
> object (checks against s->size, wants it above base after moving
> pointer to include redzone, etc).

check_valid_pointer verifies that a pointer is pointing to the start of an
object. It is used to verify the internal pointers that SLUB uses and
should not be modified to do anything different.


Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-08 Thread Linus Torvalds
On Fri, Jul 8, 2016 at 1:46 AM, Ingo Molnar  wrote:
>
> Could you please try to find some syscall workload that does many small user
> copies and thus exercises this code path aggressively?

Any stat()-heavy path will hit cp_new_stat() very heavily. Think the
usual kind of "traverse the whole tree looking for something". "git
diff" will do it, just checking that everything is up-to-date.

That said, other things tend to dominate.

 Linus
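
A stat()-heavy loop of the kind described above can be approximated with a few
lines of C. This is only a sketch of a possible microbenchmark: stat_loop() is
a hypothetical helper, and whether a given libc and architecture route stat()
through cp_new_stat() is an assumption to verify with perf or strace.

```c
#include <stddef.h>
#include <sys/stat.h>

/*
 * Call stat() on the same path over and over. Each successful call makes the
 * kernel copy one struct stat out to user space, which is a small user copy.
 * Returns the number of calls that succeeded.
 */
static size_t stat_loop(const char *path, size_t iterations)
{
	struct stat st;
	size_t ok = 0;

	for (size_t i = 0; i < iterations; i++) {
		if (stat(path, &st) == 0)
			ok++;
	}
	return ok;
}
```

Calling stat_loop("/", n) with a large n from a small main() and timing it
with perf stat, with and without the usercopy checks, would give one data
point for the overhead question.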

Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Kees Cook
On Fri, Jul 8, 2016 at 9:45 AM, Christoph Lameter  wrote:
> On Fri, 8 Jul 2016, Michael Ellerman wrote:
>
>> > I wonder if this code should be using size_from_object() instead of s->size?

BTW, I can't reproduce this on x86 yet...

>>
>> Hmm, not sure. Who's SLUB maintainer? :)
>
> Me.
>
> s->size is the size of the whole object including debugging info etc.
> ksize() gives you the actual usable size of an object.

Is check_valid_pointer() making sure the pointer is within the usable
size? It seemed like it was checking that it was within the slub
object (checks against s->size, wants it above base after moving
pointer to include redzone, etc).

I think a potential problem with Michael's fix is that the ptr in
__check_heap_object() may not point at the _start_ of the usable
object, so doing the red zone shift isn't quite right.

This finds the ptr's offset within the slub object (since s->size is
the slub object size):

offset = (ptr - page_address(page)) % s->size;

But this looks at object_size and doesn't take into account actual size:

if (offset <= s->object_size && n <= s->object_size - offset)
return NULL;

I think offset needs to be adjusted by the size of the padding. The
restore_red_left() call had the same effect, but may not cover all
padding conditions? I'm not sure.

Should it be:

	/* Find offset within slab object. */
	offset = (ptr - page_address(page)) % s->size;

	/* Adjust offset for meta data and padding. */
	offset -= s->size - s->object_size;

	/* Make sure offset and size are within bounds of the allocation size. */
	if (offset <= s->object_size && n <= s->object_size - offset)
		return NULL;

?

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

Re: Need proper type casting before assignment, Remove compilation Warning.

2016-07-08 Thread Guenter Roeck
On Thu, Jul 07, 2016 at 10:31:11PM +0530, Arvind Yadav wrote:
> -The return type of 'qe_muram_alloc' is 'unsigned long', but the value was
> being assigned to ucc_fast_tx_virtual_fifo_base_offset and
> ucc_fast_rx_virtual_fifo_base_offset. These variables are 'unsigned int',
> so a proper type cast is needed before assignment.

Are they ? In the upstream kernel, they seem to be "u32".

> 
> -Passing the value to IS_ERR_VALUE() is wrong, as it passes an 'int'
> into a function that takes an 'unsigned long' argument. This happens
> to work because the type is sign-extended on 64-bit architectures
> before it gets converted into an unsigned type.
> 
Not really sure I understand if/how this applies to the patch in question.
I don't see an int passed to IS_ERR_VALUE(), I only see u32.

> -Passing an 'unsigned short' or 'unsigned int' argument into
> IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers
> and types that are wider than 'unsigned long'.
> 
What does this have to do with this patch ?

> -Any caller that does not pass an 'unsigned long' argument will get a
> compilation warning.
> 
Sure, but that doesn't mean that typecasting the parameter to unsigned long
does any good (other than hiding the real bug).

Your subject line still does not list the affected subsystem and/or driver.
Documentation/SubmittingPatches might give some hints about proper subject
lines, and looking at other patches applied to the same file(s) might help
as well.

Also, if you want someone to review your patches, it helps to Cc: that
someone.

> Signed-off-by: Arvind Yadav 
> ---
>  drivers/soc/fsl/qe/ucc_fast.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/soc/fsl/qe/ucc_fast.c b/drivers/soc/fsl/qe/ucc_fast.c
> index a768931..98eed25 100644
> --- a/drivers/soc/fsl/qe/ucc_fast.c
> +++ b/drivers/soc/fsl/qe/ucc_fast.c
> @@ -267,8 +267,10 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct ucc_fast_private ** ucc
>  
>   /* Allocate memory for Tx Virtual Fifo */
>   uccf->ucc_fast_tx_virtual_fifo_base_offset =
> - qe_muram_alloc(uf_info->utfs, UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
> - if (IS_ERR_VALUE(uccf->ucc_fast_tx_virtual_fifo_base_offset)) {
> + (unsigned int)qe_muram_alloc(uf_info->utfs,

I don't see the point of this typecast.

> + UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
> + if (IS_ERR_VALUE(
> + (unsigned long)uccf->ucc_fast_tx_virtual_fifo_base_offset)) {

If sizeof(u32) == sizeof(unsigned long), this patch does not have an effect.
If sizeof(u32) < sizeof(unsigned long), it does not change anything, and the
resulting code is as wrong as it was before.

>   printk(KERN_ERR "%s: cannot allocate MURAM for TX FIFO\n",
>   __func__);
>   uccf->ucc_fast_tx_virtual_fifo_base_offset = 0;
> @@ -278,10 +280,11 @@ int ucc_fast_init(struct ucc_fast_info * uf_info, struct ucc_fast_private ** ucc
>  
>   /* Allocate memory for Rx Virtual Fifo */
>   uccf->ucc_fast_rx_virtual_fifo_base_offset =
> - qe_muram_alloc(uf_info->urfs +
> + (unsigned int)qe_muram_alloc(uf_info->urfs +
>  UCC_FAST_RECEIVE_VIRTUAL_FIFO_SIZE_FUDGE_FACTOR,
>  UCC_FAST_VIRT_FIFO_REGS_ALIGNMENT);
> - if (IS_ERR_VALUE(uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
> + if (IS_ERR_VALUE(
> + (unsigned long)uccf->ucc_fast_rx_virtual_fifo_base_offset)) {
>   printk(KERN_ERR "%s: cannot allocate MURAM for RX FIFO\n",
>   __func__);
>   uccf->ucc_fast_rx_virtual_fifo_base_offset = 0;

Re: [RFC] arm64: kexec_file_load support

2016-07-08 Thread Thiago Jung Bauermann
On Thursday, 07 July 2016, 14:12:45, Dave Young wrote:
> If so maybe change a bit from your previously mentioned 7 args proposal like
> below?
> 
> struct kexec_file_fd {
>   enum kexec_file_type type;
>   int fd;
> };
> 
> struct kexec_fdset {
>   int nr_fd;
>   struct kexec_file_fd fd[0];
> };
> 
> int kexec_file_load(int kernel_fd, int initrd_fd,
>   unsigned long cmdline_len, const char *cmdline_ptr,
>   unsigned long flags, struct kexec_fdset *extra_fds);


Is there a way for the kernel to distinguish whether the process passed 5 or 
6 arguments? How can it know whether extra_fds is a valid argument or just 
garbage? I think we have to define a new flag KEXEC_FILE_EXTRA_FDS so that 
the process can signal that it is using the new interface.
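
A rough userspace sketch of that idea follows. Only the flag name
KEXEC_FILE_EXTRA_FDS comes from this mail; the bit value, the simplified
struct and the helper are hypothetical and exist purely for illustration.

```c
#include <stddef.h>

#define SKETCH_KEXEC_FILE_EXTRA_FDS 0x8UL	/* hypothetical bit value */

/* Simplified stand-in for the proposed struct kexec_fdset. */
struct sketch_fdset {
	int nr_fd;
};

/*
 * Decide how many extra fds to consume. Without the flag the sixth argument
 * may be garbage left over from an old five-argument caller, so it is never
 * dereferenced. Returns -1 if the flag is set but no fdset was supplied.
 */
static int sketch_extra_fd_count(unsigned long flags,
				 const struct sketch_fdset *extra_fds)
{
	if (!(flags & SKETCH_KEXEC_FILE_EXTRA_FDS))
		return 0;
	if (extra_fds == NULL)
		return -1;
	return extra_fds->nr_fd;
}
```

Old binaries never set the new bit, so they keep the five-argument behaviour
without the kernel having to guess.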

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: selftests/powerpc: Use "Delta" rather than "Error" in normal output

2016-07-08 Thread Michael Ellerman
On Wed, 2016-06-07 at 05:18:55 UTC, Michael Ellerman wrote:
> Use "Delta" to refer to the difference between measurements, rather than
> "Error", so scripts that look for "Error" aren't confused into thinking
> there was a failure.
> 
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/bc5c0a0d7fa94777acb9e88571

cheers

Re: powerpc/kernel: Drop unused extern for current_set

2016-07-08 Thread Michael Ellerman
On Wed, 2016-29-06 at 11:25:33 UTC, Michael Ellerman wrote:
> Signed-off-by: Michael Ellerman 

Applied to powerpc next.

https://git.kernel.org/powerpc/c/fc022fdf41b7f8c48714af154b

cheers

Re: [v7] powerpc/pci: Assign fixed PHB number based on device-tree properties

2016-07-08 Thread Michael Ellerman
On Wed, 2016-29-06 at 18:14:22 UTC, "Guilherme G. Piccoli" wrote:
> The domain/PHB field of PCI addresses has its value obtained from a
> global variable, incremented each time a new domain (represented by
> struct pci_controller) is added on the system. The domain addition
> process happens during boot or due to PHB hotplug add.
> 
> As recent kernels are using predictable naming for network interfaces,
> the network stack is more tied to PCI naming. This can be a problem in
> hotplug scenarios, because PCI addresses will change if devices are
> removed and then re-added. This situation seems unusual, but it can
> happen if a user wants to replace a NIC without rebooting the machine,
> for example.
> 
> This patch changes the way PCI domain values are generated: now, we use
> device-tree properties to assign fixed PHB numbers to PCI addresses
> when available (meaning pSeries and PowerNV cases). We also use a bitmap
> to allow dynamic PHB numbering when device-tree properties are not
> used. This bitmap keeps track of used PHB numbers and if a PHB is
> released (by hotplug operations for example), it allows the reuse of
> this PHB number, avoiding PCI address to change in case of device remove
> and re-add soon after. No functional changes were introduced.
> 
> Signed-off-by: Guilherme G. Piccoli 
> Reviewed-by: Gavin Shan 
> Reviewed-by: Ian Munsie 
> Acked-by: Gavin Shan 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/63a72284b159c569ec52f380c9

cheers
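
The dynamic numbering described in the commit message above (a bitmap of used
PHB numbers, with released numbers becoming available again) can be sketched
in userspace as below. The names and the 64-entry limit are hypothetical, and
the device-tree based fixed numbering is not modelled.

```c
#include <stdint.h>

#define SKETCH_MAX_PHBS 64

/* One bit per PHB number; a set bit means the number is in use. */
static uint64_t sketch_phb_bitmap;

/* Hand out the lowest free PHB number, or -1 if all are taken. */
static int sketch_phb_number_get(void)
{
	for (int nr = 0; nr < SKETCH_MAX_PHBS; nr++) {
		uint64_t bit = UINT64_C(1) << nr;

		if (!(sketch_phb_bitmap & bit)) {
			sketch_phb_bitmap |= bit;
			return nr;
		}
	}
	return -1;
}

/* Release a PHB number so that a later add can reuse it. */
static void sketch_phb_number_put(int nr)
{
	if (nr >= 0 && nr < SKETCH_MAX_PHBS)
		sketch_phb_bitmap &= ~(UINT64_C(1) << nr);
}
```

Removing PHB 1 and adding a PHB again right after hands out number 1 again, so
the PCI address of the re-added device does not change.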

Re: [41/41] powerpc: Fix build with CONFIG_MEMORY_HOTPLUG on some configs

2016-07-08 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:07:54 UTC, Benjamin Herrenschmidt wrote:
> For memory hotplug to work, the MMU code needs to provide the functions
> create_section_mapping() and remove_section_mapping() to respectively
> map and unmap portions of the linear mapping.
> 
> At the moment only hash64 provides these, so we provide weak stubs that
> just error out. This fixes the build with configurations such as 64-bit
> BookE with CONFIG_MEMORY_HOTPLUG enabled.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fecbfabe1dc940525f26eb1683

cheers
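
The weak-stub approach from the commit message above looks roughly like this
in plain C with GCC or Clang. The function names carry a sketch_ prefix to make
clear they are not the kernel symbols, and -ENODEV is an assumed error code;
the message only says the stubs "just error out".

```c
#include <errno.h>

/* GCC/Clang attribute standing in for the kernel's __weak annotation. */
#define SKETCH_WEAK __attribute__((weak))

/*
 * Default implementations. An MMU backend that supports memory hotplug would
 * provide strong definitions with the same names, and the linker would pick
 * those instead of these stubs.
 */
SKETCH_WEAK int sketch_create_section_mapping(unsigned long start,
					      unsigned long end)
{
	(void)start;
	(void)end;
	return -ENODEV;
}

SKETCH_WEAK int sketch_remove_section_mapping(unsigned long start,
					      unsigned long end)
{
	(void)start;
	(void)end;
	return -ENODEV;
}
```

With no strong definition linked in, callers get the error and the build still
succeeds.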

Re: [40/41] powerpc/pci: Fix build of Book3E/64 without EEH

2016-07-08 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:07:53 UTC, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d468fcafb7a42f4e5a73219692

cheers

Re: [39/41] powerpc/mm: Fix build of Book3E/64 with 64K pages

2016-07-08 Thread Michael Ellerman
On Tue, 2016-05-07 at 05:07:52 UTC, Benjamin Herrenschmidt wrote:
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e93d8e67737e5b1405792d0a5b

cheers

Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Christoph Lameter
On Fri, 8 Jul 2016, Michael Ellerman wrote:

> > I wonder if this code should be using size_from_object() instead of s->size?
>
> Hmm, not sure. Who's SLUB maintainer? :)

Me.

s->size is the size of the whole object including debugging info etc.
ksize() gives you the actual usable size of an object.


Re: [v3, 1/7] powerpc: add explicit #include for jump label

2016-07-08 Thread Michael Ellerman
On Wed, 2016-06-07 at 21:42:30 UTC, jba...@akamai.com wrote:
> The stringify_in_c() macro may not be included. Make the dependency
> explicit.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Jason Baron 

I didn't test it, but assuming you did:

Acked-by: Michael Ellerman 

cheers

[PATCH v3 3/3] powerpc: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.

Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 v3:
  - powerpc: none
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

 arch/powerpc/kernel/iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index a8e3490..f1e20ea 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -479,7 +479,8 @@ int ppc_iommu_map_sg(struct device *dev, struct iommu_table *tbl,
 
/* Handle failure */
if (unlikely(entry == DMA_ERROR_CODE)) {
-   if (printk_ratelimit())
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) &&
+   printk_ratelimit())
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %lx npages %lu\n", tbl, vaddr,
 npages);
@@ -776,7 +777,8 @@ dma_addr_t iommu_map_page(struct device *dev, struct iommu_table *tbl,
 mask >> tbl->it_page_shift, align,
 attrs);
if (dma_handle == DMA_ERROR_CODE) {
-   if (printk_ratelimit())  {
+   if (unlikely(!dma_get_attr(DMA_ATTR_NO_WARN, attrs)) &&
+   printk_ratelimit())  {
dev_info(dev, "iommu_alloc failed, tbl %p "
 "vaddr %p npages %d\n", tbl, vaddr,
 npages);
-- 
1.8.3.1


[PATCH v3 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
Use the DMA_ATTR_NO_WARN attribute on dma_map_sg() calls of nvme driver.

Signed-off-by: Mauricio Faria de Oliveira 
Reviewed-by: Gabriel Krisman Bertazi 
---
Changelog:
 v3:
  - nvme: use DMA_ATTR_NO_WARN when ret = BLK_MQ_RQ_QUEUE_BUSY (io will be
requeued) but not when ret = BLK_MQ_RQ_QUEUE_ERROR (io will be failed).
thanks: Masayoshi Mizuma 
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

 drivers/nvme/host/pci.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d1a8259..187aa6b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -65,6 +66,8 @@ MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for I/O SQes");
 
 static struct workqueue_struct *nvme_workq;
 
+static DEFINE_DMA_ATTRS(nvme_dma_attrs);
+
 struct nvme_dev;
 struct nvme_queue;
 
@@ -498,7 +501,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct request *req,
goto out;
 
ret = BLK_MQ_RQ_QUEUE_BUSY;
-   if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
+   &nvme_dma_attrs))
goto out;
 
if (!nvme_setup_prps(dev, req, size))
@@ -2118,6 +2122,9 @@ static int __init nvme_init(void)
result = pci_register_driver(&nvme_driver);
if (result)
destroy_workqueue(nvme_workq);
+
+   dma_set_attr(DMA_ATTR_NO_WARN, &nvme_dma_attrs);
+
return result;
 }
 
-- 
1.8.3.1


[PATCH v3 1/3] dma: introduce DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
Introduce the DMA_ATTR_NO_WARN attribute, and document it.

Signed-off-by: Mauricio Faria de Oliveira 
---
Changelog:
 v3:
  - dma: none.
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

 Documentation/DMA-attributes.txt | 17 +
 include/linux/dma-attrs.h|  1 +
 2 files changed, 18 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e8cf9cf..48150c6 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,20 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_NO_WARN
+
+
+This tells the DMA-mapping subsystem to suppress allocation failure reports
+(similarly to __GFP_NOWARN).
+
+On some architectures allocation failures are reported with error messages
+to the system logs.  Although this can help to identify and debug problems,
+drivers which handle failures (eg, retry later) have no problems with them,
+and can actually flood the system logs with error messages that aren't any
+problem at all, depending on the implementation of the retry mechanism.
+
+So, this provides a way for drivers to avoid those error messages on calls
+where allocation failures are not a problem, and shouldn't bother the logs.
+
+NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index f3c5aea..0577389 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_NO_WARN,
DMA_ATTR_MAX,
 };
 
-- 
1.8.3.1


[PATCH v3 0/3] dma, nvme, powerpc: introduce and implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira
This patchset introduces dma_attr DMA_ATTR_NO_WARN (just like __GFP_NOWARN),
which tells the DMA-mapping subsystem to suppress allocation failure reports.

On some architectures allocation failures are reported with error messages
to the system logs.  Although this can help to identify and debug problems,
drivers which handle failures (eg, retry later) have no problems with them,
and can actually flood the system logs with error messages that aren't any
problem at all, depending on the implementation of the retry mechanism.

So, this provides a way for drivers to avoid those error messages on calls
where allocation failures are not a problem, and shouldn't bother the logs.

 - Patch 1/3 introduces and documents the new dma_attr.

 - Patch 2/3 implements it on the nvme driver (which might repeatedly trip
 on allocation failures due to high load, flooding system logs
 with error messages at least on powerpc: "iommu_alloc failed")

 - Patch 3/3 implements support for it on powerpc arch (where this problem
 was observed.  It's possible to extend support for more archs
 if the patchset is welcome).
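
To illustrate the log-flooding scenario, here is a small userspace model of a
driver that retries a failing mapping. Everything in it is hypothetical
(the attribute bit, the counter that stands in for the "iommu_alloc failed"
message, and the function names); it is not the dma-mapping API.

```c
#include <stdbool.h>

#define SKETCH_ATTR_NO_WARN 0x1u	/* stand-in for DMA_ATTR_NO_WARN */

/* Counts the messages that would have gone to the system log. */
static unsigned int sketch_warnings_logged;

/* Model of a mapping call: fails, and normally complains, when out of space. */
static bool sketch_map(unsigned int attrs, bool have_space)
{
	if (have_space)
		return true;
	if (!(attrs & SKETCH_ATTR_NO_WARN))
		sketch_warnings_logged++;
	return false;
}

/*
 * Model of a driver that simply retries: the first `failures` attempts fail,
 * the next one succeeds. Returns the total number of attempts made.
 */
static unsigned int sketch_submit_with_retry(unsigned int attrs,
					     unsigned int failures)
{
	unsigned int attempts = 0;

	for (;;) {
		bool ok = sketch_map(attrs, attempts >= failures);

		attempts++;
		if (ok)
			return attempts;
	}
}
```

The request completes after the same number of attempts either way; the
attribute only decides whether each transient failure leaves a log line.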

Changelog:
 v3:
  - nvme: use DMA_ATTR_NO_WARN when ret = BLK_MQ_RQ_QUEUE_BUSY (io will be
requeued) but not when ret = BLK_MQ_RQ_QUEUE_ERROR (io will be failed).
thanks: Masayoshi Mizuma 
 v2:
  - all: address warnings from checkpatch.pl (line wrapping and typos)

Mauricio Faria de Oliveira (3):
  dma: introduce DMA_ATTR_NO_WARN
  nvme: implement DMA_ATTR_NO_WARN
  powerpc: implement DMA_ATTR_NO_WARN

 Documentation/DMA-attributes.txt | 17 +
 arch/powerpc/kernel/iommu.c  |  6 --
 drivers/nvme/host/pci.c  |  9 -
 include/linux/dma-attrs.h|  1 +
 4 files changed, 30 insertions(+), 3 deletions(-)

-- 
1.8.3.1


Re: [PATCH v2 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Mauricio Faria de Oliveira

On 07/08/2016 04:54 AM, Masayoshi Mizuma wrote:

> Here, I think the error messages should not be suppressed because
> the return value of nvme_map_data() is BLK_MQ_RQ_QUEUE_ERROR, so
> the IO returns as -EIO.


Agree; good point.  fixed in v3.

Thanks for reviewing.

--
Mauricio Faria de Oliveira
IBM Linux Technology Center


Re: [1/1] KVM: PPC: Introduce KVM_CAP_PPC_HTM

2016-07-08 Thread Michael Ellerman
On Wed, 2016-06-07 at 06:05:54 UTC, Sam bobroff wrote:
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 02416fe..06d79bc 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -588,6 +588,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   r = 1;
>   break;
>  #endif
> + case KVM_CAP_PPC_HTM:
> + r = cpu_has_feature(CPU_FTR_TM)
> + && is_kvmppc_hv_enabled(kvm);

I think it should be using CPU_FTR_TM_COMP.

And AFAICS you don't need to break that line.

cheers

Re: [kernel-hardening] Re: [PATCH 9/9] mm: SLUB hardened usercopy support

2016-07-08 Thread Michael Ellerman
Kees Cook  writes:
> On Thu, Jul 7, 2016 at 12:35 AM, Michael Ellerman  wrote:
>> I gave this a quick spin on powerpc, it blew up immediately :)
>
> Wheee :) This series is rather easy to test: blows up REALLY quickly
> if it's wrong. ;)

Better than subtle race conditions which is the usual :)

>> diff --git a/mm/slub.c b/mm/slub.c
>> index 0c8ace04f075..66191ea4545a 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3630,6 +3630,9 @@ const char *__check_heap_object(const void *ptr, 
>> unsigned long n,
>> /* Find object. */
>> s = page->slab_cache;
>>
>> +   /* Subtract red zone if enabled */
>> +   ptr = restore_red_left(s, ptr);
>> +
>
> Ah, interesting. Just to make sure: you've built with
> CONFIG_SLUB_DEBUG and either CONFIG_SLUB_DEBUG_ON or booted with
> either slub_debug or slub_debug=z ?

Yeah built with CONFIG_SLUB_DEBUG_ON, and booted with and without slub_debug
options.

> Thanks for the slub fix!
>
> I wonder if this code should be using size_from_object() instead of s->size?

Hmm, not sure. Who's SLUB maintainer? :)

I was modelling it on the logic in check_valid_pointer(), which also does the
restore_red_left(), and then checks for % s->size:

static inline int check_valid_pointer(struct kmem_cache *s,
struct page *page, void *object)
{
void *base;

if (!object)
return 1;

base = page_address(page);
object = restore_red_left(s, object);
if (object < base || object >= base + page->objects * s->size ||
(object - base) % s->size) {
return 0;
}

return 1;
}
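
As a rough userspace illustration of the check quoted above, the following sketch models the same arithmetic on plain integers. The struct and the toy_* names are simplified stand-ins invented for this example, not the kernel's definitions, and the model assumes red_left_pad is simply 0 when red zoning is disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the struct kmem_cache fields used here. */
struct toy_cache {
	size_t size;         /* stride between objects, padding included */
	size_t red_left_pad; /* left red zone before each object, 0 if off */
};

/* Model of restore_red_left(): step back from the object to its slot start. */
static uintptr_t toy_restore_red_left(const struct toy_cache *s, uintptr_t p)
{
	return p - s->red_left_pad;
}

/* Returns 1 if 'object' is the start of one of 'objects' slots at 'base'. */
static int toy_check_valid_pointer(const struct toy_cache *s, uintptr_t base,
				   size_t objects, uintptr_t object)
{
	assert(s->size != 0);

	if (!object)
		return 1;

	object = toy_restore_red_left(s, object);
	if (object < base || object >= base + objects * s->size ||
	    (object - base) % s->size)
		return 0;

	return 1;
}
```

With a 64-byte stride and a 16-byte left pad, valid object pointers sit 16 bytes into each slot; a pointer to the slot start itself, or to the middle of an object, is rejected.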

cheers

Re: [v4] powerpc: Export thread_struct.used_vr/used_vsr to user space

2016-07-08 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:

> On Thu, 2016-07-07 at 23:21 +1000, Benjamin Herrenschmidt wrote:
>> 
>> I think the right fix is that if a restore_sigcontext() has the MSR
>> bits set,
>> it should set the corresponding used_* flag.
>
> Something like this:
>
> (totally untested)

Simon/Laurent, can you guys test this and let me know if it works for
your usecase.

Cyril, can you give this a review, you've been touching this code the
most lately.

cheers

> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> index b6aa378..1bf074e 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -698,6 +698,7 @@ static long restore_user_regs(struct pt_regs *regs,
>   if (__copy_from_user(&current->thread.vr_state, &sr->mc_vregs,
>sizeof(sr->mc_vregs)))
>   return 1;
> + current->thread.used_vr = true;
>   } else if (current->thread.used_vr)
>   memset(&current->thread.vr_state, 0,
>  ELF_NVRREG * sizeof(vector128));
> @@ -724,6 +725,7 @@ static long restore_user_regs(struct pt_regs *regs,
>*/
>   if (copy_vsx_from_user(current, &sr->mc_vsregs))
>   return 1;
> + current->thread.used_vsr = true;
>   } else if (current->thread.used_vsr)
>   for (i = 0; i < 32 ; i++)
>   current->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
> @@ -743,6 +745,7 @@ static long restore_user_regs(struct pt_regs *regs,
>   if (__copy_from_user(current->thread.evr, &sr->mc_vregs,
>ELF_NEVRREG * sizeof(u32)))
>   return 1;
> + current->thread.used_spe = true;
>   } else if (current->thread.used_spe)
>   memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
>  
> @@ -799,6 +802,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
>&tm_sr->mc_vregs,
>sizeof(sr->mc_vregs)))
>   return 1;
> + current->thread.used_vr = true;
>   } else if (current->thread.used_vr) {
>   memset(&current->thread.vr_state, 0,
>  ELF_NVRREG * sizeof(vector128));
> @@ -832,6 +836,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
>   if (copy_vsx_from_user(current, &sr->mc_vsregs) ||
>   copy_transact_vsx_from_user(current, &tm_sr->mc_vsregs))
>   return 1;
> + current->thread.used_vsr = true;
>   } else if (current->thread.used_vsr)
>   for (i = 0; i < 32 ; i++) {
>   current->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
> @@ -848,6 +853,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
>   if (__copy_from_user(current->thread.evr, &sr->mc_vregs,
>ELF_NEVRREG * sizeof(u32)))
>   return 1;
> + current->thread.used_spe = true;
>   } else if (current->thread.used_spe)
>   memset(current->thread.evr, 0, ELF_NEVRREG * sizeof(u32));
>  
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 2552079..8704269 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -363,9 +363,11 @@ static long restore_sigcontext(struct pt_regs *regs, 
> sigset_t *set, int sig,
>   if (v_regs && !access_ok(VERIFY_READ, v_regs, 34 * sizeof(vector128)))
>   return -EFAULT;
>   /* Copy 33 vec registers (vr0..31 and vscr) from the stack */
> - if (v_regs != NULL && (msr & MSR_VEC) != 0)
> + if (v_regs != NULL && (msr & MSR_VEC) != 0) {
>   err |= __copy_from_user(&current->thread.vr_state, v_regs,
>   33 * sizeof(vector128));
> + current->thread.used_vr = true;
> + }
>   else if (current->thread.used_vr)
>   memset(&current->thread.vr_state, 0, 33 * sizeof(vector128));
>   /* Always get VRSAVE back */
> @@ -385,9 +387,10 @@ static long restore_sigcontext(struct pt_regs *regs, 
> sigset_t *set, int sig,
>* buffer for formatting, then into the taskstruct.
>*/
>   v_regs += ELF_NVRREG;
> - if ((msr & MSR_VSX) != 0)
> + if ((msr & MSR_VSX) != 0) {
>   err |= copy_vsx_from_user(current, v_regs);
> - else
> + current->thread.used_vsr = true;
> + } else
>   for (i = 0; i < 32 ; i++)
>   current->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
>  #endif
> @@ -482,6 +485,7 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
>   33 * sizeof(vector128));
>   err |= __copy_from_user(&current->thread.transact_vr, tm_v_regs,
>   33 * sizeof(vector128));
> + 

RE: [PATCH] Need proper type casting before assignment, Remove compilation Warning.

2016-07-08 Thread David Laight
From: Arvind Yadav
> Sent: 07 July 2016 19:38
> -The return type of 'qe_muram_alloc' is 'unsigned long', which was being
> assigned to ucc_fast_tx_virtual_fifo_base_offset and
> ucc_fast_rx_virtual_fifo_base_offset. These variables are 'unsigned int',
> so a proper type cast is needed before the assignment.

Are you sure? It seems to me that the type of one of the fields is wrong.
The casts you are adding do not aid readability at all.

> -Passing value in IS_ERR_VALUE() is wrong, as they pass an 'int'
> into a function that takes an 'unsigned long' argument. This happens
> to work because the type is sign-extended on 64-bit architectures
> before it gets converted into an unsigned type.
> 
> -Passing an 'unsigned short' or 'unsigned int' argument into
> IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers
> and types that are wider than 'unsigned long'.

Signed 8 and 16 bit values will be sign extended to 'int' before being
used in any arithmetic operation.
Unsigned ones get zero extended to 'int'.

That probably means you shouldn't be using IS_ERR_VALUE().
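
To make the extension behaviour concrete, here is a small sketch. toy_is_err_value() is a simplified model written for this example, not the kernel macro; it only assumes the check is "value lies in the top 4095 values of 'unsigned long'". The err_from_* wrappers are hypothetical helpers that show what each argument type turns into when it is converted.

```c
#include <stdbool.h>

#define TOY_MAX_ERRNO 4095UL

/* Simplified model: true for the top TOY_MAX_ERRNO values of unsigned long. */
static bool toy_is_err_value(unsigned long x)
{
	return x >= (unsigned long)-TOY_MAX_ERRNO;
}

/* A negative int is sign extended, so it lands in the error range. */
static bool err_from_int(int v)
{
	return toy_is_err_value(v);
}

/* An unsigned int is zero extended; the result depends on sizeof(long). */
static bool err_from_uint(unsigned int v)
{
	return toy_is_err_value(v);
}

/* An unsigned short is zero extended and never reaches the error range. */
static bool err_from_ushort(unsigned short v)
{
	return toy_is_err_value(v);
}
```

On a platform where 'unsigned long' is wider than 'unsigned int', err_from_uint((unsigned int)-12) is false even though err_from_int(-12) is true, which is the breakage described in the quoted text.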

David


Re: [Qemu-ppc] [PATCH v2] powerpc/pseries: start rtasd before PCI probing

2016-07-08 Thread Michael Ellerman
Greg Kurz  writes:

> Ping ?

Thanks. It got lost in the flood.

It's in my testing tree and should show up in next soon.

cheers

Re: [PATCH 1/9] mm: Hardened usercopy

2016-07-08 Thread Arnd Bergmann
On Thursday, July 7, 2016 1:37:43 PM CEST Kees Cook wrote:
> >
> >> + /* Allow kernel bss region (if not marked as Reserved). */
> >> + if (ptr >= (const void *)__bss_start &&
> >> + end <= (const void *)__bss_stop)
> >> + return NULL;
> >
> > accesses to .data/.rodata/.bss are probably not performance critical,
> > so we could go further here and check the kallsyms table to ensure
> > that we are not spanning multiple symbols here.
> 
> Oh, interesting! Yeah, would you be willing to put together that patch
> and test it?

Not at the moment, sorry.

I've given it a closer look and unfortunately realized that kallsyms
today only covers .text and .init.text, so it's currently useless because
those sections are already disallowed.

We could extend kallsyms to also cover all other sections, but doing
that right will likely cause a number of problems (most likely
kallsyms size mismatch) that will have to be debugged first.

I think it's doable but time-consuming. The check function should
actually be trivial:

static bool usercopy_spans_multiple_symbols(void *ptr, size_t len)
{
unsigned long size, offset; 

if (!kallsyms_lookup_size_offset((unsigned long)ptr, &size, &offset))
return 0; /* no symbol found or kallsyms disabled */

if (len <= size - offset)
return 0; /* range is within one symbol */

return 1;
}
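
The same idea can be tried out in userspace against a toy symbol table. Everything named toy_* below is hypothetical and exists only for this sketch; real kallsyms data is laid out differently, and the lookup here is assumed to return true on success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical symbol table entry, for illustration only. */
struct toy_sym {
	unsigned long addr;
	unsigned long size;
};

/* Hypothetical lookup: on success returns true and fills size and offset. */
static bool toy_lookup_size_offset(const struct toy_sym *tab, size_t nsyms,
				   unsigned long addr,
				   unsigned long *size, unsigned long *offset)
{
	for (size_t i = 0; i < nsyms; i++) {
		if (addr >= tab[i].addr && addr - tab[i].addr < tab[i].size) {
			*size = tab[i].size;
			*offset = addr - tab[i].addr;
			return true;
		}
	}
	return false;
}

/* True only if a symbol is found and [addr, addr + len) runs past its end. */
static bool toy_spans_multiple_symbols(const struct toy_sym *tab, size_t nsyms,
				       unsigned long addr, size_t len)
{
	unsigned long size, offset;

	if (!toy_lookup_size_offset(tab, nsyms, addr, &size, &offset))
		return false; /* no symbol found: nothing to compare against */

	return len > size - offset;
}
```

A copy that ends exactly at the end of a symbol is accepted; one byte more is flagged.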

This part would also be trivial:

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 1f22a186c18c..e0f37212e2a9 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -50,6 +50,11 @@ static struct addr_range text_ranges[] = {
{ "_sinittext", "_einittext" },
{ "_stext_l1",  "_etext_l1"  }, /* Blackfin on-chip L1 inst SRAM */
{ "_stext_l2",  "_etext_l2"  }, /* Blackfin on-chip L2 SRAM */
+#ifdef CONFIG_HARDENED_USERCOPY
+   { "_sdata", "_edata" },
+   { "__bss_start", "__bss_stop" },
+   { "__start_rodata", "__end_rodata" },
+#endif
 };
 #define text_range_text (&text_ranges[0])
 #define text_range_inittext (&text_ranges[1])

but I fear that if you actually try that, things start falling apart
in a big way, so I didn't try ;-)

> I wonder if there are any cases where there are
> legitimate usercopys across multiple symbols.

The only possible use case I can think of is for reading out the entire
kernel memory from /dev/kmem, but your other checks in here already
define that as illegitimate. On that subject, we probably want to
make CONFIG_DEVKMEM mutually exclusive with CONFIG_HARDENED_USERCOPY.

Arnd

Re: [PATCH v2 2/3] nvme: implement DMA_ATTR_NO_WARN

2016-07-08 Thread Masayoshi Mizuma


On Thu, 7 Jul 2016 09:45:08 -0300 Mauricio Faria De Oliveira wrote:

Use the DMA_ATTR_NO_WARN attribute on dma_map_sg() calls of nvme driver.

Signed-off-by: Mauricio Faria de Oliveira 
Reviewed-by: Gabriel Krisman Bertazi 
---
Changelog:
  v2:
   - address warnings from checkpatch.pl (line wrapping and typos)

  drivers/nvme/host/pci.c | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d1a8259..a7ccad8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -65,6 +66,8 @@ MODULE_PARM_DESC(use_cmb_sqes, "use controller's memory buffer for 
I/O SQes");

  static struct workqueue_struct *nvme_workq;

+static DEFINE_DMA_ATTRS(nvme_dma_attrs);
+
  struct nvme_dev;
  struct nvme_queue;

@@ -498,7 +501,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
goto out;

ret = BLK_MQ_RQ_QUEUE_BUSY;
-   if (!dma_map_sg(dev->dev, iod->sg, iod->nents, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, dma_dir,
+   &nvme_dma_attrs))


This change is OK because the return value of nvme_map_data() is
BLK_MQ_RQ_QUEUE_BUSY, so the IO will be requeued.


goto out;

if (!nvme_setup_prps(dev, req, size))
@@ -516,7 +520,8 @@ static int nvme_map_data(struct nvme_dev *dev, struct 
request *req,
if (rq_data_dir(req))
nvme_dif_remap(req, nvme_dif_prep);

-   if (!dma_map_sg(dev->dev, &iod->meta_sg, 1, dma_dir))
+   if (!dma_map_sg_attrs(dev->dev, &iod->meta_sg, 1, dma_dir,
+   &nvme_dma_attrs))


Here, I think the error messages should not be suppressed because
the return value of nvme_map_data() is BLK_MQ_RQ_QUEUE_ERROR, so
the IO returns as -EIO.

- Masayoshi Mizuma


goto out_unmap;
}

@@ -2118,6 +2123,9 @@ static int __init nvme_init(void)
result = pci_register_driver(&nvme_driver);
if (result)
destroy_workqueue(nvme_workq);
+
+   dma_set_attr(DMA_ATTR_NO_WARN, &nvme_dma_attrs);
+
return result;
  }




Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-08 Thread Ingo Molnar

* Kees Cook  wrote:

> - I couldn't detect a measurable performance change with these features
>   enabled. Kernel build times were unchanged, hackbench was unchanged,
>   etc. I think we could flip this to "on by default" at some point.

Could you please try to find some syscall workload that does many small user
copies and thus exercises this code path aggressively?

If that measurement works out fine then I'd prefer to enable these security
checks by default.

Thanks,

Ingo

Re: [PATCH v3 3/4] perf annotate: add powerpc support

2016-07-08 Thread Michael Ellerman
Ravi Bangoria  writes:

> On Wednesday 06 July 2016 03:38 PM, Michael Ellerman wrote:
>
> I've sent v4, which enables annotate for 'bctr' instructions.
>
> For 'bctr' it will show a down arrow (indicating a jump) and for 'bctrl' a
> right arrow (indicating a call), but no navigation options will be provided.
> Pressing the Enter key on one of them shows a message like
> "Invalid target".

Great thanks.

>>>> It doesn't look like we have the opcode handy here? Could we get it
>>>> somehow?
>>>> That would make this a *lot* more robust.
>>> objdump prints machine code, but I don't know how difficult that would
>>> be to parse to get opcode.
>> Normal objdump -d output includes the opcode, eg:
>>
>> c000886c:   2c 2c 00 00 cmpdi   r12,0
>>  ^^^
>>
>> The only thing you need to know is the endian and you can reconstruct
>> the raw instruction.
>>
>> Then you can just decode the opcode, see how we do it in the kernel with
>> eg. instr_is_relative_branch().
>
> I'm sorry. I was thinking that you wanted to show opcodes with perf
> annotate. But you were asking to use the opcode instead of parsing
> instructions.

Yeah.

> This looks like rewriting the parsing code. I don't know whether there is
> any library already available for this which we can directly use. I'm
> thinking about this.

OK don't worry about it for now. We should get this merged for starters
and we can always improve it later.
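
For reference, the opcode-based approach discussed above can be sketched in a few lines. This is only an illustration with made-up helper names, not code from perf or the kernel. It rebuilds the 32-bit word from the four bytes objdump prints (given the endianness), takes the top six bits as the primary opcode, and recognises bcctr[l] as primary opcode 19 with extended opcode 528.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rebuild a 32-bit instruction word from the four bytes objdump prints. */
static uint32_t insn_from_bytes(const uint8_t b[4], bool little_endian)
{
	if (little_endian)
		return (uint32_t)b[3] << 24 | (uint32_t)b[2] << 16 |
		       (uint32_t)b[1] << 8 | b[0];
	return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16 |
	       (uint32_t)b[2] << 8 | b[3];
}

/* The primary opcode is the top six bits of the instruction word. */
static unsigned int primary_opcode(uint32_t insn)
{
	return insn >> 26;
}

/* bcctr[l]: primary opcode 19, extended opcode 528 in the XO field. */
static bool is_bcctr(uint32_t insn)
{
	return primary_opcode(insn) == 19 && ((insn >> 1) & 0x3ff) == 528;
}

/* LK (the lowest bit) set means the branch also sets the link register. */
static bool sets_link(uint32_t insn)
{
	return insn & 1;
}
```

For the example bytes 2c 2c 00 00 read as big endian, the word is 0x2c2c0000 and the primary opcode is 11, which is the compare-immediate family that cmpdi belongs to.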

cheers

[PATCH v2 11/14] powerpc/powernv/pci: Fallback to OPAL for TCE invalidations

2016-07-08 Thread Benjamin Herrenschmidt
If we don't find registers for the PHB or don't know the model
specific invalidation method, use OPAL calls instead.

Signed-off-by: Benjamin Herrenschmidt 
---

v2. Missed some new invalidation calls that went upstream since I
wrote the original patch.

 arch/powerpc/platforms/powernv/pci-ioda.c | 37 ++-
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index ac4a432..4e9b000 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1867,6 +1867,17 @@ static void pnv_pci_phb3_tce_invalidate(struct 
pnv_ioda_pe *pe, bool rm,
}
 }
 
+static inline void pnv_pci_ioda2_tce_invalidate_pe(struct pnv_ioda_pe *pe)
+{
+   struct pnv_phb *phb = pe->phb;
+
+   if (phb->model == PNV_PHB_MODEL_PHB3 && phb->regs)
+   pnv_pci_phb3_tce_invalidate_pe(pe);
+   else
+   opal_pci_tce_kill(phb->opal_id, OPAL_PCI_TCE_KILL_PE,
+ pe->pe_number, 0, 0, 0);
+}
+
 static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
unsigned long index, unsigned long npages, bool rm)
 {
@@ -1875,17 +1886,31 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
struct pnv_ioda_pe *pe = container_of(tgl->table_group,
struct pnv_ioda_pe, table_group);
-   if (pe->phb->type == PNV_PHB_NPU) {
+   struct pnv_phb *phb = pe->phb;
+   unsigned int shift = tbl->it_page_shift;
+
+   if (phb->type == PNV_PHB_NPU) {
/*
 * The NVLink hardware does not support TCE kill
 * per TCE entry so we have to invalidate
 * the entire cache for it.
 */
-   pnv_pci_phb3_tce_invalidate_entire(pe->phb, rm);
+   pnv_pci_phb3_tce_invalidate_entire(phb, rm);
continue;
}
-   pnv_pci_phb3_tce_invalidate(pe, rm, tbl->it_page_shift,
-   index, npages);
+   if (phb->model == PNV_PHB_MODEL_PHB3 && phb->regs)
+   pnv_pci_phb3_tce_invalidate(pe, rm, shift,
+   index, npages);
+   else if (rm)
+   opal_rm_pci_tce_kill(phb->opal_id,
+OPAL_PCI_TCE_KILL_PAGES,
+pe->pe_number, 1u << shift,
+index << shift, npages);
+   else
+   opal_pci_tce_kill(phb->opal_id,
+ OPAL_PCI_TCE_KILL_PAGES,
+ pe->pe_number, 1u << shift,
+ index << shift, npages);
}
 }
 
@@ -2151,7 +2176,7 @@ static long pnv_pci_ioda2_set_window(struct 
iommu_table_group *table_group,
 
pnv_pci_link_table_and_group(phb->hose->node, num,
tbl, &pe->table_group);
-   pnv_pci_phb3_tce_invalidate_pe(pe);
+   pnv_pci_ioda2_tce_invalidate_pe(pe);
 
return 0;
 }
@@ -2289,7 +2314,7 @@ static long pnv_pci_ioda2_unset_window(struct 
iommu_table_group *table_group,
if (ret)
pe_warn(pe, "Unmapping failed, ret = %ld\n", ret);
else
-   pnv_pci_phb3_tce_invalidate_pe(pe);
+   pnv_pci_ioda2_tce_invalidate_pe(pe);
 
pnv_pci_unlink_table_and_group(table_group->tables[num], table_group);
 



Re: [v4] powerpc: Export thread_struct.used_vr/used_vsr to user space

2016-07-08 Thread Michael Ellerman
Laurent Dufour  writes:
> On 07/07/2016 15:21, Benjamin Herrenschmidt wrote:
>> On Thu, 2016-07-07 at 15:12 +0200, Laurent Dufour wrote:
>>> Most of the time this is fine, but in the case a thread which has really
>>> used those registers is catching a signal just after the restore and
>>> before it has touched to these registers again (and so set used_vsr/vr),
>>> these registers will not be pushed in the newly built signal frame since
>>> setup_sigcontext() check for used_vsr/vr before pushing the registers on
>>> the stack.
>>> This may be an issue in the case the thread wants to changed those
>>> registers (don't ask me why :)) in the stacked signal frame from the
>>> signal handler since they will not be there...
>>>
>>> Being able to get and set the used_vr and used_vsr thread's variables,
>>> fixes this issue.
>> 
>> I think the right fix is that if a restore_sigcontext() has the MSR bits set,
>> it should set the corresponding used_* flag.
>> 
>> Or is there a reason why that won't work ?
>
> I got your point and I agree that most of the time now, the Altivec/VSX
> registers are used by libc. In that case is there still a need for the
> lazy Altivec/VSX registers dump in the signal frame ?

Probably not for new programs. But it could conceivably break old
software.

> I'm fine with your proposal, except that every restarted process will
> have the used_vr/used_vsx turned on after the restart since we can't
> check if these registers were used or not at checkpoint time.
> But that may be a minor point...

Yeah I'd argue that's not worth worrying about, at least for now.

If it *is* a problem then we can fix it later.
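
For reference, the proposal quoted above (set the used_* flag whenever the restored MSR has the corresponding bit) can be modelled in userspace roughly as follows. The toy_* names are invented for this sketch, and the bit positions are meant to mirror the powerpc MSR_VEC and MSR_VSX bits but should be treated as illustrative.

```c
#include <stdbool.h>

/* Illustrative bit positions for the vector and VSX MSR bits. */
#define TOY_MSR_VEC (1UL << 25)
#define TOY_MSR_VSX (1UL << 23)

/* Stand-in for the two thread_struct flags under discussion. */
struct toy_thread {
	bool used_vr;
	bool used_vsr;
};

/* On restore, mark a register set as used if the saved MSR says so. */
static void toy_restore_flags(struct toy_thread *t, unsigned long msr)
{
	if (msr & TOY_MSR_VEC)
		t->used_vr = true;
	if (msr & TOY_MSR_VSX)
		t->used_vsr = true;
}
```

Note that the flags are only ever set here, never cleared, which matches the concern that every restarted process ends up with them turned on.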

cheers

Re: [PATCH] cxl: remove dead Kconfig options

2016-07-08 Thread Andrew Donnellan

On 04/07/16 17:12, Andrew Donnellan wrote:

Remove the CXL_KERNEL_API and CXL_EEH Kconfig options, as they were only
needed to coordinate the merging of the cxlflash driver. Also remove the
stub implementation of cxl_perst_reloads_same_image() in cxlflash which is
only used if CXL_EEH isn't defined (i.e. never).

Suggested-by: Ian Munsie 
Signed-off-by: Andrew Donnellan 

---

Applies on top of powerpc#next


I'm going to rebase this on top of the current Mellanox CX-4 series shortly.


Andrew

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited


[PATCH 14/14] powerpc/pci: Don't try to allocate resources that will be reassigned

2016-07-08 Thread Benjamin Herrenschmidt
When we know we will reassign all resources, trying (and failing)
to allocate them initially is fairly pointless and leads to a lot
of scary messages in the kernel log.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/kernel/pci-common.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index d1f91e1..fb32db4 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1370,8 +1370,10 @@ void __init pcibios_resource_survey(void)
/* Allocate and assign resources */
list_for_each_entry(b, &pci_root_buses, node)
pcibios_allocate_bus_resources(b);
-   pcibios_allocate_resources(0);
-   pcibios_allocate_resources(1);
+   if (!pci_has_flag(PCI_REASSIGN_ALL_RSRC)) {
+   pcibios_allocate_resources(0);
+   pcibios_allocate_resources(1);
+   }
 
/* Before we start assigning unassigned resource, we try to reserve
 * the low IO area and the VGA memory area if they intersect the
-- 
2.7.4


[PATCH 13/14] powerpc/powernv/pci: Check status of a PHB before using it

2016-07-08 Thread Benjamin Herrenschmidt
If the firmware encounters an error (internal or HW) during initialization
of a PHB, it might leave the device-node in the tree but mark it disabled
using the "status" property. We should check it.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index b48c130..f975d19 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3655,6 +3655,9 @@ static void __init pnv_pci_init_ioda_phb(struct 
device_node *np,
void *aux;
long rc;
 
+   if (!of_device_is_available(np))
+   return;
+
pr_info("Initializing %s PHB (%s)\n",
pnv_phb_names[ioda_type], of_node_full_name(np));
 
-- 
2.7.4


[PATCH 12/14] powerpc/powernv/pci: Use the device-tree to get available range of M64's

2016-07-08 Thread Benjamin Herrenschmidt
M64's are the configurable 64-bit windows that cover the 64-bit MMIO
space. We used to hard code 16 windows. Newer chips might have a
variable number and might need to reserve some as well (for example
on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0
to map the 32-bit space).

So newer OPALs will provide a property we can use to know what range
of windows is available. The property is named so that it can
eventually support multiple ranges but we only use the first one for
now.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 49 +++
 1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 32c7e1e..b48c130 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -197,9 +197,6 @@ static int pnv_ioda2_init_m64(struct pnv_phb *phb)
goto fail;
}
 
-   /* Mark the M64 BAR assigned */
-   set_bit(phb->ioda.m64_bar_idx, &phb->ioda.m64_bar_alloc);
-
/*
 * Exclude the segments for reserved and root bus PE, which
 * are first or last two PEs.
@@ -410,6 +407,7 @@ static void __init pnv_ioda_parse_m64_window(struct pnv_phb 
*phb)
struct pci_controller *hose = phb->hose;
struct device_node *dn = hose->dn;
struct resource *res;
+   u32 m64_range[2], i;
const u32 *r;
u64 pci_addr;
 
@@ -430,6 +428,29 @@ static void __init pnv_ioda_parse_m64_window(struct 
pnv_phb *phb)
return;
}
 
+   /* Find the available M64 BAR range and pickup the last one for
+* covering the whole 64-bits space. We support only one range.
+*/
+   if (of_property_read_u32_array(dn, "ibm,opal-available-m64-ranges",
+  m64_range, 2)) {
+   /* In absence of the property, assume 0..15 */
+   m64_range[0] = 0;
+   m64_range[1] = 16;
+   }
+   /* We only support 64 bits in our allocator */
+   if (m64_range[1] > 63) {
+   pr_warn("%s: Limiting M64 range to 63 (from %d) on PHB#%x\n",
+   __func__, m64_range[1], phb->hose->global_number);
+   m64_range[1] = 63;
+   }
+   /* Empty range, no m64 */
+   if (m64_range[1] <= m64_range[0]) {
+   pr_warn("%s: M64 empty, disabling M64 usage on PHB#%x\n",
+   __func__, phb->hose->global_number);
+   return;
+   }
+
+   /* Configure M64 informations */
res = &hose->mem_resources[1];
res->name = dn->full_name;
res->start = of_translate_address(dn, r + 2);
@@ -442,11 +463,27 @@ static void __init pnv_ioda_parse_m64_window(struct 
pnv_phb *phb)
phb->ioda.m64_segsize = phb->ioda.m64_size / phb->ioda.total_pe_num;
phb->ioda.m64_base = pci_addr;
 
-   pr_info(" MEM64 0x%016llx..0x%016llx -> 0x%016llx\n",
-   res->start, res->end, pci_addr);
+   /* This lines up nicely with the display from processing OF ranges */
+   pr_info(" MEM 0x%016llx..0x%016llx -> 0x%016llx (M64 #%d..%d)\n",
+   res->start, res->end, pci_addr, m64_range[0],
+   m64_range[0] + m64_range[1] - 1);
+
+   /* Mark all M64 used up by default */
+   phb->ioda.m64_bar_alloc = (unsigned long)-1;
 
/* Use last M64 BAR to cover M64 window */
-   phb->ioda.m64_bar_idx = 15;
+   m64_range[1]--;
+   phb->ioda.m64_bar_idx = m64_range[0] + m64_range[1];
+
+   pr_info(" Using M64 #%d as default window\n", phb->ioda.m64_bar_idx);
+
+   /* Mark remaining ones free */
+   for (i = m64_range[0]; i < m64_range[1]; i++)
+   clear_bit(i, &phb->ioda.m64_bar_alloc);
+
+   /* Setup init functions for M64 based on IODA version, IODA3 uses
+* the IODA2 code
+*/
if (phb->type == PNV_PHB_IODA1)
phb->init_m64 = pnv_ioda1_init_m64;
else
-- 
2.7.4
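
The window bookkeeping described in this patch can be modelled in userspace as a sketch. It reads the property as a (first, count) pair, reserves the last window of the range as the default M64 window, and marks the rest free in a 64-bit bitmap. The toy_* names are invented for this example, and unlike the patch this model rejects a range that does not fit in the bitmap instead of clamping it.

```c
#include <stdint.h>

struct toy_m64 {
	uint64_t bar_alloc;   /* bit set = window unavailable or in use */
	unsigned int bar_idx; /* window reserved as the default M64 window */
};

/* Returns 0 on success, -1 if the range is empty or exceeds 64 windows. */
static int toy_m64_setup(struct toy_m64 *m, unsigned int first,
			 unsigned int count)
{
	unsigned int i;

	if (count == 0 || first >= 64 || count > 64 - first)
		return -1;

	m->bar_alloc = UINT64_MAX;      /* mark everything used by default */
	m->bar_idx = first + count - 1; /* last window covers the M64 space */

	/* Free the remaining windows in the range. */
	for (i = first; i < m->bar_idx; i++)
		m->bar_alloc &= ~(UINT64_C(1) << i);

	return 0;
}
```

With the fallback of 16 windows starting at 0, window 15 becomes the default and windows 0 to 14 are free.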


[PATCH 11/14] powerpc/powernv/pci: Fallback to OPAL for TCE invalidations

2016-07-08 Thread Benjamin Herrenschmidt
If we don't find registers for the PHB or don't know the model
specific invalidation method, use OPAL calls instead.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index ac4a432..32c7e1e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1867,6 +1867,17 @@ static void pnv_pci_phb3_tce_invalidate(struct 
pnv_ioda_pe *pe, bool rm,
}
 }
 
+static inline void pnv_pci_ioda2_tce_invalidate_pe(struct pnv_ioda_pe *pe)
+{
+   struct pnv_phb *phb = pe->phb;
+
+   if (phb->model == PNV_PHB_MODEL_PHB3 && phb->regs)
+   pnv_pci_phb3_tce_invalidate_pe(pe);
+   else
+   opal_pci_tce_kill(phb->opal_id, OPAL_PCI_TCE_KILL_PE,
+ pe->pe_number, 0, 0, 0);
+}
+
 static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
unsigned long index, unsigned long npages, bool rm)
 {
@@ -1875,17 +1886,31 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
struct pnv_ioda_pe *pe = container_of(tgl->table_group,
struct pnv_ioda_pe, table_group);
-   if (pe->phb->type == PNV_PHB_NPU) {
+   struct pnv_phb *phb = pe->phb;
+   unsigned int shift = tbl->it_page_shift;
+
+   if (phb->type == PNV_PHB_NPU) {
/*
 * The NVLink hardware does not support TCE kill
 * per TCE entry so we have to invalidate
 * the entire cache for it.
 */
-   pnv_pci_phb3_tce_invalidate_entire(pe->phb, rm);
+   pnv_pci_phb3_tce_invalidate_entire(phb, rm);
continue;
}
-   pnv_pci_phb3_tce_invalidate(pe, rm, tbl->it_page_shift,
-   index, npages);
+   if (phb->model == PNV_PHB_MODEL_PHB3 && phb->regs)
+   pnv_pci_phb3_tce_invalidate(pe, rm, shift,
+   index, npages);
+   else if (rm)
+   opal_rm_pci_tce_kill(phb->opal_id,
+OPAL_PCI_TCE_KILL_PAGES,
+pe->pe_number, 1u << shift,
+index << shift, npages);
+   else
+   opal_pci_tce_kill(phb->opal_id,
+ OPAL_PCI_TCE_KILL_PAGES,
+ pe->pe_number, 1u << shift,
+ index << shift, npages);
}
 }
 
-- 
2.7.4
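
The selection logic this patch introduces can be summarised as a small decision function. The enum and function below are hypothetical names for this sketch only; they capture which path is taken and do not perform any invalidation.

```c
#include <stdbool.h>

enum toy_inval_path {
	TOY_INVAL_ENTIRE,  /* NPU: flush the whole TCE cache */
	TOY_INVAL_MMIO,    /* PHB3 with mapped registers: direct register write */
	TOY_INVAL_OPAL_RM, /* firmware call, real-mode variant */
	TOY_INVAL_OPAL,    /* firmware call, normal variant */
};

/* Pick the invalidation path in the same order the patch tests things. */
static enum toy_inval_path toy_pick_inval_path(bool is_npu, bool is_phb3,
					       bool has_regs, bool rm)
{
	if (is_npu)
		return TOY_INVAL_ENTIRE;
	if (is_phb3 && has_regs)
		return TOY_INVAL_MMIO;
	return rm ? TOY_INVAL_OPAL_RM : TOY_INVAL_OPAL;
}
```

The firmware fallback is reached both for an unknown PHB model and for a PHB3 whose registers were not found.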


[PATCH 10/14] powerpc/powernv/pci: Rework accessing the TCE invalidate register

2016-07-08 Thread Benjamin Herrenschmidt
It's architected, always in a known place, so there is no need
to keep a separate pointer to it, we use the existing "regs",
and we complement it with a real mode variant.

Signed-off-by: Benjamin Herrenschmidt 

# Conflicts:
#   arch/powerpc/platforms/powernv/pci-ioda.c
#   arch/powerpc/platforms/powernv/pci.h
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 69 ---
 arch/powerpc/platforms/powernv/pci.h  |  7 +---
 2 files changed, 28 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index dc13c14..ac4a432 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1727,6 +1727,13 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe 
*pe,
}
 }
 
+static inline __be64 __iomem *pnv_ioda_get_inval_reg(struct pnv_phb *phb,
+bool real_mode)
+{
+   return real_mode ? (__be64 __iomem *)(phb->regs_phys + 0x210) :
+   (phb->regs + 0x210);
+}
+
 static void pnv_pci_p7ioc_tce_invalidate(struct iommu_table *tbl,
unsigned long index, unsigned long npages, bool rm)
 {
@@ -1735,9 +1742,7 @@ static void pnv_pci_p7ioc_tce_invalidate(struct 
iommu_table *tbl,
next);
struct pnv_ioda_pe *pe = container_of(tgl->table_group,
struct pnv_ioda_pe, table_group);
-   __be64 __iomem *invalidate = rm ?
-   (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
-   pe->phb->ioda.tce_inval_reg;
+   __be64 __iomem *invalidate = pnv_ioda_get_inval_reg(pe->phb, rm);
unsigned long start, end, inc;
 
start = __pa(((__be64 *)tbl->it_base) + index - tbl->it_offset);
@@ -1815,39 +1820,36 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
 
 void pnv_pci_phb3_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
 {
+   __be64 __iomem *invalidate = pnv_ioda_get_inval_reg(phb, rm);
const unsigned long val = PHB3_TCE_KILL_INVAL_ALL;
 
mb(); /* Ensure previous TCE table stores are visible */
if (rm)
-   __raw_rm_writeq(cpu_to_be64(val),
-   (__be64 __iomem *)
-   phb->ioda.tce_inval_reg_phys);
+   __raw_rm_writeq(cpu_to_be64(val), invalidate);
else
-   __raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
+   __raw_writeq(cpu_to_be64(val), invalidate);
 }
 
 static inline void pnv_pci_phb3_tce_invalidate_pe(struct pnv_ioda_pe *pe)
 {
/* 01xb - invalidate TCEs that match the specified PE# */
+   __be64 __iomem *invalidate = pnv_ioda_get_inval_reg(pe->phb, false);
unsigned long val = PHB3_TCE_KILL_INVAL_PE | (pe->pe_number & 0xFF);
-   struct pnv_phb *phb = pe->phb;
-
-   if (!phb->ioda.tce_inval_reg)
-   return;
 
mb(); /* Ensure above stores are visible */
-   __raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
+   __raw_writeq(cpu_to_be64(val), invalidate);
 }
 
-static void pnv_pci_phb3_tce_invalidate(unsigned pe_number, bool rm,
-   __be64 __iomem *invalidate, unsigned shift,
-   unsigned long index, unsigned long npages)
+static void pnv_pci_phb3_tce_invalidate(struct pnv_ioda_pe *pe, bool rm,
+   unsigned shift, unsigned long index,
+   unsigned long npages)
 {
+   __be64 __iomem *invalidate = pnv_ioda_get_inval_reg(pe->phb, false);
unsigned long start, end, inc;
 
/* We'll invalidate DMA address in PE scope */
start = PHB3_TCE_KILL_INVAL_ONE;
-   start |= (pe_number & 0xFF);
+   start |= (pe->pe_number & 0xFF);
end = start;
 
/* Figure out the start, end and step */
@@ -1873,10 +1875,6 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
struct pnv_ioda_pe *pe = container_of(tgl->table_group,
struct pnv_ioda_pe, table_group);
-   __be64 __iomem *invalidate = rm ?
-   (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
-   pe->phb->ioda.tce_inval_reg;
-
if (pe->phb->type == PNV_PHB_NPU) {
/*
 * The NVLink hardware does not support TCE kill
@@ -1886,9 +1884,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
pnv_pci_phb3_tce_invalidate_entire(pe->phb, rm);
continue;
}
-   pnv_pci_phb3_tce_invalidate(pe->pe_number, rm,
-   invalidate, tbl->it_page_shift,
-   index, npages);
+   pnv_pci_phb3_tce_invalidate(pe, rm, 

[PATCH 09/14] powerpc/powernv/pci: Remove SWINV constants and obsolete TCE code

2016-07-08 Thread Benjamin Herrenschmidt
We have some obsolete code in pnv_pci_p7ioc_tce_invalidate()
to handle some internal lab tools that stopped being useful
a long time ago. Remove that along with the definition of,
and tests for, the TCE_PCI_SWINV_* flags, whose value is
basically always the same.
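
As a rough illustration of what is left once the special cases are
gone, the sketch below computes the values a p7ioc-style invalidation
would write for a run of TCEs. It is a user-space model, not the kernel
code: p7ioc_inval_values() and its parameters are hypothetical names,
the table is assumed to start at it_offset 0, TCEs are assumed to be
8 bytes wide (they are __be64 in the patch), and the write loop
(start <= end, stepping by inc) is assumed to have the same shape as
the one in the pSeries helper removed earlier in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants mirroring the values kept by the patch. */
#define MODEL_P7IOC_INVAL_FLAG  (1ull << 63)  /* start |= (1ull << 63) */
#define MODEL_P7IOC_INVAL_STEP  16ull         /* inc = 16, 2 TCEs per write */
#define MODEL_TCE_BYTES         8ull          /* assumed sizeof(__be64) */

/*
 * Fill out[] with the values that would be written to the invalidate
 * register for TCEs [index, index + npages) of a table whose first
 * entry lives at physical address table_pa. Returns the number of
 * values produced (at most max).
 */
static size_t p7ioc_inval_values(uint64_t table_pa, uint64_t index,
                                 uint64_t npages, uint64_t *out, size_t max)
{
    uint64_t start = table_pa + index * MODEL_TCE_BYTES;
    uint64_t end = table_pa + (index + npages - 1) * MODEL_TCE_BYTES;
    size_t n = 0;

    start |= MODEL_P7IOC_INVAL_FLAG;
    end |= MODEL_P7IOC_INVAL_FLAG;
    end |= MODEL_P7IOC_INVAL_STEP - 1; /* round up so end differs from start */

    while (start <= end && n < max) {
        out[n++] = start;
        start += MODEL_P7IOC_INVAL_STEP;
    }
    return n;
}
```

With a table at 0x1000, invalidating four TCEs from index 0 yields two
writes in this model, one per pair of TCEs.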

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/tce.h|  3 --
 arch/powerpc/platforms/powernv/pci-ioda.c | 50 +++
 2 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/tce.h b/arch/powerpc/include/asm/tce.h
index 743f36b..12e3629 100644
--- a/arch/powerpc/include/asm/tce.h
+++ b/arch/powerpc/include/asm/tce.h
@@ -31,9 +31,6 @@
  */
 #define TCE_VB 0
 #define TCE_PCI 1
-#define TCE_PCI_SWINV_CREATE   2
-#define TCE_PCI_SWINV_FREE 4
-#define TCE_PCI_SWINV_PAIR 8
 
 /* TCE page size is 4096 bytes (1 << 12) */
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index afb1c5e..dc13c14 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1739,29 +1739,15 @@ static void pnv_pci_p7ioc_tce_invalidate(struct 
iommu_table *tbl,
(__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
pe->phb->ioda.tce_inval_reg;
unsigned long start, end, inc;
-   const unsigned shift = tbl->it_page_shift;
 
start = __pa(((__be64 *)tbl->it_base) + index - tbl->it_offset);
end = __pa(((__be64 *)tbl->it_base) + index - tbl->it_offset +
npages - 1);
 
-   /* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
-   if (tbl->it_busno) {
-   start <<= shift;
-   end <<= shift;
-   inc = 128ull << shift;
-   start |= tbl->it_busno;
-   end |= tbl->it_busno;
-   } else if (tbl->it_type & TCE_PCI_SWINV_PAIR) {
-   /* p7ioc-style invalidation, 2 TCEs per write */
-   start |= (1ull << 63);
-   end |= (1ull << 63);
-   inc = 16;
-} else {
-   /* Default (older HW) */
-inc = 128;
-   }
-
+   /* p7ioc-style invalidation, 2 TCEs per write */
+   start |= (1ull << 63);
+   end |= (1ull << 63);
+   inc = 16;
 end |= inc - 1;/* round up end to be different than start */
 
 mb(); /* Ensure above stores are visible */
@@ -1787,7 +1773,7 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, 
long index,
int ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
attrs);
 
-   if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
+   if (!ret)
pnv_pci_p7ioc_tce_invalidate(tbl, index, npages, false);
 
return ret;
@@ -1799,8 +1785,7 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, 
long index,
 {
long ret = pnv_tce_xchg(tbl, index, hpa, direction);
 
-   if (!ret && (tbl->it_type &
-   (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+   if (!ret)
pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, false);
 
return ret;
@@ -1812,8 +1797,7 @@ static void pnv_ioda1_tce_free(struct iommu_table *tbl, 
long index,
 {
pnv_tce_free(tbl, index, npages);
 
-   if (tbl->it_type & TCE_PCI_SWINV_FREE)
-   pnv_pci_p7ioc_tce_invalidate(tbl, index, npages, false);
+   pnv_pci_p7ioc_tce_invalidate(tbl, index, npages, false);
 }
 
 static struct iommu_table_ops pnv_ioda1_iommu_ops = {
@@ -1916,7 +1900,7 @@ static int pnv_ioda2_tce_build(struct iommu_table *tbl, 
long index,
int ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
attrs);
 
-   if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
+   if (!ret)
pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
 
return ret;
@@ -1928,8 +1912,7 @@ static int pnv_ioda2_tce_xchg(struct iommu_table *tbl, 
long index,
 {
long ret = pnv_tce_xchg(tbl, index, hpa, direction);
 
-   if (!ret && (tbl->it_type &
-   (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
+   if (!ret)
pnv_pci_ioda2_tce_invalidate(tbl, index, 1, false);
 
return ret;
@@ -1941,8 +1924,7 @@ static void pnv_ioda2_tce_free(struct iommu_table *tbl, 
long index,
 {
pnv_tce_free(tbl, index, npages);
 
-   if (tbl->it_type & TCE_PCI_SWINV_FREE)
-   pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
+   pnv_pci_ioda2_tce_invalidate(tbl, index, npages, false);
 }
 
 static void pnv_ioda2_table_free(struct iommu_table *tbl)
@@ -2111,12 +2093,6 @@ found:
  base * PNV_IODA1_DMA32_SEGSIZE,
  IOMMU_PAGE_SHIFT_4K);
 
-   /* OPAL variant of P7IOC SW invalidated TCEs */
-   if 

[PATCH 08/14] powerpc/powernv/pci: Rename TCE invalidation calls

2016-07-08 Thread Benjamin Herrenschmidt
The TCE invalidation functions are fairly implementation specific,
and while the IODA specs more or less describe the register, in practice
various implementation workarounds may be required. So name the
functions after the target PHB.

Note that today, and for the foreseeable future, there's a 1:1
relationship between an IODA version and a PHB implementation. There
exists another variant of IODA1 (Torrent), but we never supported it
with OPAL and never will.
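
For reference, the renamed PHB3_TCE_KILL_* constants are built with
PPC_BIT(), which uses IBM bit numbering (bit 0 is the most significant
bit of the 64-bit word). A small user-space sketch of the register
values that result, assuming that numbering; the MODEL_* macros and the
phb3_kill_*() helpers are made-up names for illustration only:

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit word. */
#define MODEL_PPC_BIT(n)  (1ull << (63 - (n)))

#define MODEL_PHB3_TCE_KILL_INVAL_ALL  MODEL_PPC_BIT(0)
#define MODEL_PHB3_TCE_KILL_INVAL_PE   MODEL_PPC_BIT(1)
#define MODEL_PHB3_TCE_KILL_INVAL_ONE  MODEL_PPC_BIT(2)

/* Value written to invalidate the whole TCE cache. */
static uint64_t phb3_kill_all(void)
{
    return MODEL_PHB3_TCE_KILL_INVAL_ALL;
}

/* Value written to invalidate every TCE belonging to one PE. */
static uint64_t phb3_kill_pe(unsigned int pe_number)
{
    return MODEL_PHB3_TCE_KILL_INVAL_PE | (pe_number & 0xFF);
}

/* Base value for a per-address invalidation scoped to one PE. */
static uint64_t phb3_kill_one_base(unsigned int pe_number)
{
    return MODEL_PHB3_TCE_KILL_INVAL_ONE | (pe_number & 0xFF);
}
```

Only the low 8 bits of the PE number are kept, matching the
"pe_number & 0xFF" masking in the code being renamed.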

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powernv/npu-dma.c  |  8 +++
 arch/powerpc/platforms/powernv/pci-ioda.c | 36 +++
 arch/powerpc/platforms/powernv/pci.h  |  4 +---
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 0459e10..4383a5f 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -180,7 +180,7 @@ long pnv_npu_set_window(struct pnv_ioda_pe *npe, int num,
pe_err(npe, "Failed to configure TCE table, err %lld\n", rc);
return rc;
}
-   pnv_pci_ioda2_tce_invalidate_entire(phb, false);
+   pnv_pci_phb3_tce_invalidate_entire(phb, false);
 
/* Add the table to the list so its TCE cache will get invalidated */
pnv_pci_link_table_and_group(phb->hose->node, num,
@@ -204,7 +204,7 @@ long pnv_npu_unset_window(struct pnv_ioda_pe *npe, int num)
pe_err(npe, "Unmapping failed, ret = %lld\n", rc);
return rc;
}
-   pnv_pci_ioda2_tce_invalidate_entire(phb, false);
+   pnv_pci_phb3_tce_invalidate_entire(phb, false);
 
pnv_pci_unlink_table_and_group(npe->table_group.tables[num],
&npe->table_group);
@@ -270,7 +270,7 @@ static int pnv_npu_dma_set_bypass(struct pnv_ioda_pe *npe)
0 /* bypass base */, top);
 
if (rc == OPAL_SUCCESS)
-   pnv_pci_ioda2_tce_invalidate_entire(phb, false);
+   pnv_pci_phb3_tce_invalidate_entire(phb, false);
 
return rc;
 }
@@ -334,7 +334,7 @@ void pnv_npu_take_ownership(struct pnv_ioda_pe *npe)
pe_err(npe, "Failed to disable bypass, err %lld\n", rc);
return;
}
-   pnv_pci_ioda2_tce_invalidate_entire(npe->phb, false);
+   pnv_pci_phb3_tce_invalidate_entire(npe->phb, false);
 }
 
 struct pnv_ioda_pe *pnv_pci_npu_setup_iommu(struct pnv_ioda_pe *npe)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 092b2e6..afb1c5e 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1727,7 +1727,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe,
}
 }
 
-static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
+static void pnv_pci_p7ioc_tce_invalidate(struct iommu_table *tbl,
unsigned long index, unsigned long npages, bool rm)
 {
struct iommu_table_group_link *tgl = list_first_entry_or_null(
@@ -1788,7 +1788,7 @@ static int pnv_ioda1_tce_build(struct iommu_table *tbl, 
long index,
attrs);
 
if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
-   pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+   pnv_pci_p7ioc_tce_invalidate(tbl, index, npages, false);
 
return ret;
 }
@@ -1801,7 +1801,7 @@ static int pnv_ioda1_tce_xchg(struct iommu_table *tbl, 
long index,
 
if (!ret && (tbl->it_type &
(TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE)))
-   pnv_pci_ioda1_tce_invalidate(tbl, index, 1, false);
+   pnv_pci_p7ioc_tce_invalidate(tbl, index, 1, false);
 
return ret;
 }
@@ -1813,7 +1813,7 @@ static void pnv_ioda1_tce_free(struct iommu_table *tbl, 
long index,
pnv_tce_free(tbl, index, npages);
 
if (tbl->it_type & TCE_PCI_SWINV_FREE)
-   pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+   pnv_pci_p7ioc_tce_invalidate(tbl, index, npages, false);
 }
 
 static struct iommu_table_ops pnv_ioda1_iommu_ops = {
@@ -1825,13 +1825,13 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
.get = pnv_tce_get,
 };
 
-#define TCE_KILL_INVAL_ALL  PPC_BIT(0)
-#define TCE_KILL_INVAL_PE   PPC_BIT(1)
-#define TCE_KILL_INVAL_TCE  PPC_BIT(2)
+#define PHB3_TCE_KILL_INVAL_ALL PPC_BIT(0)
+#define PHB3_TCE_KILL_INVAL_PE  PPC_BIT(1)
+#define PHB3_TCE_KILL_INVAL_ONE PPC_BIT(2)
 
-void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
+void pnv_pci_phb3_tce_invalidate_entire(struct pnv_phb *phb, bool rm)
 {
-   const unsigned long val = TCE_KILL_INVAL_ALL;
+   const unsigned long val = PHB3_TCE_KILL_INVAL_ALL;
 
mb(); /* Ensure previous TCE table stores are visible */
if (rm)
@@ -1842,10 

[PATCH 07/14] powerpc/opal: Add real mode call wrappers

2016-07-08 Thread Benjamin Herrenschmidt
Replace the old generic opal_call_realmode() with proper per-call
wrappers similar to the normal ones and convert callers.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/opal-api.h| 10 +++-
 arch/powerpc/include/asm/opal.h|  6 +++
 arch/powerpc/kernel/idle_power7.S  | 16 ++-
 arch/powerpc/platforms/powernv/opal-wrappers.S | 63 +-
 4 files changed, 51 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 170ba0c..273f7b3 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -166,7 +166,8 @@
 #define OPAL_INT_SET_CPPR   123
 #define OPAL_INT_EOI   124
 #define OPAL_INT_SET_MFRR  125
-#define OPAL_LAST  125
+#define OPAL_PCI_TCE_KILL  126
+#define OPAL_LAST  126
 
 /* Device tree flags */
 
@@ -912,6 +913,13 @@ enum {
OPAL_REBOOT_PLATFORM_ERROR  = 1,
 };
 
+/* Argument to OPAL_PCI_TCE_KILL */
+enum {
+   OPAL_PCI_TCE_KILL_PAGES,
+   OPAL_PCI_TCE_KILL_PE,
+   OPAL_PCI_TCE_KILL_ALL,
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 42f0c95..8edf8d4 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -222,6 +222,12 @@ int64_t opal_int_get_xirr(uint32_t *out_xirr, bool 
just_poll);
 int64_t opal_int_set_cppr(uint8_t cppr);
 int64_t opal_int_eoi(uint32_t xirr);
 int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
+int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
+ uint32_t pe_num, uint32_t tce_size,
+ uint64_t dma_addr, uint32_t npages);
+int64_t opal_rm_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
+uint32_t pe_num, uint32_t tce_size,
+uint64_t dma_addr, uint32_t npages);
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/kernel/idle_power7.S 
b/arch/powerpc/kernel/idle_power7.S
index 470ceeb..c93f825 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -196,8 +196,7 @@ fastsleep_workaround_at_entry:
/* Fast sleep workaround */
li  r3,1
li  r4,1
-   li  r0,OPAL_CONFIG_CPU_IDLE_STATE
-   bl  opal_call_realmode
+   bl  opal_rm_config_cpu_idle_state
 
/* Clear Lock bit */
li  r0,0
@@ -270,8 +269,7 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);
\
ld  r2,PACATOC(r13);\
ld  r1,PACAR1(r13); \
std r3,ORIG_GPR3(r1);   /* Save original r3 */  \
-   li  r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/   \
-   bl  opal_call_realmode; \
+   bl  opal_rm_handle_hmi; \
ld  r3,ORIG_GPR3(r1);   /* Restore original r3 */   \
 20:nop;
 
@@ -284,7 +282,7 @@ _GLOBAL(power7_wakeup_tb_loss)
 * and they are restored before switching to the process context. Hence
 * until they are restored, they are free to be used.
 *
-* Save SRR1 in a NVGPR as it might be clobbered in opal_call_realmode
+* Save SRR1 in a NVGPR as it might be clobbered in opal call
 * (called in CHECK_HMI_INTERRUPT). SRR1 is required to determine the
 * wakeup reason if we branch to kvm_start_guest.
 */
@@ -378,10 +376,7 @@ timebase_resync:
 * set in exceptions-64s.S */
ble cr3,clear_lock
/* Time base re-sync */
-   li  r0,OPAL_RESYNC_TIMEBASE
-   bl  opal_call_realmode;
-   /* TODO: Check r3 for failure */
-
+   bl  opal_rm_resync_timebase;
/*
 * If waking up from sleep, per core state is not lost, skip to
 * clear_lock.
@@ -469,8 +464,7 @@ hypervisor_state_restored:
 fastsleep_workaround_at_exit:
li  r3,1
li  r4,0
-   li  r0,OPAL_CONFIG_CPU_IDLE_STATE
-   bl  opal_call_realmode
+   bl  opal_rm_config_cpu_idle_state
b   timebase_resync
 
 /*
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index c7764f9..cf928bb 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -59,7 +59,7 @@ END_FTR_SECTION(0, 1);
\
 #define OPAL_CALL(name, token) \
  _GLOBAL_TOC(name);\

[PATCH 06/14] powerpc/pseries/pci: Remove obsolete SW invalidate

2016-07-08 Thread Benjamin Herrenschmidt
That was used by some old IBM internal bringup tools and is
no longer relevant.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/pseries/iommu.c | 53 +-
 1 file changed, 1 insertion(+), 52 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 3e8865b..770a753 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -120,35 +120,6 @@ static void iommu_pseries_free_group(struct 
iommu_table_group *table_group,
kfree(table_group);
 }
 
-static void tce_invalidate_pSeries_sw(struct iommu_table *tbl,
- __be64 *startp, __be64 *endp)
-{
-   u64 __iomem *invalidate = (u64 __iomem *)tbl->it_index;
-   unsigned long start, end, inc;
-
-   start = __pa(startp);
-   end = __pa(endp);
-   inc = L1_CACHE_BYTES; /* invalidate a cacheline of TCEs at a time */
-
-   /* If this is non-zero, change the format.  We shift the
-* address and or in the magic from the device tree. */
-   if (tbl->it_busno) {
-   start <<= 12;
-   end <<= 12;
-   inc <<= 12;
-   start |= tbl->it_busno;
-   end |= tbl->it_busno;
-   }
-
-   end |= inc - 1; /* round up end to be different than start */
-
-   mb(); /* Make sure TCEs in memory are written */
-   while (start <= end) {
-   out_be64(invalidate, start);
-   start += inc;
-   }
-}
-
 static int tce_build_pSeries(struct iommu_table *tbl, long index,
  long npages, unsigned long uaddr,
  enum dma_data_direction direction,
@@ -173,9 +144,6 @@ static int tce_build_pSeries(struct iommu_table *tbl, long 
index,
uaddr += TCE_PAGE_SIZE;
tcep++;
}
-
-   if (tbl->it_type & TCE_PCI_SWINV_CREATE)
-   tce_invalidate_pSeries_sw(tbl, tces, tcep - 1);
return 0;
 }
 
@@ -188,9 +156,6 @@ static void tce_free_pSeries(struct iommu_table *tbl, long 
index, long npages)
 
while (npages--)
*(tcep++) = 0;
-
-   if (tbl->it_type & TCE_PCI_SWINV_FREE)
-   tce_invalidate_pSeries_sw(tbl, tces, tcep - 1);
 }
 
 static unsigned long tce_get_pseries(struct iommu_table *tbl, long index)
@@ -537,7 +502,7 @@ static void iommu_table_setparms(struct pci_controller *phb,
 struct iommu_table *tbl)
 {
struct device_node *node;
-   const unsigned long *basep, *sw_inval;
+   const unsigned long *basep;
const u32 *sizep;
 
node = phb->dn;
@@ -575,22 +540,6 @@ static void iommu_table_setparms(struct pci_controller 
*phb,
tbl->it_index = 0;
tbl->it_blocksize = 16;
tbl->it_type = TCE_PCI;
-
-   sw_inval = of_get_property(node, "linux,tce-sw-invalidate-info", NULL);
-   if (sw_inval) {
-   /*
-* This property contains information on how to
-* invalidate the TCE entry.  The first property is
-* the base MMIO address used to invalidate entries.
-* The second property tells us the format of the TCE
-* invalidate (whether it needs to be shifted) and
-* some magic routing info to add to our invalidate
-* command.
-*/
-   tbl->it_index = (unsigned long) ioremap(sw_inval[0], 8);
-   tbl->it_busno = sw_inval[1]; /* overload this with magic */
-   tbl->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE;
-   }
 }
 
 /*
-- 
2.7.4

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 05/14] powerpc/powernv: Discover IODA3 PHBs

2016-07-08 Thread Benjamin Herrenschmidt
We instantiate them as IODA2. We also change the MSI EOI hack
to only kick in on PHB3 since it will not be needed on any new
implementation.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 arch/powerpc/platforms/powernv/pci.c  | 4 
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 0be6f2b..092b2e6 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2711,7 +2711,8 @@ static void set_msi_irq_chip(struct pnv_phb *phb, 
unsigned int virq)
struct irq_data *idata;
struct irq_chip *ichip;
 
-   if (phb->type != PNV_PHB_IODA2)
+   /* The MSI EOI OPAL call is only needed on PHB3 */
+   if (phb->model != PNV_PHB_MODEL_PHB3)
return;
 
if (!phb->ioda.irq_chip_init) {
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 62c7637..4617ea2 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -932,6 +932,10 @@ void __init pnv_pci_init(void)
for_each_compatible_node(np, NULL, "ibm,ioda2-phb")
pnv_pci_init_ioda2_phb(np);
 
+   /* Look for ioda3 built-in PHB4's, we treat them as IODA2 */
+   for_each_compatible_node(np, NULL, "ibm,ioda3-phb")
+   pnv_pci_init_ioda2_phb(np);
+
/* Look for NPU PHBs */
for_each_compatible_node(np, NULL, "ibm,ioda2-npu-phb")
pnv_pci_init_npu_phb(np);
-- 
2.7.4


[PATCH 04/14] powerpc/xics: Add ICP OPAL backend

2016-07-08 Thread Benjamin Herrenschmidt
This adds a new XICS backend that uses OPAL calls, which can be
used when we don't have native support for the platform interrupt
controller.
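
For readers unfamiliar with XICS: most of the backend below moves one
32-bit value, the XIRR, between Linux and OPAL. Here is a minimal sketch
of how that value is packed, assuming the usual XICS layout (priority in
the top 8 bits, interrupt source in the low 24 bits). The helper names
are hypothetical and do not exist in the kernel:

```c
#include <stdint.h>

/* Low 24 bits of the XIRR: the interrupt source (vector). */
static inline uint32_t xirr_source(uint32_t xirr)
{
    return xirr & 0x00ffffffu;
}

/* Top 8 bits of the XIRR: the priority (CPPR) associated with it. */
static inline uint8_t xirr_priority(uint32_t xirr)
{
    return (uint8_t)(xirr >> 24);
}

/* Build the value handed back on EOI: priority to restore plus source. */
static inline uint32_t xirr_pack(uint8_t priority, uint32_t source)
{
    return ((uint32_t)priority << 24) | (source & 0x00ffffffu);
}
```

This is the same shape as the "(xics_pop_cppr() << 24) | hw_irq"
expression passed to opal_int_eoi() in the patch.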

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/xics.h|   8 +-
 arch/powerpc/sysdev/xics/Makefile  |   2 +-
 arch/powerpc/sysdev/xics/icp-opal.c| 144 +
 arch/powerpc/sysdev/xics/xics-common.c |   5 +-
 4 files changed, 156 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/sysdev/xics/icp-opal.c

diff --git a/arch/powerpc/include/asm/xics.h b/arch/powerpc/include/asm/xics.h
index 04ef3ae..a30d845 100644
--- a/arch/powerpc/include/asm/xics.h
+++ b/arch/powerpc/include/asm/xics.h
@@ -42,6 +42,12 @@ extern int icp_hv_init(void);
 static inline int icp_hv_init(void) { return -ENODEV; }
 #endif
 
+#ifdef CONFIG_PPC_POWERNV
+extern int icp_opal_init(void);
+#else
+static inline int icp_opal_init(void) { return -ENODEV; }
+#endif
+
 /* ICP ops */
 struct icp_ops {
unsigned int (*get_irq)(void);
@@ -135,7 +141,7 @@ static inline void xics_set_base_cppr(unsigned char cppr)
 static inline unsigned char xics_cppr_top(void)
 {
struct xics_cppr *os_cppr = this_cpu_ptr(&xics_cppr);
-   
+
return os_cppr->stack[os_cppr->index];
 }
 
diff --git a/arch/powerpc/sysdev/xics/Makefile 
b/arch/powerpc/sysdev/xics/Makefile
index c606aa8..5d7f5a6 100644
--- a/arch/powerpc/sysdev/xics/Makefile
+++ b/arch/powerpc/sysdev/xics/Makefile
@@ -4,4 +4,4 @@ obj-y   += xics-common.o
 obj-$(CONFIG_PPC_ICP_NATIVE)   += icp-native.o
 obj-$(CONFIG_PPC_ICP_HV)   += icp-hv.o
 obj-$(CONFIG_PPC_ICS_RTAS) += ics-rtas.o
-obj-$(CONFIG_PPC_POWERNV)  += ics-opal.o
+obj-$(CONFIG_PPC_POWERNV)  += ics-opal.o icp-opal.o
diff --git a/arch/powerpc/sysdev/xics/icp-opal.c 
b/arch/powerpc/sysdev/xics/icp-opal.c
new file mode 100644
index 0000000..eb484e9
--- /dev/null
+++ b/arch/powerpc/sysdev/xics/icp-opal.c
@@ -0,0 +1,144 @@
+/*
+ * Copyright 2011 IBM Corporation.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void icp_opal_teardown_cpu(void)
+{
+   int cpu = smp_processor_id();
+
+   /* Clear any pending IPI */
+   opal_int_set_mfrr(cpu, 0xff);
+}
+
+static void icp_opal_flush_ipi(void)
+{
+   /* We take the ipi irq but and never return so we
+* need to EOI the IPI, but want to leave our priority 0
+*
+* should we check all the other interrupts too?
+* should we be flagging idle loop instead?
+* or creating some task to be scheduled?
+*/
+
+   opal_int_eoi((0x00 << 24) | XICS_IPI);
+}
+
+static unsigned int icp_opal_get_irq(void)
+{
+   unsigned int xirr;
+   unsigned int vec;
+   unsigned int irq;
+   int64_t rc;
+
+   rc = opal_int_get_xirr(&xirr, false);
+   if (rc < 0)
+   return NO_IRQ;
+   xirr = be32_to_cpu(xirr);
+   vec = xirr & 0x00ffffff;
+   if (vec == XICS_IRQ_SPURIOUS)
+   return NO_IRQ;
+
+   irq = irq_find_mapping(xics_host, vec);
+   if (likely(irq != NO_IRQ)) {
+   xics_push_cppr(vec);
+   return irq;
+   }
+
+   /* We don't have a linux mapping, so have rtas mask it. */
+   xics_mask_unknown_vec(vec);
+
+   /* We might learn about it later, so EOI it */
+   opal_int_eoi(xirr);
+
+   return NO_IRQ;
+}
+
+static void icp_opal_set_cpu_priority(unsigned char cppr)
+{
+   xics_set_base_cppr(cppr);
+   opal_int_set_cppr(cppr);
+   iosync();
+}
+
+static void icp_opal_eoi(struct irq_data *d)
+{
+   unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d);
+   int64_t rc;
+
+   iosync();
+   rc = opal_int_eoi((xics_pop_cppr() << 24) | hw_irq);
+
+   /* EOI tells us whether there are more interrupts to fetch.
+*
+* Some HW implementations might not be able to send us another
+* external interrupt in that case, so we force a replay.
+*/
+   if (rc > 0)
+   force_external_irq_replay();
+}
+
+#ifdef CONFIG_SMP
+
+static void icp_opal_cause_ipi(int cpu, unsigned long data)
+{
+   opal_int_set_mfrr(cpu, IPI_PRIORITY);
+}
+
+static irqreturn_t icp_opal_ipi_action(int irq, void *dev_id)
+{
+   int cpu = smp_processor_id();
+
+   opal_int_set_mfrr(cpu, 0xff);
+
+   return smp_ipi_demux();
+}
+
+#endif /* CONFIG_SMP */
+
+static const struct icp_ops icp_opal_ops = {
+   .get_irq= icp_opal_get_irq,
+   .eoi= icp_opal_eoi,
+   .set_priority   = icp_opal_set_cpu_priority,
+   

[PATCH 03/14] powerpc/irq: Add mechanism to force a replay of interrupts

2016-07-08 Thread Benjamin Herrenschmidt
Calling this function with interrupts soft-disabled will cause
a replay of the external interrupt vector when they are re-enabled.

This will be used by the OPAL XICS backend (and later by the native
XIVE code) to handle an EOI signaling that there are more interrupts
to fetch from the hardware, since the hardware won't issue another HW
interrupt in that case.
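
The idea can be modelled in a few lines of user-space C. This is only
a sketch of the lazy-disable bookkeeping as described above, not the
kernel implementation: struct model_cpu, the model_*() functions and
the MODEL_IRQ_EE bit value are all invented for the example, and the
"replay" is reduced to bumping a counter.

```c
#include <stdbool.h>

#define MODEL_IRQ_EE 0x04u /* illustrative "external interrupt pending" bit */

struct model_cpu {
    bool soft_enabled;        /* false while interrupts are soft-disabled */
    unsigned int irq_happened; /* pending-replay bitmask, like the PACA field */
    unsigned int replayed_ee;  /* how many external replays have run */
    bool warned;               /* set if the API was misused */
};

/* Request a replay; only valid while interrupts are soft-disabled. */
static void model_force_external_irq_replay(struct model_cpu *cpu)
{
    if (cpu->soft_enabled)
        cpu->warned = true; /* stands in for the WARN_ON() in the patch */
    cpu->irq_happened |= MODEL_IRQ_EE;
}

/* Re-enable interrupts and replay anything recorded while disabled. */
static void model_irq_enable(struct model_cpu *cpu)
{
    cpu->soft_enabled = true;
    if (cpu->irq_happened & MODEL_IRQ_EE) {
        cpu->irq_happened &= ~MODEL_IRQ_EE;
        cpu->replayed_ee++; /* the external interrupt handler would run here */
    }
}
```

The point of the design is that the EOI path does not need to re-enter
the interrupt handler itself; it only records that more work is
pending and lets the normal re-enable path pick it up.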

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/hw_irq.h |  2 ++
 arch/powerpc/kernel/irq.c | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/hw_irq.h 
b/arch/powerpc/include/asm/hw_irq.h
index b59ac27..c7d82ff 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -130,6 +130,8 @@ static inline bool arch_irq_disabled_regs(struct pt_regs 
*regs)
 
 extern bool prep_irq_for_idle(void);
 
+extern void force_external_irq_replay(void);
+
 #else /* CONFIG_PPC64 */
 
 #define SET_MSR_EE(x)  mtmsr(x)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 3cb46a3..604e3dd 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -342,6 +342,20 @@ bool prep_irq_for_idle(void)
return true;
 }
 
+/* Force a replay of the external interrupt handler on this
+ * CPU.
+ */
+void force_external_irq_replay(void)
+{
+   /* This must only be called with interrupts soft-disabled,
+* the replay will happen when re-enabling
+*/
+   WARN_ON(!arch_irqs_disabled());
+
+   /* Indicate in the PACA that we have an interrupt to replay */
+   local_paca->irq_happened |= PACA_IRQ_EE;
+}
+
 #endif /* CONFIG_PPC64 */
 
 int arch_show_interrupts(struct seq_file *p, int prec)
-- 
2.7.4


[PATCH 02/14] powerpc/irq: Add support for HV virtualization interrupts

2016-07-08 Thread Benjamin Herrenschmidt
This will be delivering external interrupts from the XIVE to the
Hypervisor. We treat it as a normal external interrupt for the
lazy irq disable code (so it will be replayed as a 0x500) and
route it to do_IRQ.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/exception-64s.h |  2 ++
 arch/powerpc/include/asm/reg.h   |  1 +
 arch/powerpc/kernel/cpu_setup_power.S|  2 ++
 arch/powerpc/kernel/exceptions-64s.S | 19 +++
 4 files changed, 24 insertions(+)

diff --git a/arch/powerpc/include/asm/exception-64s.h 
b/arch/powerpc/include/asm/exception-64s.h
index 93ae809..c7d2773 100644
--- a/arch/powerpc/include/asm/exception-64s.h
+++ b/arch/powerpc/include/asm/exception-64s.h
@@ -403,6 +403,8 @@ label##_relon_hv:   
\
 #define SOFTEN_VALUE_0xe82 PACA_IRQ_DBELL
 #define SOFTEN_VALUE_0xe60 PACA_IRQ_HMI
 #define SOFTEN_VALUE_0xe62 PACA_IRQ_HMI
+#define SOFTEN_VALUE_0xea0 PACA_IRQ_EE
+#define SOFTEN_VALUE_0xea2 PACA_IRQ_EE
 
 #define __SOFTEN_TEST(h, vec)  \
lbz r10,PACASOFTIRQEN(r13); \
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 320136f..3c60a40 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -352,6 +352,7 @@
 #define   LPCR_LPES1   0x0004  /* LPAR Env selector 1 */
 #define   LPCR_LPES_SH 2
 #define   LPCR_RMI 0x0002  /* real mode is cache inhibit */
+#define   LPCR_HVICE   0x0002  /* P9: HV interrupt enable */
 #define   LPCR_HDICE   0x0001  /* Hyp Decr enable (HV,PR,EE) */
 #define   LPCR_UPRT0x0040  /* Use Process Table (ISA 3) */
 #ifndef SPRN_LPID
diff --git a/arch/powerpc/kernel/cpu_setup_power.S 
b/arch/powerpc/kernel/cpu_setup_power.S
index ec8a228..52ff3f0 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -99,6 +99,7 @@ _GLOBAL(__setup_cpu_power9)
mtspr   SPRN_LPID,r0
mfspr   r3,SPRN_LPCR
ori r3, r3, LPCR_PECEDH
+   ori r3, r3, LPCR_HVICE
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9
@@ -118,6 +119,7 @@ _GLOBAL(__restore_cpu_power9)
mtspr   SPRN_LPID,r0
mfspr   r3,SPRN_LPCR
ori r3, r3, LPCR_PECEDH
+   ori r3, r3, LPCR_HVICE
bl  __init_LPCR
bl  __init_HFSCR
bl  __init_tlb_power9
diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 4c94406..5726d84 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -351,6 +351,12 @@ hv_doorbell_trampoline:
EXCEPTION_PROLOG_0(PACA_EXGEN)
b   h_doorbell_hv
 
+   . = 0xea0
+hv_virt_irq_trampoline:
+   SET_SCRATCH0(r13)
+   EXCEPTION_PROLOG_0(PACA_EXGEN)
+   b   h_virt_irq_hv
+
/* We need to deal with the Altivec unavailable exception
 * here which is at 0xf20, thus in the middle of the
 * prolog code of the PerformanceMonitor one. A little
@@ -601,6 +607,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
MASKABLE_EXCEPTION_HV_OOL(0xe82, h_doorbell)
KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xe82)
 
+   MASKABLE_EXCEPTION_HV_OOL(0xea2, h_virt_irq)
+   KVM_HANDLER(PACA_EXGEN, EXC_HV, 0xea2)
+
/* moved from 0xf00 */
STD_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
KVM_HANDLER(PACA_EXGEN, EXC_STD, 0xf00)
@@ -680,6 +689,8 @@ _GLOBAL(__replay_interrupt)
 BEGIN_FTR_SECTION
cmpwi   r3,0xe80
beq h_doorbell_common
+   cmpwi   r3,0xea0
+   beq h_virt_irq_common
 FTR_SECTION_ELSE
cmpwi   r3,0xa00
beq doorbell_super_common
@@ -754,6 +765,7 @@ kvmppc_skip_Hinterrupt:
 #else
STD_EXCEPTION_COMMON_ASYNC(0xe80, h_doorbell, unknown_exception)
 #endif
+   STD_EXCEPTION_COMMON_ASYNC(0xea0, h_virt_irq, do_IRQ)
STD_EXCEPTION_COMMON_ASYNC(0xf00, performance_monitor, 
performance_monitor_exception)
STD_EXCEPTION_COMMON(0x1300, instruction_breakpoint, 
instruction_breakpoint_exception)
STD_EXCEPTION_COMMON(0x1502, denorm, unknown_exception)
@@ -877,6 +889,12 @@ h_doorbell_relon_trampoline:
EXCEPTION_PROLOG_0(PACA_EXGEN)
b   h_doorbell_relon_hv
 
+   . = 0x4ea0
+h_virt_irq_relon_trampoline:
+   SET_SCRATCH0(r13)
+   EXCEPTION_PROLOG_0(PACA_EXGEN)
+   b   h_virt_irq_relon_hv
+
. = 0x4f00
 performance_monitor_relon_pseries_trampoline:
SET_SCRATCH0(r13)
@@ -1137,6 +1155,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
/* Equivalents to the above handlers for relocation-on interrupt 
vectors */
STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
+   

[PATCH 01/14] powerpc/powernv: Add XICS emulation APIs

2016-07-08 Thread Benjamin Herrenschmidt
OPAL provides an emulated XICS interrupt controller to
use as a fallback on newer processors that don't have a
XICS. It's meant as a way to provide backward compatibility
with future processors. Add the corresponding interfaces.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/include/asm/opal-api.h| 6 +-
 arch/powerpc/include/asm/opal.h| 5 +
 arch/powerpc/platforms/powernv/opal-wrappers.S | 4 
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 72b5f27..170ba0c 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -162,7 +162,11 @@
 #define OPAL_PCI_GET_PRESENCE_STATE119
 #define OPAL_PCI_GET_POWER_STATE   120
 #define OPAL_PCI_SET_POWER_STATE   121
-#define OPAL_LAST  121
+#define OPAL_INT_GET_XIRR  122
+#define OPAL_INT_SET_CPPR   123
+#define OPAL_INT_EOI   124
+#define OPAL_INT_SET_MFRR  125
+#define OPAL_LAST  125
 
 /* Device tree flags */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 3b369e9..42f0c95 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -218,6 +218,11 @@ int64_t opal_pci_set_power_state(uint64_t async_token, 
uint64_t id,
 uint64_t data);
 int64_t opal_pci_poll2(uint64_t id, uint64_t data);
 
+int64_t opal_int_get_xirr(uint32_t *out_xirr, bool just_poll);
+int64_t opal_int_set_cppr(uint8_t cppr);
+int64_t opal_int_eoi(uint32_t xirr);
+int64_t opal_int_set_mfrr(uint32_t cpu, uint8_t mfrr);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 7979d6d..c7764f9 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -307,3 +307,7 @@ OPAL_CALL(opal_get_device_tree, 
OPAL_GET_DEVICE_TREE);
 OPAL_CALL(opal_pci_get_presence_state, OPAL_PCI_GET_PRESENCE_STATE);
 OPAL_CALL(opal_pci_get_power_state,OPAL_PCI_GET_POWER_STATE);
 OPAL_CALL(opal_pci_set_power_state,OPAL_PCI_SET_POWER_STATE);
+OPAL_CALL(opal_int_get_xirr,   OPAL_INT_GET_XIRR);
+OPAL_CALL(opal_int_set_cppr,   OPAL_INT_SET_CPPR);
+OPAL_CALL(opal_int_eoi,OPAL_INT_EOI);
+OPAL_CALL(opal_int_set_mfrr,   OPAL_INT_SET_MFRR);
-- 
2.7.4


[PATCH v8 11/11] powerpc/powernv: Use deepest stop state when cpu is offlined

2016-07-08 Thread Shreyas B. Prabhu
If hardware supports stop state, use the deepest stop state when
the cpu is offlined.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
- No changes since v1
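
The scan that picks the two stop-state limits can be sketched in standalone C
as below. This is only an illustration of the loop in the diff that follows,
not the kernel code itself: the DEMO_* constants, struct demo_stop_limits and
demo_scan_stop_states() are hypothetical names, the flag value is a
placeholder rather than the real OPAL_PM_LOSE_FULL_CONTEXT value, and the
4-bit Requested Level mask is assumed from the PSSCR description in patch
07/11.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in kernel headers. */
#define DEMO_MAX_STOP_STATE    0xFu /* assumed: largest 4-bit Requested Level */
#define DEMO_PSSCR_RL_MASK     0xFu /* assumed: PSSCR bits 60:63 */
#define DEMO_LOSE_FULL_CONTEXT 0x1u /* placeholder, not the OPAL flag value */

struct demo_stop_limits {
	uint64_t first_deep; /* shallowest level that loses full context */
	uint64_t deepest;    /* largest Requested Level seen in the table */
};

/* Walk the per-state flags and PSSCR values once and record both limits. */
static struct demo_stop_limits demo_scan_stop_states(const uint32_t *flags,
						      const uint64_t *psscr,
						      size_t n)
{
	struct demo_stop_limits lim = { DEMO_MAX_STOP_STATE, 0 };

	for (size_t i = 0; i < n; i++) {
		uint64_t rl = psscr[i] & DEMO_PSSCR_RL_MASK;

		if ((flags[i] & DEMO_LOSE_FULL_CONTEXT) && lim.first_deep > rl)
			lim.first_deep = rl;

		if (lim.deepest < rl)
			lim.deepest = rl;
	}
	return lim;
}
```

With this shape, an offlined cpu would be handed lim.deepest, while
lim.first_deep marks the boundary above which full state restore is needed.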

 arch/powerpc/platforms/powernv/idle.c| 15 +--
 arch/powerpc/platforms/powernv/powernv.h |  1 +
 arch/powerpc/platforms/powernv/smp.c |  4 +++-
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 8219e22..479c256 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -253,6 +253,11 @@ static void power9_idle(void)
 u64 pnv_first_deep_stop_state = MAX_STOP_STATE;
 
 /*
+ * Deepest stop idle state. Used when a cpu is offlined
+ */
+u64 pnv_deepest_stop_state;
+
+/*
  * Power ISA 3.0 idle initialization.
  *
  * POWER ISA 3.0 defines a new SPR Processor stop Status and Control
@@ -314,8 +319,11 @@ static int __init pnv_arch300_idle_init(struct device_node 
*np, u32 *flags,
}
 
/*
-* Set pnv_first_deep_stop_state to the first stop level
-* to cause hypervisor state loss
+* Set pnv_first_deep_stop_state and pnv_deepest_stop_state.
+* pnv_first_deep_stop_state should be set to the first stop
+* level to cause hypervisor state loss.
+* pnv_deepest_stop_state should be set to the deepest stop
+* state.
 */
pnv_first_deep_stop_state = MAX_STOP_STATE;
for (i = 0; i < dt_idle_states; i++) {
@@ -324,6 +332,9 @@ static int __init pnv_arch300_idle_init(struct device_node 
*np, u32 *flags,
if ((flags[i] & OPAL_PM_LOSE_FULL_CONTEXT) &&
 (pnv_first_deep_stop_state > psscr_rl))
pnv_first_deep_stop_state = psscr_rl;
+
+   if (pnv_deepest_stop_state < psscr_rl)
+   pnv_deepest_stop_state = psscr_rl;
}
 
 out:
diff --git a/arch/powerpc/platforms/powernv/powernv.h 
b/arch/powerpc/platforms/powernv/powernv.h
index 6dbc0a1..da7c843 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -18,6 +18,7 @@ static inline void pnv_pci_shutdown(void) { }
 #endif
 
 extern u32 pnv_get_supported_cpuidle_states(void);
+extern u64 pnv_deepest_stop_state;
 
 extern void pnv_lpc_init(void);
 
diff --git a/arch/powerpc/platforms/powernv/smp.c 
b/arch/powerpc/platforms/powernv/smp.c
index ad7b1a3..c789258 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -182,7 +182,9 @@ static void pnv_smp_cpu_kill_self(void)
 
ppc64_runlatch_off();
 
-   if (idle_states & OPAL_PM_WINKLE_ENABLED)
+   if (cpu_has_feature(CPU_FTR_ARCH_300))
+   srr1 = power9_idle_stop(pnv_deepest_stop_state);
+   else if (idle_states & OPAL_PM_WINKLE_ENABLED)
srr1 = power7_winkle();
else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
(idle_states & OPAL_PM_SLEEP_ENABLED_ER1))
-- 
2.4.11


[PATCH v8 10/11] cpuidle/powernv: Add support for POWER ISA v3 idle states

2016-07-08 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added.
 b) new per thread SPR named PSSCR is added which controls the behavior
of stop instruction.

The supported idle states and the PSSCR value needed to enter each of
them are exposed via ibm,cpu-idle-state-names and
ibm,cpu-idle-state-psscr respectively. To enter an idle state, the
platform-provided power9_idle_stop() needs to be invoked with the
appropriate PSSCR value.

This patch adds support for this new mechanism in cpuidle powernv driver.

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: Rob Herring 
Cc: Lorenzo Pieralisi 
Cc: linux...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
Note: Documentation for the device tree bindings is posted here-
http://patchwork.ozlabs.org/patch/629125/

Changes in v8
=
 - Fix a copy paste mistake while reading ibm,cpu-idle-state-names 

Changes in v7
=
 - Using stack instead of kzalloc/kcalloc

Changes in v5
=
 - Use generic cpuidle constant CPUIDLE_NAME_LEN
 - Fix return code handling for of_property_read_string_array
 - Use DT flags to determine if we are using stop instruction, instead of
   cpu_has_feature
 - Removed unnecessary cast with names
 - &stop_loop -> stop_loop
 - Added POWERNV_THRESHOLD_LATENCY_NS to filter out idle states with high 
latency
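
A rough standalone sketch of how the latency filter and the PSSCR table fit
together is given below. It is an illustration under assumptions, not the
driver code: DEMO_STATE_MAX, demo_psscr_table and demo_add_stop_states() are
hypothetical names, the threshold is passed in as a parameter instead of
using POWERNV_THRESHOLD_LATENCY_NS, and storing the PSSCR value at the
cpuidle state index is inferred from stop_loop() indexing stop_psscr_table
with that index.

```c
#include <stdint.h>

#define DEMO_STATE_MAX 10 /* hypothetical stand-in for CPUIDLE_STATE_MAX */

/* Indexed by cpuidle state index, like stop_psscr_table in the patch. */
static uint64_t demo_psscr_table[DEMO_STATE_MAX];

/*
 * Register device-tree idle states whose exit latency is within the
 * threshold. Index 0 is reserved for the statically defined snooze state.
 * Returns the total number of states, snooze included.
 */
static int demo_add_stop_states(const uint32_t *latency_ns,
				const uint64_t *psscr_val,
				int dt_states, uint32_t threshold_ns)
{
	int nr = 1; /* snooze occupies slot 0 */

	/* Cap the number of device-tree states, leaving room for snooze. */
	if (dt_states > DEMO_STATE_MAX - 1)
		dt_states = DEMO_STATE_MAX - 1;

	for (int i = 0; i < dt_states; i++) {
		/* Skip states that take too long to exit. */
		if (latency_ns[i] > threshold_ns)
			continue;

		demo_psscr_table[nr] = psscr_val[i];
		nr++;
	}
	return nr;
}
```

Because skipped states never consume a slot, the table stays dense and the
index handed to the enter callback can be used directly to look up the PSSCR
value.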

 drivers/cpuidle/cpuidle-powernv.c | 61 +++
 1 file changed, 61 insertions(+)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 600bbe1..f7ca891 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -20,6 +20,8 @@
 #include 
 #include 
 
+#define POWERNV_THRESHOLD_LATENCY_NS 20
+
 struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
.owner= THIS_MODULE,
@@ -27,6 +29,9 @@ struct cpuidle_driver powernv_idle_driver = {
 
 static int max_idle_state;
 static struct cpuidle_state *cpuidle_state_table;
+
+static u64 stop_psscr_table[CPUIDLE_STATE_MAX];
+
 static u64 snooze_timeout;
 static bool snooze_timeout_en;
 
@@ -91,6 +96,17 @@ static int fastsleep_loop(struct cpuidle_device *dev,
return index;
 }
 #endif
+
+static int stop_loop(struct cpuidle_device *dev,
+struct cpuidle_driver *drv,
+int index)
+{
+   ppc64_runlatch_off();
+   power9_idle_stop(stop_psscr_table[index]);
+   ppc64_runlatch_on();
+   return index;
+}
+
 /*
  * States for dedicated partition case.
  */
@@ -169,6 +185,8 @@ static int powernv_add_idle_states(void)
u32 latency_ns[CPUIDLE_STATE_MAX];
u32 residency_ns[CPUIDLE_STATE_MAX];
u32 flags[CPUIDLE_STATE_MAX];
+   u64 psscr_val[CPUIDLE_STATE_MAX];
+   const char *names[CPUIDLE_STATE_MAX];
int i, rc;
 
/* Currently we have snooze statically defined */
@@ -207,11 +225,34 @@ static int powernv_add_idle_states(void)
pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-latencies-ns in DT\n");
goto out;
}
+   if (of_property_read_string_array(power_mgt,
+   "ibm,cpu-idle-state-names", names, dt_idle_states) < 0) {
+   pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-names in 
DT\n");
+   goto out;
+   }
+
+   /*
+* If the idle states use stop instruction, probe for psscr values
+* which are necessary to specify required stop level.
+*/
+   if (flags[0] & (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP))
+   if (of_property_read_u64_array(power_mgt,
+   "ibm,cpu-idle-state-psscr", psscr_val, dt_idle_states)) {
+   pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-psscr in DT\n");
+   goto out;
+   }
 
rc = of_property_read_u32_array(power_mgt,
"ibm,cpu-idle-state-residency-ns", residency_ns, 
dt_idle_states);
 
for (i = 0; i < dt_idle_states; i++) {
+   /*
+* If an idle state has exit latency beyond
+* POWERNV_THRESHOLD_LATENCY_NS then don't use it
+* in cpu-idle.
+*/
+   if (latency_ns[i] > POWERNV_THRESHOLD_LATENCY_NS)
+   continue;
 
/*
 * Cpuidle accepts exit_latency and target_residency in us.
@@ -224,6 +265,16 @@ static int powernv_add_idle_states(void)
powernv_states[nr_idle_states].flags = 0;
powernv_states[nr_idle_states].target_residency = 100;

[PATCH v8 09/11] cpuidle/powernv: cleanup cpuidle-powernv.c

2016-07-08 Thread Shreyas B. Prabhu
 - Use stack instead of kzalloc'ed memory for variables while probing
   device tree for idle states.
 - Set cap for number of idle states that can be added to
   cpuidle_state_table
 - Minor change in way we check of_property_read_u32_array for error
   for sake of consistency
 - Drop unnecessary "&" while assigning function pointer

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: linux...@vger.kernel.org
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v8
=
 - &snooze_loop -> snooze_loop

Changes in v7
=
 - New in v7. This was mainly to make the existing code
   consistent with the review comments for new code

 drivers/cpuidle/cpuidle-powernv.c | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index 3a763a8..600bbe1 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -100,7 +100,7 @@ static struct cpuidle_state 
powernv_states[CPUIDLE_STATE_MAX] = {
.desc = "snooze",
.exit_latency = 0,
.target_residency = 0,
-   .enter = &snooze_loop },
+   .enter = snooze_loop },
 };
 
 static int powernv_cpuidle_add_cpu_notifier(struct notifier_block *n,
@@ -166,7 +166,9 @@ static int powernv_add_idle_states(void)
struct device_node *power_mgt;
int nr_idle_states = 1; /* Snooze */
int dt_idle_states;
-   u32 *latency_ns, *residency_ns, *flags;
+   u32 latency_ns[CPUIDLE_STATE_MAX];
+   u32 residency_ns[CPUIDLE_STATE_MAX];
+   u32 flags[CPUIDLE_STATE_MAX];
int i, rc;
 
/* Currently we have snooze statically defined */
@@ -184,22 +186,28 @@ static int powernv_add_idle_states(void)
goto out;
}
 
-   flags = kzalloc(sizeof(*flags) * dt_idle_states, GFP_KERNEL);
+   /*
+* Since snooze is used as first idle state, max idle states allowed is
+* CPUIDLE_STATE_MAX -1
+*/
+   if (dt_idle_states > CPUIDLE_STATE_MAX - 1) {
+   pr_warn("cpuidle-powernv: discovered idle states more than 
allowed");
+   dt_idle_states = CPUIDLE_STATE_MAX - 1;
+   }
+
if (of_property_read_u32_array(power_mgt,
"ibm,cpu-idle-state-flags", flags, dt_idle_states)) {
pr_warn("cpuidle-powernv : missing ibm,cpu-idle-state-flags in 
DT\n");
-   goto out_free_flags;
+   goto out;
}
 
-   latency_ns = kzalloc(sizeof(*latency_ns) * dt_idle_states, GFP_KERNEL);
-   rc = of_property_read_u32_array(power_mgt,
-   "ibm,cpu-idle-state-latencies-ns", latency_ns, dt_idle_states);
-   if (rc) {
+   if (of_property_read_u32_array(power_mgt,
+   "ibm,cpu-idle-state-latencies-ns", latency_ns,
+   dt_idle_states)) {
pr_warn("cpuidle-powernv: missing 
ibm,cpu-idle-state-latencies-ns in DT\n");
-   goto out_free_latency;
+   goto out;
}
 
-   residency_ns = kzalloc(sizeof(*residency_ns) * dt_idle_states, 
GFP_KERNEL);
rc = of_property_read_u32_array(power_mgt,
"ibm,cpu-idle-state-residency-ns", residency_ns, 
dt_idle_states);
 
@@ -215,7 +223,7 @@ static int powernv_add_idle_states(void)
strcpy(powernv_states[nr_idle_states].desc, "Nap");
powernv_states[nr_idle_states].flags = 0;
powernv_states[nr_idle_states].target_residency = 100;
-   powernv_states[nr_idle_states].enter = &nap_loop;
+   powernv_states[nr_idle_states].enter = nap_loop;
}
 
/*
@@ -230,7 +238,7 @@ static int powernv_add_idle_states(void)
strcpy(powernv_states[nr_idle_states].desc, 
"FastSleep");
powernv_states[nr_idle_states].flags = 
CPUIDLE_FLAG_TIMER_STOP;
powernv_states[nr_idle_states].target_residency = 
30;
-   powernv_states[nr_idle_states].enter = &fastsleep_loop;
+   powernv_states[nr_idle_states].enter = fastsleep_loop;
}
 #endif
powernv_states[nr_idle_states].exit_latency =
@@ -243,12 +251,6 @@ static int powernv_add_idle_states(void)
 
nr_idle_states++;
}
-
-   kfree(residency_ns);
-out_free_latency:
-   kfree(latency_ns);
-out_free_flags:
-   kfree(flags);
 out:
return nr_idle_states;
 }
-- 
2.4.11


[PATCH v8 08/11] cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES

2016-07-08 Thread Shreyas B. Prabhu
Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific
MAX_POWERNV_IDLE_STATES.

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: linux...@vger.kernel.org
Acked-by: Daniel Lezcano 
Signed-off-by: Shreyas B. Prabhu 
---
 - No changes after v5

Changes in v5
=
 - New in v5

 drivers/cpuidle/cpuidle-powernv.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/cpuidle/cpuidle-powernv.c 
b/drivers/cpuidle/cpuidle-powernv.c
index e12dc30..3a763a8 100644
--- a/drivers/cpuidle/cpuidle-powernv.c
+++ b/drivers/cpuidle/cpuidle-powernv.c
@@ -20,8 +20,6 @@
 #include 
 #include 
 
-#define MAX_POWERNV_IDLE_STATES 8
-
 struct cpuidle_driver powernv_idle_driver = {
.name = "powernv_idle",
.owner= THIS_MODULE,
@@ -96,7 +94,7 @@ static int fastsleep_loop(struct cpuidle_device *dev,
 /*
  * States for dedicated partition case.
  */
-static struct cpuidle_state powernv_states[MAX_POWERNV_IDLE_STATES] = {
+static struct cpuidle_state powernv_states[CPUIDLE_STATE_MAX] = {
{ /* Snooze */
.name = "snooze",
.desc = "snooze",
-- 
2.4.11


[PATCH v8 07/11] powerpc/powernv: Add platform support for stop instruction

2016-07-08 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) new instruction named stop is added. This instruction replaces
instructions like nap, sleep, rvwinkle.
 b) new per thread SPR named Processor Stop Status and Control Register
(PSSCR) is added which controls the behavior of stop instruction.

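PSSCR layout:
----------------------------------------------------------
| PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL |
----------------------------------------------------------
0     4     41   42    43   44     48    54   56    60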

PSSCR key fields:
Bits 0:3  - Power-Saving Level Status. This field indicates the lowest
power-saving state the thread entered since stop instruction was last
executed.

Bit 42 - Enable State Loss
0 - No state is lost irrespective of other fields
1 - Allows state loss

Bits 44:47 - Power-Saving Level Limit
This limits the power-saving level that can be entered into.

Bits 60:63 - Requested Level
Used to specify which power-saving level must be entered on executing
stop instruction

This patch adds support for stop instruction and PSSCR handling.
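
The key fields listed above can be pulled out of a 64-bit PSSCR value with a
few shifts and masks. The helpers below are a minimal sketch, assuming IBM
bit numbering (bit 0 is the most significant bit, so bits 60:63 are the low
nibble); the demo_psscr_* names are hypothetical and are not the macros this
patch adds to reg.h.

```c
#include <stdint.h>

/* Bits 0:3 - Power-Saving Level Status (top nibble of the register). */
static inline unsigned int demo_psscr_pls(uint64_t psscr)
{
	return (unsigned int)((psscr >> 60) & 0xF);
}

/* Bit 42 - Enable State Loss; IBM bit 42 is LSB-numbered bit 63 - 42 = 21. */
static inline unsigned int demo_psscr_esl(uint64_t psscr)
{
	return (unsigned int)((psscr >> (63 - 42)) & 0x1);
}

/* Bits 44:47 - Power-Saving Level Limit; lowest bit at 63 - 47 = 16. */
static inline unsigned int demo_psscr_psll(uint64_t psscr)
{
	return (unsigned int)((psscr >> (63 - 47)) & 0xF);
}

/* Bits 60:63 - Requested Level (low nibble of the register). */
static inline unsigned int demo_psscr_rl(uint64_t psscr)
{
	return (unsigned int)(psscr & 0xF);
}
```

For example, a value with only the low nibble set to 0x3 decodes as Requested
Level 3 with state loss disabled.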

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
Changes in v8
=
 - Initializing pnv_first_deep_stop_state
 - Changing MMU_FTR_SECTION condition to reduce FTR section length

Changes in v7
=
 - LMRR, LMSER and ADSR not restored since it's not necessary
 - power_stop0, power_stop renamed to power9_idle and power9_idle_stop
 - PSSCR template is now a macro instead of storing in paca
 - power9_idle in C file instead of assembly
 - Fixed TOC related bug
 - Handling subcore within FTR section
 - Functions in idle.c reordered and broken into multiple functions
 - calling __restore_cpu_power8/9 via cur_cpu_spec->cpu_restore 
 - Restoring RPR once per core in P9

Changes in v6
=
 - Save/restore new P9 SPRs when using deep idle states

Changes in v4:
==
 - Added PSSCR layout to commit message
 - Improved / Fixed comments
 - Fixed whitespace error in paca.h
 - Using MAX_POSSIBLE_STOP_STATE macro instead of hardcoding 0xF as 
   max possible stop state

Changes in v3:
==
 - Instead of introducing new file idle_power_stop.S, P9 idle support
   is added to idle_power_common.S using CPU_FTR sections.
 - Fixed r4 reg clobbering in power_stop0
 - Improved comments

Changes in v2:
==
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

 arch/powerpc/include/asm/cpuidle.h|   2 +
 arch/powerpc/include/asm/kvm_book3s_asm.h |   2 +-
 arch/powerpc/include/asm/opal-api.h   |  11 +-
 arch/powerpc/include/asm/ppc-opcode.h |   4 +
 arch/powerpc/include/asm/processor.h  |   2 +
 arch/powerpc/include/asm/reg.h|  10 ++
 arch/powerpc/kernel/idle_book3s.S | 193 --
 arch/powerpc/platforms/powernv/idle.c | 174 ++-
 8 files changed, 332 insertions(+), 66 deletions(-)

diff --git a/arch/powerpc/include/asm/cpuidle.h 
b/arch/powerpc/include/asm/cpuidle.h
index d2f99ca..3d7fc06 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -13,6 +13,8 @@
 #ifndef __ASSEMBLY__
 extern u32 pnv_fastsleep_workaround_at_entry[];
 extern u32 pnv_fastsleep_workaround_at_exit[];
+
+extern u64 pnv_first_deep_stop_state;
 #endif
 
 #endif
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h 
b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 72b6225..d318d43 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -162,7 +162,7 @@ struct kvmppc_book3s_shadow_vcpu {
 
 /* Values for kvm_state */
 #define KVM_HWTHREAD_IN_KERNEL 0
-#define KVM_HWTHREAD_IN_NAP    1
+#define KVM_HWTHREAD_IN_IDLE   1
 #define KVM_HWTHREAD_IN_KVM    2
 
 #endif /* __ASM_KVM_BOOK3S_ASM_H__ */
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 72b5f27..6de1e4e 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -166,13 +166,20 @@
 
 /* Device tree flags */
 
-/* Flags set in power-mgmt nodes in device tree if
- * respective idle states are supported in the platform.
+/*
+ * Flags set in power-mgmt nodes in device tree describing
+ * idle states that are supported in the platform.
  */
+
+#define OPAL_PM_TIMEBASE_STOP  0x0002
+#define OPAL_PM_LOSE_HYP_CONTEXT   0x2000
+#define OPAL_PM_LOSE_FULL_CONTEXT  0x4000
 #define OPAL_PM_NAP_ENABLED    0x0001
 #define OPAL_PM_SLEEP_ENABLED  0x0002
 #define OPAL_PM_WINKLE_ENABLED 0x0004
 #define OPAL_PM_SLEEP_ENABLED_ER1  0x0008 /* with workaround */
+#define OPAL_PM_STOP_INST_FAST 0x0010
+#define OPAL_PM_STOP_INST_DEEP 0x0020
 
 /*
  * 

[PATCH v8 06/11] powerpc/powernv: abstraction for saving SPRs before entering deep idle states

2016-07-08 Thread Shreyas B. Prabhu
Create a function for saving SPRs before entering deep idle states.
This function can be reused for POWER9 deep idle states.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
 - No changes since v3

Changes in v3:
=
 - Newly added in v3

 arch/powerpc/kernel/idle_book3s.S | 54 +++
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index a8397e3..2f909a1 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -53,6 +53,36 @@
.text
 
 /*
+ * Used by threads before entering deep idle states. Saves SPRs
+ * in interrupt stack frame
+ */
+save_sprs_to_stack:
+   /*
+* Note all register i.e per-core, per-subcore or per-thread is saved
+* here since any thread in the core might wake up first
+*/
+   mfspr   r3,SPRN_SDR1
+   std r3,_SDR1(r1)
+   mfspr   r3,SPRN_RPR
+   std r3,_RPR(r1)
+   mfspr   r3,SPRN_SPURR
+   std r3,_SPURR(r1)
+   mfspr   r3,SPRN_PURR
+   std r3,_PURR(r1)
+   mfspr   r3,SPRN_TSCR
+   std r3,_TSCR(r1)
+   mfspr   r3,SPRN_DSCR
+   std r3,_DSCR(r1)
+   mfspr   r3,SPRN_AMOR
+   std r3,_AMOR(r1)
+   mfspr   r3,SPRN_WORT
+   std r3,_WORT(r1)
+   mfspr   r3,SPRN_WORC
+   std r3,_WORC(r1)
+
+   blr
+
+/*
  * Used by threads when the lock bit of core_idle_state is set.
  * Threads will spin in HMT_LOW until the lock bit is cleared.
  * r14 - pointer to core_idle_state
@@ -209,28 +239,8 @@ fastsleep_workaround_at_entry:
b   common_enter
 
 enter_winkle:
-   /*
-* Note all register i.e per-core, per-subcore or per-thread is saved
-* here since any thread in the core might wake up first
-*/
-   mfspr   r3,SPRN_SDR1
-   std r3,_SDR1(r1)
-   mfspr   r3,SPRN_RPR
-   std r3,_RPR(r1)
-   mfspr   r3,SPRN_SPURR
-   std r3,_SPURR(r1)
-   mfspr   r3,SPRN_PURR
-   std r3,_PURR(r1)
-   mfspr   r3,SPRN_TSCR
-   std r3,_TSCR(r1)
-   mfspr   r3,SPRN_DSCR
-   std r3,_DSCR(r1)
-   mfspr   r3,SPRN_AMOR
-   std r3,_AMOR(r1)
-   mfspr   r3,SPRN_WORT
-   std r3,_WORT(r1)
-   mfspr   r3,SPRN_WORC
-   std r3,_WORC(r1)
+   bl  save_sprs_to_stack
+
IDLE_STATE_ENTER_SEQ(PPC_WINKLE)
 
 _GLOBAL(power7_idle)
-- 
2.4.11


[PATCH v8 05/11] powerpc/powernv: Make pnv_powersave_common more generic

2016-07-08 Thread Shreyas B. Prabhu
pnv_powersave_common does common steps needed before entering idle
state and eventually changes MSR to MSR_IDLE and does rfid to
pnv_enter_arch207_idle_mode.

Move the update of HSTATE_HWTHREAD_STATE to pnv_powersave_common
from pnv_enter_arch207_idle_mode and make it more generic by passing the
rfid address as a function parameter.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
 - No changes since v4

Changes in v4:
==
 - Moved renaming of power7_powersave_common to earlier patch

Changes in v3:
==
 - Moved HSTATE_HWTHREAD_STATE update to power_powersave_common

 arch/powerpc/kernel/idle_book3s.S | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index 34dbfc9..a8397e3 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -75,6 +75,8 @@ core_idle_lock_held:
  * To check IRQ_HAPPENED in r4
  * 0 - don't check
  * 1 - check
+ *
+ * Address to 'rfid' to in r5
  */
 _GLOBAL(pnv_powersave_common)
/* Use r3 to pass state nap/sleep/winkle */
@@ -127,28 +129,28 @@ _GLOBAL(pnv_powersave_common)
std r9,_MSR(r1)
std r1,PACAR1(r13)
 
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   /* Tell KVM we're entering idle */
+   li  r4,KVM_HWTHREAD_IN_NAP
+   stb r4,HSTATE_HWTHREAD_STATE(r13)
+#endif
+
/*
 * Go to real mode to do the nap, as required by the architecture.
 * Also, we need to be in real mode before setting hwthread_state,
 * because as soon as we do that, another thread can switch
 * the MMU context to the guest.
 */
-   LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
+   LOAD_REG_IMMEDIATE(r7, MSR_IDLE)
li  r6, MSR_RI
andc    r6, r9, r6
-   LOAD_REG_ADDR(r7, pnv_enter_arch207_idle_mode)
mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
-   mtspr   SPRN_SRR0, r7
-   mtspr   SPRN_SRR1, r5
+   mtspr   SPRN_SRR0, r5
+   mtspr   SPRN_SRR1, r7
rfid
 
.globl pnv_enter_arch207_idle_mode
 pnv_enter_arch207_idle_mode:
-#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
-   /* Tell KVM we're napping */
-   li  r4,KVM_HWTHREAD_IN_NAP
-   stb r4,HSTATE_HWTHREAD_STATE(r13)
-#endif
stb r3,PACA_THREAD_IDLE_STATE(r13)
cmpwi   cr3,r3,PNV_THREAD_SLEEP
bge cr3,2f
@@ -243,18 +245,21 @@ _GLOBAL(power7_idle)
 _GLOBAL(power7_nap)
mr  r4,r3
li  r3,PNV_THREAD_NAP
+   LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
b   pnv_powersave_common
/* No return */
 
 _GLOBAL(power7_sleep)
li  r3,PNV_THREAD_SLEEP
li  r4,1
+   LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
b   pnv_powersave_common
/* No return */
 
 _GLOBAL(power7_winkle)
li  r3,PNV_THREAD_WINKLE
li  r4,1
+   LOAD_REG_ADDR(r5, pnv_enter_arch207_idle_mode)
b   pnv_powersave_common
/* No return */
 
-- 
2.4.11


[PATCH v8 04/11] powerpc/powernv: Rename reusable idle functions to hardware agnostic names

2016-07-08 Thread Shreyas B. Prabhu
Functions like power7_wakeup_loss, power7_wakeup_noloss,
power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can
also be used by POWER9. Hence rename these functions to
hardware-agnostic names.

Suggested-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
 - No changes since v4

Changes in v4:
==
 - renaming power7_powersave_common to pnv_powersave_common
 - renaming power7_enter_nap_mode to pnv_enter_arch207_idle_mode

 arch/powerpc/kernel/exceptions-64s.S|  8 
 arch/powerpc/kernel/idle_book3s.S   | 33 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  4 ++--
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S 
b/arch/powerpc/kernel/exceptions-64s.S
index 4a74d6a..2a123cd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -108,7 +108,7 @@ BEGIN_FTR_SECTION
 
cmpwi   cr3,r13,2
GET_PACA(r13)
-   bl  power7_restore_hyp_resource
+   bl  pnv_restore_hyp_resource
 
li  r0,PNV_THREAD_RUNNING
stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
@@ -128,8 +128,8 @@ BEGIN_FTR_SECTION
/* Return SRR1 from power7_nap() */
mfspr   r3,SPRN_SRR1
blt cr3,2f
-   b   power7_wakeup_loss
-2: b   power7_wakeup_noloss
+   b   pnv_wakeup_loss
+2: b   pnv_wakeup_noloss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
@@ -1269,7 +1269,7 @@ machine_check_handle_early:
GET_PACA(r13)
ld  r1,PACAR1(r13)
li  r3,PNV_THREAD_NAP
-   b   power7_enter_nap_mode
+   b   pnv_enter_arch207_idle_mode
 4:
 #endif
/*
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
index d5def06..34dbfc9 100644
--- a/arch/powerpc/kernel/idle_book3s.S
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -1,5 +1,6 @@
 /*
- *  This file contains the power_save function for Power7 CPUs.
+ *  This file contains idle entry/exit functions for POWER7 and
+ *  POWER8 CPUs.
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -75,7 +76,7 @@ core_idle_lock_held:
  * 0 - don't check
  * 1 - check
  */
-_GLOBAL(power7_powersave_common)
+_GLOBAL(pnv_powersave_common)
/* Use r3 to pass state nap/sleep/winkle */
/* NAP is a state loss, we create a regs frame on the
 * stack, fill it up with the state we care about and
@@ -135,14 +136,14 @@ _GLOBAL(power7_powersave_common)
LOAD_REG_IMMEDIATE(r5, MSR_IDLE)
li  r6, MSR_RI
andc    r6, r9, r6
-   LOAD_REG_ADDR(r7, power7_enter_nap_mode)
+   LOAD_REG_ADDR(r7, pnv_enter_arch207_idle_mode)
mtmsrd  r6, 1   /* clear RI before setting SRR0/1 */
mtspr   SPRN_SRR0, r7
mtspr   SPRN_SRR1, r5
rfid
 
-   .globl  power7_enter_nap_mode
-power7_enter_nap_mode:
+   .globl pnv_enter_arch207_idle_mode
+pnv_enter_arch207_idle_mode:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
/* Tell KVM we're napping */
li  r4,KVM_HWTHREAD_IN_NAP
@@ -242,19 +243,19 @@ _GLOBAL(power7_idle)
 _GLOBAL(power7_nap)
mr  r4,r3
li  r3,PNV_THREAD_NAP
-   b   power7_powersave_common
+   b   pnv_powersave_common
/* No return */
 
 _GLOBAL(power7_sleep)
li  r3,PNV_THREAD_SLEEP
li  r4,1
-   b   power7_powersave_common
+   b   pnv_powersave_common
/* No return */
 
 _GLOBAL(power7_winkle)
li  r3,PNV_THREAD_WINKLE
li  r4,1
-   b   power7_powersave_common
+   b   pnv_powersave_common
/* No return */
 
 #define CHECK_HMI_INTERRUPT \
@@ -284,7 +285,7 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);
\
  * r13 - Contents of HSPRG0
  * cr3 - set to gt if waking up with partial/complete hypervisor state loss
  */
-_GLOBAL(power7_restore_hyp_resource)
+_GLOBAL(pnv_restore_hyp_resource)
/*
 * Check if last bit of HSPGR0 is set. This indicates whether we are
 * waking up from winkle.
@@ -296,7 +297,7 @@ _GLOBAL(power7_restore_hyp_resource)
 
lbz r0,PACA_THREAD_IDLE_STATE(r13)
cmpwi   cr2,r0,PNV_THREAD_NAP
-   bgt cr2,power7_wakeup_tb_loss   /* Either sleep or Winkle */
+   bgt cr2,pnv_wakeup_tb_loss  /* Either sleep or Winkle */
 
/*
 * We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
@@ -306,10 +307,10 @@ _GLOBAL(power7_restore_hyp_resource)
bgt cr3,.
 
blr /* Return back to System Reset vector from where
-  power7_restore_hyp_resource was invoked */
+  pnv_restore_hyp_resource was 

[PATCH v8 03/11] powerpc/powernv: Rename idle_power7.S to idle_book3s.S

2016-07-08 Thread Shreyas B. Prabhu
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next
patch for POWER9. Rename the file to a non-hardware specific
name.

Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---

Changes in v7:
=
 - File renamed to idle_book3s.S instead of idle_power_common.S

Changes in v3:
==
 - Instead of moving few common functions from idle_power7.S to
   idle_power_common.S, renaming idle_power7.S to idle_power_common.S

 arch/powerpc/kernel/Makefile  |   2 +-
 arch/powerpc/kernel/idle_book3s.S | 527 ++
 arch/powerpc/kernel/idle_power7.S | 527 --
 3 files changed, 528 insertions(+), 528 deletions(-)
 create mode 100644 arch/powerpc/kernel/idle_book3s.S
 delete mode 100644 arch/powerpc/kernel/idle_power7.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..9e7bfc32 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_PPC_BOOK3E_64)   += exceptions-64e.o 
idle_book3e.o
 obj-$(CONFIG_PPC64)        += vdso64/
 obj-$(CONFIG_ALTIVEC)  += vecemu.o
 obj-$(CONFIG_PPC_970_NAP)  += idle_power4.o
-obj-$(CONFIG_PPC_P7_NAP)   += idle_power7.o
+obj-$(CONFIG_PPC_P7_NAP)   += idle_book3s.o
 procfs-y   := proc_powerpc.o
 obj-$(CONFIG_PROC_FS)  += $(procfs-y)
 rtaspci-$(CONFIG_PPC64)-$(CONFIG_PCI)  := rtas_pci.o
diff --git a/arch/powerpc/kernel/idle_book3s.S 
b/arch/powerpc/kernel/idle_book3s.S
new file mode 100644
index 000..d5def06
--- /dev/null
+++ b/arch/powerpc/kernel/idle_book3s.S
@@ -0,0 +1,527 @@
+/*
+ *  This file contains the power_save function for Power7 CPUs.
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#undef DEBUG
+
+/*
+ * Use unused space in the interrupt stack to save and restore
+ * registers for winkle support.
+ */
+#define _SDR1  GPR3
+#define _RPR   GPR4
+#define _SPURR GPR5
+#define _PURR  GPR6
+#define _TSCR  GPR7
+#define _DSCR  GPR8
+#define _AMOR  GPR9
+#define _WORT  GPR10
+#define _WORC  GPR11
+
+/* Idle state entry routines */
+
+#define IDLE_STATE_ENTER_SEQ(IDLE_INST) \
+   /* Magic NAP/SLEEP/WINKLE mode enter sequence */\
+   std r0,0(r1);   \
+   ptesync;\
+   ld  r0,0(r1);   \
+1: cmp cr0,r0,r0;  \
+   bne 1b; \
+   IDLE_INST;  \
+   b   .
+
+   .text
+
+/*
+ * Used by threads when the lock bit of core_idle_state is set.
+ * Threads will spin in HMT_LOW until the lock bit is cleared.
+ * r14 - pointer to core_idle_state
+ * r15 - used to load contents of core_idle_state
+ */
+
+core_idle_lock_held:
+   HMT_LOW
+3: lwz r15,0(r14)
+   andi.   r15,r15,PNV_CORE_IDLE_LOCK_BIT
+   bne 3b
+   HMT_MEDIUM
+   lwarx   r15,0,r14
+   blr
+
+/*
+ * Pass requested state in r3:
+ * r3 - PNV_THREAD_NAP/SLEEP/WINKLE
+ *
+ * To check IRQ_HAPPENED in r4
+ * 0 - don't check
+ * 1 - check
+ */
+_GLOBAL(power7_powersave_common)
+   /* Use r3 to pass state nap/sleep/winkle */
+   /* NAP is a state loss, we create a regs frame on the
+* stack, fill it up with the state we care about and
+* stick a pointer to it in PACAR1. We really only
+* need to save PC, some CR bits and the NV GPRs,
+* but for now an interrupt frame will do.
+*/
+   mflr    r0
+   std r0,16(r1)
+   stdu    r1,-INT_FRAME_SIZE(r1)
+   std r0,_LINK(r1)
+   std r0,_NIP(r1)
+
+   /* Hard disable interrupts */
+   mfmsr   r9
+   rldicl  r9,r9,48,1
+   rotldi  r9,r9,16
+   mtmsrd  r9,1/* hard-disable interrupts */
+
+   /* Check if something happened while soft-disabled */
+   lbz r0,PACAIRQHAPPENED(r13)
+   andi.   r0,r0,~PACA_IRQ_HARD_DIS@l
+   beq 1f
+   cmpwi   cr0,r4,0
+   beq 1f
+   addi    r1,r1,INT_FRAME_SIZE
+   ld  r0,16(r1)
+   li  r3,0/* Return 0 (no nap) */
+   mtlr    r0
+   blr
+
+1: /* We mark irqs hard disabled as this is the state we'll
+* be in when returning and we need to tell arch_local_irq_restore()
+* about it
+*/
+   li  

[PATCH v8 02/11] powerpc/kvm: make hypervisor state restore a function

2016-07-08 Thread Shreyas B. Prabhu
In the current code, when the thread wakes up in reset vector, some
of the state restore code and check for whether a thread needs to
branch to kvm is duplicated. Reorder the code such that this
duplication is avoided.

At a higher level this is what the change looks like-

Before this patch -
power7_wakeup_tb_loss:
	restore hypervisor state
	if (thread needed by kvm)
		goto kvm_start_guest
	restore nvgprs, cr, pc
	rfid to process context

power7_wakeup_loss:
	restore nvgprs, cr, pc
	rfid to process context

reset vector:
	if (waking from deep idle states)
		goto power7_wakeup_tb_loss
	else
		if (thread needed by kvm)
			goto kvm_start_guest
		goto power7_wakeup_loss

After this patch -
power7_wakeup_tb_loss:
	restore hypervisor state
	return

power7_restore_hyp_resource():
	if (waking from deep idle states)
		goto power7_wakeup_tb_loss
	return

power7_wakeup_loss:
	restore nvgprs, cr, pc
	rfid to process context

reset vector:
	power7_restore_hyp_resource()
	if (thread needed by kvm)
		goto kvm_start_guest
	goto power7_wakeup_loss

Reviewed-by: Paul Mackerras 
Reviewed-by: Gautham R. Shenoy 
Signed-off-by: Shreyas B. Prabhu 
---
- No changes since v3

Changes in v3:
==============
- Retaining GET_PACA(r13) in System Reset vector instead of moving it
  to power7_restore_hyp_resource
- Added comments indicating entry conditions for power7_restore_hyp_resource
- Improved comments around return statements

 arch/powerpc/kernel/exceptions-64s.S | 28 ++
 arch/powerpc/kernel/idle_power7.S    | 72 +---
 2 files changed, 46 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 4c94406..4a74d6a 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -107,25 +107,9 @@ BEGIN_FTR_SECTION
beq 9f
 
cmpwi   cr3,r13,2
-
-   /*
-* Check if last bit of HSPGR0 is set. This indicates whether we are
-* waking up from winkle.
-*/
GET_PACA(r13)
-   clrldi  r5,r13,63
-   clrrdi  r13,r13,1
-   cmpwi   cr4,r5,1
-   mtspr   SPRN_HSPRG0,r13
+   bl  power7_restore_hyp_resource
 
-   lbz r0,PACA_THREAD_IDLE_STATE(r13)
-   cmpwi   cr2,r0,PNV_THREAD_NAP
-   bgt cr2,8f  /* Either sleep or Winkle */
-
-   /* Waking up from nap should not cause hypervisor state loss */
-   bgt cr3,.
-
-   /* Waking up from nap */
li  r0,PNV_THREAD_RUNNING
stb r0,PACA_THREAD_IDLE_STATE(r13)  /* Clear thread state */
 
@@ -143,13 +127,9 @@ BEGIN_FTR_SECTION
 
/* Return SRR1 from power7_nap() */
mfspr   r3,SPRN_SRR1
-   beq cr3,2f
-   b   power7_wakeup_noloss
-2: b   power7_wakeup_loss
-
-   /* Fast Sleep wakeup on PowerNV */
-8: GET_PACA(r13)
-   b   power7_wakeup_tb_loss
+   blt cr3,2f
+   b   power7_wakeup_loss
+2: b   power7_wakeup_noloss
 
 9:
 END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 705c867..d5def06 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -276,6 +276,39 @@ ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66);   \
 20:    nop;
 
 
+/*
+ * Called from reset vector. Check whether we have woken up with
+ * hypervisor state loss. If yes, restore hypervisor state and return
+ * back to reset vector.
+ *
+ * r13 - Contents of HSPRG0
+ * cr3 - set to gt if waking up with partial/complete hypervisor state loss
+ */
+_GLOBAL(power7_restore_hyp_resource)
+   /*
+* Check if last bit of HSPRG0 is set. This indicates whether we are
+* waking up from winkle.
+*/
+   clrldi  r5,r13,63
+   clrrdi  r13,r13,1
+   cmpwi   cr4,r5,1
+   mtspr   SPRN_HSPRG0,r13
+
+   lbz r0,PACA_THREAD_IDLE_STATE(r13)
+   cmpwi   cr2,r0,PNV_THREAD_NAP
+   bgt cr2,power7_wakeup_tb_loss   /* Either sleep or Winkle */
+
+   /*
+* We fall through here if PACA_THREAD_IDLE_STATE shows we are waking
+* up from nap. At this stage CR3 shouldn't contain 'gt' since that
+* indicates we are waking with hypervisor state loss from nap.
+*/
+   bgt cr3,.
+
+   blr /* Return back to System Reset vector from where
+  power7_restore_hyp_resource was invoked */
+
+
 _GLOBAL(power7_wakeup_tb_loss)
ld  r2,PACATOC(r13);
ld  r1,PACAR1(r13)
@@ -284,11 +317,13 @@ _GLOBAL(power7_wakeup_tb_loss)
 * and 

[PATCH v8 01/11] powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkle

2016-07-08 Thread Shreyas B. Prabhu
Signed-off-by: Shreyas B. Prabhu 
---
- No changes since v4

Changes in v4
=============
- New in v4

 arch/powerpc/kernel/idle_power7.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index 470ceeb..705c867 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -252,7 +252,7 @@ _GLOBAL(power7_sleep)
/* No return */
 
 _GLOBAL(power7_winkle)
-   li  r3,3
+   li  r3,PNV_THREAD_WINKLE
li  r4,1
b   power7_powersave_common
/* No return */
-- 
2.4.11


[PATCH v8 00/11] powerpc/powernv/cpuidle: Add support for POWER ISA v3 idle states

2016-07-08 Thread Shreyas B. Prabhu
POWER ISA v3 defines a new idle processor core mechanism. In summary,
 a) A new instruction named stop is added. This instruction replaces
    instructions like nap, sleep and rvwinkle.
 b) A new per-thread SPR named PSSCR is added, which controls the
    behavior of the stop instruction.

PSSCR has the following key fields:

Bits 0:3 - Power-Saving Level Status
    Indicates the lowest power-saving state the thread entered since
    the stop instruction was last executed.

Bit 42 - Enable State Loss
    0 - No state is lost irrespective of other fields
    1 - Allows state loss

Bits 44:47 - Power-Saving Level Limit
    Limits the power-saving level that can be entered.

Bits 60:63 - Requested Level
    Specifies which power-saving level must be entered on executing
    the stop instruction.
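
As a rough illustration of the field layout above, the following C sketch
decodes and builds a PSSCR value. It assumes IBM bit numbering, where
bit 0 is the most significant bit of the 64-bit register, so a field
ending at bit N sits at shift (63 - N). The struct, macro and function
names here are hypothetical and are not taken from this series.

```c
#include <assert.h>
#include <stdint.h>

/* IBM numbering: bit 0 is the MSB, so a field ending at bit N shifts by 63-N. */
#define PSSCR_SHIFT(last_bit)   (63 - (last_bit))

#define PSSCR_PLS_SHIFT     PSSCR_SHIFT(3)   /* Bits 0:3   Level Status   */
#define PSSCR_ESL_SHIFT     PSSCR_SHIFT(42)  /* Bit 42     Enable State Loss */
#define PSSCR_PSLL_SHIFT    PSSCR_SHIFT(47)  /* Bits 44:47 Level Limit    */
#define PSSCR_RL_SHIFT      PSSCR_SHIFT(63)  /* Bits 60:63 Requested Level */

/* Hypothetical container for the decoded fields. */
struct psscr_fields {
    unsigned int pls;   /* Power-Saving Level Status */
    unsigned int esl;   /* Enable State Loss */
    unsigned int psll;  /* Power-Saving Level Limit */
    unsigned int rl;    /* Requested Level */
};

/* Split a raw PSSCR value into the four fields described above. */
struct psscr_fields psscr_decode(uint64_t psscr)
{
    struct psscr_fields f;

    f.pls  = (unsigned int)((psscr >> PSSCR_PLS_SHIFT) & 0xF);
    f.esl  = (unsigned int)((psscr >> PSSCR_ESL_SHIFT) & 0x1);
    f.psll = (unsigned int)((psscr >> PSSCR_PSLL_SHIFT) & 0xF);
    f.rl   = (unsigned int)((psscr >> PSSCR_RL_SHIFT) & 0xF);
    return f;
}

/* Build a PSSCR request value; the status field (PLS) is left as zero. */
uint64_t psscr_encode_request(unsigned int rl, unsigned int psll, int esl)
{
    assert(rl <= 0xF && psll <= 0xF);
    return ((uint64_t)(esl ? 1 : 0) << PSSCR_ESL_SHIFT) |
           ((uint64_t)psll << PSSCR_PSLL_SHIFT) |
           ((uint64_t)rl << PSSCR_RL_SHIFT);
}
```

With this layout, Enable State Loss alone corresponds to the mask
0x200000, the Level Limit field to 0xF0000 and the Requested Level field
to 0xF.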

Stop idle states and their properties, such as name, latency, target
residency and psscr value, are exposed via the device tree.

This patch series adds support for this new mechanism.

Patches 1-6 are cleanups and code movement.
Patch 7 adds platform specific support for stop and psscr handling.
Patches 8 and 9 are minor cleanups in the cpuidle driver.
Patch 10 adds cpuidle driver support.
Patch 11 makes an offlined cpu use the deepest stop state.

Note: Documentation for the device tree bindings is posted here:
http://patchwork.ozlabs.org/patch/629125/

Changes in v8
=============
 - Fixed a copy-paste mistake in PATCH 10
 - Initializing pnv_first_deep_stop_state
 - Changing MMU_FTR_SECTION condition to reduce FTR section length
 - _loop -> snooze_loop

Changes in v7
=============
 - File renamed to idle_book3s.S instead of idle_power_common.S
 - Comment changes
 - power_stop0, power_stop renamed to power9_idle and power_idle_stop
 - PSSCR template is now a macro instead of storing in paca
 - power9_idle in C file instead of assembly
 - Fixed TOC related bug
 - Handling subcore within FTR section
 - Functions in idle.c reordered and broken into multiple functions
 - calling __restore_cpu_power8/9 via cur_cpu_spec->cpu_restore 
 - Added a minor patch with minor cleanups in cpuidle-powernv.c . This
   was mainly to make the existing code consistent with the review
   comments for new code
 - Using stack for variables while probing for idle states instead of
   kzalloc/kcalloc
 - Restoring RPR once per core in P9

Changes in v6
=============
 - Restore new POWER ISA v3 SPRS when waking up from deep idle

Changes in v5
=============
 - Use generic cpuidle constant CPUIDLE_NAME_LEN
 - Fix return code handling for of_property_read_string_array
 - Use DT flags to determine if we are using the stop instruction, instead
   of cpu_has_feature
 - Removed unnecessary cast with names
 - _loop -> stop_loop
 - Added POWERNV_THRESHOLD_LATENCY_NS to filter out idle states with high
   latency

Changes in v4
=============
 - Added a patch to use PNV_THREAD_WINKLE macro while requesting for winkle
 - Moved power7_powersave_common rename to more appropriate patch
 - renaming power7_enter_nap_mode to pnv_enter_arch207_idle_mode
 - Added PSSCR layout to Patch 7's commit message
 - Improved / Fixed comments
 - Fixed whitespace error in paca.h
 - Using MAX_POSSIBLE_STOP_STATE macro instead of hardcoding 0xF as
   the max possible stop state

Changes in v3
=============
 - Rebased on powerpc-next
 - Dropping patch 1 since we are not adding a new file for P9 idle support
 - Improved comments in multiple places
 - Moved GET_PACA from power7_restore_hyp_resource to System Reset
 - Instead of moving few functions from idle_power7 to idle_power_common,
   renaming idle_power7.S to idle_power_common.S
 - Moved HSTATE_HWTHREAD_STATE update to power_powersave_common
 - Dropped earlier patch 5 which moved few macros from idle_power_common to
   asm/cpuidle.h. 
 - Added a patch to rename reusable power7_* idle functions to pnv_*
 - Added new patch that creates abstraction for saving SPRs before
   entering deep idle states
 - Instead of introducing new file idle_power_stop.S, P9 idle support
   is added to idle_power_common.S using CPU_FTR sections.
 - Fixed r4 reg clobbering in power_stop0

Changes in v2
=============
 - Rebased on v4.6-rc6
 - Using CPU_FTR_ARCH_300 bit instead of CPU_FTR_STOP_INST

Cc: Rafael J. Wysocki 
Cc: Daniel Lezcano 
Cc: Rob Herring 
Cc: Lorenzo Pieralisi 
Cc: linux...@vger.kernel.org
Cc: Benjamin Herrenschmidt 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Michael Neuling 
Cc: linuxppc-dev@lists.ozlabs.org

Shreyas B. Prabhu (11):
  powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for
winkle
  powerpc/kvm: make hypervisor state restore a function
  powerpc/powernv: Rename