[PATCH v1] ARM: dts: aspeed: Add Mowgli rtc driver

2021-01-20 Thread Ben Pai
Add the Mowgli RTC device node.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
index b648e468e9db..8503152faaf0 100644
--- a/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
@@ -582,6 +582,11 @@
/* TMP275A */
/* TMP275A */
 
+   rtc@32 {
+   compatible = "epson,rx8900";
+   reg = <0x32>;
+   };
+
tmp275@48 {
compatible = "ti,tmp275";
reg = <0x48>;
-- 
2.17.1



Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines

2020-12-23 Thread Ram Pai
On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote:
> 
> Hi Ram,
> 
> Thanks for reviewing this patch.
> 
> Ram Pai  writes:
> 
> > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
> >> On server-class POWER machines, we don't need the SWIOTLB unless we're a
> >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally
> >> allocate it.
> >> 
> >> In most cases this is harmless, but on a few machine configurations (e.g.,
> >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can
> >> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and
> >> fails with a scary-looking WARN_ONCE:
> >> 
> >>  [ cut here ]
> >>  memblock: bottom-up allocation failed, memory hotremove may be affected
> >>  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 
> >> memblock_find_in_range_node+0x328/0x340
> >>  Modules linked in:
> >>  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
> >>  NIP:  c0442f38 LR: c0442f34 CTR: c01e0080
> >>  REGS: c1def900 TRAP: 0700   Not tainted  (5.10.0-rc2-orig+)
> >>  MSR:  92021033   CR: 2802  XER: 
> >> 2004
> >>  CFAR: c014b7b4 IRQMASK: 1
> >>  GPR00: c0442f34 c1defba0 c1deff00 0047
> >>  GPR04: 7fff c1def828 c1def820 
> >>  GPR08: 001ffc3e c1b75478 c1b75478 0001
> >>  GPR12: 2000 c203  
> >>  GPR16:    0203
> >>  GPR20:  0001 0001 c1defc10
> >>  GPR24: c1defc08 c1c91868 c1defc18 c1c91890
> >>  GPR28:   0400 
> >>  NIP [c0442f38] memblock_find_in_range_node+0x328/0x340
> >>  LR [c0442f34] memblock_find_in_range_node+0x324/0x340
> >>  Call Trace:
> >>  [c1defba0] [c0442f34] 
> >> memblock_find_in_range_node+0x324/0x340 (unreliable)
> >>  [c1defc90] [c15ac088] memblock_alloc_range_nid+0xec/0x1b0
> >>  [c1defd40] [c15ac1f8] memblock_alloc_internal+0xac/0x110
> >>  [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc
> >>  [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104
> >>  [c1defea0] [c158378c] mem_init+0x4c/0x98
> >>  [c1defec0] [c157457c] start_kernel+0x714/0xac8
> >>  [c1deff90] [c000d244] start_here_common+0x1c/0x58
> >>  Instruction dump:
> >>  2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 3c62ff60
> >>  3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 6000
> >>  random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
> >>  ---[ end trace  ]---
> >>  software IO TLB: Cannot allocate buffer
> >> 
> >> Unless this is a secure VM the message can actually be ignored, because the
> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.
> >
> > The above WARN_ONCE is conveying a genuine warning. Should it be silenced?
> 
> Not sure I understand your point. This patch doesn't silence the
> warning, it avoids the problem it is warning about.

Sorry, I should have explained it better. My point is...  

If CONFIG_SWIOTLB is enabled, it means that the kernel is
promising the bounce-buffering capability. I know that currently
no kernel subsystem uses bounce buffers on a non-secure pseries
kernel or a powernv kernel. But that does not mean there won't
be any. If such a third-party module needs bounce buffering, it
won't be able to operate because of the change proposed in your
patch.

Whether that is a good thing or a bad thing, I do not know. I
will let the experts opine.

RP


Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines

2020-12-23 Thread Ram Pai
On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
> On server-class POWER machines, we don't need the SWIOTLB unless we're a
> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally
> allocate it.
> 
> In most cases this is harmless, but on a few machine configurations (e.g.,
> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can
> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and
> fails with a scary-looking WARN_ONCE:
> 
>  [ cut here ]
>  memblock: bottom-up allocation failed, memory hotremove may be affected
>  WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 
> memblock_find_in_range_node+0x328/0x340
>  Modules linked in:
>  CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
>  NIP:  c0442f38 LR: c0442f34 CTR: c01e0080
>  REGS: c1def900 TRAP: 0700   Not tainted  (5.10.0-rc2-orig+)
>  MSR:  92021033   CR: 2802  XER: 
> 2004
>  CFAR: c014b7b4 IRQMASK: 1
>  GPR00: c0442f34 c1defba0 c1deff00 0047
>  GPR04: 7fff c1def828 c1def820 
>  GPR08: 001ffc3e c1b75478 c1b75478 0001
>  GPR12: 2000 c203  
>  GPR16:    0203
>  GPR20:  0001 0001 c1defc10
>  GPR24: c1defc08 c1c91868 c1defc18 c1c91890
>  GPR28:   0400 
>  NIP [c0442f38] memblock_find_in_range_node+0x328/0x340
>  LR [c0442f34] memblock_find_in_range_node+0x324/0x340
>  Call Trace:
>  [c1defba0] [c0442f34] 
> memblock_find_in_range_node+0x324/0x340 (unreliable)
>  [c1defc90] [c15ac088] memblock_alloc_range_nid+0xec/0x1b0
>  [c1defd40] [c15ac1f8] memblock_alloc_internal+0xac/0x110
>  [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc
>  [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104
>  [c1defea0] [c158378c] mem_init+0x4c/0x98
>  [c1defec0] [c157457c] start_kernel+0x714/0xac8
>  [c1deff90] [c000d244] start_here_common+0x1c/0x58
>  Instruction dump:
>  2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 3c62ff60
>  3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 6000
>  random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
>  ---[ end trace  ]---
>  software IO TLB: Cannot allocate buffer
> 
> Unless this is a secure VM the message can actually be ignored, because the
> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.

The above WARN_ONCE is conveying a genuine warning. Should it be silenced?

> 
> Fixes: eae9eec476d1 ("powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere 
> in memory")
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/mm/mem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index afab328d0887..3af991844145 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -300,7 +300,8 @@ void __init mem_init(void)
>   memblock_set_bottom_up(true);
>   if (is_secure_guest())
>   svm_swiotlb_init();
> - else
> + /* Server machines don't need SWIOTLB if they're not secure guests. */
> + else if (!machine_is(pseries) && !machine_is(powernv))

I can see powernv never needing the SWIOTLB. But for pseries
guests, I am not so sure.

RP


Re: [PATCH stable] net: sch_generic: fix the missing new qdisc assignment bug

2020-11-03 Thread Vishwanath Pai

On 11/2/20 10:25 PM, Yunsheng Lin wrote:
> commit 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and 
enqueue op for lockless qdisc")

>
> When the above upstream commit is backported to stable kernel,
> one assignment is missing, which causes two problems reported
> by Joakim and Vishwanath, see [1] and [2].
>
> So add the assignment back to fix it.
>
> 1. https://www.spinics.net/lists/netdev/msg693916.html
> 2. https://www.spinics.net/lists/netdev/msg695131.html

>
> Fixes: 749cc0b0c7f3 ("net: sch_generic: aviod concurrent reset and 
enqueue op for lockless qdisc")

> Signed-off-by: Yunsheng Lin 
> ---
>  net/sched/sch_generic.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 0e275e1..6e6147a 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1127,10 +1127,13 @@ static void dev_deactivate_queue(struct 
net_device *dev,

>   void *_qdisc_default)
>  {
>  struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
> +    struct Qdisc *qdisc_default = _qdisc_default;
>
>  if (qdisc) {
>  if (!(qdisc->flags & TCQ_F_BUILTIN))
> >  set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
> +
> +    rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
>  }
>  }
>

I have tested the patch on v5.4.71 and it fixes our issues.

Tested-by: Vishwanath Pai 



Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-11-02 Thread Vishwanath Pai

On 11/2/20 4:08 AM, Yunsheng Lin wrote:
> On 2020/10/30 1:20, Vishwanath Pai wrote:
>> On 10/29/20 6:24 AM, Yunsheng Lin wrote:
>>> On 2020/10/29 12:50, Vishwanath Pai wrote:
>>>> On 10/28/20 10:37 PM, Yunsheng Lin wrote:
>>>>> On 2020/10/29 4:04, Vishwanath Pai wrote:
>>>>>> On 10/28/20 1:47 PM, Cong Wang wrote:
>>>>>>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath 
 wrote:

>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We noticed some problems when testing the latest 5.4 LTS 
kernel and traced it
>>>>>>>> back to this commit using git bisect. When running our tests 
the machine stops
>>>>>>>> responding to all traffic and the only way to recover is a 
reboot. I do not see

>>>>>>>> a stack trace on the console.
>>>>>>>
>>>>>>> Do you mean the machine is still running fine just the network 
is down?

>>>>>>>
>>>>>>> If so, can you dump your tc config with stats when the problem 
is happening?

>>>>>>> (You can use `tc -s -d qd show ...`.)
>>>>>>>
>>>>>>>>
>>>>>>>> This can be reproduced using the packetdrill test below, it 
should be run a
>>>>>>>> few times or in a loop. You should hit this issue within a few 
tries but

>>>>>>>> sometimes might take up to 15-20 tries.
>>>>>>> ...
>>>>>>>> I can reproduce the issue easily on v5.4.68, and after 
reverting this commit it

>>>>>>>> does not happen anymore.
>>>>>>>
>>>>>>> This is odd. The patch in this thread touches netdev reset 
path, if packetdrill
>>>>>>> is the only thing you use to trigger the bug (that is netdev is 
always active),

>>>>>>> I can not connect them.
>>>>>>>
>>>>>>> Thanks.
>>>>>>
>>>>>> Hi Cong,
>>>>>>
>>>>>>> Do you mean the machine is still running fine just the network 
is down?

>>>>>>
>>>>>> I was able to access the machine via serial console, it looks 
like it is

>>>>>> up and running, just that networking is down.
>>>>>>
>>>>>>> If so, can you dump your tc config with stats when the problem 
is happening?

>>>>>>> (You can use `tc -s -d qd show ...`.)
>>>>>>
>>>>>> If I try running tc when the machine is in this state the 
command never

>>>>>> returns. It doesn't print anything but doesn't exit either.
>>>>>>
>>>>>>> This is odd. The patch in this thread touches netdev reset 
path, if packetdrill
>>>>>>> is the only thing you use to trigger the bug (that is netdev is 
always active),

>>>>>>> I can not connect them.
>>>>>>
>>>>>> I think packetdrill creates a tun0 interface when it starts the
>>>>>> test and tears it down at the end, so it might be hitting this 
code path

>>>>>> during teardown.
>>>>>
>>>>> Hi, is there any preparation needed before running the above
>>>>> packetdrill test case? I ran it on 5.9-rc4 with this patch
>>>>> applied, without any preparation, and did not reproduce it.
>>>>>
>>>>> By the way, I am a newbie to packetdrill :), it would be good
>>>>> to provide the detailed setup to reproduce it, thanks.
>>>>>
>>>>>>
>>>>>> P.S: My mail server is having connectivity issues with 
vger.kernel.org

>>>>>> so messages aren't getting delivered to netdev. It'll hopefully get
>>>>>> resolved soon.
>>>>>>
>>>>>> Thanks,
>>>>>> Vishwanath
>>>>>>
>>>>>>
>>>>>> .
>>>>>>
>>>>
>>>> I can't reproduce it on v5.9-rc4 either, it is probably an issue 
only on

>>>> 5.4 then (and maybe older LTS versions). Can you give it a try on
>>>> 5.4.68?
>>>>
>>>> For running packetdrill, download the latest version from their github
>>>> repo, then run it in a loop without any special arguments. This is 
what

>>>> I do to reproduce it:
>>>>
>>>> while true; do ./packetdri

Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-10-29 Thread Vishwanath Pai

On 10/29/20 6:24 AM, Yunsheng Lin wrote:
> On 2020/10/29 12:50, Vishwanath Pai wrote:
>> On 10/28/20 10:37 PM, Yunsheng Lin wrote:
>>> On 2020/10/29 4:04, Vishwanath Pai wrote:
>>>> On 10/28/20 1:47 PM, Cong Wang wrote:
>>>>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath  
wrote:

>>>>>> Hi,
>>>>>>
>>>>>> We noticed some problems when testing the latest 5.4 LTS kernel 
and traced it
>>>>>> back to this commit using git bisect. When running our tests the 
machine stops
>>>>>> responding to all traffic and the only way to recover is a 
reboot. I do not see

>>>>>> a stack trace on the console.
>>>>>
>>>>> Do you mean the machine is still running fine just the network is 
down?

>>>>>
>>>>> If so, can you dump your tc config with stats when the problem is 
happening?

>>>>> (You can use `tc -s -d qd show ...`.)
>>>>>
>>>>>>
>>>>>> This can be reproduced using the packetdrill test below, it 
should be run a
>>>>>> few times or in a loop. You should hit this issue within a few 
tries but

>>>>>> sometimes might take up to 15-20 tries.
>>>>> ...
>>>>>> I can reproduce the issue easily on v5.4.68, and after reverting 
this commit it

>>>>>> does not happen anymore.
>>>>>
>>>>> This is odd. The patch in this thread touches netdev reset path, 
if packetdrill
>>>>> is the only thing you use to trigger the bug (that is netdev is 
always active),

>>>>> I can not connect them.
>>>>>
>>>>> Thanks.
>>>>
>>>> Hi Cong,
>>>>
>>>>> Do you mean the machine is still running fine just the network is 
down?

>>>>
>>>> I was able to access the machine via serial console, it looks like 
it is

>>>> up and running, just that networking is down.
>>>>
>>>>> If so, can you dump your tc config with stats when the problem is 
happening?

>>>>> (You can use `tc -s -d qd show ...`.)
>>>>
>>>> If I try running tc when the machine is in this state the command 
never

>>>> returns. It doesn't print anything but doesn't exit either.
>>>>
>>>>> This is odd. The patch in this thread touches netdev reset path, 
if packetdrill
>>>>> is the only thing you use to trigger the bug (that is netdev is 
always active),

>>>>> I can not connect them.
>>>>
>>>> I think packetdrill creates a tun0 interface when it starts the
>>>> test and tears it down at the end, so it might be hitting this 
code path

>>>> during teardown.
>>>
>>> Hi, is there any preparation needed before running the above
>>> packetdrill test case? I ran it on 5.9-rc4 with this patch
>>> applied, without any preparation, and did not reproduce it.
>>>
>>> By the way, I am a newbie to packetdrill :), it would be good
>>> to provide the detailed setup to reproduce it, thanks.
>>>
>>>>
>>>> P.S: My mail server is having connectivity issues with vger.kernel.org
>>>> so messages aren't getting delivered to netdev. It'll hopefully get
>>>> resolved soon.
>>>>
>>>> Thanks,
>>>> Vishwanath
>>>>
>>>>
>>>> .
>>>>
>>
>> I can't reproduce it on v5.9-rc4 either, it is probably an issue only on
>> 5.4 then (and maybe older LTS versions). Can you give it a try on
>> 5.4.68?
>>
>> For running packetdrill, download the latest version from their github
>> repo, then run it in a loop without any special arguments. This is what
>> I do to reproduce it:
>>
>> while true; do ./packetdrill ; done
>>
>> I don't think any other setup is necessary.
>
> Hi, run the above test for above an hour using 5.4.68, still did not
> reproduce it, as below:
>
>
> root@(none)$ cd /home/root/
> root@(none)$ ls
> creat_vlan.sh  packetdrill    test.pd
> root@(none)$ cat test.pd
> 0 `echo 4 > /proc/sys/net/ipv4/tcp_min_tso_segs`
>
> 0.400 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> 0.400 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
>
> // set maxseg to 1000 to work with both ipv4 and ipv6
> 0.500 setsockopt(3, SOL_TCP, TCP_MAXSEG, [1000], 4) = 0
> 0.500 bind(3, ..., ...) = 0
> 0.500 listen(3, 1) = 0
>
> // Establish connection
> 0.600 < S 0:0(0) win 3

Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-10-29 Thread Vishwanath Pai

On 10/28/20 10:37 PM, Yunsheng Lin wrote:
> On 2020/10/29 4:04, Vishwanath Pai wrote:
>> On 10/28/20 1:47 PM, Cong Wang wrote:
>>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath  
wrote:

>>>> Hi,
>>>>
>>>> We noticed some problems when testing the latest 5.4 LTS kernel 
and traced it
>>>> back to this commit using git bisect. When running our tests the 
machine stops
>>>> responding to all traffic and the only way to recover is a reboot. 
I do not see

>>>> a stack trace on the console.
>>>
>>> Do you mean the machine is still running fine just the network is down?
>>>
>>> If so, can you dump your tc config with stats when the problem is 
happening?

>>> (You can use `tc -s -d qd show ...`.)
>>>
>>>>
>>>> This can be reproduced using the packetdrill test below, it should 
be run a
>>>> few times or in a loop. You should hit this issue within a few 
tries but

>>>> sometimes might take up to 15-20 tries.
>>> ...
>>>> I can reproduce the issue easily on v5.4.68, and after reverting 
this commit it

>>>> does not happen anymore.
>>>
>>> This is odd. The patch in this thread touches netdev reset path, if 
packetdrill
>>> is the only thing you use to trigger the bug (that is netdev is 
always active),

>>> I can not connect them.
>>>
>>> Thanks.
>>
>> Hi Cong,
>>
>>> Do you mean the machine is still running fine just the network is down?
>>
>> I was able to access the machine via serial console, it looks like it is
>> up and running, just that networking is down.
>>
>>> If so, can you dump your tc config with stats when the problem is 
happening?

>>> (You can use `tc -s -d qd show ...`.)
>>
>> If I try running tc when the machine is in this state the command never
>> returns. It doesn't print anything but doesn't exit either.
>>
>>> This is odd. The patch in this thread touches netdev reset path, if 
packetdrill
>>> is the only thing you use to trigger the bug (that is netdev is 
always active),

>>> I can not connect them.
>>
>> I think packetdrill creates a tun0 interface when it starts the
>> test and tears it down at the end, so it might be hitting this code path
>> during teardown.
>
> Hi, is there any preparation needed before running the above
> packetdrill test case? I ran it on 5.9-rc4 with this patch
> applied, without any preparation, and did not reproduce it.
>
> By the way, I am a newbie to packetdrill :), it would be good to provide
> the detailed setup to reproduce it, thanks.
>
>>
>> P.S: My mail server is having connectivity issues with vger.kernel.org
>> so messages aren't getting delivered to netdev. It'll hopefully get
>> resolved soon.
>>
>> Thanks,
>> Vishwanath
>>
>>
>> .
>>

I can't reproduce it on v5.9-rc4 either, it is probably an issue only on
5.4 then (and maybe older LTS versions). Can you give it a try on
5.4.68?

For running packetdrill, download the latest version from their github
repo, then run it in a loop without any special arguments. This is what
I do to reproduce it:

while true; do ./packetdrill ; done

I don't think any other setup is necessary.

-Vishwanath



Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-10-28 Thread Vishwanath Pai

On 10/28/20 1:47 PM, Cong Wang wrote:
> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath  wrote:
>> Hi,
>>
>> We noticed some problems when testing the latest 5.4 LTS kernel and 
traced it
>> back to this commit using git bisect. When running our tests the 
machine stops
>> responding to all traffic and the only way to recover is a reboot. I 
do not see

>> a stack trace on the console.
>
> Do you mean the machine is still running fine just the network is down?
>
> If so, can you dump your tc config with stats when the problem is 
happening?

> (You can use `tc -s -d qd show ...`.)
>
>>
>> This can be reproduced using the packetdrill test below, it should 
be run a

>> few times or in a loop. You should hit this issue within a few tries but
>> sometimes might take up to 15-20 tries.
> ...
>> I can reproduce the issue easily on v5.4.68, and after reverting 
this commit it

>> does not happen anymore.
>
> This is odd. The patch in this thread touches netdev reset path, if 
packetdrill
> is the only thing you use to trigger the bug (that is netdev is 
always active),

> I can not connect them.
>
> Thanks.

Hi Cong,

> Do you mean the machine is still running fine just the network is down?

I was able to access the machine via serial console, it looks like it is
up and running, just that networking is down.

> If so, can you dump your tc config with stats when the problem is 
happening?

> (You can use `tc -s -d qd show ...`.)

If I try running tc when the machine is in this state the command never
returns. It doesn't print anything but doesn't exit either.

> This is odd. The patch in this thread touches netdev reset path, if 
packetdrill
> is the only thing you use to trigger the bug (that is netdev is 
always active),

> I can not connect them.

I think packetdrill creates a tun0 interface when it starts the
test and tears it down at the end, so it might be hitting this code path
during teardown.

P.S: My mail server is having connectivity issues with vger.kernel.org
so messages aren't getting delivered to netdev. It'll hopefully get
resolved soon.

Thanks,
Vishwanath




Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-10-28 Thread Pai, Vishwanath
On 10/28/20, 10:51 AM, "Vishwanath Pai"  wrote:

On Thu, Sep 17, 2020 at 4:26 PM Cong Wang  wrote:
>
> On Fri, Sep 11, 2020 at 1:13 AM Yunsheng Lin  wrote:
> >
> > On 2020/9/11 4:07, Cong Wang wrote:
> > > On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin  
> > > wrote:
> > >>
> > >> Currently there is concurrent reset and enqueue operation for the
> > >> same lockless qdisc when there is no lock to synchronize the
> > >> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
> > >> qdisc_deactivate() called by dev_deactivate_queue(), which may cause
> > >> out-of-bounds access for priv->ring[] in hns3 driver if user has
> > >> requested a smaller queue num when __dev_xmit_skb() still enqueue a
> > >> skb with a larger queue_mapping after the corresponding qdisc is
> > >> reset, and call hns3_nic_net_xmit() with that skb later.
> > >>
> > >> Reused the existing synchronize_net() in dev_deactivate_many() to
> > >> make sure skb with larger queue_mapping enqueued to old qdisc(which
> > >> is saved in dev_queue->qdisc_sleeping) will always be reset when
> > >> dev_reset_queue() is called.
> > >>
> > >> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
> > >> Signed-off-by: Yunsheng Lin 
> > >> ---
> > >> ChangeLog V2:
> > >> Reuse existing synchronize_net().
> > >> ---
> > >>  net/sched/sch_generic.c | 48 
> > >> +---
> > >>  1 file changed, 33 insertions(+), 15 deletions(-)
> > >>
> > >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > >> index 265a61d..54c4172 100644
> > >> --- a/net/sched/sch_generic.c
> > >> +++ b/net/sched/sch_generic.c
> > >> @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate);
> > >>
> > >>  static void qdisc_deactivate(struct Qdisc *qdisc)
> > >>  {
> > >> -   bool nolock = qdisc->flags & TCQ_F_NOLOCK;
> > >> -
> > >> if (qdisc->flags & TCQ_F_BUILTIN)
> > >> return;
> > >> -   if (test_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state))
> > >> -   return;
> > >> -
> > >> -   if (nolock)
> > >> -   spin_lock_bh(&qdisc->seqlock);
> > >> -   spin_lock_bh(qdisc_lock(qdisc));
> > >>
> > >> set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
> > >> -
> > >> -   qdisc_reset(qdisc);
> > >> -
> > >> -   spin_unlock_bh(qdisc_lock(qdisc));
> > >> -   if (nolock)
> > >> -   spin_unlock_bh(&qdisc->seqlock);
> > >>  }
> > >>
> > >>  static void dev_deactivate_queue(struct net_device *dev,
> > >> @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct 
> > >> net_device *dev,
> > >> }
> > >>  }
> > >>
> > >> +static void dev_reset_queue(struct net_device *dev,
> > >> +   struct netdev_queue *dev_queue,
> > >> +   void *_unused)
> > >> +{
> > >> +   struct Qdisc *qdisc;
> > >> +   bool nolock;
> > >> +
> > >> +   qdisc = dev_queue->qdisc_sleeping;
> > >> +   if (!qdisc)
> > >> +   return;
> > >> +
> > >> +   nolock = qdisc->flags & TCQ_F_NOLOCK;
> > >> +
> > >> +   if (nolock)
> > >> +   spin_lock_bh(&qdisc->seqlock);
> > >> +   spin_lock_bh(qdisc_lock(qdisc));
> > >
> > >
> > > I think you do not need this lock for lockless one.
> >
> > It seems so.
> > Maybe another patch to remove qdisc_lock(qdisc) for lockless
> > qdisc?
>
> Yeah, but not sure if we still want this lockless qdisc any more,
> it brings more troubles than gains.
>
> >
> >
> > >
> > >> +
> > >> +   qdisc_reset(qdisc);
> > >> +
> > >> +   spin_unlock_bh(qdisc_lock(qdisc));
> > >> +   if (nolock)
> > >> +   spin_unlock_bh(&qdisc->seqlock);
> > >> +}
> > >> +
> > >>  static bool some_qdisc_is_busy(struct net_device *dev)
> > >>  {
> > >> unsigned int i;

Re: Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc

2020-10-28 Thread Vishwanath Pai

On 9/17/20 3:26 PM, Cong Wang wrote:
> On Fri, Sep 11, 2020 at 1:13 AM Yunsheng Lin  
wrote:

>>
>> On 2020/9/11 4:07, Cong Wang wrote:
>>> On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin 
 wrote:


 Currently there is concurrent reset and enqueue operation for the
 same lockless qdisc when there is no lock to synchronize the
 q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
 qdisc_deactivate() called by dev_deactivate_queue(), which may cause
 out-of-bounds access for priv->ring[] in hns3 driver if user has
 requested a smaller queue num when __dev_xmit_skb() still enqueue a
 skb with a larger queue_mapping after the corresponding qdisc is
 reset, and call hns3_nic_net_xmit() with that skb later.

 Reused the existing synchronize_net() in dev_deactivate_many() to
 make sure skb with larger queue_mapping enqueued to old qdisc(which
 is saved in dev_queue->qdisc_sleeping) will always be reset when
 dev_reset_queue() is called.

 Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
 Signed-off-by: Yunsheng Lin 
 ---
 ChangeLog V2:
 Reuse existing synchronize_net().
 ---
  net/sched/sch_generic.c | 48 
+---

  1 file changed, 33 insertions(+), 15 deletions(-)

 diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
 index 265a61d..54c4172 100644
 --- a/net/sched/sch_generic.c
 +++ b/net/sched/sch_generic.c
 @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate);

  static void qdisc_deactivate(struct Qdisc *qdisc)
  {
 -   bool nolock = qdisc->flags & TCQ_F_NOLOCK;
 -
 if (qdisc->flags & TCQ_F_BUILTIN)
 return;
 -   if (test_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state))
 -   return;
 -
 -   if (nolock)
 - spin_lock_bh(&qdisc->seqlock);
 -   spin_lock_bh(qdisc_lock(qdisc));

 set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
 -
 -   qdisc_reset(qdisc);
 -
 -   spin_unlock_bh(qdisc_lock(qdisc));
 -   if (nolock)
 - spin_unlock_bh(&qdisc->seqlock);
  }

  static void dev_deactivate_queue(struct net_device *dev,
 @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct 
net_device *dev,

 }
  }

 +static void dev_reset_queue(struct net_device *dev,
 +   struct netdev_queue *dev_queue,
 +   void *_unused)
 +{
 +   struct Qdisc *qdisc;
 +   bool nolock;
 +
 +   qdisc = dev_queue->qdisc_sleeping;
 +   if (!qdisc)
 +   return;
 +
 +   nolock = qdisc->flags & TCQ_F_NOLOCK;
 +
 +   if (nolock)
 + spin_lock_bh(&qdisc->seqlock);
 +   spin_lock_bh(qdisc_lock(qdisc));
>>>
>>>
>>> I think you do not need this lock for lockless one.
>>
>> It seems so.
>> Maybe another patch to remove qdisc_lock(qdisc) for lockless
>> qdisc?
>
> Yeah, but not sure if we still want this lockless qdisc any more,
> it brings more troubles than gains.
>
>>
>>
>>>
 +
 +   qdisc_reset(qdisc);
 +
 +   spin_unlock_bh(qdisc_lock(qdisc));
 +   if (nolock)
 + spin_unlock_bh(&qdisc->seqlock);
 +}
 +
  static bool some_qdisc_is_busy(struct net_device *dev)
  {
 unsigned int i;
 @@ -1213,12 +1223,20 @@ void dev_deactivate_many(struct list_head 
*head)

 dev_watchdog_down(dev);
 }

 -   /* Wait for outstanding qdisc-less dev_queue_xmit calls.
 +   /* Wait for outstanding qdisc-less dev_queue_xmit calls or
 +    * outstanding qdisc enqueuing calls.
  * This is avoided if all devices are in dismantle phase :
  * Caller will call synchronize_net() for us
  */
 synchronize_net();

 +   list_for_each_entry(dev, head, close_list) {
 +   netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
 +
 +   if (dev_ingress_queue(dev))
 +   dev_reset_queue(dev, 
dev_ingress_queue(dev), NULL);

 +   }
 +
 /* Wait for outstanding qdisc_run calls. */
 list_for_each_entry(dev, head, close_list) {
 while (some_qdisc_is_busy(dev)) {
>>>
>>> Do you want to reset before waiting for TX action?
>>>
>>> I think it is safer to do it after, at least prior to commit 759ae57f1b
>>> we did after.
>>
>> The reference to the txq->qdisc is always protected by RCU, so the 
synchronize_net()
> >> should be enough to ensure there is no skb enqueued to the old 
qdisc that is saved
>> in the dev_queue->qdisc_sleeping, because __dev_queue_xmit can only 
see the new qdisc
>> after synchronize_net(), which is noop_qdisc, and noop_qdisc will 
make sure any skb

>> enqueued to it will be dropped and 

Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount

2020-10-07 Thread Ram Pai
On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.


I was hoping this patch would resolve a page-reference issue that we run
into at random times while migrating a page to a device page backed by
secure-memory.

Unfortunately, I continue to see the problem. A reference is still
held on that page, which causes the migration to fail.

FYI
RP


[PATCH v1] ARM: dts: aspeed: Add Mowgli BMC platform

2020-09-09 Thread Ben Pai
The Mowgli BMC is an ASPEED AST2500-based BMC that is part of an
OpenPOWER POWER9 server.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts | 663 
 2 files changed, 664 insertions(+)
 create mode 100644 arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 4572db3fa5ae..92edb2e12823 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1381,6 +1381,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-microsoft-olympus.dtb \
aspeed-bmc-opp-lanyang.dtb \
aspeed-bmc-opp-mihawk.dtb \
+   aspeed-bmc-opp-mowgli.dtb \
aspeed-bmc-opp-nicole.dtb \
aspeed-bmc-opp-palmetto.dtb \
aspeed-bmc-opp-romulus.dtb \
diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
new file mode 100644
index ..11fd86980162
--- /dev/null
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
@@ -0,0 +1,663 @@
+// SPDX-License-Identifier: GPL-2.0+
+/dts-v1/;
+#include "aspeed-g5.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mowgli BMC";
+   compatible = "ibm,mowgli-bmc", "aspeed,ast2500";
+
+
+   chosen {
+   stdout-path = 
+   bootargs = "console=ttyS4,115200 earlyprintk";
+   };
+
+   memory@8000 {
+   reg = <0x8000 0x2000>;
+   };
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   flash_memory: region@9800 {
+   no-map;
+   reg = <0x9800 0x0400>; /* 64M */
+   };
+
+   gfx_memory: framebuffer {
+   size = <0x0100>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+
+   video_engine_memory: jpegbuffer {
+   size = <0x0200>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+
+   air-water {
+   label = "air-water";
+   gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   checkstop {
+   label = "checkstop";
+   gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps0-presence {
+   label = "ps0-presence";
+   gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps1-presence {
+   label = "ps1-presence";
+   gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   id-button {
+   label = "id-button";
+   gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   };
+
+   gpio-keys-polled {
+   compatible = "gpio-keys-polled";
+   poll-interval = <1000>;
+
+   fan0-presence {
+   label = "fan0-presence";
+   gpios = < 9 GPIO_ACTIVE_LOW>;
+   linux,code = <9>;
+   };
+
+   fan1-presence {
+   label = "fan1-presence";
+   gpios = < 10 GPIO_ACTIVE_LOW>;
+   linux,code = <10>;
+   };
+
+   fan2-presence {
+   label = "fan2-presence";
+   gpios = < 11 GPIO_ACTIVE_LOW>;
+   linux,code = <11>;
+   };
+
+   fan3-presence {
+   label = "fan3-presence";
+   gpios = < 12 GPIO_ACTIVE_LOW>;
+   linux,code = <12>;
+   };
+
+   fan4-presence {
+   label = "fan4-presence";
+   gpios = < 13 GPIO_ACTIVE_LOW>;
+   linux,code = <13>;
+   };
+   };
+
+   leds {
+   compatible = "gpio-leds";
+
+   front-fault {
+   retain-state-

Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping

2020-07-21 Thread Ram Pai
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote:
> When a secure memslot is dropped, all the pages backed in the secure device
> (aka really backed by secure memory by the Ultravisor) should be paged out
> to a normal page. Previously, this was achieved by triggering the page
> fault mechanism which is calling kvmppc_svm_page_out() on each pages.
> 
> This can't work when hot unplugging a memory slot because the memory slot
> is flagged as invalid and gfn_to_pfn() is then not trying to access the
> page, so the page fault mechanism is not triggered.
> 
> Since the final goal is to make a call to kvmppc_svm_page_out() it seems
> simpler to directly calling it instead of triggering such a mechanism. This
^^ call directly instead of triggering..

> way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a
> memslot.
> 
> Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock,
> the call to __kvmppc_svm_page_out() is made.
> As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the
> VMA is fetched in a lazy way, to not trigger find_vma() all the time. In
> addition, the mmap_sem is help in read mode during that time, not in write
  ^^ held

> mode since the virual memory layout is not impacted, and
> kvm->arch.uvmem_lock prevents concurrent operation on the secure device.
> 
> Cc: Ram Pai 

Reviewed-by: Ram Pai 

RP


Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: move kvmppc_svm_page_out up

2020-07-21 Thread Ram Pai
On Tue, Jul 21, 2020 at 12:42:01PM +0200, Laurent Dufour wrote:
> kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages()
> so move it upper in this file.
> 
> Furthermore it will be interesting to call this function when already
> holding the kvm->arch.uvmem_lock, so prefix the original function with __
> and remove the locking in it, and introduce a wrapper which call that
> function with the lock held.
> 
> There is no functional change.

Reviewed-by: Ram Pai 

> 
> Cc: Ram Pai 
> Cc: Bharata B Rao 
> Cc: Paul Mackerras 
> Signed-off-by: Laurent Dufour 
> ---

RP


[PATCH v1 2/2] ARM: dts: aspeed: mihawk: Add 8 tmp401 thermal sensor

2020-06-05 Thread Ben Pai
Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 40 +
 1 file changed, 40 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
index 25ffe65fbdc0..5bf1f13dda3b 100644
--- a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -834,6 +834,11 @@
line-name = "smbus0";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus9_mux232: i2c@1 {
@@ -854,6 +859,11 @@
line-name = "smbus1";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus9_mux233: i2c@2 {
@@ -897,6 +907,11 @@
line-name = "smbus2";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus9_mux236: i2c@1 {
@@ -917,6 +932,11 @@
line-name = "smbus3";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus9_mux237: i2c@2 {
@@ -979,6 +999,11 @@
line-name = "smbus4";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus10_mux240: i2c@1 {
@@ -999,6 +1024,11 @@
line-name = "smbus5";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus10_mux241: i2c@2 {
@@ -1042,6 +1072,11 @@
line-name = "smbus6";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus10_mux244: i2c@1 {
@@ -1062,6 +1097,11 @@
line-name = "smbus7";
};
};
+
+   tmp431@4c {
+   compatible = "ti,tmp401";
+   reg = <0x4c>;
+   };
};
 
bus10_mux245: i2c@2 {
-- 
2.17.1



[PATCH v1 1/2] ARM: dts: aspeed: mihawk: IO expander uses TCA9554 driver

2020-06-05 Thread Ben Pai
Set smbus_en of the IO expander to 1 in order to be able to read the sensor.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 112 
 1 file changed, 112 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
index eac6ed2bbc67..25ffe65fbdc0 100644
--- a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -820,12 +820,40 @@
#address-cells = <1>;
#size-cells = <0>;
reg = <0>;
+
+   tca9554@39 {
+   compatible = "ti,tca9554";
+   reg = <0x39>;
+   gpio-controller;
+   #gpio-cells = <2>;
+
+   smbus0 {
+   gpio-hog;
+   gpios = <4 GPIO_ACTIVE_HIGH>;
+   output-high;
+   line-name = "smbus0";
+   };
+   };
};
 
bus9_mux232: i2c@1 {
#address-cells = <1>;
#size-cells = <0>;
reg = <1>;
+
+   tca9554@39 {
+   compatible = "ti,tca9554";
+   reg = <0x39>;
+   gpio-controller;
+   #gpio-cells = <2>;
+
+   smbus1 {
+   gpio-hog;
+   gpios = <4 GPIO_ACTIVE_HIGH>;
+   output-high;
+   line-name = "smbus1";
+   };
+   };
};
 
bus9_mux233: i2c@2 {
@@ -855,12 +883,40 @@
#address-cells = <1>;
#size-cells = <0>;
reg = <0>;
+
+   tca9554@39 {
+   compatible = "ti,tca9554";
+   reg = <0x39>;
+   gpio-controller;
+   #gpio-cells = <2>;
+
+   smbus2 {
+   gpio-hog;
+   gpios = <4 GPIO_ACTIVE_HIGH>;
+   output-high;
+   line-name = "smbus2";
+   };
+   };
};
 
bus9_mux236: i2c@1 {
#address-cells = <1>;
#size-cells = <0>;
reg = <1>;
+
+   tca9554@39 {
+   compatible = "ti,tca9554";
+   reg = <0x39>;
+   gpio-controller;
+   #gpio-cells = <2>;
+
+   smbus3 {
+   gpio-hog;
+   gpios = <4 GPIO_ACTIVE_HIGH>;
+   output-high;
+   line-name = "smbus3";
+   };
+   };
};
 
bus9_mux237: i2c@2 {
@@ -909,12 +965,40 @@
#address-cells = <1>;
#size-cells = <0>;
reg = <0>;
+
+   tca9554@39 {
+   compatible = "ti,tca9554";
+   reg = <0x39>;
+   gpio-controller;
+   #gpio-cells = <2>;
+
+   smbus4 {
+   gpio-hog;
+   gpios = <4 GPIO_ACTIVE_HIGH>;
+   output-high;
+   line-name = "smbus4";
+   };
+   };
};
 
bus10_mux240: i2c@1 {
#address-cells = <1>;
#size-cells = <0>;
reg = <1>;
+
+   tca9554@39 {
+   compatible = "ti,tca9554";
+   reg = <0x39>;
+   gpio-controller;
+   

[PATCH v1] ARM: dts: aspeed: mihawk: add aliases for i2c and add thermal sensor

2020-06-03 Thread Ben Pai
1. Set the bus id for each mux channel to avoid switching channels
multiple times.
2. Set smbus_en of the IO expander to 1 in order to be able to read the
tmp401 sensor.
3. Add 8 tmp401 thermal sensors.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 449 +++-
 1 file changed, 444 insertions(+), 5 deletions(-)

diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
index f7e935ede919..78451b283d93 100644
--- a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -7,7 +7,52 @@
 / {
model = "Mihawk BMC";
compatible = "ibm,mihawk-bmc", "aspeed,ast2500";
-
+   aliases {
+   i2c215 = _mux215;
+   i2c216 = _mux216;
+   i2c217 = _mux217;
+   i2c218 = _mux218;
+   i2c219 = _mux219;
+   i2c220 = _mux220;
+   i2c221 = _mux221;
+   i2c222 = _mux222;
+   i2c223 = _mux223;
+   i2c224 = _mux224;
+   i2c225 = _mux225;
+   i2c226 = _mux226;
+   i2c227 = _mux227;
+   i2c228 = _mux228;
+   i2c229 = _mux229;
+   i2c230 = _mux230;
+   i2c231 = _mux231;
+   i2c232 = _mux232;
+   i2c233 = _mux233;
+   i2c234 = _mux234;
+   i2c235 = _mux235;
+   i2c236 = _mux236;
+   i2c237 = _mux237;
+   i2c238 = _mux238;
+   i2c239 = _mux239;
+   i2c240 = _mux240;
+   i2c241 = _mux241;
+   i2c242 = _mux242;
+   i2c243 = _mux243;
+   i2c244 = _mux244;
+   i2c245 = _mux245;
+   i2c246 = _mux246;
+   i2c247 = _mux247;
+   i2c248 = _mux248;
+   i2c249 = _mux249;
+   i2c250 = _mux250;
+   i2c251 = _mux251;
+   i2c252 = _mux252;
+   i2c253 = _mux253;
+   i2c254 = _mux254;
+   i2c255 = _mux255;
+   i2c256 = _mux256;
+   i2c257 = _mux257;
+   i2c258 = _mux258;
+   };
 
chosen {
stdout-path = 
@@ -630,6 +675,55 @@
#address-cells = <1>;
#size-cells = <0>;
reg = <0x70>;
+
+   bus6_mux215: i2c@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+   };
+
+   bus6_mux216: i2c@1 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <1>;
+   };
+
+   bus6_mux217: i2c@2 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <2>;
+   };
+
+   bus6_mux218: i2c@3 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <3>;
+   };
+
+   bus6_mux219: i2c@4 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <4>;
+   };
+
+   bus6_mux220: i2c@5 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <5>;
+   };
+
+   bus6_mux221: i2c@6 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <6>;
+   };
+
+   bus6_mux222: i2c@7 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <7>;
+   };
+
};
 
 };
@@ -644,6 +738,55 @@
#address-cells = <1>;
#size-cells = <0>;
reg = <0x70>;
+
+   bus7_mux223: i2c@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <0>;
+   };
+
+   bus7_mux224: i2c@1 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <1>;
+   };
+
+   bus7_mux225: i2c@2 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <2>;
+   };
+
+   bus7_mux226: i2c@3 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   reg = <3>;
+

Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT

2020-05-21 Thread Ram Pai
On Wed, May 20, 2020 at 07:43:08PM +0200, Laurent Dufour wrote:
> The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_*
> Hcalls") added checks of secure bit of SRR1 to filter out the Hcall
> reserved to the Ultravisor.
> 
> However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the
> context of the VM calling UV_ESM. This allows the Hypervisor to return to
> the guest without going through the Ultravisor. Thus the Secure bit of SRR1
> is not set in that particular case.
> 
> In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be
> filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is
> not set in that case.
> 
> Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
> Signed-off-by: Laurent Dufour 


Reviewed-by: Ram Pai 

> ---
>  arch/powerpc/kvm/book3s_hv.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 93493f0cbfe8..6ad1a3b14300 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>   ret = kvmppc_h_svm_init_done(vcpu->kvm);
>   break;
>   case H_SVM_INIT_ABORT:
> - ret = H_UNSUPPORTED;
> - if (kvmppc_get_srr1(vcpu) & MSR_S)
> - ret = kvmppc_h_svm_init_abort(vcpu->kvm);
> + /*
> +  * Even if that call is made by the Ultravisor, the SSR1 value
> +  * is the guest context one, with the secure bit clear as it has
> +  * not yet been secured. So we can't check it here.
> +  */

Frankly speaking, the comment above, when read in isolation, i.e. without
the deleted code above it, feels out of place.  The reasoning for the
change is anyway captured in the changelog.  So, I think, we should
delete this comment.

Also the comment above assumes the Ultravisor will call H_SVM_INIT_ABORT
with SRR1(S) bit not set; which may or may not be true.  Regardless of
who and how H_SVM_INIT_ABORT is called, we should just call
kvmppc_h_svm_init_abort() and let it deal with the complexities.


RP


Re: [PATCH] powerpc/prom_init: Undo relocation before entering secure mode

2019-10-18 Thread Ram Pai
On Wed, Sep 11, 2019 at 01:34:33PM -0300, Thiago Jung Bauermann wrote:
> The ultravisor will do an integrity check of the kernel image but we
> relocated it so the check will fail. Restore the original image by
> relocating it back to the kernel virtual base address.
> 
> This works because during build vmlinux is linked with an expected virtual
> runtime address of KERNELBASE.
> 
> Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init")
> Signed-off-by: Thiago Jung Bauermann 

Tested-by: Ram Pai 


> ---
>  arch/powerpc/include/asm/elf.h |  3 +++
>  arch/powerpc/kernel/prom_init.c| 11 +++
>  arch/powerpc/kernel/prom_init_check.sh |  3 ++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
> index 409c9bfb43d9..57c229a86f08 100644
> --- a/arch/powerpc/include/asm/elf.h
> +++ b/arch/powerpc/include/asm/elf.h
> @@ -175,4 +175,7 @@ do {  
> \
>   ARCH_DLINFO_CACHE_GEOMETRY; \
>  } while (0)
> 
> +/* Relocate the kernel image to @final_address */
> +void relocate(unsigned long final_address);
> +
>  #endif /* _ASM_POWERPC_ELF_H */
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 74f70f90eff0..44b1d404250e 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -3249,7 +3249,18 @@ static void setup_secure_guest(unsigned long kbase, 
> unsigned long fdt)
>   /* Switch to secure mode. */
>   prom_printf("Switching to secure mode.\n");
> 
> + /*
> +  * The ultravisor will do an integrity check of the kernel image but we
> +  * relocated it so the check will fail. Restore the original image by
> +  * relocating it back to the kernel virtual base address.
> +  */
> + relocate(KERNELBASE);
> +
>   ret = enter_secure_mode(kbase, fdt);
> +
> + /* Relocate the kernel again. */
> + relocate(kbase);
> +
>   if (ret != U_SUCCESS) {
>   prom_printf("Returned %d from switching to secure mode.\n", 
> ret);
>   prom_rtas_os_term("Switch to secure mode failed.\n");
> diff --git a/arch/powerpc/kernel/prom_init_check.sh 
> b/arch/powerpc/kernel/prom_init_check.sh
> index 160bef0d553d..16535ccc0fa0 100644
> --- a/arch/powerpc/kernel/prom_init_check.sh
> +++ b/arch/powerpc/kernel/prom_init_check.sh
> @@ -26,7 +26,8 @@ _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold
>  __secondary_hold_acknowledge __secondary_hold_spinloop __start
>  logo_linux_clut224 btext_prepare_BAT
>  reloc_got2 kernstart_addr memstart_addr linux_banner _stext
> -__prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
> +__prom_init_toc_start __prom_init_toc_end btext_setup_display TOC.
> +relocate"
> 
>  NM="$1"
>  OBJ="$2"

-- 
Ram Pai



RE: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted

2019-10-16 Thread Ram Pai
On Tue, Oct 15, 2019 at 09:35:01AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote:
> > From: Thiago Jung Bauermann 
> > 
> > Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must
> > be set by both device and guest driver. However, as a hack, when DMA API
> > returns physical addresses, guest driver can use the DMA API; even though
> > device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical
> > addresses.
> 
> Sorry, but this is a complete bullshit hack.  Driver must always use
> the DMA API if they do DMA, and if virtio devices use physical addresses
> that needs to be returned through the platform firmware interfaces for
> the dma setup.  If you don't do that yet (which based on previous
> informations you don't), you need to fix it, and we can then quirk
> old implementations that already are out in the field.
> 
> In other words: we finally need to fix that virtio mess and not pile
> hacks on top of hacks.

So force all virtio devices to use the DMA API, except when
VIRTIO_F_IOMMU_PLATFORM is not enabled?

Any help detailing the idea will enable us to fix this issue once and for all.

Will something like the below work? It removes the prior hacks and
always uses the DMA API, except when VIRTIO_F_IOMMU_PLATFORM is not enabled.

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4..b593d3d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -240,22 +240,10 @@ static inline bool virtqueue_use_indirect(struct 
virtqueue *_vq,
 
 static bool vring_use_dma_api(struct virtio_device *vdev)
 {
-   if (!virtio_has_iommu_quirk(vdev))
-   return true;
-
-   /* Otherwise, we are left to guess. */
-   /*
-* In theory, it's possible to have a buggy QEMU-supposed
-* emulated Q35 IOMMU and Xen enabled at the same time.  On
-* such a configuration, virtio has never worked and will
-* not work without an even larger kludge.  Instead, enable
-* the DMA API if we're a Xen guest, which at least allows
-* all of the sensible Xen configurations to work correctly.
-*/
-   if (xen_domain())
-   return true;
+   if (virtio_has_iommu_quirk(vdev))
+   return false;
 
-   return false;
+   return true;
 }
 
 size_t virtio_max_dma_size(struct virtio_device *vdev)


-- 
Ram Pai



Re: [PATCH] powerpc/pkeys: remove unused pkey_allows_readwrite

2019-09-17 Thread Ram Pai
On Tue, Sep 17, 2019 at 11:22:30AM -0400, Qian Cai wrote:
> pkey_allows_readwrite() was first introduced in the commit 5586cf61e108
> ("powerpc: introduce execute-only pkey"), but the usage was removed
> entirely in the commit a4fcc877d4e1 ("powerpc/pkeys: Preallocate
> execute-only key").
> 
> Found by the "-Wunused-function" compiler warning flag.
> 
> Fixes: a4fcc877d4e1 ("powerpc/pkeys: Preallocate execute-only key")
> Signed-off-by: Qian Cai 
> ---
>  arch/powerpc/mm/book3s64/pkeys.c | 10 --
>  1 file changed, 10 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/pkeys.c 
> b/arch/powerpc/mm/book3s64/pkeys.c
> index ae7fca40e5b3..59e0ebbd8036 100644
> --- a/arch/powerpc/mm/book3s64/pkeys.c
> +++ b/arch/powerpc/mm/book3s64/pkeys.c
> @@ -307,16 +307,6 @@ void thread_pkey_regs_init(struct thread_struct *thread)
>   write_iamr(pkey_iamr_mask);
>  }
> 
> -static inline bool pkey_allows_readwrite(int pkey)
> -{
> - int pkey_shift = pkeyshift(pkey);
> -
> - if (!is_pkey_enabled(pkey))
> - return true;
> -
> - return !(read_amr() & ((AMR_RD_BIT|AMR_WR_BIT) << pkey_shift));
> -}
> -
>  int __execute_only_pkey(struct mm_struct *mm)
>  {
>   return mm->context.execute_only_pkey;

The function was initially used by __execute_only_pkey(), but once we
changed the implementation of __execute_only_pkey(), the need for
pkey_allows_readwrite() disappeared.

Acked-by: Ram Pai 

-- 
Ram Pai



[PATCH v4 4/4] ARM: dts: aspeed: Add Mihawk BMC platform

2019-08-02 Thread Ben Pai
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an
OpenPower Power9 server.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 918 
 2 files changed, 919 insertions(+)
 create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index eb6de52c1936..cdfe0f43ffd3 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1275,6 +1275,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-lenovo-hr630.dtb \
aspeed-bmc-microsoft-olympus.dtb \
aspeed-bmc-opp-lanyang.dtb \
+   aspeed-bmc-opp-mihawk.dtb \
aspeed-bmc-opp-palmetto.dtb \
aspeed-bmc-opp-romulus.dtb \
aspeed-bmc-opp-swift.dtb \
diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
new file mode 100755
index ..bbf4a4671421
--- /dev/null
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -0,0 +1,918 @@
+/dts-v1/;
+
+#include "aspeed-g5.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mihawk BMC";
+   compatible = "ibm,mihawk-bmc", "aspeed,ast2500";
+
+
+   chosen {
+   stdout-path = 
+   bootargs = "console=ttyS4,115200 earlyprintk";
+   };
+
+   memory@8000 {
+   reg = <0x8000 0x2000>; /* address and size of 
RAM(512MB) */
+   };
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   flash_memory: region@9800 {
+   no-map;
+   reg = <0x9800 0x0400>; /* 64M */
+   };
+
+   gfx_memory: framebuffer {
+   size = <0x0100>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+
+   video_engine_memory: jpegbuffer {
+   size = <0x0200>;/* 32MM */
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+
+   air-water {
+   label = "air-water";
+   gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   checkstop {
+   label = "checkstop";
+   gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps0-presence {
+   label = "ps0-presence";
+   gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps1-presence {
+   label = "ps1-presence";
+   gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   id-button {
+   label = "id-button";
+   gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   };
+
+   gpio-keys-polled {
+   compatible = "gpio-keys-polled";
+   poll-interval = <1000>;
+
+   fan0-presence {
+   label = "fan0-presence";
+   gpios = < 9 GPIO_ACTIVE_LOW>;
+   linux,code = <9>;
+   };
+
+   fan1-presence {
+   label = "fan1-presence";
+   gpios = < 10 GPIO_ACTIVE_LOW>;
+   linux,code = <10>;
+   };
+
+   fan2-presence {
+   label = "fan2-presence";
+   gpios = < 11 GPIO_ACTIVE_LOW>;
+   linux,code = <11>;
+   };
+
+   fan3-presence {
+   label = "fan3-presence";
+   gpios = < 12 GPIO_ACTIVE_LOW>;
+   linux,code = <12>;
+   };
+
+   fan4-presence {
+   label = "fan4-presence";
+   gpios = < 13 GPIO_ACTIVE_LOW>;
+   linux,code = <13>;
+   };
+
+   fan5-presence {
+   label = "fan5-presence";
+   gpios = 

[PATCH v3 3/3] ARM: dts: aspeed: Add Mihawk BMC platform

2019-08-01 Thread Ben Pai
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an
OpenPower Power9 server.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 902 
 2 files changed, 903 insertions(+)
 create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index eb6de52c1936..cdfe0f43ffd3 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1275,6 +1275,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-lenovo-hr630.dtb \
aspeed-bmc-microsoft-olympus.dtb \
aspeed-bmc-opp-lanyang.dtb \
+   aspeed-bmc-opp-mihawk.dtb \
aspeed-bmc-opp-palmetto.dtb \
aspeed-bmc-opp-romulus.dtb \
aspeed-bmc-opp-swift.dtb \
diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
new file mode 100755
index ..ca42057c0c1f
--- /dev/null
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -0,0 +1,902 @@
+/dts-v1/;
+
+#include "aspeed-g5.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mihawk BMC";
+   compatible = "ibm,mihawk-bmc", "aspeed,ast2500";
+
+
+   chosen {
+   stdout-path = 
+   bootargs = "console=ttyS4,115200 earlyprintk";
+   };
+
+   memory@8000 {
+   reg = <0x8000 0x2000>; /* address and size of 
RAM(512MB) */
+   };
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   flash_memory: region@9800 {
+   no-map;
+   reg = <0x9800 0x0400>; /* 64M */
+   };
+
+   gfx_memory: framebuffer {
+   size = <0x0100>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+
+   video_engine_memory: jpegbuffer {
+   size = <0x0200>;/* 32MM */
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+
+   air-water {
+   label = "air-water";
+   gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   checkstop {
+   label = "checkstop";
+   gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps0-presence {
+   label = "ps0-presence";
+   gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps1-presence {
+   label = "ps1-presence";
+   gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   id-button {
+   label = "id-button";
+   gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   };
+
+   gpio-keys-polled {
+   compatible = "gpio-keys-polled";
+   poll-interval = <1000>;
+
+   fan0-presence {
+   label = "fan0-presence";
+   gpios = < 9 GPIO_ACTIVE_LOW>;
+   linux,code = <9>;
+   };
+
+   fan1-presence {
+   label = "fan1-presence";
+   gpios = < 10 GPIO_ACTIVE_LOW>;
+   linux,code = <10>;
+   };
+
+   fan2-presence {
+   label = "fan2-presence";
+   gpios = < 11 GPIO_ACTIVE_LOW>;
+   linux,code = <11>;
+   };
+
+   fan3-presence {
+   label = "fan3-presence";
+   gpios = < 12 GPIO_ACTIVE_LOW>;
+   linux,code = <12>;
+   };
+
+   fan4-presence {
+   label = "fan4-presence";
+   gpios = < 13 GPIO_ACTIVE_LOW>;
+   linux,code = <13>;
+   };
+
+   fan5-presence {
+   label = "fan5-presence";
+   gpios = 

[PATCH v2 2/2] ARM: dts: aspeed: Add Mihawk BMC platform

2019-07-31 Thread Ben Pai
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an
OpenPower Power9 server.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 907 
 2 files changed, 908 insertions(+)
 create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index eb6de52c1936..262345544359 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1281,5 +1281,6 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-opp-vesnin.dtb \
aspeed-bmc-opp-witherspoon.dtb \
aspeed-bmc-opp-zaius.dtb \
+   aspeed-bmc-opp-mihawk.dtb \
aspeed-bmc-portwell-neptune.dtb \
aspeed-bmc-quanta-q71l.dtb
diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
new file mode 100755
index ..913c94326f3f
--- /dev/null
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -0,0 +1,907 @@
+/dts-v1/;
+
+#include "aspeed-g5.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mihawk BMC";
+   compatible = "ibm,mihawk-bmc", "aspeed,ast2500";
+
+
+   chosen {
+   stdout-path = 
+   bootargs = "console=ttyS4,115200 earlyprintk";
+   };
+
+   memory@8000 {
+   reg = <0x8000 0x2000>; /* address and size of 
RAM(512MB) */
+   };
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   flash_memory: region@9800 {
+   no-map;
+   reg = <0x9800 0x0400>; /* 64M */
+   };
+
+   gfx_memory: framebuffer {
+   size = <0x0100>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+
+   video_engine_memory: jpegbuffer {
+   size = <0x0200>;/* 32MM */
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+
+   air-water {
+   label = "air-water";
+   gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   checkstop {
+   label = "checkstop";
+   gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps0-presence {
+   label = "ps0-presence";
+   gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps1-presence {
+   label = "ps1-presence";
+   gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   id-button {
+   label = "id-button";
+   gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   };
+
+   gpio-keys-polled {
+   compatible = "gpio-keys-polled";
+   #address-cells = <1>;
+   poll-interval = <1000>;
+
+   fan0-presence {
+   label = "fan0-presence";
+   gpios = < 9 GPIO_ACTIVE_LOW>;
+   linux,code = <9>;
+   };
+
+   fan1-presence {
+   label = "fan1-presence";
+   gpios = < 10 GPIO_ACTIVE_LOW>;
+   linux,code = <10>;
+   };
+
+   fan2-presence {
+   label = "fan2-presence";
+   gpios = < 11 GPIO_ACTIVE_LOW>;
+   linux,code = <11>;
+   };
+
+   fan3-presence {
+   label = "fan3-presence";
+   gpios = < 12 GPIO_ACTIVE_LOW>;
+   linux,code = <12>;
+   };
+
+   fan4-presence {
+   label = "fan4-presence";
+   gpios = < 13 GPIO_ACTIVE_LOW>;
+   linux,code = <13>;
+   };
+
+   fan5-presence {
+   label = "fan5-presence";
+   gp

[PATCH] ARM: dts: aspeed: Add Mihawk BMC platform

2019-07-30 Thread Ben Pai
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an
OpenPower Power9 server.

This adds the device tree description for most upstream components. It
is a squashed commit from the OpenBMC kernel tree.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 922 
 2 files changed, 923 insertions(+)
 create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index eb6de52c1936..262345544359 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1281,5 +1281,6 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-opp-vesnin.dtb \
aspeed-bmc-opp-witherspoon.dtb \
aspeed-bmc-opp-zaius.dtb \
+   aspeed-bmc-opp-mihawk.dtb \
aspeed-bmc-portwell-neptune.dtb \
aspeed-bmc-quanta-q71l.dtb
diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
new file mode 100755
index ..cfa20e0b2939
--- /dev/null
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -0,0 +1,922 @@
+/dts-v1/;
+
+#include "aspeed-g5.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mihawk BMC";
+   compatible = "ibm,mihawk-bmc", "aspeed,ast2500";
+
+
+   chosen {
+   stdout-path = 
+   bootargs = "console=ttyS4,115200 earlyprintk";
+   };
+
+   memory@8000 {
+   reg = <0x8000 0x2000>; /* address and size of 
RAM(512MB) */
+   };
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   flash_memory: region@9800 {
+   no-map;
+   reg = <0x9800 0x0400>; /* 64M */
+   };
+
+   gfx_memory: framebuffer {
+   size = <0x0100>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+
+   video_engine_memory: jpegbuffer {
+   size = <0x0200>;/* 32MM */
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+
+   air-water {
+   label = "air-water";
+   gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   checkstop {
+   label = "checkstop";
+   gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps0-presence {
+   label = "ps0-presence";
+   gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps1-presence {
+   label = "ps1-presence";
+   gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   id-button {
+   label = "id-button";
+   gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   };
+
+   gpio-keys-polled {
+   compatible = "gpio-keys-polled";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   poll-interval = <1000>;
+
+   fan0-presence {
+   label = "fan0-presence";
+   gpios = < 9 GPIO_ACTIVE_LOW>;
+   linux,code = <9>;
+   };
+
+   fan1-presence {
+   label = "fan1-presence";
+   gpios = < 10 GPIO_ACTIVE_LOW>;
+   linux,code = <10>;
+   };
+
+   fan2-presence {
+   label = "fan2-presence";
+   gpios = < 11 GPIO_ACTIVE_LOW>;
+   linux,code = <11>;
+   };
+
+   fan3-presence {
+   label = "fan3-presence";
+   gpios = < 12 GPIO_ACTIVE_LOW>;
+   linux,code = <12>;
+   };
+
+   fan4-presence {
+   label = "fan4-presence";
+   gpios = < 13 GPIO_ACTIVE_LOW>;
+  

[PATCH] ARM: dts: aspeed: Add Mihawk BMC platform

2019-07-17 Thread Ben Pai
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an
OpenPower Power9 server.

This adds the device tree description for most upstream components. It
is a squashed commit from the OpenBMC kernel tree.

Signed-off-by: Ben Pai 
---
 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 922 
 2 files changed, 923 insertions(+)
 create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index dab2914fa293..5368d657781e 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1274,5 +1274,6 @@ dtb-$(CONFIG_ARCH_ASPEED) += \
aspeed-bmc-opp-romulus.dtb \
aspeed-bmc-opp-witherspoon.dtb \
aspeed-bmc-opp-zaius.dtb \
+aspeed-bmc-opp-mihawk.dtb \
aspeed-bmc-portwell-neptune.dtb \
aspeed-bmc-quanta-q71l.dtb
diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts 
b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
new file mode 100755
index ..cfa20e0b2939
--- /dev/null
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts
@@ -0,0 +1,922 @@
+/dts-v1/;
+
+#include "aspeed-g5.dtsi"
+#include 
+#include 
+
+/ {
+   model = "Mihawk BMC";
+   compatible = "ibm,mihawk-bmc", "aspeed,ast2500";
+
+
+   chosen {
+   stdout-path = 
+   bootargs = "console=ttyS4,115200 earlyprintk";
+   };
+
+   memory@8000 {
+   reg = <0x8000 0x2000>; /* address and size of 
RAM(512MB) */
+   };
+
+   reserved-memory {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   ranges;
+
+   flash_memory: region@9800 {
+   no-map;
+   reg = <0x9800 0x0400>; /* 64M */
+   };
+
+   gfx_memory: framebuffer {
+   size = <0x0100>;
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+
+   video_engine_memory: jpegbuffer {
+   size = <0x0200>;/* 32M */
+   alignment = <0x0100>;
+   compatible = "shared-dma-pool";
+   reusable;
+   };
+   };
+
+   gpio-keys {
+   compatible = "gpio-keys";
+
+   air-water {
+   label = "air-water";
+   gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   checkstop {
+   label = "checkstop";
+   gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps0-presence {
+   label = "ps0-presence";
+   gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+
+   ps1-presence {
+   label = "ps1-presence";
+   gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   id-button {
+   label = "id-button";
+   gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>;
+   linux,code = ;
+   };
+   };
+
+   gpio-keys-polled {
+   compatible = "gpio-keys-polled";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   poll-interval = <1000>;
+
+   fan0-presence {
+   label = "fan0-presence";
+   gpios = < 9 GPIO_ACTIVE_LOW>;
+   linux,code = <9>;
+   };
+
+   fan1-presence {
+   label = "fan1-presence";
+   gpios = < 10 GPIO_ACTIVE_LOW>;
+   linux,code = <10>;
+   };
+
+   fan2-presence {
+   label = "fan2-presence";
+   gpios = < 11 GPIO_ACTIVE_LOW>;
+   linux,code = <11>;
+   };
+
+   fan3-presence {
+   label = "fan3-presence";
+   gpios = < 12 GPIO_ACTIVE_LOW>;
+   linux,code = <12>;
+   };
+
+   fan4-presence {
+   label = "fan4-presence";
+   gpios = < 13 GPIO_ACTIVE_LOW>;
+  

[RFC PATCH 1/1] powerpc/pseries/svm: Unshare all pages before kexecing a new kernel

2019-06-07 Thread Ram Pai
powerpc/pseries/svm: Unshare all pages before kexecing a new kernel.

A new kernel deserves a clean slate. Any pages shared with the
hypervisor are unshared before invoking the new kernel. However there are
exceptions. If the new kernel is invoked to dump the current kernel, or
if there is an explicit request to preserve the state of the current
kernel, unsharing of pages is skipped.
 
NOTE: Reserve at least 256M for crashkernel. Otherwise SWIOTLB
allocation fails and the crash kernel fails to boot.
 
Signed-off-by: Ram Pai 

diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 8a6c5b4d..c8dd470 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -31,5 +31,6 @@
 #define UV_UNSHARE_PAGE0xF134
 #define UV_PAGE_INVAL  0xF138
 #define UV_SVM_TERMINATE   0xF13C
+#define UV_UNSHARE_ALL_PAGES   0xF140
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index bf5ac05..73c44ff 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -120,6 +120,12 @@ static inline int uv_unshare_page(u64 pfn, u64 npages)
return ucall(UV_UNSHARE_PAGE, retbuf, pfn, npages);
 }
 
+static inline int uv_unshare_all_pages(void)
+{
+   unsigned long retbuf[UCALL_BUFSIZE];
+
+   return ucall(UV_UNSHARE_ALL_PAGES, retbuf);
+}
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 75692c3..a93e3ab 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -329,6 +329,13 @@ void default_machine_kexec(struct kimage *image)
 #ifdef CONFIG_PPC_PSERIES
kexec_paca.lppaca_ptr = NULL;
 #endif
+
+   if (is_svm_platform() && !(image->preserve_context ||
+  image->type == KEXEC_TYPE_CRASH)) {
+   uv_unshare_all_pages();
+   printk("kexec: Unshared all shared pages.\n");
+   }
+
paca_ptrs[kexec_paca.paca_index] = _paca;
 
setup_paca(_paca);



Re: Re: [RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper

2019-05-21 Thread Ram Pai
On Tue, May 21, 2019 at 07:13:26AM +0200, Christoph Hellwig wrote:
> On Tue, May 21, 2019 at 01:49:02AM -0300, Thiago Jung Bauermann wrote:
> > From: Benjamin Herrenschmidt 
> > 
> > For secure VMs, the signing tool will create a ticket called the "ESM blob"
> > for the Enter Secure Mode ultravisor call with the signatures of the kernel
> > and initrd among other things.
> > 
> > This adds support to the wrapper script for adding that blob via the "-e"
> > option to the zImage.pseries.
> > 
> > It also adds code to the zImage wrapper itself to retrieve and if necessary
> > relocate the blob, and pass its address to Linux via the device-tree, to be
> > later consumed by prom_init.
> 
> Where does the "BLOB" come from?  How is it licensed and how can we
> satisfy the GPL with it?

The "BLOB" is not a piece of code. It's just a piece of data that gets
generated by our build tools. This data contains the
signed hash of the kernel, initrd, and kernel command line parameters.
It also contains any information that the creator of the BLOB wants to
be made available to anyone needing it, inside the
secure-virtual-machine. All of this is integrity-protected and encrypted
to safeguard it when at rest and at runtime.
 
Bottom line -- the BLOB is data, and hence has no licensing implication. And
if for some reason even data needs to have a licensing statement, we can
make it available in a way that has no conflicts with the GPL.


-- 
Ram Pai



Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices

2018-05-24 Thread Ram Pai
On Wed, May 23, 2018 at 09:50:02PM +0300, Michael S. Tsirkin wrote:
> subj: s/virito/virtio/
> 
..snip..
> >  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > +
> > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > +{
> > +   /*
> > +* On protected guest platforms, force virtio core to use DMA
> > +* MAP API for all virtio devices. But there can also be some
> > +* exceptions for individual devices like virtio balloon.
> > +*/
> > +   return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > +}
> 
> Isn't this kind of slow?  vring_use_dma_api is on
> data path and supposed to be very fast.

Yes, it is slow and not ideal. This won't be the final code. The final
code will cache the information in some global variable and use it
in this function.

However this code was added to the RFC to illustrate the idea that dma
operation are needed only in a secure/protected environment.

RP



Re: [PATCH 9/8] powerpc/pkeys: Drop private VM_PKEY definitions

2018-05-10 Thread Ram Pai

Agree. Was going to send that the moment the other patches
landed upstream. Glad I dont have to do it :-)

Reviewed-by: Ram Pai <linux...@us.ibm.com>



On Thu, May 10, 2018 at 11:54:22PM +1000, Michael Ellerman wrote:
> Now that we've updated the generic headers to support 5 PKEY bits for
> powerpc we don't need our own #defines in arch code.
> 
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
> ---
>  arch/powerpc/include/asm/pkeys.h | 15 ---
>  1 file changed, 15 deletions(-)
> 
> One additional patch to finish cleaning things up.
> 
> I've added this to my branch.
> 
> cheers
> 
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 18ef59a9886d..5ba80cffb505 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -15,21 +15,6 @@ DECLARE_STATIC_KEY_TRUE(pkey_disabled);
>  extern int pkeys_total; /* total pkeys as per device tree */
>  extern u32 initial_allocation_mask; /* bits set for reserved keys */
> 
> -/*
> - * Define these here temporarily so we're not dependent on patching 
> linux/mm.h.
> - * Once it's updated we can drop these.
> - */
> -#ifndef VM_PKEY_BIT0
> -# define VM_PKEY_SHIFT   VM_HIGH_ARCH_BIT_0
> -# define VM_PKEY_BIT0VM_HIGH_ARCH_0
> -# define VM_PKEY_BIT1VM_HIGH_ARCH_1
> -# define VM_PKEY_BIT2VM_HIGH_ARCH_2
> -# define VM_PKEY_BIT3VM_HIGH_ARCH_3
> -# define VM_PKEY_BIT4VM_HIGH_ARCH_4
> -#elif !defined(VM_PKEY_BIT4)
> -# define VM_PKEY_BIT4VM_HIGH_ARCH_4
> -#endif
> -
>  #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
>   VM_PKEY_BIT3 | VM_PKEY_BIT4)
> 
> -- 
> 2.14.1

-- 
Ram Pai



Re: [PATCH 4/8] mm/pkeys, powerpc, x86: Provide an empty vma_pkey() in linux/pkeys.h

2018-05-08 Thread Ram Pai
On Wed, May 09, 2018 at 12:59:44AM +1000, Michael Ellerman wrote:
> Consolidate the pkey handling by providing a common empty definition
> of vma_pkey() in pkeys.h when CONFIG_ARCH_HAS_PKEYS=n.
> 
> This also removes another entanglement of pkeys.h and
> asm/mmu_context.h.
>

Reviewed-by: Ram Pai <linux...@us.ibm.com>


> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
> ---
>  arch/powerpc/include/asm/mmu_context.h | 5 -
>  arch/x86/include/asm/mmu_context.h | 5 -
>  include/linux/pkeys.h  | 5 +
>  3 files changed, 5 insertions(+), 10 deletions(-)

RP



Re: [PATCH 2/8] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey

2018-05-08 Thread Ram Pai
On Wed, May 09, 2018 at 12:59:42AM +1000, Michael Ellerman wrote:
> From: Ram Pai <linux...@us.ibm.com>
> 
> Currently only 4bits are allocated in the vma flags to hold 16
> keys. This is sufficient for x86. PowerPC  supports  32  keys,
> which needs 5bits. This patch allocates an  additional bit.
> 
> Reviewed-by: Ingo Molnar <mi...@kernel.org>
> Acked-by: Balbir Singh <bsinghar...@gmail.com>
> Signed-off-by: Ram Pai <linux...@us.ibm.com>
> [mpe: Fold in #if VM_PKEY_BIT4 as noticed by Dave Hansen]
> Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
> ---
>  fs/proc/task_mmu.c | 3 +++
>  include/linux/mm.h | 3 ++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 541392a62608..c2163606e6fb 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -679,6 +679,9 @@ static void show_smap_vma_flags(struct seq_file *m, 
> struct vm_area_struct *vma)
>   [ilog2(VM_PKEY_BIT1)]   = "",
>   [ilog2(VM_PKEY_BIT2)]   = "",
>   [ilog2(VM_PKEY_BIT3)]   = "",
> +#if VM_PKEY_BIT4
> + [ilog2(VM_PKEY_BIT4)]   = "",
> +#endif
>  #endif /* CONFIG_ARCH_HAS_PKEYS */
>   };
>   size_t i;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c6a6f2492c1b..abfd758ff83a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -231,9 +231,10 @@ extern unsigned int kobjsize(const void *objp);
>  #ifdef CONFIG_ARCH_HAS_PKEYS
>  # define VM_PKEY_SHIFT   VM_HIGH_ARCH_BIT_0
>  # define VM_PKEY_BIT0VM_HIGH_ARCH_0  /* A protection key is a 4-bit 
> value */
> -# define VM_PKEY_BIT1VM_HIGH_ARCH_1
> +# define VM_PKEY_BIT1VM_HIGH_ARCH_1  /* on x86 and 5-bit value on 
> ppc64   */
>  # define VM_PKEY_BIT2VM_HIGH_ARCH_2
>  # define VM_PKEY_BIT3VM_HIGH_ARCH_3
> +# define VM_PKEY_BIT4VM_HIGH_ARCH_4
>  #endif /* CONFIG_ARCH_HAS_PKEYS */

this has to be: 

+#if defined(CONFIG_PPC)
+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
+#else
+# define VM_PKEY_BIT4  0
+#endif



Re: [PATCH v13 3/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey

2018-05-04 Thread Ram Pai
On Fri, May 04, 2018 at 03:57:33PM -0700, Dave Hansen wrote:
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 0c9e392..3c7 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -679,6 +679,7 @@ static void show_smap_vma_flags(struct seq_file *m, 
> > struct vm_area_struct *vma)
> > [ilog2(VM_PKEY_BIT1)]   = "",
> > [ilog2(VM_PKEY_BIT2)]   = "",
> > [ilog2(VM_PKEY_BIT3)]   = "",
> > +   [ilog2(VM_PKEY_BIT4)]   = "",
> >  #endif /* CONFIG_ARCH_HAS_PKEYS */
> ...
> > +#if defined(CONFIG_PPC)
> > +# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
> > +#else 
> > +# define VM_PKEY_BIT4  0
> > +#endif
> >  #endif /* CONFIG_ARCH_HAS_PKEYS */
> 
> That new line boils down to:
> 
>   [ilog2(0)]  = "",
> 
> on x86.  It wasn't *obvious* to me that it is OK to do that.  The other
> possibly undefined bits (VM_SOFTDIRTY for instance) #ifdef themselves
> out of this array.
> 
> I would just be a wee bit worried that this would overwrite the 0 entry
> ("??") with "".

Yes it would :-( and could potentially break anything that depends on
0th entry being "??"

Is the following fix acceptable?

#if VM_PKEY_BIT4
[ilog2(VM_PKEY_BIT4)]   = "",
#endif

-- 
Ram Pai



[PATCH v11 3/3] mm, x86, powerpc: display pkey in smaps only if arch supports pkeys

2018-05-04 Thread Ram Pai
Currently the  architecture  specific code is expected to
display  the  protection  keys  in  smap  for a given vma.
This can lead to redundant code and possibly to divergent
formats in which the key gets displayed.

This  patch  changes  the implementation. It displays the
pkey only if the architecture supports pkeys, i.e.
arch_pkeys_enabled() returns true.  This patch
provides x86 implementation for arch_pkeys_enabled().

x86 arch_show_smap() function is not needed anymore.
Deleting it.

cc: Michael Ellermen <m...@ellerman.id.au>
cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
cc: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Dave Hansen <dave.han...@intel.com>
Signed-off-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com>
(fixed compilation errors for x86 configs)
Acked-by: Michal Hocko <mho...@suse.com>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 arch/powerpc/include/asm/mmu_context.h |5 -
 arch/x86/include/asm/mmu_context.h |5 -
 arch/x86/include/asm/pkeys.h   |1 +
 arch/x86/kernel/fpu/xstate.c   |5 +
 arch/x86/kernel/setup.c|8 
 fs/proc/task_mmu.c |   11 ++-
 include/linux/pkeys.h  |7 ++-
 7 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 1835ca1..896efa5 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -250,11 +250,6 @@ static inline bool arch_vma_access_permitted(struct 
vm_area_struct *vma,
 #define thread_pkey_regs_restore(new_thread, old_thread)
 #define thread_pkey_regs_init(thread)
 
-static inline int vma_pkey(struct vm_area_struct *vma)
-{
-   return 0;
-}
-
 static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
 {
return 0x0UL;
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 57e3785..3d748bd 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -296,11 +296,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
 }
-#else
-static inline int vma_pkey(struct vm_area_struct *vma)
-{
-   return 0;
-}
 #endif
 
 /*
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index a0ba1ff..f6c287b 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -6,6 +6,7 @@
 
 extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
+extern bool arch_pkeys_enabled(void);
 
 /*
  * Try to dedicate one of the protection keys to be used as an
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7..4f566e9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -945,6 +945,11 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int 
pkey,
 
return 0;
 }
+
+bool arch_pkeys_enabled(void)
+{
+   return boot_cpu_has(X86_FEATURE_OSPKE);
+}
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
 /*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6285697..960dbab 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1306,11 +1306,3 @@ static int __init register_kernel_offset_dumper(void)
return 0;
 }
 __initcall(register_kernel_offset_dumper);
-
-void arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
-{
-   if (!boot_cpu_has(X86_FEATURE_OSPKE))
-   return;
-
-   seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
-}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3c7..9ce0097 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -18,10 +18,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 #define SEQ_PUT_DEC(str, val) \
@@ -728,12 +730,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long 
hmask,
 }
 #endif /* HUGETLB_PAGE */
 
-void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
-{
-}
-
 #define SEQ_PUT_DEC(str, val) \
seq_put_decimal_ull_width(m, str, (val) >> 10, 8)
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
struct proc_maps_private *priv = m->private;
@@ -836,9 +835,11 @@ static int show_smap(struct seq_file *m, void *v, int 
is_pid)
seq_puts(m, " kB\n");
}
if (!rollup_mode) {
-   arch_show_smap(m, vma);
+   if (arch_pkeys_enabled())
+   seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
show_smap_vma_flags(m, vma);
}
+
m_cache_vma(m, vma);
return ret;
 }
diff --git a/include

[PATCH v13 0/3] mm, x86, powerpc: Enhancements to Memory Protection Keys.

2018-05-04 Thread Ram Pai
This patch series provides arch-neutral enhancements to
enable memory-keys on new architectures, and the corresponding
changes in x86 and powerpc specific code to support that.

a) Provides the ability to support up to 32 keys.  PowerPC
can handle 32 keys and hence needs this.

b) Arch-neutral code, and not the arch-specific code,
   determines the format of the string that displays the key
   for each vma in smaps.

History:
---
version 14:
(1) made VM_PKEY_BIT4 unusable on x86, #defined it to 0
-- comment by Dave Hansen
(2) due to some reason this patch series continue to
  break some or the other build. The last series
  passed everything but created a merge
  conflict followed by build failure for
  Michael Ellermen. :(

version v13:
(1) fixed a git bisect error. :(

version v12:
(1) fixed compilation errors seen with various x86
configs.
version v11:
(1) the code that displays the key in smaps is no longer
defined under CONFIG_ARCH_HAS_PKEYS.
- Comment by Eric W. Biederman and Michal Hocko
(2) merged two patches that implemented (1).
- comment by Michal Hocko

version prior to v11:
(1) used one additional bit from VM_HIGH_ARCH_*
to support 32 keys.
- Suggestion by Dave Hansen.
(2) powerpc specific changes to support memory keys.


Ram Pai (3):
  mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS
is enabled
  mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
  mm, x86, powerpc: display pkey in smaps only if arch supports pkeys

 arch/powerpc/include/asm/mmu_context.h |5 -
 arch/powerpc/include/asm/pkeys.h   |2 ++
 arch/x86/include/asm/mmu_context.h |5 -
 arch/x86/include/asm/pkeys.h   |1 +
 arch/x86/kernel/fpu/xstate.c   |5 +
 arch/x86/kernel/setup.c|8 
 fs/proc/task_mmu.c |   16 +---
 include/linux/mm.h |   15 +++
 include/linux/pkeys.h  |7 ++-
 9 files changed, 34 insertions(+), 30 deletions(-)



[PATCH v13 3/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey

2018-05-04 Thread Ram Pai
Only 4 bits are allocated in the vma flags to hold 16 keys. This is
sufficient on x86. PowerPC supports 32 keys, which needs 5 bits.
Allocate an additional bit.

cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellermen <m...@ellerman.id.au>
cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
cc: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Acked-by: Balbir Singh <bsinghar...@gmail.com>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 fs/proc/task_mmu.c |1 +
 include/linux/mm.h |8 +++-
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 0c9e392..3c7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -679,6 +679,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct 
vm_area_struct *vma)
[ilog2(VM_PKEY_BIT1)]   = "",
[ilog2(VM_PKEY_BIT2)]   = "",
[ilog2(VM_PKEY_BIT3)]   = "",
+   [ilog2(VM_PKEY_BIT4)]   = "",
 #endif /* CONFIG_ARCH_HAS_PKEYS */
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c6a6f24..cca67d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -230,10 +230,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, 
int, void __user *,
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
-# define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
+/* Protection key is a 4-bit value on x86 and 5-bit value on ppc64   */
+# define VM_PKEY_BIT0  VM_HIGH_ARCH_0
 # define VM_PKEY_BIT1  VM_HIGH_ARCH_1
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
+#if defined(CONFIG_PPC)
+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
+#else 
+# define VM_PKEY_BIT4  0
+#endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
 #if defined(CONFIG_X86)
-- 
1.7.1




[PATCH v13 1/3] mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled

2018-05-04 Thread Ram Pai
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
is enabled. Powerpc also needs these bits. Hence let's define the
VM_PKEY_BITx bits for any architecture that enables
CONFIG_ARCH_HAS_PKEYS.

cc: Michael Ellermen <m...@ellerman.id.au>
cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
cc: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Dave Hansen <dave.han...@intel.com>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h |2 ++
 fs/proc/task_mmu.c   |4 ++--
 include/linux/mm.h   |9 +
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 3a9b82b..425b181 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -26,6 +26,8 @@
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
 # define VM_PKEY_BIT4  VM_HIGH_ARCH_4
+#elif !defined(VM_PKEY_BIT4)
+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
 #endif
 
 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 65ae546..0c9e392 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -673,13 +673,13 @@ static void show_smap_vma_flags(struct seq_file *m, 
struct vm_area_struct *vma)
[ilog2(VM_MERGEABLE)]   = "mg",
[ilog2(VM_UFFD_MISSING)]= "um",
[ilog2(VM_UFFD_WP)] = "uw",
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_ARCH_HAS_PKEYS
/* These come out via ProtectionKey: */
[ilog2(VM_PKEY_BIT0)]   = "",
[ilog2(VM_PKEY_BIT1)]   = "",
[ilog2(VM_PKEY_BIT2)]   = "",
[ilog2(VM_PKEY_BIT3)]   = "",
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
};
size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ac1f06..c6a6f24 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -228,15 +228,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, 
int, void __user *,
 #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
-#if defined(CONFIG_X86)
-# define VM_PATVM_ARCH_1   /* PAT reserves whole VMA at 
once (x86) */
-#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
+#ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
 # define VM_PKEY_BIT1  VM_HIGH_ARCH_1
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#if defined(CONFIG_X86)
+# define VM_PATVM_ARCH_1   /* PAT reserves whole VMA at 
once (x86) */
 #elif defined(CONFIG_PPC)
 # define VM_SAOVM_ARCH_1   /* Strong Access Ordering 
(powerpc) */
 #elif defined(CONFIG_PARISC)
-- 
1.7.1





Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU

2018-05-01 Thread Ram Pai
On Wed, Apr 18, 2018 at 07:20:10PM +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 18, 2018 at 08:47:10AM +0530, Anshuman Khandual wrote:
> > On 04/15/2018 05:41 PM, Christoph Hellwig wrote:
> > > On Fri, Apr 06, 2018 at 06:37:18PM +1000, Benjamin Herrenschmidt wrote:
> > >>>> implemented as DMA API which the virtio core understands. There is no
> > >>>> need for an IOMMU to be involved for the device representation in this
> > >>>> case IMHO.
> > >>>
> > >>> This whole virtio translation issue is a mess.  I think we need to
> > >>> switch it to the dma API, and then quirk the legacy case to always
> > >>> use the direct mapping inside the dma API.
> > >>
> > >> Fine with using a dma API always on the Linux side, but we do want to
> > >> special case virtio still at the arch and qemu side to have a "direct
> > >> mapping" mode. Not sure how (special flags on PCI devices) to avoid
> > >> actually going through an emulated IOMMU on the qemu side, because that
> > >> slows things down, esp. with vhost.
> > >>
> > >> IE, we can't I think just treat it the same as a physical device.
> > > 
> > > We should have treated it like a physical device from the start, but
> > > that device has unfortunately sailed.
> > > 
> > > But yes, we'll need a per-device quirk that says 'don't attach an
> > > iommu'.
> > 
> > How about doing it per platform basis as suggested in this RFC through
> > an arch specific callback. Because all the virtio devices in the given
> > platform would require and exercise this option (to avail bounce buffer
> > mechanism for secure guests as an example). So the flag basically is a
> > platform specific one not a device specific one.
> 
> That's not the case. A single platform can have a mix of virtio and
> non-virtio devices. Same applies even within virtio, e.g. the balloon
> device always bypasses an iommu.  Further, QEMU supports out of process
> devices some of which might bypass the IOMMU.

Given that each virtio device has to behave differently depending on
(a) what it does (balloon, block, net, etc.),
(b) what platform it is on (pseries, x86, ...), and
(c) what environment it is in (secure, insecure, ...),

I think we should let the virtio device decide what it wants, instead
of forcing it to NOT use dma_ops when VIRTIO_F_IOMMU_PLATFORM is NOT
enabled.

Currently, the generic virtio code assumes that a device must NOT use
DMA operations if the hypervisor has NOT enabled VIRTIO_F_IOMMU_PLATFORM.
This assumption is baked into vring_use_dma_api(), though there is a
special exception for xen_domain().

This assumption is restricting us from using the dma_ops abstraction for
virtio devices on secure VM. BTW: VIRTIO_F_IOMMU_PLATFORM may or may not
be set on this platform.

On our secure VM, virtio devices, by default, do not share pages with
the hypervisor. In other words, the hypervisor cannot access the secure
VM's pages. The secure VM, with the help of the hardware, enables some
pages to be shared with the hypervisor.  The secure VM then uses these
pages to bounce virtio data to and from the hypervisor.

One elegant way to implement this functionality is to abstract it
under our special dma_ops and wire it to the virtio devices.

However, the restriction imposed by the generic virtio code constrains
us from doing so.

If we can enrich vring_use_dma_api() to take multiple factors into
consideration and not just VIRTIO_F_IOMMU_PLATFORM, preferably by
consulting an arch-dependent function, we could seamlessly integrate
into the existing virtio infrastructure.

RP
> 
> -- 
> MST

-- 
Ram Pai





Re: [PATCH 0/9] [v3] x86, pkeys: two protection keys bug fixes

2018-04-30 Thread Ram Pai
On Mon, Apr 30, 2018 at 08:30:43AM -0700, Dave Hansen wrote:
> On 04/28/2018 12:05 AM, Ingo Molnar wrote:
> > In the above kernel that was missing the PROT_EXEC fix I was repeatedly 
> > running 
> > the 64-bit and 32-bit testcases as non-root and as root as well, until I 
> > got a 
> > hang in the middle of a 32-bit test running as root:
> > 
> >   test  7 PASSED (iteration 19)
> >   test  8 PASSED (iteration 19)
> >   test  9 PASSED (iteration 19)
> > 
> >   < test just hangs here >
> 
> For the hang, there is a known issue with the use of printf() in the
> signal handler and a resulting deadlock.  I *thought* there was a patch
> merged to fix this from Ram Pai or one of the other IBM folks.

Yes, there is a patch. Unfortunately, that patch assumes the selftest
has been moved into the selftests/vm directory.  One option is I merge
your changes into my selftest patchset, and send the entire series for
upstream merge.

Or you can manually massage in the specific fix.
The patch is "selftests/vm: Fix deadlock in protection_keys.c"
https://patchwork.ozlabs.org/patch/864394/

Let me know,
-- 
Ram Pai





Re: [PATCH 4/9] x86, pkeys: override pkey when moving away from PROT_EXEC

2018-04-30 Thread Ram Pai
On Thu, Apr 26, 2018 at 10:57:31AM -0700, Dave Hansen wrote:
> On 04/06/2018 06:09 PM, Ram Pai wrote:
> > Well :). my point is add this code and delete the other
> > code that you add later in that function.
> 
> I don't think I'm understanding what your suggestion was.  I looked at
> the code and I honestly do not think I can remove any of it.
> 
> For the plain (non-explicit pkey_mprotect()) case, there are exactly
> four paths through __arch_override_mprotect_pkey(), resulting in three
> different results.
> 
> 1. New prot==PROT_EXEC, no pkey-exec support -> do not override
> 2. New prot!=PROT_EXEC, old VMA not PROT_EXEC-> do not override
> 3. New prot==PROT_EXEC, w/ pkey-exec support -> override to exec pkey
> 4. New prot!=PROT_EXEC, old VMA is PROT_EXEC -> override to default
> 
> I don't see any redundancy there, or any code that we can eliminate or
> simplify.  It was simpler before, but that's what where bug was.

Your code is fine.  But then the following code accomplishes the same
outcome, arguably with a one-line change. It's not a big deal; just
trying to clarify my comment.

int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int 
pkey)
{
/*
 * Is this an mprotect_pkey() call?  If so, never
 * override the value that came from the user.
 */
if (pkey != -1)
return pkey;
/*
 * Look for a protection-key-driven execute-only mapping
 * which is now being given permissions that are not
 * execute-only.  Move it back to the default pkey.
 */
if (vma_is_pkey_exec_only(vma) && (prot != PROT_EXEC)) /* <-- one-line change */
return ARCH_DEFAULT_PKEY;

/*
 * The mapping is execute-only.  Go try to get the
 * execute-only protection key.  If we fail to do that,
 * fall through as if we do not have execute-only
 * support.
 */
if (prot == PROT_EXEC) {
pkey = execute_only_pkey(vma->vm_mm);
if (pkey > 0)
return pkey;
}
/*
 * This is a vanilla, non-pkey mprotect (or we failed to
 * setup execute-only), inherit the pkey from the VMA we
 * are working on.
 */
return vma_pkey(vma);
}

-- 
Ram Pai





Re: [PATCH v2] resource: Fix integer overflow at reallocation

2018-04-11 Thread Ram Pai
On Wed, Apr 11, 2018 at 08:16:33AM +0200, Takashi Iwai wrote:
> On Wed, 11 Apr 2018 02:37:44 +0200,
> Ram Pai wrote:
> > 
> > On Tue, Apr 10, 2018 at 01:42:39PM -0700, Andrew Morton wrote:
> > > On Tue, 10 Apr 2018 06:54:11 +0200 Takashi Iwai <ti...@suse.de> wrote:
> > > 
> > > > On Tue, 10 Apr 2018 02:23:26 +0200,
> > > > Andrew Morton wrote:
> > > > > 
> > > > > On Sun,  8 Apr 2018 09:20:26 +0200 Takashi Iwai <ti...@suse.de> wrote:
> > > > > 
> > > > > > We've got a bug report indicating a kernel panic at booting on an
> > > > > > x86-32 system, and it turned out to be the invalid resource assigned
> > > > > > after PCI resource reallocation.  __find_resource() first aligns the
> > > > > > resource start address and resets the end address with start+size-1
> > > > > > accordingly, then checks whether it's contained.  Here the end 
> > > > > > address
> > > > > > may overflow the integer, although resource_contains() still returns
> > > > > > true because the function validates only start and end address.  So
> > > > > > this ends up with returning an invalid resource (start > end).
> > > > > > 
> > > > > > There was already an attempt to cover such a problem in the commit
> > > > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), 
> > > > > > but
> > > > > > this case is an overseen one.
> > > > > > 
> > > > > > This patch adds the validity check in resource_contains() to see
> > > > > > whether the given resource has a valid range for avoiding the 
> > > > > > integer
> > > > > > overflow problem.
> > > > > > 
> > > > > > ...
> > > > > >
> > > > > > --- a/include/linux/ioport.h
> > > > > > +++ b/include/linux/ioport.h
> > > > > > @@ -212,6 +212,9 @@ static inline bool resource_contains(struct 
> > > > > > resource *r1, struct resource *r2)
> > > > > > return false;
> > > > > > if (r1->flags & IORESOURCE_UNSET || r2->flags & 
> > > > > > IORESOURCE_UNSET)
> > > > > > return false;
> > > > > > +   /* sanity check whether it's a valid resource range */
> > > > > > +   if (r2->end < r2->start)
> > > > > > +   return false;
> > > > > > return r1->start <= r2->start && r1->end >= r2->end;
> > > > > >  }
> > > > > 
> > > > > This doesn't look like the correct place to handle this?  Clearly .end
> > > > > < .start is an invalid state for a resource and we should never have
> > > > > constructed such a thing in the first place?  So adding a check at the
> > > > > place where this resource was initially created seems to be the 
> > > > > correct
> > > > > fix?
> > > > 
> > > > Yes, that was also my first thought and actually the v1 patch was like
> > > > that.
> > > 
> > > Yes, I do prefer.
> > > 
> > > >  The v2 one was by Ram's suggestion so that we can cover
> > > > potential bugs by all other callers as well.
> > > 
> > > That could be done as a separate thing?
> > 
The first approach will fix overflows in just that particular case. The
second approach will catch and error out overflows anywhere. There is a
short-term downside to the second approach; it might cause a slew of
error reports but will eventually help clean up all bad behavior.
> 
> For that purpose, maybe we should do in two folds: at first fix this
> specific issue in __find_resource(), then put the sanity check in
> resource_contains() in addition but with WARN_ON() so that we can
> catch more obviously.

Yes, WARN_ON() is a better solution.

Shall we do it the v1 way for this bug and replace the check in
resource_contains() with a WARN_ON() in a separate patch?

RP



Re: [PATCH v2] resource: Fix integer overflow at reallocation

2018-04-11 Thread Ram Pai
On Wed, Apr 11, 2018 at 08:16:33AM +0200, Takashi Iwai wrote:
> On Wed, 11 Apr 2018 02:37:44 +0200,
> Ram Pai wrote:
> > 
> > On Tue, Apr 10, 2018 at 01:42:39PM -0700, Andrew Morton wrote:
> > > On Tue, 10 Apr 2018 06:54:11 +0200 Takashi Iwai  wrote:
> > > 
> > > > On Tue, 10 Apr 2018 02:23:26 +0200,
> > > > Andrew Morton wrote:
> > > > > 
> > > > > On Sun,  8 Apr 2018 09:20:26 +0200 Takashi Iwai  wrote:
> > > > > 
> > > > > > We've got a bug report indicating a kernel panic at booting on an
> > > > > > x86-32 system, and it turned out to be the invalid resource assigned
> > > > > > after PCI resource reallocation.  __find_resource() first aligns the
> > > > > > resource start address and resets the end address with start+size-1
> > > > > > accordingly, then checks whether it's contained.  Here the end 
> > > > > > address
> > > > > > may overflow the integer, although resource_contains() still returns
> > > > > > true because the function validates only start and end address.  So
> > > > > > this ends up with returning an invalid resource (start > end).
> > > > > > 
> > > > > > There was already an attempt to cover such a problem in the commit
> > > > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), 
> > > > > > but
> > > > > > this case is an overseen one.
> > > > > > 
> > > > > > This patch adds the validity check in resource_contains() to see
> > > > > > whether the given resource has a valid range for avoiding the 
> > > > > > integer
> > > > > > overflow problem.
> > > > > > 
> > > > > > ...
> > > > > >
> > > > > > --- a/include/linux/ioport.h
> > > > > > +++ b/include/linux/ioport.h
> > > > > > @@ -212,6 +212,9 @@ static inline bool resource_contains(struct 
> > > > > > resource *r1, struct resource *r2)
> > > > > > return false;
> > > > > > if (r1->flags & IORESOURCE_UNSET || r2->flags & 
> > > > > > IORESOURCE_UNSET)
> > > > > > return false;
> > > > > > +   /* sanity check whether it's a valid resource range */
> > > > > > +   if (r2->end < r2->start)
> > > > > > +   return false;
> > > > > > return r1->start <= r2->start && r1->end >= r2->end;
> > > > > >  }
> > > > > 
> > > > > This doesn't look like the correct place to handle this?  Clearly .end
> > > > > < .start is an invalid state for a resource and we should never have
> > > > > constructed such a thing in the first place?  So adding a check at the
> > > > > place where this resource was initially created seems to be the 
> > > > > correct
> > > > > fix?
> > > > 
> > > > Yes, that was also my first thought and actually the v1 patch was like
> > > > that.
> > > 
> > > Yes, I do prefer.
> > > 
> > > >  The v2 one was by Ram's suggestion so that we can cover
> > > > potential bugs by all other callers as well.
> > > 
> > > That could be done as a separate thing?
> > 
> > the first approach will fix overflows in just that particular case. The
> > second approach will catch and error-out overflows anywhere. There is a
> > short-term down side to the second approach; it might cause a slew of
> > error reports but will eventually help clean up all bad behavior.
> 
> For that purpose, maybe we should do in two folds: at first fix this
> specific issue in __find_resource(), then put the sanity check in
> resource_contains() in addition but with WARN_ON() so that we can
> catch more obviously.

Yes, WARN_ON() is a better solution.

Do the v1 way for this bug and replace the check in
resource_contains() with a WARN_ON() in a separate patch?

RP
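
The two-step plan above can be modelled outside the kernel. This is only an illustrative sketch: `struct res` is a simplified stand-in for the kernel's `struct resource`, and `WARN_ON` here is a minimal imitation of the kernel macro, not the real one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t resource_size_t;
struct res { resource_size_t start, end; };

/* Minimal stand-in for the kernel's WARN_ON(): report the condition
 * to stderr and evaluate to it, so it can be used inside an if (). */
#define WARN_ON(cond) \
	((cond) ? (fprintf(stderr, "WARN_ON: %s\n", #cond), true) : false)

/* Step 2 of the plan: the containment test rejects a wrapped range
 * loudly instead of silently treating it as contained. */
static bool res_contains(const struct res *r1, const struct res *r2)
{
	if (WARN_ON(r2->end < r2->start))
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}

/* Step 1 of the plan: the allocation site checks for overflow itself
 * (the v1 fix), so the warning only fires for callers that slipped
 * through with an invalid range. */
static bool candidate_fits(resource_size_t start, resource_size_t size,
			   const struct res *window)
{
	struct res c = { start, start + size - 1 };	/* end may wrap */
	if (c.end < c.start)
		return false;
	return res_contains(window, &c);
}
```

With this split, normal allocation failures stay quiet while a wrapped range passed directly to the containment test produces a diagnostic.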



Re: [PATCH v2] resource: Fix integer overflow at reallocation

2018-04-10 Thread Ram Pai
On Tue, Apr 10, 2018 at 01:42:39PM -0700, Andrew Morton wrote:
> On Tue, 10 Apr 2018 06:54:11 +0200 Takashi Iwai  wrote:
> 
> > On Tue, 10 Apr 2018 02:23:26 +0200,
> > Andrew Morton wrote:
> > > 
> > > On Sun,  8 Apr 2018 09:20:26 +0200 Takashi Iwai  wrote:
> > > 
> > > > We've got a bug report indicating a kernel panic at booting on an
> > > > x86-32 system, and it turned out to be the invalid resource assigned
> > > > after PCI resource reallocation.  __find_resource() first aligns the
> > > > resource start address and resets the end address with start+size-1
> > > > accordingly, then checks whether it's contained.  Here the end address
> > > > may overflow the integer, although resource_contains() still returns
> > > > true because the function validates only start and end address.  So
> > > > this ends up with returning an invalid resource (start > end).
> > > > 
> > > > There was already an attempt to cover such a problem in the commit
> > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but
> > > > this case is an overseen one.
> > > > 
> > > > This patch adds the validity check in resource_contains() to see
> > > > whether the given resource has a valid range for avoiding the integer
> > > > overflow problem.
> > > > 
> > > > ...
> > > >
> > > > --- a/include/linux/ioport.h
> > > > +++ b/include/linux/ioport.h
> > > > @@ -212,6 +212,9 @@ static inline bool resource_contains(struct 
> > > > resource *r1, struct resource *r2)
> > > > return false;
> > > > if (r1->flags & IORESOURCE_UNSET || r2->flags & 
> > > > IORESOURCE_UNSET)
> > > > return false;
> > > > +   /* sanity check whether it's a valid resource range */
> > > > +   if (r2->end < r2->start)
> > > > +   return false;
> > > > return r1->start <= r2->start && r1->end >= r2->end;
> > > >  }
> > > 
> > > This doesn't look like the correct place to handle this?  Clearly .end
> > > < .start is an invalid state for a resource and we should never have
> > > constructed such a thing in the first place?  So adding a check at the
> > > place where this resource was initially created seems to be the correct
> > > fix?
> > 
> > Yes, that was also my first thought and actually the v1 patch was like
> > that.
> 
> Yes, I do prefer.
> 
> >  The v2 one was by Ram's suggestion so that we can cover
> > potential bugs by all other callers as well.
> 
> That could be done as a separate thing?

the first approach will fix overflows in just that particular case. The
second approach will catch and error-out overflows anywhere. There is a
short-term down side to the second approach; it might cause a slew of
error reports but will eventually help clean up all bad behavior.

RP
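
The failure mode described in this thread is easy to reproduce in isolation. The sketch below is illustrative only: the types and helpers are simplified stand-ins, not the kernel's `struct resource` or `__find_resource()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t resource_size_t;
struct res { resource_size_t start, end; };

/* Unpatched containment test: it validates only the two endpoints
 * against the window, so a wrapped candidate (start > end) can
 * still pass. */
static bool contains_unpatched(const struct res *r1, const struct res *r2)
{
	return r1->start <= r2->start && r1->end >= r2->end;
}

/* Patched version: a range whose end wrapped below its start is
 * never considered contained. */
static bool contains_patched(const struct res *r1, const struct res *r2)
{
	if (r2->end < r2->start)
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}

/* Mimics the buggy step: after aligning the start address upward,
 * recompute end = start + size - 1, which can overflow. */
static struct res make_candidate(resource_size_t aligned_start,
				 resource_size_t size)
{
	struct res r = { aligned_start, aligned_start + size - 1 };
	return r;
}
```

A candidate near the top of the address space wraps, yet the unpatched test still reports it as contained in the full window, which is exactly how the invalid resource (start > end) escaped.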



Re: [PATCH 4/9] x86, pkeys: override pkey when moving away from PROT_EXEC

2018-04-06 Thread Ram Pai
On Fri, Apr 06, 2018 at 05:47:29PM -0700, Dave Hansen wrote:
> On 04/06/2018 05:09 PM, Ram Pai wrote:
> >> -  /*
> >> -   * Look for a protection-key-drive execute-only mapping
> >> -   * which is now being given permissions that are not
> >> -   * execute-only.  Move it back to the default pkey.
> >> -   */
> >> -  if (vma_is_pkey_exec_only(vma) &&
> >> -  (prot & (PROT_READ|PROT_WRITE))) {
> >> -  return 0;
> >> -  }
> >> +
> > Dave,
> > this can be simply:
> > 
> > if (vma_is_pkey_exec_only(vma) && (prot != PROT_EXEC))
> > return ARCH_DEFAULT_PKEY;
> 
> Yes, but we're removing that code entirely. :)

Well :). My point is: add this code and delete the other
code that you add later in that function.

RP



-- 
Ram Pai



Re: [PATCH 4/9] x86, pkeys: override pkey when moving away from PROT_EXEC

2018-04-06 Thread Ram Pai
On Mon, Mar 26, 2018 at 10:27:27AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.han...@linux.intel.com>
> 
> I got a bug report that the following code (roughly) was
> causing a SIGSEGV:
> 
>   mprotect(ptr, size, PROT_EXEC);
>   mprotect(ptr, size, PROT_NONE);
>   mprotect(ptr, size, PROT_READ);
>   *ptr = 100;
> 
> The problem is hit when the mprotect(PROT_EXEC)
> is implicitly assigned a protection key to the VMA, and made
> that key ACCESS_DENY|WRITE_DENY.  The PROT_NONE mprotect()
> failed to remove the protection key, and the PROT_NONE->
> PROT_READ left the PTE usable, but the pkey still in place
> and left the memory inaccessible.
> 
> To fix this, we ensure that we always "override" the pkey
> at mprotect() if the VMA does not have execute-only
> permissions, but the VMA has the execute-only pkey.
> 
> We had a check for PROT_READ/WRITE, but it did not work
> for PROT_NONE.  This entirely removes the PROT_* checks,
> which ensures that PROT_NONE now works.
> 
> Reported-by: Shakeel Butt <shake...@google.com>
> 
> Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
> Fixes: 62b5f7d013f ("mm/core, x86/mm/pkeys: Add execute-only protection keys 
> support")
> Cc: sta...@kernel.org
> Cc: Ram Pai <linux...@us.ibm.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Dave Hansen <dave.han...@intel.com>
> Cc: Michael Ellermen <m...@ellerman.id.au>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Shuah Khan <sh...@kernel.org>
> ---
> 
>  b/arch/x86/include/asm/pkeys.h |   12 +++-
>  b/arch/x86/mm/pkeys.c  |   19 ++-
>  2 files changed, 21 insertions(+), 10 deletions(-)
> 
> diff -puN 
> arch/x86/include/asm/pkeys.h~pkeys-abandon-exec-only-pkey-more-aggressively 
> arch/x86/include/asm/pkeys.h
> --- 
> a/arch/x86/include/asm/pkeys.h~pkeys-abandon-exec-only-pkey-more-aggressively 
> 2018-03-26 10:22:35.380170193 -0700
> +++ b/arch/x86/include/asm/pkeys.h2018-03-26 10:22:35.385170193 -0700
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_X86_PKEYS_H
>  #define _ASM_X86_PKEYS_H
> 
> +#define ARCH_DEFAULT_PKEY	0
> +
>  #define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? 16 : 1)
> 
>  extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
> @@ -15,7 +17,7 @@ extern int __execute_only_pkey(struct mm
>  static inline int execute_only_pkey(struct mm_struct *mm)
>  {
>   if (!boot_cpu_has(X86_FEATURE_OSPKE))
> - return 0;
> + return ARCH_DEFAULT_PKEY;
> 
>   return __execute_only_pkey(mm);
>  }
> @@ -56,6 +58,14 @@ bool mm_pkey_is_allocated(struct mm_stru
>   return false;
>   if (pkey >= arch_max_pkey())
>   return false;
> + /*
> +  * The exec-only pkey is set in the allocation map, but
> +  * is not available to any of the user interfaces like
> +  * mprotect_pkey().
> +  */
> + if (pkey == mm->context.execute_only_pkey)
> + return false;
> +
>   return mm_pkey_allocation_map(mm) & (1U << pkey);
>  }
> 
> diff -puN arch/x86/mm/pkeys.c~pkeys-abandon-exec-only-pkey-more-aggressively 
> arch/x86/mm/pkeys.c
> --- a/arch/x86/mm/pkeys.c~pkeys-abandon-exec-only-pkey-more-aggressively  
> 2018-03-26 10:22:35.381170193 -0700
> +++ b/arch/x86/mm/pkeys.c 2018-03-26 10:22:35.385170193 -0700
> @@ -94,15 +94,7 @@ int __arch_override_mprotect_pkey(struct
>*/
>   if (pkey != -1)
>   return pkey;
> - /*
> -  * Look for a protection-key-drive execute-only mapping
> -  * which is now being given permissions that are not
> -  * execute-only.  Move it back to the default pkey.
> -  */
> - if (vma_is_pkey_exec_only(vma) &&
> - (prot & (PROT_READ|PROT_WRITE))) {
> - return 0;
> - }
> +

Dave,
this can be simply:

if (vma_is_pkey_exec_only(vma) && (prot != PROT_EXEC))
return ARCH_DEFAULT_PKEY;

No?
RP
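
The one-liner suggested above can be exercised in a stand-alone model. Everything here (the `vma` struct, the flag values) is a simplified stand-in for illustration, not the kernel's actual data structures:

```c
#include <assert.h>
#include <stdbool.h>

#define PROT_NONE	0x0
#define PROT_READ	0x1
#define PROT_WRITE	0x2
#define PROT_EXEC	0x4
#define ARCH_DEFAULT_PKEY 0

struct vma { int pkey; bool exec_only; };

static bool vma_is_pkey_exec_only(const struct vma *v)
{
	return v->exec_only;
}

/* Any protection other than pure PROT_EXEC, including PROT_NONE
 * (which the old (prot & (PROT_READ|PROT_WRITE)) test missed),
 * moves an exec-only mapping back to the default pkey. */
static int override_mprotect_pkey(const struct vma *v, int prot)
{
	if (vma_is_pkey_exec_only(v) && prot != PROT_EXEC)
		return ARCH_DEFAULT_PKEY;
	return v->pkey;
}
```

The PROT_NONE case is the one the original check got wrong: PROT_NONE has neither PROT_READ nor PROT_WRITE set, so the old mask test left the exec-only pkey in place.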



Re: [PATCH v2] powerpc, pkey: make protection key 0 less special

2018-04-06 Thread Ram Pai
On Wed, Apr 04, 2018 at 06:41:01PM -0300, Thiago Jung Bauermann wrote:
> 
> Hello Ram,
> 
> Ram Pai  writes:
> 
> > Applications need the ability to associate an address-range with some
> > key and latter revert to its initial default key. Pkey-0 comes close to
> > providing this function but falls short, because the current
> > implementation disallows applications to explicitly associate pkey-0 to
> > the address range.
> >
> > Lets make pkey-0 less special and treat it almost like any other key.
> > Thus it can be explicitly associated with any address range, and can be
> > freed. This gives the application more flexibility and power.  The
> > ability to free pkey-0 must be used responsibily, since pkey-0 is
> > associated with almost all address-range by default.
> >
> > Even with this change pkey-0 continues to be slightly more special
> > from the following point of view.
> > (a) it is implicitly allocated.
> > (b) it is the default key assigned to any address-range.
> 
> It's also special in more ways (and if intentional, these should be part
> of the commit message as well):
> 
> (c) it's not possible to change permissions for key 0
> 
>   This has two causes: this patch explicitly forbids it in
>   arch_set_user_pkey_access(), and also because even if it's allocated,
>   the bits for key 0 in AMOR and UAMOR aren't set.

Yes. Will have to capture that one as well.

we cannot let userspace change permissions on key 0 because
doing so will hurt the kernel too. Unlike x86 where keys are effective
only in userspace, powerpc keys are effective even in the kernel.
So if the kernel tries to access something, it will get stuck forever.
Almost everything in the kernel is associated with key-0.

I ran a small test program which disabled access on key 0 from
userspace, and as expected ran into softlockups. It certainly
can lead to a denial-of-service attack. We can let apps
shoot themselves in the foot, but if the shot hurts someone else, we
will have to stop it.

The key semantics discussed with the x86 folks did not
explicitly say anything about changing permissions on key-0. We will
have to keep that part of the semantics open-ended.

> 
> (d) it can be freed, but can't be allocated again later.
> 
>   This is because mm_pkey_alloc() only calls __arch_activate_pkey(ret)
>   if ret > 0.
> 
> It looks like (d) is a bug. Either mm_pkey_free() should fail with key
> 0, or mm_pkey_alloc() should work with it.

Well, it can be allocated, just that we do not let userspace change the
permissions on the key.  __arch_activate_pkey(ret) does not get called
for pkey-0.


> 
> (c) could be a measure to prevent users from shooting themselves in
> their feet. But if that is the case, then mm_pkey_free() should forbid
> freeing key 0 too.
> 
> > Tested on powerpc.
> >
> > cc: Thomas Gleixner 
> > cc: Dave Hansen 
> > cc: Michael Ellermen 
> > cc: Ingo Molnar 
> > cc: Andrew Morton 
> > Signed-off-by: Ram Pai 
> > ---
> > History:
> > v2: mm_pkey_is_allocated() continued to treat pkey-0 special.
> > fixed it.
> >
> >  arch/powerpc/include/asm/pkeys.h | 20 
> >  arch/powerpc/mm/pkeys.c  | 20 
> >  2 files changed, 28 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> > b/arch/powerpc/include/asm/pkeys.h
> > index 0409c80..b598fa9 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -101,10 +101,14 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
> >
> >  static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
> >  {
> > -   /* A reserved key is never considered as 'explicitly allocated' */
> > -   return ((pkey < arch_max_pkey()) &&
> > -   !__mm_pkey_is_reserved(pkey) &&
> > -   __mm_pkey_is_allocated(mm, pkey));
> > +   if (pkey < 0 || pkey >= arch_max_pkey())
> > +   return false;
> > +
> > +   /* Reserved keys are never allocated. */
> > +   if (__mm_pkey_is_reserved(pkey))
> > +   return false;
> > +
> > +   return __mm_pkey_is_allocated(mm, pkey);
> >  }
> >
> >  extern void __arch_activate_pkey(int pkey);
> > @@ -200,6 +204,14 @@ static inline int arch_set_user_pkey_access(struct 
> > task_struct *tsk, int pkey,
> >  {
> > if (static_branch_likely(&pkey_disabled))
> > return -EINVAL;
> > +
> > +   /*
> > +* userspace is discouraged from changing permissions of
> > +* pkey-0.
> 
> They're not discouraged. They're
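
The rewritten mm_pkey_is_allocated() logic in this patch can be checked in isolation. The reserved-key mask and allocation bitmap below are simplified, illustrative stand-ins for the kernel's versions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARCH_MAX_PKEY 32
/* Say keys 1 and 2 are reserved (illustrative values only). */
#define RESERVED_PKEY_MASK ((1u << 1) | (1u << 2))

struct mm { uint32_t pkey_alloc_map; };

static bool pkey_is_reserved(int pkey)
{
	return RESERVED_PKEY_MASK & (1u << pkey);
}

/* Mirrors the patched check: range test first, then reserved keys,
 * then the per-mm allocation bitmap. There is no longer a special
 * case excluding pkey 0, which is the point of the patch. */
static bool mm_pkey_is_allocated(const struct mm *mm, int pkey)
{
	if (pkey < 0 || pkey >= ARCH_MAX_PKEY)
		return false;
	if (pkey_is_reserved(pkey))
		return false;
	return mm->pkey_alloc_map & (1u << pkey);
}
```

With pkey-0 implicitly allocated in the bitmap, the function now reports it as allocated just like any other key, while reserved and out-of-range keys are still rejected.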

Re: [PATCH] resource: Fix integer overflow at reallocation

2018-04-05 Thread Ram Pai
On Thu, Apr 05, 2018 at 04:40:15PM +0200, Takashi Iwai wrote:
> On Mon, 02 Apr 2018 22:35:04 +0200,
> Takashi Iwai wrote:
> > 
> > On Mon, 02 Apr 2018 21:09:03 +0200,
> > Ram Pai wrote:
> > > 
> > > On Mon, Apr 02, 2018 at 09:16:16AM +0200, Takashi Iwai wrote:
> > > > We've got a bug report indicating a kernel panic at booting on an
> > > > x86-32 system, and it turned out to be the invalid resource assigned
> > > > after reallocation.  __find_resource() first aligns the resource start
> > > > address and resets the end address with start+size-1 accordingly, then
> > > > checks whether it's contained.  Here the end address may overflow the
> > > > integer, although resource_contains() still returns true because the
> > > > function validates only start and end address.  So this ends up with
> > > > returning an invalid resource (start > end).
> > > > 
> > > > There was already an attempt to cover such a problem in the commit
> > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but
> > > > this case is an overseen one.
> > > > 
> > > > This patch adds the validity check of the newly calculated resource
> > > > for avoiding the integer overflow problem.
> > > 
> > > Should we move this check "alloc.start <= alloc.end" into 
> > > resource_contains()?
> > > Doing so will catch all uses of such erroneous (overflowing) resources.
> > 
> > I thought of it, too.  OTOH, it's rather a validity check and doesn't
> > belong to resource_contains() semantics.  If the function returns
> > false, you don't know whether it's an invalid resource or it's really
> > not contained.
> > 
> > We may return an error code, but I'm not sure whether such an API
> > change is needed.  Maybe not.
> 
> FWIW, below is the revised one to move the check into
> resource_contains().
> 
> My concern is, however, that the resource validity check isn't a job
> of resource_contains().  OTOH, this may avoid other similar cases, so
> it might be worth.
> 
> In anyway, if there is no objection, and anyone else doesn't want to
> take, I'll forward this to Andrew as a poor orphan kid.

I will stand by you as a poor-orphan-buddy. :-) 

Reviewed-by: Ram Pai <linux...@us.ibm.com>

RP

> 
> 
> thanks,
> 
> Takashi
> 
> -- 8< --
> From: Takashi Iwai <ti...@suse.de>
> Subject: [PATCH v2] resource: Fix integer overflow at reallocation
> 
> We've got a bug report indicating a kernel panic at booting on an
> x86-32 system, and it turned out to be the invalid resource assigned
> after reallocation.  __find_resource() first aligns the resource start
> address and resets the end address with start+size-1 accordingly, then
> checks whether it's contained.  Here the end address may overflow the
> integer, although resource_contains() still returns true because the
> function validates only start and end address.  So this ends up with
> returning an invalid resource (start > end).
> 
> There was already an attempt to cover such a problem in the commit
> 47ea91b4052d ("Resource: fix wrong resource window calculation"), but
> this case is an overseen one.
> 
> This patch adds the validity check in resource_contains() to see
> whether the given resource has a valid range for avoiding the integer
> overflow problem.



> 
> Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
> Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
> Reported-and-tested-by: Michael Henders <hende...@shaw.ca>
> Cc: <sta...@vger.kernel.org>
> Signed-off-by: Takashi Iwai <ti...@suse.de>
> ---
>  include/linux/ioport.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..466d7be046eb 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -212,6 +212,9 @@ static inline bool resource_contains(struct resource *r1, 
> struct resource *r2)
>   return false;
>   if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET)
>   return false;
> + /* sanity check whether it's a valid resource range */
> + if (r2->end < r2->start)
> + return false;
>   return r1->start <= r2->start && r1->end >= r2->end;
>  }
>  
> -- 
> 2.16.3

-- 
Ram Pai



Re: [PATCH] resource: Fix integer overflow at reallocation

2018-04-05 Thread Ram Pai
On Thu, Apr 05, 2018 at 04:40:15PM +0200, Takashi Iwai wrote:
> On Mon, 02 Apr 2018 22:35:04 +0200,
> Takashi Iwai wrote:
> > 
> > On Mon, 02 Apr 2018 21:09:03 +0200,
> > Ram Pai wrote:
> > > 
> > > On Mon, Apr 02, 2018 at 09:16:16AM +0200, Takashi Iwai wrote:
> > > > We've got a bug report indicating a kernel panic at booting on an
> > > > x86-32 system, and it turned out to be the invalid resource assigned
> > > > after reallocation.  __find_resource() first aligns the resource start
> > > > address and resets the end address with start+size-1 accordingly, then
> > > > checks whether it's contained.  Here the end address may overflow the
> > > > integer, although resource_contains() still returns true because the
> > > > function validates only start and end address.  So this ends up with
> > > > returning an invalid resource (start > end).
> > > > 
> > > > There was already an attempt to cover such a problem in the commit
> > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but
> > > > this case is an overseen one.
> > > > 
> > > > This patch adds the validity check of the newly calculated resource
> > > > for avoiding the integer overflow problem.
> > > 
> > > Should we move this check "alloc.start <= alloc.end" into 
> > > resource_contains()?
> > > Doing so will catch all uses of such erroneous (overflowing) resources.
> > 
> > I thought of it, too.  OTOH, it's rather a validity check and doesn't
> > belong to resource_contains() semantics.  If the function returns
> > false, you don't know whether it's an invalid resource or it's really
> > not contained.
> > 
> > We may return an error code, but I'm not sure whether such an API
> > change is needed.  Maybe not.
> 
> FWIW, below is the revised one to move the check into
> resource_contains().
> 
> My concern is, however, that the resource validity check isn't a job
> of resource_contains().  OTOH, this may avoid other similar cases, so
> it might be worth.
> 
> In anyway, if there is no objection, and anyone else doesn't want to
> take, I'll forward this to Andrew as a poor orphan kid.

I will stand by you as a poor-orphan-buddy. :-) 

Reviewed-by: Ram Pai 

RP

> 
> 
> thanks,
> 
> Takashi
> 
> -- 8< --
> From: Takashi Iwai 
> Subject: [PATCH v2] resource: Fix integer overflow at reallocation
> 
> We've got a bug report indicating a kernel panic at booting on an
> x86-32 system, and it turned out to be the invalid resource assigned
> after reallocation.  __find_resource() first aligns the resource start
> address and resets the end address with start+size-1 accordingly, then
> checks whether it's contained.  Here the end address may overflow the
> integer, although resource_contains() still returns true because the
> function validates only start and end address.  So this ends up with
> returning an invalid resource (start > end).
> 
> There was already an attempt to cover such a problem in the commit
> 47ea91b4052d ("Resource: fix wrong resource window calculation"), but
> this case was overlooked.
> 
> This patch adds a validity check in resource_contains() to see
> whether the given resource has a valid range, to avoid the integer
> overflow problem.



> 
> Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
> Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
> Reported-and-tested-by: Michael Henders 
> Cc: 
> Signed-off-by: Takashi Iwai 
> ---
>  include/linux/ioport.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..466d7be046eb 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -212,6 +212,9 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
>   return false;
>   if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET)
>   return false;
> + /* sanity check whether it's a valid resource range */
> + if (r2->end < r2->start)
> + return false;
>   return r1->start <= r2->start && r1->end >= r2->end;
>  }
>  
> -- 
> 2.16.3

-- 
Ram Pai
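[Editorial note: for readers reproducing the bug, the wraparound can be demonstrated outside the kernel with a minimal userspace model. `struct res`, `res_contains()` and `make_alloc()` below are simplified stand-ins for the kernel's structures and helpers, not the real definitions.]

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified stand-in for struct resource; resource_size_t is
 * unsigned, so start + size - 1 can silently wrap around. */
struct res {
	unsigned long start;
	unsigned long end;
};

/* Mirrors resource_contains() plus the proposed sanity check. */
static bool res_contains(const struct res *r1, const struct res *r2)
{
	/* sanity check whether it's a valid resource range */
	if (r2->end < r2->start)
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}

/* Models the end-address computation in __find_resource(). */
static struct res make_alloc(unsigned long start, unsigned long size)
{
	struct res alloc = { start, start + size - 1 };
	return alloc;
}
```

A start of ULONG_MAX - 0xf with a size of 0x20 wraps to end = 0xf; without the r2->end < r2->start check, the plain start/end comparison against a full-range parent would still report that broken range as contained, which is exactly the failure the patch guards against.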



Re: [PATCH] resource: Fix integer overflow at reallocation

2018-04-02 Thread Ram Pai
On Mon, Apr 02, 2018 at 09:16:16AM +0200, Takashi Iwai wrote:
> We've got a bug report indicating a kernel panic at booting on an
> x86-32 system, and it turned out to be the invalid resource assigned
> after reallocation.  __find_resource() first aligns the resource start
> address and resets the end address with start+size-1 accordingly, then
> checks whether it's contained.  Here the end address may overflow the
> integer, although resource_contains() still returns true because the
> function validates only start and end address.  So this ends up with
> returning an invalid resource (start > end).
> 
> There was already an attempt to cover such a problem in the commit
> 47ea91b4052d ("Resource: fix wrong resource window calculation"), but
> this case was overlooked.
> 
> This patch adds a validity check on the newly calculated resource
> to avoid the integer overflow problem.

Should we move this check "alloc.start <= alloc.end" into resource_contains()?
Doing so will catch all uses of such erroneous (overflowing) resources.

RP

> 
> Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
> Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
> Reported-and-tested-by: Michael Henders <hende...@shaw.ca>
> Cc: <sta...@vger.kernel.org>
> Signed-off-by: Takashi Iwai <ti...@suse.de>
> ---
> 
> Bjorn, I send this to you since the bug hits during PCI init, although
> the culprit is in generic resource management.
> 
>  kernel/resource.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/resource.c b/kernel/resource.c
> index e270b5048988..2af6c03858b9 100644
> --- a/kernel/resource.c
> +++ b/kernel/resource.c
> @@ -651,7 +651,8 @@ static int __find_resource(struct resource *root, struct resource *old,
>   alloc.start = constraint->alignf(constraint->alignf_data, &avail,
>   size, constraint->align);
>   alloc.end = alloc.start + size - 1;
> - if (resource_contains(&avail, &alloc)) {
> + if (alloc.start <= alloc.end &&
> + resource_contains(&avail, &alloc)) {
>   new->start = alloc.start;
>   new->end = alloc.end;
>   return 0;
> -- 
> 2.16.2

-- 
Ram Pai





[PATCH v13 1/3] mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled

2018-03-27 Thread Ram Pai
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
is enabled. Powerpc also needs these bits. Hence let's define the
VM_PKEY_BITx bits for any architecture that enables
CONFIG_ARCH_HAS_PKEYS.

cc: Michael Ellerman <m...@ellerman.id.au>
cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
cc: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Dave Hansen <dave.han...@intel.com>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h |2 ++
 fs/proc/task_mmu.c   |4 ++--
 include/linux/mm.h   |9 +
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0d3c630..99344d7 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -26,6 +26,8 @@
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
 # define VM_PKEY_BIT4  VM_HIGH_ARCH_4
+#elif !defined(VM_PKEY_BIT4)
+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
 #endif
 
 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ec6d298..6b996d0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -679,13 +679,13 @@ static void show_smap_vma_flags(struct seq_file *m, 
struct vm_area_struct *vma)
[ilog2(VM_MERGEABLE)]   = "mg",
[ilog2(VM_UFFD_MISSING)]= "um",
[ilog2(VM_UFFD_WP)] = "uw",
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_ARCH_HAS_PKEYS
/* These come out via ProtectionKey: */
[ilog2(VM_PKEY_BIT0)]   = "",
[ilog2(VM_PKEY_BIT1)]   = "",
[ilog2(VM_PKEY_BIT2)]   = "",
[ilog2(VM_PKEY_BIT3)]   = "",
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
};
size_t i;
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..ad207ad 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -228,15 +228,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, 
int, void __user *,
 #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */
 
-#if defined(CONFIG_X86)
-# define VM_PAT        VM_ARCH_1   /* PAT reserves whole VMA at once (x86) */
-#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
+#ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
 # define VM_PKEY_BIT1  VM_HIGH_ARCH_1
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#if defined(CONFIG_X86)
+# define VM_PAT        VM_ARCH_1   /* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
 # define VM_SAO        VM_ARCH_1   /* Strong Access Ordering (powerpc) */
 #elif defined(CONFIG_PARISC)
-- 
1.7.1
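[Editorial note: as a sanity check on why powerpc needs a fifth bit, the sketch below packs a protection key into high flag bits the way vma_pkey() does. The shift value and macro names are illustrative placeholders (the real VM_HIGH_ARCH_* positions are config-dependent), and a 64-bit unsigned long is assumed.]

```c
#include <assert.h>

/* Placeholder bit positions standing in for VM_HIGH_ARCH_BIT_0..4;
 * assumes a 64-bit unsigned long. */
#define PKEY_SHIFT 32
#define PKEY_BIT0  (1UL << (PKEY_SHIFT + 0))
#define PKEY_BIT1  (1UL << (PKEY_SHIFT + 1))
#define PKEY_BIT2  (1UL << (PKEY_SHIFT + 2))
#define PKEY_BIT3  (1UL << (PKEY_SHIFT + 3))
#define PKEY_BIT4  (1UL << (PKEY_SHIFT + 4))	/* extra bit for ppc64 */

/* Four bits encode x86's 16 keys; the fifth extends to powerpc's 32. */
#define PKEY_MASK4 (PKEY_BIT0 | PKEY_BIT1 | PKEY_BIT2 | PKEY_BIT3)
#define PKEY_MASK5 (PKEY_MASK4 | PKEY_BIT4)

static unsigned long pack_pkey(int pkey)
{
	return ((unsigned long)pkey << PKEY_SHIFT) & PKEY_MASK5;
}

/* Mirrors the shape of vma_pkey(): mask the flag word, shift down. */
static int unpack_pkey(unsigned long vm_flags)
{
	return (int)((vm_flags & PKEY_MASK5) >> PKEY_SHIFT);
}
```

Keys 16..31 set the fifth bit, so masking with only the four x86 bits would silently truncate them; that is the motivation for patch 2/3 below.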





[PATCH v13 3/3] mm, x86, powerpc: display pkey in smaps only if arch supports pkeys

2018-03-27 Thread Ram Pai
Currently the architecture-specific code is expected to
display the protection keys in smaps for a given vma.
This can lead to redundant code and possibly to divergent
formats in which the key gets displayed.

This patch changes the implementation. It displays the
pkey only if the architecture supports pkeys, i.e.
arch_pkeys_enabled() returns true.  This patch also
provides the x86 implementation of arch_pkeys_enabled().

The x86 arch_show_smap() function is not needed anymore,
so delete it.

cc: Michael Ellerman 
cc: Benjamin Herrenschmidt 
cc: Andrew Morton 
Reviewed-by: Dave Hansen 
Signed-off-by: Thiago Jung Bauermann 
(fixed compilation errors for x86 configs)
Acked-by: Michal Hocko 
Reviewed-by: Ingo Molnar 
Signed-off-by: Ram Pai 
---
 arch/powerpc/include/asm/mmu_context.h |5 -
 arch/x86/include/asm/mmu_context.h |5 -
 arch/x86/include/asm/pkeys.h   |1 +
 arch/x86/kernel/fpu/xstate.c   |5 +
 arch/x86/kernel/setup.c|8 
 fs/proc/task_mmu.c |   10 +-
 include/linux/pkeys.h  |7 ++-
 7 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h 
b/arch/powerpc/include/asm/mmu_context.h
index 051b3d6..566b3c2 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -203,11 +203,6 @@ static inline bool arch_vma_access_permitted(struct 
vm_area_struct *vma,
 #define thread_pkey_regs_restore(new_thread, old_thread)
 #define thread_pkey_regs_init(thread)
 
-static inline int vma_pkey(struct vm_area_struct *vma)
-{
-   return 0;
-}
-
 static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
 {
return 0x0UL;
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 1de72ce..e597d09 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -295,11 +295,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
 }
-#else
-static inline int vma_pkey(struct vm_area_struct *vma)
-{
-   return 0;
-}
 #endif
 
 /*
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index a0ba1ff..f6c287b 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -6,6 +6,7 @@
 
 extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
unsigned long init_val);
+extern bool arch_pkeys_enabled(void);
 
 /*
  * Try to dedicate one of the protection keys to be used as an
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 87a57b7..4f566e9 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -945,6 +945,11 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int 
pkey,
 
return 0;
 }
+
+bool arch_pkeys_enabled(void)
+{
+   return boot_cpu_has(X86_FEATURE_OSPKE);
+}
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
 /*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4c616be..117ed01 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1307,11 +1307,3 @@ static int __init register_kernel_offset_dumper(void)
return 0;
 }
 __initcall(register_kernel_offset_dumper);
-
-void arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
-{
-   if (!boot_cpu_has(X86_FEATURE_OSPKE))
-   return;
-
-   seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
-}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6d83bb7..70aa912 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -18,10 +18,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 void task_mem(struct seq_file *m, struct mm_struct *mm)
@@ -733,10 +735,6 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long 
hmask,
 }
 #endif /* HUGETLB_PAGE */
 
-void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma)
-{
-}
-
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
struct proc_maps_private *priv = m->private;
@@ -856,9 +854,11 @@ static int show_smap(struct seq_file *m, void *v, int 
is_pid)
   (unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
 
if (!rollup_mode) {
-   arch_show_smap(m, vma);
+   if (arch_pkeys_enabled())
+   seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
show_smap_vma_flags(m, vma);
}
+
m_cache_vma(m, vma);
return ret;
 }
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 0794ca7..49dff15 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -3,7 +3,6 @@
 #define _LINUX_PKEYS_H
 
 #include 
-#include 
 
 #ifdef CONFIG_ARCH_HAS_PKEYS
 #include 
@@ -13,6 +12,7 @@
 #define arch_override_mprotect_pkey(vma,
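[Editorial note: the control flow this patch moves into generic show_smap() can be sketched as below. The stubbed arch_pkeys_enabled() and vma_pkey_stub() values are illustrative, and the buffer-based helper stands in for seq_printf().]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stubs for the arch hook and vma_pkey(). */
static bool pkeys_enabled;

static bool arch_pkeys_enabled(void)
{
	return pkeys_enabled;
}

static unsigned int vma_pkey_stub(void)
{
	return 3;	/* pretend the vma carries pkey 3 */
}

/* Mirrors the generic show_smap() change: one shared format string,
 * emitted only when the architecture reports pkey support.
 * Returns the number of bytes written, 0 when pkeys are disabled. */
static int show_protection_key(char *buf, size_t len)
{
	if (!arch_pkeys_enabled())
		return 0;
	return snprintf(buf, len, "ProtectionKey:  %8u\n", vma_pkey_stub());
}
```

Because the format string now lives in generic code, every architecture that flips arch_pkeys_enabled() to true gets the same smaps line for free, which is the divergence the commit message worries about.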

[PATCH v13 0/3] mm, x86, powerpc: Enhancements to Memory Protection Keys.

2018-03-27 Thread Ram Pai
This patch series provides arch-neutral enhancements to
enable memory keys on new architectures, and the corresponding
changes in x86- and powerpc-specific code to support that.

a) Provides the ability to support up to 32 keys.  PowerPC
can handle 32 keys and hence needs this.

b) Arch-neutral code, and not the arch-specific code,
   determines the format of the string that displays the key
   for each vma in smaps.

History:
---
version v13:
(1) fixed a git bisect error. :(

version v12:
(1) fixed compilation errors seen with various x86
configs.
version v11:
(1) code that displays key in smaps is not any more
defined under CONFIG_ARCH_HAS_PKEYS.
- Comment by Eric W. Biederman and Michal Hocko
(2) merged two patches that implemented (1).
- comment by Michal Hocko

version prior to v11:
(1) used one additional bit from VM_HIGH_ARCH_*
to support 32 keys.
- Suggestion by Dave Hansen.
(2) powerpc specific changes to support memory keys.


Ram Pai (3):
  mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS
is enabled
  mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
  mm, x86, powerpc: display pkey in smaps only if arch supports pkeys

 arch/powerpc/include/asm/mmu_context.h |5 -
 arch/powerpc/include/asm/pkeys.h   |2 ++
 arch/x86/include/asm/mmu_context.h |5 -
 arch/x86/include/asm/pkeys.h   |1 +
 arch/x86/kernel/fpu/xstate.c   |5 +
 arch/x86/kernel/setup.c|8 
 fs/proc/task_mmu.c |   15 ---
 include/linux/mm.h |   12 +++-
 include/linux/pkeys.h  |7 ++-
 9 files changed, 29 insertions(+), 31 deletions(-)






[PATCH v13 2/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey

2018-03-27 Thread Ram Pai
Currently only 4 bits are allocated in the vma flags to hold 16
keys. This is sufficient for x86. PowerPC supports 32 keys,
which need 5 bits. This patch allocates an additional bit.

cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
cc: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Acked-by: Balbir Singh <bsinghar...@gmail.com>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 fs/proc/task_mmu.c |1 +
 include/linux/mm.h |3 ++-
 2 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6b996d0..6d83bb7 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -685,6 +685,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct 
vm_area_struct *vma)
[ilog2(VM_PKEY_BIT1)]   = "",
[ilog2(VM_PKEY_BIT2)]   = "",
[ilog2(VM_PKEY_BIT3)]   = "",
+   [ilog2(VM_PKEY_BIT4)]   = "",
 #endif /* CONFIG_ARCH_HAS_PKEYS */
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad207ad..d534f46 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -231,9 +231,10 @@ extern int overcommit_kbytes_handler(struct ctl_table *, 
int, void __user *,
 #ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0  VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
-# define VM_PKEY_BIT1  VM_HIGH_ARCH_1
+# define VM_PKEY_BIT1  VM_HIGH_ARCH_1  /* on x86 and 5-bit value on ppc64   */
 # define VM_PKEY_BIT2  VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3  VM_HIGH_ARCH_3
+# define VM_PKEY_BIT4  VM_HIGH_ARCH_4
 #endif /* CONFIG_ARCH_HAS_PKEYS */
 
 #if defined(CONFIG_X86)
-- 
1.7.1






Re: [PATCH 1/9] x86, pkeys: do not special case protection key 0

2018-03-26 Thread Ram Pai
On Fri, Mar 23, 2018 at 11:09:05AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen 
> 
> mm_pkey_is_allocated() treats pkey 0 as unallocated.  That is
> inconsistent with the manpages, and also inconsistent with
> mm->context.pkey_allocation_map.  Stop special casing it and only
> disallow values that are actually bad (< 0).
> 
> The end-user visible effect of this is that you can now use
> mprotect_pkey() to set pkey=0.
> 
> This is a bit nicer than what Ram proposed because it is simpler
> and removes special-casing for pkey 0.  On the other hand, it does
> allow applciations to pkey_free() pkey-0, but that's just a silly
> thing to do, so we are not going to protect against it.

The more I think about this, the more I feel we are opening up a can
of worms.  I am OK with a bad application shooting itself in the foot.
But I am worried about all the bug reports and support requests we
will encounter when applications inadvertently shoot themselves
and blame it on the kernel.

A warning in the dmesg logs indicating a free of pkey-0 can help deflect
the blame from the kernel.

RP
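[Editorial note: to see what "stop special casing pkey 0" means mechanically, here is a toy model of the per-mm allocation map under Dave's proposed semantics. NR_PKEYS, the map variable, and the error value are illustrative stand-ins, not kernel definitions.]

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PKEYS    16		/* illustrative; x86 has 16 keys */
#define PKEY_0_MASK 0x1u

/* Toy stand-in for mm->context.pkey_allocation_map: pkey 0 is
 * implicitly allocated from the start. */
static unsigned int allocation_map = PKEY_0_MASK;

static bool pkey_is_allocated(int pkey)
{
	/* Only reject values that are actually bad, i.e. out of range;
	 * pkey 0 is no longer treated as "unallocated". */
	if (pkey < 0 || pkey >= NR_PKEYS)
		return false;
	return allocation_map & (1u << pkey);
}

static int pkey_free_model(int pkey)
{
	if (!pkey_is_allocated(pkey))
		return -1;	/* stand-in for -EINVAL */
	allocation_map &= ~(1u << pkey);
	return 0;
}
```

Under these semantics mprotect_pkey(..., 0) becomes legal, but pkey_free(0) succeeds as well, which is exactly the foot-gun the proposed dmesg warning would flag.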






[PATCH v2] powerpc, pkey: make protection key 0 less special

2018-03-26 Thread Ram Pai
Applications need the ability to associate an address range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation does not allow applications to explicitly associate
pkey-0 with an address range.

Let's make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power.  The
ability to free pkey-0 must be used responsibly, since pkey-0 is
associated with almost every address range by default.

Even with this change pkey-0 continues to be slightly more special
from the following point of view.
(a) it is implicitly allocated.
(b) it is the default key assigned to any address-range.

Tested on powerpc.

cc: Thomas Gleixner <t...@linutronix.de>
cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellermen <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
cc: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
History:
v2: mm_pkey_is_allocated() continued to treat pkey-0 as special;
fixed it.

 arch/powerpc/include/asm/pkeys.h | 20 
 arch/powerpc/mm/pkeys.c  | 20 
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0409c80..b598fa9 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -101,10 +101,14 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
 
 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
-   /* A reserved key is never considered as 'explicitly allocated' */
-   return ((pkey < arch_max_pkey()) &&
-   !__mm_pkey_is_reserved(pkey) &&
-   __mm_pkey_is_allocated(mm, pkey));
+   if (pkey < 0 || pkey >= arch_max_pkey())
+   return false;
+
+   /* Reserved keys are never allocated. */
+   if (__mm_pkey_is_reserved(pkey))
+   return false;
+
+   return __mm_pkey_is_allocated(mm, pkey);
 }
 
 extern void __arch_activate_pkey(int pkey);
@@ -200,6 +204,14 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 {
if (static_branch_likely(&pkey_disabled))
return -EINVAL;
+
+   /*
+* userspace is discouraged from changing permissions of
+* pkey-0. powerpc hardware does not support it anyway.
+*/
+   if (!pkey)
+   return init_val ? -EINVAL : 0;
+
return __arch_set_user_pkey_access(tsk, pkey, init_val);
 }
 
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index ba71c54..e7a9e34 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -119,19 +119,21 @@ int pkey_initialize(void)
 #else
os_reserved = 0;
 #endif
-   /*
-* Bits are in LE format. NOTE: 1, 0 are reserved.
-* key 0 is the default key, which allows read/write/execute.
-* key 1 is recommended not to be used. PowerISA(3.0) page 1015,
-* programming note.
-*/
+   /* Bits are in LE format. */
initial_allocation_mask = ~0x0;
 
/* register mask is in BE format */
pkey_amr_uamor_mask = ~0x0ul;
pkey_iamr_mask = ~0x0ul;
 
-   for (i = 2; i < (pkeys_total - os_reserved); i++) {
+   for (i = 0; i < (pkeys_total - os_reserved); i++) {
+   /*
+* key 1 is recommended not to be used.
+* PowerISA(3.0) page 1015,
+*/
+   if (i == 1)
+   continue;
+
initial_allocation_mask &= ~(0x1 << i);
pkey_amr_uamor_mask &= ~(0x3ul << pkeyshift(i));
pkey_iamr_mask &= ~(0x1ul << pkeyshift(i));
@@ -145,7 +147,9 @@ void pkey_mm_init(struct mm_struct *mm)
 {
if (static_branch_likely(&pkey_disabled))
return;
-   mm_pkey_allocation_map(mm) = initial_allocation_mask;
+
+   /* allocate key-0 by default */
+   mm_pkey_allocation_map(mm) = initial_allocation_mask | 0x1;
/* -1 means unallocated or invalid */
mm->context.execute_only_pkey = -1;
 }
-- 
1.8.3.1



Re: [PATCH] powerpc, pkey: make protection key 0 less special

2018-03-26 Thread Ram Pai
On Mon, Mar 26, 2018 at 04:31:41PM -0700, Ram Pai wrote:
> Applications need the ability to associate an address-range with some
> key and latter revert to its initial default key. Pkey-0 comes close to
> providing this function but falls short, because the current
> implementation disallows applications to explicitly associate pkey-0 to
> the address range.
> 
> Lets make pkey-0 less special and treat it almost like any other key.
> Thus it can be explicitly associated with any address range, and can be
> freed. This gives the application more flexibility and power.  The
> ability to free pkey-0 must be used responsibily, since pkey-0 is
> associated with almost all address-range by default.
> 
> Even with this change pkey-0 continues to be slightly more special
> from the following point of view.
> (a) it is implicitly allocated.
> (b) it is the default key assigned to any address-range.
> 
> Tested on powerpc.

This patch is not entirely correct.
> 
> cc: Thomas Gleixner <t...@linutronix.de>
> cc: Dave Hansen <dave.han...@intel.com>
> cc: Michael Ellermen <m...@ellerman.id.au>
> cc: Ingo Molnar <mi...@kernel.org>
> cc: Andrew Morton <a...@linux-foundation.org>
> Signed-off-by: Ram Pai <linux...@us.ibm.com>
> ---
>  arch/powerpc/include/asm/pkeys.h | 24 
>  1 file changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 0409c80..9c7d3bd 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
> 
>  static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
>  {
> - /* A reserved key is never considered as 'explicitly allocated' */
> - return ((pkey < arch_max_pkey()) &&
> - !__mm_pkey_is_reserved(pkey) &&
> - __mm_pkey_is_allocated(mm, pkey));
> + /* pkey 0 is allocated by default. */
> + if (!pkey)
> + return true;

This is wrong. pkey-0 should not be treated as special here. Will fix
this and send a new patch. Sorry for the noise.

RP



[PATCH] powerpc, pkey: make protection key 0 less special

2018-03-26 Thread Ram Pai
Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation does not allow applications to explicitly associate
pkey-0 with an address range.

Let's make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power. The
ability to free pkey-0 must be used responsibly, since pkey-0 is
associated with almost every address range by default.

Even with this change pkey-0 continues to be slightly more special
from the following point of view.
(a) it is implicitly allocated.
(b) it is the default key assigned to any address-range.

Tested on powerpc.

cc: Thomas Gleixner <t...@linutronix.de>
cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellermen <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
cc: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0409c80..9c7d3bd 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
 
 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
-   /* A reserved key is never considered as 'explicitly allocated' */
-   return ((pkey < arch_max_pkey()) &&
-   !__mm_pkey_is_reserved(pkey) &&
-   __mm_pkey_is_allocated(mm, pkey));
+   /* pkey 0 is allocated by default. */
+   if (!pkey)
+   return true;
+
+   if (pkey < 0 || pkey >= arch_max_pkey())
+   return false;
+
+   /* Reserved keys are never allocated. */
+   if (__mm_pkey_is_reserved(pkey))
+   return false;
+
+   return __mm_pkey_is_allocated(mm, pkey);
 }
 
 extern void __arch_activate_pkey(int pkey);
@@ -200,6 +208,14 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 {
if (static_branch_likely(&pkey_disabled))
return -EINVAL;
+
+   /*
+* userspace is discouraged from changing permissions of
+* pkey-0. powerpc hardware does not support it anyway.
+*/
+   if (!pkey)
+   return init_val ? -EINVAL : 0;
+
return __arch_set_user_pkey_access(tsk, pkey, init_val);
 }
 
-- 
1.8.3.1



Re: [PATCH 1/9] x86, pkeys: do not special case protection key 0

2018-03-26 Thread Ram Pai
On Fri, Mar 23, 2018 at 11:09:05AM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.han...@linux.intel.com>
> 
> mm_pkey_is_allocated() treats pkey 0 as unallocated.  That is
> inconsistent with the manpages, and also inconsistent with
> mm->context.pkey_allocation_map.  Stop special casing it and only
> disallow values that are actually bad (< 0).
> 
> The end-user visible effect of this is that you can now use
> mprotect_pkey() to set pkey=0.
> 
> This is a bit nicer than what Ram proposed because it is simpler
> and removes special-casing for pkey 0.  On the other hand, it does
> allow applciations to pkey_free() pkey-0, but that's just a silly
> thing to do, so we are not going to protect against it.
> 
> Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
> Cc: Ram Pai <linux...@us.ibm.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Dave Hansen <dave.han...@intel.com>
> Cc: Michael Ellermen <m...@ellerman.id.au>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>p
> Cc: Shuah Khan <sh...@kernel.org>
> ---
> 
>  b/arch/x86/include/asm/mmu_context.h |2 +-
>  b/arch/x86/include/asm/pkeys.h   |6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff -puN arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated 
> arch/x86/include/asm/mmu_context.h
> --- a/arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated 
> 2018-03-21 15:47:48.182198927 -0700
> +++ b/arch/x86/include/asm/mmu_context.h  2018-03-21 15:47:48.187198927 
> -0700
> @@ -192,7 +192,7 @@ static inline int init_new_context(struc
> 
>  #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>   if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> - /* pkey 0 is the default and always allocated */
> + /* pkey 0 is the default and allocated implicitly */
>   mm->context.pkey_allocation_map = 0x1;

In the second patch, you introduce DEFAULT_KEY. Maybe you
should introduce it here and express the above code as

mm->context.pkey_allocation_map = (0x1 << DEFAULT_KEY);

In case your default key changes to something else, you are still good.

>   /* -1 means unallocated or invalid */
>   mm->context.execute_only_pkey = -1;
> diff -puN arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated 
> arch/x86/include/asm/pkeys.h
> --- a/arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated   
> 2018-03-21 15:47:48.184198927 -0700
> +++ b/arch/x86/include/asm/pkeys.h2018-03-21 15:47:48.188198927 -0700
> @@ -49,10 +49,10 @@ bool mm_pkey_is_allocated(struct mm_stru
>  {
>   /*
>* "Allocated" pkeys are those that have been returned
> -  * from pkey_alloc().  pkey 0 is special, and never
> -  * returned from pkey_alloc().
> +  * from pkey_alloc() or pkey 0 which is allocated
> +  * implicitly when the mm is created.
>*/
> - if (pkey <= 0)
> + if (pkey < 0)
>   return false;
>   if (pkey >= arch_max_pkey())
>   return false;
> _

-- 
Ram Pai



Re: [PATCH 1/3] x86, pkeys: do not special case protection key 0

2018-03-18 Thread Ram Pai
On Sun, Mar 18, 2018 at 10:30:48AM +0100, Thomas Gleixner wrote:
> On Sat, 17 Mar 2018, Ram Pai wrote:
> > On Fri, Mar 16, 2018 at 02:46:56PM -0700, Dave Hansen wrote:
> > > 
> > > From: Dave Hansen <dave.han...@linux.intel.com>
> > > 
> > > mm_pkey_is_allocated() treats pkey 0 as unallocated.  That is
> > > inconsistent with the manpages, and also inconsistent with
> > > mm->context.pkey_allocation_map.  Stop special casing it and only
> > > disallow values that are actually bad (< 0).
> > > 
> > > The end-user visible effect of this is that you can now use
> > > mprotect_pkey() to set pkey=0.
> > > 
> > > This is a bit nicer than what Ram proposed because it is simpler
> > > and removes special-casing for pkey 0.  On the other hand, it does
> > > allow applciations to pkey_free() pkey-0, but that's just a silly
> > > thing to do, so we are not going to protect against it.
> > 
> > So your proposal 
> > (a) allocates pkey 0 implicitly, 
> > (b) does not stop anyone from freeing pkey-0
> > (c) and allows pkey-0 to be explicitly associated with any address range.
> > correct?
> > 
> > My proposal
> > (a) allocates pkey 0 implicitly, 
> > (b) stops anyone from freeing pkey-0
> > (c) and allows pkey-0 to be explicitly associated with any address range.
> > 
> > So the difference between the two proposals is just the freeing part i.e 
> > (b).
> > Did I get this right?
> 
> Yes, and that's consistent with the other pkeys.
> 

ok.

Yes, it makes pkey-0 even more consistent with the other keys, but not
entirely consistent. pkey-0 still has the privilege of being
allocated by default.


RP



Re: [PATCH 1/3] x86, pkeys: do not special case protection key 0

2018-03-17 Thread Ram Pai
On Fri, Mar 16, 2018 at 02:46:56PM -0700, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.han...@linux.intel.com>
> 
> mm_pkey_is_allocated() treats pkey 0 as unallocated.  That is
> inconsistent with the manpages, and also inconsistent with
> mm->context.pkey_allocation_map.  Stop special casing it and only
> disallow values that are actually bad (< 0).
> 
> The end-user visible effect of this is that you can now use
> mprotect_pkey() to set pkey=0.
> 
> This is a bit nicer than what Ram proposed because it is simpler
> and removes special-casing for pkey 0.  On the other hand, it does
> allow applciations to pkey_free() pkey-0, but that's just a silly
> thing to do, so we are not going to protect against it.

So your proposal 
(a) allocates pkey 0 implicitly, 
(b) does not stop anyone from freeing pkey-0
(c) and allows pkey-0 to be explicitly associated with any address range.
correct?

My proposal
(a) allocates pkey 0 implicitly, 
(b) stops anyone from freeing pkey-0
(c) and allows pkey-0 to be explicitly associated with any address range.

So the difference between the two proposals is just the freeing part i.e (b).
Did I get this right?

It's a philosophical debate: let the user shoot themselves in the foot,
or stop them from doing so. There is no
clear answer either way. I am fine either way.

So here is my

Reviewed-by: Ram Pai <linux...@us.ibm.com>

I will write a corresponding patch for powerpc.

> 
> Signed-off-by: Dave Hansen <dave.han...@linux.intel.com>
> Cc: Ram Pai <linux...@us.ibm.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Dave Hansen <dave.han...@intel.com>
> Cc: Michael Ellermen <m...@ellerman.id.au>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Andrew Morton <a...@linux-foundation.org>p
> Cc: Shuah Khan <sh...@kernel.org>
> ---
> 
>  b/arch/x86/include/asm/mmu_context.h |2 +-
>  b/arch/x86/include/asm/pkeys.h   |6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff -puN arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated 
> arch/x86/include/asm/mmu_context.h
> --- a/arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated 
> 2018-03-16 14:46:39.023285476 -0700
> +++ b/arch/x86/include/asm/mmu_context.h  2018-03-16 14:46:39.028285476 
> -0700
> @@ -191,7 +191,7 @@ static inline int init_new_context(struc
> 
>  #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
>   if (cpu_feature_enabled(X86_FEATURE_OSPKE)) {
> - /* pkey 0 is the default and always allocated */
> + /* pkey 0 is the default and allocated implicitly */
>   mm->context.pkey_allocation_map = 0x1;
>   /* -1 means unallocated or invalid */
>   mm->context.execute_only_pkey = -1;
> diff -puN arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated 
> arch/x86/include/asm/pkeys.h
> --- a/arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated   
> 2018-03-16 14:46:39.025285476 -0700
> +++ b/arch/x86/include/asm/pkeys.h2018-03-16 14:46:39.028285476 -0700
> @@ -49,10 +49,10 @@ bool mm_pkey_is_allocated(struct mm_stru
>  {
>   /*
>* "Allocated" pkeys are those that have been returned
> -  * from pkey_alloc().  pkey 0 is special, and never
> -  * returned from pkey_alloc().
> +  * from pkey_alloc() or pkey 0 which is allocated
> +  * implicitly when the mm is created.
>*/
> - if (pkey <= 0)
> + if (pkey < 0)
>   return false;
>   if (pkey >= arch_max_pkey())
>   return false;
> _

-- 
Ram Pai



Re: [PATCH v4] mm, pkey: treat pkey-0 special

2018-03-16 Thread Ram Pai
On Fri, Mar 16, 2018 at 10:02:22PM +1100, Balbir Singh wrote:
> On Fri, Mar 16, 2018 at 9:33 PM, Ram Pai <linux...@us.ibm.com> wrote:
> > Applications need the ability to associate an address-range with some
> > key and later revert to its initial default key. Pkey-0 comes close to
> > providing this function but falls short, because the current
> > implementation disallows applications to explicitly associate pkey-0 to
> > the address range.
> >
> > Clarify the semantics of pkey-0 and provide the corresponding
> > implementation.
> >
> > Pkey-0 is special with the following semantics.
> > (a) it is implicitly allocated and can never be freed. It always exists.
> > (b) it is the default key assigned to any address-range.
> > (c) it can be explicitly associated with any address-range.
> >
> > Tested on powerpc only. Could not test on x86.
> 
> 
> Ram,
> 
> I was wondering if we should check the AMOR values on the ppc side to make 
> sure
> that pkey0 is indeed available for use as default. I am still of the
> opinion that we

AMOR cannot be read or written by the OS in privileged non-hypervisor
mode. We could try testing whether key-0 is available to the OS by
temporarily changing the key-0 bits of the AMR or IAMR register, but
that would be dangerous: you might disable read/execute on all pages,
since all pages are associated with key-0 by default.

Maybe we can play with the UAMOR register and check whether its key-0
bits can be modified. That is a good indication that key-0 is available.
If it is not available, disable the pkey subsystem and operate the
legacy way; no pkeys.


> should consider non-0 default pkey in the long run. I'm OK with the patches 
> for
> now, but really 0 is not special except for it being the default bit
> values present
> in the PTE.

It will be a pain. Any new pte that gets instantiated will then have to
explicitly initialize its key to this default non-zero key. I hope
neither we nor any other architecture ever goes there.

-- 
Ram Pai



[PATCH v4] mm, pkey: treat pkey-0 special

2018-03-16 Thread Ram Pai
Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications to explicitly associate pkey-0 to
the address range.

Clarify the semantics of pkey-0 and provide the corresponding
implementation.

Pkey-0 is special with the following semantics.
(a) it is implicitly allocated and can never be freed. It always exists.
(b) it is the default key assigned to any address-range.
(c) it can be explicitly associated with any address-range.

Tested on powerpc only. Could not test on x86.

cc: Thomas Gleixner <t...@linutronix.de>
cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
cc: Andrew Morton <a...@linux-foundation.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 History:
 v4 : (1) moved the code entirely to an arch-independent location.
  (2) fixed comments -- suggested by Thomas Gleixner
 v3 : added clarification of the semantics of pkey0.
   -- suggested by Dave Hansen
 v2 : split the patch into two, one for x86 and one for powerpc
   -- suggested by Michael Ellerman

 Documentation/x86/protection-keys.txt |8 
 mm/mprotect.c |   25 ++---
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/protection-keys.txt 
b/Documentation/x86/protection-keys.txt
index ecb0d2d..92802c4 100644
--- a/Documentation/x86/protection-keys.txt
+++ b/Documentation/x86/protection-keys.txt
@@ -88,3 +88,11 @@ with a read():
 The kernel will send a SIGSEGV in both cases, but si_code will be set
 to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
 the plain mprotect() permissions are violated.
+
+== pkey 0 ==
+
+Pkey-0 is special. It is implicitly allocated. Applications cannot allocate or
+free that key. This key is the default key that gets associated with an
+address range. It can be explicitly associated with any address range.
+
+
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e3309fc..2c779fa 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -430,7 +430,13 @@ static int do_mprotect_pkey(unsigned long start, size_t 
len,
 * them use it here.
 */
error = -EINVAL;
-   if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey))
+
+   /*
+* pkey-0 is special. It always exists. No need to check if it is
+* allocated. Check allocation status of all other keys. pkey=-1
+* is not really a key; it means: use any available key.
+*/
+   if (pkey && pkey != -1 && !mm_pkey_is_allocated(current->mm, pkey))
goto out;
 
vma = find_vma(current->mm, start);
@@ -549,6 +555,12 @@ static int do_mprotect_pkey(unsigned long start, size_t 
len,
if (pkey == -1)
goto out;
 
+   if (!pkey) {
+   mm_pkey_free(current->mm, pkey);
+   printk("Internal error, cannot explicitly allocate key-0");
+   goto out;
+   }
+
ret = arch_set_user_pkey_access(current, pkey, init_val);
if (ret) {
mm_pkey_free(current->mm, pkey);
@@ -564,13 +576,20 @@ static int do_mprotect_pkey(unsigned long start, size_t 
len,
 {
int ret;
 
+   /*
+* pkey-0 is special. Userspace can never allocate or free it. It is
+* allocated by default. It always exists.
+*/
+   if (!pkey)
+   return -EINVAL;
+
down_write(&current->mm->mmap_sem);
ret = mm_pkey_free(current->mm, pkey);
up_write(&current->mm->mmap_sem);
 
/*
-* We could provie warnings or errors if any VMA still
-* has the pkey set here.
+* We could provide warnings or errors if any VMA still has the pkey
+* set here.
 */
return ret;
 }
-- 
1.7.1



Re: [PATCH v3] x86: treat pkey-0 special

2018-03-15 Thread Ram Pai
On Thu, Mar 15, 2018 at 10:31:51AM -0700, Dave Hansen wrote:
> On 03/15/2018 10:21 AM, Ram Pai wrote:
> > On Thu, Mar 15, 2018 at 08:55:31AM -0700, Dave Hansen wrote:
> >> On 03/15/2018 02:46 AM, Thomas Gleixner wrote:
> >>>> +if (!pkey || !mm_pkey_is_allocated(mm, pkey))
> >>> Why this extra check? mm_pkey_is_allocated(mm, 0) should not return true
> >>> ever. If it does, then this wants to be fixed.
> >> I was thinking that we _do_ actually want it to seem allocated.  It just
> >> get "allocated" implicitly when an mm is created.  I think that will
> >> simplify the code if we avoid treating it specially in as many places as
> >> possible.
> > I think, the logic that makes pkey-0 special must to go
> > in arch-neutral code.   How about checking for pkey-0 in sys_pkey_free()
> > itself?
> 
> This is for protection against shooting yourself in the foot?  Yes, that
> can go in sys_pkey_free().
> 
> Does this need manpage and/or selftests updates?

Yes, it needs selftest, manpage, and documentation updates too.

Unfortunately I am not getting enough reviewed-by for my selftests
and documentation changes. :-(  Need help!


-- 
Ram Pai



Re: [PATCH v3] x86: treat pkey-0 special

2018-03-15 Thread Ram Pai
On Thu, Mar 15, 2018 at 08:55:31AM -0700, Dave Hansen wrote:
> On 03/15/2018 02:46 AM, Thomas Gleixner wrote:
> >> +  if (!pkey || !mm_pkey_is_allocated(mm, pkey))
> > Why this extra check? mm_pkey_is_allocated(mm, 0) should not return true
> > ever. If it does, then this wants to be fixed.
> 
> I was thinking that we _do_ actually want it to seem allocated.  It just
> get "allocated" implicitly when an mm is created.  I think that will
> simplify the code if we avoid treating it specially in as many places as
> possible.

I think the logic that makes pkey-0 special must go in arch-neutral
code. How about checking for pkey-0 in sys_pkey_free() itself?

RP



[PATCH v3] powerpc: treat pkey-0 special

2018-03-14 Thread Ram Pai
Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications to explicitly associate pkey-0 to
the address range.

This patch clarifies the semantics of pkey-0 and provides the
corresponding implementation on powerpc.

Pkey-0 is special with the following semantics.
(a) it is implicitly allocated and can never be freed. It always exists.
(b) it is the default key assigned to any address-range.
(c) it can be explicitly associated with any address-range.

Tested on powerpc.

History:
v3 : added clarification of the semantics of pkey0.
-- suggested by Dave Hansen
v2 : split the patch into two, one for x86 and one for powerpc
-- suggested by Michael Ellerman

cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0409c80..3c1deec 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
 
 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
-   /* A reserved key is never considered as 'explicitly allocated' */
-   return ((pkey < arch_max_pkey()) &&
-   !__mm_pkey_is_reserved(pkey) &&
-   __mm_pkey_is_allocated(mm, pkey));
+   /* pkey 0 is allocated by default. */
+   if (!pkey)
+  return true;
+
+   if (pkey < 0 || pkey >= arch_max_pkey())
+  return false;
+
+   /* Reserved keys are never allocated. */
+   if (__mm_pkey_is_reserved(pkey))
+  return false;
+
+   return __mm_pkey_is_allocated(mm, pkey);
 }
 
 extern void __arch_activate_pkey(int pkey);
@@ -150,7 +158,8 @@ static inline int mm_pkey_free(struct mm_struct *mm, int 
pkey)
if (static_branch_likely(&pkeys_disabled))
return -1;
 
-   if (!mm_pkey_is_allocated(mm, pkey))
+   /* pkey 0 cannot be freed */
+   if (!pkey || !mm_pkey_is_allocated(mm, pkey))
return -EINVAL;
 
/*
-- 
1.8.3.1



[PATCH v3] x86: treat pkey-0 special

2018-03-14 Thread Ram Pai
Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications to explicitly associate pkey-0 to
the address range.

This patch clarifies the semantics of pkey-0 and provides the
corresponding implementation on x86.

Pkey-0 is special with the following semantics.
(a) it is implicitly allocated and can never be freed. It always exists.
(b) it is the default key assigned to any address-range.
(c) it can be explicitly associated with any address-range.

Tested on x86_64.

History:
v3 : added clarification of the semantics of pkey0.
-- suggested by Dave Hansen
v2 : split the patch into two, one for x86 and one for powerpc
-- suggested by Michael Ellerman

cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 arch/x86/include/asm/pkeys.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index a0ba1ff..6ea7486 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -52,7 +52,7 @@ bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 * from pkey_alloc().  pkey 0 is special, and never
 * returned from pkey_alloc().
 */
-   if (pkey <= 0)
+   if (pkey < 0)
return false;
if (pkey >= arch_max_pkey())
return false;
@@ -92,7 +92,8 @@ int mm_pkey_alloc(struct mm_struct *mm)
 static inline
 int mm_pkey_free(struct mm_struct *mm, int pkey)
 {
-   if (!mm_pkey_is_allocated(mm, pkey))
+   /* pkey 0 is special and can never be freed */
+   if (!pkey || !mm_pkey_is_allocated(mm, pkey))
return -EINVAL;
 
mm_set_pkey_free(mm, pkey);
-- 
1.8.3.1


