[PATCH v1] ARM: dts: aspeed: Add Mowgli rtc driver
Add mowgli rtc driver.

Signed-off-by: Ben Pai
---
 arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
index b648e468e9db..8503152faaf0 100644
--- a/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
+++ b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts
@@ -582,6 +582,11 @@ /* TMP275A */
 	/* TMP275A */
 
+	rtc@32 {
+		compatible = "epson,rx8900";
+		reg = <0x32>;
+	};
+
 	tmp275@48 {
 		compatible = "ti,tmp275";
 		reg = <0x48>;
--
2.17.1
Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines
On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote: > > Hi Ram, > > Thanks for reviewing this patch. > > Ram Pai writes: > > > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote: > >> On server-class POWER machines, we don't need the SWIOTLB unless we're a > >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally > >> allocate it. > >> > >> In most cases this is harmless, but on a few machine configurations (e.g., > >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can > >> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and > >> fails with a scary-looking WARN_ONCE: > >> > >> [ cut here ] > >> memblock: bottom-up allocation failed, memory hotremove may be affected > >> WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 > >> memblock_find_in_range_node+0x328/0x340 > >> Modules linked in: > >> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6 > >> NIP: c0442f38 LR: c0442f34 CTR: c01e0080 > >> REGS: c1def900 TRAP: 0700 Not tainted (5.10.0-rc2-orig+) > >> MSR: 92021033 CR: 2802 XER: > >> 2004 > >> CFAR: c014b7b4 IRQMASK: 1 > >> GPR00: c0442f34 c1defba0 c1deff00 0047 > >> GPR04: 7fff c1def828 c1def820 > >> GPR08: 001ffc3e c1b75478 c1b75478 0001 > >> GPR12: 2000 c203 > >> GPR16: 0203 > >> GPR20: 0001 0001 c1defc10 > >> GPR24: c1defc08 c1c91868 c1defc18 c1c91890 > >> GPR28: 0400 > >> NIP [c0442f38] memblock_find_in_range_node+0x328/0x340 > >> LR [c0442f34] memblock_find_in_range_node+0x324/0x340 > >> Call Trace: > >> [c1defba0] [c0442f34] > >> memblock_find_in_range_node+0x324/0x340 (unreliable) > >> [c1defc90] [c15ac088] memblock_alloc_range_nid+0xec/0x1b0 > >> [c1defd40] [c15ac1f8] memblock_alloc_internal+0xac/0x110 > >> [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc > >> [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104 > >> [c1defea0] [c158378c] mem_init+0x4c/0x98 > >> [c1defec0] [c157457c] start_kernel+0x714/0xac8 > >> [c1deff90] [c000d244] 
start_here_common+0x1c/0x58
> >> Instruction dump:
> >> 2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 3c62ff60
> >> 3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 6000
> >> random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
> >> ---[ end trace ]---
> >> software IO TLB: Cannot allocate buffer
> >>
> >> Unless this is a secure VM the message can actually be ignored, because the
> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.
> >
> > The above warn_on is conveying a genuine warning. Should it be silenced?
>
> Not sure I understand your point. This patch doesn't silence the
> warning, it avoids the problem it is warning about.

Sorry, I should have explained it better. My point is: if CONFIG_SWIOTLB is
enabled, the kernel is promising the bounce-buffering capability. I know that
currently no kernel subsystem uses bounce buffers on a non-secure pseries
kernel or a powernv kernel, but that does not mean there won't be any. If such
a third-party module needing bounce buffering existed, it won't be able to
operate because of the proposed change in your patch. Whether that is a good
thing or a bad thing, I do not know. I will let the experts opine.

RP
Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines
On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote: > On server-class POWER machines, we don't need the SWIOTLB unless we're a > secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally > allocate it. > > In most cases this is harmless, but on a few machine configurations (e.g., > POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can > happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and > fails with a scary-looking WARN_ONCE: > > [ cut here ] > memblock: bottom-up allocation failed, memory hotremove may be affected > WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 > memblock_find_in_range_node+0x328/0x340 > Modules linked in: > CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6 > NIP: c0442f38 LR: c0442f34 CTR: c01e0080 > REGS: c1def900 TRAP: 0700 Not tainted (5.10.0-rc2-orig+) > MSR: 92021033 CR: 2802 XER: > 2004 > CFAR: c014b7b4 IRQMASK: 1 > GPR00: c0442f34 c1defba0 c1deff00 0047 > GPR04: 7fff c1def828 c1def820 > GPR08: 001ffc3e c1b75478 c1b75478 0001 > GPR12: 2000 c203 > GPR16: 0203 > GPR20: 0001 0001 c1defc10 > GPR24: c1defc08 c1c91868 c1defc18 c1c91890 > GPR28: 0400 > NIP [c0442f38] memblock_find_in_range_node+0x328/0x340 > LR [c0442f34] memblock_find_in_range_node+0x324/0x340 > Call Trace: > [c1defba0] [c0442f34] > memblock_find_in_range_node+0x324/0x340 (unreliable) > [c1defc90] [c15ac088] memblock_alloc_range_nid+0xec/0x1b0 > [c1defd40] [c15ac1f8] memblock_alloc_internal+0xac/0x110 > [c1defda0] [c15ac4d0] memblock_alloc_try_nid+0x94/0xcc > [c1defe30] [c159c3c8] swiotlb_init+0x78/0x104 > [c1defea0] [c158378c] mem_init+0x4c/0x98 > [c1defec0] [c157457c] start_kernel+0x714/0xac8 > [c1deff90] [c000d244] start_here_common+0x1c/0x58 > Instruction dump: > 2c23 4182ffd4 ea610088 ea810090 4bfffe84 3921 3d42fff4 3c62ff60 > 3863c560 992a8bfc 4bd0881d 6000 <0fe0> ea610088 4bfffd94 6000 > random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0 > ---[ end trace ]--- > 
software IO TLB: Cannot allocate buffer
>
> Unless this is a secure VM the message can actually be ignored, because the
> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.

The above WARN_ONCE is conveying a genuine warning. Should it be silenced?

> Fixes: eae9eec476d1 ("powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere
> in memory")
> Signed-off-by: Thiago Jung Bauermann
> ---
>  arch/powerpc/mm/mem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index afab328d0887..3af991844145 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -300,7 +300,8 @@ void __init mem_init(void)
>  	memblock_set_bottom_up(true);
>  	if (is_secure_guest())
>  		svm_swiotlb_init();
> -	else
> +	/* Server machines don't need SWIOTLB if they're not secure guests. */
> +	else if (!machine_is(pseries) && !machine_is(powernv))

I can see powernv never needing SWIOTLB, but for pseries guests I am not so sure.

RP
Re: [PATCH stable] net: sch_generic: fix the missing new qdisc assignment bug
On 11/2/20 10:25 PM, Yunsheng Lin wrote:
> commit 2fb541c862c9 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc")
>
> When the above upstream commit is backported to stable kernel,
> one assignment is missing, which causes two problems reported
> by Joakim and Vishwanath, see [1] and [2].
>
> So add the assignment back to fix it.
>
> 1. https://www.spinics.net/lists/netdev/msg693916.html
> 2. https://www.spinics.net/lists/netdev/msg695131.html
>
> Fixes: 749cc0b0c7f3 ("net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc")
> Signed-off-by: Yunsheng Lin
> ---
>  net/sched/sch_generic.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 0e275e1..6e6147a 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1127,10 +1127,13 @@ static void dev_deactivate_queue(struct net_device *dev,
>  				 struct netdev_queue *dev_queue,
>  				 void *_qdisc_default)
>  {
>  	struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
> +	struct Qdisc *qdisc_default = _qdisc_default;
>
>  	if (qdisc) {
>  		if (!(qdisc->flags & TCQ_F_BUILTIN))
>  			set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
> +
> +		rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
>  	}
>  }

I have tested the patch on v5.4.71 and it fixes our issues.

Tested-by: Vishwanath Pai
Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
On 11/2/20 4:08 AM, Yunsheng Lin wrote: > On 2020/10/30 1:20, Vishwanath Pai wrote: >> On 10/29/20 6:24 AM, Yunsheng Lin wrote: >>> On 2020/10/29 12:50, Vishwanath Pai wrote: >>>> On 10/28/20 10:37 PM, Yunsheng Lin wrote: >>>>> On 2020/10/29 4:04, Vishwanath Pai wrote: >>>>>> On 10/28/20 1:47 PM, Cong Wang wrote: >>>>>>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> We noticed some problems when testing the latest 5.4 LTS kernel and traced it >>>>>>>> back to this commit using git bisect. When running our tests the machine stops >>>>>>>> responding to all traffic and the only way to recover is a reboot. I do not see >>>>>>>> a stack trace on the console. >>>>>>> >>>>>>> Do you mean the machine is still running fine just the network is down? >>>>>>> >>>>>>> If so, can you dump your tc config with stats when the problem is happening? >>>>>>> (You can use `tc -s -d qd show ...`.) >>>>>>> >>>>>>>> >>>>>>>> This can be reproduced using the packetdrill test below, it should be run a >>>>>>>> few times or in a loop. You should hit this issue within a few tries but >>>>>>>> sometimes might take up to 15-20 tries. >>>>>>> ... >>>>>>>> I can reproduce the issue easily on v5.4.68, and after reverting this commit it >>>>>>>> does not happen anymore. >>>>>>> >>>>>>> This is odd. The patch in this thread touches netdev reset path, if packetdrill >>>>>>> is the only thing you use to trigger the bug (that is netdev is always active), >>>>>>> I can not connect them. >>>>>>> >>>>>>> Thanks. >>>>>> >>>>>> Hi Cong, >>>>>> >>>>>>> Do you mean the machine is still running fine just the network is down? >>>>>> >>>>>> I was able to access the machine via serial console, it looks like it is >>>>>> up and running, just that networking is down. >>>>>> >>>>>>> If so, can you dump your tc config with stats when the problem is happening? >>>>>>> (You can use `tc -s -d qd show ...`.) 
>>>>>> >>>>>> If I try running tc when the machine is in this state the command never >>>>>> returns. It doesn't print anything but doesn't exit either. >>>>>> >>>>>>> This is odd. The patch in this thread touches netdev reset path, if packetdrill >>>>>>> is the only thing you use to trigger the bug (that is netdev is always active), >>>>>>> I can not connect them. >>>>>> >>>>>> I think packetdrill creates a tun0 interface when it starts the >>>>>> test and tears it down at the end, so it might be hitting this code path >>>>>> during teardown. >>>>> >>>>> Hi, Is there any preparation setup before running the above packetdrill test >>>>> case, I run the above test case in 5.9-rc4 with this patch applied without any >>>>> preparation setup, did not reproduce it. >>>>> >>>>> By the way, I am newbie to packetdrill:), it would be good to provide the >>>>> detail setup to reproduce it,thanks. >>>>> >>>>>> >>>>>> P.S: My mail server is having connectivity issues with vger.kernel.org >>>>>> so messages aren't getting delivered to netdev. It'll hopefully get >>>>>> resolved soon. >>>>>> >>>>>> Thanks, >>>>>> Vishwanath >>>>>> >>>>>> >>>>>> . >>>>>> >>>> >>>> I can't reproduce it on v5.9-rc4 either, it is probably an issue only on >>>> 5.4 then (and maybe older LTS versions). Can you give it a try on >>>> 5.4.68? >>>> >>>> For running packetdrill, download the latest version from their github >>>> repo, then run it in a loop without any special arguments. This is what >>>> I do to reproduce it: >>>> >>>> while true; do ./packetdri
Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
On 10/29/20 6:24 AM, Yunsheng Lin wrote: > On 2020/10/29 12:50, Vishwanath Pai wrote: >> On 10/28/20 10:37 PM, Yunsheng Lin wrote: >>> On 2020/10/29 4:04, Vishwanath Pai wrote: >>>> On 10/28/20 1:47 PM, Cong Wang wrote: >>>>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath wrote: >>>>>> Hi, >>>>>> >>>>>> We noticed some problems when testing the latest 5.4 LTS kernel and traced it >>>>>> back to this commit using git bisect. When running our tests the machine stops >>>>>> responding to all traffic and the only way to recover is a reboot. I do not see >>>>>> a stack trace on the console. >>>>> >>>>> Do you mean the machine is still running fine just the network is down? >>>>> >>>>> If so, can you dump your tc config with stats when the problem is happening? >>>>> (You can use `tc -s -d qd show ...`.) >>>>> >>>>>> >>>>>> This can be reproduced using the packetdrill test below, it should be run a >>>>>> few times or in a loop. You should hit this issue within a few tries but >>>>>> sometimes might take up to 15-20 tries. >>>>> ... >>>>>> I can reproduce the issue easily on v5.4.68, and after reverting this commit it >>>>>> does not happen anymore. >>>>> >>>>> This is odd. The patch in this thread touches netdev reset path, if packetdrill >>>>> is the only thing you use to trigger the bug (that is netdev is always active), >>>>> I can not connect them. >>>>> >>>>> Thanks. >>>> >>>> Hi Cong, >>>> >>>>> Do you mean the machine is still running fine just the network is down? >>>> >>>> I was able to access the machine via serial console, it looks like it is >>>> up and running, just that networking is down. >>>> >>>>> If so, can you dump your tc config with stats when the problem is happening? >>>>> (You can use `tc -s -d qd show ...`.) >>>> >>>> If I try running tc when the machine is in this state the command never >>>> returns. It doesn't print anything but doesn't exit either. >>>> >>>>> This is odd. 
The patch in this thread touches netdev reset path, if packetdrill >>>>> is the only thing you use to trigger the bug (that is netdev is always active), >>>>> I can not connect them. >>>> >>>> I think packetdrill creates a tun0 interface when it starts the >>>> test and tears it down at the end, so it might be hitting this code path >>>> during teardown. >>> >>> Hi, Is there any preparation setup before running the above packetdrill test >>> case, I run the above test case in 5.9-rc4 with this patch applied without any >>> preparation setup, did not reproduce it. >>> >>> By the way, I am newbie to packetdrill:), it would be good to provide the >>> detail setup to reproduce it,thanks. >>> >>>> >>>> P.S: My mail server is having connectivity issues with vger.kernel.org >>>> so messages aren't getting delivered to netdev. It'll hopefully get >>>> resolved soon. >>>> >>>> Thanks, >>>> Vishwanath >>>> >>>> >>>> . >>>> >> >> I can't reproduce it on v5.9-rc4 either, it is probably an issue only on >> 5.4 then (and maybe older LTS versions). Can you give it a try on >> 5.4.68? >> >> For running packetdrill, download the latest version from their github >> repo, then run it in a loop without any special arguments. This is what >> I do to reproduce it: >> >> while true; do ./packetdrill ; done >> >> I don't think any other setup is necessary. > > Hi, run the above test for above an hour using 5.4.68, still did not > reproduce it, as below: > > > root@(none)$ cd /home/root/ > root@(none)$ ls > creat_vlan.sh packetdrill test.pd > root@(none)$ cat test.pd > 0 `echo 4 > /proc/sys/net/ipv4/tcp_min_tso_segs` > > 0.400 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > 0.400 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > > // set maxseg to 1000 to work with both ipv4 and ipv6 > 0.500 setsockopt(3, SOL_TCP, TCP_MAXSEG, [1000], 4) = 0 > 0.500 bind(3, ..., ...) = 0 > 0.500 listen(3, 1) = 0 > > // Establish connection > 0.600 < S 0:0(0) win 3
Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
On 10/28/20 10:37 PM, Yunsheng Lin wrote: > On 2020/10/29 4:04, Vishwanath Pai wrote: >> On 10/28/20 1:47 PM, Cong Wang wrote: >>> On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath wrote: >>>> Hi, >>>> >>>> We noticed some problems when testing the latest 5.4 LTS kernel and traced it >>>> back to this commit using git bisect. When running our tests the machine stops >>>> responding to all traffic and the only way to recover is a reboot. I do not see >>>> a stack trace on the console. >>> >>> Do you mean the machine is still running fine just the network is down? >>> >>> If so, can you dump your tc config with stats when the problem is happening? >>> (You can use `tc -s -d qd show ...`.) >>> >>>> >>>> This can be reproduced using the packetdrill test below, it should be run a >>>> few times or in a loop. You should hit this issue within a few tries but >>>> sometimes might take up to 15-20 tries. >>> ... >>>> I can reproduce the issue easily on v5.4.68, and after reverting this commit it >>>> does not happen anymore. >>> >>> This is odd. The patch in this thread touches netdev reset path, if packetdrill >>> is the only thing you use to trigger the bug (that is netdev is always active), >>> I can not connect them. >>> >>> Thanks. >> >> Hi Cong, >> >>> Do you mean the machine is still running fine just the network is down? >> >> I was able to access the machine via serial console, it looks like it is >> up and running, just that networking is down. >> >>> If so, can you dump your tc config with stats when the problem is happening? >>> (You can use `tc -s -d qd show ...`.) >> >> If I try running tc when the machine is in this state the command never >> returns. It doesn't print anything but doesn't exit either. >> >>> This is odd. The patch in this thread touches netdev reset path, if packetdrill >>> is the only thing you use to trigger the bug (that is netdev is always active), >>> I can not connect them. 
>> >> I think packetdrill creates a tun0 interface when it starts the >> test and tears it down at the end, so it might be hitting this code path >> during teardown. > > Hi, Is there any preparation setup before running the above packetdrill test > case, I run the above test case in 5.9-rc4 with this patch applied without any > preparation setup, did not reproduce it. > > By the way, I am newbie to packetdrill:), it would be good to provide the > detail setup to reproduce it,thanks. > >> >> P.S: My mail server is having connectivity issues with vger.kernel.org >> so messages aren't getting delivered to netdev. It'll hopefully get >> resolved soon. >> >> Thanks, >> Vishwanath >> >> >> . >> I can't reproduce it on v5.9-rc4 either, it is probably an issue only on 5.4 then (and maybe older LTS versions). Can you give it a try on 5.4.68? For running packetdrill, download the latest version from their github repo, then run it in a loop without any special arguments. This is what I do to reproduce it: while true; do ./packetdrill ; done I don't think any other setup is necessary. -Vishwanath
Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
On 10/28/20 1:47 PM, Cong Wang wrote: > On Wed, Oct 28, 2020 at 8:37 AM Pai, Vishwanath wrote: >> Hi, >> >> We noticed some problems when testing the latest 5.4 LTS kernel and traced it >> back to this commit using git bisect. When running our tests the machine stops >> responding to all traffic and the only way to recover is a reboot. I do not see >> a stack trace on the console. > > Do you mean the machine is still running fine just the network is down? > > If so, can you dump your tc config with stats when the problem is happening? > (You can use `tc -s -d qd show ...`.) > >> >> This can be reproduced using the packetdrill test below, it should be run a >> few times or in a loop. You should hit this issue within a few tries but >> sometimes might take up to 15-20 tries. > ... >> I can reproduce the issue easily on v5.4.68, and after reverting this commit it >> does not happen anymore. > > This is odd. The patch in this thread touches netdev reset path, if packetdrill > is the only thing you use to trigger the bug (that is netdev is always active), > I can not connect them. > > Thanks. Hi Cong, > Do you mean the machine is still running fine just the network is down? I was able to access the machine via serial console, it looks like it is up and running, just that networking is down. > If so, can you dump your tc config with stats when the problem is happening? > (You can use `tc -s -d qd show ...`.) If I try running tc when the machine is in this state the command never returns. It doesn't print anything but doesn't exit either. > This is odd. The patch in this thread touches netdev reset path, if packetdrill > is the only thing you use to trigger the bug (that is netdev is always active), > I can not connect them. I think packetdrill creates a tun0 interface when it starts the test and tears it down at the end, so it might be hitting this code path during teardown. 
P.S: My mail server is having connectivity issues with vger.kernel.org so messages aren't getting delivered to netdev. It'll hopefully get resolved soon. Thanks, Vishwanath
Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
On 10/28/20, 10:51 AM, "Vishwanath Pai" wrote: On Thu, Sep 17, 2020 at 4:26 PM Cong Wang wrote: > > On Fri, Sep 11, 2020 at 1:13 AM Yunsheng Lin wrote: > > > > On 2020/9/11 4:07, Cong Wang wrote: > > > On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin > > > wrote: > > >> > > >> Currently there is concurrent reset and enqueue operation for the > > >> same lockless qdisc when there is no lock to synchronize the > > >> q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in > > >> qdisc_deactivate() called by dev_deactivate_queue(), which may cause > > >> out-of-bounds access for priv->ring[] in hns3 driver if user has > > >> requested a smaller queue num when __dev_xmit_skb() still enqueue a > > >> skb with a larger queue_mapping after the corresponding qdisc is > > >> reset, and call hns3_nic_net_xmit() with that skb later. > > >> > > >> Reused the existing synchronize_net() in dev_deactivate_many() to > > >> make sure skb with larger queue_mapping enqueued to old qdisc(which > > >> is saved in dev_queue->qdisc_sleeping) will always be reset when > > >> dev_reset_queue() is called. > > >> > > >> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") > > >> Signed-off-by: Yunsheng Lin > > >> --- > > >> ChangeLog V2: > > >> Reuse existing synchronize_net(). 
> > >> --- > > >> net/sched/sch_generic.c | 48 > > >> +--- > > >> 1 file changed, 33 insertions(+), 15 deletions(-) > > >> > > >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > > >> index 265a61d..54c4172 100644 > > >> --- a/net/sched/sch_generic.c > > >> +++ b/net/sched/sch_generic.c > > >> @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate); > > >> > > >> static void qdisc_deactivate(struct Qdisc *qdisc) > > >> { > > >> - bool nolock = qdisc->flags & TCQ_F_NOLOCK; > > >> - > > >> if (qdisc->flags & TCQ_F_BUILTIN) > > >> return; > > >> - if (test_bit(__QDISC_STATE_DEACTIVATED, >state)) > > >> - return; > > >> - > > >> - if (nolock) > > >> - spin_lock_bh(>seqlock); > > >> - spin_lock_bh(qdisc_lock(qdisc)); > > >> > > >> set_bit(__QDISC_STATE_DEACTIVATED, >state); > > >> - > > >> - qdisc_reset(qdisc); > > >> - > > >> - spin_unlock_bh(qdisc_lock(qdisc)); > > >> - if (nolock) > > >> - spin_unlock_bh(>seqlock); > > >> } > > >> > > >> static void dev_deactivate_queue(struct net_device *dev, > > >> @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct > > >> net_device *dev, > > >> } > > >> } > > >> > > >> +static void dev_reset_queue(struct net_device *dev, > > >> + struct netdev_queue *dev_queue, > > >> + void *_unused) > > >> +{ > > >> + struct Qdisc *qdisc; > > >> + bool nolock; > > >> + > > >> + qdisc = dev_queue->qdisc_sleeping; > > >> + if (!qdisc) > > >> + return; > > >> + > > >> + nolock = qdisc->flags & TCQ_F_NOLOCK; > > >> + > > >> + if (nolock) > > >> + spin_lock_bh(>seqlock); > > >> + spin_lock_bh(qdisc_lock(qdisc)); > > > > > > > > > I think you do not need this lock for lockless one. > > > > It seems so. > > Maybe another patch to remove qdisc_lock(qdisc) for lockless > > qdisc? > > Yeah, but not sure if we still want this lockless qdisc any more, > it brings more troubles than gains. 
> > > > > > > > > > >> + > > >> + qdisc_reset(qdisc); > > >> + > > >> + spin_unlock_bh(qdisc_lock(qdisc)); > > >> + if (nolock) > > >> + spin_unlock_bh(>seqlock); > > >> +} > > >> + > > >> static bool some_qdisc_is_busy(struct net_device *dev) > > >> { > > >> unsigned int i;
Re: Re: [PATCH v2 net] net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc
On 9/17/20 3:26 PM, Cong Wang wrote: > On Fri, Sep 11, 2020 at 1:13 AM Yunsheng Lin wrote: >> >> On 2020/9/11 4:07, Cong Wang wrote: >>> On Tue, Sep 8, 2020 at 4:06 AM Yunsheng Lin wrote: Currently there is concurrent reset and enqueue operation for the same lockless qdisc when there is no lock to synchronize the q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in qdisc_deactivate() called by dev_deactivate_queue(), which may cause out-of-bounds access for priv->ring[] in hns3 driver if user has requested a smaller queue num when __dev_xmit_skb() still enqueue a skb with a larger queue_mapping after the corresponding qdisc is reset, and call hns3_nic_net_xmit() with that skb later. Reused the existing synchronize_net() in dev_deactivate_many() to make sure skb with larger queue_mapping enqueued to old qdisc(which is saved in dev_queue->qdisc_sleeping) will always be reset when dev_reset_queue() is called. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Yunsheng Lin --- ChangeLog V2: Reuse existing synchronize_net(). 
--- net/sched/sch_generic.c | 48 +--- 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 265a61d..54c4172 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -1131,24 +1131,10 @@ EXPORT_SYMBOL(dev_activate); static void qdisc_deactivate(struct Qdisc *qdisc) { - bool nolock = qdisc->flags & TCQ_F_NOLOCK; - if (qdisc->flags & TCQ_F_BUILTIN) return; - if (test_bit(__QDISC_STATE_DEACTIVATED, >state)) - return; - - if (nolock) - spin_lock_bh(>seqlock); - spin_lock_bh(qdisc_lock(qdisc)); set_bit(__QDISC_STATE_DEACTIVATED, >state); - - qdisc_reset(qdisc); - - spin_unlock_bh(qdisc_lock(qdisc)); - if (nolock) - spin_unlock_bh(>seqlock); } static void dev_deactivate_queue(struct net_device *dev, @@ -1165,6 +1151,30 @@ static void dev_deactivate_queue(struct net_device *dev, } } +static void dev_reset_queue(struct net_device *dev, + struct netdev_queue *dev_queue, + void *_unused) +{ + struct Qdisc *qdisc; + bool nolock; + + qdisc = dev_queue->qdisc_sleeping; + if (!qdisc) + return; + + nolock = qdisc->flags & TCQ_F_NOLOCK; + + if (nolock) + spin_lock_bh(>seqlock); + spin_lock_bh(qdisc_lock(qdisc)); >>> >>> >>> I think you do not need this lock for lockless one. >> >> It seems so. >> Maybe another patch to remove qdisc_lock(qdisc) for lockless >> qdisc? > > Yeah, but not sure if we still want this lockless qdisc any more, > it brings more troubles than gains. > >> >> >>> + + qdisc_reset(qdisc); + + spin_unlock_bh(qdisc_lock(qdisc)); + if (nolock) + spin_unlock_bh(>seqlock); +} + static bool some_qdisc_is_busy(struct net_device *dev) { unsigned int i; @@ -1213,12 +1223,20 @@ void dev_deactivate_many(struct list_head *head) dev_watchdog_down(dev); } - /* Wait for outstanding qdisc-less dev_queue_xmit calls. + /* Wait for outstanding qdisc-less dev_queue_xmit calls or + * outstanding qdisc enqueuing calls. 
 	 * This is avoided if all devices are in dismantle phase :
 	 * Caller will call synchronize_net() for us
 	 */
 	synchronize_net();

+	list_for_each_entry(dev, head, close_list) {
+		netdev_for_each_tx_queue(dev, dev_reset_queue, NULL);
+
+		if (dev_ingress_queue(dev))
+			dev_reset_queue(dev, dev_ingress_queue(dev), NULL);
+	}
+
 	/* Wait for outstanding qdisc_run calls. */
 	list_for_each_entry(dev, head, close_list) {
 		while (some_qdisc_is_busy(dev)) {
>>>
>>> Do you want to reset before waiting for TX action?
>>>
>>> I think it is safer to do it after, at least prior to commit 759ae57f1b
>>> we did after.
>>
>> The reference to the txq->qdisc is always protected by RCU, so the synchronize_net()
>> should be enough to ensure there is no skb enqueued to the old qdisc that is saved
>> in the dev_queue->qdisc_sleeping, because __dev_queue_xmit can only see the new qdisc
>> after synchronize_net(), which is noop_qdisc, and noop_qdisc will make sure any skb
>> enqueued to it will be dropped and
Re: [RFC PATCH v3 2/2] mm: remove extra ZONE_DEVICE struct page refcount
On Thu, Oct 01, 2020 at 11:17:15AM -0700, Ralph Campbell wrote:
> ZONE_DEVICE struct pages have an extra reference count that complicates the
> code for put_page() and several places in the kernel that need to check the
> reference count to see that a page is not being used (gup, compaction,
> migration, etc.). Clean up the code so the reference count doesn't need to
> be treated specially for ZONE_DEVICE.

I was hoping this patch would resolve a page-reference issue that we run into
at random times while migrating a page to a device page backed by secure
memory. Unfortunately, I continue to see the problem: there is a reference
held on that page, which fails the migration.

FYI
RP
[PATCH v1] ARM: dts: aspeed: Add Mowgli BMC platform
The Mowgli BMC is an ASPEED ast2500 based BMC that is part of an OpenPower Power9 server. Signed-off-by: Ben Pai --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts | 663 2 files changed, 664 insertions(+) create mode 100644 arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index 4572db3fa5ae..92edb2e12823 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1381,6 +1381,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ aspeed-bmc-microsoft-olympus.dtb \ aspeed-bmc-opp-lanyang.dtb \ aspeed-bmc-opp-mihawk.dtb \ + aspeed-bmc-opp-mowgli.dtb \ aspeed-bmc-opp-nicole.dtb \ aspeed-bmc-opp-palmetto.dtb \ aspeed-bmc-opp-romulus.dtb \ diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts new file mode 100644 index ..11fd86980162 --- /dev/null +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mowgli.dts @@ -0,0 +1,663 @@ +// SPDX-License-Identifier: GPL-2.0+ +/dts-v1/; +#include "aspeed-g5.dtsi" +#include +#include + +/ { + model = "Mowgli BMC"; + compatible = "ibm,mowgli-bmc", "aspeed,ast2500"; + + + chosen { + stdout-path = + bootargs = "console=ttyS4,115200 earlyprintk"; + }; + + memory@8000 { + reg = <0x8000 0x2000>; + }; + + reserved-memory { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + flash_memory: region@9800 { + no-map; + reg = <0x9800 0x0400>; /* 64M */ + }; + + gfx_memory: framebuffer { + size = <0x0100>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + + video_engine_memory: jpegbuffer { + size = <0x0200>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + + air-water { + label = "air-water"; + gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + checkstop { + label = "checkstop"; + gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps0-presence { + label = "ps0-presence"; 
+ gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps1-presence { + label = "ps1-presence"; + gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + id-button { + label = "id-button"; + gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + }; + + gpio-keys-polled { + compatible = "gpio-keys-polled"; + poll-interval = <1000>; + + fan0-presence { + label = "fan0-presence"; + gpios = < 9 GPIO_ACTIVE_LOW>; + linux,code = <9>; + }; + + fan1-presence { + label = "fan1-presence"; + gpios = < 10 GPIO_ACTIVE_LOW>; + linux,code = <10>; + }; + + fan2-presence { + label = "fan2-presence"; + gpios = < 11 GPIO_ACTIVE_LOW>; + linux,code = <11>; + }; + + fan3-presence { + label = "fan3-presence"; + gpios = < 12 GPIO_ACTIVE_LOW>; + linux,code = <12>; + }; + + fan4-presence { + label = "fan4-presence"; + gpios = < 13 GPIO_ACTIVE_LOW>; + linux,code = <13>; + }; + }; + + leds { + compatible = "gpio-leds"; + + front-fault { + retain-state-
Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: rework secure mem slot dropping
On Tue, Jul 21, 2020 at 12:42:02PM +0200, Laurent Dufour wrote: > When a secure memslot is dropped, all the pages backed in the secure device > (aka really backed by secure memory by the Ultravisor) should be paged out > to a normal page. Previously, this was achieved by triggering the page > fault mechanism which is calling kvmppc_svm_page_out() on each pages. > > This can't work when hot unplugging a memory slot because the memory slot > is flagged as invalid and gfn_to_pfn() is then not trying to access the > page, so the page fault mechanism is not triggered. > > Since the final goal is to make a call to kvmppc_svm_page_out() it seems > simpler to directly calling it instead of triggering such a mechanism. This ^^ call directly instead of triggering.. > way kvmppc_uvmem_drop_pages() can be called even when hot unplugging a > memslot. > > Since kvmppc_uvmem_drop_pages() is already holding kvm->arch.uvmem_lock, > the call to __kvmppc_svm_page_out() is made. > As __kvmppc_svm_page_out needs the vma pointer to migrate the pages, the > VMA is fetched in a lazy way, to not trigger find_vma() all the time. In > addition, the mmap_sem is help in read mode during that time, not in write ^^ held > mode since the virual memory layout is not impacted, and > kvm->arch.uvmem_lock prevents concurrent operation on the secure device. > > Cc: Ram Pai Reviewed-by: Ram Pai RP
Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: move kvmppc_svm_page_out up
On Tue, Jul 21, 2020 at 12:42:01PM +0200, Laurent Dufour wrote: > kvmppc_svm_page_out() will need to be called by kvmppc_uvmem_drop_pages() > so move it up in this file. > > Furthermore it will be interesting to call this function when already > holding the kvm->arch.uvmem_lock, so prefix the original function with __ > and remove the locking in it, and introduce a wrapper which calls that > function with the lock held. > > There is no functional change. Reviewed-by: Ram Pai > > Cc: Ram Pai > Cc: Bharata B Rao > Cc: Paul Mackerras > Signed-off-by: Laurent Dufour > --- RP
[PATCH v1 2/2] ARM: dts: aspeed: mihawk: Add 8 tmp401 thermal sensors
Signed-off-by: Ben Pai --- arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 40 + 1 file changed, 40 insertions(+) diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts index 25ffe65fbdc0..5bf1f13dda3b 100644 --- a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -834,6 +834,11 @@ line-name = "smbus0"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus9_mux232: i2c@1 { @@ -854,6 +859,11 @@ line-name = "smbus1"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus9_mux233: i2c@2 { @@ -897,6 +907,11 @@ line-name = "smbus2"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus9_mux236: i2c@1 { @@ -917,6 +932,11 @@ line-name = "smbus3"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus9_mux237: i2c@2 { @@ -979,6 +999,11 @@ line-name = "smbus4"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus10_mux240: i2c@1 { @@ -999,6 +1024,11 @@ line-name = "smbus5"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus10_mux241: i2c@2 { @@ -1042,6 +1072,11 @@ line-name = "smbus6"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus10_mux244: i2c@1 { @@ -1062,6 +1097,11 @@ line-name = "smbus7"; }; }; + + tmp431@4c { + compatible = "ti,tmp401"; + reg = <0x4c>; + }; }; bus10_mux245: i2c@2 { -- 2.17.1
[PATCH v1 1/2] ARM: dts: aspeed: mihawk: IO expander uses TCA9554 driver
Set smbus_en of IO expander to 1 in order to be able to read sensor. Signed-off-by: Ben Pai --- arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 112 1 file changed, 112 insertions(+) diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts index eac6ed2bbc67..25ffe65fbdc0 100644 --- a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -820,12 +820,40 @@ #address-cells = <1>; #size-cells = <0>; reg = <0>; + + tca9554@39 { + compatible = "ti,tca9554"; + reg = <0x39>; + gpio-controller; + #gpio-cells = <2>; + + smbus0 { + gpio-hog; + gpios = <4 GPIO_ACTIVE_HIGH>; + output-high; + line-name = "smbus0"; + }; + }; }; bus9_mux232: i2c@1 { #address-cells = <1>; #size-cells = <0>; reg = <1>; + + tca9554@39 { + compatible = "ti,tca9554"; + reg = <0x39>; + gpio-controller; + #gpio-cells = <2>; + + smbus1 { + gpio-hog; + gpios = <4 GPIO_ACTIVE_HIGH>; + output-high; + line-name = "smbus1"; + }; + }; }; bus9_mux233: i2c@2 { @@ -855,12 +883,40 @@ #address-cells = <1>; #size-cells = <0>; reg = <0>; + + tca9554@39 { + compatible = "ti,tca9554"; + reg = <0x39>; + gpio-controller; + #gpio-cells = <2>; + + smbus2 { + gpio-hog; + gpios = <4 GPIO_ACTIVE_HIGH>; + output-high; + line-name = "smbus2"; + }; + }; }; bus9_mux236: i2c@1 { #address-cells = <1>; #size-cells = <0>; reg = <1>; + + tca9554@39 { + compatible = "ti,tca9554"; + reg = <0x39>; + gpio-controller; + #gpio-cells = <2>; + + smbus3 { + gpio-hog; + gpios = <4 GPIO_ACTIVE_HIGH>; + output-high; + line-name = "smbus3"; + }; + }; }; bus9_mux237: i2c@2 { @@ -909,12 +965,40 @@ #address-cells = <1>; #size-cells = <0>; reg = <0>; + + tca9554@39 { + compatible = "ti,tca9554"; + reg = <0x39>; + gpio-controller; + #gpio-cells = <2>; + + smbus4 { + gpio-hog; + gpios = <4 GPIO_ACTIVE_HIGH>; + output-high; + line-name = "smbus4"; + }; + }; }; bus10_mux240: i2c@1 { #address-cells = <1>; #size-cells = <0>; reg = <1>; + + tca9554@39 { + compatible = 
"ti,tca9554"; + reg = <0x39>; + gpio-controller; +
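The hunk above is cut off mid-node in this archive. For reference, a complete expander node of the shape these hunks add (labels, addresses, and properties taken from the patch itself) looks like this; the gpio-hog drives expander pin 4 (smbus_en) high at boot so the downstream sensors become readable:

```dts
tca9554@39 {
	compatible = "ti,tca9554";
	reg = <0x39>;
	gpio-controller;
	#gpio-cells = <2>;

	smbus0 {
		gpio-hog;
		gpios = <4 GPIO_ACTIVE_HIGH>;
		output-high;
		line-name = "smbus0";
	};
};
```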
[PATCH v1] ARM: dts: aspeed: mihawk: add aliases for i2c and add thermal sensors
1.Set the bus id for each mux channel to avoid switching channels multiple times 2.Set smbus_en of IO expander to 1 in order to be able to read tmp401 sensor 3.Add 8 tmp401 thermal sensors Signed-off-by: Ben Pai --- arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 449 +++- 1 file changed, 444 insertions(+), 5 deletions(-) diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts index f7e935ede919..78451b283d93 100644 --- a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -7,7 +7,52 @@ / { model = "Mihawk BMC"; compatible = "ibm,mihawk-bmc", "aspeed,ast2500"; - + aliases { + i2c215 = _mux215; + i2c216 = _mux216; + i2c217 = _mux217; + i2c218 = _mux218; + i2c219 = _mux219; + i2c220 = _mux220; + i2c221 = _mux221; + i2c222 = _mux222; + i2c223 = _mux223; + i2c224 = _mux224; + i2c225 = _mux225; + i2c226 = _mux226; + i2c227 = _mux227; + i2c228 = _mux228; + i2c229 = _mux229; + i2c230 = _mux230; + i2c231 = _mux231; + i2c232 = _mux232; + i2c233 = _mux233; + i2c234 = _mux234; + i2c235 = _mux235; + i2c236 = _mux236; + i2c237 = _mux237; + i2c238 = _mux238; + i2c239 = _mux239; + i2c240 = _mux240; + i2c241 = _mux241; + i2c242 = _mux242; + i2c243 = _mux243; + i2c244 = _mux244; + i2c245 = _mux245; + i2c246 = _mux246; + i2c247 = _mux247; + i2c248 = _mux248; + i2c249 = _mux249; + i2c250 = _mux250; + i2c251 = _mux251; + i2c252 = _mux252; + i2c253 = _mux253; + i2c254 = _mux254; + i2c255 = _mux255; + i2c256 = _mux256; + i2c257 = _mux257; + i2c258 = _mux258; + }; chosen { stdout-path = @@ -630,6 +675,55 @@ #address-cells = <1>; #size-cells = <0>; reg = <0x70>; + + bus6_mux215: i2c@0 { + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + }; + + bus6_mux216: i2c@1 { + #address-cells = <1>; + #size-cells = <0>; + reg = <1>; + }; + + bus6_mux217: i2c@2 { + #address-cells = <1>; + #size-cells = <0>; + reg = <2>; + }; + + bus6_mux218: i2c@3 { + #address-cells = <1>; + #size-cells = <0>; + reg = 
<3>; + }; + + bus6_mux219: i2c@4 { + #address-cells = <1>; + #size-cells = <0>; + reg = <4>; + }; + + bus6_mux220: i2c@5 { + #address-cells = <1>; + #size-cells = <0>; + reg = <5>; + }; + + bus6_mux221: i2c@6 { + #address-cells = <1>; + #size-cells = <0>; + reg = <6>; + }; + + bus6_mux222: i2c@7 { + #address-cells = <1>; + #size-cells = <0>; + reg = <7>; + }; + }; }; @@ -644,6 +738,55 @@ #address-cells = <1>; #size-cells = <0>; reg = <0x70>; + + bus7_mux223: i2c@0 { + #address-cells = <1>; + #size-cells = <0>; + reg = <0>; + }; + + bus7_mux224: i2c@1 { + #address-cells = <1>; + #size-cells = <0>; + reg = <1>; + }; + + bus7_mux225: i2c@2 { + #address-cells = <1>; + #size-cells = <0>; + reg = <2>; + }; + + bus7_mux226: i2c@3 { + #address-cells = <1>; + #size-cells = <0>; + reg = <3>; +
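The alias hunk above lost its phandle references when angle-bracketed text was stripped by the archive (e.g. `i2c215 = _mux215;`). Judging from the mux-channel labels defined later in the same patch, each intact entry pins a mux channel to a fixed i2c bus number, in roughly this form (a sketch, not the verbatim patch):

```dts
/ {
	aliases {
		/* Fix the bus number of each mux channel so it does not
		 * depend on probe order, e.g.: */
		i2c215 = &bus6_mux215;
		i2c216 = &bus6_mux216;
	};
};
```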
Re: [PATCH v2] KVM: PPC: Book3S HV: relax check on H_SVM_INIT_ABORT
On Wed, May 20, 2020 at 07:43:08PM +0200, Laurent Dufour wrote: > The commit 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* > Hcalls") added checks of secure bit of SRR1 to filter out the Hcall > reserved to the Ultravisor. > > However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the > context of the VM calling UV_ESM. This allows the Hypervisor to return to > the guest without going through the Ultravisor. Thus the Secure bit of SRR1 > is not set in that particular case. > > In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be > filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is > not set in that case. > > Fixes: 8c47b6ff29e3 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls") > Signed-off-by: Laurent Dufour Reviewed-by: Ram Pai > --- > arch/powerpc/kvm/book3s_hv.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c > index 93493f0cbfe8..6ad1a3b14300 100644 > --- a/arch/powerpc/kvm/book3s_hv.c > +++ b/arch/powerpc/kvm/book3s_hv.c > @@ -1099,9 +1099,12 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) > ret = kvmppc_h_svm_init_done(vcpu->kvm); > break; > case H_SVM_INIT_ABORT: > - ret = H_UNSUPPORTED; > - if (kvmppc_get_srr1(vcpu) & MSR_S) > - ret = kvmppc_h_svm_init_abort(vcpu->kvm); > + /* > + * Even if that call is made by the Ultravisor, the SRR1 value > + * is the guest context one, with the secure bit clear as it has > + * not yet been secured. So we can't check it here. > + */ Frankly speaking, the comment above, when read in isolation (i.e., without the deleted code above), feels out of place. The reasoning for the change is anyway captured in the changelog, so I think we should delete this comment. Also, the comment above assumes the Ultravisor will call H_SVM_INIT_ABORT with the SRR1(S) bit not set, which may or may not be true.
Regardless of who calls H_SVM_INIT_ABORT, and how, we should just call kvmppc_h_svm_init_abort() and let it deal with the complexities. RP
Re: [PATCH] powerpc/prom_init: Undo relocation before entering secure mode
On Wed, Sep 11, 2019 at 01:34:33PM -0300, Thiago Jung Bauermann wrote: > The ultravisor will do an integrity check of the kernel image but we > relocated it so the check will fail. Restore the original image by > relocating it back to the kernel virtual base address. > > This works because during build vmlinux is linked with an expected virtual > runtime address of KERNELBASE. > > Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init") > Signed-off-by: Thiago Jung Bauermann Tested-by: Ram Pai > --- > arch/powerpc/include/asm/elf.h | 3 +++ > arch/powerpc/kernel/prom_init.c| 11 +++ > arch/powerpc/kernel/prom_init_check.sh | 3 ++- > 3 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h > index 409c9bfb43d9..57c229a86f08 100644 > --- a/arch/powerpc/include/asm/elf.h > +++ b/arch/powerpc/include/asm/elf.h > @@ -175,4 +175,7 @@ do { > \ > ARCH_DLINFO_CACHE_GEOMETRY; \ > } while (0) > > +/* Relocate the kernel image to @final_address */ > +void relocate(unsigned long final_address); > + > #endif /* _ASM_POWERPC_ELF_H */ > diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c > index 74f70f90eff0..44b1d404250e 100644 > --- a/arch/powerpc/kernel/prom_init.c > +++ b/arch/powerpc/kernel/prom_init.c > @@ -3249,7 +3249,18 @@ static void setup_secure_guest(unsigned long kbase, > unsigned long fdt) > /* Switch to secure mode. */ > prom_printf("Switching to secure mode.\n"); > > + /* > + * The ultravisor will do an integrity check of the kernel image but we > + * relocated it so the check will fail. Restore the original image by > + * relocating it back to the kernel virtual base address. > + */ > + relocate(KERNELBASE); > + > ret = enter_secure_mode(kbase, fdt); > + > + /* Relocate the kernel again. 
*/ > + relocate(kbase); > + > if (ret != U_SUCCESS) { > prom_printf("Returned %d from switching to secure mode.\n", > ret); > prom_rtas_os_term("Switch to secure mode failed.\n"); > diff --git a/arch/powerpc/kernel/prom_init_check.sh > b/arch/powerpc/kernel/prom_init_check.sh > index 160bef0d553d..16535ccc0fa0 100644 > --- a/arch/powerpc/kernel/prom_init_check.sh > +++ b/arch/powerpc/kernel/prom_init_check.sh > @@ -26,7 +26,8 @@ _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold > __secondary_hold_acknowledge __secondary_hold_spinloop __start > logo_linux_clut224 btext_prepare_BAT > reloc_got2 kernstart_addr memstart_addr linux_banner _stext > -__prom_init_toc_start __prom_init_toc_end btext_setup_display TOC." > +__prom_init_toc_start __prom_init_toc_end btext_setup_display TOC. > +relocate" > > NM="$1" > OBJ="$2" -- Ram Pai
RE: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted
On Tue, Oct 15, 2019 at 09:35:01AM +0200, Christoph Hellwig wrote: > On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote: > > From: Thiago Jung Bauermann > > > > Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must > > be set by both device and guest driver. However, as a hack, when DMA API > > returns physical addresses, guest driver can use the DMA API; even though > > device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical > > addresses. > > Sorry, but this is a complete bullshit hack. Driver must always use > the DMA API if they do DMA, and if virtio devices use physical addresses > that needs to be returned through the platform firmware interfaces for > the dma setup. If you don't do that yet (which based on previous > informations you don't), you need to fix it, and we can then quirk > old implementations that already are out in the field. > > In other words: we finally need to fix that virtio mess and not pile > hacks on top of hacks. So force all virtio devices to use DMA API, except when VIRTIO_F_IOMMU_PLATFORM is not enabled? Any help detailing the idea, will enable us fix this issue once for all. Will something like below work? It removes the prior hacks, and always uses DMA API; except when VIRTIO_F_IOMMU_PLATFORM is not enabled. diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index c8be1c4..b593d3d 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -240,22 +240,10 @@ static inline bool virtqueue_use_indirect(struct virtqueue *_vq, static bool vring_use_dma_api(struct virtio_device *vdev) { - if (!virtio_has_iommu_quirk(vdev)) - return true; - - /* Otherwise, we are left to guess. */ - /* -* In theory, it's possible to have a buggy QEMU-supposed -* emulated Q35 IOMMU and Xen enabled at the same time. On -* such a configuration, virtio has never worked and will -* not work without an even larger kludge. 
Instead, enable -* the DMA API if we're a Xen guest, which at least allows -* all of the sensible Xen configurations to work correctly. -*/ - if (xen_domain()) - return true; + if (virtio_has_iommu_quirk(vdev)) + return false; - return false; + return true; } size_t virtio_max_dma_size(struct virtio_device *vdev) -- Ram Pai
Re: [PATCH] powerpc/pkeys: remove unused pkey_allows_readwrite
On Tue, Sep 17, 2019 at 11:22:30AM -0400, Qian Cai wrote: > pkey_allows_readwrite() was first introduced in the commit 5586cf61e108 > ("powerpc: introduce execute-only pkey"), but the usage was removed > entirely in the commit a4fcc877d4e1 ("powerpc/pkeys: Preallocate > execute-only key"). > > Found by the "-Wunused-function" compiler warning flag. > > Fixes: a4fcc877d4e1 ("powerpc/pkeys: Preallocate execute-only key") > Signed-off-by: Qian Cai > --- > arch/powerpc/mm/book3s64/pkeys.c | 10 -- > 1 file changed, 10 deletions(-) > > diff --git a/arch/powerpc/mm/book3s64/pkeys.c > b/arch/powerpc/mm/book3s64/pkeys.c > index ae7fca40e5b3..59e0ebbd8036 100644 > --- a/arch/powerpc/mm/book3s64/pkeys.c > +++ b/arch/powerpc/mm/book3s64/pkeys.c > @@ -307,16 +307,6 @@ void thread_pkey_regs_init(struct thread_struct *thread) > write_iamr(pkey_iamr_mask); > } > > -static inline bool pkey_allows_readwrite(int pkey) > -{ > - int pkey_shift = pkeyshift(pkey); > - > - if (!is_pkey_enabled(pkey)) > - return true; > - > - return !(read_amr() & ((AMR_RD_BIT|AMR_WR_BIT) << pkey_shift)); > -} > - > int __execute_only_pkey(struct mm_struct *mm) > { > return mm->context.execute_only_pkey; The function was initially used by __execute_only_pkey(), but ones we changed the implementation of __execute_only_pkey(), the need for pkey_allows_readwrite() disappeared. Acked-by: Ram Pai -- Ram Pai
[PATCH v4 4/4] ARM: dts: aspeed: Add Mihawk BMC platform
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an OpenPower Power9 server. Signed-off-by: Ben Pai --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 918 2 files changed, 919 insertions(+) create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index eb6de52c1936..cdfe0f43ffd3 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1275,6 +1275,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ aspeed-bmc-lenovo-hr630.dtb \ aspeed-bmc-microsoft-olympus.dtb \ aspeed-bmc-opp-lanyang.dtb \ + aspeed-bmc-opp-mihawk.dtb \ aspeed-bmc-opp-palmetto.dtb \ aspeed-bmc-opp-romulus.dtb \ aspeed-bmc-opp-swift.dtb \ diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts new file mode 100755 index ..bbf4a4671421 --- /dev/null +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -0,0 +1,918 @@ +/dts-v1/; + +#include "aspeed-g5.dtsi" +#include +#include + +/ { + model = "Mihawk BMC"; + compatible = "ibm,mihawk-bmc", "aspeed,ast2500"; + + + chosen { + stdout-path = + bootargs = "console=ttyS4,115200 earlyprintk"; + }; + + memory@8000 { + reg = <0x8000 0x2000>; /* address and size of RAM(512MB) */ + }; + + reserved-memory { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + flash_memory: region@9800 { + no-map; + reg = <0x9800 0x0400>; /* 64M */ + }; + + gfx_memory: framebuffer { + size = <0x0100>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + + video_engine_memory: jpegbuffer { + size = <0x0200>;/* 32MM */ + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + + air-water { + label = "air-water"; + gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + checkstop { + label = "checkstop"; + gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps0-presence { + label = 
"ps0-presence"; + gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps1-presence { + label = "ps1-presence"; + gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + id-button { + label = "id-button"; + gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + }; + + gpio-keys-polled { + compatible = "gpio-keys-polled"; + poll-interval = <1000>; + + fan0-presence { + label = "fan0-presence"; + gpios = < 9 GPIO_ACTIVE_LOW>; + linux,code = <9>; + }; + + fan1-presence { + label = "fan1-presence"; + gpios = < 10 GPIO_ACTIVE_LOW>; + linux,code = <10>; + }; + + fan2-presence { + label = "fan2-presence"; + gpios = < 11 GPIO_ACTIVE_LOW>; + linux,code = <11>; + }; + + fan3-presence { + label = "fan3-presence"; + gpios = < 12 GPIO_ACTIVE_LOW>; + linux,code = <12>; + }; + + fan4-presence { + label = "fan4-presence"; + gpios = < 13 GPIO_ACTIVE_LOW>; + linux,code = <13>; + }; + + fan5-presence { + label = "fan5-presence"; + gpios =
[PATCH v3 3/3] ARM: dts: aspeed: Add Mihawk BMC platform
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an OpenPower Power9 server. Signed-off-by: Ben Pai --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 902 2 files changed, 903 insertions(+) create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index eb6de52c1936..cdfe0f43ffd3 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1275,6 +1275,7 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ aspeed-bmc-lenovo-hr630.dtb \ aspeed-bmc-microsoft-olympus.dtb \ aspeed-bmc-opp-lanyang.dtb \ + aspeed-bmc-opp-mihawk.dtb \ aspeed-bmc-opp-palmetto.dtb \ aspeed-bmc-opp-romulus.dtb \ aspeed-bmc-opp-swift.dtb \ diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts new file mode 100755 index ..ca42057c0c1f --- /dev/null +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -0,0 +1,902 @@ +/dts-v1/; + +#include "aspeed-g5.dtsi" +#include +#include + +/ { + model = "Mihawk BMC"; + compatible = "ibm,mihawk-bmc", "aspeed,ast2500"; + + + chosen { + stdout-path = + bootargs = "console=ttyS4,115200 earlyprintk"; + }; + + memory@8000 { + reg = <0x8000 0x2000>; /* address and size of RAM(512MB) */ + }; + + reserved-memory { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + flash_memory: region@9800 { + no-map; + reg = <0x9800 0x0400>; /* 64M */ + }; + + gfx_memory: framebuffer { + size = <0x0100>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + + video_engine_memory: jpegbuffer { + size = <0x0200>;/* 32MM */ + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + + air-water { + label = "air-water"; + gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + checkstop { + label = "checkstop"; + gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps0-presence { + label = 
"ps0-presence"; + gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps1-presence { + label = "ps1-presence"; + gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + id-button { + label = "id-button"; + gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + }; + + gpio-keys-polled { + compatible = "gpio-keys-polled"; + poll-interval = <1000>; + + fan0-presence { + label = "fan0-presence"; + gpios = < 9 GPIO_ACTIVE_LOW>; + linux,code = <9>; + }; + + fan1-presence { + label = "fan1-presence"; + gpios = < 10 GPIO_ACTIVE_LOW>; + linux,code = <10>; + }; + + fan2-presence { + label = "fan2-presence"; + gpios = < 11 GPIO_ACTIVE_LOW>; + linux,code = <11>; + }; + + fan3-presence { + label = "fan3-presence"; + gpios = < 12 GPIO_ACTIVE_LOW>; + linux,code = <12>; + }; + + fan4-presence { + label = "fan4-presence"; + gpios = < 13 GPIO_ACTIVE_LOW>; + linux,code = <13>; + }; + + fan5-presence { + label = "fan5-presence"; + gpios =
[PATCH v2 2/2] ARM: dts: aspeed: Add Mihawk BMC platform
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an OpenPower Power9 server. Signed-off-by: Ben Pai --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 907 2 files changed, 908 insertions(+) create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index eb6de52c1936..262345544359 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1281,5 +1281,6 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ aspeed-bmc-opp-vesnin.dtb \ aspeed-bmc-opp-witherspoon.dtb \ aspeed-bmc-opp-zaius.dtb \ + aspeed-bmc-opp-mihawk.dtb \ aspeed-bmc-portwell-neptune.dtb \ aspeed-bmc-quanta-q71l.dtb diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts new file mode 100755 index ..913c94326f3f --- /dev/null +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -0,0 +1,907 @@ +/dts-v1/; + +#include "aspeed-g5.dtsi" +#include +#include + +/ { + model = "Mihawk BMC"; + compatible = "ibm,mihawk-bmc", "aspeed,ast2500"; + + + chosen { + stdout-path = + bootargs = "console=ttyS4,115200 earlyprintk"; + }; + + memory@8000 { + reg = <0x8000 0x2000>; /* address and size of RAM(512MB) */ + }; + + reserved-memory { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + flash_memory: region@9800 { + no-map; + reg = <0x9800 0x0400>; /* 64M */ + }; + + gfx_memory: framebuffer { + size = <0x0100>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + + video_engine_memory: jpegbuffer { + size = <0x0200>;/* 32MM */ + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + + air-water { + label = "air-water"; + gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + checkstop { + label = "checkstop"; + gpios = < ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps0-presence { + label = "ps0-presence"; + gpios = < 
ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps1-presence { + label = "ps1-presence"; + gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + id-button { + label = "id-button"; + gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + }; + + gpio-keys-polled { + compatible = "gpio-keys-polled"; + #address-cells = <1>; + poll-interval = <1000>; + + fan0-presence { + label = "fan0-presence"; + gpios = < 9 GPIO_ACTIVE_LOW>; + linux,code = <9>; + }; + + fan1-presence { + label = "fan1-presence"; + gpios = < 10 GPIO_ACTIVE_LOW>; + linux,code = <10>; + }; + + fan2-presence { + label = "fan2-presence"; + gpios = < 11 GPIO_ACTIVE_LOW>; + linux,code = <11>; + }; + + fan3-presence { + label = "fan3-presence"; + gpios = < 12 GPIO_ACTIVE_LOW>; + linux,code = <12>; + }; + + fan4-presence { + label = "fan4-presence"; + gpios = < 13 GPIO_ACTIVE_LOW>; + linux,code = <13>; + }; + + fan5-presence { + label = "fan5-presence"; + gp
[PATCH] ARM: dts: aspeed: Add Mihawk BMC platform
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an OpenPower Power9 server. This adds the device tree description for most upstream components. It is a squashed commit from the OpenBMC kernel tree. Signed-off-by: Ben Pai --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 922 2 files changed, 923 insertions(+) create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index eb6de52c1936..262345544359 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1281,5 +1281,6 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ aspeed-bmc-opp-vesnin.dtb \ aspeed-bmc-opp-witherspoon.dtb \ aspeed-bmc-opp-zaius.dtb \ + aspeed-bmc-opp-mihawk.dtb \ aspeed-bmc-portwell-neptune.dtb \ aspeed-bmc-quanta-q71l.dtb diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts new file mode 100755 index ..cfa20e0b2939 --- /dev/null +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -0,0 +1,922 @@ +/dts-v1/; + +#include "aspeed-g5.dtsi" +#include +#include + +/ { + model = "Mihawk BMC"; + compatible = "ibm,mihawk-bmc", "aspeed,ast2500"; + + + chosen { + stdout-path = + bootargs = "console=ttyS4,115200 earlyprintk"; + }; + + memory@8000 { + reg = <0x8000 0x2000>; /* address and size of RAM(512MB) */ + }; + + reserved-memory { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + flash_memory: region@9800 { + no-map; + reg = <0x9800 0x0400>; /* 64M */ + }; + + gfx_memory: framebuffer { + size = <0x0100>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + + video_engine_memory: jpegbuffer { + size = <0x0200>;/* 32MM */ + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + + air-water { + label = "air-water"; + gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + checkstop { + label = "checkstop"; + gpios = < 
ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps0-presence { + label = "ps0-presence"; + gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps1-presence { + label = "ps1-presence"; + gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + id-button { + label = "id-button"; + gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + }; + + gpio-keys-polled { + compatible = "gpio-keys-polled"; + #address-cells = <1>; + #size-cells = <0>; + poll-interval = <1000>; + + fan0-presence { + label = "fan0-presence"; + gpios = < 9 GPIO_ACTIVE_LOW>; + linux,code = <9>; + }; + + fan1-presence { + label = "fan1-presence"; + gpios = < 10 GPIO_ACTIVE_LOW>; + linux,code = <10>; + }; + + fan2-presence { + label = "fan2-presence"; + gpios = < 11 GPIO_ACTIVE_LOW>; + linux,code = <11>; + }; + + fan3-presence { + label = "fan3-presence"; + gpios = < 12 GPIO_ACTIVE_LOW>; + linux,code = <12>; + }; + + fan4-presence { + label = "fan4-presence"; + gpios = < 13 GPIO_ACTIVE_LOW>; +
[PATCH] ARM: dts: aspeed: Add Mihawk BMC platform
The Mihawk BMC is an ASPEED ast2500 based BMC that is part of an OpenPower Power9 server. This adds the device tree description for most upstream components. It is a squashed commit from the OpenBMC kernel tree. Signed-off-by: Ben Pai --- arch/arm/boot/dts/Makefile | 1 + arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts | 922 2 files changed, 923 insertions(+) create mode 100755 arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile index dab2914fa293..5368d657781e 100644 --- a/arch/arm/boot/dts/Makefile +++ b/arch/arm/boot/dts/Makefile @@ -1274,5 +1274,6 @@ dtb-$(CONFIG_ARCH_ASPEED) += \ aspeed-bmc-opp-romulus.dtb \ aspeed-bmc-opp-witherspoon.dtb \ aspeed-bmc-opp-zaius.dtb \ +aspeed-bmc-opp-mihawk.dtb \ aspeed-bmc-portwell-neptune.dtb \ aspeed-bmc-quanta-q71l.dtb diff --git a/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts new file mode 100755 index ..cfa20e0b2939 --- /dev/null +++ b/arch/arm/boot/dts/aspeed-bmc-opp-mihawk.dts @@ -0,0 +1,922 @@ +/dts-v1/; + +#include "aspeed-g5.dtsi" +#include +#include + +/ { + model = "Mihawk BMC"; + compatible = "ibm,mihawk-bmc", "aspeed,ast2500"; + + + chosen { + stdout-path = + bootargs = "console=ttyS4,115200 earlyprintk"; + }; + + memory@8000 { + reg = <0x8000 0x2000>; /* address and size of RAM(512MB) */ + }; + + reserved-memory { + #address-cells = <1>; + #size-cells = <1>; + ranges; + + flash_memory: region@9800 { + no-map; + reg = <0x9800 0x0400>; /* 64M */ + }; + + gfx_memory: framebuffer { + size = <0x0100>; + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + + video_engine_memory: jpegbuffer { + size = <0x0200>;/* 32MM */ + alignment = <0x0100>; + compatible = "shared-dma-pool"; + reusable; + }; + }; + + gpio-keys { + compatible = "gpio-keys"; + + air-water { + label = "air-water"; + gpios = < ASPEED_GPIO(F, 6) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + checkstop { + label = "checkstop"; + gpios = < 
ASPEED_GPIO(J, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps0-presence { + label = "ps0-presence"; + gpios = < ASPEED_GPIO(Z, 2) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + + ps1-presence { + label = "ps1-presence"; + gpios = < ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + id-button { + label = "id-button"; + gpios = < ASPEED_GPIO(F, 1) GPIO_ACTIVE_LOW>; + linux,code = ; + }; + }; + + gpio-keys-polled { + compatible = "gpio-keys-polled"; + #address-cells = <1>; + #size-cells = <0>; + poll-interval = <1000>; + + fan0-presence { + label = "fan0-presence"; + gpios = < 9 GPIO_ACTIVE_LOW>; + linux,code = <9>; + }; + + fan1-presence { + label = "fan1-presence"; + gpios = < 10 GPIO_ACTIVE_LOW>; + linux,code = <10>; + }; + + fan2-presence { + label = "fan2-presence"; + gpios = < 11 GPIO_ACTIVE_LOW>; + linux,code = <11>; + }; + + fan3-presence { + label = "fan3-presence"; + gpios = < 12 GPIO_ACTIVE_LOW>; + linux,code = <12>; + }; + + fan4-presence { + label = "fan4-presence"; + gpios = < 13 GPIO_ACTIVE_LOW>; +
[RFC PATCH 1/1] powerpc/pseries/svm: Unshare all pages before kexecing a new kernel
powerpc/pseries/svm: Unshare all pages before kexecing a new kernel. A new kernel deserves a clean slate. Any pages shared with the hypervisor are unshared before invoking the new kernel. However, there are exceptions. If the new kernel is invoked to dump the current kernel, or if there is an explicit request to preserve the state of the current kernel, unsharing of pages is skipped. NOTE: Reserve at least 256M for the crashkernel. Otherwise SWIOTLB allocation fails and the crash kernel fails to boot. Signed-off-by: Ram Pai diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index 8a6c5b4d..c8dd470 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -31,5 +31,6 @@ #define UV_UNSHARE_PAGE 0xF134 #define UV_PAGE_INVAL 0xF138 #define UV_SVM_TERMINATE 0xF13C +#define UV_UNSHARE_ALL_PAGES 0xF140 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */ diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h index bf5ac05..73c44ff 100644 --- a/arch/powerpc/include/asm/ultravisor.h +++ b/arch/powerpc/include/asm/ultravisor.h @@ -120,6 +120,12 @@ static inline int uv_unshare_page(u64 pfn, u64 npages) return ucall(UV_UNSHARE_PAGE, retbuf, pfn, npages); } +static inline int uv_unshare_all_pages(void) +{ + unsigned long retbuf[UCALL_BUFSIZE]; + + return ucall(UV_UNSHARE_ALL_PAGES, retbuf); +} #endif /* !__ASSEMBLY__ */ #endif /* _ASM_POWERPC_ULTRAVISOR_H */ diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 75692c3..a93e3ab 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -329,6 +329,13 @@ void default_machine_kexec(struct kimage *image) #ifdef CONFIG_PPC_PSERIES kexec_paca.lppaca_ptr = NULL; #endif + + if (is_svm_platform() && !(image->preserve_context || + image->type == KEXEC_TYPE_CRASH)) { + uv_unshare_all_pages(); + printk("kexec: Unshared all shared pages.\n"); + } + 
paca_ptrs[kexec_paca.paca_index] = _paca; setup_paca(_paca);
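The skip conditions in the hunk above can be sketched as a small predicate. This is a hedged stand-in, not kernel code: `struct kimage` is trimmed to the two fields the check reads, and `is_svm_platform()` is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed stand-ins for the kernel's kimage type and kexec-type enum;
 * only what the unshare decision reads is modeled here. */
enum kexec_type { KEXEC_TYPE_DEFAULT, KEXEC_TYPE_CRASH };

struct kimage {
	bool preserve_context;
	enum kexec_type type;
};

/* Unshare only for a regular kexec on a secure VM: a crash (dump) kernel
 * must observe memory as it was, and preserve_context means the current
 * kernel's state must survive, so both cases skip the unshare ucall. */
static bool should_unshare_all_pages(const struct kimage *image, bool is_svm)
{
	return is_svm && !(image->preserve_context ||
			   image->type == KEXEC_TYPE_CRASH);
}
```

Only the first case (regular kexec on a secure VM) triggers `uv_unshare_all_pages()` in the patch.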
Re: Re: [RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper
On Tue, May 21, 2019 at 07:13:26AM +0200, Christoph Hellwig wrote: > On Tue, May 21, 2019 at 01:49:02AM -0300, Thiago Jung Bauermann wrote: > > From: Benjamin Herrenschmidt > > > > For secure VMs, the signing tool will create a ticket called the "ESM blob" > > for the Enter Secure Mode ultravisor call with the signatures of the kernel > > and initrd among other things. > > > > This adds support to the wrapper script for adding that blob via the "-e" > > option to the zImage.pseries. > > > > It also adds code to the zImage wrapper itself to retrieve and if necessary > > relocate the blob, and pass its address to Linux via the device-tree, to be > > later consumed by prom_init. > > Where does the "BLOB" come from? How is it licensed and how can we > satisfy the GPL with it? The "BLOB" is not a piece of code. It's just a piece of data that gets generated by our build tools. This data contains the signed hash of the kernel, initrd, and kernel command line parameters. It also contains any information that the creator of the BLOB wants to be made available to anyone needing it inside the secure virtual machine. All of this is integrity-protected and encrypted to safeguard it at rest and at runtime. Bottom line -- the blob is data, and hence has no licensing implication. And if, for some reason, even data needs to have a licensing statement, we can make it available in a way that has no conflicts with the GPL. -- Ram Pai
Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
On Wed, May 23, 2018 at 09:50:02PM +0300, Michael S. Tsirkin wrote: > subj: s/virito/virtio/ > ..snip.. > > machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init); > > + > > +bool platform_forces_virtio_dma(struct virtio_device *vdev) > > +{ > > + /* > > +* On protected guest platforms, force virtio core to use DMA > > +* MAP API for all virtio devices. But there can also be some > > +* exceptions for individual devices like virtio balloon. > > +*/ > > + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL); > > +} > > Isn't this kind of slow? vring_use_dma_api is on > data path and supposed to be very fast. Yes, it is slow and not ideal. This won't be the final code. The final code will cache the information in some global variable and use it in this function. However, this code was added to the RFC to illustrate the idea that DMA operations are needed only in a secure/protected environment. RP
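The caching fix described in the reply could look roughly like the sketch below. This is a hypothetical userspace stand-in, not kernel code: `dt_probe_ultravisor()` stands in for the expensive `of_find_compatible_node(NULL, NULL, "ibm,ultravisor")` walk, and the init/hot-path split is the point being illustrated.

```c
#include <assert.h>
#include <stdbool.h>

static bool secure_platform;	/* cached result, set once at boot */
static int probe_calls;		/* counts how often the slow path runs */

/* Stand-in for the device-tree walk: expensive, so call it only once. */
static bool dt_probe_ultravisor(void)
{
	probe_calls++;
	return true;		/* pretend the "ibm,ultravisor" node exists */
}

/* Called once, early in boot, before any virtio DMA decisions. */
static void platform_dma_init(void)
{
	secure_platform = dt_probe_ultravisor();
}

/* Hot path (vring_use_dma_api territory): now just a flag read. */
static bool platform_forces_virtio_dma(void)
{
	return secure_platform;
}
```

However many times the hot-path predicate runs, the device-tree walk happens exactly once.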
Re: [PATCH 9/8] powerpc/pkeys: Drop private VM_PKEY definitions
Agree. Was going to send that the moment the other patches landed upstream. Glad I dont have to do it :-) Reviewed-by: Ram Pai <linux...@us.ibm.com> On Thu, May 10, 2018 at 11:54:22PM +1000, Michael Ellerman wrote: > Now that we've updated the generic headers to support 5 PKEY bits for > powerpc we don't need our own #defines in arch code. > > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> > --- > arch/powerpc/include/asm/pkeys.h | 15 --- > 1 file changed, 15 deletions(-) > > One additional patch to finish cleaning things up. > > I've added this to my branch. > > cheers > > diff --git a/arch/powerpc/include/asm/pkeys.h > b/arch/powerpc/include/asm/pkeys.h > index 18ef59a9886d..5ba80cffb505 100644 > --- a/arch/powerpc/include/asm/pkeys.h > +++ b/arch/powerpc/include/asm/pkeys.h > @@ -15,21 +15,6 @@ DECLARE_STATIC_KEY_TRUE(pkey_disabled); > extern int pkeys_total; /* total pkeys as per device tree */ > extern u32 initial_allocation_mask; /* bits set for reserved keys */ > > -/* > - * Define these here temporarily so we're not dependent on patching > linux/mm.h. > - * Once it's updated we can drop these. > - */ > -#ifndef VM_PKEY_BIT0 > -# define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0 > -# define VM_PKEY_BIT0VM_HIGH_ARCH_0 > -# define VM_PKEY_BIT1VM_HIGH_ARCH_1 > -# define VM_PKEY_BIT2VM_HIGH_ARCH_2 > -# define VM_PKEY_BIT3VM_HIGH_ARCH_3 > -# define VM_PKEY_BIT4VM_HIGH_ARCH_4 > -#elif !defined(VM_PKEY_BIT4) > -# define VM_PKEY_BIT4VM_HIGH_ARCH_4 > -#endif > - > #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \ > VM_PKEY_BIT3 | VM_PKEY_BIT4) > > -- > 2.14.1 -- Ram Pai
Re: [PATCH 4/8] mm/pkeys, powerpc, x86: Provide an empty vma_pkey() in linux/pkeys.h
On Wed, May 09, 2018 at 12:59:44AM +1000, Michael Ellerman wrote: > Consolidate the pkey handling by providing a common empty definition > of vma_pkey() in pkeys.h when CONFIG_ARCH_HAS_PKEYS=n. > > This also removes another entanglement of pkeys.h and > asm/mmu_context.h. > Reviewed-by: Ram Pai <linux...@us.ibm.com> > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> > --- > arch/powerpc/include/asm/mmu_context.h | 5 - > arch/x86/include/asm/mmu_context.h | 5 - > include/linux/pkeys.h | 5 + > 3 files changed, 5 insertions(+), 10 deletions(-) RP
Re: [PATCH 2/8] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
On Wed, May 09, 2018 at 12:59:42AM +1000, Michael Ellerman wrote: > From: Ram Pai <linux...@us.ibm.com> > > Currently only 4bits are allocated in the vma flags to hold 16 > keys. This is sufficient for x86. PowerPC supports 32 keys, > which needs 5bits. This patch allocates an additional bit. > > Reviewed-by: Ingo Molnar <mi...@kernel.org> > Acked-by: Balbir Singh <bsinghar...@gmail.com> > Signed-off-by: Ram Pai <linux...@us.ibm.com> > [mpe: Fold in #if VM_PKEY_BIT4 as noticed by Dave Hansen] > Signed-off-by: Michael Ellerman <m...@ellerman.id.au> > --- > fs/proc/task_mmu.c | 3 +++ > include/linux/mm.h | 3 ++- > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 541392a62608..c2163606e6fb 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -679,6 +679,9 @@ static void show_smap_vma_flags(struct seq_file *m, > struct vm_area_struct *vma) > [ilog2(VM_PKEY_BIT1)] = "", > [ilog2(VM_PKEY_BIT2)] = "", > [ilog2(VM_PKEY_BIT3)] = "", > +#if VM_PKEY_BIT4 > + [ilog2(VM_PKEY_BIT4)] = "", > +#endif > #endif /* CONFIG_ARCH_HAS_PKEYS */ > }; > size_t i; > diff --git a/include/linux/mm.h b/include/linux/mm.h > index c6a6f2492c1b..abfd758ff83a 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -231,9 +231,10 @@ extern unsigned int kobjsize(const void *objp); > #ifdef CONFIG_ARCH_HAS_PKEYS > # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0 > # define VM_PKEY_BIT0VM_HIGH_ARCH_0 /* A protection key is a 4-bit > value */ > -# define VM_PKEY_BIT1VM_HIGH_ARCH_1 > +# define VM_PKEY_BIT1VM_HIGH_ARCH_1 /* on x86 and 5-bit value on > ppc64 */ > # define VM_PKEY_BIT2VM_HIGH_ARCH_2 > # define VM_PKEY_BIT3VM_HIGH_ARCH_3 > +# define VM_PKEY_BIT4VM_HIGH_ARCH_4 > #endif /* CONFIG_ARCH_HAS_PKEYS */ this has to be: +#if defined(CONFIG_PPC) +# define VM_PKEY_BIT4 VM_HIGH_ARCH_4 +#else +# define VM_PKEY_BIT4 0 +#endif
Re: [PATCH v13 3/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
On Fri, May 04, 2018 at 03:57:33PM -0700, Dave Hansen wrote: > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index 0c9e392..3c7 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -679,6 +679,7 @@ static void show_smap_vma_flags(struct seq_file *m, > > struct vm_area_struct *vma) > > [ilog2(VM_PKEY_BIT1)] = "", > > [ilog2(VM_PKEY_BIT2)] = "", > > [ilog2(VM_PKEY_BIT3)] = "", > > + [ilog2(VM_PKEY_BIT4)] = "", > > #endif /* CONFIG_ARCH_HAS_PKEYS */ > ... > > +#if defined(CONFIG_PPC) > > +# define VM_PKEY_BIT4 VM_HIGH_ARCH_4 > > +#else > > +# define VM_PKEY_BIT4 0 > > +#endif > > #endif /* CONFIG_ARCH_HAS_PKEYS */ > > That new line boils down to: > > [ilog2(0)] = "", > > on x86. It wasn't *obvious* to me that it is OK to do that. The other > possibly undefined bits (VM_SOFTDIRTY for instance) #ifdef themselves > out of this array. > > I would just be a wee bit worried that this would overwrite the 0 entry > ("??") with "". Yes it would :-( and could potentially break anything that depends on 0th entry being "??" Is the following fix acceptable? #if VM_PKEY_BIT4 [ilog2(VM_PKEY_BIT4)] = "", #endif -- Ram Pai
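The clobbering worry raised above can be reproduced outside the kernel with plain C designated initializers. `ILOG2` below is a tiny stand-in for the kernel's `ilog2()`, valid only for the handful of pkey bits; note that it yields 0 for an argument of 0, which is exactly what `VM_PKEY_BIT4 == 0` would feed it on x86.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for ilog2() over the small pkey-bit range; ILOG2(0) == 0. */
#define ILOG2(x) ((x) & 16 ? 4 : (x) & 8 ? 3 : (x) & 4 ? 2 : (x) & 2 ? 1 : 0)

static const char *smaps_flag_name(unsigned idx)
{
	static const char *const names[6] = {
		[0]        = "??",   /* fallback entry for unknown flag bits */
		[ILOG2(8)] = "pk3",  /* a real flag at bit 3, like VM_PKEY_BIT3 */
		[ILOG2(0)] = "",     /* VM_PKEY_BIT4 == 0: duplicate index 0 --
				      * the later initializer silently wins */
	};
	return names[idx];
}
```

C guarantees that when two designated initializers name the same index, the last one takes effect, so the `"??"` entry is silently replaced, which is the breakage the proposed `#if VM_PKEY_BIT4` guard avoids.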
[PATCH v11 3/3] mm, x86, powerpc: display pkey in smaps only if arch supports pkeys
Currently the architecture specific code is expected to display the protection keys in smap for a given vma. This can lead to redundant code and possibly to divergent formats in which the key gets displayed. This patch changes the implementation. It displays the pkey only if the architecture support pkeys, i.e arch_pkeys_enabled() returns true. This patch provides x86 implementation for arch_pkeys_enabled(). x86 arch_show_smap() function is not needed anymore. Deleting it. cc: Michael Ellermen <m...@ellerman.id.au> cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> cc: Andrew Morton <a...@linux-foundation.org> Reviewed-by: Dave Hansen <dave.han...@intel.com> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com> (fixed compilation errors for x86 configs) Acked-by: Michal Hocko <mho...@suse.com> Reviewed-by: Ingo Molnar <mi...@kernel.org> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- arch/powerpc/include/asm/mmu_context.h |5 - arch/x86/include/asm/mmu_context.h |5 - arch/x86/include/asm/pkeys.h |1 + arch/x86/kernel/fpu/xstate.c |5 + arch/x86/kernel/setup.c|8 fs/proc/task_mmu.c | 11 ++- include/linux/pkeys.h |7 ++- 7 files changed, 18 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 1835ca1..896efa5 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -250,11 +250,6 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, #define thread_pkey_regs_restore(new_thread, old_thread) #define thread_pkey_regs_init(thread) -static inline int vma_pkey(struct vm_area_struct *vma) -{ - return 0; -} - static inline u64 pte_to_hpte_pkey_bits(u64 pteflags) { return 0x0UL; diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 57e3785..3d748bd 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -296,11 +296,6 @@ static inline int vma_pkey(struct 
vm_area_struct *vma) return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT; } -#else -static inline int vma_pkey(struct vm_area_struct *vma) -{ - return 0; -} #endif /* diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index a0ba1ff..f6c287b 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -6,6 +6,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val); +extern bool arch_pkeys_enabled(void); /* * Try to dedicate one of the protection keys to be used as an diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 87a57b7..4f566e9 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -945,6 +945,11 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, return 0; } + +bool arch_pkeys_enabled(void) +{ + return boot_cpu_has(X86_FEATURE_OSPKE); +} #endif /* ! CONFIG_ARCH_HAS_PKEYS */ /* diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 6285697..960dbab 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1306,11 +1306,3 @@ static int __init register_kernel_offset_dumper(void) return 0; } __initcall(register_kernel_offset_dumper); - -void arch_show_smap(struct seq_file *m, struct vm_area_struct *vma) -{ - if (!boot_cpu_has(X86_FEATURE_OSPKE)) - return; - - seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); -} diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 3c7..9ce0097 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -18,10 +18,12 @@ #include #include #include +#include #include #include #include +#include #include "internal.h" #define SEQ_PUT_DEC(str, val) \ @@ -728,12 +730,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, } #endif /* HUGETLB_PAGE */ -void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma) -{ -} - #define SEQ_PUT_DEC(str, val) \ seq_put_decimal_ull_width(m, str, (val) >> 10, 8) + static int 
show_smap(struct seq_file *m, void *v, int is_pid) { struct proc_maps_private *priv = m->private; @@ -836,9 +835,11 @@ static int show_smap(struct seq_file *m, void *v, int is_pid) seq_puts(m, " kB\n"); } if (!rollup_mode) { - arch_show_smap(m, vma); + if (arch_pkeys_enabled()) + seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); show_smap_vma_flags(m, vma); } + m_cache_vma(m, vma); return ret; } diff --git a/include
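The shape of the refactor in this patch, a generic capability predicate replacing the per-arch `__weak arch_show_smap()` hook, can be sketched in isolation. The helpers below are illustrative stand-ins (not the kernel's types or seq_file API): arch code only answers yes/no, and the arch-neutral code owns the output format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for arch_pkeys_enabled(), e.g. the X86_FEATURE_OSPKE test. */
static int pkeys_enabled_demo(void)
{
	return 1;
}

/* Stand-in for vma_pkey(vma): extract the key from a vma's flags. */
static unsigned vma_pkey_demo(void)
{
	return 3;
}

/* Generic code decides the format; arch code is only consulted for the
 * capability.  This mirrors the show_smap() hunk above. */
static void show_pkey(char *buf, size_t n)
{
	buf[0] = '\0';
	if (pkeys_enabled_demo())
		snprintf(buf, n, "ProtectionKey: %8u\n", vma_pkey_demo());
}
```

With this split, every architecture that enables pkeys gets an identically formatted `ProtectionKey:` line for free.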
[PATCH v13 0/3] mm, x86, powerpc: Enhancements to Memory Protection Keys.
This patch series provides arch-neutral enhancements to enable memory-keys on new architectures, and the corresponding changes in x86 and powerpc specific code to support that. a) Provides the ability to support up to 32 keys. PowerPC can handle 32 keys and hence needs this. b) Arch-neutral code, and not the arch-specific code, determines the format of the string that displays the key for each vma in smaps. History: --- version 14: (1) made VM_PKEY_BIT4 unusable on x86, #defined it to 0 -- comment by Dave Hansen (2) for some reason this patch series continues to break one build or another. The last series passed everything but created a merge conflict followed by build failure for Michael Ellermen. :( version v13: (1) fixed a git bisect error. :( version v12: (1) fixed compilation errors seen with various x86 configs. version v11: (1) code that displays key in smaps is not any more defined under CONFIG_ARCH_HAS_PKEYS. - Comment by Eric W. Biederman and Michal Hocko (2) merged two patches that implemented (1). - comment by Michal Hocko version prior to v11: (1) used one additional bit from VM_HIGH_ARCH_* to support 32 keys. - Suggestion by Dave Hansen. (2) powerpc specific changes to support memory keys. Ram Pai (3): mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled mm, powerpc, x86: introduce an additional vma bit for powerpc pkey mm, x86, powerpc: display pkey in smaps only if arch supports pkeys arch/powerpc/include/asm/mmu_context.h |5 - arch/powerpc/include/asm/pkeys.h |2 ++ arch/x86/include/asm/mmu_context.h |5 - arch/x86/include/asm/pkeys.h |1 + arch/x86/kernel/fpu/xstate.c |5 + arch/x86/kernel/setup.c|8 fs/proc/task_mmu.c | 16 +--- include/linux/mm.h | 15 +++ include/linux/pkeys.h |7 ++- 9 files changed, 34 insertions(+), 30 deletions(-)
[PATCH v13 3/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
Only 4bits are allocated in the vma flags to hold 16 keys. This is sufficient on x86. PowerPC supports 32 keys, which needs 5bits. Allocate an additional bit. cc: Dave Hansen <dave.han...@intel.com> cc: Michael Ellermen <m...@ellerman.id.au> cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> cc: Andrew Morton <a...@linux-foundation.org> Reviewed-by: Ingo Molnar <mi...@kernel.org> Acked-by: Balbir Singh <bsinghar...@gmail.com> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- fs/proc/task_mmu.c |1 + include/linux/mm.h |8 +++- 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 0c9e392..3c7 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -679,6 +679,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_PKEY_BIT1)] = "", [ilog2(VM_PKEY_BIT2)] = "", [ilog2(VM_PKEY_BIT3)] = "", + [ilog2(VM_PKEY_BIT4)] = "", #endif /* CONFIG_ARCH_HAS_PKEYS */ }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index c6a6f24..cca67d1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -230,10 +230,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *, #ifdef CONFIG_ARCH_HAS_PKEYS # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0 -# define VM_PKEY_BIT0 VM_HIGH_ARCH_0 /* A protection key is a 4-bit value */ +/* Protection key is a 4-bit value on x86 and 5-bit value on ppc64 */ +# define VM_PKEY_BIT0 VM_HIGH_ARCH_0 # define VM_PKEY_BIT1 VM_HIGH_ARCH_1 # define VM_PKEY_BIT2 VM_HIGH_ARCH_2 # define VM_PKEY_BIT3 VM_HIGH_ARCH_3 +#if defined(CONFIG_PPC) +# define VM_PKEY_BIT4 VM_HIGH_ARCH_4 +#else +# define VM_PKEY_BIT4 0 +#endif #endif /* CONFIG_ARCH_HAS_PKEYS */ #if defined(CONFIG_X86) -- 1.7.1
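A toy model of the vma-flag encoding this patch extends is shown below. The shift and bit positions are illustrative stand-ins, not the kernel's actual `VM_HIGH_ARCH_*` values, and a 64-bit `unsigned long` (as on ppc64/x86-64) is assumed; the point is only that a fifth bit doubles the key space from 16 to 32.

```c
#include <assert.h>

/* Illustrative positions only -- not the kernel's real VM_HIGH_ARCH_* bits. */
#define PKEY_SHIFT 32
#define PKEY_BIT0 (1UL << (PKEY_SHIFT + 0))
#define PKEY_BIT1 (1UL << (PKEY_SHIFT + 1))
#define PKEY_BIT2 (1UL << (PKEY_SHIFT + 2))
#define PKEY_BIT3 (1UL << (PKEY_SHIFT + 3))
#define PKEY_BIT4 (1UL << (PKEY_SHIFT + 4))	/* the ppc64-only fifth bit */
#define PKEY_MASK (PKEY_BIT0 | PKEY_BIT1 | PKEY_BIT2 | PKEY_BIT3 | PKEY_BIT4)

/* Recover the protection key from a vma's flag word, as vma_pkey() does. */
static unsigned vma_pkey(unsigned long vm_flags)
{
	return (unsigned)((vm_flags & PKEY_MASK) >> PKEY_SHIFT);
}
```

With four bits the maximum key is 15; including `PKEY_BIT4` in the mask raises it to 31, which is why powerpc needs the extra vma bit.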
[PATCH v13 1/3] mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
is enabled. Powerpc also needs these bits. Hence let's define the
VM_PKEY_BITx bits for any architecture that enables
CONFIG_ARCH_HAS_PKEYS.

cc: Michael Ellerman <m...@ellerman.id.au>
cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
cc: Andrew Morton <a...@linux-foundation.org>
Reviewed-by: Dave Hansen <dave.han...@intel.com>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h |    2 ++
 fs/proc/task_mmu.c               |    4 ++--
 include/linux/mm.h               |    9 +
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 3a9b82b..425b181 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -26,6 +26,8 @@
 # define VM_PKEY_BIT2	VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3	VM_HIGH_ARCH_3
 # define VM_PKEY_BIT4	VM_HIGH_ARCH_4
+#elif !defined(VM_PKEY_BIT4)
+# define VM_PKEY_BIT4	VM_HIGH_ARCH_4
 #endif

 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 65ae546..0c9e392 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -673,13 +673,13 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 		[ilog2(VM_MERGEABLE)]	= "mg",
 		[ilog2(VM_UFFD_MISSING)]= "um",
 		[ilog2(VM_UFFD_WP)]	= "uw",
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+#ifdef CONFIG_ARCH_HAS_PKEYS
 		/* These come out via ProtectionKey: */
 		[ilog2(VM_PKEY_BIT0)]	= "",
 		[ilog2(VM_PKEY_BIT1)]	= "",
 		[ilog2(VM_PKEY_BIT2)]	= "",
 		[ilog2(VM_PKEY_BIT3)]	= "",
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
 	};
 	size_t i;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ac1f06..c6a6f24 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -228,15 +228,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
 #define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
 #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */

-#if defined(CONFIG_X86)
-# define VM_PAT	VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
-#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
+#ifdef CONFIG_ARCH_HAS_PKEYS
 # define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
 # define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
 # define VM_PKEY_BIT1	VM_HIGH_ARCH_1
 # define VM_PKEY_BIT2	VM_HIGH_ARCH_2
 # define VM_PKEY_BIT3	VM_HIGH_ARCH_3
-#endif
+#endif /* CONFIG_ARCH_HAS_PKEYS */
+
+#if defined(CONFIG_X86)
+# define VM_PAT	VM_ARCH_1	/* PAT reserves whole VMA at once (x86) */
 #elif defined(CONFIG_PPC)
 # define VM_SAO	VM_ARCH_1	/* Strong Access Ordering (powerpc) */
 #elif defined(CONFIG_PARISC)
--
1.7.1
Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU
On Wed, Apr 18, 2018 at 07:20:10PM +0300, Michael S. Tsirkin wrote: > On Wed, Apr 18, 2018 at 08:47:10AM +0530, Anshuman Khandual wrote: > > On 04/15/2018 05:41 PM, Christoph Hellwig wrote: > > > On Fri, Apr 06, 2018 at 06:37:18PM +1000, Benjamin Herrenschmidt wrote: > > >>>> implemented as DMA API which the virtio core understands. There is no > > >>>> need for an IOMMU to be involved for the device representation in this > > >>>> case IMHO. > > >>> > > >>> This whole virtio translation issue is a mess. I think we need to > > >>> switch it to the dma API, and then quirk the legacy case to always > > >>> use the direct mapping inside the dma API. > > >> > > >> Fine with using a dma API always on the Linux side, but we do want to > > >> special case virtio still at the arch and qemu side to have a "direct > > >> mapping" mode. Not sure how (special flags on PCI devices) to avoid > > >> actually going through an emulated IOMMU on the qemu side, because that > > >> slows things down, esp. with vhost. > > >> > > >> IE, we can't I think just treat it the same as a physical device. > > > > > > We should have treated it like a physical device from the start, but > > > that device has unfortunately sailed. > > > > > > But yes, we'll need a per-device quirk that says 'don't attach an > > > iommu'. > > > > How about doing it per platform basis as suggested in this RFC through > > an arch specific callback. Because all the virtio devices in the given > > platform would require and exercise this option (to avail bounce buffer > > mechanism for secure guests as an example). So the flag basically is a > > platform specific one not a device specific one. > > That's not the case. A single platform can have a mix of virtio and > non-virtio devices. Same applies even within virtio, e.g. the balloon > device always bypasses an iommu. Further, QEMU supports out of process > devices some of which might bypass the IOMMU. 
Given that each virtio device has to behave differently depending on
(a) what it does (balloon, block, net, etc.), (b) what platform it is
on (pseries, x86, ...), and (c) what environment it is in (secure,
insecure, ...), I think we should let the virtio device decide what it
wants, instead of forcing it to NOT use dma_ops when
VIRTIO_F_IOMMU_PLATFORM is NOT enabled.

Currently the generic virtio code assumes that a device must NOT use
DMA operations if the hypervisor has NOT enabled
VIRTIO_F_IOMMU_PLATFORM. This assumption is baked into
vring_use_dma_api(), though there is a special exception for
xen_domain(). The assumption is restricting us from using the dma_ops
abstraction for virtio devices on a secure VM. BTW:
VIRTIO_F_IOMMU_PLATFORM may or may not be set on this platform.

On our secure VM, virtio devices by default do not share pages with the
hypervisor; in other words, the hypervisor cannot access the secure
VM's pages. The secure VM, with the help of the hardware, enables some
pages to be shared with the hypervisor, and then uses those pages to
bounce virtio data to and from the hypervisor. One elegant way to
implement this functionality is to abstract it under our special
dma_ops and wire it to the virtio devices. However, the restriction
imposed by the generic virtio code constrains us from doing so.

If we can enrich vring_use_dma_api() to take multiple factors into
consideration, not just VIRTIO_F_IOMMU_PLATFORM, preferably by
consulting an arch-dependent function, we could seamlessly integrate
into the existing virtio infrastructure.

RP

>
> -- 
> MST

-- 
Ram Pai
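The decision being argued about can be sketched as a userspace model. Everything here is hypothetical naming, not kernel code: `arch_wants_dma` stands in for the arch-dependent hook the mail proposes, and the two functions contrast today's VIRTIO_F_IOMMU_PLATFORM-only test with the enriched one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; in the kernel these would come from the
 * negotiated feature bits and from platform state. */
struct vdev_model {
	bool iommu_platform;   /* VIRTIO_F_IOMMU_PLATFORM negotiated */
	bool xen_domain;       /* the existing special case */
	bool arch_wants_dma;   /* hypothetical arch hook, e.g. a secure VM
	                        * that must bounce through shared pages */
};

/* Today's shape: use dma_ops only if the hypervisor set the feature
 * bit (or we are a Xen domain). */
static bool vring_use_dma_api_today(const struct vdev_model *v)
{
	return v->iommu_platform || v->xen_domain;
}

/* Enriched shape sketched in the mail: also consult the platform. */
static bool vring_use_dma_api_enriched(const struct vdev_model *v)
{
	return v->iommu_platform || v->xen_domain || v->arch_wants_dma;
}
```

The secure-VM case is exactly the row where the two disagree: the feature bit is off, yet the platform still needs its bounce-buffer dma_ops.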
Re: [PATCH 0/9] [v3] x86, pkeys: two protection keys bug fixes
On Mon, Apr 30, 2018 at 08:30:43AM -0700, Dave Hansen wrote:
> On 04/28/2018 12:05 AM, Ingo Molnar wrote:
> > In the above kernel that was missing the PROT_EXEC fix I was repeatedly
> > running the 64-bit and 32-bit testcases as non-root and as root as well,
> > until I got a hang in the middle of a 32-bit test running as root:
> >
> > test 7 PASSED (iteration 19)
> > test 8 PASSED (iteration 19)
> > test 9 PASSED (iteration 19)
> >
> > < test just hangs here >
>
> For the hang, there is a known issue with the use of printf() in the
> signal handler and a resulting deadlock. I *thought* there was a patch
> merged to fix this from Ram Pai or one of the other IBM folks.

Yes, there is a patch. Unfortunately, that patch assumes the selftest
has been moved into the selftests/vm directory.

One option is -- I merge your changes into my selftest patchset and
send the entire series for upstream merge. Or you can manually massage
in the specific fix. The patch is "selftests/vm: Fix deadlock in
protection_keys.c":
https://patchwork.ozlabs.org/patch/864394/

Let me know,

-- 
Ram Pai
Re: [PATCH 4/9] x86, pkeys: override pkey when moving away from PROT_EXEC
On Thu, Apr 26, 2018 at 10:57:31AM -0700, Dave Hansen wrote:
> On 04/06/2018 06:09 PM, Ram Pai wrote:
> > Well :). My point is: add this code and delete the other
> > code that you add later in that function.
>
> I don't think I'm understanding what your suggestion was. I looked at
> the code and I honestly do not think I can remove any of it.
>
> For the plain (non-explicit pkey_mprotect()) case, there are exactly
> four paths through __arch_override_mprotect_pkey(), resulting in three
> different results.
>
> 1. New prot==PROT_EXEC, no pkey-exec support  -> do not override
> 2. New prot!=PROT_EXEC, old VMA not PROT_EXEC -> do not override
> 3. New prot==PROT_EXEC, w/ pkey-exec support  -> override to exec pkey
> 4. New prot!=PROT_EXEC, old VMA is PROT_EXEC  -> override to default
>
> I don't see any redundancy there, or any code that we can eliminate or
> simplify. It was simpler before, but that's where the bug was.

Your code is fine. But then the following code accomplishes the same
outcome, arguably with a one-line change. It's not a big deal; just
trying to clarify my comment.

int __arch_override_mprotect_pkey(struct vm_area_struct *vma,
		int prot, int pkey)
{
	/*
	 * Is this an mprotect_pkey() call? If so, never
	 * override the value that came from the user.
	 */
	if (pkey != -1)
		return pkey;

	/*
	 * Look for a protection-key-driven execute-only mapping
	 * which is now being given permissions that are not
	 * execute-only. Move it back to the default pkey.
	 */
	if (vma_is_pkey_exec_only(vma) && (prot != PROT_EXEC))	/* <-- the change */
		return ARCH_DEFAULT_PKEY;

	/*
	 * The mapping is execute-only. Go try to get the
	 * execute-only protection key. If we fail to do that,
	 * fall through as if we do not have execute-only
	 * support.
	 */
	if (prot == PROT_EXEC) {
		pkey = execute_only_pkey(vma->vm_mm);
		if (pkey > 0)
			return pkey;
	}

	/*
	 * This is a vanilla, non-pkey mprotect (or we failed to
	 * setup execute-only), inherit the pkey from the VMA we
	 * are working on.
	 */
	return vma_pkey(vma);
}

-- 
Ram Pai
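The four paths Dave enumerates can be checked mechanically with a small userspace model. The helpers and the EXEC_PKEY value below are stand-ins for kernel state (vma flags, mm's execute-only key), not the real implementation; the function follows the shape of Ram's one-line-change variant.

```c
#include <assert.h>
#include <stdbool.h>

#define PROT_READ 0x1
#define PROT_EXEC 0x4

#define ARCH_DEFAULT_PKEY 0
#define EXEC_PKEY         15	/* hypothetical execute-only key */

/*
 * Model of __arch_override_mprotect_pkey() for the plain (pkey == -1)
 * case; vma_exec_only and exec_pkey_supported stand in for kernel state.
 */
static int override_pkey(bool vma_exec_only, int prot,
			 bool exec_pkey_supported, int vma_pkey)
{
	/* Path 4: exec-only VMA moving to non-exec perms -> default key */
	if (vma_exec_only && prot != PROT_EXEC)
		return ARCH_DEFAULT_PKEY;
	/* Path 3: new mapping is execute-only and we can support it */
	if (prot == PROT_EXEC && exec_pkey_supported)
		return EXEC_PKEY;
	/* Paths 1 and 2: do not override, keep the VMA's key */
	return vma_pkey;
}
```

Each of the four enumerated paths maps onto one assertion, including the PROT_NONE-style case the original `prot & (PROT_READ|PROT_WRITE)` check missed.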
Re: [PATCH v2] resource: Fix integer overflow at reallocation
On Wed, Apr 11, 2018 at 08:16:33AM +0200, Takashi Iwai wrote: > On Wed, 11 Apr 2018 02:37:44 +0200, > Ram Pai wrote: > > > > On Tue, Apr 10, 2018 at 01:42:39PM -0700, Andrew Morton wrote: > > > On Tue, 10 Apr 2018 06:54:11 +0200 Takashi Iwai <ti...@suse.de> wrote: > > > > > > > On Tue, 10 Apr 2018 02:23:26 +0200, > > > > Andrew Morton wrote: > > > > > > > > > > On Sun, 8 Apr 2018 09:20:26 +0200 Takashi Iwai <ti...@suse.de> wrote: > > > > > > > > > > > We've got a bug report indicating a kernel panic at booting on an > > > > > > x86-32 system, and it turned out to be the invalid resource assigned > > > > > > after PCI resource reallocation. __find_resource() first aligns the > > > > > > resource start address and resets the end address with start+size-1 > > > > > > accordingly, then checks whether it's contained. Here the end > > > > > > address > > > > > > may overflow the integer, although resource_contains() still returns > > > > > > true because the function validates only start and end address. So > > > > > > this ends up with returning an invalid resource (start > end). > > > > > > > > > > > > There was already an attempt to cover such a problem in the commit > > > > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), > > > > > > but > > > > > > this case is an overseen one. > > > > > > > > > > > > This patch adds the validity check in resource_contains() to see > > > > > > whether the given resource has a valid range for avoiding the > > > > > > integer > > > > > > overflow problem. > > > > > > > > > > > > ... 
> > > > > > --- a/include/linux/ioport.h
> > > > > > +++ b/include/linux/ioport.h
> > > > > > @@ -212,6 +212,9 @@ static inline bool resource_contains(struct resource *r1, struct resource *r2)
> > > > > > 		return false;
> > > > > > 	if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET)
> > > > > > 		return false;
> > > > > > +	/* sanity check whether it's a valid resource range */
> > > > > > +	if (r2->end < r2->start)
> > > > > > +		return false;
> > > > > > 	return r1->start <= r2->start && r1->end >= r2->end;
> > > > > > }
> > > > >
> > > > > This doesn't look like the correct place to handle this? Clearly .end
> > > > > < .start is an invalid state for a resource and we should never have
> > > > > constructed such a thing in the first place? So adding a check at the
> > > > > place where this resource was initially created seems to be the correct
> > > > > fix?
> > > >
> > > > Yes, that was also my first thought and actually the v1 patch was like
> > > > that.
> > >
> > > Yes, I do prefer.
> > >
> > > > The v2 one was by Ram's suggestion so that we can cover
> > > > potential bugs by all other callers as well.
> > >
> > > That could be done as a separate thing?
> >
> > The first approach will fix overflows in just that particular case. The
> > second approach will catch and error-out overflows anywhere. There is a
> > short-term downside to the second approach; it might cause a slew of
> > error reports but will eventually help clean up all bad behavior.
>
> For that purpose, maybe we should do it in two folds: at first fix this
> specific issue in __find_resource(), then put the sanity check in
> resource_contains() in addition but with WARN_ON() so that we can
> catch more obviously.

Yes, WARN_ON() is a better solution. Do the v1 way for this bug and
replace the check in resource_contains() with a WARN_ON() in a separate
patch?

RP
Re: [PATCH v2] resource: Fix integer overflow at reallocation
On Tue, Apr 10, 2018 at 01:42:39PM -0700, Andrew Morton wrote: > On Tue, 10 Apr 2018 06:54:11 +0200 Takashi Iwaiwrote: > > > On Tue, 10 Apr 2018 02:23:26 +0200, > > Andrew Morton wrote: > > > > > > On Sun, 8 Apr 2018 09:20:26 +0200 Takashi Iwai wrote: > > > > > > > We've got a bug report indicating a kernel panic at booting on an > > > > x86-32 system, and it turned out to be the invalid resource assigned > > > > after PCI resource reallocation. __find_resource() first aligns the > > > > resource start address and resets the end address with start+size-1 > > > > accordingly, then checks whether it's contained. Here the end address > > > > may overflow the integer, although resource_contains() still returns > > > > true because the function validates only start and end address. So > > > > this ends up with returning an invalid resource (start > end). > > > > > > > > There was already an attempt to cover such a problem in the commit > > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but > > > > this case is an overseen one. > > > > > > > > This patch adds the validity check in resource_contains() to see > > > > whether the given resource has a valid range for avoiding the integer > > > > overflow problem. > > > > > > > > ... > > > > > > > > --- a/include/linux/ioport.h > > > > +++ b/include/linux/ioport.h > > > > @@ -212,6 +212,9 @@ static inline bool resource_contains(struct > > > > resource *r1, struct resource *r2) > > > > return false; > > > > if (r1->flags & IORESOURCE_UNSET || r2->flags & > > > > IORESOURCE_UNSET) > > > > return false; > > > > + /* sanity check whether it's a valid resource range */ > > > > + if (r2->end < r2->start) > > > > + return false; > > > > return r1->start <= r2->start && r1->end >= r2->end; > > > > } > > > > > > This doesn't look like the correct place to handle this? Clearly .end > > > < .start is an invalid state for a resource and we should never have > > > constructed such a thing in the first place? 
So adding a check at the
> > > place where this resource was initially created seems to be the correct
> > > fix?
> >
> > Yes, that was also my first thought and actually the v1 patch was like
> > that.
>
> Yes, I do prefer.
>
> > The v2 one was by Ram's suggestion so that we can cover
> > potential bugs by all other callers as well.
>
> That could be done as a separate thing?

The first approach will fix overflows in just that particular case. The
second approach will catch and error-out overflows anywhere. There is a
short-term downside to the second approach; it might cause a slew of
error reports but will eventually help clean up all bad behavior.

RP
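Both proposals in the thread reduce to catching the same state: `__find_resource()` computes `end = start + size - 1`, and if that addition wraps, the resulting range has `start > end`. A userspace sketch of the v2 shape (types simplified; this is not the kernel's struct resource or the actual patch):

```c
#include <assert.h>

typedef unsigned long resource_size_t;

struct res_model {
	resource_size_t start;
	resource_size_t end;
};

/* start + size - 1 wrapped iff end < start; both fixes discussed in
 * the thread boil down to catching exactly this state. */
static int valid_range(const struct res_model *r)
{
	return r->end >= r->start;
}

/* The v2 shape: containment test with the extra sanity check, so an
 * overflowed candidate range can never be reported as "contained". */
static int res_contains(const struct res_model *r1,
			const struct res_model *r2)
{
	if (!valid_range(r2))
		return 0;
	return r1->start <= r2->start && r1->end >= r2->end;
}
```

Without the `valid_range()` check, a wrapped candidate with a tiny `end` would pass the plain `start <= start && end >= end` comparison against a large parent, which is precisely the invalid-resource assignment the bug report describes.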
Re: [PATCH 4/9] x86, pkeys: override pkey when moving away from PROT_EXEC
On Fri, Apr 06, 2018 at 05:47:29PM -0700, Dave Hansen wrote:
> On 04/06/2018 05:09 PM, Ram Pai wrote:
> >> -	/*
> >> -	 * Look for a protection-key-driven execute-only mapping
> >> -	 * which is now being given permissions that are not
> >> -	 * execute-only. Move it back to the default pkey.
> >> -	 */
> >> -	if (vma_is_pkey_exec_only(vma) &&
> >> -	    (prot & (PROT_READ|PROT_WRITE))) {
> >> -		return 0;
> >> -	}
> >> +
> > Dave,
> > this can be simply:
> >
> > 	if (vma_is_pkey_exec_only(vma) && (prot != PROT_EXEC))
> > 		return ARCH_DEFAULT_PKEY;
>
> Yes, but we're removing that code entirely. :)

Well :). My point is: add this code and delete the other code that you
add later in that function.

RP

-- 
Ram Pai
Re: [PATCH 4/9] x86, pkeys: override pkey when moving away from PROT_EXEC
On Mon, Mar 26, 2018 at 10:27:27AM -0700, Dave Hansen wrote: > > From: Dave Hansen <dave.han...@linux.intel.com> > > I got a bug report that the following code (roughly) was > causing a SIGSEGV: > > mprotect(ptr, size, PROT_EXEC); > mprotect(ptr, size, PROT_NONE); > mprotect(ptr, size, PROT_READ); > *ptr = 100; > > The problem is hit when the mprotect(PROT_EXEC) > is implicitly assigned a protection key to the VMA, and made > that key ACCESS_DENY|WRITE_DENY. The PROT_NONE mprotect() > failed to remove the protection key, and the PROT_NONE-> > PROT_READ left the PTE usable, but the pkey still in place > and left the memory inaccessible. > > To fix this, we ensure that we always "override" the pkee > at mprotect() if the VMA does not have execute-only > permissions, but the VMA has the execute-only pkey. > > We had a check for PROT_READ/WRITE, but it did not work > for PROT_NONE. This entirely removes the PROT_* checks, > which ensures that PROT_NONE now works. > > Reported-by: Shakeel Butt <shake...@google.com> > > Signed-off-by: Dave Hansen <dave.han...@linux.intel.com> > Fixes: 62b5f7d013f ("mm/core, x86/mm/pkeys: Add execute-only protection keys > support") > Cc: sta...@kernel.org > Cc: Ram Pai <linux...@us.ibm.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Dave Hansen <dave.han...@intel.com> > Cc: Michael Ellermen <m...@ellerman.id.au> > Cc: Ingo Molnar <mi...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Shuah Khan <sh...@kernel.org> > --- > > b/arch/x86/include/asm/pkeys.h | 12 +++- > b/arch/x86/mm/pkeys.c | 19 ++- > 2 files changed, 21 insertions(+), 10 deletions(-) > > diff -puN > arch/x86/include/asm/pkeys.h~pkeys-abandon-exec-only-pkey-more-aggressively > arch/x86/include/asm/pkeys.h > --- > a/arch/x86/include/asm/pkeys.h~pkeys-abandon-exec-only-pkey-more-aggressively > 2018-03-26 10:22:35.380170193 -0700 > +++ b/arch/x86/include/asm/pkeys.h2018-03-26 10:22:35.385170193 -0700 > @@ -2,6 +2,8 @@ > #ifndef _ASM_X86_PKEYS_H > #define 
_ASM_X86_PKEYS_H > > +#define ARCH_DEFAULT_PKEY0 > + > #define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? 16 : 1) > > extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, > @@ -15,7 +17,7 @@ extern int __execute_only_pkey(struct mm > static inline int execute_only_pkey(struct mm_struct *mm) > { > if (!boot_cpu_has(X86_FEATURE_OSPKE)) > - return 0; > + return ARCH_DEFAULT_PKEY; > > return __execute_only_pkey(mm); > } > @@ -56,6 +58,14 @@ bool mm_pkey_is_allocated(struct mm_stru > return false; > if (pkey >= arch_max_pkey()) > return false; > + /* > + * The exec-only pkey is set in the allocation map, but > + * is not available to any of the user interfaces like > + * mprotect_pkey(). > + */ > + if (pkey == mm->context.execute_only_pkey) > + return false; > + > return mm_pkey_allocation_map(mm) & (1U << pkey); > } > > diff -puN arch/x86/mm/pkeys.c~pkeys-abandon-exec-only-pkey-more-aggressively > arch/x86/mm/pkeys.c > --- a/arch/x86/mm/pkeys.c~pkeys-abandon-exec-only-pkey-more-aggressively > 2018-03-26 10:22:35.381170193 -0700 > +++ b/arch/x86/mm/pkeys.c 2018-03-26 10:22:35.385170193 -0700 > @@ -94,15 +94,7 @@ int __arch_override_mprotect_pkey(struct >*/ > if (pkey != -1) > return pkey; > - /* > - * Look for a protection-key-drive execute-only mapping > - * which is now being given permissions that are not > - * execute-only. Move it back to the default pkey. > - */ > - if (vma_is_pkey_exec_only(vma) && > - (prot & (PROT_READ|PROT_WRITE))) { > - return 0; > - } > + Dave, this can be simply: if ((vma_is_pkey_exec_only(vma) && (prot != PROT_EXEC)) return ARCH_DEFAULT_PKEY; No? RP
Re: [PATCH v2] powerpc, pkey: make protection key 0 less special
On Wed, Apr 04, 2018 at 06:41:01PM -0300, Thiago Jung Bauermann wrote: > > Hello Ram, > > Ram Pai writes: > > > Applications need the ability to associate an address-range with some > > key and later revert to its initial default key. Pkey-0 comes close to > > providing this function but falls short, because the current > > implementation disallows applications to explicitly associate pkey-0 to > > the address range. > > > > Let's make pkey-0 less special and treat it almost like any other key. > > Thus it can be explicitly associated with any address range, and can be > > freed. This gives the application more flexibility and power. The > > ability to free pkey-0 must be used responsibly, since pkey-0 is > > associated with almost all address-range by default. > > > > Even with this change pkey-0 continues to be slightly more special > > from the following point of view. > > (a) it is implicitly allocated. > > (b) it is the default key assigned to any address-range. > > It's also special in more ways (and if intentional, these should be part > of the commit message as well): > > (c) it's not possible to change permissions for key 0 > > This has two causes: this patch explicitly forbids it in > arch_set_user_pkey_access(), and also because even if it's allocated, > the bits for key 0 in AMOR and UAMOR aren't set. Yes, we will have to capture that one as well. We cannot let userspace change permissions on key 0, because doing so will hurt the kernel too. Unlike x86, where keys are effective only in userspace, powerpc keys are effective even in the kernel. So if the kernel tries to access something, it will get stuck forever; almost everything in the kernel is associated with key-0. I ran a small test program which disabled access on key 0 from userspace and, as expected, ran into softlockups. It certainly can lead to a denial-of-service attack. We can let apps shoot themselves in the foot, but if the shot hurts someone else, we have to stop it.
The key-semantics discussed with the x86 folks did not explicitly say anything about changing permissions on key-0. We will have to keep that part of the semantics open-ended. > > (d) it can be freed, but can't be allocated again later. > > This is because mm_pkey_alloc() only calls __arch_activate_pkey(ret) > if ret > 0. > > It looks like (d) is a bug. Either mm_pkey_free() should fail with key > 0, or mm_pkey_alloc() should work with it. Well, it can be allocated, just that we do not let userspace change the permissions on the key. __arch_activate_pkey(ret) does not get called for pkey-0. > > (c) could be a measure to prevent users from shooting themselves in > their feet. But if that is the case, then mm_pkey_free() should forbid > freeing key 0 too. > > > Tested on powerpc. > > > > cc: Thomas Gleixner > > cc: Dave Hansen > > cc: Michael Ellermen > > cc: Ingo Molnar > > cc: Andrew Morton > > Signed-off-by: Ram Pai > > --- > > History: > > v2: mm_pkey_is_allocated() continued to treat pkey-0 special. > > fixed it. > > > > arch/powerpc/include/asm/pkeys.h | 20 > > arch/powerpc/mm/pkeys.c | 20 > > 2 files changed, 28 insertions(+), 12 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/pkeys.h > > b/arch/powerpc/include/asm/pkeys.h > > index 0409c80..b598fa9 100644 > > --- a/arch/powerpc/include/asm/pkeys.h > > +++ b/arch/powerpc/include/asm/pkeys.h > > @@ -101,10 +101,14 @@ static inline u16 pte_to_pkey_bits(u64 pteflags) > > > > static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey) > > { > > - /* A reserved key is never considered as 'explicitly allocated' */ > > - return ((pkey < arch_max_pkey()) && > > - !__mm_pkey_is_reserved(pkey) && > > - __mm_pkey_is_allocated(mm, pkey)); > > + if (pkey < 0 || pkey >= arch_max_pkey()) > > + return false; > > + > > + /* Reserved keys are never allocated. 
*/ > > + if (__mm_pkey_is_reserved(pkey)) > > + return false; > > + > > + return __mm_pkey_is_allocated(mm, pkey); > > } > > > > extern void __arch_activate_pkey(int pkey); > > @@ -200,6 +204,14 @@ static inline int arch_set_user_pkey_access(struct > > task_struct *tsk, int pkey, > > { > > if (static_branch_likely(_disabled)) > > return -EINVAL; > > + > > + /* > > +* userspace is discouraged from changing permissions of > > +* pkey-0. > > They're not discouraged. They're
Re: [PATCH] resource: Fix integer overflow at reallocation
On Thu, Apr 05, 2018 at 04:40:15PM +0200, Takashi Iwai wrote: > On Mon, 02 Apr 2018 22:35:04 +0200, > Takashi Iwai wrote: > > > > On Mon, 02 Apr 2018 21:09:03 +0200, > > Ram Pai wrote: > > > > > > On Mon, Apr 02, 2018 at 09:16:16AM +0200, Takashi Iwai wrote: > > > > We've got a bug report indicating a kernel panic at booting on an > > > > x86-32 system, and it turned out to be the invalid resource assigned > > > > after reallocation. __find_resource() first aligns the resource start > > > > address and resets the end address with start+size-1 accordingly, then > > > > checks whether it's contained. Here the end address may overflow the > > > > integer, although resource_contains() still returns true because the > > > > function validates only start and end address. So this ends up with > > > > returning an invalid resource (start > end). > > > > > > > > There was already an attempt to cover such a problem in the commit > > > > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but > > > > this case is an overseen one. > > > > > > > > This patch adds the validity check of the newly calculated resource > > > > for avoiding the integer overflow problem. > > > > > > Should we move this check "alloc.start <= alloc.end" into > > > resource_contains()? > > > Doing so will catch all uses of such erroneous (overflowing) resources. > > > > I thought of it, too. OTOH, it's rather a validity check and doesn't > > belong to resource_contains() semantics. If the function returns > > false, you don't know whether it's an invalid resource or it's really > > not contained. > > > > We may return an error code, but I'm not sure whether such an API > > change is needed. Maybe not. > > FWIW, below is the revised one to move the check into > resource_contains(). > > My concern is, however, that the resource validity check isn't a job > of resource_contains(). OTOH, this may avoid other similar cases, so > it might be worth. 
> > In anyway, if there is no objection, and anyone else doesn't want to > take, I'll forward this to Andrew as a poor orphan kid. I will stand by you as a poor-orphan-buddy. :-) Reviewed-by: Ram Pai <linux...@us.ibm.com> RP > > > thanks, > > Takashi > > -- 8< -- > From: Takashi Iwai <ti...@suse.de> > Subject: [PATCH v2] resource: Fix integer overflow at reallocation > > We've got a bug report indicating a kernel panic at booting on an > x86-32 system, and it turned out to be the invalid resource assigned > after reallocation. __find_resource() first aligns the resource start > address and resets the end address with start+size-1 accordingly, then > checks whether it's contained. Here the end address may overflow the > integer, although resource_contains() still returns true because the > function validates only start and end address. So this ends up with > returning an invalid resource (start > end). > > There was already an attempt to cover such a problem in the commit > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but > this case is an overseen one. > > This patch adds the validity check in resource_contains() to see > whether the given resource has a valid range for avoiding the integer > overflow problem. 
> > Bugzilla: > http://bugzilla.opensuse.org/show_bug.cgi?id=1086739 > Fixes: 23c570a67448 ("resource: ability to resize an allocated resource") > Reported-and-tested-by: Michael Henders <hende...@shaw.ca> > Cc: <sta...@vger.kernel.org> > Signed-off-by: Takashi Iwai <ti...@suse.de> > --- > include/linux/ioport.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/linux/ioport.h b/include/linux/ioport.h > index da0ebaec25f0..466d7be046eb 100644 > --- a/include/linux/ioport.h > +++ b/include/linux/ioport.h > @@ -212,6 +212,9 @@ static inline bool resource_contains(struct resource *r1, > struct resource *r2) > return false; > if (r1->flags & IORESOURCE_UNSET || r2->flags & IORESOURCE_UNSET) > return false; > + /* sanity check whether it's a valid resource range */ > + if (r2->end < r2->start) > + return false; > return r1->start <= r2->start && r1->end >= r2->end; > } > > -- > 2.16.3 -- Ram Pai
Re: [PATCH] resource: Fix integer overflow at reallocation
On Mon, Apr 02, 2018 at 09:16:16AM +0200, Takashi Iwai wrote: > We've got a bug report indicating a kernel panic at booting on an > x86-32 system, and it turned out to be the invalid resource assigned > after reallocation. __find_resource() first aligns the resource start > address and resets the end address with start+size-1 accordingly, then > checks whether it's contained. Here the end address may overflow the > integer, although resource_contains() still returns true because the > function validates only start and end address. So this ends up with > returning an invalid resource (start > end). > > There was already an attempt to cover such a problem in the commit > 47ea91b4052d ("Resource: fix wrong resource window calculation"), but > this case is an overseen one. > > This patch adds the validity check of the newly calculated resource > for avoiding the integer overflow problem. Should we move this check "alloc.start <= alloc.end" into resource_contains()? Doing so will catch all uses of such erroneous (overflowing) resources. RP > > Bugzilla: > http://bugzilla.opensuse.org/show_bug.cgi?id=1086739 > Fixes: 23c570a67448 ("resource: ability to resize an allocated resource") > Reported-and-tested-by: Michael Henders <hende...@shaw.ca> > Cc: <sta...@vger.kernel.org> > Signed-off-by: Takashi Iwai <ti...@suse.de> > --- > > Bjorn, I send this to you since the bug hits during PCI init, although > the culprit is in generic resource management.
> > kernel/resource.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/resource.c b/kernel/resource.c > index e270b5048988..2af6c03858b9 100644 > --- a/kernel/resource.c > +++ b/kernel/resource.c > @@ -651,7 +651,8 @@ static int __find_resource(struct resource *root, struct > resource *old, > alloc.start = > constraint->alignf(constraint->alignf_data, , > size, constraint->align); > alloc.end = alloc.start + size - 1; > - if (resource_contains(, )) { > + if (alloc.start <= alloc.end && > + resource_contains(, )) { > new->start = alloc.start; > new->end = alloc.end; > return 0; > -- > 2.16.2 -- Ram Pai
[PATCH v13 1/3] mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled
VM_PKEY_BITx are defined only if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is enabled. Powerpc also needs these bits. Hence lets define the VM_PKEY_BITx bits for any architecture that enables CONFIG_ARCH_HAS_PKEYS. cc: Michael Ellermen <m...@ellerman.id.au> cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> cc: Andrew Morton <a...@linux-foundation.org> Reviewed-by: Dave Hansen <dave.han...@intel.com> Signed-off-by: Ram Pai <linux...@us.ibm.com> Reviewed-by: Ingo Molnar <mi...@kernel.org> Reviewed-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> --- arch/powerpc/include/asm/pkeys.h |2 ++ fs/proc/task_mmu.c |4 ++-- include/linux/mm.h |9 + 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h index 0d3c630..99344d7 100644 --- a/arch/powerpc/include/asm/pkeys.h +++ b/arch/powerpc/include/asm/pkeys.h @@ -26,6 +26,8 @@ # define VM_PKEY_BIT2 VM_HIGH_ARCH_2 # define VM_PKEY_BIT3 VM_HIGH_ARCH_3 # define VM_PKEY_BIT4 VM_HIGH_ARCH_4 +#elif !defined(VM_PKEY_BIT4) +# define VM_PKEY_BIT4 VM_HIGH_ARCH_4 #endif #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \ diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ec6d298..6b996d0 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -679,13 +679,13 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_MERGEABLE)] = "mg", [ilog2(VM_UFFD_MISSING)]= "um", [ilog2(VM_UFFD_WP)] = "uw", -#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS +#ifdef CONFIG_ARCH_HAS_PKEYS /* These come out via ProtectionKey: */ [ilog2(VM_PKEY_BIT0)] = "", [ilog2(VM_PKEY_BIT1)] = "", [ilog2(VM_PKEY_BIT2)] = "", [ilog2(VM_PKEY_BIT3)] = "", -#endif +#endif /* CONFIG_ARCH_HAS_PKEYS */ }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index ad06d42..ad207ad 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -228,15 +228,16 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, 
void __user *, #define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4) #endif /* CONFIG_ARCH_USES_HIGH_VMA_FLAGS */ -#if defined(CONFIG_X86) -# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ -#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) +#ifdef CONFIG_ARCH_HAS_PKEYS # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0 # define VM_PKEY_BIT0 VM_HIGH_ARCH_0 /* A protection key is a 4-bit value */ # define VM_PKEY_BIT1 VM_HIGH_ARCH_1 # define VM_PKEY_BIT2 VM_HIGH_ARCH_2 # define VM_PKEY_BIT3 VM_HIGH_ARCH_3 -#endif +#endif /* CONFIG_ARCH_HAS_PKEYS */ + +#if defined(CONFIG_X86) +# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */ #elif defined(CONFIG_PPC) # define VM_SAO VM_ARCH_1 /* Strong Access Ordering (powerpc) */ #elif defined(CONFIG_PARISC) -- 1.7.1
[PATCH v13 3/3] mm, x86, powerpc: display pkey in smaps only if arch supports pkeys
Currently the architecture-specific code is expected to display the protection keys in smaps for a given vma. This can lead to redundant code and possibly to divergent formats in which the key gets displayed. This patch changes the implementation: it displays the pkey only if the architecture supports pkeys, i.e., arch_pkeys_enabled() returns true. This patch provides the x86 implementation of arch_pkeys_enabled(). The x86 arch_show_smap() function is no longer needed, so delete it. cc: Michael Ellerman <m...@ellerman.id.au> cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> cc: Andrew Morton <a...@linux-foundation.org> Reviewed-by: Dave Hansen <dave.han...@intel.com> Signed-off-by: Thiago Jung Bauermann <bauer...@linux.vnet.ibm.com> (fixed compilation errors for x86 configs) Acked-by: Michal Hocko <mho...@suse.com> Reviewed-by: Ingo Molnar <mi...@kernel.org> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- arch/powerpc/include/asm/mmu_context.h |5 - arch/x86/include/asm/mmu_context.h |5 - arch/x86/include/asm/pkeys.h |1 + arch/x86/kernel/fpu/xstate.c |5 + arch/x86/kernel/setup.c | 8 fs/proc/task_mmu.c | 10 +- include/linux/pkeys.h |7 ++- 7 files changed, 17 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 051b3d6..566b3c2 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -203,11 +203,6 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, #define thread_pkey_regs_restore(new_thread, old_thread) #define thread_pkey_regs_init(thread) -static inline int vma_pkey(struct vm_area_struct *vma) -{ - return 0; -} - static inline u64 pte_to_hpte_pkey_bits(u64 pteflags) { return 0x0UL; diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 1de72ce..e597d09 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -295,11 +295,6 @@ static inline int vma_pkey(struct 
vm_area_struct *vma) return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT; } -#else -static inline int vma_pkey(struct vm_area_struct *vma) -{ - return 0; -} #endif /* diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index a0ba1ff..f6c287b 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -6,6 +6,7 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val); +extern bool arch_pkeys_enabled(void); /* * Try to dedicate one of the protection keys to be used as an diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 87a57b7..4f566e9 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -945,6 +945,11 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, return 0; } + +bool arch_pkeys_enabled(void) +{ + return boot_cpu_has(X86_FEATURE_OSPKE); +} #endif /* ! CONFIG_ARCH_HAS_PKEYS */ /* diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 4c616be..117ed01 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -1307,11 +1307,3 @@ static int __init register_kernel_offset_dumper(void) return 0; } __initcall(register_kernel_offset_dumper); - -void arch_show_smap(struct seq_file *m, struct vm_area_struct *vma) -{ - if (!boot_cpu_has(X86_FEATURE_OSPKE)) - return; - - seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); -} diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 6d83bb7..70aa912 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -18,10 +18,12 @@ #include #include #include +#include #include #include #include +#include #include "internal.h" void task_mem(struct seq_file *m, struct mm_struct *mm) @@ -733,10 +735,6 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, } #endif /* HUGETLB_PAGE */ -void __weak arch_show_smap(struct seq_file *m, struct vm_area_struct *vma) -{ -} - static int show_smap(struct seq_file *m, void *v, int is_pid) { struct 
proc_maps_private *priv = m->private; @@ -856,9 +854,11 @@ static int show_smap(struct seq_file *m, void *v, int is_pid) (unsigned long)(mss->pss >> (10 + PSS_SHIFT))); if (!rollup_mode) { - arch_show_smap(m, vma); + if (arch_pkeys_enabled()) + seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); show_smap_vma_flags(m, vma); } + m_cache_vma(m, vma); return ret; } diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 0794ca7..49dff15
[PATCH v13 0/3] mm, x86, powerpc: Enhancements to Memory Protection Keys.
This patch series provides arch-neutral enhancements to enable memory keys on new architectures, and the corresponding changes in x86- and powerpc-specific code to support that. a) Provides the ability to support up to 32 keys. PowerPC can handle 32 keys and hence needs this. b) Arch-neutral code, and not the arch-specific code, determines the format of the string that displays the key for each vma in smaps. History: --- version v13: (1) fixed a git bisect error. :( version v12: (1) fixed compilation errors seen with various x86 configs. version v11: (1) code that displays the key in smaps is no longer defined under CONFIG_ARCH_HAS_PKEYS. - Comment by Eric W. Biederman and Michal Hocko (2) merged the two patches that implemented (1). - comment by Michal Hocko version prior to v11: (1) used one additional bit from VM_HIGH_ARCH_* to support 32 keys. - Suggestion by Dave Hansen. (2) powerpc-specific changes to support memory keys. Ram Pai (3): mm, powerpc, x86: define VM_PKEY_BITx bits if CONFIG_ARCH_HAS_PKEYS is enabled mm, powerpc, x86: introduce an additional vma bit for powerpc pkey mm, x86, powerpc: display pkey in smaps only if arch supports pkeys arch/powerpc/include/asm/mmu_context.h |5 - arch/powerpc/include/asm/pkeys.h |2 ++ arch/x86/include/asm/mmu_context.h |5 - arch/x86/include/asm/pkeys.h |1 + arch/x86/kernel/fpu/xstate.c |5 + arch/x86/kernel/setup.c | 8 fs/proc/task_mmu.c | 15 --- include/linux/mm.h | 12 +++- include/linux/pkeys.h |7 ++- 9 files changed, 29 insertions(+), 31 deletions(-)
[PATCH v13 2/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey
Currently only 4 bits are allocated in the vma flags, enough to hold 16 keys. This is sufficient for x86. PowerPC supports 32 keys, which need 5 bits. This patch allocates an additional bit. cc: Dave Hansen <dave.han...@intel.com> cc: Michael Ellerman <m...@ellerman.id.au> cc: Benjamin Herrenschmidt <b...@kernel.crashing.org> cc: Andrew Morton <a...@linux-foundation.org> Reviewed-by: Ingo Molnar <mi...@kernel.org> Acked-by: Balbir Singh <bsinghar...@gmail.com> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- fs/proc/task_mmu.c |1 + include/linux/mm.h |3 ++- 2 files changed, 3 insertions(+), 1 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 6b996d0..6d83bb7 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -685,6 +685,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_PKEY_BIT1)] = "", [ilog2(VM_PKEY_BIT2)] = "", [ilog2(VM_PKEY_BIT3)] = "", + [ilog2(VM_PKEY_BIT4)] = "", #endif /* CONFIG_ARCH_HAS_PKEYS */ }; size_t i; diff --git a/include/linux/mm.h b/include/linux/mm.h index ad207ad..d534f46 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -231,9 +231,10 @@ extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *, #ifdef CONFIG_ARCH_HAS_PKEYS # define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0 # define VM_PKEY_BIT0 VM_HIGH_ARCH_0 /* A protection key is a 4-bit value */ -# define VM_PKEY_BIT1 VM_HIGH_ARCH_1 +# define VM_PKEY_BIT1 VM_HIGH_ARCH_1 /* on x86 and 5-bit value on ppc64 */ # define VM_PKEY_BIT2 VM_HIGH_ARCH_2 # define VM_PKEY_BIT3 VM_HIGH_ARCH_3 +# define VM_PKEY_BIT4 VM_HIGH_ARCH_4 #endif /* CONFIG_ARCH_HAS_PKEYS */ #if defined(CONFIG_X86) -- 1.7.1
Re: [PATCH 1/9] x86, pkeys: do not special case protection key 0
On Fri, Mar 23, 2018 at 11:09:05AM -0700, Dave Hansen wrote: > > From: Dave Hansen> > mm_pkey_is_allocated() treats pkey 0 as unallocated. That is > inconsistent with the manpages, and also inconsistent with > mm->context.pkey_allocation_map. Stop special casing it and only > disallow values that are actually bad (< 0). > > The end-user visible effect of this is that you can now use > mprotect_pkey() to set pkey=0. > > This is a bit nicer than what Ram proposed because it is simpler > and removes special-casing for pkey 0. On the other hand, it does > allow applciations to pkey_free() pkey-0, but that's just a silly > thing to do, so we are not going to protect against it. The more I think about this, the more I feel we are opening up a can of worms. I am OK with a bad application shooting itself in the foot. But I am worried about all the bug reports and support requests we will encounter when applications inadvertently shoot themselves and blame it on the kernel. A warning in the dmesg logs indicating a free of pkey-0 can help deflect the blame from the kernel. RP
[PATCH v2] powerpc, pkey: make protection key 0 less special
Applications need the ability to associate an address range with some key and later revert to its initial default key. Pkey-0 comes close to providing this function but falls short, because the current implementation prevents applications from explicitly associating pkey-0 with an address range. Let's make pkey-0 less special and treat it almost like any other key. Thus it can be explicitly associated with any address range, and can be freed. This gives the application more flexibility and power. The ability to free pkey-0 must be used responsibly, since pkey-0 is associated with almost all address ranges by default. Even with this change pkey-0 continues to be slightly more special from the following point of view. (a) it is implicitly allocated. (b) it is the default key assigned to any address range. Tested on powerpc. cc: Thomas Gleixner <t...@linutronix.de> cc: Dave Hansen <dave.han...@intel.com> cc: Michael Ellerman <m...@ellerman.id.au> cc: Ingo Molnar <mi...@kernel.org> cc: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- History: v2: mm_pkey_is_allocated() continued to treat pkey-0 as special; fixed it. arch/powerpc/include/asm/pkeys.h | 20 arch/powerpc/mm/pkeys.c | 20 2 files changed, 28 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h index 0409c80..b598fa9 100644 --- a/arch/powerpc/include/asm/pkeys.h +++ b/arch/powerpc/include/asm/pkeys.h @@ -101,10 +101,14 @@ static inline u16 pte_to_pkey_bits(u64 pteflags) static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey) { - /* A reserved key is never considered as 'explicitly allocated' */ - return ((pkey < arch_max_pkey()) && - !__mm_pkey_is_reserved(pkey) && - __mm_pkey_is_allocated(mm, pkey)); + if (pkey < 0 || pkey >= arch_max_pkey()) + return false; + + /* Reserved keys are never allocated. 
*/ + if (__mm_pkey_is_reserved(pkey)) + return false; + + return __mm_pkey_is_allocated(mm, pkey); } extern void __arch_activate_pkey(int pkey); @@ -200,6 +204,14 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, { if (static_branch_likely(_disabled)) return -EINVAL; + + /* +* userspace is discouraged from changing permissions of +* pkey-0. powerpc hardware does not support it anyway. +*/ + if (!pkey) + return init_val ? -EINVAL : 0; + return __arch_set_user_pkey_access(tsk, pkey, init_val); } diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c index ba71c54..e7a9e34 100644 --- a/arch/powerpc/mm/pkeys.c +++ b/arch/powerpc/mm/pkeys.c @@ -119,19 +119,21 @@ int pkey_initialize(void) #else os_reserved = 0; #endif - /* -* Bits are in LE format. NOTE: 1, 0 are reserved. -* key 0 is the default key, which allows read/write/execute. -* key 1 is recommended not to be used. PowerISA(3.0) page 1015, -* programming note. -*/ + /* Bits are in LE format. */ initial_allocation_mask = ~0x0; /* register mask is in BE format */ pkey_amr_uamor_mask = ~0x0ul; pkey_iamr_mask = ~0x0ul; - for (i = 2; i < (pkeys_total - os_reserved); i++) { + for (i = 0; i < (pkeys_total - os_reserved); i++) { + /* +* key 1 is recommended not to be used. +* PowerISA(3.0) page 1015, +*/ + if (i == 1) + continue; + initial_allocation_mask &= ~(0x1 << i); pkey_amr_uamor_mask &= ~(0x3ul << pkeyshift(i)); pkey_iamr_mask &= ~(0x1ul << pkeyshift(i)); @@ -145,7 +147,9 @@ void pkey_mm_init(struct mm_struct *mm) { if (static_branch_likely(_disabled)) return; - mm_pkey_allocation_map(mm) = initial_allocation_mask; + + /* allocate key-0 by default */ + mm_pkey_allocation_map(mm) = initial_allocation_mask | 0x1; /* -1 means unallocated or invalid */ mm->context.execute_only_pkey = -1; } -- 1.8.3.1
Re: [PATCH] powerpc, pkey: make protection key 0 less special
On Mon, Mar 26, 2018 at 04:31:41PM -0700, Ram Pai wrote: > Applications need the ability to associate an address-range with some > key and latter revert to its initial default key. Pkey-0 comes close to > providing this function but falls short, because the current > implementation disallows applications to explicitly associate pkey-0 to > the address range. > > Lets make pkey-0 less special and treat it almost like any other key. > Thus it can be explicitly associated with any address range, and can be > freed. This gives the application more flexibility and power. The > ability to free pkey-0 must be used responsibily, since pkey-0 is > associated with almost all address-range by default. > > Even with this change pkey-0 continues to be slightly more special > from the following point of view. > (a) it is implicitly allocated. > (b) it is the default key assigned to any address-range. > > Tested on powerpc. This patch is not entirely correct. > > cc: Thomas Gleixner <t...@linutronix.de> > cc: Dave Hansen <dave.han...@intel.com> > cc: Michael Ellermen <m...@ellerman.id.au> > cc: Ingo Molnar <mi...@kernel.org> > cc: Andrew Morton <a...@linux-foundation.org> > Signed-off-by: Ram Pai <linux...@us.ibm.com> > --- > arch/powerpc/include/asm/pkeys.h | 24 > 1 file changed, 20 insertions(+), 4 deletions(-) > > diff --git a/arch/powerpc/include/asm/pkeys.h > b/arch/powerpc/include/asm/pkeys.h > index 0409c80..9c7d3bd 100644 > --- a/arch/powerpc/include/asm/pkeys.h > +++ b/arch/powerpc/include/asm/pkeys.h > @@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags) > > static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey) > { > - /* A reserved key is never considered as 'explicitly allocated' */ > - return ((pkey < arch_max_pkey()) && > - !__mm_pkey_is_reserved(pkey) && > - __mm_pkey_is_allocated(mm, pkey)); > + /* pkey 0 is allocated by default. */ > + if (!pkey) > + return true; This is wrong. pkey-0 should not be treated any special here. 
Will fix this and send a new patch. Sorry for the noise. RP
[PATCH] powerpc, pkey: make protection key 0 less special
Applications need the ability to associate an address-range with some key and later revert to its initial default key. Pkey-0 comes close to providing this function but falls short, because the current implementation disallows applications to explicitly associate pkey-0 to the address range. Let's make pkey-0 less special and treat it almost like any other key. Thus it can be explicitly associated with any address range, and can be freed. This gives the application more flexibility and power. The ability to free pkey-0 must be used responsibly, since pkey-0 is associated with almost all address-ranges by default. Even with this change pkey-0 continues to be slightly more special from the following point of view. (a) it is implicitly allocated. (b) it is the default key assigned to any address-range. Tested on powerpc. cc: Thomas Gleixner <t...@linutronix.de> cc: Dave Hansen <dave.han...@intel.com> cc: Michael Ellerman <m...@ellerman.id.au> cc: Ingo Molnar <mi...@kernel.org> cc: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- arch/powerpc/include/asm/pkeys.h | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h index 0409c80..9c7d3bd 100644 --- a/arch/powerpc/include/asm/pkeys.h +++ b/arch/powerpc/include/asm/pkeys.h @@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags) static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey) { - /* A reserved key is never considered as 'explicitly allocated' */ - return ((pkey < arch_max_pkey()) && - !__mm_pkey_is_reserved(pkey) && - __mm_pkey_is_allocated(mm, pkey)); + /* pkey 0 is allocated by default. */ + if (!pkey) + return true; + + if (pkey < 0 || pkey >= arch_max_pkey()) + return false; + + /* Reserved keys are never allocated. 
*/ + if (__mm_pkey_is_reserved(pkey)) + return false; + + return __mm_pkey_is_allocated(mm, pkey); } extern void __arch_activate_pkey(int pkey); @@ -200,6 +208,14 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, { if (static_branch_likely(&pkey_disabled)) return -EINVAL; + + /* +* userspace is discouraged from changing permissions of +* pkey-0. powerpc hardware does not support it anyway. +*/ + if (!pkey) + return init_val ? -EINVAL : 0; + return __arch_set_user_pkey_access(tsk, pkey, init_val); } -- 1.8.3.1
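The second hunk's guard is worth isolating: a permission change on pkey-0 is refused (the patch argues powerpc hardware cannot express it), but a request that changes nothing succeeds as a no-op. A sketch of those semantics, with `set_user_pkey_access_model()` as a hypothetical stand-in for the real `arch_set_user_pkey_access()`:

```c
#include <assert.h>
#include <errno.h>

#define PKEY_DISABLE_ACCESS 0x1
#define PKEY_DISABLE_WRITE  0x2

/* Model of the pkey-0 guard: reject any attempt to restrict pkey-0,
 * allow the zero-restriction request as a harmless no-op. */
static int set_user_pkey_access_model(int pkey, unsigned long init_val)
{
	if (!pkey)
		return init_val ? -EINVAL : 0;
	/* ...the real code would fall through to the AMR/IAMR update... */
	return 0;
}
```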
Re: [PATCH 1/9] x86, pkeys: do not special case protection key 0
On Fri, Mar 23, 2018 at 11:09:05AM -0700, Dave Hansen wrote: > > From: Dave Hansen <dave.han...@linux.intel.com> > > mm_pkey_is_allocated() treats pkey 0 as unallocated. That is > inconsistent with the manpages, and also inconsistent with > mm->context.pkey_allocation_map. Stop special casing it and only > disallow values that are actually bad (< 0). > > The end-user visible effect of this is that you can now use > mprotect_pkey() to set pkey=0. > > This is a bit nicer than what Ram proposed because it is simpler > and removes special-casing for pkey 0. On the other hand, it does > allow applications to pkey_free() pkey-0, but that's just a silly > thing to do, so we are not going to protect against it. > > Signed-off-by: Dave Hansen <dave.han...@linux.intel.com> > Cc: Ram Pai <linux...@us.ibm.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Dave Hansen <dave.han...@intel.com> > Cc: Michael Ellerman <m...@ellerman.id.au> > Cc: Ingo Molnar <mi...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Shuah Khan <sh...@kernel.org> > --- > > b/arch/x86/include/asm/mmu_context.h |2 +- > b/arch/x86/include/asm/pkeys.h |6 +++--- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff -puN arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated > arch/x86/include/asm/mmu_context.h > --- a/arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated > 2018-03-21 15:47:48.182198927 -0700 > +++ b/arch/x86/include/asm/mmu_context.h 2018-03-21 15:47:48.187198927 > -0700 > @@ -192,7 +192,7 @@ static inline int init_new_context(struc > > #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS > if (cpu_feature_enabled(X86_FEATURE_OSPKE)) { > - /* pkey 0 is the default and always allocated */ > + /* pkey 0 is the default and allocated implicitly */ > mm->context.pkey_allocation_map = 0x1; In the second patch, you introduce DEFAULT_KEY. 
Maybe you should introduce it here and express the above code as mm->context.pkey_allocation_map = (0x1 << DEFAULT_KEY); In case your default key changes to something else, you are still good. > /* -1 means unallocated or invalid */ > mm->context.execute_only_pkey = -1; > diff -puN arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated > arch/x86/include/asm/pkeys.h > --- a/arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated > 2018-03-21 15:47:48.184198927 -0700 > +++ b/arch/x86/include/asm/pkeys.h2018-03-21 15:47:48.188198927 -0700 > @@ -49,10 +49,10 @@ bool mm_pkey_is_allocated(struct mm_stru > { > /* >* "Allocated" pkeys are those that have been returned > - * from pkey_alloc(). pkey 0 is special, and never > - * returned from pkey_alloc(). > + * from pkey_alloc() or pkey 0 which is allocated > + * implicitly when the mm is created. >*/ > - if (pkey <= 0) > + if (pkey < 0) > return false; > if (pkey >= arch_max_pkey()) > return false; > _ -- Ram Pai
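The `DEFAULT_KEY` suggestion above can be modelled in plain C: seed the allocation map from a named constant instead of the literal `0x1`, so the init code stays correct even if the default key is ever renumbered. `DEFAULT_KEY` and the map layout below are illustrative, not the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_KEY 0

/* Seed the map so that only the default key starts out allocated.
 * Written in terms of DEFAULT_KEY rather than a bare 0x1. */
static uint32_t init_allocation_map(void)
{
	return (uint32_t)0x1 << DEFAULT_KEY;
}

static int is_allocated(uint32_t map, int pkey)
{
	return pkey >= 0 && (map & ((uint32_t)0x1 << pkey)) != 0;
}
```

The point of the review comment is exactly this: the hardcoded `0x1` only works while the default key happens to be key 0, whereas the shifted form encodes the intent.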
Re: [PATCH 1/3] x86, pkeys: do not special case protection key 0
On Sun, Mar 18, 2018 at 10:30:48AM +0100, Thomas Gleixner wrote: > On Sat, 17 Mar 2018, Ram Pai wrote: > > On Fri, Mar 16, 2018 at 02:46:56PM -0700, Dave Hansen wrote: > > > > > > From: Dave Hansen <dave.han...@linux.intel.com> > > > > > > mm_pkey_is_allocated() treats pkey 0 as unallocated. That is > > > inconsistent with the manpages, and also inconsistent with > > > mm->context.pkey_allocation_map. Stop special casing it and only > > > disallow values that are actually bad (< 0). > > > > > > The end-user visible effect of this is that you can now use > > > mprotect_pkey() to set pkey=0. > > > > > > This is a bit nicer than what Ram proposed because it is simpler > > > and removes special-casing for pkey 0. On the other hand, it does > > > allow applications to pkey_free() pkey-0, but that's just a silly > > > thing to do, so we are not going to protect against it. > > > > So your proposal > > (a) allocates pkey 0 implicitly, > > (b) does not stop anyone from freeing pkey-0 > > (c) and allows pkey-0 to be explicitly associated with any address range. > > correct? > > > > My proposal > > (a) allocates pkey 0 implicitly, > > (b) stops anyone from freeing pkey-0 > > (c) and allows pkey-0 to be explicitly associated with any address range. > > > > So the difference between the two proposals is just the freeing part i.e. > > (b). > > Did I get this right? > > Yes, and that's consistent with the other pkeys. > ok. Yes it makes pkey-0 even more consistent with the other keys, but not entirely consistent. pkey-0 still has the privilege of being allocated by default. RP
Re: [PATCH 1/3] x86, pkeys: do not special case protection key 0
On Fri, Mar 16, 2018 at 02:46:56PM -0700, Dave Hansen wrote: > > From: Dave Hansen <dave.han...@linux.intel.com> > > mm_pkey_is_allocated() treats pkey 0 as unallocated. That is > inconsistent with the manpages, and also inconsistent with > mm->context.pkey_allocation_map. Stop special casing it and only > disallow values that are actually bad (< 0). > > The end-user visible effect of this is that you can now use > mprotect_pkey() to set pkey=0. > > This is a bit nicer than what Ram proposed because it is simpler > and removes special-casing for pkey 0. On the other hand, it does > allow applications to pkey_free() pkey-0, but that's just a silly > thing to do, so we are not going to protect against it. So your proposal (a) allocates pkey 0 implicitly, (b) does not stop anyone from freeing pkey-0 (c) and allows pkey-0 to be explicitly associated with any address range. correct? My proposal (a) allocates pkey 0 implicitly, (b) stops anyone from freeing pkey-0 (c) and allows pkey-0 to be explicitly associated with any address range. So the difference between the two proposals is just the freeing part i.e. (b). Did I get this right? It's a philosophical debate: allow the user to shoot themselves in the foot, or stop them from doing so. There is no clear answer either way. I am fine either way. So here is my Reviewed-by: Ram Pai <linux...@us.ibm.com> I will write a corresponding patch for powerpc. 
> > Signed-off-by: Dave Hansen <dave.han...@linux.intel.com> > Cc: Ram Pai <linux...@us.ibm.com> > Cc: Thomas Gleixner <t...@linutronix.de> > Cc: Dave Hansen <dave.han...@intel.com> > Cc: Michael Ellerman <m...@ellerman.id.au> > Cc: Ingo Molnar <mi...@kernel.org> > Cc: Andrew Morton <a...@linux-foundation.org> > Cc: Shuah Khan <sh...@kernel.org> > --- > > b/arch/x86/include/asm/mmu_context.h |2 +- > b/arch/x86/include/asm/pkeys.h |6 +++--- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff -puN arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated > arch/x86/include/asm/mmu_context.h > --- a/arch/x86/include/asm/mmu_context.h~x86-pkey-0-default-allocated > 2018-03-16 14:46:39.023285476 -0700 > +++ b/arch/x86/include/asm/mmu_context.h 2018-03-16 14:46:39.028285476 > -0700 > @@ -191,7 +191,7 @@ static inline int init_new_context(struc > > #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS > if (cpu_feature_enabled(X86_FEATURE_OSPKE)) { > - /* pkey 0 is the default and always allocated */ > + /* pkey 0 is the default and allocated implicitly */ > mm->context.pkey_allocation_map = 0x1; > /* -1 means unallocated or invalid */ > mm->context.execute_only_pkey = -1; > diff -puN arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated > arch/x86/include/asm/pkeys.h > --- a/arch/x86/include/asm/pkeys.h~x86-pkey-0-default-allocated > 2018-03-16 14:46:39.025285476 -0700 > +++ b/arch/x86/include/asm/pkeys.h2018-03-16 14:46:39.028285476 -0700 > @@ -49,10 +49,10 @@ bool mm_pkey_is_allocated(struct mm_stru > { > /* >* "Allocated" pkeys are those that have been returned > - * from pkey_alloc(). pkey 0 is special, and never > - * returned from pkey_alloc(). > + * from pkey_alloc() or pkey 0 which is allocated > + * implicitly when the mm is created. >*/ > - if (pkey <= 0) > + if (pkey < 0) > return false; > if (pkey >= arch_max_pkey()) > return false; > _ -- Ram Pai
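The "silly but allowed" behaviour under debate — pkey_free() succeeding on pkey-0 — can be simulated with a toy allocation map. Everything below is an illustrative model of Dave's x86 semantics, not kernel or libc API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define ARCH_MAX_PKEY 16

/* pkey 0 starts out allocated, exactly as init_new_context() seeds
 * pkey_allocation_map = 0x1. */
static uint16_t alloc_map = 0x1;

static bool pkey_is_allocated(int pkey)
{
	if (pkey < 0 || pkey >= ARCH_MAX_PKEY)
		return false;
	return (alloc_map & (1u << pkey)) != 0;
}

/* Under Dave's proposal nothing protects pkey 0 here: freeing it
 * works, after which the default key reads as unallocated -- the
 * shoot-yourself-in-the-foot case Ram's proposal would forbid. */
static int pkey_free_model(int pkey)
{
	if (!pkey_is_allocated(pkey))
		return -EINVAL;
	alloc_map &= ~(1u << pkey);
	return 0;
}
```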
Re: [PATCH v4] mm, pkey: treat pkey-0 special
On Fri, Mar 16, 2018 at 10:02:22PM +1100, Balbir Singh wrote: > On Fri, Mar 16, 2018 at 9:33 PM, Ram Pai <linux...@us.ibm.com> wrote: > > Applications need the ability to associate an address-range with some > > key and later revert to its initial default key. Pkey-0 comes close to > > providing this function but falls short, because the current > > implementation disallows applications to explicitly associate pkey-0 to > > the address range. > > > > Clarify the semantics of pkey-0 and provide the corresponding > > implementation. > > > > Pkey-0 is special with the following semantics. > > (a) it is implicitly allocated and can never be freed. It always exists. > > (b) it is the default key assigned to any address-range. > > (c) it can be explicitly associated with any address-range. > > > > Tested on powerpc only. Could not test on x86. > > > Ram, > > I was wondering if we should check the AMOR values on the ppc side to make > sure > that pkey0 is indeed available for use as default. I am still of the > opinion that we AMOR cannot be read/written by the OS in privileged non-hypervisor mode. We could try testing if key-0 is available to the OS by temporarily changing the key-0 bits of the AMR or IAMR register. But that would be dangerous to do, for you might disable read/execute of all the pages, since all pages are associated with key-0 by default. Maybe we can play with the UAMOR register and check if its key-0 bits can be modified. That is a good indication that key-0 is available. If it is not available, disable the pkey-subsystem, and operate the legacy way; no pkeys. > should consider non-0 default pkey in the long run. I'm OK with the patches > for > now, but really 0 is not special except for it being the default bit > values present > in the PTE. it will be a pain. Any new pte that gets instantiated will now have to explicitly initialize its key to this default-non-zero-key. I hope neither we nor any other architecture ever goes there. -- Ram Pai
[PATCH v4] mm, pkey: treat pkey-0 special
Applications need the ability to associate an address-range with some key and later revert to its initial default key. Pkey-0 comes close to providing this function but falls short, because the current implementation disallows applications to explicitly associate pkey-0 to the address range. Clarify the semantics of pkey-0 and provide the corresponding implementation. Pkey-0 is special with the following semantics. (a) it is implicitly allocated and can never be freed. It always exists. (b) it is the default key assigned to any address-range. (c) it can be explicitly associated with any address-range. Tested on powerpc only. Could not test on x86. cc: Thomas Gleixner <t...@linutronix.de> cc: Dave Hansen <dave.han...@intel.com> cc: Michael Ellerman <m...@ellerman.id.au> cc: Ingo Molnar <mi...@kernel.org> cc: Andrew Morton <a...@linux-foundation.org> Signed-off-by: Ram Pai <linux...@us.ibm.com> --- History: v4 : (1) moved the code entirely to an arch-independent location. (2) fixed comments -- suggested by Thomas Gleixner v3 : added clarification of the semantics of pkey0. -- suggested by Dave Hansen v2 : split the patch into two, one for x86 and one for powerpc -- suggested by Michael Ellerman Documentation/x86/protection-keys.txt |8 mm/mprotect.c | 25 ++--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt index ecb0d2d..92802c4 100644 --- a/Documentation/x86/protection-keys.txt +++ b/Documentation/x86/protection-keys.txt @@ -88,3 +88,11 @@ with a read(): The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. + +== pkey 0 == + +Pkey-0 is special. It is implicitly allocated. Applications cannot allocate or +free that key. This key is the default key that gets associated with an +address-space. It can be explicitly associated with any address-space. 
+ + diff --git a/mm/mprotect.c b/mm/mprotect.c index e3309fc..2c779fa 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -430,7 +430,13 @@ static int do_mprotect_pkey(unsigned long start, size_t len, * them use it here. */ error = -EINVAL; - if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey)) + + /* +* pkey-0 is special. It always exists. No need to check if it is +* allocated. Check allocation status of all other keys. pkey=-1 +* is not really a key, it means: use any available key. +*/ + if (pkey && pkey != -1 && !mm_pkey_is_allocated(current->mm, pkey)) goto out; vma = find_vma(current->mm, start); @@ -549,6 +555,12 @@ static int do_mprotect_pkey(unsigned long start, size_t len, if (pkey == -1) goto out; + if (!pkey) { + mm_pkey_free(current->mm, pkey); + printk("Internal error, cannot explicitly allocate key-0"); + goto out; + } + ret = arch_set_user_pkey_access(current, pkey, init_val); if (ret) { mm_pkey_free(current->mm, pkey); @@ -564,13 +576,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len, { int ret; + /* +* pkey-0 is special. Userspace can never allocate or free it. It is +* allocated by default. It always exists. +*/ + if (!pkey) + return -EINVAL; + down_write(&current->mm->mmap_sem); ret = mm_pkey_free(current->mm, pkey); up_write(&current->mm->mmap_sem); /* -* We could provie warnings or errors if any VMA still -* has the pkey set here. +* We could provide warnings or errors if any VMA still has the pkey +* set here. */ return ret; } -- 1.7.1
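The reworked test in do_mprotect_pkey() is compact enough to model directly: pkey -1 means "allocate any key", pkey 0 always exists, and everything else must have been allocated. `mm_pkey_is_allocated_model()` below is a hypothetical stand-in predicate, not the kernel helper:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Pretend only pkey 3 was returned by a prior pkey_alloc(). */
static bool mm_pkey_is_allocated_model(int pkey)
{
	return pkey == 3;
}

/* The patch's condition, verbatim in shape: 0 and -1 bypass the
 * allocation check, any other unallocated key is rejected. */
static int mprotect_pkey_check(int pkey)
{
	if (pkey && pkey != -1 && !mm_pkey_is_allocated_model(pkey))
		return -EINVAL;
	return 0;
}
```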
Re: [PATCH v3] x86: treat pkey-0 special
On Thu, Mar 15, 2018 at 10:31:51AM -0700, Dave Hansen wrote: > On 03/15/2018 10:21 AM, Ram Pai wrote: > > On Thu, Mar 15, 2018 at 08:55:31AM -0700, Dave Hansen wrote: > >> On 03/15/2018 02:46 AM, Thomas Gleixner wrote: > >>>> +if (!pkey || !mm_pkey_is_allocated(mm, pkey)) > >>> Why this extra check? mm_pkey_is_allocated(mm, 0) should not return true > >>> ever. If it does, then this wants to be fixed. > >> I was thinking that we _do_ actually want it to seem allocated. It just > >> gets "allocated" implicitly when an mm is created. I think that will > >> simplify the code if we avoid treating it specially in as many places as > >> possible. > > I think, the logic that makes pkey-0 special must go > > in arch-neutral code. How about checking for pkey-0 in sys_pkey_free() > > itself? > > This is for protection against shooting yourself in the foot? Yes, that > can go in sys_pkey_free(). > > Does this need manpage and/or selftests updates? Yes. It needs selftest, manpage and documentation updates too. Unfortunately I am not getting enough reviewed-by for my selftests and documentation changes. :-( Need help! -- Ram Pai
Re: [PATCH v3] x86: treat pkey-0 special
On Thu, Mar 15, 2018 at 08:55:31AM -0700, Dave Hansen wrote:
> On 03/15/2018 02:46 AM, Thomas Gleixner wrote:
> >> +	if (!pkey || !mm_pkey_is_allocated(mm, pkey))
> > Why this extra check? mm_pkey_is_allocated(mm, 0) should not return
> > true ever. If it does, then this wants to be fixed.
>
> I was thinking that we _do_ actually want it to seem allocated. It just
> gets "allocated" implicitly when an mm is created. I think that will
> simplify the code if we avoid treating it specially in as many places
> as possible.

I think the logic that makes pkey-0 special must go in arch-neutral
code. How about checking for pkey-0 in sys_pkey_free() itself?

RP
[PATCH v3] powerpc: treat pkey-0 special
Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications from explicitly associating
pkey-0 with an address range.

This patch clarifies the semantics of pkey-0 and provides the
corresponding implementation on powerpc.

Pkey-0 is special with the following semantics.
(a) it is implicitly allocated and can never be freed. It always exists.
(b) it is the default key assigned to any address-range.
(c) it can be explicitly associated with any address-range.

Tested on powerpc.

History:
	v3 : added clarification of the semantics of pkey-0.
		-- suggested by Dave Hansen
	v2 : split the patch into two, one for x86 and one for powerpc
		-- suggested by Michael Ellerman

cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 arch/powerpc/include/asm/pkeys.h | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0409c80..3c1deec 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)

 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
-	/* A reserved key is never considered as 'explicitly allocated' */
-	return ((pkey < arch_max_pkey()) &&
-		!__mm_pkey_is_reserved(pkey) &&
-		__mm_pkey_is_allocated(mm, pkey));
+	/* pkey 0 is allocated by default. */
+	if (!pkey)
+		return true;
+
+	if (pkey < 0 || pkey >= arch_max_pkey())
+		return false;
+
+	/* Reserved keys are never allocated. */
+	if (__mm_pkey_is_reserved(pkey))
+		return false;
+
+	return __mm_pkey_is_allocated(mm, pkey);
 }

 extern void __arch_activate_pkey(int pkey);
@@ -150,7 +158,8 @@ static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
	if (static_branch_likely(&pkey_disabled))
		return -1;

-	if (!mm_pkey_is_allocated(mm, pkey))
+	/* pkey 0 cannot be freed */
+	if (!pkey || !mm_pkey_is_allocated(mm, pkey))
		return -EINVAL;

	/*
-- 
1.8.3.1
[PATCH v3] x86: treat pkey-0 special
Applications need the ability to associate an address-range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications from explicitly associating
pkey-0 with an address range.

This patch clarifies the semantics of pkey-0 and provides the
corresponding implementation on x86.

Pkey-0 is special with the following semantics.
(a) it is implicitly allocated and can never be freed. It always exists.
(b) it is the default key assigned to any address-range.
(c) it can be explicitly associated with any address-range.

Tested on x86_64.

History:
	v3 : added clarification of the semantics of pkey-0.
		-- suggested by Dave Hansen
	v2 : split the patch into two, one for x86 and one for powerpc
		-- suggested by Michael Ellerman

cc: Dave Hansen <dave.han...@intel.com>
cc: Michael Ellerman <m...@ellerman.id.au>
cc: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Ram Pai <linux...@us.ibm.com>
---
 arch/x86/include/asm/pkeys.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index a0ba1ff..6ea7486 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -52,7 +52,7 @@ bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
	 * from pkey_alloc(). pkey 0 is special, and never
	 * returned from pkey_alloc().
	 */
-	if (pkey <= 0)
+	if (pkey < 0)
		return false;
	if (pkey >= arch_max_pkey())
		return false;
@@ -92,7 +92,8 @@ int mm_pkey_alloc(struct mm_struct *mm)

 static inline int mm_pkey_free(struct mm_struct *mm, int pkey)
 {
-	if (!mm_pkey_is_allocated(mm, pkey))
+	/* pkey 0 is special and can never be freed */
+	if (!pkey || !mm_pkey_is_allocated(mm, pkey))
		return -EINVAL;

	mm_set_pkey_free(mm, pkey);
-- 
1.8.3.1