Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Thomas Gleixner
On Sat, Sep 19 2020 at 10:18, Linus Torvalds wrote:
> On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner  wrote:
>>
>> this provides a preemptible variant of kmap_atomic & related
>> interfaces. This is achieved by:
>
> Ack. This looks really nice, even apart from the new capability.
>
> The only thing I really reacted to is that the name doesn't make sense
> to me: "kmap_temporary()" seems a bit odd.

Yeah. Couldn't come up with something useful.

> Particularly for an interface that really is basically meant as a
> better replacement of "kmap_atomic()" (but is perhaps also a better
> replacement for "kmap()").
>
> I think I understand how the name came about: I think the "temporary"
> is there as a distinction from the "longterm" regular kmap(). So I
> think it makes some sense from an internal implementation angle, but I
> don't think it makes a lot of sense from an interface name.
>
> I don't know what might be a better name, but if we want to emphasize
> that it's thread-private and a one-off, maybe "local" would be a
> better naming, and make it distinct from the "global" nature of the
> old kmap() interface?

Right, _local or _thread would be more intuitive.

> However, another solution might be to just use this new preemptible
> "local" kmap(), and remove the old global one entirely. Yes, the old
> global one caches the page table mapping and that sounds really
> efficient and nice. But it's actually horribly horribly bad, because
> it means that we need to use locking for them. Your new "temporary"
> implementation seems to be fundamentally better locking-wise, and only
> need preemption disabling as locking (and is equally fast for the
> non-highmem case).
>
> So I wonder if the single-page TLB flush isn't a better model, and
> whether it wouldn't be a lot simpler to just get rid of the old
> complex kmap() entirely, and replace it with this?
>
> I agree we can't replace the kmap_atomic() version, because maybe
> people depend on the preemption disabling it also implied. But what
> about replacing the non-atomic kmap()?
>
> Maybe I've missed something.  Is it because the new interface still
> does "pagefault_disable()" perhaps?
>
> But does it even need the pagefault_disable() at all? Yes, the
> *atomic* one obviously needed it. But why does this new one need to
> disable page faults?

It disables pagefaults because it can be called from atomic and
non-atomic context. That was the point: to get rid of

	if (preemptible())
		kmap();
	else
		kmap_atomic();

If it does not disable pagefaults, then it's just a lightweight variant
of kmap() for short lived mappings.
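
The usage pattern then collapses into one sequence that is valid in both
preemptible and atomic context. A minimal sketch, assuming the
kmap_temporary()/kunmap_temporary() names and signatures proposed in
this series:

	void *vaddr = kmap_temporary(page);

	/* Short lived access; nothing that sleeps when the caller is atomic */
	memcpy(vaddr + offset, buf, len);
	kunmap_temporary(vaddr);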

> But apart from that question about naming (and perhaps replacing
> kmap() entirely), I very much like it.

I thought about it, but then I figured that kmap pointers can be
handed to other contexts from the thread which sets up the mapping
because it's 'permanent'.

I'm not sure whether that actually happens, so we'd need to audit all
kmap() users to be sure. If there is no such use case, then we surely
can get rid of kmap() completely. It's only 300+ instances to stare
at, and quite a few of them are wrapped into other functions.

Highmem sucks no matter what and the only sane solution is to remove it
completely.

Thanks,

tglx



Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Thomas Gleixner
On Sat, Sep 19 2020 at 12:37, Daniel Vetter wrote:
> On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter  wrote:
>> I think it should be the case, but I want to double check: Will
>> copy_*_user be allowed within a kmap_temporary section? This would
>> allow us to ditch an absolute pile of slowpaths.
>
> (coffee just kicked in) copy_*_user is of course allowed, but if you
> hit a page fault you get a short read/write. This looks like it would
> remove the need to handle these in a slowpath, since page faults can
> now be served in these new kmap_temporary sections. But this sounds
> too good to be true, so I'm wondering what I'm missing.

In principle we could allow pagefaults, but not with the currently
proposed interface which can be called from any context. Obviously if
called from atomic context it can't handle user page faults.

In theory we could make a variant which does not disable pagefaults, but
that's what kmap() already provides.

Thanks,

tglx





Re: [PATCH v2] powerpc/tm: Save and restore AMR on treclaim and trechkpt

2020-09-19 Thread Aneesh Kumar K.V
Gustavo Romero  writes:

> Although AMR is stashed in the checkpoint area, currently we don't save
> it to the per thread checkpoint struct after a treclaim and so we don't
> restore it either from that struct when we trechkpt. As a consequence when
> the transaction is later rolled back the kernel space AMR value when the
> trechkpt was done appears in userspace.
>
> This commit saves and restores AMR accordingly on treclaim and trechkpt.
> Since the AMR value is also used in kernel space in other functions, it also
> takes care of stashing the kernel live AMR into the stack before treclaim and
> before trechkpt, restoring it later, just before returning from tm_reclaim
> and __tm_recheckpoint.
>
> It also fixes two unrelated comments about CR and MSR.
>

Tested-by: Aneesh Kumar K.V 

> Signed-off-by: Gustavo Romero 
> ---
>  arch/powerpc/include/asm/processor.h |  1 +
>  arch/powerpc/kernel/asm-offsets.c|  1 +
>  arch/powerpc/kernel/tm.S | 35 
>  3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index ed0d633ab5aa..9f4f6cc033ac 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -220,6 +220,7 @@ struct thread_struct {
>   unsigned long   tm_tar;
>   unsigned long   tm_ppr;
>   unsigned long   tm_dscr;
> + unsigned long   tm_amr;
>  
>   /*
>* Checkpointed FP and VSX 0-31 register set.
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index 8711c2164b45..c2722ff36e98 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -176,6 +176,7 @@ int main(void)
>   OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
>   OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
>   OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
> + OFFSET(THREAD_TM_AMR, thread_struct, tm_amr);
>   OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
>   OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
>   OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
> diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
> index 6ba0fdd1e7f8..2b91f233b05d 100644
> --- a/arch/powerpc/kernel/tm.S
> +++ b/arch/powerpc/kernel/tm.S
> @@ -122,6 +122,13 @@ _GLOBAL(tm_reclaim)
>   std r3, STK_PARAM(R3)(r1)
>   SAVE_NVGPRS(r1)
>  
> + /*
> +  * Save kernel live AMR since it will be clobbered by treclaim
> +  * but can be used elsewhere later in kernel space.
> +  */
> + mfspr   r3, SPRN_AMR
> + std r3, TM_FRAME_L1(r1)
> +
>   /* We need to setup MSR for VSX register save instructions. */
>   mfmsr   r14
>   mr  r15, r14
> @@ -245,7 +252,7 @@ _GLOBAL(tm_reclaim)
>* but is used in signal return to 'wind back' to the abort handler.
>*/
>  
> - /*  CR,LR,CCR,MSR ** */
> + /* * CTR, LR, CR, XER ** */
>   mfctr   r3
>   mflr    r4
>   mfcr    r5
> @@ -256,7 +263,6 @@ _GLOBAL(tm_reclaim)
>   std r5, _CCR(r7)
>   std r6, _XER(r7)
>  
> -
>   /*  TAR, DSCR ** */
>   mfspr   r3, SPRN_TAR
>   mfspr   r4, SPRN_DSCR
> @@ -264,6 +270,10 @@ _GLOBAL(tm_reclaim)
>   std r3, THREAD_TM_TAR(r12)
>   std r4, THREAD_TM_DSCR(r12)
>  
> + /*  AMR  */
> + mfspr   r3, SPRN_AMR
> + std r3, THREAD_TM_AMR(r12)
> +
>   /*
>* MSR and flags: We don't change CRs, and we don't need to alter MSR.
>*/
> @@ -308,7 +318,9 @@ _GLOBAL(tm_reclaim)
>   std r3, THREAD_TM_TFHAR(r12)
>   std r4, THREAD_TM_TFIAR(r12)
>  
> - /* AMR is checkpointed too, but is unsupported by Linux. */
> + /* Restore kernel live AMR */
> + ld  r8, TM_FRAME_L1(r1)
> + mtspr   SPRN_AMR, r8
>  
>   /* Restore original MSR/IRQ state & clear TM mode */
>   ld  r14, TM_FRAME_L0(r1)   /* Orig MSR */
> @@ -355,6 +367,13 @@ _GLOBAL(__tm_recheckpoint)
>*/
>   SAVE_NVGPRS(r1)
>  
> + /*
> +  * Save kernel live AMR since it will be clobbered for trechkpt
> +  * but can be used elsewhere later in kernel space.
> +  */
> + mfspr   r8, SPRN_AMR
> + std r8, TM_FRAME_L0(r1)
> +
>   /* Load complete register state from ts_ckpt* registers */
>  
>   addi    r7, r3, PT_CKPT_REGS    /* Thread's ckpt_regs */
> @@ -404,7 +423,7 @@ _GLOBAL(__tm_recheckpoint)
>  
>  restore_gprs:
>  
> - /*  CR,LR,CCR,MSR ** */
> + /* ** CTR, LR, XER * */
>   ld  r4, _CTR(r7)
>   ld  r5, _LINK(r7)
>   ld  r8, _XER(r7)
> @@ -417,6 +436,10 @@ restore_gprs:
>   ld  r4, THREAD_TM_TAR(r3)
>   mtspr   SPRN_TAR,   r4
>  
> + /* *

Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Al Viro
On Sat, Sep 19, 2020 at 05:14:41PM -0700, Andy Lutomirski wrote:

> > 2) have you counted the syscalls that do and do not need that?
> 
> No.

Might be illuminating...

> > 3) how many of those realistically *can* be unified with their
> > compat counterparts?  [hint: ioctl(2) cannot]
> 
> There would be no requirement to unify anything.  The idea is that
> we'd get rid of all the global state flags.

_What_ global state flags?  When you have separate SYSCALL_DEFINE3(ioctl...)
and COMPAT_SYSCALL_DEFINE3(ioctl...), there's no flags at all, global or
local.  They only come into play when you try to share the same function
for both, right at the top level.
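
For reference, the split entry points look roughly like this (a sketch of
the existing fs/ioctl.c definitions, bodies elided):

SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
		unsigned long, arg)
{
	/* native path - no compat checks needed anywhere */
	...
}

COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
		       compat_ulong_t, arg)
{
	/* compat path - 32bit fixups happen here, at the top level */
	...
}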

> For ioctl, we'd have a new file_operation:
> 
> long ioctl(struct file *, unsigned int, unsigned long, enum syscall_arch);
> 
> I'm not saying this is easy, but I think it's possible and the result
> would be more obviously correct than what we have now.

No, it would not.  Seriously, from time to time a bit of RTFS before grand
proposals turns out to be useful.


[PATCH v1 2/2] usb: dwc2: add support for APM82181 USB OTG

2020-09-19 Thread Christian Lamparter
Adds the specific compatible string for the DWC2 IP found in the APM82181
SoCs. The IP is set up correctly through auto detection, with the
exception of the AHB burst size: the GAHBCFG_HBSTLEN_INCR4 default of the
generic "snps,dwc2" can cause a system hang when USB and SATA are used
concurrently. Because the predecessor (the PPC460EX (Canyonlands)) already
had the same problem, this SoC can make use of the existing
dwc2_set_amcc_params() function.
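
For reference, the existing helper simply overrides the AHB burst length;
roughly like this (sketch from memory, see drivers/usb/dwc2/params.c for
the authoritative version):

static void dwc2_set_amcc_params(struct dwc2_hsotg *hsotg)
{
	struct dwc2_core_params *p = &hsotg->params;

	/* Use INCR16 bursts instead of the INCR4 default */
	p->ahbcfg = GAHBCFG_HBSTLEN_INCR16 << GAHBCFG_HBSTLEN_SHIFT;
}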

Signed-off-by: Christian Lamparter 
---
 drivers/usb/dwc2/params.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/dwc2/params.c b/drivers/usb/dwc2/params.c
index 8f9d061c4d5f..6d2b9a6c247c 100644
--- a/drivers/usb/dwc2/params.c
+++ b/drivers/usb/dwc2/params.c
@@ -210,6 +210,7 @@ const struct of_device_id dwc2_of_match_table[] = {
{ .compatible = "amlogic,meson-g12a-usb",
  .data = dwc2_set_amlogic_g12a_params },
{ .compatible = "amcc,dwc-otg", .data = dwc2_set_amcc_params },
+   { .compatible = "apm,apm82181-dwc-otg", .data = dwc2_set_amcc_params },
{ .compatible = "st,stm32f4x9-fsotg",
  .data = dwc2_set_stm32f4x9_fsotg_params },
{ .compatible = "st,stm32f4x9-hsotg" },
-- 
2.28.0



[PATCH v1 1/2] dt-bindings: usb: dwc2: add support for APM82181 SoCs USB OTG HS and FS

2020-09-19 Thread Christian Lamparter
Adds the specific compatible string for the DWC2 IP found in the APM82181
SoCs. The APM82181's USB-OTG seems like it was taken from its direct
predecessor: the PPC460EX (Canyonlands).

Signed-off-by: Christian Lamparter 
---
 Documentation/devicetree/bindings/usb/dwc2.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/usb/dwc2.yaml 
b/Documentation/devicetree/bindings/usb/dwc2.yaml
index ffa157a0fce7..34ddb5c877a1 100644
--- a/Documentation/devicetree/bindings/usb/dwc2.yaml
+++ b/Documentation/devicetree/bindings/usb/dwc2.yaml
@@ -39,6 +39,7 @@ properties:
   - amlogic,meson-g12a-usb
   - const: snps,dwc2
   - const: amcc,dwc-otg
+  - const: apm,apm82181-dwc-otg
   - const: snps,dwc2
   - const: st,stm32f4x9-fsotg
   - const: st,stm32f4x9-hsotg
-- 
2.28.0



Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Andy Lutomirski
On Sat, Sep 19, 2020 at 4:24 PM Al Viro  wrote:
>
> On Sat, Sep 19, 2020 at 03:53:40PM -0700, Andy Lutomirski wrote:
>
> > > It would not be a win - most of the syscalls don't give a damn
> > > about 32bit vs. 64bit...
> >
> > Any reasonable implementation would optimize it out for syscalls that don’t 
> > care.  Or it could be explicit:
> >
> > DEFINE_MULTIARCH_SYSCALL(...)
>
> 1) what would that look like?

In effect, it would work like this:

/* Arch-specific, but there's a generic case for sane architectures. */
enum syscall_arch {
  SYSCALL_NATIVE,
  SYSCALL_COMPAT,
  SYSCALL_X32,
};

DEFINE_MULTIARCH_SYSCALLn(args, arch)
{
  args are the args here, and arch is the arch.
}
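
A concrete instance might then look like this (purely hypothetical
example; the syscall name and helper are invented for illustration):

DEFINE_MULTIARCH_SYSCALL3(example_setopts, unsigned int, fd,
			  void __user *, optval, unsigned int, optlen, arch)
{
	if (arch == SYSCALL_COMPAT)
		return example_setopts_compat(fd, optval, optlen);
	/* native layout handling here */
	...
}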

> 2) have you counted the syscalls that do and do not need that?

No.

> 3) how many of those realistically *can* be unified with their
> compat counterparts?  [hint: ioctl(2) cannot]

There would be no requirement to unify anything.  The idea is that
we'd get rid of all the global state flags.

For ioctl, we'd have a new file_operation:

long ioctl(struct file *, unsigned int, unsigned long, enum syscall_arch);

I'm not saying this is easy, but I think it's possible and the result
would be more obviously correct than what we have now.


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Al Viro
On Sat, Sep 19, 2020 at 03:53:40PM -0700, Andy Lutomirski wrote:

> > It would not be a win - most of the syscalls don't give a damn
> > about 32bit vs. 64bit...
> 
> Any reasonable implementation would optimize it out for syscalls that don’t 
> care.  Or it could be explicit:
> 
> DEFINE_MULTIARCH_SYSCALL(...)

1) what would that look like?
2) have you counted the syscalls that do and do not need that?
3) how many of those realistically *can* be unified with their
compat counterparts?  [hint: ioctl(2) cannot]


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Andy Lutomirski



> On Sep 19, 2020, at 3:41 PM, Al Viro  wrote:
> 
> On Sat, Sep 19, 2020 at 03:23:54PM -0700, Andy Lutomirski wrote:
>> 
>>> On Sep 19, 2020, at 3:09 PM, Al Viro  wrote:
>>> 
>>> On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>>> Said that, why not provide a variant that would take an explicit
>>>>> "is it compat" argument and use it there?  And have the normal
>>>>> one pass in_compat_syscall() to that...
>>>> 
>>>> That would help to not introduce a regression with this series yes.
>>>> But it wouldn't fix existing bugs when io_uring is used to access
>>>> read or write methods that use in_compat_syscall().  One example that
>>>> I recently ran into is drivers/scsi/sg.c.
>>> 
>>> So screw such read/write methods - don't use them with io_uring.
>>> That, BTW, is one of the reasons I'm sceptical about burying the
>>> decisions deep into the callchain - we don't _want_ different
>>> data layouts on read/write depending upon the 32bit vs. 64bit
>>> caller, let alone the pointer-chasing garbage that is /dev/sg.
>> 
>> Well, we could remove in_compat_syscall(), etc and instead have an implicit
>> parameter in DEFINE_SYSCALL.  Then everything would have to be explicit.
>> This would probably be a win, although it could be quite a bit of work.
> 
> It would not be a win - most of the syscalls don't give a damn
> about 32bit vs. 64bit...

Any reasonable implementation would optimize it out for syscalls that don’t 
care.  Or it could be explicit:

DEFINE_MULTIARCH_SYSCALL(...)

Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Al Viro
On Sat, Sep 19, 2020 at 03:23:54PM -0700, Andy Lutomirski wrote:
> 
> > On Sep 19, 2020, at 3:09 PM, Al Viro  wrote:
> > 
> > On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
> >>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>> Said that, why not provide a variant that would take an explicit
> >>> "is it compat" argument and use it there?  And have the normal
> >>> one pass in_compat_syscall() to that...
> >> 
> >> That would help to not introduce a regression with this series yes.
> >> But it wouldn't fix existing bugs when io_uring is used to access
> >> read or write methods that use in_compat_syscall().  One example that
> >> I recently ran into is drivers/scsi/sg.c.
> > 
> > So screw such read/write methods - don't use them with io_uring.
> > That, BTW, is one of the reasons I'm sceptical about burying the
> > decisions deep into the callchain - we don't _want_ different
> > data layouts on read/write depending upon the 32bit vs. 64bit
> > caller, let alone the pointer-chasing garbage that is /dev/sg.
> 
> Well, we could remove in_compat_syscall(), etc and instead have an implicit 
> parameter in DEFINE_SYSCALL.  Then everything would have to be explicit.  
> This would probably be a win, although it could be quite a bit of work.

It would not be a win - most of the syscalls don't give a damn
about 32bit vs. 64bit...


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Andy Lutomirski


> On Sep 19, 2020, at 3:09 PM, Al Viro  wrote:
> 
> On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>> Said that, why not provide a variant that would take an explicit
>>> "is it compat" argument and use it there?  And have the normal
>>> one pass in_compat_syscall() to that...
>> 
>> That would help to not introduce a regression with this series yes.
>> But it wouldn't fix existing bugs when io_uring is used to access
>> read or write methods that use in_compat_syscall().  One example that
>> I recently ran into is drivers/scsi/sg.c.
> 
> So screw such read/write methods - don't use them with io_uring.
> That, BTW, is one of the reasons I'm sceptical about burying the
> decisions deep into the callchain - we don't _want_ different
> data layouts on read/write depending upon the 32bit vs. 64bit
> caller, let alone the pointer-chasing garbage that is /dev/sg.

Well, we could remove in_compat_syscall(), etc and instead have an implicit 
parameter in DEFINE_SYSCALL.  Then everything would have to be explicit.  This 
would probably be a win, although it could be quite a bit of work.

Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Andy Lutomirski


> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann  wrote:
> 
> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski  wrote:
>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig  wrote:
>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
 Said that, why not provide a variant that would take an explicit
 "is it compat" argument and use it there?  And have the normal
 one pass in_compat_syscall() to that...
>>> 
>>> That would help to not introduce a regression with this series yes.
>>> But it wouldn't fix existing bugs when io_uring is used to access
>>> read or write methods that use in_compat_syscall().  One example that
>>> I recently ran into is drivers/scsi/sg.c.
> 
> Ah, so reading /dev/input/event* would suffer from the same issue,
> and that one would in fact be broken by your patch in the hypothetical
> case that someone tried to use io_uring to read /dev/input/event on x32...
> 
> For reference, I checked the socket timestamp handling that has a
> number of corner cases with time32/time64 formats in compat mode,
> but none of those appear to be affected by the problem.
> 
>> Aside from the potentially nasty use of per-task variables, one thing
>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>> going to have a generic mechanism for this, shouldn't we allow a full
>> override of the syscall arch instead of just allowing forcing compat
>> so that a compat syscall can do a non-compat operation?
> 
> The only reason it's needed here is that the caller is in a kernel
> thread rather than a system call. Are there any possible scenarios
> where one would actually need the opposite?
> 

I can certainly imagine needing to force x32 mode from a kernel thread.

As for the other direction: what exactly are the desired bitness/arch semantics 
of io_uring?  Is the operation bitness chosen by the io_uring creation or by 
the io_uring_enter() bitness?

Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Al Viro
On Fri, Sep 18, 2020 at 05:16:15PM +0200, Christoph Hellwig wrote:
> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > Said that, why not provide a variant that would take an explicit
> > "is it compat" argument and use it there?  And have the normal
> > one pass in_compat_syscall() to that...
> 
> That would help to not introduce a regression with this series yes.
> But it wouldn't fix existing bugs when io_uring is used to access
> read or write methods that use in_compat_syscall().  One example that
> I recently ran into is drivers/scsi/sg.c.

So screw such read/write methods - don't use them with io_uring.
That, BTW, is one of the reasons I'm sceptical about burying the
decisions deep into the callchain - we don't _want_ different
data layouts on read/write depending upon the 32bit vs. 64bit
caller, let alone the pointer-chasing garbage that is /dev/sg.


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Finn Thain
On Sat, 19 Sep 2020, Arnd Bergmann wrote:

> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski  wrote:
> > On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig  wrote:
> > > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > > > Said that, why not provide a variant that would take an explicit 
> > > > "is it compat" argument and use it there?  And have the normal one 
> > > > pass in_compat_syscall() to that...
> > >
> > > That would help to not introduce a regression with this series yes. 
> > > But it wouldn't fix existing bugs when io_uring is used to access 
> > > read or write methods that use in_compat_syscall().  One example 
> > > that I recently ran into is drivers/scsi/sg.c.
> 
> Ah, so reading /dev/input/event* would suffer from the same issue, and 
> that one would in fact be broken by your patch in the hypothetical case 
> that someone tried to use io_uring to read /dev/input/event on x32...
> 
> For reference, I checked the socket timestamp handling that has a number 
> of corner cases with time32/time64 formats in compat mode, but none of 
> those appear to be affected by the problem.
> 
> > Aside from the potentially nasty use of per-task variables, one thing 
> > I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're 
> > going to have a generic mechanism for this, shouldn't we allow a full 
> > override of the syscall arch instead of just allowing forcing compat 
> > so that a compat syscall can do a non-compat operation?
> 
> The only reason it's needed here is that the caller is in a kernel 
> thread rather than a system call. Are there any possible scenarios where 
> one would actually need the opposite?
> 

Quite possibly. The ext4 vs. compat getdents bug is still unresolved. 
Please see,
https://lore.kernel.org/lkml/cafeaca9w+jk7_trttnl1p2es1knnpjx9wcuvhflwxlq9aug...@mail.gmail.com/

>Arnd
> 


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Arnd Bergmann
On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski  wrote:
> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig  wrote:
> > On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > > Said that, why not provide a variant that would take an explicit
> > > "is it compat" argument and use it there?  And have the normal
> > > one pass in_compat_syscall() to that...
> >
> > That would help to not introduce a regression with this series yes.
> > But it wouldn't fix existing bugs when io_uring is used to access
> > read or write methods that use in_compat_syscall().  One example that
> > I recently ran into is drivers/scsi/sg.c.

Ah, so reading /dev/input/event* would suffer from the same issue,
and that one would in fact be broken by your patch in the hypothetical
case that someone tried to use io_uring to read /dev/input/event on x32...

For reference, I checked the socket timestamp handling that has a
number of corner cases with time32/time64 formats in compat mode,
but none of those appear to be affected by the problem.

> Aside from the potentially nasty use of per-task variables, one thing
> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> going to have a generic mechanism for this, shouldn't we allow a full
> override of the syscall arch instead of just allowing forcing compat
> so that a compat syscall can do a non-compat operation?

The only reason it's needed here is that the caller is in a kernel
thread rather than a system call. Are there any possible scenarios
where one would actually need the opposite?

   Arnd


Re: [PATCH v3 2/5] powerpc: apm82181: create shared dtsi for APM bluestone

2020-09-19 Thread Christian Lamparter

On 2020-09-15 03:05, Rob Herring wrote:

On Sun, Sep 06, 2020 at 12:06:12AM +0200, Christian Lamparter wrote:

This patch adds an DTSI-File that can be used by various device-tree
files for APM82181-based devices.

Some of the nodes (like UART, PCIE, SATA) are used by the uboot and
need to stick with the naming-conventions of the old times'.
I've added comments whenever this was the case.

Signed-off-by: Chris Blake 
Signed-off-by: Christian Lamparter 
---
rfc v1 -> v2:
- removed PKA (this CryptoPU will need driver)
- stick with compatibles, nodes, ... from either
  Bluestone (APM82181) or Canyonlands (PPC460EX).
- add labels for NAND and NOR to help with access.
v2 -> v3:
- nodename of pciex@d was changed to pcie@d..
  due to upstream patch.
- use simple-bus on the ebc, opb and plb nodes
---
  arch/powerpc/boot/dts/apm82181.dtsi | 466 
  1 file changed, 466 insertions(+)
  create mode 100644 arch/powerpc/boot/dts/apm82181.dtsi

diff --git a/arch/powerpc/boot/dts/apm82181.dtsi 
b/arch/powerpc/boot/dts/apm82181.dtsi
new file mode 100644
index ..60283430978d
--- /dev/null
+++ b/arch/powerpc/boot/dts/apm82181.dtsi
@@ -0,0 +1,466 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Device Tree template include for various APM82181 boards.
+ *
+ * The SoC is an evolution of the PPC460EX predecessor.
+ * This is why dt-nodes from the canyonlands EBC, OPB, USB,
+ * DMA, SATA, EMAC, ... ended up in here.
+ *
+ * Copyright (c) 2010, Applied Micro Circuits Corporation
+ * Author: Tirumala R Marri ,
+ *Christian Lamparter ,
+ *Chris Blake 
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+/ {
+   #address-cells = <2>;
+   #size-cells = <1>;
+   dcr-parent = <&{/cpus/cpu@0}>;
+
+   aliases {
+   ethernet0 = &EMAC0; /* needed for BSP u-boot */
+   };
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   CPU0: cpu@0 {
+   device_type = "cpu";
+   model = "PowerPC,apm82181";


This doesn't match the existing bluestone dts file.

Please separate any restructuring from changes.



"I see" (I'm including your comment of the dt-binding as well).

I'm getting the vibe that I better should not touch that bluestone.dts.
An honestly, looking at the series and patches that the APM-engineers
posted back in the day, I can see why this well is so poisoned... and
stuff like SATA/AHBDMA/USB/GPIO/CPM/... was missing.

As for the devices. In the spirit of Arnd Bergmann's post of


|It would be nice to move over the bluestone .dts to the apm82181.dtsi file
|when that gets added, if only to ensure they use the same description for
|each node, but that shouldn't stop the addition of the new file if that is
|needed for distros to make use of a popular device.
|I see a couple of additional files in openwrt.

I mean I don't have the bluestone dev board, just the consumer devices.
Would it be possible to support those? I can start from a "skeleton"
apm82181.dtsi. This would just include CPU, memory (SD-RAM+L2C+OCM),
UIC (interrupt controller), the PLB+OPB+EBC buses and UART. Just enough
to make a board "boot from RAM".

And then add nodes for PCIE+MSI, AHBDMA+SATA, I2C, Ethernet, NAND+NOR
and finally the Crypto, each in separate patches.


Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Linus Torvalds
On Sat, Sep 19, 2020 at 10:39 AM Matthew Wilcox  wrote:
>
> My concern with that is people might use kmap() and then pass the address
> to a different task.  So we need to audit the current users of kmap()
> and convert any that do that into using vmap() instead.

Ahh. Yes, I guess they might do that. It sounds strange, but not
entirely crazy - I could imagine some "PIO thread" that does IO to a
page that has been set up by somebody else using kmap(). Or similar.

Linus


Re: [PATCH AUTOSEL 5.4 101/330] powerpc/powernv/ioda: Fix ref count for devices with their own PE

2020-09-19 Thread Sasha Levin

On Fri, Sep 18, 2020 at 08:35:06AM +0200, Frederic Barrat wrote:



On 18/09/2020 at 03:57, Sasha Levin wrote:

From: Frederic Barrat 

[ Upstream commit 05dd7da76986937fb288b4213b1fa10dbe0d1b33 ]



This patch is not desirable for stable, for 5.4 and 4.19 (it was 
already flagged by autosel back in April. Not sure why it's showing 
again now)


Hey Fred,

This was a bit of a "lie": it wasn't a run of AUTOSEL, but rather an
audit of patches that went into distro/vendor trees but not into the
upstream stable trees.

I can see that this patch was pulled into Ubuntu's 5.4 tree, is it not
needed in the upstream stable tree?

--
Thanks,
Sasha


Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Matthew Wilcox
On Sat, Sep 19, 2020 at 10:18:54AM -0700, Linus Torvalds wrote:
> On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner  wrote:
> >
> > this provides a preemptible variant of kmap_atomic & related
> > interfaces. This is achieved by:
> 
> Ack. This looks really nice, even apart from the new capability.
> 
> The only thing I really reacted to is that the name doesn't make sense
> to me: "kmap_temporary()" seems a bit odd.
> 
> Particularly for an interface that really is basically meant as a
> better replacement of "kmap_atomic()" (but is perhaps also a better
> replacement for "kmap()").
> 
> I think I understand how the name came about: I think the "temporary"
> is there as a distinction from the "longterm" regular kmap(). So I
> think it makes some sense from an internal implementation angle, but I
> don't think it makes a lot of sense from an interface name.
> 
> I don't know what might be a better name, but if we want to emphasize
> that it's thread-private and a one-off, maybe "local" would be a
> better naming, and make it distinct from the "global" nature of the
> old kmap() interface?
> 
> However, another solution might be to just use this new preemptible
> "local" kmap(), and remove the old global one entirely. Yes, the old
> global one caches the page table mapping and that sounds really
> efficient and nice. But it's actually horribly horribly bad, because
> it means that we need to use locking for them. Your new "temporary"
> implementation seems to be fundamentally better locking-wise, and only
> need preemption disabling as locking (and is equally fast for the
> non-highmem case).
> 
> So I wonder if the single-page TLB flush isn't a better model, and
> whether it wouldn't be a lot simpler to just get rid of the old
> complex kmap() entirely, and replace it with this?
> 
> I agree we can't replace the kmap_atomic() version, because maybe
> people depend on the preemption disabling it also implied. But what
> about replacing the non-atomic kmap()?

My concern with that is people might use kmap() and then pass the address
to a different task.  So we need to audit the current users of kmap()
and convert any that do that into using vmap() instead.

I like kmap_local().  Or kmap_thread().


[PATCH] PCI: Convert enum pci_bus_flags to bit fields in struct pci_bus

2020-09-19 Thread Krzysztof Wilczyński
All the flags defined in the enum pci_bus_flags are used to determine
whether a particular feature of a PCI bus is available or not - features
are also often disabled via an architecture- or device-specific quirk.

These flags are tightly coupled with PCI buses and PCI bridges and are
primarily used in a simple binary on/off manner to check whether something
is enabled or disabled. They have almost no other users (with the
exception of the x86 architecture-specific quirk) outside of the PCI
device driver space.

Therefore, convert enum pci_bus_flags into a set of bit fields in
struct pci_bus, and then drop said enum and the typedef pci_bus_flags_t.

This will keep PCI device-specific features as part of struct pci_dev
and make the code that used to use the flags simpler.
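
The net effect on struct pci_bus would be roughly this (sketch; the bit
names follow the existing flags, see the diff below):

struct pci_bus {
	...
	unsigned int	no_msi:1;	/* was PCI_BUS_FLAGS_NO_MSI */
	unsigned int	no_aer_sid:1;	/* was PCI_BUS_FLAGS_NO_AERSID */
	...
};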

Related:
  https://patchwork.kernel.org/patch/11772809

Suggested-by: Bjorn Helgaas 
Signed-off-by: Krzysztof Wilczyński 
---
 arch/x86/pci/fixup.c|  6 +++---
 drivers/pci/msi.c   |  8 
 drivers/pci/pci-sysfs.c | 14 ++
 drivers/pci/pci.c   |  2 +-
 drivers/pci/pcie/aer.c  |  5 ++---
 drivers/pci/probe.c | 13 +
 drivers/pci/quirks.c| 16 
 include/linux/pci.h | 20 ++--
 8 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index b8c9a5b87f37..50ff8aa542b8 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -641,14 +641,14 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x8c10, quirk_apple_mbp_poweroff);
  * ID, the AER driver should traverse the child device tree, reading
  * AER registers to find the faulting device.
  */
-static void quirk_no_aersid(struct pci_dev *pdev)
+static void quirk_no_aer_sid(struct pci_dev *pdev)
 {
/* VMD Domain */
if (is_vmd(pdev->bus) && pci_is_root_bus(pdev->bus))
-   pdev->bus->bus_flags |= PCI_BUS_FLAGS_NO_AERSID;
+   pdev->bus->no_aer_sid = 1;
 }
 DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
- PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aersid);
+ PCI_CLASS_BRIDGE_PCI, 8, quirk_no_aer_sid);
 
 static void quirk_intel_th_dnv(struct pci_dev *dev)
 {
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 30ae4ffda5c1..01e4bdbc830e 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -875,13 +875,13 @@ static int pci_msi_supported(struct pci_dev *dev, int nvec)
 
/*
 * Any bridge which does NOT route MSI transactions from its
-* secondary bus to its primary bus must set NO_MSI flag on
-* the secondary pci_bus.
+* secondary bus to its primary bus must enable "no_msi" on
+* the secondary bus (pci_bus).
 * We expect only arch-specific PCI host bus controller driver
-* or quirks for specific PCI bridges to be setting NO_MSI.
+* or quirks for specific PCI bridges to enable "no_msi".
 */
for (bus = dev->bus; bus; bus = bus->parent)
-   if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
+   if (bus->no_msi)
return 0;
 
return 1;
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6d78df981d41..eca214e45418 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -357,9 +357,7 @@ static ssize_t msi_bus_show(struct device *dev, struct device_attribute *attr,
struct pci_dev *pdev = to_pci_dev(dev);
struct pci_bus *subordinate = pdev->subordinate;
 
-   return sprintf(buf, "%u\n", subordinate ?
-  !(subordinate->bus_flags & PCI_BUS_FLAGS_NO_MSI)
-  : !pdev->no_msi);
+   return sprintf(buf, "%u\n", subordinate ? !subordinate->no_msi : !pdev->no_msi);
 }
 
 static ssize_t msi_bus_store(struct device *dev, struct device_attribute *attr,
@@ -376,9 +374,9 @@ static ssize_t msi_bus_store(struct device *dev, struct device_attribute *attr,
return -EPERM;
 
/*
-* "no_msi" and "bus_flags" only affect what happens when a driver
-* requests MSI or MSI-X.  They don't affect any drivers that have
-* already requested MSI or MSI-X.
+* "no_msi" enabled for device and bus only affect what happens
+* when a driver requests MSI or MSI-X.  They don't affect any
+* drivers that have already requested MSI or MSI-X.
 */
if (!subordinate) {
pdev->no_msi = !val;
@@ -388,9 +386,9 @@ static ssize_t msi_bus_store(struct device *dev, struct device_attribute *attr,
}
 
if (val)
-   subordinate->bus_flags &= ~PCI_BUS_FLAGS_NO_MSI;
+   subordinate->no_msi = 0;
else
-   subordinate->bus_flags |= PCI_BUS_FLAGS_NO_MSI;
+   subordinate->no_msi = 1;
 
dev_info(&subordinate->dev, "MSI/MSI-X %s for future drivers of devices on this bus\n",
 val ? "allowed" : "disallowed

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Linus Torvalds
On Sat, Sep 19, 2020 at 2:50 AM Thomas Gleixner  wrote:
>
> this provides a preemptible variant of kmap_atomic & related
> interfaces. This is achieved by:

Ack. This looks really nice, even apart from the new capability.

The only thing I really reacted to is that the name doesn't make sense
to me: "kmap_temporary()" seems a bit odd.

Particularly for an interface that really is basically meant as a
better replacement of "kmap_atomic()" (but is perhaps also a better
replacement for "kmap()").

I think I understand how the name came about: I think the "temporary"
is there as a distinction from the "longterm" regular kmap(). So I
think it makes some sense from an internal implementation angle, but I
don't think it makes a lot of sense from an interface name.

I don't know what might be a better name, but if we want to emphasize
that it's thread-private and a one-off, maybe "local" would be a
better naming, and make it distinct from the "global" nature of the
old kmap() interface?

However, another solution might be to just use this new preemptible
"local" kmap(), and remove the old global one entirely. Yes, the old
global one caches the page table mapping and that sounds really
efficient and nice. But it's actually horribly horribly bad, because
it means that we need to use locking for them. Your new "temporary"
implementation seems to be fundamentally better locking-wise, and only
need preemption disabling as locking (and is equally fast for the
non-highmem case).

So I wonder if the single-page TLB flush isn't a better model, and
whether it wouldn't be a lot simpler to just get rid of the old
complex kmap() entirely, and replace it with this?

I agree we can't replace the kmap_atomic() version, because maybe
people depend on the preemption disabling it also implied. But what
about replacing the non-atomic kmap()?

Maybe I've missed something.  Is it because the new interface still
does "pagefault_disable()" perhaps?

But does it even need the pagefault_disable() at all? Yes, the
*atomic* one obviously needed it. But why does this new one need to
disable page faults?

[ I'm just reading the patches, I didn't try to apply them and look at
the end result, so I might have missed something ]

The only other worry I would have is just test coverage of this
change. I suspect very few developers use HIGHMEM. And I know the
various test robots don't tend to test 32-bit either.

But apart from that question about naming (and perhaps replacing
kmap() entirely), I very much like it.

Linus


Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread Andy Lutomirski
On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig  wrote:
>
> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> > Said that, why not provide a variant that would take an explicit
> > "is it compat" argument and use it there?  And have the normal
> > one pass in_compat_syscall() to that...
>
> That would help to not introduce a regression with this series yes.
> But it wouldn't fix existing bugs when io_uring is used to access
> read or write methods that use in_compat_syscall().  One example that
> I recently ran into is drivers/scsi/sg.c.

Aside from the potentially nasty use of per-task variables, one thing
I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
going to have a generic mechanism for this, shouldn't we allow a full
override of the syscall arch instead of just allowing forcing compat
so that a compat syscall can do a non-compat operation?


[PATCH v2] powerpc/tm: Save and restore AMR on treclaim and trechkpt

2020-09-19 Thread Gustavo Romero
Although AMR is stashed in the checkpoint area, currently we don't save
it to the per thread checkpoint struct after a treclaim and so we don't
restore it either from that struct when we trechkpt. As a consequence when
the transaction is later rolled back the kernel space AMR value when the
trechkpt was done appears in userspace.

This commit saves and restores AMR accordingly on treclaim and trechkpt.
Since the AMR value is also used in kernel space in other functions, it also
takes care of stashing the kernel live AMR into the stack before treclaim and
before trechkpt, restoring it later, just before returning from tm_reclaim
and __tm_recheckpoint.

It also fixes two unrelated comments about CR and MSR.

Signed-off-by: Gustavo Romero 
---
 arch/powerpc/include/asm/processor.h |  1 +
 arch/powerpc/kernel/asm-offsets.c|  1 +
 arch/powerpc/kernel/tm.S | 35 
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index ed0d633ab5aa..9f4f6cc033ac 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -220,6 +220,7 @@ struct thread_struct {
unsigned long   tm_tar;
unsigned long   tm_ppr;
unsigned long   tm_dscr;
+   unsigned long   tm_amr;
 
/*
 * Checkpointed FP and VSX 0-31 register set.
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..c2722ff36e98 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -176,6 +176,7 @@ int main(void)
OFFSET(THREAD_TM_TAR, thread_struct, tm_tar);
OFFSET(THREAD_TM_PPR, thread_struct, tm_ppr);
OFFSET(THREAD_TM_DSCR, thread_struct, tm_dscr);
+   OFFSET(THREAD_TM_AMR, thread_struct, tm_amr);
OFFSET(PT_CKPT_REGS, thread_struct, ckpt_regs);
OFFSET(THREAD_CKVRSTATE, thread_struct, ckvr_state.vr);
OFFSET(THREAD_CKVRSAVE, thread_struct, ckvrsave);
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index 6ba0fdd1e7f8..2b91f233b05d 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -122,6 +122,13 @@ _GLOBAL(tm_reclaim)
std r3, STK_PARAM(R3)(r1)
SAVE_NVGPRS(r1)
 
+   /*
+* Save kernel live AMR since it will be clobbered by treclaim
+* but can be used elsewhere later in kernel space.
+*/
+   mfspr   r3, SPRN_AMR
+   std r3, TM_FRAME_L1(r1)
+
/* We need to setup MSR for VSX register save instructions. */
mfmsr   r14
mr  r15, r14
@@ -245,7 +252,7 @@ _GLOBAL(tm_reclaim)
 * but is used in signal return to 'wind back' to the abort handler.
 */
 
-   /*  CR,LR,CCR,MSR ** */
+   /* * CTR, LR, CR, XER ** */
mfctr   r3
	mflr    r4
	mfcr    r5
@@ -256,7 +263,6 @@ _GLOBAL(tm_reclaim)
std r5, _CCR(r7)
std r6, _XER(r7)
 
-
/*  TAR, DSCR ** */
mfspr   r3, SPRN_TAR
mfspr   r4, SPRN_DSCR
@@ -264,6 +270,10 @@ _GLOBAL(tm_reclaim)
std r3, THREAD_TM_TAR(r12)
std r4, THREAD_TM_DSCR(r12)
 
+	/*  AMR  */
+	mfspr   r3, SPRN_AMR
+	std r3, THREAD_TM_AMR(r12)
+
/*
 * MSR and flags: We don't change CRs, and we don't need to alter MSR.
 */
@@ -308,7 +318,9 @@ _GLOBAL(tm_reclaim)
std r3, THREAD_TM_TFHAR(r12)
std r4, THREAD_TM_TFIAR(r12)
 
-   /* AMR is checkpointed too, but is unsupported by Linux. */
+   /* Restore kernel live AMR */
+   ld  r8, TM_FRAME_L1(r1)
+   mtspr   SPRN_AMR, r8
 
/* Restore original MSR/IRQ state & clear TM mode */
	ld  r14, TM_FRAME_L0(r1)    /* Orig MSR */
@@ -355,6 +367,13 @@ _GLOBAL(__tm_recheckpoint)
 */
SAVE_NVGPRS(r1)
 
+   /*
+* Save kernel live AMR since it will be clobbered for trechkpt
+* but can be used elsewhere later in kernel space.
+*/
+   mfspr   r8, SPRN_AMR
+   std r8, TM_FRAME_L0(r1)
+
/* Load complete register state from ts_ckpt* registers */
 
	addi    r7, r3, PT_CKPT_REGS    /* Thread's ckpt_regs */
@@ -404,7 +423,7 @@ _GLOBAL(__tm_recheckpoint)
 
 restore_gprs:
 
-   /*  CR,LR,CCR,MSR ** */
+   /* ** CTR, LR, XER * */
ld  r4, _CTR(r7)
ld  r5, _LINK(r7)
ld  r8, _XER(r7)
@@ -417,6 +436,10 @@ restore_gprs:
ld  r4, THREAD_TM_TAR(r3)
mtspr   SPRN_TAR,   r4
 
+   /*  AMR  */
+   ld  r4, THREAD_TM_AMR(r3)
+   mtspr   SPRN_AMR, r4
+
/* Load up the PPR and DSCR in GPRs only at this stage */
ld 

RE: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag

2020-09-19 Thread David Laight
From: Al Viro
> Sent: 18 September 2020 14:58
> 
> On Fri, Sep 18, 2020 at 03:44:06PM +0200, Christoph Hellwig wrote:
> > On Fri, Sep 18, 2020 at 02:40:12PM +0100, Al Viro wrote:
> > > > /* Vector 0x110 is LINUX_32BIT_SYSCALL_TRAP */
> > > > -   return pt_regs_trap_type(current_pt_regs()) == 0x110;
> > > > +   return pt_regs_trap_type(current_pt_regs()) == 0x110 ||
> > > > +   (current->flags & PF_FORCE_COMPAT);
> > >
> > > Can't say I like that approach ;-/  Reasoning about the behaviour is much
> > > harder when it's controlled like that - witness set_fs() shite...
> >
> > I don't particularly like it either.  But do you have a better idea
> > how to deal with io_uring vs compat tasks?
> 
>  git rm fs/io_uring.c would make a good starting point 
> Yes, I know it's not going to happen, but one can dream...

Maybe the io_uring code needs some changes to make it vaguely safe.
- No support for 32-bit compat mixed working (or at all?).
  Plausibly a special worker could do 32bit work.
- ring structure (I'm assuming mapped by mmap()) never mapped
  in more than one process (not cloned by fork()).
- No implicit handover of files to another process.
  Would need an munmap, handover, mmap sequence.

In any case the io_uring code rather abuses the import_iovec() interface.

The canonical sequence is (types from memory):
	struct iovec cache[8], *iov = cache;
	struct iov_iter iter;
	...
	rval = import_iovec(..., 8, &iov, &iter);
	// Do the read/write of user memory using 'iter'
	kfree(iov);

I don't think there is any strict requirement that iter.iov
is set to either 'cache' or 'iov' (it probably must point
into one of them.)
But the io_uring code will make that assumption because the
actual copies can be done much later and it doesn't save 'iter'.
It gets itself in a right mess because it doesn't separate
the 'address I need to free' from 'the iov[] for any transfers'.

io_uring is also the only code that relies on import_iovec()
returning the iter.count on success.
It would be much better to have:
	iov = import_iovec(..., &cache, ...);
	kfree(iov);
and use ERR_PTR() et al for error detection.
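
That is, something like this (a sketch of the proposed calling
convention, not an existing kernel API):

	struct iovec cache[8], *iov;
	struct iov_iter iter;

	iov = iovec_import(..., cache, ARRAY_SIZE(cache), &iter);
	if (IS_ERR(iov))
		return PTR_ERR(iov);
	/* do the transfers using 'iter' */
	kfree(iov);	/* NULL when 'cache' was used; kfree(NULL) is a no-op */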

David




RE: let import_iovec deal with compat_iovecs as well

2020-09-19 Thread David Laight
From: Christoph Hellwig
> Sent: 18 September 2020 13:45
> 
this series changes import_iovec to transparently deal with compat iovec
structures, and then cleans up a lot of code duplication.  But to get
there it first has to fix the pre-existing bug that io_uring compat
contexts don't trigger the in_compat_syscall() check.  This has so far
been relatively harmless as very little code callable from io_uring used
the check, and even the code that could be called usually wasn't.

I thought about that change while writing my import_iovec() => iovec_import()
patch - and thought that the io_uring code would (as usual) cause grief.

Christoph - did you see those patches?
David




RE: [PATCH] powerpc: Select HAVE_FUTEX_CMPXCHG

2020-09-19 Thread David Laight
From: Samuel Holland
> Sent: 19 September 2020 04:20
> 
> On powerpc, access_ok() succeeds for the NULL pointer. This breaks the
> dynamic check in futex_detect_cmpxchg(), which expects -EFAULT. As a
> result, robust futex operations are not functional on powerpc.

access_ok(NULL, sane_count) will succeed on all (or at least most)
architectures. All access_ok() does is check that kernel addresses
aren't referenced. (access_ok(kernel_address, 0) is also likely to
succeed.)

It is the access to user-address 0 that is expected to fault.
If this isn't faulting something else is wrong.

Historically (at least pre-ELF, if not before) user programs
were linked to address zero - so the page was mapped.
(Linux may be too new to actually require it.)
Not sure what 'wine' requires for win-32 executables.

ISTR there are also some 'crazy' ARM? CPUs that read the interrupt
vectors from address 0 in user-space.

So assuming:

static void __init futex_detect_cmpxchg(void)
{
#ifndef CONFIG_HAVE_FUTEX_CMPXCHG
u32 curval;

/*
 * This will fail and we want it. Some arch implementations do
 * runtime detection of the futex_atomic_cmpxchg_inatomic()
 * functionality. We want to know that before we call in any
 * of the complex code paths. Also we want to prevent
 * registration of robust lists in that case. NULL is
 * guaranteed to fault and we get -EFAULT on functional
 * implementation, the non-functional ones will return
 * -ENOSYS.
 */
if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
futex_cmpxchg_enabled = 1;
#endif
}

will fail with -EFAULT because user address 0 is invalid seems hopeful.

David




Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Daniel Vetter
On Sat, Sep 19, 2020 at 12:35 PM Daniel Vetter  wrote:
>
> On Sat, Sep 19, 2020 at 11:50 AM Thomas Gleixner  wrote:
> >
> > First of all, sorry for the horribly big Cc list!
> >
> > Following up to the discussion in:
> >
> >   https://lore.kernel.org/r/20200914204209.256266...@linutronix.de
> >
> > this provides a preemptible variant of kmap_atomic & related
> > interfaces. This is achieved by:
> >
> >  - Consolidating all kmap atomic implementations in generic code
> >
> >  - Switching from per CPU storage of the kmap index to a per task storage
> >
> >  - Adding a pteval array to the per task storage which contains the ptevals
> >of the currently active temporary kmaps
> >
> >  - Adding context switch code which checks whether the outgoing or the
> >incoming task has active temporary kmaps. If so, the outgoing task's
> >kmaps are removed and the incoming task's kmaps are restored.
> >
> >  - Adding new interfaces k[un]map_temporary*() which are not disabling
> >preemption and can be called from any context (except NMI).
> >
> >Contrary to kmap() which provides preemptible and "persistent" mappings,
> >these interfaces are meant to replace the temporary mappings provided by
> >kmap_atomic*() today.
> >
> > This allows to get rid of conditional mapping choices and allows to have
> > preemptible short term mappings on 64bit which are today enforced to be
> > non-preemptible due to the highmem constraints. It clearly puts overhead on
> > the highmem users, but highmem is slow anyway.
> >
> > This is not a wholesale conversion which makes kmap_atomic magically
> > preemptible because there might be usage sites which rely on the implicit
> > preempt disable. So this needs to be done on a case by case basis and the
> > call sites converted to kmap_temporary.
> >
> > Note, that this is only lightly tested on X86 and completely untested on
> > all other architectures.
> >
> > The lot is also available from
> >
> >git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem
>
> I think it should be the case, but I want to double check: Will
> copy_*_user be allowed within a kmap_temporary section? This would
> allow us to ditch an absolute pile of slowpaths.

(coffee just kicked in) copy_*_user is of course allowed, but if you hit a
page fault you get a short read/write. This looks like it would remove
the need to handle these in a slowpath, since page faults can now be
served in these new kmap_temporary sections. But this sounds too good
to be true, so I'm wondering what I'm missing.
-Daniel
>
> >
> > Thanks,
> >
> > tglx
> > ---
> >  a/arch/arm/mm/highmem.c   |  121 -
> >  a/arch/microblaze/mm/highmem.c|   78 -
> >  a/arch/nds32/mm/highmem.c |   48 
> >  a/arch/powerpc/mm/highmem.c   |   67 ---
> >  a/arch/sparc/mm/highmem.c |  115 
> >  arch/arc/Kconfig  |1
> >  arch/arc/include/asm/highmem.h|8 +
> >  arch/arc/mm/highmem.c |   44 ---
> >  arch/arm/Kconfig  |1
> >  arch/arm/include/asm/highmem.h|   30 +++--
> >  arch/arm/mm/Makefile  |1
> >  arch/csky/Kconfig |1
> >  arch/csky/include/asm/highmem.h   |4
> >  arch/csky/mm/highmem.c|   75 -
> >  arch/microblaze/Kconfig   |1
> >  arch/microblaze/include/asm/highmem.h |6 -
> >  arch/microblaze/mm/Makefile   |1
> >  arch/microblaze/mm/init.c |6 -
> >  arch/mips/Kconfig |1
> >  arch/mips/include/asm/highmem.h   |4
> >  arch/mips/mm/highmem.c|   77 -
> >  arch/mips/mm/init.c   |3
> >  arch/nds32/Kconfig.cpu|1
> >  arch/nds32/include/asm/highmem.h  |   21 ++-
> >  arch/nds32/mm/Makefile|1
> >  arch/powerpc/Kconfig  |1
> >  arch/powerpc/include/asm/highmem.h|6 -
> >  arch/powerpc/mm/Makefile  |1
> >  arch/powerpc/mm/mem.c |7 -
> >  arch/sparc/Kconfig|1
> >  arch/sparc/include/asm/highmem.h  |7 -
> >  arch/sparc/mm/Makefile|3
> >  arch/sparc/mm/srmmu.c |2
> >  arch/x86/include/asm/fixmap.h |1
> >  arch/x86/include/asm/highmem.h|   12 +-
> >  arch/x86/include/asm/iomap.h  |   29 +++--
> >  arch/x86/mm/highmem_32.c  |   59 --
> >  arch/x86/mm/init_32.c |   15 --
> >  arch/x86/mm/iomap_32.c|   57 --
> >  arch/xtensa/Kconfig   |1
> >  arch/xtensa/include/asm/highmem.h |9 +
> >  arch/xtensa/mm/highmem.c  |   44 ---
> >  b/arch/x86/Kconfig|3
> >  include/linux/highmem.h   |  141 +

Re: [patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Daniel Vetter
On Sat, Sep 19, 2020 at 11:50 AM Thomas Gleixner  wrote:
>
> First of all, sorry for the horribly big Cc list!
>
> Following up to the discussion in:
>
>   https://lore.kernel.org/r/20200914204209.256266...@linutronix.de
>
> this provides a preemptible variant of kmap_atomic & related
> interfaces. This is achieved by:
>
>  - Consolidating all kmap atomic implementations in generic code
>
>  - Switching from per CPU storage of the kmap index to a per task storage
>
>  - Adding a pteval array to the per task storage which contains the ptevals
>of the currently active temporary kmaps
>
>  - Adding context switch code which checks whether the outgoing or the
>incoming task has active temporary kmaps. If so, the outgoing task's
>kmaps are removed and the incoming task's kmaps are restored.
>
>  - Adding new interfaces k[un]map_temporary*() which are not disabling
>preemption and can be called from any context (except NMI).
>
>Contrary to kmap() which provides preemptible and "persistent" mappings,
>these interfaces are meant to replace the temporary mappings provided by
>kmap_atomic*() today.
>
> This allows to get rid of conditional mapping choices and allows to have
> preemptible short term mappings on 64bit which are today enforced to be
> non-preemptible due to the highmem constraints. It clearly puts overhead on
> the highmem users, but highmem is slow anyway.
>
> This is not a wholesale conversion which makes kmap_atomic magically
> preemptible because there might be usage sites which rely on the implicit
> preempt disable. So this needs to be done on a case by case basis and the
> call sites converted to kmap_temporary.
>
> Note, that this is only lightly tested on X86 and completely untested on
> all other architectures.
>
> The lot is also available from
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem

I think it should be the case, but I want to double check: Will
copy_*_user be allowed within a kmap_temporary section? This would
allow us to ditch an absolute pile of slowpaths.
-Daniel

>
> Thanks,
>
> tglx
> ---
>  a/arch/arm/mm/highmem.c   |  121 -
>  a/arch/microblaze/mm/highmem.c|   78 -
>  a/arch/nds32/mm/highmem.c |   48 
>  a/arch/powerpc/mm/highmem.c   |   67 ---
>  a/arch/sparc/mm/highmem.c |  115 
>  arch/arc/Kconfig  |1
>  arch/arc/include/asm/highmem.h|8 +
>  arch/arc/mm/highmem.c |   44 ---
>  arch/arm/Kconfig  |1
>  arch/arm/include/asm/highmem.h|   30 +++--
>  arch/arm/mm/Makefile  |1
>  arch/csky/Kconfig |1
>  arch/csky/include/asm/highmem.h   |4
>  arch/csky/mm/highmem.c|   75 -
>  arch/microblaze/Kconfig   |1
>  arch/microblaze/include/asm/highmem.h |6 -
>  arch/microblaze/mm/Makefile   |1
>  arch/microblaze/mm/init.c |6 -
>  arch/mips/Kconfig |1
>  arch/mips/include/asm/highmem.h   |4
>  arch/mips/mm/highmem.c|   77 -
>  arch/mips/mm/init.c   |3
>  arch/nds32/Kconfig.cpu|1
>  arch/nds32/include/asm/highmem.h  |   21 ++-
>  arch/nds32/mm/Makefile|1
>  arch/powerpc/Kconfig  |1
>  arch/powerpc/include/asm/highmem.h|6 -
>  arch/powerpc/mm/Makefile  |1
>  arch/powerpc/mm/mem.c |7 -
>  arch/sparc/Kconfig|1
>  arch/sparc/include/asm/highmem.h  |7 -
>  arch/sparc/mm/Makefile|3
>  arch/sparc/mm/srmmu.c |2
>  arch/x86/include/asm/fixmap.h |1
>  arch/x86/include/asm/highmem.h|   12 +-
>  arch/x86/include/asm/iomap.h  |   29 +++--
>  arch/x86/mm/highmem_32.c  |   59 --
>  arch/x86/mm/init_32.c |   15 --
>  arch/x86/mm/iomap_32.c|   57 --
>  arch/xtensa/Kconfig   |1
>  arch/xtensa/include/asm/highmem.h |9 +
>  arch/xtensa/mm/highmem.c  |   44 ---
>  b/arch/x86/Kconfig|3
>  include/linux/highmem.h   |  141 +++-
>  include/linux/io-mapping.h|2
>  include/linux/sched.h |9 +
>  kernel/sched/core.c   |   10 +
>  mm/Kconfig|3
>  mm/highmem.c  |  192 --
>  49 files changed, 422 insertions(+), 909 deletions(-)



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2] kbuild: preprocess module linker script

2020-09-19 Thread Jessica Yu

+++ Masahiro Yamada [08/09/20 13:27 +0900]:

There was a request to preprocess the module linker script like we
do for the vmlinux one. (https://lkml.org/lkml/2020/8/21/512)

The difference between vmlinux.lds and module.lds is that the latter
is needed for external module builds, thus must be cleaned up by
'make mrproper' instead of 'make clean'. Also, it must be created
by 'make modules_prepare'.

You cannot put it in arch/$(SRCARCH)/kernel/, which is cleaned up by
'make clean'. I moved arch/$(SRCARCH)/kernel/module.lds to
arch/$(SRCARCH)/include/asm/module.lds.h, which is included from
scripts/module.lds.S.

scripts/module.lds is fine because 'make clean' keeps all the
build artifacts under scripts/.

You can add arch-specific sections in <asm/module.lds.h>.

Signed-off-by: Masahiro Yamada 
Tested-by: Jessica Yu 
Acked-by: Will Deacon 


Acked-by: Jessica Yu 

Thanks for working on this! 



[patch RFC 15/15] mm/highmem: Provide kmap_temporary*

2020-09-19 Thread Thomas Gleixner
Now that the kmap atomic index is stored in task struct, provide a
preemptible variant. On context switch the maps of an outgoing task are
removed and the maps of the incoming task are restored. That's obviously
slow, but highmem is slow anyway.

The kmap_temporary and iomap_temporary interfaces can be invoked from both
preemptible and atomic context.

A wholesale conversion of kmap_atomic to be fully preemptible is not
possible because some of the usage sites might rely on the preemption
disable for serialization or per CPUness. Needs to be done on a case by
case basis.
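
A usage sketch (illustrative only; this assumes a kunmap_temporary()
wrapper around __kunmap_temporary(), analogous to kunmap_atomic()):

static void copy_part_of_page(struct page *dst, struct page *src, size_t len)
{
        void *d = kmap_temporary_page(dst);
        void *s = kmap_temporary_page(src);

        /*
         * On a 32bit highmem kernel this may now be preempted; the
         * context switch code tears down and restores the mappings.
         */
        memcpy(d, s, len);

        kunmap_temporary(s);    /* unmap in reverse map order (stack discipline) */
        kunmap_temporary(d);
}

The same function is valid whether the caller is atomic or preemptible,
which is the point of the exercise.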

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/iomap.h |   16 -
 arch/x86/mm/iomap_32.c   |7 +---
 include/linux/highmem.h  |   70 +--
 mm/highmem.c |   18 +--
 4 files changed, 80 insertions(+), 31 deletions(-)

--- a/arch/x86/include/asm/iomap.h
+++ b/arch/x86/include/asm/iomap.h
@@ -13,11 +13,23 @@
 #include 
 #include 
 
-void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
+void __iomem *iomap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot);
+
+static inline void __iomem *iomap_atomic_pfn_prot(unsigned long pfn,
+ pgprot_t prot)
+{
+   preempt_disable();
+   return iomap_temporary_pfn_prot(pfn, prot);
+}
+
+static inline void iounmap_temporary(void __iomem *vaddr)
+{
+   kunmap_temporary_indexed((void __force *)vaddr);
+}
 
 static inline void iounmap_atomic(void __iomem *vaddr)
 {
-   kunmap_atomic_indexed((void __force *)vaddr);
+   iounmap_temporary(vaddr);
preempt_enable();
 }
 
--- a/arch/x86/mm/iomap_32.c
+++ b/arch/x86/mm/iomap_32.c
@@ -44,7 +44,7 @@ void iomap_free(resource_size_t base, un
 }
 EXPORT_SYMBOL_GPL(iomap_free);
 
-void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot)
+void __iomem *iomap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot)
 {
/*
 * For non-PAT systems, translate non-WB request to UC- just in
@@ -60,7 +60,6 @@ void __iomem *iomap_atomic_pfn_prot(unsi
/* Filter out unsupported __PAGE_KERNEL* bits: */
pgprot_val(prot) &= __default_kernel_pte_mask;
 
-   preempt_disable();
-   return (void __force __iomem *)kmap_atomic_pfn_prot(pfn, prot);
+   return (void __force __iomem *)__kmap_temporary_pfn_prot(pfn, prot);
 }
-EXPORT_SYMBOL_GPL(iomap_atomic_pfn_prot);
+EXPORT_SYMBOL_GPL(iomap_temporary_pfn_prot);
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -35,9 +35,9 @@ static inline void invalidate_kernel_vma
  * Outside of CONFIG_HIGHMEM to support X86 32bit iomap_atomic() cruft.
  */
 #ifdef CONFIG_KMAP_ATOMIC_GENERIC
-void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
-void *kmap_atomic_page_prot(struct page *page, pgprot_t prot);
-void kunmap_atomic_indexed(void *vaddr);
+void *__kmap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot);
+void *__kmap_temporary_page_prot(struct page *page, pgprot_t prot);
+void kunmap_temporary_indexed(void *vaddr);
 void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next);
 # ifndef ARCH_NEEDS_KMAP_HIGH_GET
 static inline void *arch_kmap_temporary_high_get(struct page *page)
@@ -95,16 +95,35 @@ static inline void kunmap(struct page *p
  * be used in IRQ contexts, so in some (very limited) cases we need
  * it.
  */
-static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+static inline void *kmap_temporary_page_prot(struct page *page, pgprot_t prot)
 {
-   preempt_disable();
-   return kmap_atomic_page_prot(page, prot);
+   return __kmap_temporary_page_prot(page, prot);
 }
 
-static inline void *kmap_atomic_pfn(unsigned long pfn)
+static inline void *kmap_temporary_page(struct page *page)
+{
+   return kmap_temporary_page_prot(page, kmap_prot);
+}
+
+static inline void *kmap_temporary_pfn_prot(unsigned long pfn, pgprot_t prot)
+{
+   return __kmap_temporary_pfn_prot(pfn, prot);
+}
+
+static inline void *kmap_temporary_pfn(unsigned long pfn)
+{
+   return kmap_temporary_pfn_prot(pfn, kmap_prot);
+}
+
+static inline void __kunmap_temporary(void *vaddr)
+{
+   kunmap_temporary_indexed(vaddr);
+}
+
+static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
preempt_disable();
-   return kmap_atomic_pfn_prot(pfn, kmap_prot);
+   return kmap_temporary_page_prot(page, prot);
 }
 
 static inline void *kmap_atomic(struct page *page)
@@ -112,9 +131,10 @@ static inline void *kmap_atomic(struct p
return kmap_atomic_prot(page, kmap_prot);
 }
 
-static inline void __kunmap_atomic(void *addr)
+static inline void *kmap_atomic_pfn(unsigned long pfn)
 {
-   kunmap_atomic_indexed(addr);
+   preempt_disable();
+   return kmap_temporary_pfn_prot(pfn, kmap_prot);
 }
 
 /* declarations for linux/mm/highmem.c */
@@ -177,6 +197,22 @@ static inline void kunmap(struct page *p
 #endif
 }
 
+static inline void 

[patch RFC 14/15] sched: highmem: Store temporary kmaps in task struct

2020-09-19 Thread Thomas Gleixner
Instead of storing the kmap index per CPU, provide and use per task storage. That
prepares for temporary kmaps which are preemptible.

The context switch code is preparatory and not yet in use because
kmap_atomic() runs with preemption disabled. Will be made usable in the
next step.
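
Conceptually the switch amounts to the following (editorial sketch;
clear_slot()/restore_slot() are stand-ins for the real fixmap pte
manipulation done via the kmap pte):

static void kmap_switch_sketch(struct task_struct *prev, struct task_struct *next)
{
        int i;

        /* Tear down the outgoing task's temporary maps ... */
        for (i = 0; i < prev->kmap_ctrl.idx; i++)
                clear_slot(i, prev->kmap_ctrl.pteval[i]);

        /* ... and re-establish the incoming task's maps from its saved ptevals */
        for (i = 0; i < next->kmap_ctrl.idx; i++)
                restore_slot(i, next->kmap_ctrl.pteval[i]);
}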

Signed-off-by: Thomas Gleixner 
---
 include/linux/highmem.h |1 
 include/linux/sched.h   |9 +++
 kernel/sched/core.c |   10 
 mm/highmem.c|   59 ++--
 4 files changed, 72 insertions(+), 7 deletions(-)

--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -38,6 +38,7 @@ static inline void invalidate_kernel_vma
 void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
 void *kmap_atomic_page_prot(struct page *page, pgprot_t prot);
 void kunmap_atomic_indexed(void *vaddr);
+void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next);
 # ifndef ARCH_NEEDS_KMAP_HIGH_GET
 static inline void *arch_kmap_temporary_high_get(struct page *page)
 {
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -628,6 +629,13 @@ struct wake_q_node {
struct wake_q_node *next;
 };
 
+struct kmap_ctrl {
+#ifdef CONFIG_KMAP_ATOMIC_GENERIC
+   int idx;
+   pte_t   pteval[KM_TYPE_NR];
+#endif
+};
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
/*
@@ -1280,6 +1288,7 @@ struct task_struct {
unsigned intsequential_io;
unsigned intsequential_io_avg;
 #endif
+   struct kmap_ctrlkmap_ctrl;
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
unsigned long   task_state_change;
 #endif
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3529,6 +3529,15 @@ static inline void finish_lock_switch(st
 # define finish_arch_post_lock_switch()    do { } while (0)
 #endif
 
+static inline void kmap_temp_switch(struct task_struct *prev,
+   struct task_struct *next)
+{
+#ifdef CONFIG_KMAP_ATOMIC_GENERIC
+   if (unlikely(prev->kmap_ctrl.idx || next->kmap_ctrl.idx))
+   kmap_switch_temporary(prev, next);
+#endif
+}
+
 /**
  * prepare_task_switch - prepare to switch tasks
  * @rq: the runqueue preparing to switch
@@ -3551,6 +3560,7 @@ prepare_task_switch(struct rq *rq, struc
perf_event_task_sched_out(prev, next);
rseq_preempt(prev);
fire_sched_out_preempt_notifiers(prev, next);
+   kmap_temp_switch(prev, next);
prepare_task(next);
prepare_arch_switch(next);
 }
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -370,6 +370,7 @@ void kunmap_high(struct page *page)
if (need_wakeup)
wake_up(pkmap_map_wait);
 }
+
 EXPORT_SYMBOL(kunmap_high);
 #else
 static inline void kmap_high_unmap_temporary(unsigned long vaddr) { }
@@ -377,11 +378,9 @@ static inline void kmap_high_unmap_tempo
 
 #ifdef CONFIG_KMAP_ATOMIC_GENERIC
 
-static DEFINE_PER_CPU(int, __kmap_atomic_idx);
-
 static inline int kmap_atomic_idx_push(void)
 {
-   int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
+   int idx = current->kmap_ctrl.idx++;
 
WARN_ON_ONCE(in_irq() && !irqs_disabled());
BUG_ON(idx >= KM_TYPE_NR);
@@ -390,14 +389,13 @@ static inline int kmap_atomic_idx_push(v
 
 static inline int kmap_atomic_idx(void)
 {
-   return __this_cpu_read(__kmap_atomic_idx) - 1;
+   return current->kmap_ctrl.idx - 1;
 }
 
 static inline void kmap_atomic_idx_pop(void)
 {
-   int idx = __this_cpu_dec_return(__kmap_atomic_idx);
-
-   BUG_ON(idx < 0);
+   current->kmap_ctrl.idx--;
+   BUG_ON(current->kmap_ctrl.idx < 0);
 }
 
 #ifndef arch_kmap_temp_post_map
@@ -447,6 +445,7 @@ static void *__kmap_atomic_pfn_prot(unsi
pteval = pfn_pte(pfn, prot);
set_pte(kmap_pte - idx, pteval);
arch_kmap_temp_post_map(vaddr, pteval);
+   current->kmap_ctrl.pteval[kmap_atomic_idx()] = pteval;
preempt_enable();
 
return (void *)vaddr;
@@ -499,11 +498,57 @@ void kunmap_atomic_indexed(void *vaddr)
arch_kmap_temp_pre_unmap(addr);
pte_clear(&init_mm, addr, kmap_pte - idx);
arch_kmap_temp_post_unmap(addr);
+   current->kmap_ctrl.pteval[kmap_atomic_idx()] = __pte(0);
kmap_atomic_idx_pop();
preempt_enable();
pagefault_enable();
 }
 EXPORT_SYMBOL(kunmap_atomic_indexed);
+
+void kmap_switch_temporary(struct task_struct *prev, struct task_struct *next)
+{
+   pte_t *kmap_pte = kmap_get_pte();
+   int i;
+
+   /* Clear @prev's kmaps */
+   for (i = 0; i < prev->kmap_ctrl.idx; i++) {
+   pte_t pteval = prev->kmap_ctrl.pteval[i];
+   unsigned long addr;
+   int idx;
+
+   if (WARN_ON_ONCE(pte_none(pteval)))
+

[patch RFC 13/15] mm/highmem: Remove the old kmap_atomic cruft

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
---
 include/linux/highmem.h |   65 ++--
 mm/highmem.c|   28 +---
 2 files changed, 28 insertions(+), 65 deletions(-)

--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -94,27 +94,6 @@ static inline void kunmap(struct page *p
  * be used in IRQ contexts, so in some (very limited) cases we need
  * it.
  */
-
-#ifndef CONFIG_KMAP_ATOMIC_GENERIC
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
-void kunmap_atomic_high(void *kvaddr);
-
-static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
-{
-   preempt_disable();
-   pagefault_disable();
-   if (!PageHighMem(page))
-   return page_address(page);
-   return kmap_atomic_high_prot(page, prot);
-}
-
-static inline void __kunmap_atomic(void *vaddr)
-{
-   kunmap_atomic_high(vaddr);
-   pagefault_enable();
-}
-#else /* !CONFIG_KMAP_ATOMIC_GENERIC */
-
 static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
preempt_disable();
@@ -127,17 +106,14 @@ static inline void *kmap_atomic_pfn(unsi
return kmap_atomic_pfn_prot(pfn, kmap_prot);
 }
 
-static inline void __kunmap_atomic(void *addr)
+static inline void *kmap_atomic(struct page *page)
 {
-   kunmap_atomic_indexed(addr);
+   return kmap_atomic_prot(page, kmap_prot);
 }
 
-
-#endif /* CONFIG_KMAP_ATOMIC_GENERIC */
-
-static inline void *kmap_atomic(struct page *page)
+static inline void __kunmap_atomic(void *addr)
 {
-   return kmap_atomic_prot(page, kmap_prot);
+   kunmap_atomic_indexed(addr);
 }
 
 /* declarations for linux/mm/highmem.c */
@@ -233,39 +209,6 @@ static inline void __kunmap_atomic(void
 
 #endif /* CONFIG_HIGHMEM */
 
-#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
-
-DECLARE_PER_CPU(int, __kmap_atomic_idx);
-
-static inline int kmap_atomic_idx_push(void)
-{
-   int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-   WARN_ON_ONCE(in_irq() && !irqs_disabled());
-   BUG_ON(idx >= KM_TYPE_NR);
-#endif
-   return idx;
-}
-
-static inline int kmap_atomic_idx(void)
-{
-   return __this_cpu_read(__kmap_atomic_idx) - 1;
-}
-
-static inline void kmap_atomic_idx_pop(void)
-{
-#ifdef CONFIG_DEBUG_HIGHMEM
-   int idx = __this_cpu_dec_return(__kmap_atomic_idx);
-
-   BUG_ON(idx < 0);
-#else
-   __this_cpu_dec(__kmap_atomic_idx);
-#endif
-}
-
-#endif
-
 /*
  * Prevent people trying to call kunmap_atomic() as if it were kunmap()
  * kunmap_atomic() should get the return value of kmap_atomic, not the page.
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -31,10 +31,6 @@
 #include 
 #include 
 
-#if defined(CONFIG_HIGHMEM) || defined(CONFIG_X86_32)
-DEFINE_PER_CPU(int, __kmap_atomic_idx);
-#endif
-
 /*
  * Virtual_count is not a pure "count".
  *  0 means that it is not mapped, and has not been mapped
@@ -380,6 +376,30 @@ static inline void kmap_high_unmap_tempo
 #endif /* CONFIG_HIGHMEM */
 
 #ifdef CONFIG_KMAP_ATOMIC_GENERIC
+
+static DEFINE_PER_CPU(int, __kmap_atomic_idx);
+
+static inline int kmap_atomic_idx_push(void)
+{
+   int idx = __this_cpu_inc_return(__kmap_atomic_idx) - 1;
+
+   WARN_ON_ONCE(in_irq() && !irqs_disabled());
+   BUG_ON(idx >= KM_TYPE_NR);
+   return idx;
+}
+
+static inline int kmap_atomic_idx(void)
+{
+   return __this_cpu_read(__kmap_atomic_idx) - 1;
+}
+
+static inline void kmap_atomic_idx_pop(void)
+{
+   int idx = __this_cpu_dec_return(__kmap_atomic_idx);
+
+   BUG_ON(idx < 0);
+}
+
 #ifndef arch_kmap_temp_post_map
 # define arch_kmap_temp_post_map(vaddr, pteval)do { } while (0)
 #endif



[patch RFC 11/15] sparc/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: "David S. Miller" 
Cc: sparcli...@vger.kernel.org
---
Note: Completely untested
---
 arch/sparc/Kconfig   |1 
 arch/sparc/include/asm/highmem.h |7 +-
 arch/sparc/mm/Makefile   |3 -
 arch/sparc/mm/highmem.c  |  115 ---
 arch/sparc/mm/srmmu.c|2 
 5 files changed, 6 insertions(+), 122 deletions(-)

--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -137,6 +137,7 @@ config MMU
 config HIGHMEM
bool
default y if SPARC32
+   select KMAP_ATOMIC_GENERIC
 
 config ZONE_DMA
bool
--- a/arch/sparc/include/asm/highmem.h
+++ b/arch/sparc/include/asm/highmem.h
@@ -33,8 +33,6 @@ extern unsigned long highstart_pfn, high
 #define kmap_prot __pgprot(SRMMU_ET_PTE | SRMMU_PRIV | SRMMU_CACHE)
 extern pte_t *pkmap_page_table;
 
-void kmap_init(void) __init;
-
 /*
  * Right now we initialize only a single pte table. It can be extended
  * easily, subsequent pte tables have to be allocated in one physical
@@ -53,6 +51,11 @@ void kmap_init(void) __init;
 
 #define flush_cache_kmaps()    flush_cache_all()
 
+/* FIXME: Use __flush_tlb_one(vaddr) instead of flush_cache_all() -- Anton */
+#define arch_kmap_temp_post_map(vaddr, pteval) flush_cache_all()
+#define arch_kmap_temp_post_unmap(vaddr)   flush_cache_all()
+
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_HIGHMEM_H */
--- a/arch/sparc/mm/Makefile
+++ b/arch/sparc/mm/Makefile
@@ -15,6 +15,3 @@ obj-$(CONFIG_SPARC32)   += leon_mm.o
 
 # Only used by sparc64
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
-
-# Only used by sparc32
-obj-$(CONFIG_HIGHMEM)   += highmem.o
--- a/arch/sparc/mm/highmem.c
+++ /dev/null
@@ -1,115 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- *  highmem.c: virtual kernel memory mappings for high memory
- *
- *  Provides kernel-static versions of atomic kmap functions originally
- *  found as inlines in include/asm-sparc/highmem.h.  These became
- *  needed as kmap_atomic() and kunmap_atomic() started getting
- *  called from within modules.
- *  -- Tomas Szepe , September 2002
- *
- *  But kmap_atomic() and kunmap_atomic() cannot be inlined in
- *  modules because they are loaded with btfixup-ped functions.
- */
-
-/*
- * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
- * gives a more generic (and caching) interface. But kmap_atomic can
- * be used in IRQ contexts, so in some (very limited) cases we need it.
- *
- * XXX This is an old text. Actually, it's good to use atomic kmaps,
- * provided you remember that they are atomic and not try to sleep
- * with a kmap taken, much like a spinlock. Non-atomic kmaps are
- * shared by CPUs, and so precious, and establishing them requires IPI.
- * Atomic kmaps are lightweight and we may have NCPUS more of them.
- */
-#include 
-#include 
-#include 
-
-#include 
-#include 
-#include 
-
-static pte_t *kmap_pte;
-
-void __init kmap_init(void)
-{
-   unsigned long address = __fix_to_virt(FIX_KMAP_BEGIN);
-
-/* cache the first kmap pte */
-kmap_pte = virt_to_kpte(address);
-}
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned long vaddr;
-   long idx, type;
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-
-/* XXX Fix - Anton */
-#if 0
-   __flush_cache_one(vaddr);
-#else
-   flush_cache_all();
-#endif
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(!pte_none(*(kmap_pte-idx)));
-#endif
-   set_pte(kmap_pte-idx, mk_pte(page, prot));
-/* XXX Fix - Anton */
-#if 0
-   __flush_tlb_one(vaddr);
-#else
-   flush_tlb_all();
-#endif
-
-   return (void*) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int type;
-
-   if (vaddr < FIXADDR_START)
-   return;
-
-   type = kmap_atomic_idx();
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-   {
-   unsigned long idx;
-
-   idx = type + KM_TYPE_NR * smp_processor_id();
-   BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN+idx));
-
-   /* XXX Fix - Anton */
-#if 0
-   __flush_cache_one(vaddr);
-#else
-   flush_cache_all();
-#endif
-
-   /*
-* force other mappings to Oops if they'll try to access
-* this pte without first remap it
-*/
-   pte_clear(&init_mm, vaddr, kmap_pte-idx);
-   /* XXX Fix - Anton */
-#if 0
-   __flush_tlb_one(vaddr);
-#else
-   flush_tlb_all();
-#endif
-   }
-#endif
-
-   kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
--- a/arch/sparc/mm/srmmu.c
+++ b/arch/sparc/mm/srmmu.c
@@ -971,8 +971,6 @@ void __init srmmu_paging_init(void)
 
sparc_context_init(num_contexts);
 
-   kmap_init();
-

[patch RFC 10/15] powerpc/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
---
Note: Completely untested
---
 arch/powerpc/Kconfig   |1 
 arch/powerpc/include/asm/highmem.h |6 ++-
 arch/powerpc/mm/Makefile   |1 
 arch/powerpc/mm/highmem.c  |   67 -
 arch/powerpc/mm/mem.c  |7 ---
 5 files changed, 6 insertions(+), 76 deletions(-)

--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -406,6 +406,7 @@ menu "Kernel options"
 config HIGHMEM
bool "High memory support"
depends on PPC32
+   select KMAP_ATOMIC_GENERIC
 
 source "kernel/Kconfig.hz"
 
--- a/arch/powerpc/include/asm/highmem.h
+++ b/arch/powerpc/include/asm/highmem.h
@@ -29,7 +29,6 @@
 #include 
 #include 
 
-extern pte_t *kmap_pte;
 extern pte_t *pkmap_page_table;
 
 /*
@@ -60,6 +59,11 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()    flush_cache_all()
 
+#define arch_kmap_temp_post_map(vaddr, pteval) \
+   local_flush_tlb_page(NULL, vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)   \
+   local_flush_tlb_page(NULL, vaddr)
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_HIGHMEM_H */
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -16,7 +16,6 @@ obj-$(CONFIG_NEED_MULTIPLE_NODES) += num
 obj-$(CONFIG_PPC_MM_SLICES)+= slice.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
-obj-$(CONFIG_HIGHMEM)  += highmem.o
 obj-$(CONFIG_PPC_COPRO_BASE)   += copro_fault.o
 obj-$(CONFIG_PPC_PTDUMP)   += ptdump/
 obj-$(CONFIG_KASAN)+= kasan/
--- a/arch/powerpc/mm/highmem.c
+++ /dev/null
@@ -1,67 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * highmem.c: virtual kernel memory mappings for high memory
- *
- * PowerPC version, stolen from the i386 version.
- *
- * Used in CONFIG_HIGHMEM systems for memory pages which
- * are not addressable by direct kernel virtual addresses.
- *
- * Copyright (C) 1999 Gerhard Wichert, Siemens AG
- *   gerhard.wich...@pdb.siemens.de
- *
- *
- * Redesigned the x86 32-bit VM architecture to deal with
- * up to 16 Terrabyte physical memory. With current x86 CPUs
- * we now support up to 64 Gigabytes physical RAM.
- *
- * Copyright (C) 1999 Ingo Molnar 
- *
- * Reworked for PowerPC by various contributors. Moved from
- * highmem.h by Benjamin Herrenschmidt (c) 2009 IBM Corp.
- */
-
-#include 
-#include 
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned long vaddr;
-   int idx, type;
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-   WARN_ON(IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !pte_none(*(kmap_pte - idx)));
-   __set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot), 1);
-   local_flush_tlb_page(NULL, vaddr);
-
-   return (void*) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-
-   if (vaddr < __fix_to_virt(FIX_KMAP_END))
-   return;
-
-   if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM)) {
-   int type = kmap_atomic_idx();
-   unsigned int idx;
-
-   idx = type + KM_TYPE_NR * smp_processor_id();
-   WARN_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-
-   /*
-* force other mappings to Oops if they'll try to access
-* this pte without first remap it
-*/
-   pte_clear(&init_mm, vaddr, kmap_pte-idx);
-   local_flush_tlb_page(NULL, vaddr);
-   }
-
-   kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -60,11 +60,6 @@
 unsigned long long memory_limit;
 bool init_mem_is_free;
 
-#ifdef CONFIG_HIGHMEM
-pte_t *kmap_pte;
-EXPORT_SYMBOL(kmap_pte);
-#endif
-
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
  unsigned long size, pgprot_t vma_prot)
 {
@@ -233,8 +228,6 @@ void __init paging_init(void)
 
map_kernel_page(PKMAP_BASE, 0, __pgprot(0));/* XXX gross */
pkmap_page_table = virt_to_kpte(PKMAP_BASE);
-
-   kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
 #endif /* CONFIG_HIGHMEM */
 
printk(KERN_DEBUG "Top of RAM: 0x%llx, Total RAM: 0x%llx\n",



[patch RFC 09/15] nds32/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
The mapping code is odd and looks broken. See FIXME in the comment.
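
The suspected breakage is the classic nesting case (editorial sketch of
the sequence, not code from this patch):

        /* task context */
        a = kmap_atomic(p1);            /* occupies fixmap slot 0 */

        /* ... hard interrupt ... */
        b = kmap_atomic(p2);            /* must use slot 1, not clobber slot 0 */
        kunmap_atomic(b);
        /* ... return from interrupt ... */

        kunmap_atomic(a);               /* slot 0 must still be intact */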

Signed-off-by: Thomas Gleixner 
Cc: Nick Hu 
Cc: Greentime Hu 
Cc: Vincent Chen 
---
Note: Completely untested
---
 arch/nds32/Kconfig.cpu   |1 
 arch/nds32/include/asm/highmem.h |   21 +
 arch/nds32/mm/Makefile   |1 
 arch/nds32/mm/highmem.c  |   48 ---
 4 files changed, 17 insertions(+), 54 deletions(-)

--- a/arch/nds32/Kconfig.cpu
+++ b/arch/nds32/Kconfig.cpu
@@ -157,6 +157,7 @@ config HW_SUPPORT_UNALIGNMENT_ACCESS
 config HIGHMEM
bool "High Memory Support"
depends on MMU && !CPU_CACHE_ALIASING
+   select KMAP_ATOMIC_GENERIC
help
  The address space of Andes processors is only 4 Gigabytes large
  and it has to accommodate user address space, kernel address
--- a/arch/nds32/include/asm/highmem.h
+++ b/arch/nds32/include/asm/highmem.h
@@ -45,11 +45,22 @@ extern pte_t *pkmap_page_table;
 extern void kmap_init(void);
 
 /*
- * The following functions are already defined by 
- * when CONFIG_HIGHMEM is not set.
+ * FIXME: The below looks broken vs. a kmap_atomic() in task context which
+ * is interrupted and another kmap_atomic() happens in interrupt context.
+ * But what do I know about nds32. -- tglx
  */
-#ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_pfn(unsigned long pfn);
-#endif
+#define arch_kmap_temp_post_map(vaddr, pteval) \
+   do {\
+   __nds32__tlbop_inv(vaddr);  \
+   __nds32__mtsr_dsb(vaddr, NDS32_SR_TLB_VPN); \
+   __nds32__tlbop_rwr(pteval); \
+   __nds32__isb(); \
+   } while (0)
+
+#define arch_kmap_temp_pre_unmap(vaddr)   \
+   do {\
+   __nds32__tlbop_inv(vaddr);  \
+   __nds32__isb(); \
+   } while (0)
 
 #endif
--- a/arch/nds32/mm/Makefile
+++ b/arch/nds32/mm/Makefile
@@ -3,7 +3,6 @@ obj-y   := extable.o tlb.o fault.o init
mm-nds32.o cacheflush.o proc.o
 
 obj-$(CONFIG_ALIGNMENT_TRAP)   += alignment.o
-obj-$(CONFIG_HIGHMEM)   += highmem.o
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_proc.o = $(CC_FLAGS_FTRACE)
--- a/arch/nds32/mm/highmem.c
+++ /dev/null
@@ -1,48 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-// Copyright (C) 2005-2017 Andes Technology Corporation
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned int idx;
-   unsigned long vaddr, pte;
-   int type;
-   pte_t *ptep;
-
-   type = kmap_atomic_idx_push();
-
-   idx = type + KM_TYPE_NR * smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-   pte = (page_to_pfn(page) << PAGE_SHIFT) | prot;
-   ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
-   set_pte(ptep, pte);
-
-   __nds32__tlbop_inv(vaddr);
-   __nds32__mtsr_dsb(vaddr, NDS32_SR_TLB_VPN);
-   __nds32__tlbop_rwr(pte);
-   __nds32__isb();
-   return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   if (kvaddr >= (void *)FIXADDR_START) {
-   unsigned long vaddr = (unsigned long)kvaddr;
-   pte_t *ptep;
-   kmap_atomic_idx_pop();
-   __nds32__tlbop_inv(vaddr);
-   __nds32__isb();
-   ptep = pte_offset_kernel(pmd_off_k(vaddr), vaddr);
-   set_pte(ptep, 0);
-   }
-}
-EXPORT_SYMBOL(kunmap_atomic_high);



[patch RFC 12/15] xtensa/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: Chris Zankel 
Cc: Max Filippov 
Cc: linux-xte...@linux-xtensa.org
---
Note: Completely untested
---
 arch/xtensa/Kconfig   |1 
 arch/xtensa/include/asm/highmem.h |9 +++
 arch/xtensa/mm/highmem.c  |   44 +++---
 3 files changed, 14 insertions(+), 40 deletions(-)

--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -679,6 +679,7 @@ endchoice
 config HIGHMEM
bool "High Memory Support"
depends on MMU
+select KMAP_ATOMIC_GENERIC
help
  Linux can use the full amount of RAM in the system by
  default. However, the default MMUv2 setup only maps the
--- a/arch/xtensa/include/asm/highmem.h
+++ b/arch/xtensa/include/asm/highmem.h
@@ -68,6 +68,15 @@ static inline void flush_cache_kmaps(voi
flush_cache_all();
 }
 
+enum fixed_addresses kmap_temp_map_idx(int type, unsigned long pfn);
+#define arch_kmap_temp_map_idx kmap_temp_map_idx
+
+enum fixed_addresses kmap_temp_unmap_idx(int type, unsigned long addr);
+#define arch_kmap_temp_unmap_idx   kmap_temp_unmap_idx
+
+#define arch_kmap_temp_post_unmap(vaddr)   \
+   local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE)
+
 void kmap_init(void);
 
 #endif
--- a/arch/xtensa/mm/highmem.c
+++ b/arch/xtensa/mm/highmem.c
@@ -12,8 +12,6 @@
 #include 
 #include 
 
-static pte_t *kmap_pte;
-
 #if DCACHE_WAY_SIZE > PAGE_SIZE
 unsigned int last_pkmap_nr_arr[DCACHE_N_COLORS];
 wait_queue_head_t pkmap_map_wait_arr[DCACHE_N_COLORS];
@@ -37,55 +35,21 @@ static inline enum fixed_addresses kmap_
color;
 }
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
+enum fixed_addresses kmap_temp_map_idx(int type, unsigned long pfn)
 {
-   enum fixed_addresses idx;
-   unsigned long vaddr;
-
-   idx = kmap_idx(kmap_atomic_idx_push(),
-  DCACHE_ALIAS(page_to_phys(page)));
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(!pte_none(*(kmap_pte + idx)));
-#endif
-   set_pte(kmap_pte + idx, mk_pte(page, prot));
-
-   return (void *)vaddr;
+   return kmap_idx(type, DCACHE_ALIAS(pfn << PAGE_SHIFT));
 }
-EXPORT_SYMBOL(kmap_atomic_high_prot);
 
-void kunmap_atomic_high(void *kvaddr)
+enum fixed_addresses kmap_temp_unmap_idx(int type, unsigned long addr)
 {
-   if (kvaddr >= (void *)FIXADDR_START &&
-   kvaddr < (void *)FIXADDR_TOP) {
-   int idx = kmap_idx(kmap_atomic_idx(),
-  DCACHE_ALIAS((unsigned long)kvaddr));
-
-   /*
-* Force other mappings to Oops if they'll try to access this
-* pte without first remap it.  Keeping stale mappings around
-* is a bad idea also, in case the page changes cacheability
-* attributes or becomes a protected page in a hypervisor.
-*/
-   pte_clear(&init_mm, kvaddr, kmap_pte + idx);
-   local_flush_tlb_kernel_range((unsigned long)kvaddr,
-(unsigned long)kvaddr + PAGE_SIZE);
-
-   kmap_atomic_idx_pop();
-   }
+   return kmap_idx(type, DCACHE_ALIAS(addr));
 }
-EXPORT_SYMBOL(kunmap_atomic_high);
 
 void __init kmap_init(void)
 {
-   unsigned long kmap_vstart;
-
/* Check if this memory layout is broken because PKMAP overlaps
 * page table.
 */
BUILD_BUG_ON(PKMAP_BASE < TLBTEMP_BASE_1 + TLBTEMP_SIZE);
-   /* cache the first kmap pte */
-   kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
-   kmap_pte = virt_to_kpte(kmap_vstart);
kmap_waitqueues_init();
 }



[patch RFC 07/15] microblaze/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: Michal Simek 
---
Note: Completely untested
---
 arch/microblaze/Kconfig   |1 
 arch/microblaze/include/asm/highmem.h |6 ++
 arch/microblaze/mm/Makefile   |1 
 arch/microblaze/mm/highmem.c  |   78 --
 arch/microblaze/mm/init.c |6 --
 5 files changed, 6 insertions(+), 86 deletions(-)

--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -170,6 +170,7 @@ config XILINX_UNCACHED_SHADOW
 config HIGHMEM
bool "High memory support"
depends on MMU
+   select KMAP_ATOMIC_GENERIC
help
  The address space of Microblaze processors is only 4 Gigabytes large
  and it has to accommodate user address space, kernel address
--- a/arch/microblaze/include/asm/highmem.h
+++ b/arch/microblaze/include/asm/highmem.h
@@ -25,7 +25,6 @@
 #include 
 #include 
 
-extern pte_t *kmap_pte;
 extern pte_t *pkmap_page_table;
 
 /*
@@ -52,6 +51,11 @@ extern pte_t *pkmap_page_table;
 
 #define flush_cache_kmaps()    { flush_icache(); flush_dcache(); }
 
+#define arch_kmap_temp_post_map(vaddr, pteval) \
+   local_flush_tlb_page(NULL, vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)   \
+   local_flush_tlb_page(NULL, vaddr)
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_HIGHMEM_H */
--- a/arch/microblaze/mm/Makefile
+++ b/arch/microblaze/mm/Makefile
@@ -6,4 +6,3 @@
 obj-y := consistent.o init.o
 
 obj-$(CONFIG_MMU) += pgtable.o mmu_context.o fault.o
-obj-$(CONFIG_HIGHMEM) += highmem.o
--- a/arch/microblaze/mm/highmem.c
+++ /dev/null
@@ -1,78 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * highmem.c: virtual kernel memory mappings for high memory
- *
- * PowerPC version, stolen from the i386 version.
- *
- * Used in CONFIG_HIGHMEM systems for memory pages which
- * are not addressable by direct kernel virtual addresses.
- *
- * Copyright (C) 1999 Gerhard Wichert, Siemens AG
- *   gerhard.wich...@pdb.siemens.de
- *
- *
- * Redesigned the x86 32-bit VM architecture to deal with
- * up to 16 Terrabyte physical memory. With current x86 CPUs
- * we now support up to 64 Gigabytes physical RAM.
- *
- * Copyright (C) 1999 Ingo Molnar 
- *
- * Reworked for PowerPC by various contributors. Moved from
- * highmem.h by Benjamin Herrenschmidt (c) 2009 IBM Corp.
- */
-
-#include 
-#include 
-
-/*
- * The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
- * gives a more generic (and caching) interface. But kmap_atomic can
- * be used in IRQ contexts, so in some (very limited) cases we need
- * it.
- */
-#include 
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-
-   unsigned long vaddr;
-   int idx, type;
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(!pte_none(*(kmap_pte-idx)));
-#endif
-   set_pte_at(&init_mm, vaddr, kmap_pte-idx, mk_pte(page, prot));
-   local_flush_tlb_page(NULL, vaddr);
-
-   return (void *) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int type;
-   unsigned int idx;
-
-   if (vaddr < __fix_to_virt(FIX_KMAP_END))
-   return;
-
-   type = kmap_atomic_idx();
-
-   idx = type + KM_TYPE_NR * smp_processor_id();
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-#endif
-   /*
-* force other mappings to Oops if they'll try to access
-* this pte without first remap it
-*/
-   pte_clear(&init_mm, vaddr, kmap_pte-idx);
-   local_flush_tlb_page(NULL, vaddr);
-
-   kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
--- a/arch/microblaze/mm/init.c
+++ b/arch/microblaze/mm/init.c
@@ -49,17 +49,11 @@ unsigned long lowmem_size;
 EXPORT_SYMBOL(min_low_pfn);
 EXPORT_SYMBOL(max_low_pfn);
 
-#ifdef CONFIG_HIGHMEM
-pte_t *kmap_pte;
-EXPORT_SYMBOL(kmap_pte);
-
 static void __init highmem_init(void)
 {
pr_debug("%x\n", (u32)PKMAP_BASE);
map_page(PKMAP_BASE, 0, 0); /* XXX gross */
pkmap_page_table = virt_to_kpte(PKMAP_BASE);
-
-   kmap_pte = virt_to_kpte(__fix_to_virt(FIX_KMAP_BEGIN));
 }
 
 static void highmem_setup(void)



[patch RFC 05/15] ARM: highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: Russell King 
Cc: Arnd Bergmann 
Cc: linux-arm-ker...@lists.infradead.org
---
Note: Completely untested
---
 arch/arm/Kconfig   |1 
 arch/arm/include/asm/highmem.h |   30 +++---
 arch/arm/mm/Makefile   |1 
 arch/arm/mm/highmem.c  |  121 -
 4 files changed, 23 insertions(+), 130 deletions(-)

--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1499,6 +1499,7 @@ config HAVE_ARCH_PFN_VALID
 config HIGHMEM
bool "High Memory Support"
depends on MMU
+   select KMAP_ATOMIC_GENERIC
help
  The address space of ARM processors is only 4 Gigabytes large
  and it has to accommodate user address space, kernel address
--- a/arch/arm/include/asm/highmem.h
+++ b/arch/arm/include/asm/highmem.h
@@ -46,19 +46,33 @@ extern pte_t *pkmap_page_table;
 
 #ifdef ARCH_NEEDS_KMAP_HIGH_GET
 extern void *kmap_high_get(struct page *page);
+
+#ifdef CONFIG_DEBUG_HIGHMEM
+extern void *arch_kmap_temporary_high_get(struct page *page);
 #else
+static inline void *arch_kmap_temporary_high_get(struct page *page)
+{
+   return kmap_high_get(page);
+}
+#endif /* !CONFIG_DEBUG_HIGHMEM */
+
+#else /* ARCH_NEEDS_KMAP_HIGH_GET */
 static inline void *kmap_high_get(struct page *page)
 {
return NULL;
 }
-#endif
+#endif /* !ARCH_NEEDS_KMAP_HIGH_GET */
 
-/*
- * The following functions are already defined by 
- * when CONFIG_HIGHMEM is not set.
- */
-#ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_pfn(unsigned long pfn);
-#endif
+#define arch_kmap_temp_post_map(vaddr, pteval) \
+   local_flush_tlb_kernel_page(vaddr)
+
+#define arch_kmap_temp_pre_unmap(vaddr)   \
+do {   \
+   if (cache_is_vivt())\
+   __cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE); \
+} while (0)
+
+#define arch_kmap_temp_post_unmap(vaddr)   \
+   local_flush_tlb_kernel_page(vaddr)
 
 #endif
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -19,7 +19,6 @@ obj-$(CONFIG_MODULES) += proc-syms.o
 obj-$(CONFIG_DEBUG_VIRTUAL)+= physaddr.o
 
 obj-$(CONFIG_ALIGNMENT_TRAP)   += alignment.o
-obj-$(CONFIG_HIGHMEM)  += highmem.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_ARM_PV_FIXUP) += pv-fixup-asm.o
 
--- a/arch/arm/mm/highmem.c
+++ /dev/null
@@ -1,121 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * arch/arm/mm/highmem.c -- ARM highmem support
- *
- * Author: Nicolas Pitre
- * Created:september 8, 2008
- * Copyright:  Marvell Semiconductors Inc.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include "mm.h"
-
-static inline void set_fixmap_pte(int idx, pte_t pte)
-{
-   unsigned long vaddr = __fix_to_virt(idx);
-   pte_t *ptep = virt_to_kpte(vaddr);
-
-   set_pte_ext(ptep, pte, 0);
-   local_flush_tlb_kernel_page(vaddr);
-}
-
-static inline pte_t get_fixmap_pte(unsigned long vaddr)
-{
-   pte_t *ptep = virt_to_kpte(vaddr);
-
-   return *ptep;
-}
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned int idx;
-   unsigned long vaddr;
-   void *kmap;
-   int type;
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-   /*
-* There is no cache coherency issue when non VIVT, so force the
-* dedicated kmap usage for better debugging purposes in that case.
-*/
-   if (!cache_is_vivt())
-   kmap = NULL;
-   else
-#endif
-   kmap = kmap_high_get(page);
-   if (kmap)
-   return kmap;
-
-   type = kmap_atomic_idx_push();
-
-   idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
-   vaddr = __fix_to_virt(idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-   /*
-* With debugging enabled, kunmap_atomic forces that entry to 0.
-* Make sure it was indeed properly unmapped.
-*/
-   BUG_ON(!pte_none(get_fixmap_pte(vaddr)));
-#endif
-   /*
-* When debugging is off, kunmap_atomic leaves the previous mapping
-* in place, so the contained TLB flush ensures the TLB is updated
-* with the new mapping.
-*/
-   set_fixmap_pte(idx, mk_pte(page, prot));
-
-   return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int idx, type;
-
-   if (kvaddr >= (void *)FIXADDR_START) {
-   type = kmap_atomic_idx();
-   idx = FIX_KMAP_BEGIN + type + KM_TYPE_NR * smp_processor_id();
-
-   if (cache_is_vivt())
-   __cpuc_flush_dcache_area((void *)vaddr, PAGE_SIZE);
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(vaddr != __fix_to_virt(idx));
-   set_fi

[patch RFC 08/15] mips/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: Thomas Bogendoerfer 
Cc: linux-m...@vger.kernel.org
---
Note: Completely untested
---
 arch/mips/Kconfig   |1 
 arch/mips/include/asm/highmem.h |4 +-
 arch/mips/mm/highmem.c  |   77 
 arch/mips/mm/init.c |3 -
 4 files changed, 3 insertions(+), 82 deletions(-)

--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2654,6 +2654,7 @@ config MIPS_CRC_SUPPORT
 config HIGHMEM
bool "High Memory Support"
depends on 32BIT && CPU_SUPPORTS_HIGHMEM && SYS_SUPPORTS_HIGHMEM && !CPU_MIPS32_3_5_EVA
+   select KMAP_ATOMIC_GENERIC
 
 config CPU_SUPPORTS_HIGHMEM
bool
--- a/arch/mips/include/asm/highmem.h
+++ b/arch/mips/include/asm/highmem.h
@@ -48,11 +48,11 @@ extern pte_t *pkmap_page_table;
 
 #define ARCH_HAS_KMAP_FLUSH_TLB
 extern void kmap_flush_tlb(unsigned long addr);
-extern void *kmap_atomic_pfn(unsigned long pfn);
 
 #define flush_cache_kmaps()    BUG_ON(cpu_has_dc_aliases)
 
-extern void kmap_init(void);
+#define arch_kmap_temp_post_map(vaddr, pteval) local_flush_tlb_one(vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)   local_flush_tlb_one(vaddr)
 
 #endif /* __KERNEL__ */
 
--- a/arch/mips/mm/highmem.c
+++ b/arch/mips/mm/highmem.c
@@ -8,8 +8,6 @@
 #include 
 #include 
 
-static pte_t *kmap_pte;
-
 unsigned long highstart_pfn, highend_pfn;
 
 void kmap_flush_tlb(unsigned long addr)
@@ -17,78 +15,3 @@ void kmap_flush_tlb(unsigned long addr)
flush_tlb_one(addr);
 }
 EXPORT_SYMBOL(kmap_flush_tlb);
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned long vaddr;
-   int idx, type;
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(!pte_none(*(kmap_pte - idx)));
-#endif
-   set_pte(kmap_pte-idx, mk_pte(page, prot));
-   local_flush_tlb_one((unsigned long)vaddr);
-
-   return (void*) vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int type __maybe_unused;
-
-   if (vaddr < FIXADDR_START)
-   return;
-
-   type = kmap_atomic_idx();
-#ifdef CONFIG_DEBUG_HIGHMEM
-   {
-   int idx = type + KM_TYPE_NR * smp_processor_id();
-
-   BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-
-   /*
-* force other mappings to Oops if they'll try to access
-* this pte without first remap it
-*/
-   pte_clear(&init_mm, vaddr, kmap_pte-idx);
-   local_flush_tlb_one(vaddr);
-   }
-#endif
-   kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
-
-/*
- * This is the same as kmap_atomic() but can map memory that doesn't
- * have a struct page associated with it.
- */
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-   unsigned long vaddr;
-   int idx, type;
-
-   preempt_disable();
-   pagefault_disable();
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-   set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL));
-   flush_tlb_one(vaddr);
-
-   return (void*) vaddr;
-}
-
-void __init kmap_init(void)
-{
-   unsigned long kmap_vstart;
-
-   /* cache the first kmap pte */
-   kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
-   kmap_pte = virt_to_kpte(kmap_vstart);
-}
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -402,9 +402,6 @@ void __init paging_init(void)
 
pagetable_init();
 
-#ifdef CONFIG_HIGHMEM
-   kmap_init();
-#endif
 #ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = MAX_DMA_PFN;
 #endif



[patch RFC 06/15] csky/mm/highmem: Switch to generic kmap atomic

2020-09-19 Thread Thomas Gleixner
Signed-off-by: Thomas Gleixner 
Cc: Guo Ren 
Cc: linux-c...@vger.kernel.org
---
Note: Completely untested
---
 arch/csky/Kconfig   |1 
 arch/csky/include/asm/highmem.h |4 +-
 arch/csky/mm/highmem.c  |   75 
 3 files changed, 5 insertions(+), 75 deletions(-)

--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -285,6 +285,7 @@ config NR_CPUS
 config HIGHMEM
bool "High Memory Support"
depends on !CPU_CK610
+   select KMAP_ATOMIC_GENERIC
default y
 
 config FORCE_MAX_ZONEORDER
--- a/arch/csky/include/asm/highmem.h
+++ b/arch/csky/include/asm/highmem.h
@@ -32,10 +32,12 @@ extern pte_t *pkmap_page_table;
 
 #define ARCH_HAS_KMAP_FLUSH_TLB
 extern void kmap_flush_tlb(unsigned long addr);
-extern void *kmap_atomic_pfn(unsigned long pfn);
 
 #define flush_cache_kmaps() do {} while (0)
 
+#define arch_kmap_temp_post_map(vaddr, pteval) kmap_flush_tlb(vaddr)
+#define arch_kmap_temp_post_unmap(vaddr)   kmap_flush_tlb(vaddr)
+
 extern void kmap_init(void);
 
 #endif /* __KERNEL__ */
--- a/arch/csky/mm/highmem.c
+++ b/arch/csky/mm/highmem.c
@@ -9,8 +9,6 @@
 #include 
 #include 
 
-static pte_t *kmap_pte;
-
 unsigned long highstart_pfn, highend_pfn;
 
 void kmap_flush_tlb(unsigned long addr)
@@ -19,67 +17,7 @@ void kmap_flush_tlb(unsigned long addr)
 }
 EXPORT_SYMBOL(kmap_flush_tlb);
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned long vaddr;
-   int idx, type;
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-#ifdef CONFIG_DEBUG_HIGHMEM
-   BUG_ON(!pte_none(*(kmap_pte - idx)));
-#endif
-   set_pte(kmap_pte-idx, mk_pte(page, prot));
-   flush_tlb_one((unsigned long)vaddr);
-
-   return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-   int idx;
-
-   if (vaddr < FIXADDR_START)
-   return;
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-   idx = KM_TYPE_NR*smp_processor_id() + kmap_atomic_idx();
-
-   BUG_ON(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-
-   pte_clear(&init_mm, vaddr, kmap_pte - idx);
-   flush_tlb_one(vaddr);
-#else
-   (void) idx; /* to kill a warning */
-#endif
-   kmap_atomic_idx_pop();
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
-
-/*
- * This is the same as kmap_atomic() but can map memory that doesn't
- * have a struct page associated with it.
- */
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-   unsigned long vaddr;
-   int idx, type;
-
-   pagefault_disable();
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-   set_pte(kmap_pte-idx, pfn_pte(pfn, PAGE_KERNEL));
-   flush_tlb_one(vaddr);
-
-   return (void *) vaddr;
-}
-
-static void __init kmap_pages_init(void)
+void __init kmap_init(void)
 {
unsigned long vaddr;
pgd_t *pgd;
@@ -96,14 +34,3 @@ static void __init kmap_pages_init(void)
pte = pte_offset_kernel(pmd, vaddr);
pkmap_page_table = pte;
 }
-
-void __init kmap_init(void)
-{
-   unsigned long vaddr;
-
-   kmap_pages_init();
-
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN);
-
-   kmap_pte = pte_offset_kernel((pmd_t *)pgd_offset_k(vaddr), vaddr);
-}



[patch RFC 00/15] mm/highmem: Provide a preemptible variant of kmap_atomic & friends

2020-09-19 Thread Thomas Gleixner
First of all, sorry for the horribly big Cc list!

Following up to the discussion in:

  https://lore.kernel.org/r/20200914204209.256266...@linutronix.de

this provides a preemptible variant of kmap_atomic & related
interfaces. This is achieved by:

 - Consolidating all kmap atomic implementations in generic code

 - Switching from per CPU storage of the kmap index to a per task storage

 - Adding a pteval array to the per task storage which contains the ptevals
   of the currently active temporary kmaps

 - Adding context switch code which checks whether the outgoing or the
   incoming task has active temporary kmaps. If so, the outgoing task's
   kmaps are removed and the incoming task's kmaps are restored.

 - Adding new interfaces k[un]map_temporary*() which are not disabling
   preemption and can be called from any context (except NMI).

   Contrary to kmap() which provides preemptible and "persistent" mappings,
   these interfaces are meant to replace the temporary mappings provided by
   kmap_atomic*() today.

This allows getting rid of conditional mapping choices and makes preemptible
short term mappings possible on 64bit, which are today forced to be
non-preemptible due to the highmem constraints. It clearly puts overhead on
the highmem users, but highmem is slow anyway.

This is not a wholesale conversion which makes kmap_atomic magically
preemptible because there might be usage sites which rely on the implicit
preempt disable. So this needs to be done on a case by case basis and the
call sites converted to kmap_temporary.
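
A conversion would then look roughly like this (editorial sketch;
kunmap_temporary() assumed as the unmap counterpart of the new interface):

        void *vaddr;

        /* Before: pick the variant depending on context */
        if (preemptible())
                vaddr = kmap(page);
        else
                vaddr = kmap_atomic(page);

        /* After: one interface, valid in any context except NMI */
        vaddr = kmap_temporary_page(page);
        /* ... access the mapping ... */
        kunmap_temporary(vaddr);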

Note that this is only lightly tested on X86 and completely untested on
all other architectures.

The lot is also available from

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git highmem

Thanks,

tglx
---
 a/arch/arm/mm/highmem.c   |  121 -
 a/arch/microblaze/mm/highmem.c|   78 -
 a/arch/nds32/mm/highmem.c |   48 
 a/arch/powerpc/mm/highmem.c   |   67 ---
 a/arch/sparc/mm/highmem.c |  115 
 arch/arc/Kconfig  |1 
 arch/arc/include/asm/highmem.h|8 +
 arch/arc/mm/highmem.c |   44 ---
 arch/arm/Kconfig  |1 
 arch/arm/include/asm/highmem.h|   30 +++--
 arch/arm/mm/Makefile  |1 
 arch/csky/Kconfig |1 
 arch/csky/include/asm/highmem.h   |4 
 arch/csky/mm/highmem.c|   75 -
 arch/microblaze/Kconfig   |1 
 arch/microblaze/include/asm/highmem.h |6 -
 arch/microblaze/mm/Makefile   |1 
 arch/microblaze/mm/init.c |6 -
 arch/mips/Kconfig |1 
 arch/mips/include/asm/highmem.h   |4 
 arch/mips/mm/highmem.c|   77 -
 arch/mips/mm/init.c   |3 
 arch/nds32/Kconfig.cpu|1 
 arch/nds32/include/asm/highmem.h  |   21 ++-
 arch/nds32/mm/Makefile|1 
 arch/powerpc/Kconfig  |1 
 arch/powerpc/include/asm/highmem.h|6 -
 arch/powerpc/mm/Makefile  |1 
 arch/powerpc/mm/mem.c |7 -
 arch/sparc/Kconfig|1 
 arch/sparc/include/asm/highmem.h  |7 -
 arch/sparc/mm/Makefile|3 
 arch/sparc/mm/srmmu.c |2 
 arch/x86/include/asm/fixmap.h |1 
 arch/x86/include/asm/highmem.h|   12 +-
 arch/x86/include/asm/iomap.h  |   29 +++--
 arch/x86/mm/highmem_32.c  |   59 --
 arch/x86/mm/init_32.c |   15 --
 arch/x86/mm/iomap_32.c|   57 --
 arch/xtensa/Kconfig   |1 
 arch/xtensa/include/asm/highmem.h |9 +
 arch/xtensa/mm/highmem.c  |   44 ---
 b/arch/x86/Kconfig|3 
 include/linux/highmem.h   |  141 +++-
 include/linux/io-mapping.h|2 
 include/linux/sched.h |9 +
 kernel/sched/core.c   |   10 +
 mm/Kconfig|3 
 mm/highmem.c  |  192 --
 49 files changed, 422 insertions(+), 909 deletions(-)


[patch RFC 03/15] x86/mm/highmem: Use generic kmap atomic implementation

2020-09-19 Thread Thomas Gleixner
Convert X86 to the generic kmap atomic implementation.

Make the iomap_atomic() naming convention consistent while at it.
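
For callers the rename is mechanical (sketch):

        /* old */ void __iomem *vaddr = iomap_atomic_prot_pfn(pfn, prot);
        /* new */ void __iomem *vaddr = iomap_atomic_pfn_prot(pfn, prot);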

Signed-off-by: Thomas Gleixner 
---
 arch/x86/Kconfig   |3 +-
 arch/x86/include/asm/fixmap.h  |1 
 arch/x86/include/asm/highmem.h |   12 ++--
 arch/x86/include/asm/iomap.h   |   17 ++-
 arch/x86/mm/highmem_32.c   |   59 -
 arch/x86/mm/init_32.c  |   15 --
 arch/x86/mm/iomap_32.c |   58 ++--
 include/linux/io-mapping.h |2 -
 8 files changed, 25 insertions(+), 142 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -14,10 +14,11 @@ config X86_32
select ARCH_WANT_IPC_PARSE_VERSION
select CLKSRC_I8253
select CLONE_BACKWARDS
+   select GENERIC_VDSO_32
select HAVE_DEBUG_STACKOVERFLOW
+   select KMAP_ATOMIC_GENERIC
select MODULES_USE_ELF_REL
select OLD_SIGACTION
-   select GENERIC_VDSO_32
 
 config X86_64
def_bool y
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -151,7 +151,6 @@ extern void reserve_top_address(unsigned
 
 extern int fixmaps_set;
 
-extern pte_t *kmap_pte;
 extern pte_t *pkmap_page_table;
 
 void __native_set_fixmap(enum fixed_addresses idx, pte_t pte);
--- a/arch/x86/include/asm/highmem.h
+++ b/arch/x86/include/asm/highmem.h
@@ -58,11 +58,17 @@ extern unsigned long highstart_pfn, high
 #define PKMAP_NR(virt)  ((virt-PKMAP_BASE) >> PAGE_SHIFT)
 #define PKMAP_ADDR(nr)  (PKMAP_BASE + ((nr) << PAGE_SHIFT))
 
-void *kmap_atomic_pfn(unsigned long pfn);
-void *kmap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot);
-
 #define flush_cache_kmaps()    do { } while (0)
 
+#define arch_kmap_temp_post_map(vaddr, pteval)  \
+   arch_flush_lazy_mmu_mode()
+
+#define arch_kmap_temp_post_unmap(vaddr)   \
+   do {\
+   flush_tlb_one_kernel((vaddr));  \
+   arch_flush_lazy_mmu_mode(); \
+   } while (0)
+
 extern void add_highpages_with_active_regions(int nid, unsigned long start_pfn,
unsigned long end_pfn);
 
--- a/arch/x86/include/asm/iomap.h
+++ b/arch/x86/include/asm/iomap.h
@@ -9,19 +9,20 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
-void __iomem *
-iomap_atomic_prot_pfn(unsigned long pfn, pgprot_t prot);
+void __iomem *iomap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
 
-void
-iounmap_atomic(void __iomem *kvaddr);
+static inline void iounmap_atomic(void __iomem *vaddr)
+{
+   kunmap_atomic_indexed((void __force *)vaddr);
+   preempt_enable();
+}
 
-int
-iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot);
+int iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot);
 
-void
-iomap_free(resource_size_t base, unsigned long size);
+void iomap_free(resource_size_t base, unsigned long size);
 
 #endif /* _ASM_X86_IOMAP_H */
--- a/arch/x86/mm/highmem_32.c
+++ b/arch/x86/mm/highmem_32.c
@@ -4,65 +4,6 @@
 #include  /* for totalram_pages */
 #include 
 
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   unsigned long vaddr;
-   int idx, type;
-
-   type = kmap_atomic_idx_push();
-   idx = type + KM_TYPE_NR*smp_processor_id();
-   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
-   BUG_ON(!pte_none(*(kmap_pte-idx)));
-   set_pte(kmap_pte-idx, mk_pte(page, prot));
-   arch_flush_lazy_mmu_mode();
-
-   return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-/*
- * This is the same as kmap_atomic() but can map memory that doesn't
- * have a struct page associated with it.
- */
-void *kmap_atomic_pfn(unsigned long pfn)
-{
-   return kmap_atomic_prot_pfn(pfn, kmap_prot);
-}
-EXPORT_SYMBOL_GPL(kmap_atomic_pfn);
-
-void kunmap_atomic_high(void *kvaddr)
-{
-   unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
-
-   if (vaddr >= __fix_to_virt(FIX_KMAP_END) &&
-   vaddr <= __fix_to_virt(FIX_KMAP_BEGIN)) {
-   int idx, type;
-
-   type = kmap_atomic_idx();
-   idx = type + KM_TYPE_NR * smp_processor_id();
-
-#ifdef CONFIG_DEBUG_HIGHMEM
-   WARN_ON_ONCE(vaddr != __fix_to_virt(FIX_KMAP_BEGIN + idx));
-#endif
-   /*
-* Force other mappings to Oops if they'll try to access this
-* pte without first remap it.  Keeping stale mappings around
-* is a bad idea also, in case the page changes cacheability
-* attributes or becomes a protected page in a hypervisor.
-*/
-   kpte_clear_flush(kmap_pte-idx, vaddr);
-   kmap_atomic_idx_pop();
-   arch_flush_lazy_mmu_mode();
-   }
-#ifdef CONFIG_DEBUG_HIGHMEM
-   else {
-   BUG_ON(vaddr < PAGE_OFFSET

[patch RFC 01/15] mm/highmem: Un-EXPORT __kmap_atomic_idx()

2020-09-19 Thread Thomas Gleixner
Nothing in modules can use that.

Signed-off-by: Thomas Gleixner 
---
 mm/highmem.c |2 --
 1 file changed, 2 deletions(-)

--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -108,8 +108,6 @@ static inline wait_queue_head_t *get_pkm
 atomic_long_t _totalhigh_pages __read_mostly;
 EXPORT_SYMBOL(_totalhigh_pages);
 
-EXPORT_PER_CPU_SYMBOL(__kmap_atomic_idx);
-
 unsigned int nr_free_highpages (void)
 {
struct zone *zone;



[patch RFC 04/15] arc/mm/highmem: Use generic kmap atomic implementation

2020-09-19 Thread Thomas Gleixner
Adopt the map ordering to match the other architectures and the generic
code.

Signed-off-by: Thomas Gleixner 
Cc: Vineet Gupta 
Cc: linux-snps-...@lists.infradead.org
---
Note: Completely untested
---
 arch/arc/Kconfig   |1 
 arch/arc/include/asm/highmem.h |8 ++-
 arch/arc/mm/highmem.c  |   44 -
 3 files changed, 9 insertions(+), 44 deletions(-)

--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -508,6 +508,7 @@ config LINUX_RAM_BASE
 config HIGHMEM
bool "High Memory Support"
select ARCH_DISCONTIGMEM_ENABLE
+   select KMAP_ATOMIC_GENERIC
help
  With ARC 2G:2G address split, only upper 2G is directly addressable by
  kernel. Enable this to potentially allow access to rest of 2G and PAE
--- a/arch/arc/include/asm/highmem.h
+++ b/arch/arc/include/asm/highmem.h
@@ -15,7 +15,10 @@
 #define FIXMAP_BASE(PAGE_OFFSET - FIXMAP_SIZE - PKMAP_SIZE)
 #define FIXMAP_SIZEPGDIR_SIZE  /* only 1 PGD worth */
 #define KM_TYPE_NR ((FIXMAP_SIZE >> PAGE_SHIFT)/NR_CPUS)
-#define FIXMAP_ADDR(nr)(FIXMAP_BASE + ((nr) << PAGE_SHIFT))
+
+#define FIX_KMAP_BEGIN (0)
+#define FIX_KMAP_END   ((FIXMAP_SIZE >> PAGE_SHIFT) - 1)
+#define FIXADDR_TOP(FIXMAP_BASE + FIXMAP_SIZE - PAGE_SIZE)
 
 /* start after fixmap area */
 #define PKMAP_BASE (FIXMAP_BASE + FIXMAP_SIZE)
@@ -29,6 +32,9 @@
 
 extern void kmap_init(void);
 
+#define arch_kmap_temp_post_unmap(vaddr)   \
+   local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE)
+
 static inline void flush_cache_kmaps(void)
 {
flush_cache_all();
--- a/arch/arc/mm/highmem.c
+++ b/arch/arc/mm/highmem.c
@@ -47,48 +47,6 @@
  */
 
 extern pte_t * pkmap_page_table;
-static pte_t * fixmap_page_table;
-
-void *kmap_atomic_high_prot(struct page *page, pgprot_t prot)
-{
-   int idx, cpu_idx;
-   unsigned long vaddr;
-
-   cpu_idx = kmap_atomic_idx_push();
-   idx = cpu_idx + KM_TYPE_NR * smp_processor_id();
-   vaddr = FIXMAP_ADDR(idx);
-
-   set_pte_at(&init_mm, vaddr, fixmap_page_table + idx,
-  mk_pte(page, prot));
-
-   return (void *)vaddr;
-}
-EXPORT_SYMBOL(kmap_atomic_high_prot);
-
-void kunmap_atomic_high(void *kv)
-{
-   unsigned long kvaddr = (unsigned long)kv;
-
-   if (kvaddr >= FIXMAP_BASE && kvaddr < (FIXMAP_BASE + FIXMAP_SIZE)) {
-
-   /*
-* Because preemption is disabled, this vaddr can be associated
-* with the current allocated index.
-* But in case of multiple live kmap_atomic(), it still relies on
-* callers to unmap in right order.
-*/
-   int cpu_idx = kmap_atomic_idx();
-   int idx = cpu_idx + KM_TYPE_NR * smp_processor_id();
-
-   WARN_ON(kvaddr != FIXMAP_ADDR(idx));
-
-   pte_clear(&init_mm, kvaddr, fixmap_page_table + idx);
-   local_flush_tlb_kernel_range(kvaddr, kvaddr + PAGE_SIZE);
-
-   kmap_atomic_idx_pop();
-   }
-}
-EXPORT_SYMBOL(kunmap_atomic_high);
 
 static noinline pte_t * __init alloc_kmap_pgtable(unsigned long kvaddr)
 {
@@ -113,5 +71,5 @@ void __init kmap_init(void)
pkmap_page_table = alloc_kmap_pgtable(PKMAP_BASE);
 
BUILD_BUG_ON(LAST_PKMAP > PTRS_PER_PTE);
-   fixmap_page_table = alloc_kmap_pgtable(FIXMAP_BASE);
+   alloc_kmap_pgtable(FIXMAP_BASE);
 }
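
For orientation, here is a sketch of how the generic code can be expected
to turn a kmap index into an address once the FIX_KMAP_* constants above
exist. The __fix_to_virt() shown is the conventional downward-growing
fixmap formula and the per-CPU index computation mirrors the removed ARC
code; neither line is taken verbatim from this series.

	/* Fixmap slots grow downwards from FIXADDR_TOP (illustrative). */
	#define __fix_to_virt(x)	(FIXADDR_TOP - ((x) << PAGE_SHIFT))

	/* Each CPU owns KM_TYPE_NR consecutive slots (assumed layout). */
	int idx = kmap_idx + KM_TYPE_NR * smp_processor_id();
	unsigned long vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);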



[patch RFC 02/15] highmem: Provide generic variant of kmap_atomic*

2020-09-19 Thread Thomas Gleixner
The kmap_atomic* interfaces in all architectures are pretty much the same
except for post map operations (flush) and pre- and post unmap operations.

Provide a generic variant for that.
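
To make that concrete, here is a minimal sketch of the hook pattern such
a generic variant is built around. arch_kmap_temp_post_unmap is the name
used elsewhere in this series; the other two hook names and the no-op
defaults are assumptions for illustration only.

	/* Architectures override only the hooks they need; anything
	 * undefined collapses to a no-op (sketch, not the actual mm code). */
	#ifndef arch_kmap_temp_post_map
	# define arch_kmap_temp_post_map(vaddr, pteval)	do { } while (0)
	#endif
	#ifndef arch_kmap_temp_pre_unmap
	# define arch_kmap_temp_pre_unmap(vaddr)	do { } while (0)
	#endif
	#ifndef arch_kmap_temp_post_unmap
	# define arch_kmap_temp_post_unmap(vaddr)	do { } while (0)
	#endif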

Signed-off-by: Thomas Gleixner 
---
 include/linux/highmem.h |   87 ---
 mm/Kconfig  |3 +
 mm/highmem.c|  119 +++-
 3 files changed, 192 insertions(+), 17 deletions(-)

--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -31,9 +31,22 @@ static inline void invalidate_kernel_vma
 
#include <asm/kmap_types.h>
 
+/*
+ * Outside of CONFIG_HIGHMEM to support X86 32bit iomap_atomic() cruft.
+ */
+#ifdef CONFIG_KMAP_ATOMIC_GENERIC
+void *kmap_atomic_pfn_prot(unsigned long pfn, pgprot_t prot);
+void *kmap_atomic_page_prot(struct page *page, pgprot_t prot);
+void kunmap_atomic_indexed(void *vaddr);
+# ifndef ARCH_NEEDS_KMAP_HIGH_GET
+static inline void *arch_kmap_temporary_high_get(struct page *page)
+{
+   return NULL;
+}
+# endif
+#endif
+
 #ifdef CONFIG_HIGHMEM
-extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
-extern void kunmap_atomic_high(void *kvaddr);
#include <asm/highmem.h>
 
 #ifndef ARCH_HAS_KMAP_FLUSH_TLB
@@ -81,6 +94,11 @@ static inline void kunmap(struct page *p
  * be used in IRQ contexts, so in some (very limited) cases we need
  * it.
  */
+
+#ifndef CONFIG_KMAP_ATOMIC_GENERIC
+void *kmap_atomic_high_prot(struct page *page, pgprot_t prot);
+void kunmap_atomic_high(void *kvaddr);
+
 static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
 {
preempt_disable();
@@ -89,7 +107,38 @@ static inline void *kmap_atomic_prot(str
return page_address(page);
return kmap_atomic_high_prot(page, prot);
 }
-#define kmap_atomic(page)  kmap_atomic_prot(page, kmap_prot)
+
+static inline void __kunmap_atomic(void *vaddr)
+{
+   kunmap_atomic_high(vaddr);
+   pagefault_enable();
+}
+#else /* CONFIG_KMAP_ATOMIC_GENERIC */
+
+static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+{
+   preempt_disable();
+   return kmap_atomic_page_prot(page, prot);
+}
+
+static inline void *kmap_atomic_pfn(unsigned long pfn)
+{
+   preempt_disable();
+   return kmap_atomic_pfn_prot(pfn, kmap_prot);
+}
+
+static inline void __kunmap_atomic(void *addr)
+{
+   kunmap_atomic_indexed(addr);
+}
+
+
+#endif /* CONFIG_KMAP_ATOMIC_GENERIC */
+
+static inline void *kmap_atomic(struct page *page)
+{
+   return kmap_atomic_prot(page, kmap_prot);
+}
 
 /* declarations for linux/mm/highmem.c */
 unsigned int nr_free_highpages(void);
@@ -157,21 +206,29 @@ static inline void *kmap_atomic(struct p
pagefault_disable();
return page_address(page);
 }
-#define kmap_atomic_prot(page, prot)   kmap_atomic(page)
 
-static inline void kunmap_atomic_high(void *addr)
+static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
+{
+   return kmap_atomic(page);
+}
+
+static inline void *kmap_atomic_pfn(unsigned long pfn)
+{
+   return kmap_atomic(pfn_to_page(pfn));
+}
+
+static inline void __kunmap_atomic(void *addr)
 {
/*
 * Mostly nothing to do in the CONFIG_HIGHMEM=n case as kunmap_atomic()
-* handles re-enabling faults + preemption
+* handles preemption
 */
 #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
kunmap_flush_on_unmap(addr);
 #endif
+   pagefault_enable();
 }
 
-#define kmap_atomic_pfn(pfn)   kmap_atomic(pfn_to_page(pfn))
-
 #define kmap_flush_unused()do {} while(0)
 
 #endif /* CONFIG_HIGHMEM */
@@ -213,14 +270,12 @@ static inline void kmap_atomic_idx_pop(v
  * Prevent people trying to call kunmap_atomic() as if it were kunmap()
  * kunmap_atomic() should get the return value of kmap_atomic, not the page.
  */
-#define kunmap_atomic(addr) \
-do {\
-   BUILD_BUG_ON(__same_type((addr), struct page *));   \
-   kunmap_atomic_high(addr);  \
-   pagefault_enable(); \
-   preempt_enable();   \
-} while (0)
-
+#define kunmap_atomic(addr)\
+   do {\
+   BUILD_BUG_ON(__same_type((addr), struct page *));   \
+   __kunmap_atomic(addr);  \
+   preempt_enable();   \
+   } while (0)
 
 /* when CONFIG_HIGHMEM is not set these will be plain clear/copy_page */
 #ifndef clear_user_highpage
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -868,4 +868,7 @@ config ARCH_HAS_HUGEPD
 config MAPPING_DIRTY_HELPERS
 bool
 
+config KMAP_ATOMIC_GENERIC
+   bool
+
 endmenu
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -314,6 +314,15 @@ void *kmap_high_get(struct page *page)
unlock_kmap_a
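
To make the contract of the reworked kunmap_atomic() above concrete,
here is a minimal hypothetical caller; note that the unmap takes the
address returned by the map, never the page, which is exactly what the
BUILD_BUG_ON() in the macro enforces.

	/* Hypothetical user: copy one (possibly highmem) page out. */
	static void copy_from_page(struct page *page, void *buf)
	{
		void *vaddr = kmap_atomic(page);

		memcpy(buf, vaddr, PAGE_SIZE);
		kunmap_atomic(vaddr);	/* the address, not the page */
	}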

[PATCH v2 1/4] selftests/seccomp: Record syscall during ptrace entry

2020-09-19 Thread Kees Cook
In preparation for performing actions during ptrace syscall exit, save
the syscall number during ptrace syscall entry. Some architectures do
not have the syscall number available during ptrace syscall exit.
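
In other words, the tracer must latch the number while it is still
trustworthy. A simplified sketch of the save-at-entry pattern the patch
below implements (names here are illustrative, not the fixture's):

	/* Keep per-tracee state so the exit stop never has to re-read
	 * a register that may no longer hold the syscall number. */
	struct tracer_state { long syscall_nr; };

	static void on_syscall_stop(struct tracer_state *s, bool entry,
				    long nr_from_regs)
	{
		if (entry)
			s->syscall_nr = nr_from_regs;	/* valid only here */
		/* at exit, consult s->syscall_nr instead */
	}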

Suggested-by: Thadeu Lima de Souza Cascardo 
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-casca...@canonical.com/
Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 40 +--
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index bc0fb463c709..c0311b4c736b 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1949,12 +1949,19 @@ void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
 
 }
 
+FIXTURE(TRACE_syscall) {
+   struct sock_fprog prog;
+   pid_t tracer, mytid, mypid, parent;
+   long syscall_nr;
+};
+
 void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
   int status, void *args)
 {
-   int ret, nr;
+   int ret;
unsigned long msg;
static bool entry;
+   FIXTURE_DATA(TRACE_syscall) *self = args;
 
/*
 * The traditional way to tell PTRACE_SYSCALL entry/exit
@@ -1968,24 +1975,31 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
 
-   if (!entry)
+   /*
+* Some architectures only support setting return values during
+* syscall exit under ptrace, and on exit the syscall number may
+* no longer be available. Therefore, save the initial syscall
+* number here, so it can be examined during both entry and exit
+* phases.
+*/
+   if (entry)
+   self->syscall_nr = get_syscall(_metadata, tracee);
+   else
return;
 
-   nr = get_syscall(_metadata, tracee);
-
-   if (nr == __NR_getpid)
+   switch (self->syscall_nr) {
+   case __NR_getpid:
change_syscall(_metadata, tracee, __NR_getppid, 0);
-   if (nr == __NR_gettid)
+   break;
+   case __NR_gettid:
change_syscall(_metadata, tracee, -1, 45000);
-   if (nr == __NR_openat)
+   break;
+   case __NR_openat:
change_syscall(_metadata, tracee, -1, -ESRCH);
+   break;
+   }
 }
 
-FIXTURE(TRACE_syscall) {
-   struct sock_fprog prog;
-   pid_t tracer, mytid, mypid, parent;
-};
-
 FIXTURE_VARIANT(TRACE_syscall) {
/*
 * All of the SECCOMP_RET_TRACE behaviors can be tested with either
@@ -2044,7 +2058,7 @@ FIXTURE_SETUP(TRACE_syscall)
self->tracer = setup_trace_fixture(_metadata,
   variant->use_ptrace ? tracer_ptrace
   : tracer_seccomp,
-  NULL, variant->use_ptrace);
+  self, variant->use_ptrace);
 
ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
ASSERT_EQ(0, ret);
-- 
2.25.1



[PATCH v2 4/4] selftests/clone3: Avoid OS-defined clone_args

2020-09-19 Thread Kees Cook
As the UAPI headers start to appear in distros, the selftests must not
pick up a distro's (possibly outdated) struct clone_args, or modern
features cannot be tested; rename the local definition to "struct
__clone_args". Additionally, update the struct size macro names to
match the UAPI names.
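
For reference, the selftest header can be expected to carry something
like the following local copy after the rename (field layout and size
macros per the clone3 UAPI; shown as a sketch, not the exact hunk):

	struct __clone_args {
		__aligned_u64 flags;
		__aligned_u64 pidfd;
		__aligned_u64 child_tid;
		__aligned_u64 parent_tid;
		__aligned_u64 exit_signal;
		__aligned_u64 stack;
		__aligned_u64 stack_size;
		__aligned_u64 tls;
	#define CLONE_ARGS_SIZE_VER0 64 /* sizeof first published struct */
		__aligned_u64 set_tid;
		__aligned_u64 set_tid_size;
	#define CLONE_ARGS_SIZE_VER1 80 /* sizeof second published struct */
		__aligned_u64 cgroup;
	#define CLONE_ARGS_SIZE_VER2 88 /* sizeof third published struct */
	};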

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/clone3/clone3.c   | 45 ---
 .../clone3/clone3_cap_checkpoint_restore.c|  4 +-
 .../selftests/clone3/clone3_clear_sighand.c   |  2 +-
 .../selftests/clone3/clone3_selftests.h   | 24 +-
 .../testing/selftests/clone3/clone3_set_tid.c |  4 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c |  4 +-
 6 files changed, 40 insertions(+), 43 deletions(-)

diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index b7e6dec36173..42be3b925830 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -20,13 +20,6 @@
 #include "../kselftest.h"
 #include "clone3_selftests.h"
 
-/*
- * Different sizes of struct clone_args
- */
-#ifndef CLONE3_ARGS_SIZE_V0
-#define CLONE3_ARGS_SIZE_V0 64
-#endif
-
 enum test_mode {
CLONE3_ARGS_NO_TEST,
CLONE3_ARGS_ALL_0,
@@ -38,13 +31,13 @@ enum test_mode {
 
 static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
 {
-   struct clone_args args = {
+   struct __clone_args args = {
.flags = flags,
.exit_signal = SIGCHLD,
};
 
struct clone_args_extended {
-   struct clone_args args;
+   struct __clone_args args;
__aligned_u64 excess_space[2];
} args_ext;
 
@@ -52,11 +45,11 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
int status;
 
memset(&args_ext, 0, sizeof(args_ext));
-   if (size > sizeof(struct clone_args))
+   if (size > sizeof(struct __clone_args))
args_ext.excess_space[1] = 1;
 
if (size == 0)
-   size = sizeof(struct clone_args);
+   size = sizeof(struct __clone_args);
 
switch (test_mode) {
case CLONE3_ARGS_ALL_0:
@@ -77,9 +70,9 @@ static int call_clone3(uint64_t flags, size_t size, enum test_mode test_mode)
break;
}
 
-   memcpy(&args_ext.args, &args, sizeof(struct clone_args));
+   memcpy(&args_ext.args, &args, sizeof(struct __clone_args));
 
-   pid = sys_clone3((struct clone_args *)&args_ext, size);
+   pid = sys_clone3((struct __clone_args *)&args_ext, size);
if (pid < 0) {
ksft_print_msg("%s - Failed to create new process\n",
strerror(errno));
@@ -144,14 +137,14 @@ int main(int argc, char *argv[])
else
ksft_test_result_skip("Skipping clone3() with CLONE_NEWPID\n");
 
-   /* Do a clone3() with CLONE3_ARGS_SIZE_V0. */
-   test_clone3(0, CLONE3_ARGS_SIZE_V0, 0, CLONE3_ARGS_NO_TEST);
+   /* Do a clone3() with CLONE_ARGS_SIZE_VER0. */
+   test_clone3(0, CLONE_ARGS_SIZE_VER0, 0, CLONE3_ARGS_NO_TEST);
 
-   /* Do a clone3() with CLONE3_ARGS_SIZE_V0 - 8 */
-   test_clone3(0, CLONE3_ARGS_SIZE_V0 - 8, -EINVAL, CLONE3_ARGS_NO_TEST);
+   /* Do a clone3() with CLONE_ARGS_SIZE_VER0 - 8 */
+   test_clone3(0, CLONE_ARGS_SIZE_VER0 - 8, -EINVAL, CLONE3_ARGS_NO_TEST);
 
/* Do a clone3() with sizeof(struct clone_args) + 8 */
-   test_clone3(0, sizeof(struct clone_args) + 8, 0, CLONE3_ARGS_NO_TEST);
+   test_clone3(0, sizeof(struct __clone_args) + 8, 0, CLONE3_ARGS_NO_TEST);
 
/* Do a clone3() with exit_signal having highest 32 bits non-zero */
test_clone3(0, 0, -EINVAL, CLONE3_ARGS_INVAL_EXIT_SIGNAL_BIG);
@@ -165,31 +158,31 @@ int main(int argc, char *argv[])
/* Do a clone3() with NSIG < exit_signal < CSIG */
test_clone3(0, 0, -EINVAL, CLONE3_ARGS_INVAL_EXIT_SIGNAL_NSIG);
 
-   test_clone3(0, sizeof(struct clone_args) + 8, 0, CLONE3_ARGS_ALL_0);
+   test_clone3(0, sizeof(struct __clone_args) + 8, 0, CLONE3_ARGS_ALL_0);
 
-   test_clone3(0, sizeof(struct clone_args) + 16, -E2BIG,
+   test_clone3(0, sizeof(struct __clone_args) + 16, -E2BIG,
CLONE3_ARGS_ALL_0);
 
-   test_clone3(0, sizeof(struct clone_args) * 2, -E2BIG,
+   test_clone3(0, sizeof(struct __clone_args) * 2, -E2BIG,
CLONE3_ARGS_ALL_0);
 
/* Do a clone3() with > page size */
test_clone3(0, getpagesize() + 8, -E2BIG, CLONE3_ARGS_NO_TEST);
 
-   /* Do a clone3() with CLONE3_ARGS_SIZE_V0 in a new PID NS. */
+   /* Do a clone3() with CLONE_ARGS_SIZE_VER0 in a new PID NS. */
if (uid == 0)
-   test_clone3(CLONE_NEWPID, CLONE3_ARGS_SIZE_V0, 0,
+   test_clone3(CLONE_NEWPID, CLONE_ARGS_SIZE_VER0, 0,
CLONE3_ARGS_NO_TEST);
else
ksft_test_result_skip("Skipping clone3() with CLONE_NEWPID\n");
 
-

[PATCH v2 0/4] selftests/seccomp: Refactor change_syscall()

2020-09-19 Thread Kees Cook
v1: https://lore.kernel.org/lkml/20200912110820.597135-1-keesc...@chromium.org
v2:
- Took Acked patches into -next
- refactored powerpc syscall setting implementation
- refactored clone3 args implementation

Hi,

This finishes the refactoring of the seccomp selftest logic used for
ptrace syscall number/return handling on powerpc. Additionally it
fixes clone3 (which seccomp depends on for testing) to run under MIPS
where an old struct clone_args has become visible.

(FWIW, I expect to take these via the seccomp tree.)

Thanks,

Kees Cook (4):
  selftests/seccomp: Record syscall during ptrace entry
  selftests/seccomp: Allow syscall nr and ret value to be set separately
  selftests/seccomp: powerpc: Set syscall return during ptrace syscall
exit
  selftests/clone3: Avoid OS-defined clone_args

 tools/testing/selftests/clone3/clone3.c   |  45 +++
 .../clone3/clone3_cap_checkpoint_restore.c|   4 +-
 .../selftests/clone3/clone3_clear_sighand.c   |   2 +-
 .../selftests/clone3/clone3_selftests.h   |  24 ++--
 .../testing/selftests/clone3/clone3_set_tid.c |   4 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c | 120 ++
 6 files changed, 131 insertions(+), 68 deletions(-)

-- 
2.25.1



[PATCH v2 2/4] selftests/seccomp: Allow syscall nr and ret value to be set separately

2020-09-19 Thread Kees Cook
In preparation for setting syscall nr and ret values separately, refactor
the helpers to take a pointer to a value, so that a NULL can indicate
"do not change this respective value". This is done to keep the regset
read/write happening once and in one code path.
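
A short sketch of the resulting calling convention (illustrative calls;
not all of these appear verbatim in the patch):

	long nr = __NR_getppid, ret = -ESRCH;

	__change_syscall(_metadata, tracee, &nr, NULL);  /* number only */
	__change_syscall(_metadata, tracee, NULL, &ret); /* return only */
	__change_syscall(_metadata, tracee, NULL, NULL); /* no regset access */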

Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 59 +++
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index c0311b4c736b..98ce5e8a6398 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1888,27 +1888,47 @@ int get_syscall(struct __test_metadata *_metadata, pid_t tracee)
 }
 
 /* Architecture-specific syscall changing routine. */
-void change_syscall(struct __test_metadata *_metadata,
-   pid_t tracee, int syscall, int result)
+void __change_syscall(struct __test_metadata *_metadata,
+   pid_t tracee, long *syscall, long *ret)
 {
ARCH_REGS orig, regs;
 
+   /* Do not get/set registers if we have nothing to do. */
+   if (!syscall && !ret)
+   return;
+
EXPECT_EQ(0, ARCH_GETREGS(regs)) {
return;
}
orig = regs;
 
-   SYSCALL_NUM_SET(regs, syscall);
+   if (syscall)
+   SYSCALL_NUM_SET(regs, *syscall);
 
-   /* If syscall is skipped, change return value. */
-   if (syscall == -1)
-   SYSCALL_RET_SET(regs, result);
+   if (ret)
+   SYSCALL_RET_SET(regs, *ret);
 
/* Flush any register changes made. */
if (memcmp(&orig, &regs, sizeof(orig)) != 0)
EXPECT_EQ(0, ARCH_SETREGS(regs));
 }
 
+/* Change only syscall number. */
+void change_syscall_nr(struct __test_metadata *_metadata,
+  pid_t tracee, long syscall)
+{
+   __change_syscall(_metadata, tracee, &syscall, NULL);
+}
+
+/* Change syscall return value (and set syscall number to -1). */
+void change_syscall_ret(struct __test_metadata *_metadata,
+   pid_t tracee, long ret)
+{
+   long syscall = -1;
+
+   __change_syscall(_metadata, tracee, &syscall, &ret);
+}
+
 void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
int status, void *args)
 {
@@ -1924,17 +1944,17 @@ void tracer_seccomp(struct __test_metadata *_metadata, pid_t tracee,
case 0x1002:
/* change getpid to getppid. */
EXPECT_EQ(__NR_getpid, get_syscall(_metadata, tracee));
-   change_syscall(_metadata, tracee, __NR_getppid, 0);
+   change_syscall_nr(_metadata, tracee, __NR_getppid);
break;
case 0x1003:
/* skip gettid with valid return code. */
EXPECT_EQ(__NR_gettid, get_syscall(_metadata, tracee));
-   change_syscall(_metadata, tracee, -1, 45000);
+   change_syscall_ret(_metadata, tracee, 45000);
break;
case 0x1004:
/* skip openat with error. */
EXPECT_EQ(__NR_openat, get_syscall(_metadata, tracee));
-   change_syscall(_metadata, tracee, -1, -ESRCH);
+   change_syscall_ret(_metadata, tracee, -ESRCH);
break;
case 0x1005:
/* do nothing (allow getppid) */
@@ -1961,6 +1981,8 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
int ret;
unsigned long msg;
static bool entry;
+   long syscall_nr_val, syscall_ret_val;
+   long *syscall_nr = NULL, *syscall_ret = NULL;
FIXTURE_DATA(TRACE_syscall) *self = args;
 
/*
@@ -1987,17 +2009,30 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
else
return;
 
+   syscall_nr = &syscall_nr_val;
+   syscall_ret = &syscall_ret_val;
+
+   /* Now handle the actual rewriting cases. */
switch (self->syscall_nr) {
case __NR_getpid:
-   change_syscall(_metadata, tracee, __NR_getppid, 0);
+   syscall_nr_val = __NR_getppid;
+   /* Never change syscall return for this case. */
+   syscall_ret = NULL;
break;
case __NR_gettid:
-   change_syscall(_metadata, tracee, -1, 45000);
+   syscall_nr_val = -1;
+   syscall_ret_val = 45000;
break;
case __NR_openat:
-   change_syscall(_metadata, tracee, -1, -ESRCH);
+   syscall_nr_val = -1;
+   syscall_ret_val = -ESRCH;
break;
+   default:
+   /* Unhandled, do nothing. */
+   return;
}
+
+   __change_syscall(_metadata, tracee, syscall_nr, syscall_ret);
 }
 
 FIXTURE_VARIANT(TRACE_syscall) {
-- 
2.25.1



[PATCH v2 3/4] selftests/seccomp: powerpc: Set syscall return during ptrace syscall exit

2020-09-19 Thread Kees Cook
Some archs (like powerpc) only support changing the return code during
syscall exit when ptrace is used. Test entry vs exit phases for which
portions of the syscall number and return values need to be set at which
different phases. For non-powerpc, all changes are made during ptrace
syscall entry, as before. For powerpc, the syscall number is changed at
ptrace syscall entry and the syscall return value is changed on ptrace
syscall exit.
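
Summarized as a sketch of the intended phase split (powerpc behavior
keyed off the SYSCALL_RET_SET_ON_PTRACE_EXIT define added below):

	/*
	 *                 entry stop        exit stop
	 * most arches:    set nr + ret      (nothing)
	 * powerpc:        set nr            set ret
	 */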

Reported-by: Thadeu Lima de Souza Cascardo 
Suggested-by: Thadeu Lima de Souza Cascardo 
Link: https://lore.kernel.org/linux-kselftest/20200911181012.171027-1-casca...@canonical.com/
Fixes: 58d0a862f573 ("seccomp: add tests for ptrace hole")
Signed-off-by: Kees Cook 
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 25 ---
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 98ce5e8a6398..894c2404d321 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1765,6 +1765,7 @@ TEST_F(TRACE_poke, getpid_runs_normally)
(_regs).ccr &= ~0x1000; \
}   \
} while (0)
+# define SYSCALL_RET_SET_ON_PTRACE_EXIT
 #elif defined(__s390__)
 # define ARCH_REGS s390_regs
 # define SYSCALL_NUM(_regs)(_regs).gprs[2]
@@ -1853,6 +1854,18 @@ TEST_F(TRACE_poke, getpid_runs_normally)
} while (0)
 #endif
 
+/*
+ * Some architectures (e.g. powerpc) can only set syscall
+ * return values on syscall exit during ptrace.
+ */
+const bool ptrace_entry_set_syscall_nr = true;
+const bool ptrace_entry_set_syscall_ret =
+#ifndef SYSCALL_RET_SET_ON_PTRACE_EXIT
+   true;
+#else
+   false;
+#endif
+
 /*
  * Use PTRACE_GETREGS and PTRACE_SETREGS when available. This is useful for
  * architectures without HAVE_ARCH_TRACEHOOK (e.g. User-mode Linux).
@@ -2006,11 +2019,15 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
 */
if (entry)
self->syscall_nr = get_syscall(_metadata, tracee);
-   else
-   return;
 
-   syscall_nr = &syscall_nr_val;
-   syscall_ret = &syscall_ret_val;
+   /*
+* Depending on the architecture's syscall setting abilities, we
+* pick which things to set during this phase (entry or exit).
+*/
+   if (entry == ptrace_entry_set_syscall_nr)
+   syscall_nr = &syscall_nr_val;
+   if (entry == ptrace_entry_set_syscall_ret)
+   syscall_ret = &syscall_ret_val;
 
/* Now handle the actual rewriting cases. */
switch (self->syscall_nr) {
-- 
2.25.1



[PATCH] KVM: PPC: Book3S: Remove redundant initialization of variable ret

2020-09-19 Thread Jing Xiangfeng
The variable ret is initialized to '-ENOMEM', but that value is always
overwritten before it is used, so the initialization is meaningless.
Remove it.

Signed-off-by: Jing Xiangfeng 
---
 arch/powerpc/kvm/book3s_64_vio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 1a529df0ab44..b277a75cd1be 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -283,7 +283,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvmppc_spapr_tce_table *siter;
struct mm_struct *mm = kvm->mm;
unsigned long npages, size = args->size;
-   int ret = -ENOMEM;
+   int ret;
 
if (!args->size || args->page_shift < 12 || args->page_shift > 34 ||
(args->offset + args->size > (ULLONG_MAX >> args->page_shift)))
-- 
2.17.1