Re: [PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-26 Thread Segher Boessenkool
On Tue, Mar 26, 2019 at 08:18:07PM +1100, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Mon, Mar 25, 2019 at 11:33:56PM +1100, Michael Ellerman wrote:
> >> Segher Boessenkool  writes:
> >> > On Fri, Mar 22, 2019 at 11:37:24PM +1100, Michael Ellerman wrote:
> >> >> +   clrldi  r6,r4,(64-12)   // r6 = r4 & 0xfff
> >> >
> >> > You can just write
> >> >   rlwinm r6,r4,0,0x0fff
> >> 
> >> > if that is clearer?  Or do you still want a comment with that :-)
> >> 
> >> I don't think it's clearer doing a rotate of zero bits :)
> >> 
> >> And yeah I'd probably still leave the comment, so I'm inclined to stick
> >> with the clrldi?
> >
> > I always have to think what the clrldi etc. do exactly, while with rlwinm
> > it is obvious.  But yeah this may be different for other people who are
> > used to different idiom.
> 
> Interesting, I'm the opposite. You know ppc assembler better than me so
> I guess I just need to spend more time on it and embrace the zen of the
> rotate instructions.

There is only one rlwinm instruction, but there are 9 extended mnemonics
for it.  (And similarly for the 64-bit ops, where there are 3 basic
mnemonics but 9 extended mnemonics).  Of course I use {s,rot}{l,r}{w,d}i,
those are actually *simpler* than rlwinm / rldic{,l,r}, but not the other
stuff.
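
For reference, a sketch of a few of those mappings, as the extended-mnemonic
appendix of the ISA defines them:

  slwi   rA,rS,n  ==  rlwinm rA,rS,n,0,31-n     # shift left immediate
  srwi   rA,rS,n  ==  rlwinm rA,rS,32-n,n,31    # shift right immediate
  clrlwi rA,rS,n  ==  rlwinm rA,rS,0,n,31       # clear the n high bits
  clrldi rA,rS,n  ==  rldicl rA,rS,0,n          # the 64-bit counterpart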

It may also be because the disassembler doesn't show these things.  Dunno.

> I'm not sure it's vastly more hostile though than `andi. 6,4,0xffff`
> being valid but `andi. 6,4,0x1ffff` not being valid.

But that is the same for *all* UIMM insns.

> Yeah, I guess a new `andi` instruction is the only real answer :)

Too bad there is no space in the opcode map for it; there was already none
in the original POWER architecture (where this was "andil.") :-/
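
The immediate-logical opcodes show how full that corner of the map already
is (major opcode numbers per the ISA; the old POWER names in comments):

  ori    r6,r4,0xfff    # opcode 24 ("oril"): OR has a non-record form
  oris   r6,r4,0xfff    # opcode 25 ("oriu")
  xori   r6,r4,0xfff    # opcode 26 ("xoril")
  andi.  r6,r4,0xfff    # opcode 28 ("andil."): AND immediate only records
  andis. r6,r4,0xfff    # opcode 29 ("andiu.")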


Segher


Re: [PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-26 Thread Michael Ellerman
Segher Boessenkool  writes:
> On Mon, Mar 25, 2019 at 11:33:56PM +1100, Michael Ellerman wrote:
>> Segher Boessenkool  writes:
>> > On Fri, Mar 22, 2019 at 11:37:24PM +1100, Michael Ellerman wrote:
>> >> + clrldi  r6,r4,(64-12)   // r6 = r4 & 0xfff
>> >
>> > You can just write
>> >   rlwinm r6,r4,0,0x0fff
>> 
>> > if that is clearer?  Or do you still want a comment with that :-)
>> 
>> I don't think it's clearer doing a rotate of zero bits :)
>> 
>> And yeah I'd probably still leave the comment, so I'm inclined to stick
>> with the clrldi?
>
> I always have to think what the clrldi etc. do exactly, while with rlwinm
> it is obvious.  But yeah this may be different for other people who are
> used to different idiom.

Interesting, I'm the opposite. You know ppc assembler better than me so
I guess I just need to spend more time on it and embrace the zen of the
rotate instructions.

>> Would be nice if the assembler could support:
>> 
>>  andi    r6, r4, 0x0fff
>> 
>> And turn it into the rlwinm, or rldicl :)
>
> The extended mnemonics are *simple*, *one-to-one* mappings.

It would still be simple and 1:1, but would only be valid for certain
constants :)

> Having "andi. 6,4,0x0f0f" a valid insn, but an extended mnemonic "andi 
> 6,4,0x0f0f"
> that is not (and the other way around for say 0xffff) would violate that.

I agree that's a bit of a foot gun.

I'm not sure it's vastly more hostile though than `andi. 6,4,0xffff`
being valid but `andi. 6,4,0x1ffff` not being valid.

The assembler could print a nice error saying you need to use a
contiguous mask. And I mean how often do you andi. with a mask that
isn't contiguous?
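
(Checking contiguity is cheap, too. A sketch, assuming the candidate mask
is in r6 and r0 is free as a scratch:

  neg   r0,r6       # r0 = -mask
  and   r0,r0,r6    # r0 = mask & -mask: the lowest set bit
  add   r0,r0,r6    # a carry ripples cleanly through one contiguous run
  and.  r0,r0,r6    # CR0 eq iff the run was contiguous; a zero mask also
                    # passes and needs its own check

rlwinm additionally accepts wrap-around masks; the same test on the
complement of the mask would catch those.)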

> You could do some assembler macro, that can also expand to multiple insns
> where that is useful.  Also one for loading constants, etc.  The downside
> to that is you often do care how many insns are generated.
>
> Instead you could do a macro for only those cases that can be done with *one*
> insn.  But that then is pretty restricted in use, and people have to learn
> what values are valid.
>
> I don't see a perfect solution.

Yeah, I guess a new `andi` instruction is the only real answer :)

cheers


Re: [PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-25 Thread Segher Boessenkool
On Mon, Mar 25, 2019 at 11:33:56PM +1100, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Fri, Mar 22, 2019 at 11:37:24PM +1100, Michael Ellerman wrote:
> >> +  clrldi  r6,r4,(64-12)   // r6 = r4 & 0xfff
> >
> > You can just write
> >   rlwinm r6,r4,0,0x0fff
> 
> > if that is clearer?  Or do you still want a comment with that :-)
> 
> I don't think it's clearer doing a rotate of zero bits :)
> 
> And yeah I'd probably still leave the comment, so I'm inclined to stick
> with the clrldi?

I always have to think what the clrldi etc. do exactly, while with rlwinm
it is obvious.  But yeah this may be different for other people who are
used to different idiom.
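
(For this particular job the two are the same operation, just spelled
differently; a side-by-side sketch:

  clrldi r6,r4,52          # == rldicl r6,r4,0,52: clear the 52 high bits
  rlwinm r6,r4,0,0x0fff    # == rlwinm r6,r4,0,20,31: rotate by 0, apply mask

either way, r6 = r4 & 0xfff.)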

> Would be nice if the assembler could support:
> 
>      andi    r6, r4, 0x0fff
> 
> And turn it into the rlwinm, or rldicl :)

The extended mnemonics are *simple*, *one-to-one* mappings.  Having
"andi. 6,4,0x0f0f" a valid insn, but an extended mnemonic "andi 6,4,0x0f0f"
that is not (and the other way around for, say, 0xffff0000) would violate that.
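
Concretely, a sketch of which constants each spelling can encode:

  andi.  r6,r4,0x0f0f        # ok: any 16-bit immediate
  rlwinm r6,r4,0,0x0f0f      # rejected: 0x0f0f is not one contiguous run
  rlwinm r6,r4,0,0xffff0000  # ok: == rlwinm r6,r4,0,0,15
  andi.  r6,r4,0xffff0000    # rejected: does not fit the 16-bit UIMM field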

You could do some assembler macro, that can also expand to multiple insns
where that is useful.  Also one for loading constants, etc.  The downside
to that is you often do care how many insns are generated.
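
The constant-loading macro is the classic example; a minimal GAS-style
sketch (the name li32 is made up here):

  .macro li32 rd, val        # hypothetical: load a full 32-bit constant
  lis    \rd, \val@ha        # upper half, adjusted for addi's sign extension
  addi   \rd, \rd, \val@l    # lower half
  .endm

It always costs two instructions, even for values a single li or lis could
load, which is exactly the "you often do care how many insns" problem.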

Instead you could do a macro for only those cases that can be done with *one*
insn.  But that then is pretty restricted in use, and people have to learn
what values are valid.

I don't see a perfect solution.


Segher


Re: [PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-25 Thread Michael Ellerman
Segher Boessenkool  writes:
> On Fri, Mar 22, 2019 at 11:37:24PM +1100, Michael Ellerman wrote:
>>  .Lcmp_rest_lt8bytes:
>> -/* Here we have only less than 8 bytes to compare with. at least s1
>> - * Address is aligned with 8 bytes.
>> - * The next double words are load and shift right with appropriate
>> - * bits.
>> +/*
>> + * Here we have less than 8 bytes to compare. At least s1 is aligned to
>> + * 8 bytes, but s2 may not be. We must make sure s2 + 8 doesn't cross a
>
> "s2 + 7"?  The code is fine though (bgt, not bge).

Duh, thanks for catching it.
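
(To spell out the boundary arithmetic, assuming 4K pages: the doubleword
load touches bytes s2 through s2 + 7, so with offset = s2 & 0xfff:

  # offset 0xff8: load touches 0xff8..0xfff, the last byte of the page: OK
  # offset 0xff9: load touches 0xff9..0x1000, crossing into the next page

hence cmpdi r6,0xff8 followed by bgt, which branches only for offsets that
actually cross.)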

>> + * page boundary, otherwise we might read past the end of the buffer and
>> + * trigger a page fault. We use 4K as the conservative minimum page
>> + * size. If we detect that case we go to the byte-by-byte loop.
>> + *
>> + * Otherwise the next double word is loaded from s1 and s2, and shifted
>> + * right to compare the appropriate bits.
>>   */
>> +clrldi  r6,r4,(64-12)   // r6 = r4 & 0xfff
>
> You can just write
>   rlwinm r6,r4,0,0x0fff

> if that is clearer?  Or do you still want a comment with that :-)

I don't think it's clearer doing a rotate of zero bits :)

And yeah I'd probably still leave the comment, so I'm inclined to stick
with the clrldi?

Would be nice if the assembler could support:

  andi    r6, r4, 0x0fff

And turn it into the rlwinm, or rldicl :)
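
Something like the following, where the non-record `andi` spelling is
hypothetical but the expansions are real instructions:

  andi   r6,r4,0x0fff      # hypothetical mnemonic, no such insn today
  rlwinm r6,r4,0,0x0fff    # one possible expansion (32-bit mask form)
  rldicl r6,r4,0,52        # or the 64-bit form; r6 = r4 & 0xfff either way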

>> +cmpdi   r6,0xff8
>> +bgt .Lshort
>
> Reviewed-by: Segher Boessenkool 

I'll fixup the comment. Thanks.

cheers


Re: [PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-25 Thread Chandan Rajendra
On Friday, March 22, 2019 6:07:24 PM IST Michael Ellerman wrote:
> Chandan reported that fstests' generic/026 test hit a crash:
> 
>   BUG: Unable to handle kernel data access at 0xc0062ac4
>   Faulting instruction address: 0xc0092240
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
>   CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-1-g6de6dba64dda #1
>   NIP:  c0092240 LR: c066a55c CTR: 
>   REGS: c0062c0c3430 TRAP: 0300   Not tainted  (5.0.0-rc2-next-20190115-1-g6de6dba64dda)
>   MSR:  82009033   CR: 44000842  XER: 2000
>   CFAR: 7fff7f3108ac DAR: c0062ac4 DSISR: 4000 IRQMASK: 0
>   GPR00:  c0062c0c36c0 c17f4c00 c121a660
>   GPR04: c0062ac3fff9 0004 0020 275b19c4
>   GPR08: 000c 46494c45 5347495f41434c5f c26073a0
>   GPR12:  c27a  
>   GPR16:    
>   GPR20: c0062ea70020 c0062c0c38d0 0002 0002
>   GPR24: c0062ac3ffe8 275b19c4 0001 c0062ac3
>   GPR28: c0062c0c38d0 c0062ac30050 c0062ac30058 
>   NIP memcmp+0x120/0x690
>   LR  xfs_attr3_leaf_lookup_int+0x53c/0x5b0
>   Call Trace:
> xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
> xfs_da3_node_lookup_int+0x32c/0x5a0
> xfs_attr_node_addname+0x170/0x6b0
> xfs_attr_set+0x2ac/0x340
> __xfs_set_acl+0xf0/0x230
> xfs_set_acl+0xd0/0x160
> set_posix_acl+0xc0/0x130
> posix_acl_xattr_set+0x68/0x110
> __vfs_setxattr+0xa4/0x110
> __vfs_setxattr_noperm+0xac/0x240
> vfs_setxattr+0x128/0x130
> setxattr+0x248/0x600
> path_setxattr+0x108/0x120
> sys_setxattr+0x28/0x40
> system_call+0x5c/0x70
>   Instruction dump:
>   7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c05
>   4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040
> 
> The instruction dump decodes as:
>   subfic  r6,r5,8
>   rlwinm  r6,r6,3,0,28
>   ldbrx   r9,0,r3
>   ldbrx   r10,0,r4  <-
> 
> Which shows us doing an 8 byte load from c0062ac3fff9, which
> crosses the page boundary at c0062ac4 and faults.
> 
> It's not OK for memcmp to read past the end of the source or
> destination buffers if that would cross a page boundary, because we
> don't know that the next page is mapped.
> 
> As pointed out by Segher, we can read past the end of the source or
> destination as long as we don't cross a 4K boundary, because that's
> our minimum page size on all platforms.
> 
> The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
> there we know that s1 is 8-byte aligned and we have at least 1 byte to
> read, so a single 8-byte load won't read past the end of s1 and cross
> a page boundary.
> 
> But we have to be more careful with s2. So check if it's within 8
> bytes of a 4K boundary and if so go to the byte-by-byte loop.
> 
> Fixes: 2d9ee327adce ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
> Cc: sta...@vger.kernel.org # v4.19+
> Reported-by: Chandan Rajendra 
> Signed-off-by: Michael Ellerman 

For unknown reasons, I am unable to recreate this bug on the unmodified
next-20190115 kernel, which is where I originally found it.

FWIW, I have executed generic/026 on a next-20190115 kernel with this patch
applied and I wasn't able to recreate the bug. Hence,

Tested-by: Chandan Rajendra 

-- 
chandan





Re: [PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-22 Thread Segher Boessenkool
On Fri, Mar 22, 2019 at 11:37:24PM +1100, Michael Ellerman wrote:
>  .Lcmp_rest_lt8bytes:
> - /* Here we have only less than 8 bytes to compare with. at least s1
> -  * Address is aligned with 8 bytes.
> -  * The next double words are load and shift right with appropriate
> -  * bits.
> + /*
> +  * Here we have less than 8 bytes to compare. At least s1 is aligned to
> +  * 8 bytes, but s2 may not be. We must make sure s2 + 8 doesn't cross a

"s2 + 7"?  The code is fine though (bgt, not bge).

> +  * page boundary, otherwise we might read past the end of the buffer and
> +  * trigger a page fault. We use 4K as the conservative minimum page
> +  * size. If we detect that case we go to the byte-by-byte loop.
> +  *
> +  * Otherwise the next double word is loaded from s1 and s2, and shifted
> +  * right to compare the appropriate bits.
>*/
> + clrldi  r6,r4,(64-12)   // r6 = r4 & 0xfff

You can just write
  rlwinm r6,r4,0,0x0fff
if that is clearer?  Or do you still want a comment with that :-)

> + cmpdi   r6,0xff8
> + bgt .Lshort

Reviewed-by: Segher Boessenkool 


Segher


[PATCH v3] powerpc/64: Fix memcmp reading past the end of src/dest

2019-03-22 Thread Michael Ellerman
Chandan reported that fstests' generic/026 test hit a crash:

  BUG: Unable to handle kernel data access at 0xc0062ac4
  Faulting instruction address: 0xc0092240
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE SMP NR_CPUS=2048 DEBUG_PAGEALLOC NUMA pSeries
  CPU: 0 PID: 27828 Comm: chacl Not tainted 5.0.0-rc2-next-20190115-1-g6de6dba64dda #1
  NIP:  c0092240 LR: c066a55c CTR: 
  REGS: c0062c0c3430 TRAP: 0300   Not tainted  (5.0.0-rc2-next-20190115-1-g6de6dba64dda)
  MSR:  82009033   CR: 44000842  XER: 2000
  CFAR: 7fff7f3108ac DAR: c0062ac4 DSISR: 4000 IRQMASK: 0
  GPR00:  c0062c0c36c0 c17f4c00 c121a660
  GPR04: c0062ac3fff9 0004 0020 275b19c4
  GPR08: 000c 46494c45 5347495f41434c5f c26073a0
  GPR12:  c27a  
  GPR16:    
  GPR20: c0062ea70020 c0062c0c38d0 0002 0002
  GPR24: c0062ac3ffe8 275b19c4 0001 c0062ac3
  GPR28: c0062c0c38d0 c0062ac30050 c0062ac30058 
  NIP memcmp+0x120/0x690
  LR  xfs_attr3_leaf_lookup_int+0x53c/0x5b0
  Call Trace:
xfs_attr3_leaf_lookup_int+0x78/0x5b0 (unreliable)
xfs_da3_node_lookup_int+0x32c/0x5a0
xfs_attr_node_addname+0x170/0x6b0
xfs_attr_set+0x2ac/0x340
__xfs_set_acl+0xf0/0x230
xfs_set_acl+0xd0/0x160
set_posix_acl+0xc0/0x130
posix_acl_xattr_set+0x68/0x110
__vfs_setxattr+0xa4/0x110
__vfs_setxattr_noperm+0xac/0x240
vfs_setxattr+0x128/0x130
setxattr+0x248/0x600
path_setxattr+0x108/0x120
sys_setxattr+0x28/0x40
system_call+0x5c/0x70
  Instruction dump:
  7d201c28 7d402428 7c295040 38630008 38840008 408201f0 4200ffe8 2c05
  4182ff6c 20c50008 54c61838 7d201c28 <7d402428> 7d293436 7d4a3436 7c295040

The instruction dump decodes as:
  subfic  r6,r5,8
  rlwinm  r6,r6,3,0,28
  ldbrx   r9,0,r3
  ldbrx   r10,0,r4  <-

Which shows us doing an 8 byte load from c0062ac3fff9, which
crosses the page boundary at c0062ac4 and faults.
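
Annotated, as a gloss on the decode above (the register roles follow from
the surrounding memcmp code):

  subfic  r6,r5,8       # r6 = 8 - len: bytes of the doubleword to ignore
  rlwinm  r6,r6,3,0,28  # r6 <<= 3: bytes to bits (same as slwi r6,r6,3)
  ldbrx   r9,0,r3       # byte-reversed 8-byte load from s1 (aligned, safe)
  ldbrx   r10,0,r4      # byte-reversed 8-byte load from s2: the fault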

It's not OK for memcmp to read past the end of the source or
destination buffers if that would cross a page boundary, because we
don't know that the next page is mapped.

As pointed out by Segher, we can read past the end of the source or
destination as long as we don't cross a 4K boundary, because that's
our minimum page size on all platforms.

The bug is in the code at the .Lcmp_rest_lt8bytes label. When we get
there we know that s1 is 8-byte aligned and we have at least 1 byte to
read, so a single 8-byte load won't read past the end of s1 and cross
a page boundary.

But we have to be more careful with s2. So check if it's within 8
bytes of a 4K boundary and if so go to the byte-by-byte loop.

Fixes: 2d9ee327adce ("powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()")
Cc: sta...@vger.kernel.org # v4.19+
Reported-by: Chandan Rajendra 
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/lib/memcmp_64.S | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

v3: Just check if we're crossing a 4K boundary.

Oops, I wrote this a while back but forgot to send v3.

diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
index 844d8e774492..f6554ebebeb5 100644
--- a/arch/powerpc/lib/memcmp_64.S
+++ b/arch/powerpc/lib/memcmp_64.S
@@ -215,11 +215,20 @@ _GLOBAL_TOC(memcmp)
 	beq	.Lzero
 
 .Lcmp_rest_lt8bytes:
-	/* Here we have only less than 8 bytes to compare with. at least s1
-	 * Address is aligned with 8 bytes.
-	 * The next double words are load and shift right with appropriate
-	 * bits.
+	/*
+	 * Here we have less than 8 bytes to compare. At least s1 is aligned to
+	 * 8 bytes, but s2 may not be. We must make sure s2 + 8 doesn't cross a
+	 * page boundary, otherwise we might read past the end of the buffer and
+	 * trigger a page fault. We use 4K as the conservative minimum page
+	 * size. If we detect that case we go to the byte-by-byte loop.
+	 *
+	 * Otherwise the next double word is loaded from s1 and s2, and shifted
+	 * right to compare the appropriate bits.
 	 */
+	clrldi	r6,r4,(64-12)	// r6 = r4 & 0xfff
+	cmpdi	r6,0xff8
+	bgt	.Lshort
+
 	subfic	r6,r5,8
 	slwi	r6,r6,3
 	LD	rA,0,r3
-- 
2.20.1