[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #10 from Andrew Pinski  ---
So I looked into this a little bit and it works on aarch64 with -O1
-mstrict-align, but if you remove -mstrict-align we get an unaligned access,
which I think is expected.
The gimple level is the same in both cases; it is expand that changes.

Does hppa*-*-linux* have STRICT_ALIGNMENT set to true or false?

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #9 from deller at gmx dot de ---
On 9/1/21 11:25 PM, deller at gmx dot de wrote:
> The "ldh" loads only the first two bytes, and extends it into the upper 32bits
> with "extrw,s".
> So, only 16bits instead of 32bits are loaded from the address where "evil" 
> is...

Forget this!
My testcase was wrong. Here is the correct testcase which then loads 32bits:

short evil;
int f_unaligned2(void)
{ return get_unaligned((unsigned long *)&evil); }

<f_unaligned2>:
   0:   2b 60 00 00 addil L%0,dp,r1
   4:   34 33 00 00 ldo 0(r1),r19
   8:   44 3c 00 00 ldh 0(r1),ret0
   c:   d7 9c 0a 10 depw,z ret0,15,16,ret0
  10:   0e 64 10 53 ldh 2(r19),r19
  14:   e8 40 c0 00 bv r0(rp)
  18:   0b 93 02 5c or r19,ret0,ret0

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #8 from deller at gmx dot de ---
On 9/1/21 11:19 PM, dave.anglin at bell dot net wrote:
>> I think the problem with your testcase is, that the compiler doesn't know the
>> alignment of the parameter "p" in your f_unaligned() function.
>> So it will generate byte-accesses.
> I think it's the type rather than the alignment.  If type is char, one gets
> byte accesses.  If type is short, one gets 16-bit accesses.
>
> The alignment is being ignored.

You are right.
It's even worse!

short evil;
int f_unaligned2(void)
{ return get_unaligned(&evil); }

gives:
<f_unaligned2>:
   0:   2b 60 00 00 addil L%0,dp,r1
   4:   44 3c 00 00 ldh 0(r1),ret0
   8:   e8 40 c0 00 bv r0(rp)
   c:   d3 9c 1f f0 extrw,s ret0,31,16,ret0

The "ldh" loads only the first two bytes, and extends it into the upper 32bits
with "extrw,s".
So, only 16bits instead of 32bits are loaded from the address where "evil"
is...

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #7 from dave.anglin at bell dot net ---
On 2021-09-01 4:52 p.m., deller at gmx dot de wrote:
> I think the problem with your testcase is, that the compiler doesn't know the 
> alignment of the parameter "p" in your f_unaligned() function.
> So it will generate byte-accesses.
I think it's the type rather than the alignment.  If type is char, one gets
byte accesses.  If type is short, one gets 16-bit accesses.

The alignment is being ignored.

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #6 from deller at gmx dot de ---
> So, it seems the __aligned__ attribute is ignored:
> extern u32 output_len __attribute__((__aligned__(1)));

I think the aligned attribute is not relevant here. Even
u32 output_len;
will generate word-accesses.
I'd say that the "forcement-to-packed" is ignored
when the compiler knows that the source is aligned.
The "__attribute__((__packed__))" should *always* trigger byte-accesses.
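
For comparison, here is a minimal sketch of two ways to spell an unaligned
32-bit load without relying on the packed attribute. The helper names
load_le32_bytes and load_u32_memcpy are hypothetical and do not come from
this bug or from the kernel. Neither form guarantees byte instructions in
the output: the compiler may still merge the accesses when it believes the
address is suitably aligned.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a little-endian 32-bit value from four
 * single-byte reads, so the source never names an access wider than a byte. */
static uint32_t load_le32_bytes(const void *p)
{
	const unsigned char *b = p;

	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: copy four bytes into a local, in host byte order. */
static uint32_t load_u32_memcpy(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof v);
	return v;
}
```

Comparing the -O1 assembly of these helpers with the packed-struct version
on the same target shows whether the three spellings are treated alike.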

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #5 from dave.anglin at bell dot net ---
On 2021-09-01 4:52 p.m., deller at gmx dot de wrote:
> I think the problem with your testcase is, that the compiler doesn't know the 
> alignment of the parameter "p" in your f_unaligned() function.
> So it will generate byte-accesses.
So, it seems the __aligned__ attribute is ignored:
extern u32 output_len __attribute__((__aligned__(1)));
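
As a side note on the attribute itself, the sketch below assumes the
behaviour described in the GCC manual, where an aligned attribute applied
through a typedef may lower the alignment of a type. The names
u32_unaligned and load_u32_unaligned are hypothetical. This is not a fix
proposed in this bug, only a way to check what alignment the compiler
assumes.

```c
typedef unsigned u32;

/* Assumption: on GCC, "aligned" in a typedef can reduce the alignment. */
typedef u32 u32_unaligned __attribute__((aligned(1)));

/* Hypothetical helper: read a u32 through the reduced-alignment type. */
static u32 load_u32_unaligned(const void *p)
{
	return *(const u32_unaligned *)p;
}
```

Evaluating __alignof__(u32_unaligned) reports the alignment the compiler
assumes for accesses made through this type.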

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread dave.anglin at bell dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #4 from dave.anglin at bell dot net ---
On 2021-09-01 4:14 p.m., arnd at linaro dot org wrote:
> Any idea what the difference is between the working version and your broken
> one?
Not really.  My original test case worked as well.  Helge created the broken
one.

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #3 from deller at gmx dot de ---
Hi Arnd,

I think the problem with your testcase is, that the compiler doesn't know the 
alignment of the parameter "p" in your f_unaligned() function.
So it will generate byte-accesses.

If you modify your testcase by adding this and compiling with -O1 (or higher)
you see the problem:

int evil;
int f_unaligned2(void)
{
 return get_unaligned(&evil);
}

<f_unaligned2>:
   0:   2b 60 00 00 addil L%0,dp,r1
   4:   34 21 00 00 ldo 0(r1),r1
   8:   e8 40 c0 00 bv r0(rp)
   c:   0c 20 10 9c ldw 0(r1),ret0
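
A self-contained version of this test might look like the sketch below. It
combines the packed-struct macro from comment #2 with the global "evil".
The macro names GET_UNALIGNED_T and GET_UNALIGNED are renamed stand-ins for
the kernel ones, and __typeof__ replaces typeof so that the file also
builds in strict ISO modes. It relies on GNU C extensions, so it assumes
GCC or a compatible compiler.

```c
typedef unsigned u32;

/* Renamed stand-in for the kernel's __get_unaligned_t(): read *ptr through
 * a packed struct, which has an alignment of 1. */
#define GET_UNALIGNED_T(type, ptr) ({ \
	const struct { type x; } __attribute__((packed)) *pptr_ = \
		(__typeof__(pptr_))(ptr); \
	pptr_->x; \
})

#define GET_UNALIGNED(ptr) GET_UNALIGNED_T(__typeof__(*(ptr)), (ptr))

int evil;

/* Alignment of *p is not visible to the compiler here. */
int f_unaligned(u32 *p)
{
	return GET_UNALIGNED(p);
}

/* The compiler can see the declaration of "evil" and its natural alignment. */
int f_unaligned2(void)
{
	return GET_UNALIGNED(&evil);
}
```

Compiling with -O1 -S and comparing the two functions shows whether the
target uses byte loads in f_unaligned() and a single word load in
f_unaligned2(), as in the hppa listing above.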

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread arnd at linaro dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

Arnd Bergmann  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arnd at linaro dot org

--- Comment #2 from Arnd Bergmann  ---
I tried reproducing the issue with my original kernel code, using this input:

typedef unsigned u32;
#define __packed __attribute__((packed))

#define __get_unaligned_t(type, ptr) ({ \
	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
	__pptr->x; \
})

#define get_unaligned(ptr)  __get_unaligned_t(typeof(*(ptr)), (ptr))

int f_unaligned(u32 *p)
{ 
 return get_unaligned(p); 
}

int g(u32 *p) 
{ 
 return *(p); 
}

and it looks like I get correct output:

hppa64-linux-gcc -S kernel/test_unaligned.c -o - -O2
.LEVEL 2.0w
.text
.align 8
.globl f_unaligned
.type   f_unaligned, @function
f_unaligned:
.PROC
.CALLINFO FRAME=0,NO_CALLS
.ENTRY
ldb 0(%r26),%r20
ldb 1(%r26),%r19
depd,z %r20,39,40,%r20
depd,z %r19,47,48,%r19
ldb 2(%r26),%r31
ldb 3(%r26),%r28
or %r19,%r20,%r19
depd,z %r31,55,56,%r31
or %r31,%r19,%r31
or %r28,%r31,%r28
bve (%r2)
extrd,s %r28,63,32,%r28
.EXIT
.PROCEND
.size   f_unaligned, .-f_unaligned
.align 8
.globl g
.type   g, @function
g:
.PROC
.CALLINFO FRAME=0,NO_CALLS
.ENTRY
ldw 0(%r26),%r28
bve (%r2)
extrd,s %r28,63,32,%r28
.EXIT
.PROCEND
.size   g, .-g
.ident  "GCC: (GNU) 11.1.0"

Any idea what the difference is between the working version and your broken
one?

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #1 from John David Anglin  ---
Created attachment 51395
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51395&action=edit
Second test case

Changing the optimization of get_unaligned_le32 to 0 results in correct
code generation.  We have the following in test-unaligned.c.235t.optimized:

ave@atlas:~/linux/misc$ cat test-unaligned.c.235t.optimized

;; Function get_unaligned_le32 (get_unaligned_le32, funcdef_no=0,
decl_uid=1506, cgraph_uid=1, symbol_order=1)

__attribute__((optimize (0)))
get_unaligned_le32 (const void * p)
{
  const struct
  {
u32 x;
  } * __pptr;
  u32 D.1517;
  u32 _4;

  <bb 2> :
  __pptr_2 = p_1(D);
  _4 = __pptr_2->x;

  <bb 3> :
<L0>:
  return _4;

}



;; Function test (test, funcdef_no=1, decl_uid=1512, cgraph_uid=2,
symbol_order=2)

test ()
{
  unsigned int _1;
  int _4;

  <bb 2> [local count: 1073741824]:
  _1 = get_unaligned_le32 (&output_len); [tail call]
  _4 = (int) _1;
  return _4;

}