[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #10 from Andrew Pinski ---
So I looked into this a little bit. It works on aarch64 with -O1
-mstrict-align, but if you remove -mstrict-align we get an unaligned
access, which I think is expected. The gimple level is the same in both
cases; it is expand which changes.

Does hppa*-*-linux* have STRICT_ALIGNMENT set to true or false?
--- Comment #9 from deller at gmx dot de ---
On 9/1/21 11:25 PM, deller at gmx dot de wrote:
> The "ldh" loads only the first two bytes, and extends it into the upper
> 32bits with "extrw,s".
> So, only 16bits instead of 32bits are loaded from the address where "evil"
> is...

Forget this! My testcase was wrong. Here is the correct testcase, which
then loads 32 bits:

short evil;
int f_unaligned2(void)
{
        return get_unaligned((unsigned long *)&evil);
}

<f_unaligned2>:
   0:   2b 60 00 00     addil L%0,dp,r1
   4:   34 33 00 00     ldo 0(r1),r19
   8:   44 3c 00 00     ldh 0(r1),ret0
   c:   d7 9c 0a 10     depw,z ret0,15,16,ret0
  10:   0e 64 10 53     ldh 2(r19),r19
  14:   e8 40 c0 00     bv r0(rp)
  18:   0b 93 02 5c     or r19,ret0,ret0
--- Comment #8 from deller at gmx dot de ---
On 9/1/21 11:19 PM, dave.anglin at bell dot net wrote:
>> I think the problem with your testcase is, that the compiler doesn't know
>> the alignment of the parameter "p" in your f_unaligned() function.
>> So it will generate byte-accesses.
> I think it's the type rather than the alignment. If type is char, one gets
> byte accesses. If type is short, one gets 16-bit accesses.
>
> The alignment is being ignored.

You are right. It's even worse!

short evil;
int f_unaligned2(void)
{
        return get_unaligned(&evil);
}

gives:

<f_unaligned2>:
   0:   2b 60 00 00     addil L%0,dp,r1
   4:   44 3c 00 00     ldh 0(r1),ret0
   8:   e8 40 c0 00     bv r0(rp)
   c:   d3 9c 1f f0     extrw,s ret0,31,16,ret0

The "ldh" loads only the first two bytes, and extends it into the upper
32bits with "extrw,s". So, only 16bits instead of 32bits are loaded from
the address where "evil" is...
--- Comment #7 from dave.anglin at bell dot net ---
On 2021-09-01 4:52 p.m., deller at gmx dot de wrote:
> I think the problem with your testcase is, that the compiler doesn't know
> the alignment of the parameter "p" in your f_unaligned() function.
> So it will generate byte-accesses.

I think it's the type rather than the alignment. If the type is char, one
gets byte accesses. If the type is short, one gets 16-bit accesses.

The alignment is being ignored.
--- Comment #6 from deller at gmx dot de ---
> So, it seems the __aligned__ attribute is ignored:
> extern u32 output_len __attribute__((__aligned__(1)));

I think the aligned attribute is not relevant here. Even

        u32 output_len;

will generate word accesses. I'd say that the forced "packed" layout is
ignored when the compiler knows that the source is aligned. The
"__attribute__((__packed__))" should *always* trigger byte accesses.
--- Comment #5 from dave.anglin at bell dot net ---
On 2021-09-01 4:52 p.m., deller at gmx dot de wrote:
> I think the problem with your testcase is, that the compiler doesn't know
> the alignment of the parameter "p" in your f_unaligned() function.
> So it will generate byte-accesses.

So, it seems the __aligned__ attribute is ignored:

extern u32 output_len __attribute__((__aligned__(1)));
--- Comment #4 from dave.anglin at bell dot net ---
On 2021-09-01 4:14 p.m., arnd at linaro dot org wrote:
> Any idea what the difference is between the working version and your
> broken one?

Not really. My original test case worked as well. Helge created the
broken one.
--- Comment #3 from deller at gmx dot de ---
Hi Arnd,

I think the problem with your testcase is that the compiler doesn't know
the alignment of the parameter "p" in your f_unaligned() function. So it
will generate byte-accesses.

If you modify your testcase by adding this and compiling with -O1 (or
higher), you see the problem:

int evil;
int f_unaligned2(void)
{
        return get_unaligned(&evil);
}

<f_unaligned2>:
   0:   2b 60 00 00     addil L%0,dp,r1
   4:   34 21 00 00     ldo 0(r1),r1
   8:   e8 40 c0 00     bv r0(rp)
   c:   0c 20 10 9c     ldw 0(r1),ret0
Arnd Bergmann changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arnd at linaro dot org

--- Comment #2 from Arnd Bergmann ---
I tried reproducing the issue with my original kernel code, using this
input:

typedef unsigned u32;
#define __packed __attribute__((packed))
#define __get_unaligned_t(type, ptr) ({                                    \
        const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
        __pptr->x;                                                         \
})
#define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr))

int f_unaligned(u32 *p)
{
        return get_unaligned(p);
}

int g(u32 *p)
{
        return *(p);
}

and it looks like I get correct output:

$ hppa64-linux-gcc -S kernel/test_unaligned.c -o - -O2
        .LEVEL 2.0w
        .text
        .align 8
        .globl f_unaligned
        .type   f_unaligned, @function
f_unaligned:
        .PROC
        .CALLINFO FRAME=0,NO_CALLS
        .ENTRY
        ldb 0(%r26),%r20
        ldb 1(%r26),%r19
        depd,z %r20,39,40,%r20
        depd,z %r19,47,48,%r19
        ldb 2(%r26),%r31
        ldb 3(%r26),%r28
        or %r19,%r20,%r19
        depd,z %r31,55,56,%r31
        or %r31,%r19,%r31
        or %r28,%r31,%r28
        bve (%r2)
        extrd,s %r28,63,32,%r28
        .EXIT
        .PROCEND
        .size   f_unaligned, .-f_unaligned
        .align 8
        .globl g
        .type   g, @function
g:
        .PROC
        .CALLINFO FRAME=0,NO_CALLS
        .ENTRY
        ldw 0(%r26),%r28
        bve (%r2)
        extrd,s %r28,63,32,%r28
        .EXIT
        .PROCEND
        .size   g, .-g
        .ident  "GCC: (GNU) 11.1.0"

Any idea what the difference is between the working version and your
broken one?
--- Comment #1 from John David Anglin ---
Created attachment 51395
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51395&action=edit
Second test case

Changing the optimization of get_unaligned_le32 to 0 results in correct
code generation. We have the following in test-unaligned.c.235t.optimized:

ave@atlas:~/linux/misc$ cat test-unaligned.c.235t.optimized

;; Function get_unaligned_le32 (get_unaligned_le32, funcdef_no=0,
;; decl_uid=1506, cgraph_uid=1, symbol_order=1)

__attribute__((optimize (0)))
get_unaligned_le32 (const void * p)
{
  const struct { u32 x; } * __pptr;
  u32 D.1517;
  u32 _4;

  <bb 2> :
  __pptr_2 = p_1(D);
  _4 = __pptr_2->x;

  <bb 3> :
  return _4;
}

;; Function test (test, funcdef_no=1, decl_uid=1512, cgraph_uid=2,
;; symbol_order=2)

test ()
{
  unsigned int _1;
  int _4;

  <bb 2> [local count: 1073741824]:
  _1 = get_unaligned_le32 (&output_len); [tail call]
  _4 = (int) _1;
  return _4;
}