[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-03 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #32 from deller at gmx dot de ---
Fixed in the Linux kernel by declaring the extern int32 as a char:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c42813b71a06a2ff4a155aa87ac609feeab76cf3
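The idea behind that commit can be sketched as follows. This is an illustration, not the kernel patch itself: the symbol name and values are made up, and a local buffer stands in for the externally linked symbol. Declaring the symbol as char means GCC cannot assume 4-byte alignment, so it keeps an alignment-safe access instead of folding it into one word load.

```c
#include <stdint.h>
#include <string.h>

/* Simulate a u32 symbol that the linker placed at an odd address
 * (as output_len is, directly behind the decompressor code). */
static unsigned char blob[8] = { 0x00, 0x78, 0x56, 0x34, 0x12, 0, 0, 0 };
static char *output_len_sym = (char *)&blob[1]; /* stand-in for "extern char output_len[];" */

static uint32_t read_output_len(void)
{
    uint32_t v;
    /* memcpy through a char pointer: GCC emits byte-safe code because it
     * can no longer prove the source is 4-byte aligned. */
    memcpy(&v, output_len_sym, sizeof v);
    return v;
}
```

With the original `extern u32 output_len;` declaration, GCC on hppa was entitled to emit a single `ldw`, which traps on the misaligned address.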

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #31 from deller at gmx dot de ---
Richard suggested that adding a compiler optimization barrier (__asm__ ("" :
"+r" (__pptr))) might fix the problem.
I tested the attached patch and it works nicely.
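A minimal sketch of the barrier Richard suggested (the function name is mine; only the empty-asm idiom comes from the comment above). Forcing the pointer through `__asm__ ("" : "+r" (ptr))` makes GCC forget what it inferred about the pointee's type and alignment, so the explicit byte accesses are not folded back into a word load:

```c
#include <stdint.h>

/* Hypothetical byte-wise little-endian 32-bit load with an optimization
 * barrier on the pointer, modelled on the suggested kernel fix. */
static uint32_t get_unaligned_le32_barrier(const void *p)
{
    const uint8_t *pptr = p;
    __asm__ ("" : "+r" (pptr));       /* compiler barrier: pptr is now opaque */
    return (uint32_t)pptr[0]
         | (uint32_t)pptr[1] << 8
         | (uint32_t)pptr[2] << 16
         | (uint32_t)pptr[3] << 24;
}
```

The asm has no instructions; its only effect is telling GCC that `pptr` may have been changed, which blocks the alignment propagation that caused the bug.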

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #30 from deller at gmx dot de ---
Created attachment 51405
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51405&action=edit
Linux kernel patch to add compiler optimization barrier

Linux kernel boots successfully with this patch on hppa.

[Bug middle-end/102162] Byte-wise access optimized away at -O1 and above

2021-09-02 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #28 from deller at gmx dot de ---
Arnd,
there are various calls to the get_unaligned_X() functions in all kernel
bootloaders, specifically in the kernel decompression routines: 
[deller@ls3530 linux-2.6]$ grep get_unaligned lib/decompress*
lib/decompress_unlz4.c: size_t out_len = get_unaligned_le32(input + in_len);
lib/decompress_unlz4.c: chunksize = get_unaligned_le32(inp);
lib/decompress_unlz4.c: chunksize = get_unaligned_le32(inp);
lib/decompress_unlzo.c: version = get_unaligned_be16(parse);
lib/decompress_unlzo.c: if (get_unaligned_be32(parse) & HEADER_HAS_FILTER)
lib/decompress_unlzo.c: dst_len = get_unaligned_be32(in_buf);
lib/decompress_unlzo.c: src_len = get_unaligned_be32(in_buf);
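To illustrate why these call sites need genuine unaligned loads (this is a sketch, not the kernel's code): the length fields sit at whatever offset the compressed stream dictates, so no linker script can align them.

```c
#include <stdint.h>

/* Explicit byte-wise little-endian 32-bit load, as the decompressors
 * expect get_unaligned_le32() to behave on strict-alignment targets. */
static uint32_t get_unaligned_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

For example, a one-byte tag followed by a 32-bit chunk size puts the size field at an odd address; only a byte-wise load works there on hppa.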

So sadly it's not possible to work around those cases with linker scripts,
because they operate on externally generated compressed files (kernel code)
whose format can't be changed.
Same for the output_len variable - it is linked in externally, directly behind
the code, and not (easily?) changeable.
Helge

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #9 from deller at gmx dot de ---
On 9/1/21 11:25 PM, deller at gmx dot de wrote:
> The "ldh" loads only the first two bytes, and extends it into the upper 32bits
> with "extrw,s".
> So, only 16bits instead of 32bits are loaded from the address where "evil" 
> is...

Forget this!
My testcase was wrong. Here is the correct testcase which then loads 32bits:

short evil;
int f_unaligned2(void)
{ return get_unaligned((unsigned long *)&evil); }

<f_unaligned2>:
0:   2b 60 00 00 addil L%0,dp,r1
4:   34 33 00 00 ldo 0(r1),r19
8:   44 3c 00 00 ldh 0(r1),ret0
c:   d7 9c 0a 10 depw,z ret0,15,16,ret0
   10:   0e 64 10 53 ldh 2(r19),r19
   14:   e8 40 c0 00 bv r0(rp)
   18:   0b 93 02 5c or r19,ret0,ret0

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #8 from deller at gmx dot de ---
On 9/1/21 11:19 PM, dave.anglin at bell dot net wrote:
>> I think the problem with your testcase is, that the compiler doesn't know the
>> alignment of the parameter "p" in your f_unaligned() function.
>> So it will generate byte-accesses.
> I think it's the type rather than the alignment.  If type is char, one gets
> byte accesses.  If type is short, one gets 16-bit accesses.
>
> The alignment is being ignored.

You are right.
It's even worse!

short evil;
int f_unaligned2(void)
{ return get_unaligned(&evil); }

gives:
<f_unaligned2>:
0:   2b 60 00 00 addil L%0,dp,r1
4:   44 3c 00 00 ldh 0(r1),ret0
8:   e8 40 c0 00 bv r0(rp)
c:   d3 9c 1f f0 extrw,s ret0,31,16,ret0

The "ldh" loads only the first two bytes, and extends it into the upper 32bits
with "extrw,s".
So, only 16bits instead of 32bits are loaded from the address where "evil"
is...

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #6 from deller at gmx dot de ---
> So, it seems the __aligned__ attribute is ignored:
> extern u32 output_len __attribute__((__aligned__(1)));

I think the aligned attribute is not relevant here. Even
u32 output_len;
will generate word-accesses.
I'd say that the "forcing-to-packed" is ignored
when the compiler knows that the source is aligned.
The "__attribute__((__packed__))" should *always* trigger byte-accesses.
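The packed-struct trick the comment refers to can be sketched like this (modelled loosely on the kernel's asm-generic get_unaligned; this single-type macro is a simplification, and the names are mine):

```c
#include <stdint.h>

/* The packed attribute is meant to force byte accesses on strict-alignment
 * targets like hppa.  The bug discussed here is that once GCC believes the
 * pointee is aligned (e.g. a plain u32 symbol), it folds the access back
 * into one word load, which faults on a genuinely misaligned address. */
struct una_u32 { uint32_t x; } __attribute__((packed));

#define get_unaligned_u32(ptr) (((const struct una_u32 *)(ptr))->x)
```

The complaint in this comment is exactly that the packed attribute should make this macro safe unconditionally, regardless of what alignment GCC thinks it has proven.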

[Bug tree-optimization/102162] Byte-wise access optimized away at -O1 and above

2021-09-01 Thread deller at gmx dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102162

--- Comment #3 from deller at gmx dot de ---
Hi Arnd,

I think the problem with your testcase is that the compiler doesn't know the
alignment of the parameter "p" in your f_unaligned() function,
so it will generate byte-accesses.

If you modify your testcase by adding this and compiling with -O1 (or higher)
you see the problem:

int evil;
int f_unaligned2(void)
{
 return get_unaligned(&evil);
}

<f_unaligned2>:
   0:   2b 60 00 00 addil L%0,dp,r1
   4:   34 21 00 00 ldo 0(r1),r1
   8:   e8 40 c0 00 bv r0(rp)
   c:   0c 20 10 9c ldw 0(r1),ret0