[Bug target/114741] New: [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

2024-04-16 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741

Bug ID: 114741
   Summary: [14 regression] aarch64 sve: unnecessary fmov for
scalar int bit operations
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

void foo(unsigned i, unsigned *p)
{
*p = i & 1;
}

with gcc -march=armv8-a+sve -O2 compiles to

foo:
fmov s31, w0
and z31.s, z31.s, #1
str s31, [x1]
ret

instead of

foo:
and w0, w0, 1
str w0, [x1]
ret

it is wrong with -mcpu=generic but good e.g. with -mcpu=neoverse-v1

[Bug target/113874] GNU2 TLS descriptor calls do not follow psABI on x86_64-linux-gnu

2024-02-13 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113874

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #40 from nsz at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #22)
> BTW, does aarch64 dl-tlsdesc.S save SVE/SME register state (I only see fixed
> offsets in there), or are those call-saved?

call-saved.

[Bug target/112987] [14 Regression][aarch64] ICE in aarch64_do_track_speculation, at config/aarch64/aarch64-speculation.cc:214 since r14-5886-g426fddcbdad674

2024-02-01 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112987

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-14 at 305fe4f136a3a3a78377a48c55d546000a3ba529

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
is this fortran only?

glibc release is in a week, we can still do something (or backport a fix).

the vector abi does not allow 1 lane in this case
https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst#L867

c annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/bits/math-vector.h;h=04837bdcd7c0d0ce91192e09fc2d6614cae289c2;hb=HEAD
fortran annotation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/aarch64/fpu/finclude/math-vector-fortran.h;h=92e15f0d6a758258f5728e628bbb2422b176fa95;hb=HEAD

i think the bug can be reproduced with older glibc by adding

!GCC$ builtin (cos) attributes simd (notinbranch)

[Bug target/112987] [14 Regression][aarch64] ICE in aarch64_do_track_speculation, at config/aarch64/aarch64-speculation.cc:214 since r14-5886-g426fddcbdad674

2024-01-17 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112987

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-17
 CC||nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
confirmed.

[Bug tree-optimization/111478] [12/13/14 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250

2023-09-19 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

--- Comment #1 from nsz at gcc dot gnu.org ---
see also bug 111479

[Bug tree-optimization/111479] New: [12/13 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:248

2023-09-19 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111479

Bug ID: 111479
   Summary: [12/13 regression] aarch64 SVE ICE: in
compute_live_loop_exits, at tree-ssa-loop-manip.cc:248
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

maybe related to bug 111478

$ cat bug.c
float a, b, c;
void *d;
int e, f, g;
void p() {
  float i = a;
  for (;;)
    for (e = 0; e < g; e++) {
      float j = c, k = b, l = k, h = j, m = 0.0, n = 0.0;
      for (f = 0; f < e; f++) {
        float o = b;
        m = n = o;
      }
      ((float *)d)[2 * e] = l;
      ((float *)d)[e] = h;
      ((float *)d)[2 * e] += m - i * n;
      ((float *)d)[2 * e + 1] += n + i * m;
    }
}

$ gcc -c -O3 -march=armv8-a+sve bug.c
during GIMPLE pass: vect
: In function 'p':
:4:6: internal compiler error: in compute_live_loop_exits, at
tree-ssa-loop-manip.cc:248
4 | void p() {
  |  ^
0x10ca603 compute_live_loop_exits
   
/data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:248
0x10ca603 add_exit_phis_var
   
/data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:330
0x10ca603 add_exit_phis
   
/data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:391
0x10ca603 rewrite_into_loop_closed_ssa_1
   
/data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:604
0x10ca603 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
   
/data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-ssa-loop-manip.cc:626
0x1262514 execute
   
/data/jenkins/workspace/GNU-toolchain/fsf-13/src/gcc/gcc/tree-vectorizer.cc:1361
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1

[Bug tree-optimization/111478] New: [12/13/14 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250

2023-09-19 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

Bug ID: 111478
   Summary: [12/13/14 regression] aarch64 SVE ICE: in
compute_live_loop_exits, at tree-ssa-loop-manip.cc:250
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

$ cat bug.c
float a, d, e, f, g;
int b, c;
void h() {
  for (; b; b++) {
    for (; c;) {
      float i = d = i;
    }
    a += f - e * g;
    a += g + e * f;
  }
}


$ gcc -c -O3 -march=armv8-a+sve bug.c
during GIMPLE pass: vect
: In function 'h':
:3:6: internal compiler error: in compute_live_loop_exits, at
tree-ssa-loop-manip.cc:250
3 | void h() {
  |  ^
0x1168eb4 compute_live_loop_exits
   
/data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:250
0x1168eb4 add_exit_phis_var
   
/data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:332
0x1168eb4 add_exit_phis
   
/data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:393
0x1168eb4 rewrite_into_loop_closed_ssa_1
   
/data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:606
0x1168eb4 rewrite_into_loop_closed_ssa(bitmap_head*, unsigned int)
   
/data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-ssa-loop-manip.cc:628
0x130a1e8 execute
   
/data/jenkins/workspace/GNU-toolchain/fsf-trunk/src/gcc/gcc/tree-vectorizer.cc:1358
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1

[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls

2023-08-15 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

--- Comment #12 from nsz at gcc dot gnu.org ---
(In reply to Jiangning Liu from comment #11)
> Hi Wilco,
> 
> > "it means we will need a linker optimization to remove those redundant BTIs 
> > (eg. by changing them into NOPs)"
> 
> It will be only for performance optimization, right? If we don't care about
> performance, the linker doesn't need to optimize it to be NOP, right? It
> could still be useful if we only do this operation for a specific module.

no, this is a security feature, we want as few BTI c in an executable
segment as possible.

[Bug target/106671] aarch64: BTI instruction are not inserted for cross-section direct calls

2023-03-23 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org
 Status|NEW |WAITING

--- Comment #7 from nsz at gcc dot gnu.org ---
fixed in bfd ld 2.41 see
https://sourceware.org/bugzilla/show_bug.cgi?id=30076

we can also fix gcc to work with older ld (emit bti c in local functions), but
i don't plan to do that unless there is a reason to do so. (it increases the
emitted bti c considerably in some workloads, e.g. linux kernel, while the
linker fix is less intrusive in the common case with small binaries and no
weird section hacks).

[Bug target/104689] aarch64: libgcc: DW_CFA_val_expression is not supported for RA_SIGN_SATE register

2022-05-25 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104689

nsz at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |13.0
 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from nsz at gcc dot gnu.org ---
fixed for gcc-13

[Bug ipa/105160] New: [12 regression] ipa modref marks functions with asm volatile as const or pure

2022-04-05 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105160

Bug ID: 105160
   Summary: [12 regression] ipa modref marks functions with asm
volatile as const or pure
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

the following code is miscompiled with gcc -O1

#define sysreg_read(regname)\
({  \
unsigned long __sr_val; \
asm volatile(   \
"mrs %0, " #regname "\n"\
: "=r" (__sr_val)); \
\
__sr_val;   \
})

#define sysreg_write(regname, __sw_val) \
do {\
asm volatile(   \
"msr " #regname ", %0\n"\
:   \
: "r" (__sw_val));  \
} while (0)

#define isb()   \
do {\
asm volatile(   \
"isb"   \
:   \
:   \
: "memory");\
} while (0)

static unsigned long sctlr_read(void)
{
return sysreg_read(sctlr_el1);
}

static void sctlr_write(unsigned long val)
{
sysreg_write(sctlr_el1, val);
}

static void sctlr_rmw(void)
{
unsigned long val;

val = sctlr_read();
val |= 1UL << 7;
sctlr_write(val);
}

void sctlr_read_multiple(void)
{
sctlr_read();
sctlr_read();
sctlr_read();
sctlr_read();
}

void sctlr_write_multiple(void)
{
sctlr_write(0);
sctlr_write(0);
sctlr_write(0);
sctlr_write(0);
sctlr_write(0);
}

void sctlr_rmw_multiple(void)
{
sctlr_rmw();
sctlr_rmw();
sctlr_rmw();
sctlr_rmw();
}

void function(void)
{
sctlr_read_multiple();
sctlr_write_multiple();
sctlr_rmw_multiple();

isb();
}



aarch64-linux-gnu-gcc -O1 compiles it to
(note 'function' and 'sctlr_rmw_multiple'):


sctlr_rmw:
mrs x0, sctlr_el1

orr x0, x0, 128
msr sctlr_el1, x0

ret
sctlr_read_multiple:
mrs x0, sctlr_el1

mrs x0, sctlr_el1

mrs x0, sctlr_el1

mrs x0, sctlr_el1

ret
sctlr_write_multiple:
mov x0, 0
msr sctlr_el1, x0

msr sctlr_el1, x0

msr sctlr_el1, x0

msr sctlr_el1, x0

msr sctlr_el1, x0

ret
sctlr_rmw_multiple:
ret
function:
isb
ret


a similar issue in linux (but larger source file) got bisected to

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1b62cddcf091fb8cadf575246a7d3ff778650a6b

commit 1b62cddcf091fb8cadf575246a7d3ff778650a6b
Author: Jan Hubicka 
Date:   Fri Nov 12 14:00:47 2021 +0100

Fix ipa-modref pure/const discovery

PR ipa/103200
* ipa-modref.c (analyze_function, modref_propagate_in_scc): Do
not mark pure/const function if there are side-effects.


with -fdump-ipa-all

$ grep found t.c.087i.modref
Function found to be const: sctlr_rmw/2
Function found to be const: sctlr_read_multiple/3
Function found to be const: sctlr_write_multiple/4
Function found to be const: sctlr_rmw_multiple/5

even though t.c.086i.pure-const correctly identifies asm volatile
as not const/pure.

[Bug target/104689] New: aarch64: libgcc: DW_CFA_val_expression is not supported for RA_SIGN_SATE register

2022-02-25 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104689

Bug ID: 104689
   Summary: aarch64: libgcc: DW_CFA_val_expression is not
supported for RA_SIGN_SATE register
   Product: gcc
   Version: 10.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

gcc emits DW_CFA_AARCH64_negate_ra_state (DW_CFA_window_save) for pac-ret
but it's valid to set the RA_SIGN_STATE pseudo register via other dwarf
instructions.

currently libgcc unwinder can crash if DW_CFA_val_expression is used to
set the register value directly.
(reportedly the cranelift compiler can generate such code.)

[Bug target/102768] [feature request] Add compiler support for aarch64 shadow call stack

2022-02-22 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
   Target Milestone|--- |12.0

--- Comment #9 from nsz at gcc dot gnu.org ---
i'm closing this as fixed. open separate bugs for further improvements.

Fixed by

https://gcc.gnu.org/g:ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e

commit ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e
Author: Dan Li 
AuthorDate: 2022-02-21 20:01:14 +

aarch64: Add compiler support for Shadow Call Stack

Shadow Call Stack can be used to protect the return address of a
function at runtime, and clang already supports this feature[1].

To enable SCS in user mode, in addition to compiler, other support
is also required (as discussed in [2]). This patch only adds basic
support for SCS from the compiler side, and provides convenience
for users to enable SCS.

For linux kernel, only the support of the compiler is required.

[1] https://clang.llvm.org/docs/ShadowCallStack.html
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

Signed-off-by: Dan Li 

gcc/ChangeLog:

* config/aarch64/aarch64.cc (SLOT_REQUIRED):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_layout_frame): Likewise, and
change callee_adjust when scs is enabled.
(aarch64_save_callee_saves):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_restore_callee_saves):
Change wb_candidate[12] to wb_pop_candidate[12].
(aarch64_get_separate_components):
Change wb_candidate[12] to wb_push_candidate[12].
(aarch64_expand_prologue): Push x30 onto SCS before it's
pushed onto stack.
(aarch64_expand_epilogue): Pop x30 frome SCS, while
preventing it from being popped from the regular stack again.
(aarch64_override_options_internal): Add SCS compile option check.
(TARGET_HAVE_SHADOW_CALL_STACK): New hook.
* config/aarch64/aarch64.h (struct GTY): Add is_scs_enabled,
wb_pop_candidate[12], and rename wb_candidate[12] to
wb_push_candidate[12].
* config/aarch64/aarch64.md (scs_push): New template.
(scs_pop): Likewise.
* doc/invoke.texi: Document -fsanitize=shadow-call-stack.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add hook have_shadow_call_stack.
* flag-types.h (enum sanitize_code):
Add SANITIZE_SHADOW_CALL_STACK.
* opts.cc (parse_sanitizer_options): Add shadow-call-stack
and exclude SANITIZE_SHADOW_CALL_STACK.
* target.def: New hook.
* toplev.cc (process_options): Add SCS compile option check.
* ubsan.cc (ubsan_expand_null_ifn): Enum type conversion.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/shadow_call_stack_1.c: New test.
* gcc.target/aarch64/shadow_call_stack_2.c: New test.
* gcc.target/aarch64/shadow_call_stack_3.c: New test.
* gcc.target/aarch64/shadow_call_stack_4.c: New test.
* gcc.target/aarch64/shadow_call_stack_5.c: New test.
* gcc.target/aarch64/shadow_call_stack_6.c: New test.
* gcc.target/aarch64/shadow_call_stack_7.c: New test.
* gcc.target/aarch64/shadow_call_stack_8.c: New test.

[Bug middle-end/104504] New: spurious -Wswitch-unreachable warning with -ftrivial-auto-var-init=zero

2022-02-11 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104504

Bug ID: 104504
   Summary: spurious -Wswitch-unreachable warning with
-ftrivial-auto-var-init=zero
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

reduced from linux code on which gcc-12 warns now:

int foo(int x) {
    switch(x) {
        int y;
        /* spuriously warns with -ftrivial-auto-var-init=zero */
    default:
        y = x * 2;
        return y;
    }
}

$ gcc -Wall -ftrivial-auto-var-init=zero -c a.c
a.c: In function 'foo':
a.c:3:13: warning: statement will never be executed [-Wswitch-unreachable]
3 | int y;
  | ^

i can see why gcc warns, but it would be better not to.
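
One way to sidestep the warning in affected code is to hoist the declaration out of the switch body. The sketch below assumes that the diagnostic is triggered only by the declaration sitting between `switch` and the first label; this has not been checked against gcc-12 here. The name `foo_hoisted` is made up for the illustration.

```c
#include <assert.h>

/* Hypothetical rewrite of the reduced test case: the declaration of y is
   moved in front of the switch, so nothing precedes the first label. */
int foo_hoisted(int x)
{
    int y;
    switch (x) {
    default:
        y = x * 2;
        return y;
    }
}
```

The function computes the same value as the original `foo`; only the position of the declaration differs.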

[Bug target/102768] [feature request] Add support for aarch64 shadow call stack

2021-10-18 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

--- Comment #3 from nsz at gcc dot gnu.org ---
well, protection mechanisms are rarely equivalent. neither scs nor
traditional stack protector are perfect.

to me compiler support for freestanding environments such as linux
makes sense. i cannot immediately tell if libc support would work.

(android is not a good indicator of what can be done in linux userspace:
the android abi is broken between releases while glibc is abi stable,
bionic can do hacks in longjmp/setcontext that are not acceptable in
glibc and android does not have mixed toolchain issues such as an old
unwinder trying to unwind across a new binary.)

[Bug target/102768] [feature request] Add support for aarch64 shadow call stack

2021-10-15 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102768

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #1 from nsz at gcc dot gnu.org ---
note that this at least

 - requires runtime support (to manage the shadow stack),
 - needs a reserved register (x18),
 - affects unwinding (shadow stack must be unwound too),
 - affects longjmp and jmp_buf abi.

i guess these are taken care of in the linux context and in
that case i think it makes sense to have the gcc support
upstream instead of in a plugin.

however the general support in user-space is not trivial
(the required libc changes may not be possible in a backward
compatible way such as changing jmp_buf, or reliably such as
allocating the size of shadow stack and dealing with related
failures, or with good ui e.g. opt-in mechanism for binaries
that require shadow stack so there is no regression for
non-shadow-stack binaries, etc.) and there are existing stack
protection mechanisms implemented.

i just wanted to note here that the linux kernel use-case
can be treated separately from user-space applications and
likely less effort and less controversial if you scope the
feature right.
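
The runtime and jmp_buf points in the list above can be illustrated with a toy model. This is only a sketch in portable C, not how gcc or any libc implements it: the shadow stack is a plain array, `scs_sp` stands in for x18, the return addresses are made-up constants, and `struct scs_jmp_buf`, `scs_longjmp` and the other names are invented here. It shows why a non-local jump has to restore the shadow stack pointer too, which is what forces the jmp_buf change.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define SCS_DEPTH 64

/* Toy shadow stack; scs_sp plays the role of the reserved register. */
static unsigned long scs_storage[SCS_DEPTH];
static unsigned long *scs_sp = scs_storage;

/* Prologue side: record the return address on the shadow stack. */
static void scs_push(unsigned long ra)
{
    assert(scs_sp < scs_storage + SCS_DEPTH);
    *scs_sp++ = ra;
}

/* Epilogue side: take the return address back from the shadow stack. */
static unsigned long scs_pop(void)
{
    assert(scs_sp > scs_storage);
    return *--scs_sp;
}

/* A jump buffer that also remembers the shadow stack pointer. */
struct scs_jmp_buf {
    jmp_buf env;
    unsigned long *saved_scs_sp;
};

static struct scs_jmp_buf checkpoint;

static void scs_longjmp(struct scs_jmp_buf *b, int val)
{
    scs_sp = b->saved_scs_sp;       /* unwind the shadow stack as well */
    longjmp(b->env, val);
}

static void inner(void)
{
    scs_push(0x2000);               /* made-up return address */
    scs_longjmp(&checkpoint, 1);    /* skips both epilogues */
}

static void outer(void)
{
    scs_push(0x1000);               /* made-up return address */
    inner();
    (void)scs_pop();
}

/* Returns the shadow stack depth after jumping out of two nested calls. */
size_t scs_depth_after_longjmp(void)
{
    checkpoint.saved_scs_sp = scs_sp;
    if (setjmp(checkpoint.env) == 0)
        outer();
    return (size_t)(scs_sp - scs_storage);
}
```

Without the assignment in `scs_longjmp` the two entries pushed by `outer` and `inner` would stay on the shadow stack after the jump, so later pops would return stale addresses.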

[Bug target/100354] New: [9 regression] aarch64: non-deligitimized UNSPEC UNSPEC_TLS (76) found in variable location

2021-04-30 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100354

Bug ID: 100354
   Summary: [9 regression] aarch64: non-deligitimized UNSPEC
UNSPEC_TLS (76) found in variable location
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

i see this note/warning a lot during an aarch64 glibc build
since gcc-9, it seems to require -O -g, and seems to be
harmless wrt code generation, just annoying.

$ cat bug.c
struct s {
  void *p;
  int n;
};

void foo(struct s *x)
{
  void *p = __builtin_thread_pointer();
  if (x->p != p)
    x->p = p;
  x->n++;
}

$ aarch64-none-linux-gnu-gcc -S -O1 -g bug.c
bug.c: In function ‘foo’:
bug.c:6:6: note: non-delegitimized UNSPEC UNSPEC_TLS (76) found in variable
location
6 | void foo(struct s *x)
  |  ^~~


may be related to bug 89006

[Bug target/99551] New: aarch64: csel is used for cold scalar computation which affects performance

2021-03-11 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99551

Bug ID: 99551
   Summary: aarch64: csel is used for cold scalar computation
which affects performance
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

this is an optimization bug, i don't know at which layer it should
be fixed so i report it as a target bug.

cold path affects performance of hot code because csel is used:

long foo(long x, int c)
{
    if (__builtin_expect(c,0))
        x = (x + 15) & ~15;
    return x;
}


compiles to

foo:
cmp w1, 0
add x1, x0, 15
and x1, x1, -16
csel x0, x1, x0, ne
ret

i think it would be better to use a branch if the user
explicitly marked the computation cold.
e.g. this is faster if c is always 0:

long foo(long x, int c)
{
    if (__builtin_expect(c,0)) {
        asm ("");
        x = (x + 15) & ~15;
    }
    return x;
}

foo:
cbnz w1, .L7
ret
.L7:
add x0, x0, 15
and x0, x0, -16
ret
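
For reference, the cold computation in both variants rounds x up to the next multiple of 16; `~15` is -16 in two's complement, which is why the assembly uses `and` with -16. A small sketch of just that arithmetic, with `align_up_16` as a made-up name:

```c
#include <assert.h>

/* Same arithmetic as the cold path in foo: round x up to a multiple of 16. */
long align_up_16(long x)
{
    return (x + 15) & ~15;
}
```

Values that are already multiples of 16 are returned unchanged, so the cold path only does work for unaligned inputs.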

[Bug target/98747] New: aarch64: __ARM_FEATURE_MEMORY_TAGGING is defined on ilp32

2021-01-19 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98747

Bug ID: 98747
   Summary: aarch64: __ARM_FEATURE_MEMORY_TAGGING is defined on
ilp32
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

memory tagging intrinsics should be available when arm_acle.h
is included and __ARM_FEATURE_MEMORY_TAGGING is defined.

memory tagging is not supported with ILP32 so the feature test
macro should not be defined either, but gcc seems to define it

$ gcc -march=armv8.5-a+memtag -mabi=lp64 -E -dM - 

int *foo(int *p, unsigned long m)
{
#ifdef __ARM_FEATURE_MEMORY_TAGGING
return __arm_mte_create_random_tag(p, m);
#else
return p;
#endif
}

but with -march=armv8.5-a+memtag -mabi=ilp32 it fails

In file included from :1:
: In function 'foo':
:6:12: error: Memory Tagging Extension does not support '-mabi=ilp32'
6 | return __arm_mte_create_random_tag(p, m);
  |^~~
Compiler returned: 1
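
Until the macro is fixed, user code could guard on the data model as well as on the feature macro. This is only a sketch: it assumes that `__LP64__` (which gcc predefines for -mabi=lp64) is a suitable extra condition, and `HAVE_MTE_INTRINSICS` and `tag_pointer` are names invented here.

```c
#include <assert.h>

/* Only use the intrinsics when the feature macro is set AND the ABI is LP64. */
#if defined(__ARM_FEATURE_MEMORY_TAGGING) && defined(__LP64__)
#include <arm_acle.h>
#define HAVE_MTE_INTRINSICS 1
#else
#define HAVE_MTE_INTRINSICS 0
#endif

int *tag_pointer(int *p, unsigned long m)
{
#if HAVE_MTE_INTRINSICS
    return __arm_mte_create_random_tag(p, m);
#else
    return p;   /* fallback: no tagging available */
#endif
}
```

With -mabi=ilp32 the extra condition selects the fallback branch, so the intrinsic that triggers the error is never expanded.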

[Bug target/98618] aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21

2021-01-11 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

--- Comment #5 from nsz at gcc dot gnu.org ---
(In reply to Wilco from comment #3)
> I fixed this in GCC10:
> https://gcc.gnu.org/git/?p=gcc.git=commit;
> h=7d3b27ff12610fde9d6c4b56abc70c6ee9b6b3db
> 
> So this just needs to be backported.

thanks, i'll try that, i'm still looking for
a simple workaround in glibc, this affects
this code in elf_get_dynamic_info:

...
  63   else if ((d_tag_utype) DT_VERSIONTAGIDX (dyn->d_tag) <
DT_VERSIONTAGNUM)
  64 info[VERSYMIDX (dyn->d_tag)] = dyn;
  65   else if ((d_tag_utype) DT_EXTRATAGIDX (dyn->d_tag) < DT_EXTRANUM)
  66 info[DT_EXTRATAGIDX (dyn->d_tag) + DT_NUM + DT_THISPROCNUM
  67  + DT_VERSIONTAGNUM] = dyn;
  68   else if ((d_tag_utype) DT_VALTAGIDX (dyn->d_tag) < DT_VALNUM)
  69 info[DT_VALTAGIDX (dyn->d_tag) + DT_NUM + DT_THISPROCNUM
  70  + DT_VERSIONTAGNUM + DT_EXTRANUM] = dyn;
  71   else if ((d_tag_utype) DT_ADDRTAGIDX (dyn->d_tag) < DT_ADDRNUM)
  72 info[DT_ADDRTAGIDX (dyn->d_tag) + DT_NUM + DT_THISPROCNUM
  73  + DT_VERSIONTAGNUM + DT_EXTRANUM + DT_VALNUM] = dyn;
...

[Bug target/98618] aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21

2021-01-11 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

--- Comment #4 from nsz at gcc dot gnu.org ---
(In reply to Florian Weimer from comment #1)
> Is the test case really valid? It involves an out-of-bounds array access,
> after all.

sorry you are right the indexes are too far, a better test is

long n;
struct s { long a[100]; };
extern struct s obj __attribute__((visibility("hidden")));
void foo()
{
  long *a = obj.a;
  a[n - 0x70000000] = n;
  a[0x70000000 - n + 99] = n;
}

(i wanted to have an example with both + and - offset)
it compiles to

foo:
adrp x0, :got:n
adrp x2, obj-15032385536
add x2, x2, :lo12:obj-15032385536
adrp x1, obj+15032386328
ldr x0, [x0, #:got_lo12:n]
add x1, x1, :lo12:obj+15032386328
ldr x0, [x0]
neg x3, x0, lsl 3
str x0, [x2, x0, lsl 3]
str x0, [x3, x1]
ret
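
The arithmetic behind this test can be checked with a short sketch. It takes the index constant implied by the assembly above (15032385536 / 8 = 0x70000000, with 8-byte long) and confirms that both stores are in bounds for the same range of n, while the constant offsets folded into the adrp anchors are far outside the object. The helper names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BASE  INT64_C(0x70000000)   /* 15032385536 / 8, taken from the asm */
#define ELEMS INT64_C(100)          /* long a[100] */

/* Index used by the first store, a[n - BASE]. */
int64_t first_index(int64_t n)  { return n - BASE; }

/* Index used by the second store, a[BASE - n + 99]. */
int64_t second_index(int64_t n) { return BASE - n + 99; }

/* Nonzero if both stores stay inside a[0..99] for this n. */
int both_in_bounds(int64_t n)
{
    int64_t i = first_index(n), j = second_index(n);
    return i >= 0 && i < ELEMS && j >= 0 && j < ELEMS;
}

/* Constant byte offsets folded into the two adrp/add pairs. */
int64_t first_anchor_offset(void)  { return -8 * BASE; }
int64_t second_anchor_offset(void) { return 8 * (BASE + 99); }
```

Both accesses are valid exactly for n from 0x70000000 to 0x70000000 + 99, yet the anchors sit about 14 GiB below and above `obj`, which is more than an adrp relocation can express.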

[Bug target/98618] aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21

2021-01-11 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

--- Comment #2 from nsz at gcc dot gnu.org ---
(In reply to Florian Weimer from comment #1)
> Is the test case really valid? It involves an out-of-bounds array access,
> after all.

no it doesn't, n is signed long and its value can be such that the access is in
bounds (and that's what the compiler must assume, so adrp must be anchored
accordingly).

[Bug target/98618] New: aarch64: oob adrp offset causes relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21

2021-01-11 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98618

Bug ID: 98618
   Summary: aarch64: oob adrp offset causes relocation truncated
to fit: R_AARCH64_ADR_PREL_PG_HI21
   Product: gcc
   Version: 8.4.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

gcc-8 and earlier can generate adrp with out of bounds offset
for hidden and local symbols.

i haven't yet found the change that fixed this in gcc-9.

this affects glibc since
https://sourceware.org/git/?p=glibc.git;a=commit;h=2f056e8a5dd4dc0f075413f931e82cede37d1057

$ cat bug.c
long n;
struct s { long a[100]; };
extern struct s obj __attribute__((visibility("hidden")));
void foo()
{
  long *a = obj.a;
  a[n - 0x7000 + 35] = n;
  a[0x6dff - n + 35 + 6 + 16 + 3] = n;
}

$ gcc -fPIC -O2 -c bug.c
$ objdump -rd bug.o

bug.o: file format elf64-littleaarch64


Disassembly of section .text:

 :
   0:   9000 adrp x0, 8
0: R_AARCH64_ADR_GOT_PAGE   n
   4:   9002 adrp x2, 0
4: R_AARCH64_ADR_PREL_PG_HI21   obj-0x37ee8
   8:   9142 add x2, x2, #0x0
8: R_AARCH64_ADD_ABS_LO12_NC obj-0x37ee8
   c:   9001 adrp x1, 0
c: R_AARCH64_ADR_PREL_PG_HI21   obj+0x371d8
  10:   f940 ldr x0, [x0]
10: R_AARCH64_LD64_GOT_LO12_NC  n
  14:   9121 add x1, x1, #0x0
14: R_AARCH64_ADD_ABS_LO12_NC   obj+0x371d8
  18:   f940 ldr x0, [x0]
  1c:   cb000fe3 neg x3, x0, lsl #3
  20:   f8207840 str x0, [x2, x0, lsl #3]
  24:   f8216860 str x0, [x3, x1]
  28:   d65f03c0 ret
$ gcc -shared bug.o obj.o
bug.o: In function `foo':
bug.c:(.text+0x4): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
against symbol `obj' defined in .data section in obj.o
bug.c:(.text+0xc): relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
against symbol `obj' defined in .data section in obj.o
collect2: error: ld returned 1 exit status

[Bug libgcc/98251] libgcc on 32-bit soft-float ARM narrows -NaN incorrectly

2020-12-17 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98251

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #1 from nsz at gcc dot gnu.org ---
i believe ieee-754 only specifies the sign bit of a
nan after copy, negate, abs and copysign operations.

iso c does not specify further requirements about
the sign bit of a nan either.

so i think gcc should not assume that conversions
preserve the sign bit. (there may be real hw where
that is not the case, independently from what libgcc
is doing.)
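
A small sketch of the distinction, using only standard `<math.h>` facilities; the function names are made up here. `copysign` is one of the operations listed above for which the sign of a nan result is specified, so its result can be relied on. For the narrowing conversion only "the result is a nan" is checked, because the sign bit is whatever the hardware or soft-float library produces.

```c
#include <assert.h>
#include <math.h>

/* copysign specifies the sign bit even for a NaN operand. */
int nan_sign_after_copysign(void)
{
    double x = copysign(NAN, -1.0);
    return signbit(x) != 0;
}

/* Narrowing keeps NaN-ness; volatile keeps the conversion at run time. */
int narrowed_is_nan(void)
{
    volatile double x = copysign(NAN, -1.0);
    float y = (float)x;
    return isnan(y) != 0;
}

/* Sign bit after narrowing: reported, but deliberately not relied upon. */
int nan_sign_after_narrowing(void)
{
    volatile double x = copysign(NAN, -1.0);
    float y = (float)x;
    return signbit(y) != 0;
}
```

Code that needs a particular sign on the narrowed value can apply `copysignf` after the conversion instead of depending on what the conversion does.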

[Bug target/97638] New: aarch64: bti c is missing at function entry with branch-protection

2020-10-30 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97638

Bug ID: 97638
   Summary: aarch64: bti c is missing at function entry with
branch-protection
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

gcc-10 (and trunk) with -mbranch-protection=bti (or standard)
fails to generate bti c at function entry in some cases:

char *foo (const char *s, const int c)
{
  const char *p = 0;
  for (;;)
  {
    if (*s == c)
        p = s;
    if (p != 0 || *s++ == 0)
        break;
  }
  return (char *)p;
}

gcc -O2 -mbranch-protection=bti is

foo:
.L3:
ldrb w2, [x0]
cmp w2, w1
beq .L2
add x0, x0, 1
cbnz w2, .L3
mov x0, 0
.L2:
ret

[Bug c/97321] New: add warning for pointer casts that may lead to aliasing violation when dereferenced

2020-10-07 Thread nsz at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97321

Bug ID: 97321
   Summary: add warning for pointer casts that may lead to
aliasing violation when dereferenced
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

consider:

int f(unsigned char **);

int g(char *p)
{
return f((unsigned char **)&p);
}

such code is almost surely wrong (if f dereferences its
argument). this is a common mistake and it seems gcc-11
will optimize such code more aggressively, which can lead
to broken behavior, see bug 97264.

so it would be useful to simply warn about casts between
pointer types that cannot alias. e.g.:

"warning: dangerous cast from `char **` to `unsigned char **` can lead to
aliasing violation [-Wpointer-cast]"

does not have to be in -Wall, but the current aliasing
warnings are too weak to catch bugs like in the example.
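
For comparison, a version of the example that avoids the problematic cast could look like the sketch below. The body of `f` is hypothetical (the report only declares it), and `g_safe` is a made-up name. The idea is to convert the pointer value into an object of the type `f` expects and pass that object's address, so `f` never accesses a `char *` object through an `unsigned char *` lvalue.

```c
#include <assert.h>

/* Hypothetical callee: reads one byte through the pointer and advances it. */
int f(unsigned char **pp)
{
    return *(*pp)++;
}

/* Pass the address of a correctly typed temporary instead of casting &p. */
int g_safe(char *p)
{
    unsigned char *q = (unsigned char *)p;
    return f(&q);
}
```

If the caller needs the updated pointer back, it can be converted and stored after the call; in the original `g` the parameter is a local copy, so nothing is lost here.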

[Bug target/94891] aarch64: there is no way to strip PAC from a return address in c code

2020-07-16 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94891

nsz at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |11.0
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from nsz at gcc dot gnu.org ---
fixed for gcc-11, gcc-10.2, gcc-9.4

[Bug target/94791] aarch64: -pg profiling is broken with pac-ret

2020-07-16 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94791

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |11.0

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-11, gcc-10.2, gcc-9.4

[Bug libgcc/96001] aarch64: bti is missing from lse.S when built with branch protection

2020-07-16 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96001

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |11.0
 Status|UNCONFIRMED |RESOLVED

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-11, gcc-10.2, gcc-9.4

[Bug libfortran/95920] Implicit declaration of function 'feenableexcept' in fpu-target.h

2020-07-06 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95920

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||nsz at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from nsz at gcc dot gnu.org ---
this is a newlib bug.

[Bug tree-optimization/95966] New: soft float operations are not tail called

2020-06-29 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95966

Bug ID: 95966
   Summary: soft float operations are not tail called
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

i'd expect this to be a tail call into the soft float add
operation on soft float targets:

fp_t foo(fp_t a, fp_t b)
{
  return a + b;
}

e.g. on x86 with 'typedef __float128 fp_t' the generated code is

foo:
sub rsp, 8
call __addtf3
add rsp, 8
ret

on aarch64 with 'typedef long double fp_t' the generated code is

foo:
stp x29, x30, [sp, -16]!
mov x29, sp
bl  __addtf3
ldp x29, x30, [sp], 16
ret

i see similar code on other softfp targets.
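
for comparison, a minimal sketch of the shape i'd expect, assuming the addition is written as an explicit call to a helper (add_helper and foo_explicit are hypothetical names, not libgcc symbols). the call is the last action and its result is returned unchanged, so it is a tail call candidate and gcc may emit a plain jump for it at -O2; the soft float libcall in foo is in the same position.

```c
/* hypothetical stand-in for a soft float helper such as __addtf3;
   noinline keeps the call visible in the generated code */
typedef long double fp_t;

__attribute__((noinline))
fp_t add_helper(fp_t a, fp_t b)
{
	return a + b;
}

/* the helper call is in tail position: nothing happens after it
   and its result is returned as is */
fp_t foo_explicit(fp_t a, fp_t b)
{
	return add_helper(a, b);
}
```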

[Bug target/94986] missing diagnostic on ARM thumb2 compilation with -pg when using r7 in inline asm

2020-06-03 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94986

--- Comment #5 from nsz at gcc dot gnu.org ---
(In reply to Nick Desaulniers from comment #4)
> (In reply to nsz from comment #2)
> > ideally r7 clobber would just work with -pg -fomit-frame-pointer.
> > the alloca problem is a separate issue (that r7 clobber may not
> > work with alloca).
> 
> Should GCC change this for aaarch32 then (rather than closing the bug)?

yes, but that's bug 69690.

[Bug target/94986] missing diagnostic on ARM thumb2 compilation with -pg when using r7 in inline asm

2020-06-03 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94986

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
on arm the -pg abi is

func:
  push {lr}
  bl _gnu_mcount_nc
  ...

so no frame pointer is involved, -pg implying
-fno-omit-frame-pointer is a historical mistake i think
(because some targets required fp for -pg, but most don't).

ideally r7 clobber would just work with -pg -fomit-frame-pointer.
the alloca problem is a separate issue (that r7 clobber may not
work with alloca).

[Bug target/94748] aarch64: many unnecessary bti j emitted

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94748

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from nsz at gcc dot gnu.org ---
fixed for gcc-10.1 and on the gcc-9 branch.

[Bug target/94697] aarch64: bti j at function start instead of bti c

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from nsz at gcc dot gnu.org ---
fixed for gcc-10.1 and on the gcc-9 branch.

[Bug target/94515] aarch64: broken unwind information for pac-ret

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from nsz at gcc dot gnu.org ---
fixed for gcc-10.1 and on gcc-9 and gcc-8 branches.

[Bug target/94514] aarch64: unwinding across mixed pac-ret and non-pac-ret frames is broken

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94514

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #5 from nsz at gcc dot gnu.org ---
fixed for gcc-10.1 and on gcc-9 and gcc-8 branches.

[Bug target/94515] aarch64: broken unwind information for pac-ret

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515
Bug 94515 depends on bug 94514, which changed state.

Bug 94514 Summary: aarch64: unwinding across mixed pac-ret and non-pac-ret 
frames is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94514

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

[Bug target/95129] aarch64: make outline-atomics work on non-gnu targets

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95129

--- Comment #1 from nsz at gcc dot gnu.org ---
i also opened bug 95128 to just configure the outline-atomics away.

[Bug target/95128] aarch64: configure option for outline-atomics

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95128

--- Comment #2 from nsz at gcc dot gnu.org ---
i also opened bug 95129 to fix the runtime detection.

[Bug target/95129] New: aarch64: make outline-atomics work on non-gnu targets

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95129

Bug ID: 95129
   Summary: aarch64: make outline-atomics work on non-gnu targets
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

the initializer in libgcc uses __getauxval which is not
available on non-gnu targets so outlining atomics is
ineffective.

change the runtime lse check in libgcc such that non-glibc
targets can implement it too (e.g. calling __getauxval via
a weak reference and no #ifdef __gnu_linux__ check allows
a libc to implement it later, unfortunately a non-linux os
may not have the same hwcap mechanism so a more generic
libc<->libgcc abi would be better).
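
a minimal sketch of the weak reference idea, not the actual libgcc initializer. the SKETCH_ constants are assumptions that mirror the linux aarch64 values of AT_HWCAP and HWCAP_ATOMICS, and sketch_have_lse is a hypothetical name; the result is only meaningful on aarch64 linux.

```c
#include <stddef.h>

/* assumption: linux aarch64 values, other systems may differ */
#define SKETCH_AT_HWCAP 16
#define SKETCH_HWCAP_ATOMICS (1UL << 8)

/* weak reference: the address is null if libc does not provide the symbol,
   so there is no link failure on libcs without __getauxval */
extern unsigned long __getauxval(unsigned long) __attribute__((weak));

/* returns 1 if the hwcap bit is set, 0 if it is clear or cannot be queried */
int sketch_have_lse(void)
{
	if (__getauxval == NULL)
		return 0;
	return (__getauxval(SKETCH_AT_HWCAP) & SKETCH_HWCAP_ATOMICS) != 0;
}
```

with this shape a libc that adds __getauxval later gets the runtime selection without rebuilding libgcc, and a libc without it falls back to the non-lse path.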

[Bug target/95128] New: aarch64: configure option for outline-atomics

2020-05-14 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95128

Bug ID: 95128
   Summary: aarch64: configure option for outline-atomics
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

on aarch64, non-gnu targets likely want to turn outline atomics off
in their toolchain (since outlining is ineffective without the hwcap
based initializer that can select lse atomics at runtime).

[Bug target/94697] aarch64: bti j at function start instead of bti c

2020-05-07 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697

--- Comment #6 from nsz at gcc dot gnu.org ---
this is fixed for gcc 10.1, just not backported yet so i kept the bug open

[Bug target/94891] New: aarch64: there is no way to strip PAC from a return address in c code

2020-04-30 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94891

Bug ID: 94891
   Summary: aarch64: there is no way to strip PAC from a return
address in c code
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

Neither __builtin_return_address nor __builtin_extract_return_addr
strips the pointer authentication code (PAC) when compiling with
-mbranch-protection=pac-ret.

Currently inline asm is the only way to get the actual return address
in pac-ret code (xpaclri instruction strips PAC without authenticating
the pointer), so users will have to disable pac-ret for code that uses
the builtins or add aarch64 asm.

It seems the only code that requires __builtin_return_address to return
the signed return address is the libgcc unwinder so it seems that would
be easier to fix than all other code. (Note that having PAC in
__builtin_return_address is not compatible with ilp32 and thus currently
pac-ret is disabled with -mabi=ilp32)

__builtin_extract_return_addr is required to be invertible with
__builtin_frob_return_addr which does not work for PAC.

So it seems aarch64 needs new builtins or existing builtins need to
change.
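
A minimal sketch of the inline asm workaround mentioned above, assuming a GCC-compatible compiler. sketch_strip_pac is a hypothetical helper name; xpaclri is encoded as hint 7 so that older assemblers accept it, and it operates on x30, hence the register variable. On other targets the helper returns its argument unchanged.

```c
/* strips the PAC from a return address without authenticating it;
   the name is hypothetical */
static inline void *sketch_strip_pac(void *ra)
{
#if defined(__aarch64__)
	/* xpaclri works on x30 only, so bind the value to that register */
	register void *x30 __asm__("x30") = ra;
	__asm__("hint 7 // xpaclri" : "+r"(x30));
	return x30;
#else
	/* no PAC on this target, nothing to strip */
	return ra;
#endif
}
```

A caller would typically wrap the builtin, e.g. sketch_strip_pac(__builtin_return_address(0)), before passing the address to something like dladdr.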

[Bug target/94791] New: aarch64: -pg profiling is broken with pac-ret

2020-04-27 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94791

Bug ID: 94791
   Summary: aarch64: -pg profiling is broken with pac-ret
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

int foo(int x)
{
  return x;
}

gcc -pg -mbranch-protection=pac-ret
gives

foo:
hint 25 // paciasp
stp x29, x30, [sp, -32]!
mov x29, sp
mov x1, x30
str w0, [sp, 28]
mov x0, x1  // passing signed return address
bl  _mcount
ldr w0, [sp, 28]
ldp x29, x30, [sp], 32
hint 29 // autiasp
ret

_mcount needs a valid code address as argument
so different calls from the same call site can
be correlated and the caller can be identified
(e.g. with dladdr). either pac should be removed
with xpaclri or x30 saved into another temp reg
before paciasp.

[Bug target/94748] New: aarch64: many unnecessary bti j emitted

2020-04-24 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94748

Bug ID: 94748
   Summary: aarch64: many unnecessary bti j emitted
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

__attribute__((target("branch-protection=bti")))
int foo(void)
{
label:
  return 0;
} 


compiles to

foo:
hint 34 // bti c
hint 36 // bti j
mov w0, 0
ret

the bti j is not necessary and bti j should be rarely emitted
otherwise the security architecture is weakened.

[Bug target/94697] aarch64: bti j at function start instead of bti c

2020-04-23 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697

nsz at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |10.0

[Bug target/94729] New: aarch64: __attribute__((target("branch-protection=pac-ret"))) is accepted in ilp32

2020-04-23 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94729

Bug ID: 94729
   Summary: aarch64:
__attribute__((target("branch-protection=pac-ret")))
is accepted in ilp32
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

-mbranch-protection=pac-ret is not supported in ilp32
so i would expect the related attribute to be a compile
time error. (or a warning that it is ignored on ilp32)

gcc generates pac-ret instructions with -mabi=ilp32
which will almost surely fail at runtime on a pac-ret
enabled system:

long bar(void);
__attribute__((target("branch-protection=pac-ret")))
long foo(void)
{
  return bar()+1;
} 

becomes

foo:
hint 25 // paciasp
stp x29, x30, [sp, -16]!
mov x29, sp
bl  bar
add w0, w0, 1
ldp x29, x30, [sp], 16
hint 29 // autiasp
ret

[Bug target/94697] New: aarch64: bti j at function start instead of bti c

2020-04-21 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697

Bug ID: 94697
   Summary: aarch64: bti j at function start instead of bti c
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

function that may be indirectly called does not start with bti c:

void bar(int *);
void *addr;
int foo(int x)
{
label:
  addr=&&label;
  bar(&x);
  return x;
} 

with -O2 -mbranch-protection=bti+pac-ret

foo:
.L2:
hint 36 // bti j
hint 25 // paciasp
adrp x1, .L2
stp x29, x30, [sp, -32]!
add x1, x1, :lo12:.L2
adrp x2, .LANCHOR0
mov x29, sp
str x1, [x2, #:lo12:.LANCHOR0]
str w0, [sp, 28]
add x0, sp, 28
bl  bar
ldr w0, [sp, 28]
ldp x29, x30, [sp], 32
hint 29 // autiasp
ret

.set .LANCHOR0,. + 0
addr:
.zero   8

happens if function starts with a label that may be indirect
jump target so a bti j is inserted, but there is a paciasp
at the beginning which would normally act as implicit bti c
when it's the first instruction.

[Bug target/94515] aarch64: broken unwind information for pac-ret

2020-04-21 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515

--- Comment #1 from nsz at gcc dot gnu.org ---
i had a fix but it's not enough, so here is another test case:

__attribute__((noreturn)) void unwind(void);
int bar(void);
int global;

int foo(int x)
{
  if (x==1) return 2;
  int y = bar();
  if (y > global) global=y;
  if (y==3) unwind();
  return 0;
}

with -O2 -S -mbranch-protection=pac-ret the asm is:

foo:
.cfi_startproc
cmp w0, 1
beq .L4
hint 25 // paciasp
.cfi_window_save // pauth on
stp x29, x30, [sp, -16]!
.cfi_def_cfa_offset 16
.cfi_offset 29, -16
.cfi_offset 30, -8
mov x29, sp
bl  bar
mov w1, w0
adrp x2, .LANCHOR0
ldr w0, [x2, #:lo12:.LANCHOR0]
cmp w0, w1
blt .L11
.L3:
mov w0, 0
cmp w1, 3
beq .L12
ldp x29, x30, [sp], 16
.cfi_remember_state
.cfi_restore 30
.cfi_restore 29
.cfi_def_cfa_offset 0
hint 29 // autiasp
.cfi_window_save // pauth off
ret
.p2align 2,,3
.L11:
.cfi_restore_state // pauth on
str w1, [x2, #:lo12:.LANCHOR0]
b   .L3
.p2align 2,,3
.L4:
.cfi_def_cfa_offset 0
.cfi_restore 29
.cfi_restore 30
mov w0, 2 // pauth should be off but it's on
ret
.L12:
.cfi_def_cfa_offset 16
.cfi_offset 29, -16
.cfi_offset 30, -8
bl  unwind
.cfi_endproc

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2020-04-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-04-18

--- Comment #11 from nsz at gcc dot gnu.org ---
confirmed

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2020-04-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||bugdal at aerifal dot cx

--- Comment #10 from nsz at gcc dot gnu.org ---
*** Bug 94646 has been marked as a duplicate of this bug. ***

[Bug target/94646] [arm] invalid codegen for conversion from 64-bit int to double hardfloat

2020-04-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94646

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE
 CC||nsz at gcc dot gnu.org

--- Comment #1 from nsz at gcc dot gnu.org ---
dup

*** This bug has been marked as a duplicate of bug 91970 ***

[Bug target/94515] New: aarch64: broken unwind information for pac-ret

2020-04-07 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94515

Bug ID: 94515
   Summary: aarch64: broken unwind information for pac-ret
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

pac-ret uses the .cfi_window_save directive to toggle between signed/unsigned
return address, alternatively .cfi_remember_state and .cfi_restore_state pair
can be used to keep track of the "return address signedness" state.

in some cases, when there are several return paths, gcc fails to generate
the correct cfi directives for all return paths which can cause the unwinder
not to authenticate a signed return address leading to a runtime crash on
pauth enabled systems.

example c++ test that segfaults (after fixing bug 94514):

volatile int zero = 0;

__attribute__((noinline))
void unwind (void)
{
  if (zero == 0)
    throw 42;
}

__attribute__((noinline,noipa))
static int test (int z)
{
  if (z) {
    asm volatile("":::"x20","x21");
    unwind();
    return 1;
  } else {
    unwind();
    return 2;
  }
}

int main ()
{
  try {
    test (zero);
    __builtin_abort ();
  } catch (...) {
    return 0;
  }
  __builtin_abort ();
}

the test() function with -mbranch-protection=standard -O2 compiles to

_ZL4testi:
.LFB1:
.cfi_startproc
hint 25 // paciasp
.cfi_window_save  // pauth on
stp x29, x30, [sp, -32]!
.cfi_def_cfa_offset 32
.cfi_offset 29, -32
.cfi_offset 30, -24
mov x29, sp
cbz w0, .L9
stp x20, x21, [sp, 16]
.cfi_offset 21, -8
.cfi_offset 20, -16
bl  _Z6unwindv
mov w0, 1
ldp x20, x21, [sp, 16]
.cfi_restore 21
.cfi_restore 20
ldp x29, x30, [sp], 32
.cfi_restore 30
.cfi_restore 29
.cfi_def_cfa_offset 0
hint 29 // autiasp
.cfi_window_save  // pauth off
ret
.p2align 2,,3
.L9:
// ret addr pauth state is wrong here!
.cfi_def_cfa_offset 32
.cfi_offset 29, -32
.cfi_offset 30, -24
bl  _Z6unwindv
ldp x29, x30, [sp], 32
.cfi_restore 30
.cfi_restore 29
.cfi_def_cfa_offset 0
hint 29 // autiasp
.cfi_window_save
mov w0, 2
ret
.cfi_endproc
.LFE1:
.size   _ZL4testi, .-_ZL4testi

[Bug target/94514] New: aarch64: unwinding across mixed pac-ret and non-pac-ret frames is broken

2020-04-07 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94514

Bug ID: 94514
   Summary: aarch64: unwinding across mixed pac-ret and
non-pac-ret frames is broken
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

libgcc unwinder on aarch64 fails to keep track of pauth state and may try to
authenticate return addresses that were not signed causing a runtime crash.

example c++ code that segfaults in the unwinder on a pauth enabled system:

__attribute__((noinline, target("branch-protection=pac-ret")))
static void do_throw (void)
{
  throw 42;
  __builtin_abort ();
}

__attribute__((noinline, target("branch-protection=none")))
static void no_pac_ret (void)
{
  do_throw ();
  __builtin_abort ();
}

int main ()
{
  try {
    no_pac_ret ();
  } catch (...) {
    return 0;
  }
  __builtin_abort ();
}

[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model

2020-01-29 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938

--- Comment #7 from nsz at gcc dot gnu.org ---
(In reply to Martin Liška from comment #6)
> Can we close this issue now?

as far as *-musl* is concerned the bug is fixed,
but e.g. now android uses elf tls too, i'm not
sure what happens there.

i'm fine closing the bug with target milestone
gcc-10 and let other *-linux* targets open new
bugs if they care (don't reserve tls surplus).

[Bug target/92424] [aarch64] Broken code with -fpatchable-function-entry and BTI

2020-01-29 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Target|aarch64, x86|aarch64
 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |10.0

--- Comment #7 from nsz at gcc dot gnu.org ---
fixed for gcc-10 and gcc-9.3, opened bug 93492 for the x86 case.

[Bug target/93492] New: Broken code with -fpatchable-function-entry and -fcf-protection=full

2020-01-29 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93492

Bug ID: 93492
   Summary: Broken code with -fpatchable-function-entry and
-fcf-protection=full
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

x86 version of bug 92424

endbr64 is not right at the function label with
-fcf-protection=full -fpatchable-function-entry=1

void f(){} is compiled to

f:
.section __patchable_function_entries,"aw",@progbits
.quad   .LPFE1
.text
.LPFE1:
nop
.LFB0:
.cfi_startproc
endbr64
ret
.cfi_endproc
.LFE0:
.size   f, .-f

[Bug target/93455] New: aarch64: Q constraint address is recomputed

2020-01-27 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93455

Bug ID: 93455
   Summary: aarch64: Q constraint address is recomputed
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

gcc may recompute the address used in a Q constraint
(which may be used for atomic load and stores).

static volatile int x[1];
int f()
{
  int r;
  asm volatile ("A %w0 %1" : "=r"(r) : "Q"(*x));
  asm volatile ("B %0" : "=Q"(*x));
  return r;
}

with -O3 gcc generates

f:
adrp x1, .LANCHOR0
add x0, x1, :lo12:.LANCHOR0
A w0 [x0]
add x1, x1, :lo12:.LANCHOR0
B [x1]
ret

i expected one address computation.

[Bug c/91113] add declare_simd_variant attribute support

2020-01-24 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91113

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |WONTFIX

--- Comment #1 from nsz at gcc dot gnu.org ---
i think this will have to be done differently (the attribute syntax).

[Bug target/92424] [aarch64] Broken code with -fpatchable-function-entry and BTI

2020-01-15 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Target|aarch64 |aarch64, x86
 CC||nsz at gcc dot gnu.org

--- Comment #4 from nsz at gcc dot gnu.org ---
also affects x86 with -fcf-protection=branch -fpatchable-function-entry=N

that's the same issue so this should not be target specific.

[Bug target/92822] [10 Regression] testsuite failures on aarch64 after r278938

2019-12-10 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92822

--- Comment #3 from nsz at gcc dot gnu.org ---
it seems at least the following neon intrinsics are affected:

float32x2_t vmulx_laneq_f32 (float32x2_t, float32x4_t, const int);
float32x2_t vmul_laneq_f32 (float32x2_t, float32x4_t, const int);
float32x2_t vfma_laneq_f32 (float32x2_t, float32x2_t, float32x4_t, const int);
float32x2_t vfms_laneq_f32 (float32x2_t, float32x2_t, float32x4_t, const int);
float64x1_t vmul_laneq_f64 (float64x1_t, float64x2_t, const int);

[Bug target/92822] [10 Regression] testsuite failures on aarch64 after r278938

2019-12-10 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92822

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #2 from nsz at gcc dot gnu.org ---
e.g.

#include <arm_neon.h>

float32x2_t
foo (float32x2_t v0, float32x4_t v1)
{
  return vmulx_laneq_f32 (v0, v1, 0);
}

used to get translated to

foo:
fmulx   v0.2s, v0.2s, v1.s[0]
ret

now it is

foo:
adrp x0, .LC0
ldr q2, [x0, #:lo12:.LC0]
tbl v1.16b, {v1.16b}, v2.16b
fmulx   v0.2s, v0.2s, v1.2s
ret
.size   foo, .-foo
.section .rodata.cst16,"aM",@progbits,16
.align  4
.LC0:
.byte   0
.byte   1
.byte   2
.byte   3
.byte   0
.byte   1
.byte   2
.byte   3
.byte   0
.byte   1
.byte   2
.byte   3
.byte   4
.byte   5
.byte   6
.byte   7

[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model

2019-12-03 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938

--- Comment #5 from nsz at gcc dot gnu.org ---
Author: nsz
Date: Tue Dec  3 11:13:38 2019
New Revision: 278932

URL: https://gcc.gnu.org/viewcvs?rev=278932&root=gcc&view=rev
Log:
musl: Fix invalid tls model in libgomp and libitm PR91938

Musl does not support initial-exec tls in dynamically loaded shared
libraries.

libgomp/ChangeLog:

2019-12-03  Szabolcs Nagy  

PR libgomp/91938
* configure.tgt: Avoid IE tls on *-*-musl*.

libitm/ChangeLog:

2019-12-03  Szabolcs Nagy  

PR libgomp/91938
* configure.tgt: Avoid IE tls on *-*-musl*.


Modified:
trunk/libgomp/ChangeLog
trunk/libgomp/configure.tgt
trunk/libitm/ChangeLog
trunk/libitm/configure.tgt

[Bug libgcc/91737] On Alpine Linux (libmusl) a statically linked C++ program which throws the first exception in two threads at the same time can busy spin on shutdown after main().

2019-11-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737

nsz at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |10.0

--- Comment #6 from nsz at gcc dot gnu.org ---
fixed in r278399 for gcc-10

[Bug target/65649] gcc generates overlarge constants for microblaze-linux-gnu

2019-11-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65649

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org
   Target Milestone|--- |10.0

[Bug target/65649] gcc generates overlarge constants for microblaze-linux-gnu

2019-11-15 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65649

--- Comment #7 from nsz at gcc dot gnu.org ---
Author: nsz
Date: Fri Nov 15 17:39:14 2019
New Revision: 278308

URL: https://gcc.gnu.org/viewcvs?rev=278308&root=gcc&view=rev
Log:
microblaze: fix PR65649

microblaze-linux-musl build fails without this.

(This is a rebase of an earlier patch posted on bugzilla.)

gcc/ChangeLog:

2019-11-15  Nick Clifton  
Szabolcs Nagy  

PR target/65649
* config/microblaze/microblaze.c (print_operand): Print value as long.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/microblaze/microblaze.c

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-08 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #31 from nsz at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #28)
> [ "ws" needs at least a Power7, btw. ]

powerpc64le-* implies power8 and that's where this came up.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-08 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #30 from nsz at gcc dot gnu.org ---
i think it is not the end of the world if the asm constraint api
changes in this case: fixing musl is easy because it's not super
important to optimize fmin, fminf, fmax, fmaxf in libc (if it were
important then gcc should inline them instead of calling into libc,
currently it seems gcc is not able to do that without -ffast-math).

the change breaks the build of old musl releases with new gcc,
so as a general principle it makes more sense to me to keep
documented apis working (e.g. when glibc removed ustat, the gcc
devs asked for 5 years advance notice via deprecation warnings),
but it's up to the gcc maintainers to decide.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-11-06 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #6 from nsz at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #5)
> -- LLVM should support "wa", since that is *the* constraint for VSX
> registers.
> -- musl should use the "wa" constraint in its inline asm.
> -- If after those two you still want "ws" (for compiling legacy code, say), I
>can add that back to GCC 10 (it will do just the same as "wa").
> 
> Is that a plan?

llvm only accepts vector types for wa, not scalar types, so there is
a difference between wa and ws in llvm.

i guess musl can switch to wa and configure check if it works (and
disable the asm on compilers where it does not)

but i would prefer if ws and ww were kept as alias to wa in gcc to
avoid breaking existing code (this should not have huge maintenance
cost).
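
a rough sketch of what the "switch to wa and disable the asm where it does not work" option could look like. sketch_fmax and SKETCH_HAVE_WA_CONSTRAINT are hypothetical names: the macro stands for the result of a configure check, and the asm path is modelled on the existing "ws" based code with only the constraint letter changed.

```c
/* hypothetical fmax variant: vsx asm when the build system confirmed that
   the compiler accepts "wa" for scalar double, generic c otherwise */
double sketch_fmax(double x, double y)
{
#if defined(__powerpc64__) && defined(__VSX__) && defined(SKETCH_HAVE_WA_CONSTRAINT)
	/* "wa" is the vsx register constraint, %x prints the vsx register number */
	__asm__ ("xsmaxdp %x0, %x1, %x2" : "=wa"(x) : "wa"(x), "wa"(y));
	return x;
#else
	/* portable fallback: a nan operand yields the other operand */
	if (x != x)
		return y;
	if (y != y)
		return x;
	return x < y ? y : x;
#endif
}
```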

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-10-23 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #2 from nsz at gcc dot gnu.org ---
note that "ws" is now supported by clang, but "wa" is not.

[Bug target/91886] [10 regression] powerpc64 impossible constraint in asm

2019-10-23 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

--- Comment #1 from nsz at gcc dot gnu.org ---
seems to be broken since r271916

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2019-10-08 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #9 from nsz at gcc dot gnu.org ---
ok i was looking at the wrong code, didn't know libgcc2,
i agree that's the right way to fix this.

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2019-10-07 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #7 from nsz at gcc dot gnu.org ---
i think the code snippet i posted is more efficient and significantly
smaller than using libgcc (which also sounds hard to wire up to do the
right thing). the code sequence can possibly be even inlined.
(and i don't mind if ucontrollers with single precision only fpu
don't have correct fenv behaviour)

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2019-10-03 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #5 from nsz at gcc dot gnu.org ---
ok so the real problem is that libgcc does not define
FP_INIT_ROUNDMODE and FP_HANDLE_EXCEPTIONS etc for
hardfloat arm targets.

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2019-10-02 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #3 from nsz at gcc dot gnu.org ---
(In reply to Andreas Schwab from comment #2)
> Don't you need #pragma STDC FENV_ACCESS?

yes, for iso c conformance you need it, but gcc does not
handle it anyway, instead it requires -frounding-math.

however if double prec instructions are available, using them
may be even faster in the difficult inexact case, e.g.

double uconv64(uint64_t x)
{
  double lo = uconv32(x); // single instruction, always exact
  double hi = uconv32(x>>32);
  return lo + hi*0x1p32;
}

so i would not make the fix depend on -frounding-math,
just always use hardfloat instructions on hardfloat targets
to do the conv.

(i suspect it affects more than just armhf)
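
a self-contained version of the snippet above, as a sketch: uconv32 is written here as a plain c conversion and is only assumed to map to one hardware instruction on a hardfloat target.

```c
#include <stdint.h>

/* stands in for the single hardware instruction; every 32 bit value
   fits in a double so this conversion is always exact */
static double uconv32(uint32_t x)
{
	return (double)x;
}

double uconv64(uint64_t x)
{
	double lo = uconv32((uint32_t)x);
	double hi = uconv32((uint32_t)(x >> 32));
	/* hi*0x1p32 is exact, so only the final addition can round and it
	   does so in the current rounding mode, raising inexact if needed */
	return lo + hi * 0x1p32;
}
```

since there is a single rounding step done by the fpu, the result follows the dynamic rounding mode without any extra code.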

[Bug target/91970] arm: 64bit int to double conversion does not respect rounding mode

2019-10-02 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

--- Comment #1 from nsz at gcc dot gnu.org ---
floating-point exceptions are also missing for the same reason.

[Bug target/91970] New: arm: 64bit int to double conversion does not respect rounding mode

2019-10-02 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

Bug ID: 91970
   Summary: arm: 64bit int to double conversion does not respect
rounding mode
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

on arm-* with

#include <fenv.h>
#include <stdio.h>
int main()
{
  long long x = (1LL << 60) - 1;
  double y;
  fesetround(FE_DOWNWARD);
  __asm__ __volatile__ ("" : "+m" (x));
  y = x;
  __asm__ __volatile__ ("" : "+m" (y));
  fesetround(FE_TONEAREST);
  printf("%a\n", y);
}

i get

0x1p60

instead of

0x1.fp+59

i assume this is because the conversion is handled by __aeabi_l2d
(also known as __floatdidf in libgcc) which is not rounding mode
aware.

this affects hardfloat targets which otherwise support directed
rounding modes.
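
the expected value can be cross-checked with integer arithmetic only. a small sketch, assuming a gcc compatible compiler for __builtin_clzll (sketch_round_down is a hypothetical helper): for a non-negative input, rounding toward minus infinity just drops the bits below the 53 most significant ones, after which the conversion to double is exact in any rounding mode.

```c
#include <stdint.h>

/* value of x rounded down to double precision, computed without
   relying on the floating-point rounding mode */
double sketch_round_down(uint64_t x)
{
	if (x == 0)
		return 0.0;
	/* index of the most significant set bit */
	int top = 63 - __builtin_clzll(x);
	/* clear everything below the 53 bits a double can hold */
	if (top > 52)
		x &= ~((UINT64_C(1) << (top - 52)) - 1);
	/* exact: at most 53 significant bits are left */
	return (double)x;
}
```

for x = (1LL << 60) - 1 the low 7 bits are cleared, giving 2^60 - 128, which is the 0x1.fffffffffffffp+59 shown above.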

[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model

2019-09-30 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938

--- Comment #3 from nsz at gcc dot gnu.org ---
i opened a glibc bug
https://sourceware.org/bugzilla/show_bug.cgi?id=25051

but i think this bug should be kept open for non *-linux*-gnu* targets.

[Bug libgomp/91938] libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model

2019-09-30 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938

--- Comment #2 from nsz at gcc dot gnu.org ---
if you really want this optimization then libgomp has to do checks
to guarantee that the target libc supports this usage and only
enable it when it's 100% safe. (e.g. musl or bionic does not support
this, my guess is nothing really supports this other than glibc
and even glibc has trouble because users abuse it)

i don't believe the unacceptable performance claims, since with
tlsdesc there should be only a small performance difference
between initial-exec tls and general tls access. so instead of
building broken binaries with the initial-exec tls-model, maybe
the tls-dialect should be changed to tlsdesc (when supported).

[Bug libgomp/91938] New: libgomp (and libitm) DSOs are incorrectly built with initial-exec tls-model

2019-09-30 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91938

Bug ID: 91938
   Summary: libgomp (and libitm) DSOs are incorrectly built with
initial-exec tls-model
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

initial-exec tls is only valid in a dso if there is a guarantee
that the dso is never dynamically loaded or the c runtime has
tls reserved specifically for that dso to use.

gcc target libs don't provide such a guarantee, nor does glibc have
special tls for libgomp or libitm, so they are broken on *-linux*.

optimizing tls access is only acceptable if it does not break
correctness. (side note: initial-exec tls is required on glibc for
as-safe tls access, if that's necessary then the fix will need
glibc discussions, but the default should be safe for other libcs.)

this hits targets like aarch64 and powerpc* harder where glibc
can optimize dynamic tls in dsos to use the preallocated static
tls area if available so it runs out faster than on targets where
no such optimization is done. (initial-exec tls usage is actually
less performance relevant on those targets for the same reason.)

in principle that glibc logic can be changed to be more consistent
across targets, but i would only support that if there is a way to
coordinate the use of preallocated tls otherwise it's unsupportable.

see the libc-alpha discussion
https://sourceware.org/ml/libc-alpha/2019-09/msg00512.html
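
to make the difference concrete, a small sketch of the two access models on
a per-variable basis, using the gcc tls_model attribute. the variable and
function names are made up for illustration, this is not libgomp code:

```c
/* global-dynamic: the default for -fPIC code, works in any dso,
   including ones loaded later with dlopen */
__thread int counter_gd __attribute__((tls_model("global-dynamic")));

/* initial-exec: fixed offset from the thread pointer, only valid in a
   dso if static tls space for it is guaranteed to exist at load time */
__thread int counter_ie __attribute__((tls_model("initial-exec")));

int bump_gd(void)
{
	return ++counter_gd;
}

int bump_ie(void)
{
	return ++counter_ie;
}
```

building this as a shared library and loading it with dlopen is the
problematic case for counter_ie: whether it works depends on how much
surplus static tls the libc happens to have left. with tls descriptors
(-mtls-dialect=gnu2 on x86_64, the default dialect on aarch64) the
global-dynamic access is expected to be close in cost.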

[Bug c/82542] -fdump-lang-raw (formerly -fdump-translation-unit) no longer available for C

2019-09-25 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82542

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #13 from nsz at gcc dot gnu.org ---
on gcc <= 7 this option made it possible to see all global identifiers
(types, builtins, etc) that the compiler predefines,
currently i don't see a way to do that for c.

e.g. currently there is no way to tell what _FloatN
variants gcc understands, even though -fdump-translation-unit
with an empty tu worked for that reliably previously.

(i guess i can attach gdb to cc1 and hope there is
enough debug info in cc1 to print things from gcc
internal data structures.. but that's not exactly
userfriendly)
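
a partial workaround sketch for the _FloatN case only, assuming gcc
predefines the __FLTN_MANT_DIG__ style macros exactly when the matching
type is supported (the macro list can be checked with
gcc -dM -E -x c /dev/null). floatn_support and the HAS_* names are made up,
and this does not replace the dump for other predefined identifiers:

```c
/* hypothetical bitmask values, one per _FloatN / _FloatNx variant */
enum {
	HAS_F16  = 1,
	HAS_F32  = 2,
	HAS_F64  = 4,
	HAS_F128 = 8,
	HAS_F32X = 16,
	HAS_F64X = 32
};

/* returns the set of variants the compiler advertises via predefined
   macros; assumption: a missing macro means the type is not supported */
unsigned floatn_support(void)
{
	unsigned m = 0;
#ifdef __FLT16_MANT_DIG__
	m |= HAS_F16;
#endif
#ifdef __FLT32_MANT_DIG__
	m |= HAS_F32;
#endif
#ifdef __FLT64_MANT_DIG__
	m |= HAS_F64;
#endif
#ifdef __FLT128_MANT_DIG__
	m |= HAS_F128;
#endif
#ifdef __FLT32X_MANT_DIG__
	m |= HAS_F32X;
#endif
#ifdef __FLT64X_MANT_DIG__
	m |= HAS_F64X;
#endif
	return m;
}
```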

[Bug target/91900] New: [10 regression] mipsisa64r6-*-* rejects lo clobber

2019-09-25 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91900

Bug ID: 91900
   Summary: [10 regression] mipsisa64r6-*-* rejects lo clobber
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

mips64 syscall code in musl is like

#define __NR_getpid 5038

static inline long __syscall0(long n)
{
 register long r7 __asm__("$7");
 register long r2 __asm__("$2") = n;
 __asm__ __volatile__ (
  "syscall"
  : "+&r"(r2), "=r"(r7)
  :
  : "$1", "$3", "$10", "$11", "$12", "$13", "$14", "$15", "$24", "$25", "hi",
"lo", "memory");
 return r7 ? -r2 : r2;
}

int getpid()
{
  return __syscall0(__NR_getpid);
}

because linux clobbers all sorts of registers.
this compiles with mips64-linux-musl-gcc and
up to gcc-9 with mipsisa64r6-linux-musl-gcc too,
but mipsisa64r6-* fails with trunk gcc
(gcc version 10.0.0 20190924):

t.c: In function '__syscall0':
t.c:7:2: error: the register 'lo' cannot be clobbered in 'asm' for the current target
7 |  __asm__ __volatile__ (
  |  ^~~
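
a possible source level workaround sketch: pick the clobber list based on
the isa revision, since mips r6 has no hi/lo registers. __mips_isa_rev is a
predefined macro on mips targets; SYSCALL_CLOBBERLIST, my_syscall0 and
my_getpid are names chosen here for illustration, and the non-mips branch
exists only so the example builds elsewhere:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), used only by the non-mips fallback */

#if defined(__mips__) && defined(__mips_isa_rev) && __mips_isa_rev >= 6
/* r6: no hi/lo registers, so they must not be named as clobbers */
#define SYSCALL_CLOBBERLIST \
	"$1", "$3", "$10", "$11", "$12", "$13", \
	"$14", "$15", "$24", "$25", "memory"
#elif defined(__mips__)
/* pre-r6: the kernel may clobber hi/lo too */
#define SYSCALL_CLOBBERLIST \
	"$1", "$3", "$10", "$11", "$12", "$13", \
	"$14", "$15", "$24", "$25", "hi", "lo", "memory"
#endif

static inline long my_syscall0(long n)
{
#ifdef SYSCALL_CLOBBERLIST
	register long r7 __asm__("$7");
	register long r2 __asm__("$2") = n;
	__asm__ __volatile__ (
		"syscall"
		: "+r"(r2), "=r"(r7)
		:
		: SYSCALL_CLOBBERLIST);
	return r7 ? -r2 : r2;
#else
	/* not mips: go through the libc wrapper instead of inline asm */
	return syscall(n);
#endif
}

long my_getpid(void)
{
	return my_syscall0(SYS_getpid);
}
```

this keeps one asm statement for all mips variants and only the clobber
list differs, but it does not address the regression itself: code that
compiled with gcc-9 still fails with trunk unless it is changed.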

[Bug target/91886] New: [10 regression] powerpc64 impossible register constraint in asm

2019-09-24 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91886

Bug ID: 91886
   Summary: [10 regression] powerpc64 impossible register
constraint in asm
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

this used to work for me:

double fmax(double x, double y)
{
__asm__ ("xsmaxdp %x0, %x1, %x2" : "=ws"(x) : "ws"(x), "ws"(y));
return x;
}

compiled to

fmax:
xsmaxdp 1, 1, 2
blr

now (gcc version 10.0.0 20190924) i get

fmax.c: In function 'fmax':
fmax.c:3:2: error: impossible constraint in 'asm'
3 |  __asm__ ("xsmaxdp %x0, %x1, %x2" : "=ws"(x) : "ws"(x), "ws"(y));
  |  ^~~

[Bug c++/91809] New: in c++ bit-field is not promoted to int in printf argument

2019-09-18 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91809

Bug ID: 91809
   Summary: in c++ bit-field is not promoted to int in printf
argument
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

may be a -Wformat bug only, but the c++ front-end seems
to use the wrong type:

#include <stdio.h>

struct X {
  unsigned long long a: 1;
} x;

void foo()
{
  printf("%d", x.a);
}

gcc -Wformat -xc++  says

a.c: In function 'void foo()':
a.c:9:12: warning: format '%d' expects argument of type 'int', but argument 2 has type 'long long unsigned int' [-Wformat=]
    9 |   printf("%d", x.a);
      |           ~^   ~~~
      |            |     |
      |            int   long long unsigned int
      |           %lld
the warning is not present with -xc, which is
the expected behaviour: bit-field should be
promoted to int in this context, i don't think
c++ should behave differently.

(not a new regression, at least present since gcc-4.8)
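
a small c sketch of the promotion that is expected here, plus an explicit
cast that should keep -Wformat quiet in both languages. promoted_size and
format_bit are made-up helper names; the sizeof result is what gcc gives in
c mode, where a 1 bit wide bit-field promotes to int:

```c
#include <stddef.h>
#include <stdio.h>

struct X {
	unsigned long long a: 1;
} x;

/* unary + applies the integer promotions, so this is the size of the
   promoted type of the bit-field, not of unsigned long long */
size_t promoted_size(void)
{
	return sizeof(+x.a);
}

/* the cast makes the argument type explicit, so the format check does
   not depend on how the front-end types the bit-field */
const char *format_bit(void)
{
	static char buf[8];
	snprintf(buf, sizeof buf, "%d", (int)x.a);
	return buf;
}
```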

[Bug libgcc/91737] On Alpine Linux (libmusl) a statically linked C++ program which throws the first exception in two threads at the same time can busy spin on shutdown after main().

2019-09-17 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91737

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|NEW
   Last reconfirmed||2019-09-17
 CC||nsz at gcc dot gnu.org
 Resolution|MOVED   |---
 Ever confirmed|0   |1

--- Comment #5 from nsz at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Glibc has a similar bug and been discussed how to fix it.
> The way Glibc is going to fix it (though it has not yet) is that
> libpthread.a will be really just include one object file which includes all
> of the pthread library.

citation needed.

the plan in glibc is to provide a "is single threaded" api.
https://sourceware.org/ml/libc-alpha/2019-08/msg00438.html

once that's in then in principle any library (like libstdc++)
can do single thread optimizations without hacks.

(another glibc plan is to move libpthread.so into libc.so
so there are no awkward internal abis between them and then
avoiding pthread dependency is no longer relevant.)

i think that should work for the unwinder in libgcc too.

on the musl side, we want to disable this hack before that
happens, it's better to not do any single thread optimizations
than silently breaking things.

so the right fix is something equivalent to
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222329
i.e. libgcc should be compiled with GTHREAD_USE_WEAK=0 on *musl*.
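
for context, a sketch of the kind of weak symbol check that the single
thread hack relies on. threads_look_active is a made-up name modelled on
the idea, it is not the libgcc __gthread_active_p code:

```c
#include <pthread.h>

/* weak redeclaration: the reference resolves to a null address if
   nothing in the link provides the symbol */
extern __typeof(pthread_key_create) pthread_key_create
	__attribute__((weak));

/* hypothetical helper: guesses "threads may exist" from the presence
   of one pthread symbol */
int threads_look_active(void)
{
	return pthread_key_create != 0;
}
```

in a static link only the referenced objects are pulled from the archive,
so a check like this can report 0 even though the program creates threads,
which is how the optimization ends up silently breaking things.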

[Bug tree-optimization/91723] New: builtin fma is not optimized or vectorized as *+

2019-09-10 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91723

Bug ID: 91723
   Summary: builtin fma is not optimized or vectorized as *+
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

i'd expect a*b+c to generate the same code as __builtin_fmaf(a,b,c)
when a hw instruction is available for fmaf, but the latter generates
significantly worse code in some cases, e.g. when vectorization
is involved.

consider:

void
foo (float *restrict r, const float *restrict a,
 const float *restrict b, const float *restrict c)
{
for (int i=0; i < 4; i++) {
float x;
#ifdef BUILTIN
x = __builtin_fmaf(a[i],b[i],c[i]);
x = __builtin_fmaf(a[i],b[i],x);
#else
x = a[i]*b[i]+c[i];
x = a[i]*b[i]+x;
#endif
r[i] = x;
}
}

with gcc -O3 -mfma -mavx -ffp-contract=fast -fno-math-errno i get good code:

foo:
vmovups (%rdx), %xmm0
vmovups (%rsi), %xmm1
vmovaps %xmm0, %xmm2
vfmadd213ps (%rcx), %xmm1, %xmm2
vfmadd132ps %xmm1, %xmm2, %xmm0
vmovups %xmm0, (%rdi)
ret

but if i add -DBUILTIN i get

foo:
vmovss  (%rsi), %xmm0
vmovss  (%rdx), %xmm1
vmovaps %xmm0, %xmm2
vfmadd213ss (%rcx), %xmm1, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vmovss  4(%rdx), %xmm1
vmovss  %xmm0, (%rdi)
vmovss  4(%rsi), %xmm0
vmovaps %xmm0, %xmm2
vfmadd213ss 4(%rcx), %xmm1, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vmovss  8(%rdx), %xmm1
vmovss  %xmm0, 4(%rdi)
vmovss  8(%rsi), %xmm0
vmovaps %xmm0, %xmm2
vfmadd213ss 8(%rcx), %xmm1, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vmovss  12(%rdx), %xmm1
vmovss  %xmm0, 8(%rdi)
vmovss  12(%rsi), %xmm0
vmovaps %xmm0, %xmm2
vfmadd213ss 12(%rcx), %xmm1, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vmovss  %xmm0, 12(%rdi)
ret


i expected identical results, the same happens on other targets.

[Bug lto/91299] LTO inlines a weak definition in presence of a non-weak definition from an ELF file

2019-07-30 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91299

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-07-30
 CC||nsz at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from nsz at gcc dot gnu.org ---
happens on trunk gcc too and target independent.

[Bug c/91113] New: add declare_simd_variant attribute support

2019-07-08 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91113

Bug ID: 91113
   Summary: add declare_simd_variant attribute support
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

to declare vector functions on aarch64 for one simd architecture only,
support for the openmp 5.0 declare variant syntax is required, but
full support for the omp declare variant pragma is excessive.
(for the aarch64 use-case, see user defined vector functions in
https://developer.arm.com/docs/101129/latest )

I suggest introducing an attribute in gcc that can handle a subset
of omp declare variant pragma and works in c and fortran declarations
for declare simd functions.

I think the syntax and semantics for the attribute should follow
the proposal for clang (without the clang_ prefix):
http://lists.llvm.org/pipermail/llvm-dev/2019-June/132987.html

```
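declare_simd_variant
  (<variant-func-id>, <simd clauses>{, <context selector clauses>})

<variant-func-id>:= The name of a function variant that is a base
language identifier, or, for C++, a template-id.

<simd clauses> := <simdlen>, <mask>{, <optional simd clauses>}

<simdlen> := simdlen(<positive number>) | simdlen("scalable")

<mask> := inbranch | notinbranch

<optional simd clauses> := <linear clause>
 | <uniform clause>
 | <align clause>  | {,<optional simd clauses>}

<linear clause>  := linear_ref(<var>,<step>)
  | linear_var(<var>, <step>)
  | linear_uval(<var>, <step>)
  | linear(<var>, <step>)

<step> := <var> | <non zero number>

<uniform clause> := uniform(<var>)

<align clause>   := align(<var>, <positive number>)

<var> := Name of a parameter in the scalar function declaration/definition

<non zero number> := ... | -2 | -1 | 1 | 2 | ...

<positive number> := 1 | 2 | 3 | ...

<context selector clauses> := {<isa>}{,} {<arch>}

<isa> := isa(target-specific-value)

<arch> := arch(target-specific-value)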

example usage:
```
__attribute__((declare_simd_variant("vfoo", simdlen(2), notinbranch,
isa("simd"))))
double foo(double x);

float64x2_t vfoo(float64x2_t vx);
```

should be equivalent to the openmp 5.0 code
```
#pragma omp declare variant(vfoo) \
  match(construct={simd(simdlen(2), notinbranch)}, device={isa("simd")})
double foo(double x);

float64x2_t vfoo(float64x2_t vx);
```

[Bug fortran/90539] [10 Regression] 481.wrf slowdown by 25% on Intel Kaby with -Ofast -march=native starting with r271377

2019-05-21 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90539

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #9 from nsz at gcc dot gnu.org ---
spec2017 521.wrf_r never finishes on aarch64

gcc rev 271291 runs fine
gcc rev 271380 does not finish (possibly a crash that the spec scripts don't
detect)

[Bug middle-end/90478] [10 Regression] ICE in emit_case_dispatch_table at gcc/stmt.c:796

2019-05-16 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90478

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #8 from nsz at gcc dot gnu.org ---
i see

FAIL: gcc.dg/tree-ssa/pr90478-2.c (internal compiler error)

on

aarch64-none-elf, aarch64_be-none-elf, arm-none-eabi

targets.

[Bug target/89628] New: aarch64_vector_pcs does not use v24-v31 as temp regs

2019-03-07 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89628

Bug ID: 89628
   Summary: aarch64_vector_pcs does not use v24-v31 as temp regs
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: nsz at gcc dot gnu.org
  Target Milestone: ---

consider

typedef __Float32x4_t vec;

__attribute__((aarch64_vector_pcs))
vec f(vec a0, vec a1, vec a2, vec a3, vec a4, vec a5, vec a6, vec a7)
{
vec t0, t1, t2, t3, t4, t5, t6, t7, s0, s1, s2, s3;
t0 = a0 - a7;
t1 = a1 - a6;
t2 = a2 - a5;
t3 = a3 - a4;
t4 = a4 - a3;
t5 = a5 - a2;
t6 = a6 - a1;
t7 = a7 - a0;
s0 = t0 * t1;
s1 = t2 * t3;
s2 = t4 * t5;
s3 = t6 * t7;
return s0 * s1 * s2 * s3 * a0 * a1 * a2 * a3 * a4 * a5 * a6 * a7;
}

the aarch64 vpcs has 8 arg + 8 temp regs to use, so i think such code should
not need to spill, however current gcc seems to compile it as

f:
stp q16, q17, [sp, -96]!
fsub v16.4s, v2.4s, v5.4s
stp q18, q19, [sp, 32]
fsub v17.4s, v0.4s, v7.4s
stp q20, q21, [sp, 64]
fsub v18.4s, v1.4s, v6.4s
fsub v20.4s, v3.4s, v4.4s
fsub v21.4s, v5.4s, v2.4s
fsub v19.4s, v4.4s, v3.4s
fmul v17.4s, v17.4s, v18.4s
fmul v16.4s, v16.4s, v20.4s
fsub v18.4s, v6.4s, v1.4s
fsub v20.4s, v7.4s, v0.4s
fmul v19.4s, v19.4s, v21.4s
fmul v16.4s, v17.4s, v16.4s
fmul v17.4s, v18.4s, v20.4s
ldp q20, q21, [sp, 64]
fmul v16.4s, v16.4s, v19.4s
ldp q18, q19, [sp, 32]
fmul v16.4s, v16.4s, v17.4s
fmul v16.4s, v16.4s, v0.4s
fmul v1.4s, v16.4s, v1.4s
ldp q16, q17, [sp], 96
fmul v2.4s, v1.4s, v2.4s
fmul v3.4s, v2.4s, v3.4s
fmul v4.4s, v3.4s, v4.4s
fmul v5.4s, v4.4s, v5.4s
fmul v6.4s, v5.4s, v6.4s
fmul v0.4s, v6.4s, v7.4s
ret

note that v24..v31 regs are not used but there are 6 spills.

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-02-01 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

--- Comment #18 from nsz at gcc dot gnu.org ---
(In reply to Christophe Lyon from comment #16)
> I've noticed this problem on arm and aarch64 native builds too.
> But my cross-compilers (using QEMU as simulator) still pass this test. Does
> this mean there is a bug in QEMU?

qemu-user will just translate each guest fp operations
to host fp operations, so if the host supports traps
then you will see traps working.

it's not a bug in the sense that the arm architecture
allows trap support (it's just not required), but it's
buggy that it would not report the support correctly
(e.g. enabling traps always succeeds under qemu but
traps don't happen if the underlying hw has no support)

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-01-31 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|nsz at gcc dot gnu.org |unassigned at gcc dot gnu.org

--- Comment #15 from nsz at gcc dot gnu.org ---
i unassigned myself as i'm not working on this right now.

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-01-31 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

--- Comment #14 from nsz at gcc dot gnu.org ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to nsz from comment #12)
> > i don't know how to change this to false for IEEE_SUPPORT_HALTING
> > on aarch64 and arm targets, but that would be a possible fix.
> 
> --cut here--
> Index: libgfortran/config/fpu-glibc.h

that only turns the runtime check into "always false"

but the compile time check is still "always true".

which is still broken.

[Bug libfortran/78314] [aarch64] ieee_support_halting does not report unsupported fpu traps correctly

2019-01-31 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78314

--- Comment #12 from nsz at gcc dot gnu.org ---
this got reverted because of bug 88678

and because compile time and runtime support_halting are different.

the compile time value is unconditionally true, which is wrong for
aarch64 and arm:

gcc/fortran/simplify.c:
gfc_expr *
simplify_ieee_support (gfc_expr *expr)
{
  /* We consider that if the IEEE modules are loaded, we have full support
 for flags, halting and rounding, which are the three functions
 (IEEE_SUPPORT_{FLAG,HALTING,ROUNDING}) allowed in constant
 expressions. One day, we will need libgfortran to detect support and
 communicate it back to us, allowing for partial support.  */

  return gfc_get_logical_expr (gfc_default_logical_kind, &expr->where,
   true);
}

i don't know how to change this to false for IEEE_SUPPORT_HALTING
on aarch64 and arm targets, but that would be a possible fix.

[Bug fortran/88678] [9 regression] Many gfortran.dg/ieee/ieee_X.f90 test cases fail starting with r267465

2019-01-31 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88678

--- Comment #21 from nsz at gcc dot gnu.org ---
this fix undid the change for bug 78314.
do you plan to backport it to the gcc 7 and 8 branches?

note that in principle on targets where trapping is not supported
the "immediate alternate exception handling" mechanism of ieee 754
can be emulated by save/clear/check/restore status flags around each
fp operation, but i don't think gcc currently supports that
(and it's not very practical unless somebody uses it for debugging
fp issues).

[Bug fortran/88678] [9 regression] Many gfortran.dg/ieee/ieee_X.f90 test cases fail starting with r267465

2019-01-31 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88678

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #19 from nsz at gcc dot gnu.org ---
that code was there for a reason.. now aarch64 fails because it cannot detect
if the flags are supported or not.

so if detection is turned off then on aarch64 "supports trapping" should always
be false and likewise on any target that allows an implementation without
trapping exceptions.

[Bug target/88954] __attribute__((noplt)) doesn't work with function pointers

2019-01-23 Thread nsz at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88954

nsz at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nsz at gcc dot gnu.org

--- Comment #7 from nsz at gcc dot gnu.org ---
note that with

void f_noplt(void) __attribute__((noplt));
void (*p)(void) = f_noplt;

the linker may create a PLT for f_noplt and use its address to initialize p in
case of non-pie linking.

alternatively the linker may emit a dynamic relocation for p so it is filled in
by the dynamic linker to the actual address of f_noplt.

it seems the bfd linker on x86_64 does the latter (if there is otherwise no
PLT), but e.g. the gold linker does the former. (as far as the sysv abi is
concerned both behaviours are correct, the linker does not know about the
noplt attr.)

this means that (depending on linker behaviour) a noplt function may get a
PLT in non-pie executables (so noplt can only avoid lazy binding and jump
slot relocs reliably in pic code). maybe linkers should be fixed so noplt
always avoids PLT (on x86_64, other targets have other issues with non-pic),
but then this has to be abi to be reliable.
