[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-03-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #47 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:ff38712bcba97ff9cba168a4e864c5a8ac453b7f

commit r15-7776-gff38712bcba97ff9cba168a4e864c5a8ac453b7f
Author: Jakub Jelinek 
Date:   Sat Mar 1 20:48:16 2025 +0100

ggc: Fix up ggc_internal_cleared_alloc_no_dtor [PR117047]

Apparently I got one of the !HAVE_ATTRIBUTE_ALIAS fallbacks wrong.

It compiled with a warning:
../../gcc/ggc-common.cc: In function 'void* ggc_internal_cleared_alloc_no_dtor(size_t, void (*)(void*), size_t, size_t)':
../../gcc/ggc-common.cc:154:44: warning: unused parameter 'size' [-Wunused-parameter]
  154 | ggc_internal_cleared_alloc_no_dtor (size_t size, void (*f)(void *),
  | ~~~^~~~
and obviously didn't work right (always allocated 0-sized objects).

Fixed thusly.

2025-03-01  Jakub Jelinek  

PR jit/117047
* ggc-common.cc (ggc_internal_cleared_alloc_no_dtor): Pass size
rather than s as the first argument to ggc_internal_cleared_alloc.
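To make the failure mode concrete, here is a minimal sketch using hypothetical stand-ins (`demo_cleared_alloc` and two `_no_dtor`-style wrappers), not the real ggc API. According to the r15-7770 description further down, the `_no_dtor` entry points are only ever called with a null finalizer and `s == 0`, so a fallback wrapper that forwards `s` instead of `size` asks the allocator for 0 bytes every time.

```cpp
#include <cstddef>
#include <cstdlib>

using finalizer_fn = void (*)(void *);

// Records the size the allocator was last asked for (for illustration only).
std::size_t g_last_request = 0;

// Hypothetical stand-in for ggc_internal_cleared_alloc: returns zeroed
// storage of SIZE bytes.  F, S and N would describe a finalizer and the
// element size/count for arrays; they are ignored in this sketch.
void *demo_cleared_alloc(std::size_t size, finalizer_fn f,
                         std::size_t s, std::size_t n)
{
  (void) f;
  (void) s;
  (void) n;
  g_last_request = size;
  // Avoid the implementation-defined calloc(1, 0) case.
  return std::calloc(1, size ? size : 1);
}

// Buggy fallback wrapper: forwards S (0 on the no-finalizer path) instead
// of SIZE.  With -Wextra, GCC reports SIZE as an unused parameter here.
void *demo_cleared_alloc_no_dtor_buggy(std::size_t size, finalizer_fn f,
                                       std::size_t s, std::size_t n)
{
  return demo_cleared_alloc(s, f, s, n);
}

// Fixed fallback wrapper: forwards SIZE as the first argument.
void *demo_cleared_alloc_no_dtor(std::size_t size, finalizer_fn f,
                                 std::size_t s, std::size_t n)
{
  return demo_cleared_alloc(size, f, s, n);
}
```

Calling the buggy wrapper with a requested size of 32 leaves `g_last_request` at 0, which is the "always allocated 0-sized objects" symptom in miniature.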

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-03-01 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #46 from Jakub Jelinek  ---
Fixed.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-03-01 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #45 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:8c15a6cefa0d1f8ec12701af1f528f473c33ff6b

commit r15-7770-g8c15a6cefa0d1f8ec12701af1f528f473c33ff6b
Author: Jakub Jelinek 
Date:   Sat Mar 1 11:22:27 2025 +0100

ggc: Avoid using ATTRIBUTE_MALLOC for allocations that need finalization [PR117047]

As analyzed by Andrew/David/Richi/Sam in the PR, the reason for the
libgccjit ICE is that there are GC allocations with finalizers and we
still mark ggc_internal_{,cleared_}alloc with ATTRIBUTE_MALLOC, which
to the optimizers hints that nothing will actually read the state
of the objects when they get out of lifetime.  The finalizer actually
inspects those though.  What actually happens in the testcases is that on
  tree expr_size = TYPE_SIZE (expr->get_type ()->as_tree ());
we see that expr->get_type () was allocated using something with malloc
attribute but it doesn't escape and only the type size from it is queried,
so there is no need to store other members of it.  Except that it does
escape in the GC internals.  Normal GC allocations are fine, they don't look
at the data in the allocated objects on "free", but the ones with finalizers
actually call a function on that object and expect the data to be in there.
So that we don't lose ATTRIBUTE_MALLOC for the common case when no
finalization is needed, the following patch uses the approach used e.g.
for glibc error function which can sometimes be noreturn but at other
times just return normally.
If possible, it uses __attribute__((alias ("..."))) to add an alias
to the function, where one is without ATTRIBUTE_MALLOC and one
(with _no_dtor suffix) is with ATTRIBUTE_MALLOC (note, as this is
C++ and I didn't want to hardcode particular mangling I used an
extern "C" function with 2 aliases to it), and otherwise adds a wrapper
(for the ggc-page/ggc-common case with noinline attribute if possible,
for ggc-none that doesn't matter because ggc-none doesn't support
finalizers).
The *_no_dtor aliases/wrappers are then used in inline functions which
pass unconditional NULL, 0 as the f/s pair.

2025-03-01  Jakub Jelinek  

PR jit/117047
* acinclude.m4 (gcc_CHECK_ATTRIBUTE_ALIAS): New.
* configure.ac: Add gcc_CHECK_ATTRIBUTE_ALIAS.
* ggc.h (ggc_internal_alloc): Remove ATTRIBUTE_MALLOC from
overload with finalizer pointer.  Call ggc_internal_alloc_no_dtor
in inline overload without finalizer pointer.
(ggc_internal_alloc_no_dtor): Declare.
(ggc_internal_cleared_alloc): Remove ATTRIBUTE_MALLOC from
overload with finalizer pointer.  Call
ggc_internal_cleared_alloc_no_dtor in inline overload without
finalizer pointer.
(ggc_internal_cleared_alloc_no_dtor): Declare.
(ggc_alloc): Call ggc_internal_alloc_no_dtor if no finalization
is needed.
(ggc_alloc_no_dtor): Call ggc_internal_alloc_no_dtor.
(ggc_cleared_alloc): Call ggc_internal_cleared_alloc_no_dtor if no
finalization is needed.
(ggc_vec_alloc): Call ggc_internal_alloc_no_dtor if no finalization
is needed.
(ggc_cleared_vec_alloc): Call ggc_internal_cleared_alloc_no_dtor if
no finalization is needed.
* ggc-page.cc (ggc_internal_alloc): If HAVE_ATTRIBUTE_ALIAS, turn
overload with finalizer into alias to ggc_internal_alloc_ and
rename it to ...
(ggc_internal_alloc_): ... this, make it extern "C".
(ggc_internal_alloc_no_dtor): New alias if HAVE_ATTRIBUTE_ALIAS,
otherwise new noinline wrapper.
* ggc-common.cc (ggc_internal_cleared_alloc): If HAVE_ATTRIBUTE_ALIAS,
turn overload with finalizer into alias to ggc_internal_alloc_ and
rename it to ...
(ggc_internal_cleared_alloc_): ... this, make it extern "C".
(ggc_internal_cleared_alloc_no_dtor): New alias if
HAVE_ATTRIBUTE_ALIAS, otherwise new noinline wrapper.
* ggc-none.cc (ggc_internal_alloc): If HAVE_ATTRIBUTE_ALIAS, turn
overload with finalizer into alias to ggc_internal_alloc_ and
rename it to ...
(ggc_internal_alloc_): ... this, make it extern "C".
(ggc_internal_alloc_no_dtor): New alias if HAVE_ATTRIBUTE_ALIAS,
otherwise new wrapper.
(ggc_internal_cleared_alloc): If HAVE_ATTRIBUTE_ALIAS, turn overload
with finalizer into alias to ggc_internal_alloc_ and rename it to ...
(ggc_internal_cleared_alloc_): ... this, make it extern "C".
(ggc_internal_cleared_alloc_no_dtor): New alias if
HAVE_ATTRIBUTE_ALIAS, otherwise new wrapper.
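The alias-or-wrapper scheme described in this commit can be sketched roughly as follows. All names are hypothetical (`demo_alloc_` plays the role of the extern "C" `ggc_internal_alloc_`), and the `__GNUC__`/`__ELF__` test is only a crude stand-in for the configure-time `HAVE_ATTRIBUTE_ALIAS` check; treat it as an illustration of the pattern, not the actual ggc code.

```cpp
#include <cstddef>
#include <cstdlib>

#if defined(__GNUC__)
#define DEMO_MALLOC __attribute__((malloc))
#define DEMO_NOINLINE __attribute__((noinline))
#else
#define DEMO_MALLOC
#define DEMO_NOINLINE
#endif

using finalizer_fn = void (*)(void *);

// The single real definition.  C linkage gives it an unmangled symbol name,
// so the aliases below can refer to it without hardcoding C++ mangling.
extern "C" void *demo_alloc_(std::size_t size, finalizer_fn f,
                             std::size_t s, std::size_t n)
{
  // A real collector would record F (and use S/N for arrays) so that it
  // can run the finalizer on the object later; this sketch just allocates.
  (void) f;
  (void) s;
  (void) n;
  return std::malloc(size ? size : 1);
}

#if defined(__GNUC__) && defined(__ELF__)
// Entry point that may be given a finalizer: deliberately NOT malloc,
// because the finalizer will read the object's contents later.
void *demo_alloc(std::size_t, finalizer_fn, std::size_t, std::size_t)
  __attribute__((alias("demo_alloc_")));

// Entry point only ever used with a null finalizer: malloc is safe here.
void *demo_alloc_no_dtor(std::size_t, finalizer_fn, std::size_t, std::size_t)
  __attribute__((malloc, alias("demo_alloc_")));
#else
// No alias support: fall back to out-of-line wrappers.
void *demo_alloc(std::size_t size, finalizer_fn f, std::size_t s,
                 std::size_t n)
{
  return demo_alloc_(size, f, s, n);
}

DEMO_NOINLINE DEMO_MALLOC void *
demo_alloc_no_dtor(std::size_t size, finalizer_fn f, std::size_t s,
                   std::size_t n)
{
  // Forward SIZE, not S (the mistake fixed by r15-7776).
  return demo_alloc_(size, f, s, n);
}
#endif

// Inline overload for the common no-finalizer case, analogous to the
// one-argument helpers in ggc.h: passes a null finalizer and 0 for S.
inline void *demo_alloc(std::size_t size)
{
  return demo_alloc_no_dtor(size, nullptr, 0, 1);
}
```

Callers that never need finalization go through the one-argument overload, so the optimizer still sees a `malloc`-attributed call on the common path, while the finalizer-capable entry point no longer carries that promise.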
 

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-03-01 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #44 from Jakub Jelinek  ---
In any case, libgccjit should be fixed to use just
ggc_internal_alloc instead of ggc_internal_cleared_alloc in the operator new, so
that nobody is even tempted to rely on the zero initialization
instead of properly constructing the values in the constructors, because all
the previous values are lost at the start of the constructor.
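The recommendation in comment #44 can be sketched like this, with hypothetical names (`demo_gc_alloc` stands in for a non-clearing allocator such as `ggc_internal_alloc`, and `demo_node` for a libgccjit class). Because C++ treats an object's value as indeterminate when its constructor begins (GCC exploits this through `-flifetime-dse`), the sketch initializes every member in the constructor rather than relying on zeroed storage.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a non-clearing GC allocator: the returned
// storage has indeterminate contents.
void *demo_gc_alloc(std::size_t size)
{
  void *p = std::malloc(size);
  if (!p)
    throw std::bad_alloc();
  return p;
}

class demo_node
{
public:
  static void *operator new(std::size_t sz) { return demo_gc_alloc(sz); }
  static void operator delete(void *p) { std::free(p); }

  // Every member is initialized here; nothing depends on the allocator
  // having zeroed the storage.
  demo_node() : m_refcount(0), m_next(nullptr) {}

  int refcount() const { return m_refcount; }
  demo_node *next() const { return m_next; }

private:
  int m_refcount;
  demo_node *m_next;
};
```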

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-28 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #43 from Sam James  ---
Thanks. Emacs and jit.exp works with that change on top.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #42 from Jakub Jelinek  ---
Fixed in my copy; s/ggc_cv/gcc_cv/ on the patch.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-28 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #41 from Sam James  ---
(In reply to Jakub Jelinek from comment #40)

ggc_cv_have_attribute_alias=no typo (vs gcc_cv_...)

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-28 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #40 from Jakub Jelinek  ---
Created attachment 60610
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60610&action=edit
gcc15-pr117047.patch

So what about this patch?  It tries to use the alias attribute, at least for the
host==build cases if the host/build compiler supports the alias attribute,
and then either uses aliases to make one alias without ATTRIBUTE_MALLOC, or a
wrapper around it.
Only build tested so far.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #39 from Richard Biener  ---
playback::rvalue *
playback::context::
new_bitcast (location *loc,
             rvalue *expr,
             type *type_)
{
  tree expr_size = TYPE_SIZE (expr->get_type ()->as_tree ());

Hmm, so the issue is likely that the GC-allocated object that expr->get_type ()
allocates does not escape anywhere, and thus when DSE does not find any
use of the vtable pointer it removes the store, not realizing that the
actual use is a deferred GC walk and invocation of a DTOR.

So indeed it seems that those allocation functions are not suitable
'malloc' functions, given that their result escapes to the GC.

That's independent of whether any of the alloc/free are inlined.  It
works just fine when there's no finalizer, as there's nothing to
preserve in the objects when they are trivially "dead", but when a
finalizer invokes a DTOR, that can of course read the objects'
contents.

So a less radical approach would be to make only the allocation functions
without a finalizer 'malloc'.
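The mechanism described here (an object whose only later reader is a finalizer run by the collector) can be sketched as below. Everything is hypothetical and heavily simplified: `demo_alloc_with_finalizer` records the pointer in a side table, so it is deliberately not marked `malloc`, while `demo_alloc_plain` never registers a finalizer and keeps the attribute. The finalizer dispatches through the object's vtable pointer, which is exactly the store that must survive; the sketch does not try to reproduce the miscompilation itself.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

#if defined(__GNUC__)
#define DEMO_MALLOC __attribute__((malloc))
#else
#define DEMO_MALLOC
#endif

using finalizer_fn = void (*)(void *);

// Hypothetical collector bookkeeping: the pointer escapes into this table.
struct demo_pending
{
  void *obj;
  finalizer_fn fin;
};
static std::vector<demo_pending> g_pending;

// May register a finalizer that later reads the object: NOT malloc.
void *demo_alloc_with_finalizer(std::size_t size, finalizer_fn fin)
{
  void *p = std::malloc(size);
  if (p && fin)
    g_pending.push_back({p, fin});
  return p;
}

// Never registers a finalizer, so nothing inspects the object on "free".
DEMO_MALLOC void *demo_alloc_plain(std::size_t size)
{
  return std::malloc(size);
}

// Polymorphic base with a virtual finalize(), loosely modelled on the
// wrapper class mentioned in this PR.
struct demo_wrapper
{
  virtual ~demo_wrapper() = default;
  virtual void finalize() = 0;
};

// Reads the vtable pointer to dispatch finalize().  Assumes the
// demo_wrapper base sits at offset 0, as with single inheritance on
// common ABIs.
static void demo_wrapper_finalizer(void *p)
{
  static_cast<demo_wrapper *>(p)->finalize();
}

static int g_finalized = 0;

struct demo_rvalue : demo_wrapper
{
  static void *operator new(std::size_t sz)
  {
    void *p = demo_alloc_with_finalizer(sz, demo_wrapper_finalizer);
    if (!p)
      throw std::bad_alloc();
    return p;
  }
  // Storage is owned by the (hypothetical) collector.
  static void operator delete(void *) {}

  void finalize() override { ++g_finalized; }
};

// Runs pending finalizers, then releases the storage.
void demo_collect()
{
  for (const demo_pending &e : g_pending)
    {
      e.fin(e.obj);
      std::free(e.obj);
    }
  g_pending.clear();
}
```

If `demo_alloc_with_finalizer` were declared `malloc`, a caller that only reads one field of a freshly constructed `demo_rvalue` could let the optimizer treat the vtable-pointer store as dead, which is the null-vptr crash seen in `wrapper_finalizer`.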

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #38 from Jakub Jelinek  ---
(In reply to Andrew Pinski from comment #34)
> from ggc.h:
> ```
> /* The internal primitive.  */
> extern void *ggc_internal_alloc (size_t, void (*)(void *), size_t,
>  size_t CXX_MEM_STAT_INFO)
>  ATTRIBUTE_MALLOC;
> 
> 
> ...
> 
> /* Allocates cleared memory.  */
> extern void *ggc_internal_cleared_alloc (size_t, void (*)(void *),
>  size_t, size_t
>  CXX_MEM_STAT_INFO) ATTRIBUTE_MALLOC;
> ```
> 
> I am not 100% sure that is valid with LTO especially if ggc_free can be
> inlined.
> 
> a simple test is to mark ggc_free as noinline (or noipa) or remove the
> ATTRIBUTE_MALLOC usage from ggc.h header file.

ggc_free is really large, is that fnsplit that inlines just the if (in_gcc)
return; part of it or something similar?
I think noinline attribute on ggc_free wouldn't be a bad idea.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #37 from David Malcolm  ---
Created attachment 60608
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60608&action=edit
Excerpt from jit-playback.s

This is an excerpt from the .s file for 
_ZN3gcc3jit8playback7context11new_bitcastEPNS1_8locationEPNS1_6rvalueEPNS1_4typeE
= gcc::jit::playback::context::new_bitcast(gcc::jit::playback::location*,
gcc::jit::playback::rvalue*, gcc::jit::playback::type*)

It seems to have an inlined copy of gcc::jit::playback::rvalue::get_type

It calls _Z26ggc_internal_cleared_allocmPFvPvEmm@PLT

The behavior I'm seeing is that the vtable ptr never seems to get written to
the allocated object:
  _ZTVN3gcc3jit8playback6rvalueE aka "vtable for gcc::jit::playback::rvalue"
leaving the vtable ptr null, and thus when eventually the finalizer is called,
wrapper::finalizer's vfunc call to wrapper::finalize becomes a jump through a
null fnptr.

On cfarm420 I see:

(gdb) p /x $rax
$2 = 0x77345d50

(which is +16

   0x749f58bb <+11>:    mov    %rdx,%r13
   0x749f58be <+14>:    xor    %edx,%edx
   0x749f58c0 <+16>:    push   %r12
   0x749f58c2 <+18>:    mov    %rdi,%r12
   0x749f58c5 <+21>:    mov    $0x10,%edi
   0x749f58ca <+26>:    push   %rbp
   0x749f58cb <+27>:    mov    %rsi,%rbp
   0x749f58ce <+30>:    lea    -0x9395(%rip),%rsi        # 0x749ec540
   0x749f58d5 <+37>:    push   %rbx
   0x749f58d6 <+38>:    mov    %rcx,%rbx
   0x749f58d9 <+41>:    mov    $0x1,%ecx
   0x749f58de <+46>:    sub    $0x18,%rsp
   0x749f58e2 <+50>:    lea    0x2950457(%rip),%rax        # 0x77345d40 <_ZTVN3gcc3jit8playback6rvalueE>
=> 0x749f58e9 <+57>:    add    $0x10,%rax
   0x749f58ed <+61>:    movq   %rax,%xmm0
   0x749f58f2 <+66>:    punpcklqdq %xmm1,%xmm0
   0x749f58f6 <+70>:    movaps %xmm0,(%rsp)
   0x749f58fa <+74>:    call   0x74c48280 <_Z26ggc_internal_cleared_allocmPFvPvEmm>
   0x749f58ff <+79>:    mov    0x10(%r13),%r13
   0x749f5903 <+83>:    movzwl 0x0(%r13),%eax
   0x749f5908 <+88>:    shl    $0x6,%rax
   0x749f590c <+92>:    add    0x2a8daf5(%rip),%rax        # 0x77483408
   0x749f5913 <+99>:    cmpb   $0x0,0x1(%rax)
   0x749f5917 <+103>:   je     0x7415e106 <_ZN3gcc3jit8playback7context11new_bitcastEPNS1_8locationEPNS1_6rvalueEPNS1_4typeE.cold>
   0x749f591d <+109>:   mov    0x8(%r13),%rdi
   0x749f5921 <+113>:   lea    0x2115ff8(%rip),%rax        # 0x76b0b920
   0x749f5928 <+120>:   movzwl (%rdi),%edx
   0x749f592b <+123>:   cmpl   $0x2,(%rax,%rdx,4)
   0x749f592f <+127>:   jne    0x7415e146 <_ZN3gcc3jit8playback7context11new_bitcastEPNS1_8locationEPNS1_6rvalueEPNS1_4typeE-9009002>
   0x749f5935 <+133>:   mov    0x8(%rbx),%rbx

stepping instructions for <+50> through <+74>:

(gdb) p /x $xmm0
$4 = {v8_bfloat16 = {0x5d50, 0xf734, 0x7fff, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_half
= {0x5d50, 0xf734, 0x7fff, 0x0, 0x0, 
0x0, 0x0, 0x0}, v4_float = {0xf7345d50, 0x7fff, 0x0, 0x0}, v2_double =
{0x77345d50, 0x0}, v16_int8 = {0x50, 
0x5d, 0x34, 0xf7, 0xff, 0x7f, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0}, v8_int16 = {0x5d50, 0xf734, 
0x7fff, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int32 = {0xf7345d50, 0x7fff, 0x0,
0x0}, v2_int64 = {0x77345d50, 0x0}, 
  uint128 = 0x77345d50}

which seems to get packed with:

(gdb) p /x $xmm0
$11 = {v8_bfloat16 = {0x5d50, 0xf734, 0x7fff, 0x0, 0xca30, 0x, 0x7fff,
0x0}, v8_half = {0x5d50, 0xf734, 0x7fff, 
0x0, 0xca30, 0x, 0x7fff, 0x0}, v4_float = {0xf7345d50, 0x7fff,
0xca30, 0x7fff}, v2_double = {
0x77345d50, 0x7fffca30}, v16_int8 = {0x50, 0x5d, 0x34, 0xf7, 0xff,
0x7f, 0x0, 0x0, 0x30, 0xca, 0xff, 
0xff, 0xff, 0x7f, 0x0, 0x0}, v8_int16 = {0x5d50, 0xf734, 0x7fff, 0x0,
0xca30, 0x, 0x7fff, 0x0}, v4_int32 = {
0xf7345d50, 0x7fff, 0xca30, 0x7fff}, v2_int64 = {0x77345d50,
0x7fffca30}, 
  uint128 = 0x7fffca3077345d50}

and the "memset" call within the ggc_internal_cleared_alloc overwrites $xmm0
here:

Dump of assembler code for function __memset_avx2_unaligned_erms:
   0x73d7acc0 <+0>:     endbr64
   0x73d7acc4 <+4>:     vmovd  %esi,%xmm0
=> 0x73d7acc8 <+8>:     mov    %rdi,%rax
   0x73d7accb <+11>:    cmp    $0x20,%rdx
   0x73d7accf <+15>:    jb     0x73d7ada0 <__memset_avx2_unaligned_erms+224>

and after the call to memset:

(gdb) p /x $xmm0
$14 = {v8_bfloat16 = {0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf,
0xafaf}, v8_half = {0xafaf, 0xafaf, 
0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf}, v4_float = {0xafafafaf,
0xafafafaf, 0xafafafaf, 0xafafafaf}, 
  v2_double = {0xafafafafafafafaf, 0xafafafafafafafaf}, v16_int8 = {0xaf
}, v8_int16 = {0xafaf, 
0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf, 0xafaf

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #36 from Sam James  ---
(In reply to Andrew Pinski from comment #34) 
> a simple test is to mark ggc_free as noinline (or noipa) or remove the
> ATTRIBUTE_MALLOC usage from ggc.h header file.

--- a/gcc/ggc.h
+++ b/gcc/ggc.h
@@ -127,8 +127,7 @@ extern void gt_pch_save (FILE *f);

 /* The internal primitive.  */
 extern void *ggc_internal_alloc (size_t, void (*)(void *), size_t,
-size_t CXX_MEM_STAT_INFO)
- ATTRIBUTE_MALLOC;
+size_t CXX_MEM_STAT_INFO);

 inline void *
 ggc_internal_alloc (size_t s CXX_MEM_STAT_INFO)
@@ -140,8 +139,7 @@ extern size_t ggc_round_alloc_size (size_t requested_size);

 /* Allocates cleared memory.  */
 extern void *ggc_internal_cleared_alloc (size_t, void (*)(void *),
-size_t, size_t
-CXX_MEM_STAT_INFO) ATTRIBUTE_MALLOC;
+size_t, size_t);

 inline void *
 ggc_internal_cleared_alloc (size_t s CXX_MEM_STAT_INFO)

works

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #35 from Sam James  ---
On another note -- David, at the moment we're building libgccjit separately, as
is recommended at
https://gcc.gnu.org/onlinedocs/jit/internals/index.html#packaging-notes, but
with --disable-bootstrap on the first build to avoid too high a penalty
for doing that.

The build time cost from doing two bootstraps (even if the JIT build is as
minimal as possible) isn't ideal.

I wonder if I should just eat the cost and do --enable-host-shared instead in
one build (that bootstraps), given the amount of time this took to debug.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #34 from Andrew Pinski  ---
from ggc.h:
```
/* The internal primitive.  */
extern void *ggc_internal_alloc (size_t, void (*)(void *), size_t,
 size_t CXX_MEM_STAT_INFO)
 ATTRIBUTE_MALLOC;


...

/* Allocates cleared memory.  */
extern void *ggc_internal_cleared_alloc (size_t, void (*)(void *),
 size_t, size_t
 CXX_MEM_STAT_INFO) ATTRIBUTE_MALLOC;
```

I am not 100% sure that is valid with LTO especially if ggc_free can be
inlined.

a simple test is to mark ggc_free as noinline (or noipa) or remove the
ATTRIBUTE_MALLOC usage from ggc.h header file.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation since r15-571-g1e0ae1f52741f7

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

Sam James  changed:

   What|Removed |Added

Summary|[15 regression] Segfault in |[15 regression] Segfault in
   |libgccjit garbage   |libgccjit garbage
   |collection when compiling   |collection when compiling
   |GNU Emacs with Native   |GNU Emacs with Native
   |Compilation |Compilation since
   ||r15-571-g1e0ae1f52741f7

--- Comment #33 from Sam James  ---
Bisected to r15-571-g1e0ae1f52741f7.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-27 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #32 from David Malcolm  ---
Thanks for the script and the notes.  I can now reproduce the crash on my main
development box, with e.g. test-bitcast.c

It seems to not be writing a vtable ptr to an object; the class's operator new
allocates it in the gc-heap and thus records a finalizer.  Crash happens
attempting to call a vfunc in wrapper_finalizer, due to vtable ptr being null. 

Am investigating further...

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #31 from Sam James  ---
(In reply to Sam James from comment #30)
> Created attachment 60601 [details]
> emacs-bug.sh

With this, running the testsuite, I get:

$ rg ^FAIL gcc/testsuite/jit/jit.sum
77:FAIL: did not find a generated reproducer: test-asm.cc.exe.reproducer.c
2290:FAIL: test-asm.cc.exe killed: SIGABRT SIGABRT
2310:FAIL: test-bitcast.c.exe killed: SIGSEGV segmentation violation
2541:FAIL: test-combination.c.exe killed: SIGSEGV segmentation violation
3056:FAIL: test-error-bad-bitcast.c.exe killed: SIGSEGV segmentation violation
3058:FAIL: test-error-bad-bitcast2.c.exe killed: SIGSEGV segmentation violation
3946:FAIL: test-error-impossible-must-tail-call.c.exe iteration 1 of 5:
verify_code: actual: "cannot tail-call: memory reference or volatile after
call" != expected: "cannot tail-call: callee returns a structure"
3948:FAIL: test-error-impossible-must-tail-call.c.exe killed: SIGABRT SIGABRT
5926:FAIL: test-ggc-bugfix.c.exe iteration 1 of 5: verify_code: result is NULL
5927:FAIL: test-ggc-bugfix.c.exe killed: SIGABRT SIGABRT
7651:FAIL: test-threads.c.exe killed: SIGSEGV segmentation violation

One or two of those seem to be -D_GLIBCXX_ASSERTIONS-related.

But picking on another...
```
$ gdb --args ./testsuite/jit4/test-combination.c.exe
[...]
PASSED: test-combination.c.exe iteration 1 of 5:
make_calc_discriminant: actual: "q->b * q->b - (double)4 * q->a * q->c" ==
expected: "q->b * q->b - (double)4 * q->a * q->c"
PASSED: test-combination.c.exe iteration 1 of 5:
create_code_pr95306_builtin_types: gcc_jit_context_get_builtin_function (ctxt,
"__atomic_load") is non-null
PASSED: test-combination.c.exe iteration 1 of 5:
create_code_pr95306_builtin_types: gcc_jit_context_get_builtin_function (ctxt,
"__builtin_memcpy") is non-null
PASSED: test-combination.c.exe iteration 1 of 5:
create_code_pr95306_builtin_types: gcc_jit_context_get_builtin_function (ctxt,
"__builtin_sadd_overflow") is non-null
NOTE: test-combination.c.exe iteration 1 of 5: writing reproducer to
/tmp/build/gcc/testsuite/jit4/test-combination.c.exe.reproducer.c

Program received signal SIGSEGV, Segmentation fault.
0x752b7143 in gcc::jit::wrapper_finalizer(void*) () from
./libgccjit.so.0
(gdb) bt
#0  0x752b7143 in gcc::jit::wrapper_finalizer(void*) () from
./libgccjit.so.0
#1  0x752e5887 in ggc_collect(ggc_collect) [clone .localalias] () from
./libgccjit.so.0
#2  0x7538d997 in cgraph_node::finalize_function(tree_node*, bool)
[clone .localalias] () from ./libgccjit.so.0
#3  0x752b8c3d in gcc::jit::playback::function::postprocess() [clone
.localalias] () from ./libgccjit.so.0
#4  0x752ba13e in gcc::jit::playback::context::replay() () from
./libgccjit.so.0
#5  0x75943f1a in compile_file() () from ./libgccjit.so.0
#6  0x7527e501 in toplev::main(int, char**) () from ./libgccjit.so.0
#7  0x752bcf1c in gcc::jit::playback::context::compile() () from
./libgccjit.so.0
#8  0x752ac828 in gcc::jit::recording::context::compile() () from
./libgccjit.so.0
#9  0x75297526 in gcc_jit_context_compile () from ./libgccjit.so.0
#10 0x555853df in test_jit (argv0=0x7fffdbd3
"/tmp/build/gcc/testsuite/jit4/test-combination.c.exe", user_data=0x0) at
/home/sam/git/gcc/gcc/testsuite/jit.dg/harness.h:390
#11 0x555854d6 in main (argc=1, argv=0x7fffd898) at
/home/sam/git/gcc/gcc/testsuite/jit.dg/harness.h:438
```

Bingo!

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

Sam James  changed:

   What|Removed |Added

  Attachment #60599|0   |1
is obsolete||

--- Comment #30 from Sam James  ---
Created attachment 60601
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60601&action=edit
emacs-bug.sh

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #29 from Sam James  ---
richi's build doesn't use those flags; it does PGO+LTO instead. LTO is probably
the key bit there: its visibility discovery can, in some cases, have effects
similar to -fno-semantic-interposition.

I also suspect --enable-host-pie may be able to replace --enable-default-pie.
But I'm bisecting first.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #28 from Sam James  ---
(with fixed EMACS_SRC=)

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-27 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #27 from Sam James  ---
Created attachment 60599
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60599&action=edit
emacs-bug.sh

The attached `emacs-bug.sh` script reproduces it for me on cfarm420.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-26 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

Sam James  changed:

   What|Removed |Added

   Keywords||wrong-code

--- Comment #26 from Sam James  ---
I've reproduced it on cfarm420. It requires bootstrapping (building libgccjit
w/ --disable-bootstrap with 14 works fine) and some specific CFLAGS. I'm
narrowing it down further now and will then provide a script.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-26 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #25 from Sam James  ---
I've reproduced it

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #24 from Sam James  ---
Created attachment 60587
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60587&action=edit
libgccjit.log.xz crashing

This log is from `'../src/bootstrap-emacs' -batch --no-site-file --no-site-lisp
 -l comp -f batch-byte+native-compile emacs-lisp/byte-opt.el` crashing.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #23 from Sam James  ---
Created attachment 60586
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60586&action=edit
Contents of /proc/cpuinfo on a machine which crashes

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #22 from David Malcolm  ---
Created attachment 60584
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60584&action=edit
Contents of /proc/cpuinfo on a machine that this crash *doesn't* happen on

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

David Malcolm  changed:

   What|Removed |Added

 CC||andrea.corallo at arm dot com,
   ||dmalcolm at gcc dot gnu.org

--- Comment #21 from David Malcolm  ---
If you can reproduce this, please set comp-ctxt-debug to >= 3: emacs ought to
write out a log to libgccjit.log (each invocation will overwrite the previous
log file there); please attach the log to this bug.

FWIW when I do this, my logfile has (among all kinds of other useful info):

JIT: argv[10]: -mtune=generic
JIT: argv[11]: -march=x86-64

so I don't think it's being affected by the machine it's run on (but maybe emacs
can customize this?)

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #19 from David Malcolm  ---
(In reply to David Malcolm from comment #12)
> Sam: what architectures/configurations do you see this on?  Comment #0 was
> presumably aarch64, but I don't think comment #3 specified anything beyond
> it being 64-bit.

Looking again at comment #0, I see that it was actually on x86_64.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #20 from David Malcolm  ---
This looks like it might be x86_64-specific.  If so, perhaps it's specific to a
particular microarchitecture?

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-25 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #18 from David Malcolm  ---
I spent a large chunk of yesterday attempting to reproduce this, but
unfortunately I'm still not seeing it.

Is anyone seeing this on a machine in the compiler farm, and if so which?
Which specific version of emacs, and which specific version of gcc/libgccjit?
Alternatively, if you do see it and have time for a pair-debugging session,
please ping me on IRC.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-21 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #17 from rguenther at suse dot de  ---
On Thu, 20 Feb 2025, sjames at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047
> 
> --- Comment #13 from Sam James  ---
> I've only seen this on amd64 so far (2 machines) but I didn't try to reproduce
> it on arm64 or elsewhere.

I've seen it on x86_64 as well

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #16 from Sam James  ---
Created attachment 60552
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60552&action=edit
emacs.log.xz

So far, I haven't got anywhere trying to replicate our packaging in a script.

I've attached a build log from building Emacs from git (just ./autogen.sh &&
./configure && make V=1 -j$(nproc) -l$(nproc)) using Gentoo's GCC in case you
can spot some difference with your own.

I'm going to see if I can reproduce in a Docker container using Gentoo's GCC
and go from there.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #15 from Sam James  ---
(In reply to David Malcolm from comment #14)
> FWIW I tried again building emacs (from git) with gcc trunk with
> --with-native-compilation=aot on x86_64 and, annoyingly, "make" completed
> successfully; I see lots of
>./native-lisp/31.0.50-677d9325/*.eln 
> which are "ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> statically linked, not stripped"
> 

I'll get back to trying to find how to configure GCC s.t. it happens. It seems
like in the right environment, it always happens, I just don't know what the
condition is yet.

> How clean is Emacs under valgrind normally?

It's clean "enough" if you...
a) pass -DUSE_VALGRIND in CFLAGS or CPPFLAGS when building, and
b) use a suppression file (like
https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-01/txtaJC0QpICF7.txt,
which isn't perfect, but it made the output mostly clean for me)

When I ran the crasher under Valgrind, the only output I saw besides GC noise
at the beginning was the invalid access from the null dereference. I didn't see
anything that looked useful, or anything else around the time of the crash, and
the bits I did see looked like the usual innocent GC noise for Emacs.
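For context on why a conservative collector produces this kind of Valgrind noise, here is a minimal sketch in C of conservative scanning. It assumes a heavily simplified model: the name toy_mark_memory and the single contiguous heap range are hypothetical, and Emacs's real mark_memory/mark_maybe_pointer are considerably more involved. The point is only that the scanner reads every word in the range and tests whether it looks like a heap pointer, so stack slots the program never initialized get read too; presumably that is what -DUSE_VALGRIND is there to paper over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of conservative scanning (hypothetical; not Emacs's code).
   Every word in [start, end) is treated as a possible pointer and is
   counted if its value falls inside [heap_lo, heap_hi).  A real collector
   applies this to the whole C stack, including slots that were never
   written, which is where "uninitialised value" reports come from.  */
static size_t
toy_mark_memory (const uintptr_t *start, const uintptr_t *end,
                 uintptr_t heap_lo, uintptr_t heap_hi)
{
  size_t candidates = 0;

  assert (start <= end);
  for (const uintptr_t *p = start; p < end; p++)
    if (*p >= heap_lo && *p < heap_hi) /* looks like a heap pointer */
      candidates++;
  return candidates;
}
```

For example, scanning the words { 0x1000, 0x2000, 0x5000, 7 } against a heap range of [0x1000, 0x3000) finds two candidate pointers.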

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #14 from David Malcolm  ---
FWIW I tried again building emacs (from git) with gcc trunk with
--with-native-compilation=aot on x86_64 and, annoyingly, "make" completed
successfully; I see lots of
   ./native-lisp/31.0.50-677d9325/*.eln 
which are "ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically
linked, not stripped"

So I tried "make clean" and hacked up lisp/Makefile to have:

  # The actual Emacs command run in the targets below.
  emacs = valgrind '$(EMACS)' $(EMACSOPT)

to try to run all of emacs in the compilation under valgrind.

I'm seeing huge numbers of valgrind issues within Emacs, many within its own
garbage collector e.g.:

==4122183== Conditional jump or move depends on uninitialised value(s)
==4122183==    at 0x5D219D: pdumper_find_object_type_impl (pdumper.c:5287)
==4122183==    by 0x5CA8FB: pdumper_find_object_type (pdumper.h:200)
==4122183==    by 0x5CA8FB: mark_maybe_pointer (alloc.c:5028)
==4122183==    by 0x5CABA1: mark_memory (alloc.c:5170)
==4122183==    by 0x5CABA1: mark_c_stack (alloc.c:5353)
==4122183==    by 0x675278: mark_one_thread (thread.c:670)
==4122183==    by 0x675278: mark_threads_callback (thread.c:703)
==4122183==    by 0x67617C: flush_stack_call_func (lisp.h:4463)
==4122183==    by 0x67617C: mark_threads (thread.c:710)
==4122183==    by 0x5C918F: garbage_collect (alloc.c:6004)
==4122183==    by 0x5EE1B0: maybe_gc (lisp.h:5866)
==4122183==    by 0x5EE1B0: eval_sub (eval.c:2479)
==4122183==    by 0x61F65C: readevalloop (lread.c:2542)
==4122183==    by 0x620235: Fload (lread.c:1730)
==4122183==    by 0x620602: save_match_data_load (lread.c:1782)
==4122183==    by 0x5EC495: load_with_autoload_queue (eval.c:2359)
==4122183==    by 0x5FFE6F: Frequire (fns.c:3807)
==4122183== 
==4122183== Use of uninitialised value of size 8
==4122183==    at 0x5D21C0: dump_bitset_bit_set_p (pdumper.c:5126)
==4122183==    by 0x5D21C0: pdumper_find_object_type_impl (pdumper.c:5288)
==4122183==    by 0x5CA8FB: pdumper_find_object_type (pdumper.h:200)
==4122183==    by 0x5CA8FB: mark_maybe_pointer (alloc.c:5028)
==4122183==    by 0x5CABA1: mark_memory (alloc.c:5170)
==4122183==    by 0x5CABA1: mark_c_stack (alloc.c:5353)
==4122183==    by 0x675278: mark_one_thread (thread.c:670)
==4122183==    by 0x675278: mark_threads_callback (thread.c:703)
==4122183==    by 0x67617C: flush_stack_call_func (lisp.h:4463)
==4122183==    by 0x67617C: mark_threads (thread.c:710)
==4122183==    by 0x5C918F: garbage_collect (alloc.c:6004)
==4122183==    by 0x5EE1B0: maybe_gc (lisp.h:5866)
==4122183==    by 0x5EE1B0: eval_sub (eval.c:2479)
==4122183==    by 0x61F65C: readevalloop (lread.c:2542)
==4122183==    by 0x620235: Fload (lread.c:1730)
==4122183==    by 0x620602: save_match_data_load (lread.c:1782)
==4122183==    by 0x5EC495: load_with_autoload_queue (eval.c:2359)
==4122183==    by 0x5FFE6F: Frequire (fns.c:3807)

It's still building...

How clean is Emacs under valgrind normally?

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #13 from Sam James  ---
I've only seen this on amd64 so far (2 machines) but I didn't try to reproduce
it on arm64 or elsewhere.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #12 from David Malcolm  ---
Sam: what architectures/configurations do you see this on?  Comment #0 was
presumably aarch64, but I don't think comment #3 specified anything beyond it
being 64-bit.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #11 from Sam James  ---
(In reply to Richard Biener from comment #10)
> So how does one go about reproducing this?  Does it show up when building
> emacs itself?

Yes. If you build Emacs with ./configure --with-native-compilation, it should
happen (it may need --with-native-compilation=aot in order to pre-compile more)
just on `make`. No need to run Emacs manually or install it.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-20 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #10 from Richard Biener  ---
So how does one go about reproducing this?  Does it show up when building
emacs itself?  I do see a similar crash involving libgccjit in our build logs,
but that's likely not the "smallest" testcase?

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-13 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #9 from Sam James  ---
> but what would be really helpful is an option to call
>   gcc_jit_context_set_bool_option (ctxt, GCC_JIT_BOOL_OPTION_SELFCHECK_GC, 1);
> on the underlying gcc_jit_context (or I suppose you could hack up your emacs
> build to do this). 

Unfortunately, this didn't seem to do anything
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047#c5). I can try again to
see if maybe I made an error though.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-13 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #8 from David Malcolm  ---
(In reply to Sam James from comment #4)

Thanks.

> (In reply to David Malcolm from comment #2)
> > What does printing *wrapper in the debugger look like?
> > 
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x725f8007 in gcc::jit::wrapper_finalizer (ptr=0x7fffdc491e70) at
> /usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:
> 2094
> 2094  wrapper->finalizer ();
> (gdb) p wrapper
> $1 = (gcc::jit::playback::wrapper *) 0x7fffdc491e70
> (gdb) p *wrapper
> $2 = {_vptr.wrapper = 0x0}

wrapper_finalizer is a callback passed to ggc_internal_cleared_alloc; it's
being called with NULL.

Looking at finalizer::call I see:

  void call () const { m_function (m_addr); }

so presumably we somehow have a finalizer with NULL m_addr, but I don't see how
that can happen: they're only created by add_finalizer with "result", and
result seems to need to be non-NULL (unless I'm missing something).
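To make that reasoning concrete, here is a toy model in C of the finalizer bookkeeping. It is a sketch under stated assumptions, not GCC's code: every name prefixed with toy_ is hypothetical and nothing here reproduces ggc-page.cc. It only mirrors the shape of finalizer::call () (invoke m_function on m_addr) and the claim that add_finalizer records only a non-NULL result, which is why the assertion in toy_handle_finalizers should never fire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model: a finalizer pairs a callback with the address
   it is invoked on, like finalizer::call () { m_function (m_addr); }.  */
typedef struct
{
  void (*m_function) (void *);
  void *m_addr;
} toy_finalizer;

#define TOY_MAX_FINALIZERS 16
static toy_finalizer toy_finalizers[TOY_MAX_FINALIZERS];
static size_t toy_n_finalizers;

/* Analogue of add_finalizer: only a non-NULL allocation result is
   recorded, so every stored m_addr is non-NULL.  */
static void
toy_add_finalizer (void *result, void (*f) (void *))
{
  if (result != NULL && toy_n_finalizers < TOY_MAX_FINALIZERS)
    {
      toy_finalizers[toy_n_finalizers].m_function = f;
      toy_finalizers[toy_n_finalizers].m_addr = result;
      toy_n_finalizers++;
    }
}

/* Zero-filled allocation that optionally registers a finalizer, in the
   spirit of a cleared allocator that takes a callback.  */
static void *
toy_cleared_alloc (size_t size, void (*f) (void *))
{
  void *result = calloc (1, size);
  if (f != NULL)
    toy_add_finalizer (result, f);
  return result;
}

/* Run every registered finalizer, as a collection would for dead objects,
   then forget them.  Returns how many were run.  */
static size_t
toy_handle_finalizers (void)
{
  size_t ran = toy_n_finalizers;
  for (size_t i = 0; i < toy_n_finalizers; i++)
    {
      assert (toy_finalizers[i].m_addr != NULL); /* the invariant in question */
      toy_finalizers[i].m_function (toy_finalizers[i].m_addr);
    }
  toy_n_finalizers = 0;
  return ran;
}

/* Example callback: count invocations and release the memory.  */
static int toy_finalized_count;

static void
toy_free_finalizer (void *ptr)
{
  toy_finalized_count++;
  free (ptr);
}
```

Usage: allocate a couple of objects with toy_cleared_alloc (32, toy_free_finalizer), then call toy_handle_finalizers (), which runs each registered callback exactly once on its non-NULL address.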


Unfortunately I still haven't been able to reproduce this locally.

Looking at
https://www.gnu.org/software/emacs/manual/html_node/elisp/Native_002dCompilation-Variables.html
there are a few options, but what would be really helpful is an option to call
  gcc_jit_context_set_bool_option (ctxt, GCC_JIT_BOOL_OPTION_SELFCHECK_GC, 1);
on the underlying gcc_jit_context (or I suppose you could hack up your emacs
build to do this).  It will *really* slow down libgccjit's code generation, but
ought to make the bug reproduce more reliably (and thus hopefully allow
minimization from the Lisp side).

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

--- Comment #7 from Richard Biener  ---
I'd say bisecting might be most helpful, but it looks like a ::wrapper
registration issue (I'm just assuming the JIT's GC state hangs off a
"dynamic" GC root and apps are allowed to allocate/deallocate wrappers).

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-02-06 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #6 from Sam James  ---
I'll try to get back to bisecting this during the week. It's hard to reproduce
in some environments.

richi/jakub: can we make this P1 for now? It doesn't really have to block the
release if it comes down to it, but I'd like to make sure we don't forget about
it, and the functionality broken here is really popular with users.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-01-18 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

Sam James  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Keywords||GC
   Last reconfirmed||2025-01-18

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-01-18 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #5 from Sam James  ---
Created attachment 60204
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60204&action=edit
macroexp-2c3e1495-c0b1cf80_libgccjit_repro.c.xz

Attached macroexp-2c3e1495-c0b1cf80_libgccjit_repro.c after patching Emacs to
unconditionally dump it (it already had a debug path for it).

The GC checking bool didn't change anything there.

I can't reproduce it manually using this C file yet, though.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-01-18 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #4 from Sam James  ---
(In reply to David Malcolm from comment #2)
> What does printing *wrapper in the debugger look like?
> 

Program received signal SIGSEGV, Segmentation fault.
0x725f8007 in gcc::jit::wrapper_finalizer (ptr=0x7fffdc491e70) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:2094
2094  wrapper->finalizer ();
(gdb) p wrapper
$1 = (gcc::jit::playback::wrapper *) 0x7fffdc491e70
(gdb) p *wrapper
$2 = {_vptr.wrapper = 0x0}

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2025-01-18 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

--- Comment #3 from Sam James  ---
I'm seeing this too.

```
Reading symbols from ../src/bootstrap-emacs...
(gdb) r
Starting program:
/var/tmp/portage/app-editors/emacs-31.0./work/emacs/src/bootstrap-emacs
-batch --no-site-file --no-site-lisp --eval \(setq\ load-prefer-newer\ t\
byte-compile-warnings\ \'all\) --eval \(setq\ org--inhibit-version-check\ t\)
-l comp -f batch-byte+native-compile emacs-lisp/loaddefs-gen.el
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Function(s) ^std::(move|forward|as_const|(__)?addressof) will be skipped when
stepping.
Function(s) ^std::(shared|unique)_ptr<.*>::(get|operator) will be skipped when
stepping.
Function(s)
^std::(basic_string|vector|array|deque|(forward_)?list|(unordered_|flat_)?(multi)?(map|set)|span)<.*>::(c?r?(begin|end)|front|back|data|size|empty)
will be skipped when stepping.
Function(s) ^std::(basic_string|vector|array|deque|span)<.*>::operator.] will
be skipped when stepping.

Program received signal SIGSEGV, Segmentation fault.
0x725f8007 in gcc::jit::wrapper_finalizer (ptr=0x7fffdc491e70) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:2094
2094  wrapper->finalizer ();
(gdb) bt
#0  0x725f8007 in gcc::jit::wrapper_finalizer (ptr=0x7fffdc491e70) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:2094
#1  0x72634402 in finalizer::call (this=) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/ggc-page.cc:333
#2  ggc_handle_finalizers () at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/ggc-page.cc:1932
#3  ggc_collect (mode=) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/ggc-page.cc:2232
#4  0x72707737 in cgraph_node::finalize_function (decl=0x7fffdc9b7d00,
no_collect=no_collect@entry=false)
at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/cgraphunit.cc:508
#5  0x725fed2f in gcc::jit::playback::function::postprocess
(this=0x7fffdc675a00) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:2315
#6  0x72606350 in gcc::jit::playback::context::replay
(this=0x7fffa2e0) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:3659
#7  0x72e35968 in compile_file () at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/toplev.cc:453
#8  0x725aa2e5 in do_compile () at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/toplev.cc:2213
#9  toplev::main (this=this@entry=0x7fffa24e, argc=,
argv=) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/toplev.cc:2373
#10 0x726051c5 in gcc::jit::playback::context::compile
(this=this@entry=0x7fffa2e0) at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-playback.cc:2778
#11 0x725e588d in gcc::jit::recording::context::compile_to_file
(this=0x5659be20, output_kind=GCC_JIT_OUTPUT_KIND_DYNAMIC_LIBRARY,
output_path=0x56ebb040
"/var/tmp/portage/app-editors/emacs-31.0./work/emacs/native-lisp/31.0.50-f88d6d87/loaddefs-gen-e8a3ad9c-3bac3121bsYYK0.eln.tmp")
at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/jit-recording.cc:1671
#12 0x725c74ff in gcc_jit_context_compile_to_file
(ctxt=0x5659be20, output_kind=GCC_JIT_OUTPUT_KIND_DYNAMIC_LIBRARY,
output_path=0x56ebb040
"/var/tmp/portage/app-editors/emacs-31.0./work/emacs/native-lisp/31.0.50-f88d6d87/loaddefs-gen-e8a3ad9c-3bac3121bsYYK0.eln.tmp")
at
/usr/src/debug/sys-devel/gcc-15.0./gcc-15.0./gcc/jit/libgccjit.cc:3883
#13 0x557c3c8a in Fcomp__compile_ctxt_to_file0
(filename=0x560e8f24) at
/var/tmp/portage/app-editors/emacs-31.0./work/emacs/src/lisp.h:1631
#14 0x557643d7 in eval_sub (form=) at
/var/tmp/portage/app-editors/emacs-31.0./work/emacs/src/eval.c:2587
#15 0x55765590 in Fprogn (body=) at
/var/tmp/portage/app-editors/emacs-31.0./work/emacs/src/eval.c:439
#16 Flet (args=) at
/var/tmp/portage/app-editors/emacs-31.0./work/emacs/src/eval.c:1109
[...]
```

Frustratingly, I can't reproduce it on a more powerful machine, where I could
have bisected it first before poking further. So I'll have to do it slowly.

[Bug jit/117047] [15 regression] Segfault in libgccjit garbage collection when compiling GNU Emacs with Native Compilation

2024-10-09 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117047

David Malcolm  changed:

   What|Removed |Added

Summary|[15 regression] Segfault in |[15 regression] Segfault in
   |gcc/jit/jit-playback.cc |libgccjit garbage
   |when compiling GNU Emacs|collection when compiling
   |with Native Compilation |GNU Emacs with Native
   ||Compilation

--- Comment #2 from David Malcolm  ---
(In reply to Dario Gjorgjevski from comment #0)
> GCC commit ff889b359
> GNU Emacs commit 9ed82c2
> 
> When I attempt to compile GNU Emacs with Native Compilation, there is a
> segfault in gcc/jit/jit-playback.cc.
> 
> (lldb) run
> Process 96017 launched: '/Volumes/src/emacs/src/bootstrap-emacs' (x86_64)
> Process 96017 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS
> (code=1, address=0x0)
> frame #0: 0x000104ce7553
> libgccjit.0.dylib`gcc::jit::wrapper_finalizer(ptr=0x000104777f50) at
> jit-playback.cc:1900:22
>1897   wrapper_finalizer (void *ptr)
>1898   {
>1899 playback::wrapper *wrapper = reinterpret_cast <playback::wrapper *> (ptr);
> -> 1900 wrapper->finalizer ();
>1901   }
>1902   
>1903   /* gcc::jit::playback::wrapper subclasses are GC-managed:
> Target 0: (bootstrap-emacs) stopped.
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS
> (code=1, address=0x0)
>   * frame #0: 0x000104ce7553
> libgccjit.0.dylib`gcc::jit::wrapper_finalizer(ptr=0x000104777f50) at
> jit-playback.cc:1900:22
> frame #1: 0x000105acc827 libgccjit.0.dylib`ggc_collect(ggc_collect)
> [inlined] finalizer::call(this=0x7fca639372f8) const at
> ggc-page.cc:333:35
> frame #2: 0x000105acc820 libgccjit.0.dylib`ggc_collect(ggc_collect)
> at ggc-page.cc:1932:15
> frame #3: 0x000105acc7c2
> libgccjit.0.dylib`ggc_collect(mode=) at ggc-page.cc:2232:25
> frame #4: 0x000105b8f767
> libgccjit.0.dylib`cgraph_node::finalize_function(decl=0x000100c56c00,
> no_collect=) at cgraphunit.cc:506:17
> frame #5: 0x000104ce8f0e
> libgccjit.0.dylib`gcc::jit::playback::function::
> postprocess(this=0x000100c25d70) at jit-playback.cc:2111:38
> frame #6: 0x000104cea49a
> libgccjit.0.dylib`gcc::jit::playback::context::
> replay(this=0x7ff7bfef6c30) at jit-playback.cc:3455:22
> frame #7: 0x000107adb9e0 libgccjit.0.dylib`global_options_set + 6400
> frame #8: 0x0001078ecfa0
> libgccjit.0.dylib`hard_frame_pointer_adjustment + 24
> frame #9: 0x000107adb9e0 libgccjit.0.dylib`global_options_set + 6400
> 
> The issue does not happen with the releases/gcc-14 branch -- commit
> be06962b3 in particular.
> 
> Any hints how to debug this further?

What does printing *wrapper in the debugger look like?

FWIW from the libgccjit side, there's gcc_jit_context_dump_reproducer_to_file

https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#c.gcc_jit_context_dump_reproducer_to_file

which in theory ought to give you a standalone C reproducer (without needing
emacs), but I don't know if that's exposed in an easy way from the emacs
native-compilation code that's invoking libgccjit.

The backtrace shows this is happening during GCC's garbage collection, so I
wonder if this is a dormant bug in memory management that your use case
happens to trigger.  You could try:
   gcc_jit_context_set_bool_option (ctxt, GCC_JIT_BOOL_OPTION_SELFCHECK_GC, 1);
as per
https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#c.gcc_jit_context_set_bool_option.GCC_JIT_BOOL_OPTION_SELFCHECK_GC
to see if it triggers the bug earlier (or perhaps on gcc-14), but note that that
option will make libgccjit *really* slow at compiling.