[PATCH] preserve base pointer for __deregister_frame [PR110956]

2023-08-10 Thread Thomas Neumann via Gcc-patches
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110956 Rainer Orth successfully tested the patch on Solaris with a full bootstrap. Some uncommon unwinding table encodings need to access the base pointer for address computations. We do not have that information in calls to

PING: [PATCH] fix radix sort on 32bit platforms [PR109670]

2023-06-02 Thread Thomas Neumann via Gcc-patches
Summary: The radix sort did not handle the uppermost byte correctly, which sometimes broke win32 exceptions. Bugzilla #109670. The reporter confirmed that the patch fixes the bug. See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618000.html Best Thomas

PING^2: [PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-06-02 Thread Thomas Neumann via Gcc-patches
Summary: The old linear scan logic called free while searching the list of frames. The atomic fast path finds the frame quickly, but forgot the free call. This patches adds the missing free. Bugzilla #109685. See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619026.html Best Thomas

[PATCH v2] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-19 Thread Thomas Neumann via Gcc-patches
Am 19.05.23 um 19:26 schrieb Jeff Law: See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html I think this needs an update given the other changes in this space. jeff I have included the updated the patch below. The atomic fastpath bypasses the code that releases the sort

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-15 Thread Thomas Neumann via Gcc-patches
Hello, this patch breaks the build on targets where range is not declared i.e. where the #ifdef ATOMIC_FDE_FAST_PATH path is not taken. argh, I did not realize I tested the patch only on atomic fast path platforms. The patch below fixes that by moving the check inside the #ifdef. I will

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-14 Thread Thomas Neumann via Gcc-patches
even building the flatbuffers repo externally using cmake at the same revision didn't reproduce it. that said, i have attached a dockerfile that shows you what /does/ fail, under the specific conditions. but there is no unpatched libgcc_s, so you'll have to do that part yourself, and build a

Re: [PATCH] Fix assertion for unwind-dw2-fde.c btree changes

2023-05-14 Thread Thomas Neumann via Gcc-patches
Dear Sören, we ran into a regression introduced by these changes. The regression manifests itself in a failing assertion in __deregister_frame_info_bases. The assertion failure was observed while using Chromium's `flatc` build system tool. The failing assertion is: unwind-dw2-fde.c:281

PING: [PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-12 Thread Thomas Neumann via Gcc-patches
Summary: The old linear scan logic called free while searching the list of frames. The atomic fast path finds the frame quickly, but forgot the free call. This patches adds the missing free. Bugzilla #109685. See: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html Best Thomas

[PATCH] fix radix sort on 32bit platforms [PR109670]

2023-05-10 Thread Thomas Neumann via Gcc-patches
The radix sort uses two buffers, a1 for input and a2 for output. After every digit the role of the two buffers is swapped. When terminating the sort early the code made sure the output was in a2. However, when we run out of bits, as can happen on 32bit platforms, the sorted result was in a1, was

[PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-02 Thread Thomas Neumann via Gcc-patches
The atomic fastpath bypasses the code that releases the sort array which was lazily allocated during unwinding. We now check after deregistering if there is an array to free. libgcc/ChangeLog: * unwind-dw2-fde.c: Free sort array in atomic fast path. --- libgcc/unwind-dw2-fde.c | 6

[PATCH] initialize fde objects lazily

2022-12-09 Thread Thomas Neumann via Gcc-patches
When registering an unwind frame with __register_frame_info_bases we currently initialize that fde object eagerly. This has the advantage that it is immutable afterwards and we can safely access it from multiple threads, but it has the disadvantage that we pay the initialization cost even if the

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-22 Thread Thomas Neumann via Gcc-patches
Would it be possible to trigger lazy registration if the version is read as a zero? This would not introduce any additional atomic instructions on the fast path. yes, that is possible. The main problem is the transition from lazy to non-lazy mode when the first exception is thrown. We must

[PATCH] speed up end_fde_sort using radix sort

2022-11-22 Thread Thomas Neumann via Gcc-patches
When registering a dynamic unwinding frame the fde list is sorted. Previously, we split the list into a sorted and an unsorted part, sorted the later using heap sort, and merged both. That can be quite slow due to the large number of (expensive) comparisons. This patch replaces that logic with a

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches
Hi, When dynamically linking a fast enough machine hides the latency, but when Statically linking or on slower devices this change caused a 5x increase in Instruction count and 2x increase in cycle count before getting to main. I have looked at ways to fix that. The problem is that with

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches
It's easy to reproduce on x86 as well. As a testcase: #include int main(int argc, char** argv) { return 0; } And just compile with: g++ -O1 hello.cpp -static -o hello.exe. thanks, I will take a look. Best Thomas

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-21 Thread Thomas Neumann via Gcc-patches
Hi, When dynamically linking a fast enough machine hides the latency, but when Statically linking or on slower devices this change caused a 5x increase in Instruction count and 2x increase in cycle count before getting to main. This has been quite noticeable on smaller devices. Is there a

Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Thomas Neumann via Gcc-patches
Hi Iain, You might also want to include Rainer’s patch, AFAIR patches to fix bootstrap are allowed to proceed as an exception to the usual rules, I was not aware of that. I have pushed the patch below now (including Rainer's change), I will update the code if requested. Best Thomas

Re: [PATCH] Avoid depending on destructor order

2022-09-26 Thread Thomas Neumann via Gcc-patches
Hi Claudiu, This change prohibits compiling of ARC backend: + gcc_assert (in_shutdown || ob); in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined, while gcc_assert is outside of any ifdef. Please can you revisit this line and change it accordingly. I have a patch ready, I am

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread Thomas Neumann via Gcc-patches
+static const bool in_shutdown = false; I'll let Jason or others decide if this is the right solution.  It seems that in_shutdown also could be declared outside the #ifdef and initialized as "false". sure, either is fine. Moving it outside the #ifdef wastes one byte in the executable

Re: [PATCH] Avoid depending on destructor order

2022-09-23 Thread Thomas Neumann via Gcc-patches
This patch broke bootstrap on AIX and probably other targets. #ifdef ATOMIC_FDE_FAST_PATH #include "unwind-dw2-btree.h" static struct btree registered_frames; static bool in_shutdown; ... #else in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code / asserts not protected by

[PATCH] Avoid depending on destructor order

2022-09-19 Thread Thomas Neumann via Gcc-patches
In some scenarios (e.g., when mixing gcc and clang code), it can happen that frames are deregistered after the lookup structure has already been destroyed. That in itself would be fine, but it triggers an assert in __deregister_frame_info_bases that expects to find the frame. To avoid that, we

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Thomas Neumann via Gcc-patches
At least when building LibreOffice with Clang (16 trunk) with ASan and UBsan enabled against libstdc++ (with --gcc-toolchain and LD_LIBRARY_PATH to pick up a libstdc++ trunk build including this change at build and run-time), at least one of the LibreOffice tests executed during the build

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Thomas Neumann via Gcc-patches
I haven't debugged this in any way, nor checked whether it only impacts exactly my below scenario, but noticed the following: At least when building LibreOffice with Clang (16 trunk) with ASan and UBsan enabled against libstdc++ (with --gcc-toolchain and LD_LIBRARY_PATH to pick up a libstdc++

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-18 Thread Thomas Neumann via Gcc-patches
Hi Dimitar, This patch broke avr and pru-elf cross builds: gcc/libgcc/unwind-dw2-fde.c:680:28: error: unknown type name ‘uintptr_t’ 680 |uintptr_t *range) Should uintptr_t be replaced with __UINTPTR_TYPE__? Such change fixes the above broken builds for me. But

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-18 Thread Thomas Neumann via Gcc-patches
Hi Dimitar, This patch broke avr and pru-elf cross builds: gcc/libgcc/unwind-dw2-fde.c:680:28: error: unknown type name ‘uintptr_t’ 680 |uintptr_t *range) Should uintptr_t be replaced with __UINTPTR_TYPE__? Such change fixes the above broken builds for me. But

[PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-16 Thread Thomas Neumann via Gcc-patches
The __register_frame/__deregister_frame functions are used to register unwinding frames from JITed code in a sorted list. That list itself is protected by object_mutex, which leads to terrible performance in multi-threaded code and is somewhat expensive even if single-threaded. There was already

Re: [PATCH v3] eliminate mutex in fast path of __register_frame

2022-09-12 Thread Thomas Neumann via Gcc-patches
Thanks for your feedback, I will update the patch in the next few days, addressing the comments and reorganizing classify_object_over_fdes. Concerning your question: + +restart: +  struct btree_node *iter; +  uintptr_t lock; +  { +    // Accessing the root node requires defending against

PING^2: [PATCH v3] eliminate mutex in fast path of __register_frame

2022-08-26 Thread Thomas Neumann via Gcc-patches
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a global mutex for frame registration and unwinding. This can lead to very poor performance on machines with high core counts. This patch organizes the frames in a b-tree with read-optimized synchronization instead, which

PING: [PATCH v3] eliminate mutex in fast path of __register_frame

2022-07-27 Thread Thomas Neumann via Gcc-patches
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a global mutex for frame registration and unwinding. This can lead to very poor performance on machines with high core counts. This patch organizes the frames in a b-tree with read-optimized synchronization instead, which

[PATCH v3] eliminate mutex in fast path of __register_frame

2022-06-26 Thread Thomas Neumann via Gcc-patches
NOTE: A stress test program and a detailed walkthrough that breaks this patch into manageable parts can be found here: https://databasearchitects.blogspot.com/2022/06/making-unwinding-through-jit-ed-code.html The __register_frame/__deregister_frame functions are used to register unwinding frames

PING: [PATCH v2] eliminate mutex in fast path of __register_frame

2022-05-09 Thread Thomas Neumann via Gcc-patches
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a global mutex for frame registration and unwinding. This can lead to very poor performance on machines with high core counts. This patch organizes the frames in a b-tree with read-optimized synchronization instead, which

[PATCH v2] eliminate mutex in fast path of __register_frame

2022-03-04 Thread Thomas Neumann via Gcc-patches
The __register_frame/__deregister_frame functions are used to register unwinding frames from JITed code in a sorted list. That list itself is protected by object_mutex, which leads to terrible performance in multi-threaded code and is somewhat expensive even if single-threaded. There was already

Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-03 Thread Thomas Neumann via Gcc-patches
We may have to add a new interface. In some other cases, I've seen errno being used for error reporting, but that requires changes in applications to recognize the error. It's probably better to crash here than to fail mysteriously later. Out of curiosity, how many times do you call the

Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-02 Thread Thomas Neumann via Gcc-patches
+// Common logic for version locks +struct version_lock +{ + // The lock itself + uintptr_t version_lock; +}; version_lock must not overflow, right? This means we need a wider counter on 32-bit, too. glibc contains a 62-bit counter that it uses for its own data structure. an overflow

[PATCH] eliminate mutex in fast path of __register_frame

2022-03-01 Thread Thomas Neumann via Gcc-patches
The __register_frame/__deregister_frame functions are used to register unwinding frames from JITed code in a sorted list. That list itself is protected by object_mutex, which leads to terrible performance in multi-threaded code and is somewhat expensive even if single-threaded. There was already