from:"Florian Weimer via Gcc\-patches"

Re: [gcc r13-7720] x86: Update model values for Raptorlake.

2023-08-14 Thread Florian Weimer via Gcc-patches

* Lili Cui via Gcc-cvs:

> https://gcc.gnu.org/g:0fa76e35a5f9e141c08fdf151380f2f9689101c7
>
> commit r13-7720-g0fa76e35a5f9e141c08fdf151380f2f9689101c7
> Author: Cui, Lili 
> Date:   Mon Aug 14 02:06:00 2023 +
>
> x86: Update model values for Raptorlake.
> 
> Update model values for Raptorlake according to SDM.
> 
> gcc/ChangeLog
> 
> * common/config/i386/cpuinfo.h (get_intel_cpu): Add model value 
> 0xba
> to Raptorlake.
>
> Diff:
> ---
>  gcc/common/config/i386/cpuinfo.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index 81f8b1766f8d..d6f2b7e3cfb3 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -539,6 +539,8 @@ get_intel_cpu (struct __processor_model *cpu_model,
>  case 0xbf:
>/* Alder Lake.  */
>  case 0xb7:
> +case 0xba:
> +case 0xbf:
>/* Raptor Lake.  */
>  case 0xaa:
>  case 0xac:

This mismerge breaks the build (duplicate 0xbf case values).

Thanks,
Florian

Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Florian Weimer via Gcc-patches

* Hongtao Liu:

> On Wed, Aug 9, 2023 at 3:17 PM Jan Beulich  wrote:
>> Aiui these ABI levels were intended to be incremental, i.e. higher versions
>> would include everything earlier ones cover. Without such a guarantee, how
>> would you propose compatibility checks to be implemented in a way

Correct, this was the intent.  But it's mostly to foster adoption and
make it easier for developers to pick the variants that they want to
target custom builds.  If it's an ascending chain, the trade-offs are
simpler.

> Are there many software implemenation based on this assumption?
> At least in GCC, it's not a big problem, we can adjust code for the
> new micro-architecture level.

The glibc framework can deal with alternate choices in principle,
although I'd prefer not to go there for the reasons indicated.

>> applicable both forwards and backwards? If a new level is wanted here, then
>> I guess it could only be something like v3.5.

> But if we use avx10.1 as v3.5, it's still not subset of
> x86-64-v4(avx10.1 contains avx512fp16,avx512bf16 .etc which are not in
> x86-64-v4), there will be still a diverge.
> Then 256-bit of x86-64-v4 as v3.5? that's too weired to me.

The question is whether you want to mandate the 16-bit floating point
extensions.  You might get better adoption if you stay compatible with
shipping CPUs.  Furthermore, the 256-bit tuning apparently benefits
current Intel CPUs, even though they can do 512-bit vectors.

(The thread subject is a bit misleading for this sub-topic, by the way.)

Thanks,
Florian

Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Florian Weimer via Gcc-patches

* Richard Biener via Gcc-patches:

> I don’t think we can realistically change the ABI.  If we could
> passing them in two 256bit registers would be possible as well.
>
> Note I fully expect intel to turn around and implement 512 bits on a
> 256 but data path on the E cores in 5 years.  And it will take at
> least that time for AVX10 to take off (look at AVX512 for this and how
> they cautionously chose to include bf16 to cut off Zen4).  So IMHO we
> shouldn’t worry at all and just wait and see for AVX42 to arrive.

Yes, the direction is a bit unclear.  In retrospect, we could have
defined x86-64-v4 to use 256 bit vector width, so it could eventually be
compatible with AVX10; it's also what current Intel CPUs prefer (and
past, with the exception of the Xeon Phi line).  But in the meantime,
AMD has started to ship CPUs that seem to prefer 512 bit vectors,
despite having a double pumped implementation.  (Disclaimer: All CPU
preferences inferred from current compiler tuning defaults, not actual
experiments. 8-/)

To me, this looks like we may have defined x86-64-v4 prematurely, and
this suggests we should wait a bit to see where things are heading.

Thanks,
Florian

[PATCH releases/gcc-13 1/2] libgcc: Fix eh_frame fast path in find_fde_tail

2023-07-18 Thread Florian Weimer via Gcc-patches

The eh_frame value is only used by linear_search_fdes, not the binary
search directly in find_fde_tail, so the bug is not immediately
apparent with most programs.

Fixes commit e724b0480bfa5ec04f39be8c7290330b495c59de ("libgcc:
Special-case BFD ld unwind table encodings in find_fde_tail").

libgcc/

PR libgcc/109712
* unwind-dw2-fde-dip.c (find_fde_tail): Correct fast path for
parsing eh_frame.

(cherry picked from commit 49310a993308492348119f4033e4db0bda4fe46a)
---
 libgcc/unwind-dw2-fde-dip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 6223f5f18a2..4e0b880513f 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -403,8 +403,8 @@ find_fde_tail (_Unwind_Ptr pc,
 BFD ld generates.  */
   signed value __attribute__ ((mode (SI)));
   memcpy (, p, sizeof (value));
+  eh_frame = p + value;
   p += sizeof (value);
-  dbase = value;   /* No adjustment because pcrel has base 0.  */
 }
   else
 p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,

base-commit: a1322d76ca1c3c914fb818d9ba3edc291ccfa25e
-- 
2.41.0

[PATCH releases/gcc-13 2/2] libgcc: Fix -Wint-conversion warning in find_fde_tail

2023-07-18 Thread Florian Weimer via Gcc-patches

Fixes commit r14-1614-g49310a99330849 ("libgcc: Fix eh_frame fast path
in find_fde_tail").

libgcc/

PR libgcc/110179
* unwind-dw2-fde-dip.c (find_fde_tail): Add cast to avoid
implicit conversion of pointer value to integer.

(cherry picked from commit 104b09005229ef48a79a33511ea192bb3ec3c415)
---
 libgcc/unwind-dw2-fde-dip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 4e0b880513f..28ea0e64e0e 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -403,7 +403,7 @@ find_fde_tail (_Unwind_Ptr pc,
 BFD ld generates.  */
   signed value __attribute__ ((mode (SI)));
   memcpy (, p, sizeof (value));
-  eh_frame = p + value;
+  eh_frame = (_Unwind_Ptr) (p + value);
   p += sizeof (value);
 }
   else
-- 
2.41.0

Re: [PATCH] aarch64: Fix warnings during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches

* Richard Earnshaw:

> On 11/07/2023 10:37, Florian Weimer via Gcc-patches wrote:
>> libgcc/
>>  * config/aarch64/aarch64-unwind.h
>> (aarch64_cie_signed_with_b_key):
>>  Add missing const qualifier.  Cast from const unsigned char *
>>  to const char *.  Use __builtin_strchr to avoid an implicit
>>  function declaration.
>>  * config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
>>  Add missing cast.
>> ---
>> diff --git a/libgcc/config/aarch64/linux-unwind.h 
>> b/libgcc/config/aarch64/linux-unwind.h
>> index 00eba866049..93da7a9537d 100644
>> --- a/libgcc/config/aarch64/linux-unwind.h
>> +++ b/libgcc/config/aarch64/linux-unwind.h
>> @@ -77,7 +77,7 @@ aarch64_fallback_frame_state (struct _Unwind_Context 
>> *context,
>>   }
>>   rt_ = context->cfa;
>> -  sc = _->uc.uc_mcontext;
>> +  sc = (struct sigcontext *) _->uc.uc_mcontext;
>> /* This define duplicates the definition in aarch64.md */
>>   #define SP_REGNUM 31
>> 
>
> This looks somewhat dubious.  I'm not particularly familiar with the
> kernel headers, but a quick look suggests an mcontext_t is nothing
> like a sigcontext_t.  So isn't the cast just papering over some more
> fundamental problem?

I agree it looks dubious.  Note that it's struct sigcontext, not
(not-struct) sigcontext_t.  I don't know why the uc_mcontext members
aren't accessed directly, so I can't really write a good comment about
it.

Obviously it works quite well as-is. 8-)  Similar code is present in
many, many Linux targets.

Thanks,
Florian

[PATCH] riscv: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches

libgcc/

* config/riscv/linux-unwind.h (riscv_fallback_frame_state): Add
missing cast.

---
 libgcc/config/riscv/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/riscv/linux-unwind.h 
b/libgcc/config/riscv/linux-unwind.h
index 931c2f2795d..e58d0f4113e 100644
--- a/libgcc/config/riscv/linux-unwind.h
+++ b/libgcc/config/riscv/linux-unwind.h
@@ -64,7 +64,7 @@ riscv_fallback_frame_state (struct _Unwind_Context *context,
 return _URC_END_OF_STACK;
 
   rt_ = context->cfa;
-  sc = _->uc.uc_mcontext;
+  sc = (struct sigcontext *) _->uc.uc_mcontext;
 
   new_cfa = (_Unwind_Ptr) sc;
   fs->regs.cfa_how = CFA_REG_OFFSET;

[PATCH] or1k: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches

libgcc/

* config/or1k/linux-unwind.h (or1k_fallback_frame_state): Add
missing cast.

---
 libgcc/config/or1k/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/or1k/linux-unwind.h 
b/libgcc/config/or1k/linux-unwind.h
index aa873791daa..37b0c5aef37 100644
--- a/libgcc/config/or1k/linux-unwind.h
+++ b/libgcc/config/or1k/linux-unwind.h
@@ -51,7 +51,7 @@ or1k_fallback_frame_state (struct _Unwind_Context *context,
 return _URC_END_OF_STACK;
 
   rt = context->cfa;
-  sc = >uc.uc_mcontext;
+  sc = (struct sigcontext *) >uc.uc_mcontext;
 
   new_cfa = sc->regs.gpr[1];
   fs->regs.cfa_how = CFA_REG_OFFSET;

[PATCH] arc: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches

libgcc/

* config/arc/linux-unwind.h (arc_fallback_frame_state): Add
missing cast.

---
 libgcc/config/arc/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/arc/linux-unwind.h b/libgcc/config/arc/linux-unwind.h
index 0292e22ed1b..dec9428a7e5 100644
--- a/libgcc/config/arc/linux-unwind.h
+++ b/libgcc/config/arc/linux-unwind.h
@@ -100,7 +100,7 @@ arc_fallback_frame_state (struct _Unwind_Context *context,
   if (pc[0] == MOV_R8_139)
 {
   rt_ = context->cfa;
-  sc = _->uc.uc_mcontext;
+  sc = (struct sigcontext *) _->uc.uc_mcontext;
 }
   else
 return _URC_END_OF_STACK;

[PATCH] csky: Fix -Wincompatible-pointer-types warning during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches

libgcc/

* config/csky/linux-unwind.h (csky_fallback_frame_state): Add
missing cast.

---
 libgcc/config/csky/linux-unwind.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/config/csky/linux-unwind.h 
b/libgcc/config/csky/linux-unwind.h
index 66b2a44e047..1acef215974 100644
--- a/libgcc/config/csky/linux-unwind.h
+++ b/libgcc/config/csky/linux-unwind.h
@@ -75,7 +75,7 @@ csky_fallback_frame_state (struct _Unwind_Context *context,
siginfo_t info;
ucontext_t uc;
   } *_rt = context->cfa;
-  sc = &(_rt->uc.uc_mcontext);
+  sc = (struct sigcontext *) &(_rt->uc.uc_mcontext);
 }
   else
 return _URC_END_OF_STACK;

[PATCH] m68k: Avoid implicit function declaration in libgcc

2023-07-11 Thread Florian Weimer via Gcc-patches

libgcc/

* config/m68k/fpgnulib.c (__cmpdf2): Declare.

---
 libgcc/config/m68k/fpgnulib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgcc/config/m68k/fpgnulib.c b/libgcc/config/m68k/fpgnulib.c
index fe41edf26aa..d5c3411e947 100644
--- a/libgcc/config/m68k/fpgnulib.c
+++ b/libgcc/config/m68k/fpgnulib.c
@@ -395,6 +395,7 @@ double __extendsfdf2 (float);
 float __truncdfsf2 (double);
 long __fixdfsi (double);
 long __fixsfsi (float);
+int __cmpdf2 (double, double);
 
 int
 __unordxf2(long double a, long double b)

[PATCH] aarch64: Fix warnings during libgcc build

2023-07-11 Thread Florian Weimer via Gcc-patches

libgcc/

* config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key):
Add missing const qualifier.  Cast from const unsigned char *
to const char *.  Use __builtin_strchr to avoid an implicit
function declaration.
* config/aarch64/linux-unwind.h (aarch64_fallback_frame_state):
Add missing cast.

---
 libgcc/config/aarch64/aarch64-unwind.h | 4 ++--
 libgcc/config/aarch64/linux-unwind.h   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/aarch64/aarch64-unwind.h 
b/libgcc/config/aarch64/aarch64-unwind.h
index 3ad2f8239ed..30e428862c4 100644
--- a/libgcc/config/aarch64/aarch64-unwind.h
+++ b/libgcc/config/aarch64/aarch64-unwind.h
@@ -40,8 +40,8 @@ aarch64_cie_signed_with_b_key (struct _Unwind_Context 
*context)
   const struct dwarf_cie *cie = get_cie (fde);
   if (cie != NULL)
{
- char *aug_str = cie->augmentation;
- return strchr (aug_str, 'B') == NULL ? 0 : 1;
+ const char *aug_str = (const char *) cie->augmentation;
+ return __builtin_strchr (aug_str, 'B') == NULL ? 0 : 1;
}
 }
   return 0;
diff --git a/libgcc/config/aarch64/linux-unwind.h 
b/libgcc/config/aarch64/linux-unwind.h
index 00eba866049..93da7a9537d 100644
--- a/libgcc/config/aarch64/linux-unwind.h
+++ b/libgcc/config/aarch64/linux-unwind.h
@@ -77,7 +77,7 @@ aarch64_fallback_frame_state (struct _Unwind_Context *context,
 }
 
   rt_ = context->cfa;
-  sc = _->uc.uc_mcontext;
+  sc = (struct sigcontext *) _->uc.uc_mcontext;
 
 /* This define duplicates the definition in aarch64.md */
 #define SP_REGNUM 31

[PATCH] libgcc: Fix -Wint-conversion warning in find_fde_tail

2023-07-10 Thread Florian Weimer via Gcc-patches

Fixes commit r14-1614-g49310a99330849 ("libgcc: Fix eh_frame fast path
in find_fde_tail").

libgcc/

PR libgcc/110179
* unwind-dw2-fde-dip.c (find_fde_tail): Add cast to avoid
implicit conversion of pointer value to integer.

---
 libgcc/unwind-dw2-fde-dip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 4e0b880513f..28ea0e64e0e 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -403,7 +403,7 @@ find_fde_tail (_Unwind_Ptr pc,
 BFD ld generates.  */
   signed value __attribute__ ((mode (SI)));
   memcpy (, p, sizeof (value));
-  eh_frame = p + value;
+  eh_frame = (_Unwind_Ptr) (p + value);
   p += sizeof (value);
 }
   else

base-commit: 1f9b18962f2d86abafbb452bf001b72edafb6eef

Re: [RFC] Add stdckdint.h header for C23

2023-06-14 Thread Florian Weimer via Gcc-patches

* Paul Eggert:

> I don't see how you could implement __has_include_next()
> for arbitrary non-GCC compilers, which is what we'd need for glibc
> users.

This is not a requirement for glibc in general.  For example, 
only works with compilers to which it has been ported.

Thanks,
Florian

[PATCH] libgcc: Fix eh_frame fast path in find_fde_tail

2023-06-06 Thread Florian Weimer via Gcc-patches

The eh_frame value is only used by linear_search_fdes, not the binary
search directly in find_fde_tail, so the bug is not immediately
apparent with most programs.

Fixes commit e724b0480bfa5ec04f39be8c7290330b495c59de ("libgcc:
Special-case BFD ld unwind table encodings in find_fde_tail").

[I'd appreciate suggestions how I could add a test for this.  BFD ld
does not seem to allow ommitting the binary search table.]

libgcc/

PR libgcc/109712
* unwind-dw2-fde-dip.c (find_fde_tail): Correct fast path for
parsing eh_frame.

---
 libgcc/unwind-dw2-fde-dip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 6223f5f18a2..4e0b880513f 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -403,8 +403,8 @@ find_fde_tail (_Unwind_Ptr pc,
 BFD ld generates.  */
   signed value __attribute__ ((mode (SI)));
   memcpy (, p, sizeof (value));
+  eh_frame = p + value;
   p += sizeof (value);
-  dbase = value;   /* No adjustment because pcrel has base 0.  */
 }
   else
 p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,

base-commit: b327cbe8f4eefc91ee2bea49a1da7128adf30281

Re: [libstdc++] use strtold for from_chars even without locale

2023-05-05 Thread Florian Weimer via Gcc-patches

* Jonathan Wakely via Libstdc:

> We could use strtod for a single-threaded target (i.e.
> !defined(_GLIBCXX_HAS_GTHREADS) by changing the global locale using
> setlocale, instead of changing the per-thread locale using uselocale.

This is not generally safe because the call to setlocale is still
observable to applications in principle because a previous pointer
returned from setlocale they have store could be invalidated.

Thanks,
Florian

[PATCH] libstdc++: Mention recent libgcc_s symbol versions in manual

2023-04-28 Thread Florian Weimer via Gcc-patches

GCC_11.0 is an aarch64-specific outlier.

* doc/xml/manual/abi.xml (abi.versioning.history): Add
GCC_7.0.0, GCC_9.0.0, GCC_11.0, GCC_12.0.0, GCC_13.0.0 for
libgcc_s.

---
 libstdc++-v3/doc/xml/manual/abi.xml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index 3a3cbd3346c..e0e241de3bd 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -203,6 +203,11 @@ compatible.
 GCC 4.6.0: GCC_4.6.0
 GCC 4.7.0: GCC_4.7.0
 GCC 4.8.0: GCC_4.8.0
+GCC 7.1.0: GCC_7.0.0
+GCC 9.1.0: GCC_9.0.0
+GCC 11.1.0: GCC_11.0
+GCC 12.1.0: GCC_12.0.0
+GCC 13.1.0: GCC_13.0.0
 
 
 

base-commit: 0c77a0909456034d34036aa22a8dfcf0258cfa2d

Re: libsanitizer: sync from master

2023-04-28 Thread Florian Weimer via Gcc-patches

* Martin Liška:

> On 4/26/23 20:31, Florian Weimer wrote:
>> * Martin Liška:
>> 
>>> On 11/15/22 16:47, Martin Liška wrote:
 Hi.

 I've just pushed libsanitizer update that was tested on x86_64-linux and 
 ppc64le-linux systems.
 Moreover, I run bootstrap on x86_64-linux and checked ABI difference with 
 abidiff.
>>>
>>> Hello.
>>>
>>> And I've done the same now and merged upstream version 3185e47b5a8444e9fd.
>> 
>> So … we have the issue that involves interceptors outside of libc.so.6,
>> namely crypt, crypt_r, and I posted an upstream patch for this:
>> 
>>   sanitizers: Disable crypt, crypt_r interceptors for glibc
>>   
>> 
>> Can we just apply this downstream for now?  It blocks various folks from
>> using the sanitizers in their projects.
>
> Hello.
>
> Your upstream revision has been already accepted, so please apply it
> and I'm going to do one more merge from upstream in the following
> days. Does it work for you?

It's moving in a different direction now: 

But that's okay for me as well.

Thanks,
Florian

Re: libsanitizer: sync from master

2023-04-26 Thread Florian Weimer via Gcc-patches

* Martin Liška:

> On 11/15/22 16:47, Martin Liška wrote:
>> Hi.
>> 
>> I've just pushed libsanitizer update that was tested on x86_64-linux and 
>> ppc64le-linux systems.
>> Moreover, I run bootstrap on x86_64-linux and checked ABI difference with 
>> abidiff.
>
> Hello.
>
> And I've done the same now and merged upstream version 3185e47b5a8444e9fd.

So … we have the issue that involves interceptors outside of libc.so.6,
namely crypt, crypt_r, and I posted an upstream patch for this:

  sanitizers: Disable crypt, crypt_r interceptors for glibc

Can we just apply this downstream for now?  It blocks various folks from
using the sanitizers in their projects.

Thanks,
Florian

Re: [PATCH] Various fixes for DWARF register size computation

2023-01-03 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Tue, Jan 03, 2023 at 12:15:23PM +0100, Florian Weimer wrote:
>> --- a/gcc/debug.h
>> +++ b/gcc/debug.h
>> @@ -245,7 +245,18 @@ extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
>>  
>>  /* Dwarf2 frame information.  */
>>  
>> -extern int dwarf_reg_sizes_constant ();
>> +/* Query size information about DWARF registers.  */
>> +struct dwarf_single_register_size
>> +{
>> +  dwarf_single_register_size();
>
> Formatting, space before (
>
>> @@ -334,27 +333,39 @@ generate_dwarf_reg_sizes (poly_uint16 *sizes)
>>  targetm.init_dwarf_reg_sizes_extra (sizes);
>>  }
>>  
>> -/* Return 0 if the DWARF register sizes are not constant, otherwise
>> -   return the size constant.  */
>> -
>> -int
>> -dwarf_reg_sizes_constant ()
>> +dwarf_single_register_size::dwarf_single_register_size()
>
> Likewise.
>
>> +  for (int i = DWARF_FRAME_REGISTERS; i >= 0; --i)
>> +{
>> +  unsigned short value;
>> +  if (!sizes[i].is_constant () || value != 0)
>
> if (!known_eq (sizes[i], 0))
> ?

Right.

> Though, I still wonder, because all of this is a hack for a single target
> - x86_64-linux -m64 - I think no other target has similar constant
> sizes,

Really?  That's odd.

Is it because other architectures track callee-saved vector registers
through unwinding?

> whether
> it wouldn't be better to revert all this compiler side stuff and handle it
> purely on the libgcc side - allow target headers to specify a simple
> expression how to compute dwarf_reg_size + don't define dwarf_reg_size_table
> array in that case and instead in a testcase verify that
> __builtin_init_dwarf_reg_size_table initializes an array to the exact same
> values as the libgcc/config/**/*.h overridden dwarf_reg_size version.
> That way, for x86_64-linux we can use
> ((index) <= __LIBGCC_DWARF_FRAME_REGISTERS__ ? 8 : 0)
> but could provide something reasonable even for other targets if it improves
> the unwinder sufficiently.
> Say s390x-linux -m64 is
> ((index) <= 32 ? 8 : (index) == 33 ? 4 : 0)
> etc.

> Or, if you want to do it on the compiler side, instead of predefining
> __LIBGCC_DWARF_REG_SIZES_CONSTANT__ and __LIBGCC_DWARF_REG_MAXIMUM__
> register conditionally a new builtin, __builtin_dwarf_reg_size,
> which would be defined only if -fbuilding-libgcc and the compiler determines
> dwarf_reg_size is desirable to be computed inline without a table and
> would fold the builtin to say that
> index <= 16U ? 8 : 0 on x86_64 -m64,
> index <= 9U ? 4 : index - 11U <= 5U ? 12 : 0 on x86_64 -m32 etc.
> and if the expression is too large/complex, wouldn't predefine the builtin.

I think the pre-computation of the size array is useful even for targets
where the expression is not so simple, but the array elements are still
constants.  A builtin like __builtin_dwarf_reg_size could use a
reference to a constant static array, so that we can get rid of the
array initialization code in libgcc.  Before we can do that, we need to
figure out if the fully dynamic register sizes on AArch64 with SVE are
actually correct—and if we need to fix the non-SVE unwinder to work
properly for SVE programs.

So I don't want to revert the size array computation just yet.

> Or, is it actually the use of table that is bad on the unwinder side,
> or lack of a small upper bound for what you get from the table?
> In that case you could predefine upper bound on the sizes instead (if
> constant) and simply add if (size > __LIBGCC_DWARF_REG_SIZE_MAX__)
> __builtin_unreachable ()).

It also matters what kind of register sizes are used in practice.
Looking at the FDE for _Unwind_RaiseException on i686, we only save
4-byte registers there, I think.  Perhaps only non-GCC-generated code
may exercise the other register sizes?  That's different on AArch64,
where the vector registers are saved as well.

Should I repost this patch with the three nits fixed?  Or should I
revert two of the three patches I committed instead?

Thanks,
Florian

[PATCH] Various fixes for DWARF register size computation

2023-01-03 Thread Florian Weimer via Gcc-patches

The previous code had several issues.

1. XALLOCAVEC does not create any objects, so invocating
   the non-POD poly_uint16 assignment operator is undefined.
2. The default constructor of poly-ints does not create a
   zero poly-int object (unlike what happens with regular ints).
3. The register size array must have DWARF_FRAME_REGISTERS + 1
   elements.  The extra element can be DWARF_FRAME_RETURN_COLUMN
   or DWARF_ALT_FRAME_RETURN_COLUMN.

To fix problem 3, merely increasing the array size is sufficient,
but it inhibits the x86-64 register size optimization in libgcc
because it does not use the extra register, so it has size zero.
To re-enable the optimization, expose the maximum used register
to libgcc.  This is sufficient for the optimizers to figure out
that the memcpy call in uw_install_context_1 has a fixed size
argument on x86-64.

This restores bootstrap on aarch64-linux-gnu and powerpc64-linux-gnu.
Not sure about test suite results yet, I need to check the baseline.

gcc/

* debug.h (dwarf_reg_sizes_constant): Remove declaration.
(dwarf_single_register_size): New struct.
* dwarf2cfi.cc (generate_dwarf_reg_sizes): Initialize
extra register size.  Use in-place new for initialization.
Remove unnecessary memset.
(dwarf_reg_sizes_constant): Remove.
(dwarf_single_register_size::dwarf_single_register_size):
New constructor based on removed dwarf_reg_sizes_constant
function.  Allocate extra size element.
(expand_builtin_init_dwarf_reg_sizes): Allocate extra size
element.
* target.def (init_dwarf_reg_sizes_extra): Mention extra size
element.
* doc/tm.texi: Update.

gcc/c-family/

* c-cppbuiltin.cc (c_cpp_builtins): Switch to
dwarf_single_register_size for obtaining DWARF register sizes.
Define __LIBGCC_DWARF_REG_MAXIMUM__.

libgcc/

* unwind-dw2.c (dwarf_reg_size): Use
__LIBGCC_DWARF_REG_MAXIMUM__.

---
 gcc/c-family/c-cppbuiltin.cc | 12 
 gcc/debug.h  | 13 -
 gcc/doc/tm.texi  |  2 +-
 gcc/dwarf2cfi.cc | 45 +++-
 gcc/target.def   |  2 +-
 libgcc/unwind-dw2.c  |  7 +--
 6 files changed, 55 insertions(+), 26 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index ddfd63b8eb9..8098aca41e8 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1522,10 +1522,14 @@ c_cpp_builtins (cpp_reader *pfile)
   builtin_define_with_int_value ("__LIBGCC_DWARF_FRAME_REGISTERS__",
 DWARF_FRAME_REGISTERS);
   {
-   int value = dwarf_reg_sizes_constant ();
-   if (value > 0)
- builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
-value);
+   dwarf_single_register_size srs;
+   if (srs.common_size > 0)
+ {
+   builtin_define_with_int_value 
("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
+  srs.common_size);
+   builtin_define_with_int_value ("__LIBGCC_DWARF_REG_MAXIMUM__",
+  srs.maximum_register);
+ }
   }
   builtin_define_with_int_value ("__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__",
 DWARF_CIE_DATA_ALIGNMENT);
diff --git a/gcc/debug.h b/gcc/debug.h
index 4fe9f3570ac..2e843da8b41 100644
--- a/gcc/debug.h
+++ b/gcc/debug.h
@@ -245,7 +245,18 @@ extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
 
 /* Dwarf2 frame information.  */
 
-extern int dwarf_reg_sizes_constant ();
+/* Query size information about DWARF registers.  */
+struct dwarf_single_register_size
+{
+  dwarf_single_register_size();
+
+  /* The common register size, or 0 if the register size varies.  */
+  unsigned int common_size;
+
+  /* The maximum register number that is actually present.  Registers
+ above the maximum are size zero even if common_size is positive.  */
+  unsigned int maximum_register;
+};
 
 extern void dwarf2out_begin_prologue (unsigned int, unsigned int,
  const char *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b6d7900f212..eb29cfb95aa 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9847,7 +9847,7 @@ sizes of those pieces in the table used by the unwinder 
at runtime.
 It will be called by @code{generate_dwarf_reg_sizes} after
 filling in a single size corresponding to each hard register;
 @var{sizes} is the address of the table.  It will contain
-@code{DWARF_FRAME_REGISTERS} elements when this hook is called.
+@code{DWARF_FRAME_REGISTERS + 1} elements when this hook is called.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_ASM_TTYPE (rtx @var{sym})
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index d5a27dc36c5..5bd12e070b3 100644
--- a/gcc/dwarf2cfi.cc
+++

Re: [PATCH 1/3] Compute a table of DWARF register sizes at compile

2023-01-02 Thread Florian Weimer via Gcc-patches

* Jeff Law:

> On 11/8/22 11:05, Florian Weimer via Gcc-patches wrote:
>> The sizes are compile-time constants.  Create a vector with them,
>> so that they can be inspected at compile time.
>>
>>  * gcc/dwarf2cfi.cc (init_return_column_size): Remove.
>>  (init_one_dwarf_reg_size): Adjust.
>>  (generate_dwarf_reg_sizes): New function.  Extracted
>>  from expand_builtin_init_dwarf_reg_sizes.
>>  (expand_builtin_init_dwarf_reg_sizes): Call
>>  generate_dwarf_reg_sizes.
>>  * gcc/target.def (init_dwarf_reg_sizes_extra): Adjust
>>  hook signature.
>>  * gcc/config/msp430/msp430.cc
>>  (msp430_init_dwarf_reg_sizes_extra): Adjust.
>>  * gcc/config/rs6000.cc (rs6000_init_dwarf_reg_sizes_extra):
>>  Likewise.
>>  * gcc/doc/tm.texi: Update.
>
> This series of 3 patches is fine.

Thanks, now pushed (after polishing the ChangeLog snippets).

Florian

Re: [PATCH] docs: Suggest options to improve ASAN stack traces

2022-12-07 Thread Florian Weimer via Gcc-patches

* Marek Polacek via Gcc-patches:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 726392409b6..2de14466dd3 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -16510,6 +16510,14 @@ The option cannot be combined with 
> @option{-fsanitize=thread} or
>  @option{-fsanitize=hwaddress}.  Note that the only target
>  @option{-fsanitize=hwaddress} is currently supported on is AArch64.
>  
> +To get more accurate stack traces, it is possible to use options such as
> +@option{-O} (which, for instance, prevents most function inlining),
> +@option{-fno-optimize-sibling-calls} (which prevents optimizing sibling
> +and tail recursive calls), or @option{-fno-ipa-icf} (which disables Identical
> +Code Folding for functions and read-only variables).  Since multiple runs
> +of the program may yield backtraces with different addresses due to ASLR,
> +it may be desirable to turn off ASLR: @samp{setarch `uname -m` -R ./prog}.

What about -fasynchronous-unwind-tables?  It should help if ASAN ever
reports stray segmentation faults.  Whether it also helps in general
depends on whether ASAN maintains ABI around its instrumentation.

Thanks,
Florian

Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-24 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Thu, Nov 24, 2022 at 11:01:40AM +0100, Florian Weimer via Gcc-patches 
> wrote:
>> * Joseph Myers:
>> 
>> > On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote:
>> >
>> >> Without this change, finish_declspecs cannot tell that whether there
>> >> was an erroneous type specified, or no type at all.  This may result
>> >> in additional diagnostics for implicit ints, or missing diagnostics
>> >> for multiple types.
>> >> 
>> >>   PR c/107805
>> >> 
>> >> gcc/c/
>> >>   * c-decl.cc (declspecs_add_type): Propagate error_mark_bode
>> >>   from type to specs.
>> >> 
>> >> gcc/testsuite/
>> >>   * gcc.dg/pr107805-1.c: New test.
>> >>   * gcc.dg/pr107805-1.c: Likewise.
>> >
>> > OK.
>> 
>> Thanks.  Permission to backport this to GCC 12 after a week or two?
>
> In this case I'd wait a month, it will take some time until possible
> error recovery bugs are discovered.

Okay, I have made a note to backport it in the new year.  Hopefully
any regressions will be flagged on the PR or linked to it.

Thanks,
Florian

Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-24 Thread Florian Weimer via Gcc-patches

* Joseph Myers:

> On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote:
>
>> Without this change, finish_declspecs cannot tell that whether there
>> was an erroneous type specified, or no type at all.  This may result
>> in additional diagnostics for implicit ints, or missing diagnostics
>> for multiple types.
>> 
>>  PR c/107805
>> 
>> gcc/c/
>>  * c-decl.cc (declspecs_add_type): Propagate error_mark_bode
>>  from type to specs.
>> 
>> gcc/testsuite/
>>  * gcc.dg/pr107805-1.c: New test.
>>  * gcc.dg/pr107805-1.c: Likewise.
>
> OK.

Thanks.  Permission to backport this to GCC 12 after a week or two?

Florian

[PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-22 Thread Florian Weimer via Gcc-patches

Without this change, finish_declspecs cannot tell that whether there
was an erroneous type specified, or no type at all.  This may result
in additional diagnostics for implicit ints, or missing diagnostics
for multiple types.

PR c/107805

gcc/c/
* c-decl.cc (declspecs_add_type): Propagate error_mark_bode
from type to specs.

gcc/testsuite/
* gcc.dg/pr107805-1.c: New test.
* gcc.dg/pr107805-1.c: Likewise.

---
Note regarding testing: I boostrap with c,c++,lto on x86-64
(non-multlib) and diffed these .sum files:

gcc/testsuite/gcc/gcc.sum
gcc/testsuite/g++/g++.sum
x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum

Apart from timestamps, the only differences I get is this change:

--- ./gcc/testsuite/gcc/gcc.sum 2022-11-22 05:45:33.813264761 -0500
+++ /tmp/b/build/./gcc/testsuite/gcc/gcc.sum2022-11-22 06:39:10.667590185 
-0500
@@ -83303,6 +83303,11 @@
 PASS: gcc.dg/pr107618.c  (test for bogus messages, line 9)
 PASS: gcc.dg/pr107618.c (test for excess errors)
 PASS: gcc.dg/pr107686.c (test for excess errors)
+PASS: gcc.dg/pr107805-1.c  (test for errors, line 3)
+PASS: gcc.dg/pr107805-1.c (test for excess errors)
+PASS: gcc.dg/pr107805-2.c  (test for errors, line 3)
+PASS: gcc.dg/pr107805-2.c  (test for errors, line 4)
+PASS: gcc.dg/pr107805-2.c (test for excess errors)
 PASS: gcc.dg/pr11459-1.c (test for excess errors)
 PASS: gcc.dg/pr11492.c  (test for bogus messages, line 8)
 PASS: gcc.dg/pr11492.c (test for excess errors)
@@ -190486,7 +190491,7 @@
 
=== gcc Summary ===
 
-# of expected passes   185932
+# of expected passes   185937
 # of unexpected failures   99
 # of unexpected successes  20
 # of expected failures 1484

So I think this means there are no test suite regressions.

Thanks,
Florian

 gcc/c/c-decl.cc   | 6 ++
 gcc/testsuite/gcc.dg/pr107805-1.c | 5 +
 gcc/testsuite/gcc.dg/pr107805-2.c | 4 
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 098e475f65d..4adb89e4aaf 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -12243,11 +12243,9 @@ declspecs_add_type (location_t loc, struct c_declspecs 
*specs,
 error_at (loc, "two or more data types in declaration specifiers");
   else if (TREE_CODE (type) == TYPE_DECL)
 {
-  if (TREE_TYPE (type) == error_mark_node)
-   ; /* Allow the type to default to int to avoid cascading errors.  */
-  else
+  specs->type = TREE_TYPE (type);
+  if (TREE_TYPE (type) != error_mark_node)
{
- specs->type = TREE_TYPE (type);
  specs->decl_attr = DECL_ATTRIBUTES (type);
  specs->typedef_p = true;
  specs->explicit_signed_p = C_TYPEDEF_EXPLICITLY_SIGNED (type);
diff --git a/gcc/testsuite/gcc.dg/pr107805-1.c 
b/gcc/testsuite/gcc.dg/pr107805-1.c
new file mode 100644
index 000..559b6a5586e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107805-1.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+typedef int t;
+typedef struct { double a; int b; } t; /* { dg-error "conflicting types" } */
+t x; /* No warning here.  */
+
diff --git a/gcc/testsuite/gcc.dg/pr107805-2.c 
b/gcc/testsuite/gcc.dg/pr107805-2.c
new file mode 100644
index 000..fa5fa4ce273
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107805-2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+typedef int t;
+typedef struct { double a; int b; } t; /* { dg-error "conflicting types" } */
+t char x; /* { dg-error "two or more data types" } */

base-commit: e4faee8d02ec5d65bf418612f7181823eb08c078

Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-22 Thread Florian Weimer via Gcc-patches

* Thomas Neumann:

> Hi,
>
> When dynamically linking a fast enough machine hides the latency, but when
> Statically linking or on slower devices this change caused a 5x increase 
> in
> Instruction count and 2x increase in cycle count before getting to main.
>
> I have looked at ways to fix that. The problem is that with static
> linking unwinding tables are registered dynamically, and with my patch 
> that registration triggers an eager sort of fde lists. While
> previously the lists were sorted when the first exception was
> thrown. If an application throws at least one exception there is no
> downside in eager sorting, but if the application never throws there
> is overhead.

Would it be possible to trigger lazy registration if the version is read
as a zero?  This would not introduce any additional atomic instructions
on the fast path.

Thanks,
Florian

[PATCH 2/3] Define __LIBGCC_DWARF_REG_SIZES_CONSTANT__ if DWARF register size is constant

2022-11-08 Thread Florian Weimer via Gcc-patches

And use that to speed up the libgcc unwinder.

* gcc/debug.h (dwarf_reg_sizes_constant): Declare.
* gcc/dwarf2cfi.cc (dwarf_reg_sizes_constant): New function.
* gcc/c-family/c-cppbuiltin.c
(__LIBGCC_DWARF_REG_SIZES_CONSTANT__): Define if constant is
known.

libgcc/

* unwind-dw2.c (dwarf_reg_size): New function.
(_Unwind_GetGR, _Unwind_SetGR, _Unwind_SetGRPtr)
(_Unwind_SetSpColumn, uw_install_context_1): Use it.
(uw_init_context_1): Do not initialize dwarf_reg_size_table
if not in use.
---
 gcc/c-family/c-cppbuiltin.cc |  6 ++
 gcc/debug.h  |  2 ++
 gcc/dwarf2cfi.cc | 23 
 libgcc/unwind-dw2.c  | 41 +---
 4 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index cdb658f6ac9..ab98bf3b059 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1515,6 +1515,12 @@ c_cpp_builtins (cpp_reader *pfile)
 #endif
   builtin_define_with_int_value ("__LIBGCC_DWARF_FRAME_REGISTERS__",
 DWARF_FRAME_REGISTERS);
+  {
+   int value = dwarf_reg_sizes_constant ();
+   if (value > 0)
+ builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
+value);
+  }
 #ifdef EH_RETURN_STACKADJ_RTX
   cpp_define (pfile, "__LIBGCC_EH_RETURN_STACKADJ_RTX__");
 #endif
diff --git a/gcc/debug.h b/gcc/debug.h
index fe85115d5f3..6bcc8da1f76 100644
--- a/gcc/debug.h
+++ b/gcc/debug.h
@@ -245,6 +245,8 @@ extern const struct gcc_debug_hooks vmsdbg_debug_hooks;
 
 /* Dwarf2 frame information.  */
 
+extern int dwarf_reg_sizes_constant ();
+
 extern void dwarf2out_begin_prologue (unsigned int, unsigned int,
  const char *);
 extern void dwarf2out_vms_end_prologue (unsigned int, const char *);
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index b29173b2156..d45d20478b4 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -334,6 +334,29 @@ generate_dwarf_reg_sizes (poly_uint16 *sizes)
 targetm.init_dwarf_reg_sizes_extra (sizes);
 }
 
+/* Return 0 if the DWARF register sizes are not constant, otherwise
+   return the size constant.  */
+
+int
+dwarf_reg_sizes_constant ()
+{
+  poly_uint16 *sizes = XALLOCAVEC (poly_uint16, DWARF_FRAME_REGISTERS);
+  generate_dwarf_reg_sizes (sizes);
+
+  int result;
+  for (unsigned int i = 0; i < DWARF_FRAME_REGISTERS; i++)
+{
+  unsigned short value;
+  if (!sizes[i].is_constant ())
+   return 0;
+  if (i == 0)
+   result = value;
+  else if (result != value)
+   return 0;
+}
+  return result;
+}
+
 /* Generate code to initialize the dwarf register size table located
at the provided ADDRESS.  */
 
diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index eaceace2029..c370121bb29 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -148,9 +148,25 @@ struct _Unwind_Context
   char by_value[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
 };
 
+#ifdef __LIBGCC_DWARF_REG_SIZES_CONSTANT__
+static inline unsigned char
+dwarf_reg_size (int index __attribute__ ((__unused__)))
+{
+  return __LIBGCC_DWARF_REG_SIZES_CONSTANT__;
+}
+#else
 /* Byte size of every register managed by these routines.  */
 static unsigned char dwarf_reg_size_table[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
 
+
+static inline unsigned char
+dwarf_reg_size (unsigned index)
+{
+  gcc_assert (index < sizeof (dwarf_reg_size_table));
+  return dwarf_reg_size_table[index];
+}
+#endif
+
 
 /* Read unaligned data from the instruction buffer.  */
 
@@ -232,8 +248,7 @@ _Unwind_GetGR (struct _Unwind_Context *context, int regno)
 #endif
 
   index = DWARF_REG_TO_UNWIND_COLUMN (regno);
-  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
-  size = dwarf_reg_size_table[index];
+  size = dwarf_reg_size (index);
   val = context->reg[index];
 
   if (_Unwind_IsExtendedContext (context) && context->by_value[index])
@@ -280,8 +295,7 @@ _Unwind_SetGR (struct _Unwind_Context *context, int index, 
_Unwind_Word val)
   void *ptr;
 
   index = DWARF_REG_TO_UNWIND_COLUMN (index);
-  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
-  size = dwarf_reg_size_table[index];
+  size = dwarf_reg_size (index);
 
   if (_Unwind_IsExtendedContext (context) && context->by_value[index])
 {
@@ -329,9 +343,8 @@ _Unwind_SetGRValue (struct _Unwind_Context *context, int 
index,
_Unwind_Word val)
 {
   index = DWARF_REG_TO_UNWIND_COLUMN (index);
-  gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
   /* Return column size may be smaller than _Unwind_Context_Reg_Val.  */
-  gcc_assert (dwarf_reg_size_table[index] <= sizeof (_Unwind_Context_Reg_Val));
+  gcc_assert (dwarf_reg_size (index) <= sizeof (_Unwind_Context_Reg_Val));
 
   context->by_value[index] = 1;
   context->reg[index] =

[PATCH 3/3] libgcc: Specialize execute_cfa_program in DWARF unwinder for alignments

2022-11-08 Thread Florian Weimer via Gcc-patches

The parameters fs->data_align and fs->code_align always have fixed
values for a particular target in GCC-generated code.  Specialize
execute_cfa_program for these values, to avoid multiplications.

gcc/

* c-family/c-cppbuiltin.c (c_cpp_builtins): Define
__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__.

libgcc/

* unwind-dw2-execute_cfa.h: New file.  Extracted from
the execute_cfa_program function in unwind-dw2.c.
* unwind-dw2.c (execute_cfa_program_generic): New function.
(execute_cfa_program_specialized): Likewise.
(execute_cfa_program): Call execute_cfa_program_specialized
or execute_cfa_program_generic, as appropriate.
---
 gcc/c-family/c-cppbuiltin.cc|   2 +
 libgcc/unwind-dw2-execute_cfa.h | 322 
 libgcc/unwind-dw2.c | 319 +++
 3 files changed, 354 insertions(+), 289 deletions(-)
 create mode 100644 libgcc/unwind-dw2-execute_cfa.h

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index ab98bf3b059..c8c327b3b2e 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1521,6 +1521,8 @@ c_cpp_builtins (cpp_reader *pfile)
  builtin_define_with_int_value ("__LIBGCC_DWARF_REG_SIZES_CONSTANT__",
 value);
   }
+  builtin_define_with_int_value ("__LIBGCC_DWARF_CIE_DATA_ALIGNMENT__",
+DWARF_CIE_DATA_ALIGNMENT);
 #ifdef EH_RETURN_STACKADJ_RTX
   cpp_define (pfile, "__LIBGCC_EH_RETURN_STACKADJ_RTX__");
 #endif
diff --git a/libgcc/unwind-dw2-execute_cfa.h b/libgcc/unwind-dw2-execute_cfa.h
new file mode 100644
index 000..dd97b786668
--- /dev/null
+++ b/libgcc/unwind-dw2-execute_cfa.h
@@ -0,0 +1,322 @@
+/* DWARF2 exception handling CFA execution engine.
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This file is included from unwind-dw2.c to specialize the code for certain
+   values of DATA_ALIGN and CODE_ALIGN.  These macros must be defined prior to
+   including this file.  */
+
+{
+  struct frame_state_reg_info *unused_rs = NULL;
+
+  /* Don't allow remember/restore between CIE and FDE programs.  */
+  fs->regs.prev = NULL;
+
+  /* The comparison with the return address uses < rather than <= because
+ we are only interested in the effects of code before the call; for a
+ noreturn function, the return address may point to unrelated code with
+ a different stack configuration that we are not interested in.  We
+ assume that the call itself is unwind info-neutral; if not, or if
+ there are delay instructions that adjust the stack, these must be
+ reflected at the point immediately before the call insn.
+ In signal frames, return address is after last completed instruction,
+ so we add 1 to return address to make the comparison <=.  */
+  while (insn_ptr < insn_end
+&& fs->pc < context->ra + _Unwind_IsSignalFrame (context))
+{
+  unsigned char insn = *insn_ptr++;
+  _uleb128_t reg, utmp;
+  _sleb128_t offset, stmp;
+
+  if ((insn & 0xc0) == DW_CFA_advance_loc)
+   fs->pc += (insn & 0x3f) * CODE_ALIGN;
+  else if ((insn & 0xc0) == DW_CFA_offset)
+   {
+ reg = insn & 0x3f;
+ insn_ptr = read_uleb128 (insn_ptr, );
+ offset = (_Unwind_Sword) utmp * DATA_ALIGN;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   {
+ fs->regs.how[reg] = REG_SAVED_OFFSET;
+ fs->regs.reg[reg].loc.offset = offset;
+   }
+   }
+  else if ((insn & 0xc0) == DW_CFA_restore)
+   {
+ reg = insn & 0x3f;
+ reg = DWARF_REG_TO_UNWIND_COLUMN (reg);
+ if (UNWIND_COLUMN_IN_RANGE (reg))
+   fs->regs.how[reg] = REG_UNSAVED;
+   }
+  else switch (insn)
+   {
+   case DW_CFA_set_loc:
+ {
+   _Unwind_Ptr pc;
+
+   insn_ptr = read_encoded_value (context,

[PATCH 0/3] Further libgcc unwinder improvements

2022-11-08 Thread Florian Weimer via Gcc-patches

This series makes some further unwinder improvements.  Unfortunately,
not many targets define __LIBGCC_DWARF_REG_SIZES_CONSTANT__; x86-64
does, and it makes uw_install_context_1 quite a bit faster because GCC
no longer has to emit generic memcpy code for it.  In general, it may be
worthwhile to replace this code with target-specific implementations.

Tested on powerpc64le-linux-gnu, x86_64-linux-gnu; I didn't see any test
result differences.  Built GCC for msp430-elf, too.

The revision for the patch I posted earlier (using SWAR techniques for
get_cie_encoding) is not ready yet and probably won't make GCC 13.  It
requires some header cleanups first.

Thanks,
Florian

Florian Weimer (3):
  Compute a table of DWARF register sizes at compile
  Define __LIBGCC_DWARF_REG_SIZES_CONSTANT__ if DWARF register size is
constant
  libgcc: Specialize execute_cfa_program in DWARF unwinder for
alignments

 gcc/c-family/c-cppbuiltin.cc|   8 +
 gcc/config/msp430/msp430.cc |  11 +-
 gcc/config/rs6000/rs6000.cc |  14 +-
 gcc/debug.h |   2 +
 gcc/doc/tm.texi |   7 +-
 gcc/dwarf2cfi.cc| 116 +-
 gcc/target.def  |   8 +-
 libgcc/unwind-dw2-execute_cfa.h | 322 
 libgcc/unwind-dw2.c | 360 ++--
 9 files changed, 472 insertions(+), 376 deletions(-)
 create mode 100644 libgcc/unwind-dw2-execute_cfa.h


base-commit: 5d060d8b0477ff4911f41c816281daaa24b41a13
-- 
2.38.1

[PATCH 1/3] Compute a table of DWARF register sizes at compile

2022-11-08 Thread Florian Weimer via Gcc-patches

The sizes are compile-time constants.  Create a vector with them,
so that they can be inspected at compile time.

* gcc/dwarf2cfi.cc (init_return_column_size): Remove.
(init_one_dwarf_reg_size): Adjust.
(generate_dwarf_reg_sizes): New function.  Extracted
from expand_builtin_init_dwarf_reg_sizes.
(expand_builtin_init_dwarf_reg_sizes): Call
generate_dwarf_reg_sizes.
* gcc/target.def (init_dwarf_reg_sizes_extra): Adjust
hook signature.
* gcc/config/msp430/msp430.cc
(msp430_init_dwarf_reg_sizes_extra): Adjust.
* gcc/config/rs6000.cc (rs6000_init_dwarf_reg_sizes_extra):
Likewise.
* gcc/doc/tm.texi: Update.
---
 gcc/config/msp430/msp430.cc | 11 +
 gcc/config/rs6000/rs6000.cc | 14 +-
 gcc/doc/tm.texi |  7 +--
 gcc/dwarf2cfi.cc| 93 ++---
 gcc/target.def  |  8 ++--
 5 files changed, 58 insertions(+), 75 deletions(-)

diff --git a/gcc/config/msp430/msp430.cc b/gcc/config/msp430/msp430.cc
index 6c15780a2b6..dbea8d7f50f 100644
--- a/gcc/config/msp430/msp430.cc
+++ b/gcc/config/msp430/msp430.cc
@@ -3202,11 +3202,9 @@ msp430_expand_eh_return (rtx eh_handler)
 #undef  TARGET_INIT_DWARF_REG_SIZES_EXTRA
 #define TARGET_INIT_DWARF_REG_SIZES_EXTRA msp430_init_dwarf_reg_sizes_extra
 void
-msp430_init_dwarf_reg_sizes_extra (tree address)
+msp430_init_dwarf_reg_sizes_extra (poly_uint16 *sizes)
 {
   int i;
-  rtx addr = expand_normal (address);
-  rtx mem = gen_rtx_MEM (BLKmode, addr);
 
   /* This needs to match msp430_unwind_word_mode (above).  */
   if (!msp430x)
@@ -3218,12 +3216,7 @@ msp430_init_dwarf_reg_sizes_extra (tree address)
   unsigned int rnum = DWARF2_FRAME_REG_OUT (dnum, 1);
 
   if (rnum < DWARF_FRAME_REGISTERS)
-   {
- HOST_WIDE_INT offset = rnum * GET_MODE_SIZE (QImode);
-
- emit_move_insn (adjust_address (mem, QImode, offset),
- gen_int_mode (4, QImode));
-   }
+   sizes[rnum] = 4;
 }
 }
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a85d7630b41..fddb6a8a0f7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -23783,27 +23783,17 @@ rs6000_initial_elimination_offset (int from, int to)
 /* Fill in sizes of registers used by unwinder.  */
 
 static void
-rs6000_init_dwarf_reg_sizes_extra (tree address)
+rs6000_init_dwarf_reg_sizes_extra (poly_uint16 *sizes)
 {
   if (TARGET_MACHO && ! TARGET_ALTIVEC)
 {
   int i;
-  machine_mode mode = TYPE_MODE (char_type_node);
-  rtx addr = expand_expr (address, NULL_RTX, VOIDmode, EXPAND_NORMAL);
-  rtx mem = gen_rtx_MEM (BLKmode, addr);
-  rtx value = gen_int_mode (16, mode);
 
   /* On Darwin, libgcc may be built to run on both G3 and G4/5.
 The unwinder still needs to know the size of Altivec registers.  */
 
   for (i = FIRST_ALTIVEC_REGNO; i < LAST_ALTIVEC_REGNO+1; i++)
-   {
- int column = DWARF_REG_TO_UNWIND_COLUMN
-   (DWARF2_FRAME_REG_OUT (DWARF_FRAME_REGNUM (i), true));
- HOST_WIDE_INT offset = column * GET_MODE_SIZE (mode);
-
- emit_move_insn (adjust_address (mem, mode, offset), value);
-   }
+   sizes[i] = 16;
 }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 8572313b308..09a3ab3e55c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -9824,13 +9824,14 @@ used to return a smaller mode than the raw mode to 
prevent call
 clobbered parts of a register altering the frame register size
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_INIT_DWARF_REG_SIZES_EXTRA (tree 
@var{address})
+@deftypefn {Target Hook} void TARGET_INIT_DWARF_REG_SIZES_EXTRA (poly_uint16 
*@var{sizes})
 If some registers are represented in Dwarf-2 unwind information in
 multiple pieces, define this hook to fill in information about the
 sizes of those pieces in the table used by the unwinder at runtime.
-It will be called by @code{expand_builtin_init_dwarf_reg_sizes} after
+It will be called by @code{generate_dwarf_reg_sizes} after
 filling in a single size corresponding to each hard register;
-@var{address} is the address of the table.
+@var{sizes} is the address of the table.  It will contain
+@code{DWARF_FRAME_REGISTERS} elements when this hook is called.
 @end deftypefn
 
 @deftypefn {Target Hook} bool TARGET_ASM_TTYPE (rtx @var{sym})
diff --git a/gcc/dwarf2cfi.cc b/gcc/dwarf2cfi.cc
index bef3165e691..b29173b2156 100644
--- a/gcc/dwarf2cfi.cc
+++ b/gcc/dwarf2cfi.cc
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "except.h"/* expand_builtin_dwarf_sp_column */
 #include "profile-count.h" /* For expr.h */
-#include "expr.h"  /* init_return_column_size */
+#include "expr.h"  /* expand_normal, emit_move_insn */
 #include "output.h"/* asm_out_file */
 #include "debug.h" /* dwarf2out_do_frame,

Re: [PATCH 2/2] libstdc++: Move stream initialization into compiled library [PR44952]

2022-11-04 Thread Florian Weimer via Gcc-patches

* Patrick Palka:

> On Fri, 4 Nov 2022, Florian Weimer wrote:
>
>> * Patrick Palka via Gcc-patches:
>> 
>> > This patch moves the global object for constructing the standard streams
>> > out from  and into the compiled library on targets that support
>> > the init_priority attribute.  This means that  no longer
>> > introduces a separate global constructor in each TU that includes it.
>> >
>> > We can do this only if the init_priority attribute is supported because
>> > we need to force that the stream initialization runs first before any
>> > user-defined global initializer in programs that that use a static
>> > libstdc++.a.
>> 
>> I think this breaks initialization of iostreams of shared objects that
>> are preloaded with LD_PRELOAD.  With the constructor, they initialize
>> iostreams once they are loaded via their own ELF constructors (even
>> before libstdc++'s ELF constructors run).  Without the constructor in
>> , that no longer happens.
>
> IIUC wouldn't that shared object depend on libstdc++.so and hence
> libstdc++'s constructors would still run before the shared object's?

Hmm, right, we only reorder the symbol binding order, not the
initialization order.  The preloaded object will not participate in a
cycle with libstdc++, so I think this should indeed be a safe change.

Thanks,
Florian

Re: [PATCH 2/2] libstdc++: Move stream initialization into compiled library [PR44952]

2022-11-04 Thread Florian Weimer via Gcc-patches

* Patrick Palka via Gcc-patches:

> This patch moves the global object for constructing the standard streams
> out from  and into the compiled library on targets that support
> the init_priority attribute.  This means that  no longer
> introduces a separate global constructor in each TU that includes it.
>
> We can do this only if the init_priority attribute is supported because
> we need to force that the stream initialization runs first before any
> user-defined global initializer in programs that that use a static
> libstdc++.a.

I think this breaks initialization of iostreams of shared objects that
are preloaded with LD_PRELOAD.  With the constructor, they initialize
iostreams once they are loaded via their own ELF constructors (even
before libstdc++'s ELF constructors run).  Without the constructor in
, that no longer happens.

Thanks,
Florian

Re: [PATCH v2] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-11-04 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

>> +#undef C
>> +
>> +  /* Validate the augmentation length, and return the enconding after
>> + it.  No check for the return address column because it is
>> + byte-encoded with CIE version 1.  */
>> +  if (__builtin_expect ((value & mask) == expected
>> +&& (cie->augmentation[8] & 0x80) == 0, 1))
>> +  return cie->augmentation[9];
>
> And the above line is indented too much, should be just tab.
>
> But more importantly, I don't see how it can work at all correctly.
> For the "zR" case:
>   Version:   1
>   Augmentation:  "zR"
>   Code alignment factor: 1
>   Data alignment factor: -8
>   Return address column: 16
>   Augmentation data: 1b
> I believe the function wants to return 0x1b, we have
> 1 z R 0 c d r l X
> and we want to return the X, which is indeed at >version + 8
> aka cie->augmentation[7].
> But with "zPLR"
>   Version:   1
>   Augmentation:  "zPLR"
>   Code alignment factor: 1
>   Data alignment factor: -8
>   Return address column: 16
>   Augmentation data: 03 00 00 00 00 03 1b
> we have
> 1 z P L R 0 c d r l M N N N N O X 
> and still want to return the X, while you return the M in there.
> How large the N is depends on the M value.  The "P" data
> is  + , "L" takes just 
> and so does "R".
> So, you'd need in addition to your (cie->augmentation[8] & 0x80) == 0
> check (that augmentation data length is single byte uleb128) verify
> that cie->augmentation[9] is DW_EH_PE_udata4, then you can skip
> the 4 bytes after it and the one byte for "L" and return
> cie->augmentation[15];  But you'd need to verify what different
> DW_EH_PE_* values are used in the wild.

Meh, right.  I think what I'm doing is pretty pointless.  We call
extract_cie_info immediately after calling _Unwind_Find_FDE in
uw_frame_state_for.  That already gives us the FDE encoding.  So it
seems to make more sense to do the PC range check in uw_frame_state_for,
and _Unwind_Find_FDE won't need the FDE encoding anymore.

My testing doesn't catch this problem unfortunately.

> BTW, version 1 is there just because binutils emit those,
> if one uses -fno-dwarf2-cfi-asm, there is:
>   Version:   3
>   Augmentation:  "zR"
>   Code alignment factor: 1
>   Data alignment factor: -8
>   Return address column: 16
>   Augmentation data: 03
> and
>   Version:   3
>   Augmentation:  "zPLR"
>   Code alignment factor: 1
>   Data alignment factor: -8
>   Return address column: 16
>   Augmentation data: 03 00 00 00 00 03 03
> instead (so 3 instead of 1, and leb128 RA instead of byte;
> 1 vs. 3 can be handled by mask and checking if top byte of the
> RA isn't set could work too).  One should trie what one gets
> with -fpic/-fpie/-fno-pie on various arches though.
> E.g. doesn't aarch64 use augmentations with B in it?

I haven't seen those, and I also can't find the GCC code to emit it.  I
think it's now handled via aarch64_frob_update_context, not an
augmentation flag.  Perhaps we should remove the 'B' cases.

[PATCH v2] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-11-04 Thread Florian Weimer via Gcc-patches

"zR" and "zPLR" are the most common augmentations.  Use a simple
SIMD-with-in-a-register technique to check for both augmentations,
and that the following variable-length integers have length 1, to
get more quickly at the encoding field.

libgcc/

* unwind-dw2-fde.c (get_cie_encoding_slow): Rename from
get_cie_encoding.  Mark as noinline.
(get_cie_encoding): Add fast path for "zR" and "zPLR"
augmentations.  Call get_cie_encoding_slow as a fall-back.

---
v2: Use memcpy to avoid a potential aliasing violation.

 libgcc/unwind-dw2-fde.c | 62 +++--
 1 file changed, 60 insertions(+), 2 deletions(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 3c0cc654ec0..21eee77882e 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -333,8 +333,10 @@ base_from_object (unsigned char encoding, const struct 
object *ob)
 /* Return the FDE pointer encoding from the CIE.  */
 /* ??? This is a subset of extract_cie_info from unwind-dw2.c.  */
 
-static int
-get_cie_encoding (const struct dwarf_cie *cie)
+/* Disable inlining because the function is only used as a slow path in
+   get_cie_encoding below.  */
+static int __attribute__ ((noinline))
+get_cie_encoding_slow (const struct dwarf_cie *cie)
 {
   const unsigned char *aug, *p;
   _Unwind_Ptr dummy;
@@ -389,6 +391,62 @@ get_cie_encoding (const struct dwarf_cie *cie)
 }
 }
 
+static inline int
+get_cie_encoding (const struct dwarf_cie *cie)
+{
+  /* Fast path for some augmentations and single-byte variable-length
+ integers.  Do this only for targets that align struct dwarf_cie to 8
+ bytes, which ensures that at least 8 bytes are available starting at
+ cie->version.  */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ \
+  || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  if (__alignof (*cie) == 8 && sizeof (unsigned long long) == 8)
+{
+  unsigned long long value;
+  memcpy (, >version, sizeof (value));
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define C(x) __builtin_bswap64 (x)
+#else
+#define C(x) x
+#endif
+
+  /* Fast path for "zR".  Check for version 1, the "zR" string and that
+the sleb128/uleb128 values are single bytes.  In the comments
+below, '1', 'c', 'd', 'r', 'l' are version, code alignment, data
+alignment, return address column, augmentation length.  Note that
+with CIE version 1, the return address column is byte-encoded.  */
+  unsigned long long expected =
+   /*   1 z R 0 c d r l.  */
+   C (0x017a5200ULL);
+  unsigned long long mask =
+   /*   1 z R 0 c d r l.  */
+   C (0x80800080ULL);
+
+  if ((value & mask) == expected)
+   return cie->augmentation[7];
+
+  /* Fast path for "zPLR".  */
+  expected =
+   /*   1 z P L R 0 c d.  */
+   C (0x017a504c5200ULL);
+  mask =
+   /*   1 z P L R 0 c d.  */
+   C (0x8080ULL);
+#undef C
+
+  /* Validate the augmentation length, and return the enconding after
+it.  No check for the return address column because it is
+byte-encoded with CIE version 1.  */
+  if (__builtin_expect ((value & mask) == expected
+   && (cie->augmentation[8] & 0x80) == 0, 1))
+ return cie->augmentation[9];
+}
+#endif
+
+  return get_cie_encoding_slow (cie);
+}
+
 static inline int
 get_fde_encoding (const struct dwarf_fde *f)
 {

base-commit: e724b0480bfa5ec04f39be8c7290330b495c59de

Re: C2x features status

2022-10-21 Thread Florian Weimer via Gcc-patches

* Arsen Arsenović:

> On Friday, 21 October 2022 21:14:54 CEST Marek Polacek via Gcc wrote:
>> commit 0a91bdaf177409a2a5e7895bce4f0e7091b4b3ca
>> Author: Joseph Myers 
>> Date:   Wed Sep 7 13:56:25 2022 +
>> 
>> c: New C2x keywords
>> 
>> which says:
>> 
>> As with the removal of unprototyped functions, this change has a
>> high risk of breaking some old code and people doing GNU/Linux
>> distribution builds may wish to see how much is broken in a build
>> with a -std=gnu2x default.
>
> It already does break a lot.  See https://bugs.gentoo.org/870412 
> (comments go over the details).  I was intending on giving this issue a 
> proper look in the GNU toolchain frame of reference, but never got 
> around to it (and I kinda knocked priority down after managing to 
> configure properly once IIRC).

That's the implicit function declaration/implicit int change.  This
won't happen in GCC 13, it's too late for that.  I tried to make this
change a couple of years in Fedora, and just flipping the compiler flag
Does Not Work.  I hope to get there in time for GCC 14.

Thank you for sharing the Gentoo tracker.  Maybe we can reuse some
patches from there and contribute ours.  I trust Gentoo aims to upstream
patches as they become available?  Unfortunately, I expect that a lot of
these issues will be in packages that don't have an active upstream
anymore, which makes sharing patches more challenging.  In other cases,
we'll just build with -std=gnu89 (e.g. unzip
, it has
configure-style checking implemented without autoconf).

Thanks,
Florian

Re: [PATCH] libiberty: Fix C89-isms in configure tests

2022-10-18 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Tue, Oct 18, 2022 at 04:06:17PM +0200, Florian Weimer wrote:
>> By the way, the stack direction test currently gives incorrect results
>> on x86-64 due to -O2 and address comparison of unrelated objects.  I
>> assume this doesn't matter because we don't use it on compilers that
>> support alloca natively.
>
> Guess it would be better to cast the addresses to uintptr_t and
> compare that.

But can we assume that uintptr_t is defined?  Or that  exists?

Thanks,
Florian

[PATCH v2] libiberty: Fix C89-isms in configure tests

2022-10-18 Thread Florian Weimer via Gcc-patches

libiberty/

* acinclude.m4 (ac_cv_func_strncmp_works): Add missing
int return type and parameter list to the definition of main.
Include  and  for prototypes.
(ac_cv_c_stack_direction): Add missing
int return type and parameter list to the definitions of
main, find_stack_direction.  Include  for exit
prototype.
* configure: Regenerate.

---
 libiberty/acinclude.m4 | 14 +++---
 libiberty/configure| 14 +++---
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/libiberty/acinclude.m4 b/libiberty/acinclude.m4
index 6bd127e9826..6bb690597bf 100644
--- a/libiberty/acinclude.m4
+++ b/libiberty/acinclude.m4
@@ -24,6 +24,8 @@ AC_CACHE_CHECK([for working strncmp], 
ac_cv_func_strncmp_works,
 [AC_TRY_RUN([
 /* Test by Jim Wilson and Kaveh Ghazi.
Check whether strncmp reads past the end of its string parameters. */
+#include 
+#include 
 #include 
 
 #ifdef HAVE_FCNTL_H
@@ -51,7 +53,8 @@ AC_CACHE_CHECK([for working strncmp], 
ac_cv_func_strncmp_works,
 
 #define MAP_LEN 0x1
 
-main ()
+int
+main (void)
 {
 #if defined(HAVE_MMAP) || defined(HAVE_MMAP_ANYWHERE)
   char *p;
@@ -157,7 +160,10 @@ if test $ac_cv_os_cray = yes; then
 fi
 
 AC_CACHE_CHECK(stack direction for C alloca, ac_cv_c_stack_direction,
-[AC_TRY_RUN([find_stack_direction ()
+[AC_TRY_RUN([#include 
+
+int
+find_stack_direction (void)
 {
   static char *addr = 0;
   auto char dummy;
@@ -169,7 +175,9 @@ AC_CACHE_CHECK(stack direction for C alloca, 
ac_cv_c_stack_direction,
   else
 return ( > addr) ? 1 : -1;
 }
-main ()
+
+int
+main (void)
 {
   exit (find_stack_direction() < 0);
 }],
diff --git a/libiberty/configure b/libiberty/configure
index 0a797255c70..ca83f89da6d 100755
--- a/libiberty/configure
+++ b/libiberty/configure
@@ -6780,7 +6780,10 @@ else
 else
   cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-find_stack_direction ()
+#include 
+
+int
+find_stack_direction (void)
 {
   static char *addr = 0;
   auto char dummy;
@@ -6792,7 +6795,9 @@ find_stack_direction ()
   else
 return ( > addr) ? 1 : -1;
 }
-main ()
+
+int
+main (void)
 {
   exit (find_stack_direction() < 0);
 }
@@ -7617,6 +7622,8 @@ else
 
 /* Test by Jim Wilson and Kaveh Ghazi.
Check whether strncmp reads past the end of its string parameters. */
+#include 
+#include 
 #include 
 
 #ifdef HAVE_FCNTL_H
@@ -7644,7 +7651,8 @@ else
 
 #define MAP_LEN 0x1
 
-main ()
+int
+main (void)
 {
 #if defined(HAVE_MMAP) || defined(HAVE_MMAP_ANYWHERE)
   char *p;

base-commit: 1dc4488e091ae75bdd2c00a482db0f1b68229154

Re: [PATCH] libiberty: Fix C89-isms in configure tests

2022-10-18 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Tue, Oct 18, 2022 at 12:05:49PM +0200, Florian Weimer via Gcc-patches 
> wrote:
>> libiberty/
>> 
>>  * acinclude.m4 (check for working strncmp): Add missing
>>  int return type and parameter list to the definition of main.
>>  Include  for string functions.  Avoid calling
>>  undeclared exit function.
>> (stack direction for C alloca): Avoid calling undeclared exit
>> function.
>
> Spaces instead of tabs.

You mean I should use tabs throughout, right?

> I'd think we should #include  for exit and keep exit, I vaguely
> remember non-zero return from main doesn't always work reliably, which is
> why e.g. in the testsuite we usually abort instead of return non-zero
> from main.  Don't remember if it is just for some bare metal cases or
> what, which on the either side probably don't have mmap.

Okay, will do that and send a v2.

By the way, the stack direction test currently gives incorrect results
on x86-64 due to -O2 and address comparison of unrelated objects.  I
assume this doesn't matter because we don't use it on compilers that
support alloca natively.

Thanks,
Florian

[PATCH] libiberty: Fix C89-isms in configure tests

2022-10-18 Thread Florian Weimer via Gcc-patches

libiberty/

* acinclude.m4 (check for working strncmp): Add missing
int return type and parameter list to the definition of main.
Include  for string functions.  Avoid calling
undeclared exit function.
(stack direction for C alloca): Avoid calling undeclared exit
function.
* configure: Regenerate.

---
 libiberty/acinclude.m4 | 12 +++-
 libiberty/configure| 12 +++-
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/libiberty/acinclude.m4 b/libiberty/acinclude.m4
index 6bd127e9826..6e23ae906fc 100644
--- a/libiberty/acinclude.m4
+++ b/libiberty/acinclude.m4
@@ -24,6 +24,7 @@ AC_CACHE_CHECK([for working strncmp], 
ac_cv_func_strncmp_works,
 [AC_TRY_RUN([
 /* Test by Jim Wilson and Kaveh Ghazi.
Check whether strncmp reads past the end of its string parameters. */
+#include 
 #include 
 
 #ifdef HAVE_FCNTL_H
@@ -51,7 +52,8 @@ AC_CACHE_CHECK([for working strncmp], 
ac_cv_func_strncmp_works,
 
 #define MAP_LEN 0x1
 
-main ()
+int
+main (void)
 {
 #if defined(HAVE_MMAP) || defined(HAVE_MMAP_ANYWHERE)
   char *p;
@@ -59,7 +61,7 @@ main ()
 
   dev_zero = open ("/dev/zero", O_RDONLY);
   if (dev_zero < 0)
-exit (1);
+return 1;
 
   p = (char *) mmap (0, MAP_LEN, PROT_READ|PROT_WRITE,
 MAP_ANON|MAP_PRIVATE, dev_zero, 0);
@@ -67,7 +69,7 @@ main ()
 p = (char *) mmap (0, MAP_LEN, PROT_READ|PROT_WRITE,
   MAP_ANON|MAP_PRIVATE, -1, 0);
   if (p == (char *)-1)
-exit (2);
+return 2;
   else
 {
   char *string = "__si_type_info";
@@ -79,7 +81,7 @@ main ()
   strncmp (r, q, 14);
 }
 #endif /* HAVE_MMAP || HAVE_MMAP_ANYWHERE */
-  exit (0);
+  return 0;
 }
 ], ac_cv_func_strncmp_works=yes, ac_cv_func_strncmp_works=no,
   ac_cv_func_strncmp_works=yes)
@@ -171,7 +173,7 @@ AC_CACHE_CHECK(stack direction for C alloca, 
ac_cv_c_stack_direction,
 }
 main ()
 {
-  exit (find_stack_direction() < 0);
+  return find_stack_direction() < 0;
 }],
   ac_cv_c_stack_direction=1,
   ac_cv_c_stack_direction=-1,
diff --git a/libiberty/configure b/libiberty/configure
index 65fc5002002..c871cc559ca 100755
--- a/libiberty/configure
+++ b/libiberty/configure
@@ -6798,7 +6798,7 @@ find_stack_direction ()
 }
 main ()
 {
-  exit (find_stack_direction() < 0);
+  return find_stack_direction() < 0;
 }
 _ACEOF
 if ac_fn_c_try_run "$LINENO"; then :
@@ -7621,6 +7621,7 @@ else
 
 /* Test by Jim Wilson and Kaveh Ghazi.
Check whether strncmp reads past the end of its string parameters. */
+#include 
 #include 
 
 #ifdef HAVE_FCNTL_H
@@ -7648,7 +7649,8 @@ else
 
 #define MAP_LEN 0x1
 
-main ()
+int
+main (void)
 {
 #if defined(HAVE_MMAP) || defined(HAVE_MMAP_ANYWHERE)
   char *p;
@@ -7656,7 +7658,7 @@ main ()
 
   dev_zero = open ("/dev/zero", O_RDONLY);
   if (dev_zero < 0)
-exit (1);
+return 1;
 
   p = (char *) mmap (0, MAP_LEN, PROT_READ|PROT_WRITE,
 MAP_ANON|MAP_PRIVATE, dev_zero, 0);
@@ -7664,7 +7666,7 @@ main ()
 p = (char *) mmap (0, MAP_LEN, PROT_READ|PROT_WRITE,
   MAP_ANON|MAP_PRIVATE, -1, 0);
   if (p == (char *)-1)
-exit (2);
+return 2;
   else
 {
   char *string = "__si_type_info";
@@ -7676,7 +7678,7 @@ main ()
   strncmp (r, q, 14);
 }
 #endif /* HAVE_MMAP || HAVE_MMAP_ANYWHERE */
-  exit (0);
+  return 0;
 }
 
 _ACEOF

base-commit: 54b316ff0d4f3bd823ad0b4d0011900948c5d40e

[PATCH] libsanitizer: Avoid implicit function declaration in configure test

2022-10-18 Thread Florian Weimer via Gcc-patches

libsanitizer/

* configure.ac (check for necessary platform features):
Include  for syscall prototype.
* configure: Regenerate.

---
 libsanitizer/configure| 5 +++--
 libsanitizer/configure.ac | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/libsanitizer/configure b/libsanitizer/configure
index 3a0c47513e7..cb99faf6100 100755
--- a/libsanitizer/configure
+++ b/libsanitizer/configure
@@ -12383,7 +12383,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12386 "configure"
+#line 12398 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12489,7 +12489,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12492 "configure"
+#line 12504 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -16072,6 +16072,7 @@ case "$target" in
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include 
+#include 
 int
 main ()
 {
diff --git a/libsanitizer/configure.ac b/libsanitizer/configure.ac
index 7f1ef3979c4..ad49f29db7e 100644
--- a/libsanitizer/configure.ac
+++ b/libsanitizer/configure.ac
@@ -161,7 +161,8 @@ case "$target" in
   *-*-linux*)
 # Some old Linux distributions miss required syscalls.
 sanitizer_supported=no
-AC_TRY_COMPILE([#include ],[
+AC_TRY_COMPILE([#include 
+#include ],[
   syscall (__NR_gettid);
   syscall (__NR_futex);
   syscall (__NR_exit_group);

base-commit: acdb24166d13d87c374e578d2ad5d58249171930

Re: [PATCH] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-10-17 Thread Florian Weimer via Gcc-patches

* Richard Biener:

> On Mon, Oct 17, 2022 at 3:01 PM Florian Weimer via Gcc-patches
>  wrote:
>>
>> "zR" and "zPLR" are the most common augmentations.  Use a simple
>> SIMD-with-in-a-register technique to check for both augmentations,
>> and that the following variable-length integers have length 1, to
>> get more quickly at the encoding field.
>>
>> libgcc/
>>
>> * unwind-dw2-fde.c (get_cie_encoding_slow): Rename from
>> get_cie_encoding.  Mark as noinline.
>> (get_cie_encoding): Add fast path for "zR" and "zPLR"
>> augmentations.  Call get_cie_encoding_slow as a fall-back.
>>
>> ---
>>  libgcc/unwind-dw2-fde.c | 61 
>> +++--
>>  1 file changed, 59 insertions(+), 2 deletions(-)
>>
>> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
>> index 3c0cc654ec0..4e3a54c5a1a 100644
>> --- a/libgcc/unwind-dw2-fde.c
>> +++ b/libgcc/unwind-dw2-fde.c
>> @@ -333,8 +333,10 @@ base_from_object (unsigned char encoding, const struct 
>> object *ob)
>>  /* Return the FDE pointer encoding from the CIE.  */
>>  /* ??? This is a subset of extract_cie_info from unwind-dw2.c.  */
>>
>> -static int
>> -get_cie_encoding (const struct dwarf_cie *cie)
>> +/* Disable inlining because the function is only used as a slow path in
>> +   get_cie_encoding below.  */
>> +static int __attribute__ ((noinline))
>> +get_cie_encoding_slow (const struct dwarf_cie *cie)
>>  {
>>const unsigned char *aug, *p;
>>_Unwind_Ptr dummy;
>> @@ -389,6 +391,61 @@ get_cie_encoding (const struct dwarf_cie *cie)
>>  }
>>  }
>>
>> +static inline int
>> +get_cie_encoding (const struct dwarf_cie *cie)
>> +{
>> +  /* Fast path for some augmentations and single-byte variable-length
>> + integers.  Do this only for targets that align struct dwarf_cie to 8
>> + bytes, which ensures that at least 8 bytes are available starting at
>> + cie->version.  */
>> +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ \
>> +  || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> +  if (__alignof (*cie) == 8 && sizeof (unsigned long long) == 8)
>> +{
>> +  unsigned long long value = *(const unsigned long long *) 
>> >version;
>
> TBAA?  Maybe use
>
>unsigned long long value;
>memcpy (, >version, 8);
>
> instead?

It's following the existing libgcc style, see
read_encoded_value_with_base in unwind-pe.h.

We know here that the pointer is aligned, but perhaps GCC still can use
that information if using memcpy.

I can certainly change it to memcpy.

Thanks,
Florian

[PATCH] libgcc: Mostly vectorize CIE encoding extraction for FDEs

2022-10-17 Thread Florian Weimer via Gcc-patches

"zR" and "zPLR" are the most common augmentations.  Use a simple
SIMD-with-in-a-register technique to check for both augmentations,
and that the following variable-length integers have length 1, to
get more quickly at the encoding field.

libgcc/

* unwind-dw2-fde.c (get_cie_encoding_slow): Rename from
get_cie_encoding.  Mark as noinline.
(get_cie_encoding): Add fast path for "zR" and "zPLR"
augmentations.  Call get_cie_encoding_slow as a fall-back.

---
 libgcc/unwind-dw2-fde.c | 61 +++--
 1 file changed, 59 insertions(+), 2 deletions(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 3c0cc654ec0..4e3a54c5a1a 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -333,8 +333,10 @@ base_from_object (unsigned char encoding, const struct 
object *ob)
 /* Return the FDE pointer encoding from the CIE.  */
 /* ??? This is a subset of extract_cie_info from unwind-dw2.c.  */
 
-static int
-get_cie_encoding (const struct dwarf_cie *cie)
+/* Disable inlining because the function is only used as a slow path in
+   get_cie_encoding below.  */
+static int __attribute__ ((noinline))
+get_cie_encoding_slow (const struct dwarf_cie *cie)
 {
   const unsigned char *aug, *p;
   _Unwind_Ptr dummy;
@@ -389,6 +391,61 @@ get_cie_encoding (const struct dwarf_cie *cie)
 }
 }
 
+static inline int
+get_cie_encoding (const struct dwarf_cie *cie)
+{
+  /* Fast path for some augmentations and single-byte variable-length
+ integers.  Do this only for targets that align struct dwarf_cie to 8
+ bytes, which ensures that at least 8 bytes are available starting at
+ cie->version.  */
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ \
+  || __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  if (__alignof (*cie) == 8 && sizeof (unsigned long long) == 8)
+{
+  unsigned long long value = *(const unsigned long long *) >version;
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define C(x) __builtin_bswap64 (x)
+#else
+#define C(x) x
+#endif
+
+  /* Fast path for "zR".  Check for version 1, the "zR" string and that
+the sleb128/uleb128 values are single bytes.  In the comments
+below, '1', 'c', 'd', 'r', 'l' are version, code alignment, data
+alignment, return address column, augmentation length.  Note that
+with CIE version 1, the return address column is byte-encoded.  */
+  unsigned long long expected =
+   /*   1 z R 0 c d r l.  */
+   C (0x017a5200ULL);
+  unsigned long long mask =
+   /*   1 z R 0 c d r l.  */
+   C (0x80800080ULL);
+
+  if ((value & mask) == expected)
+   return cie->augmentation[7];
+
+  /* Fast path for "zPLR".  */
+  expected =
+   /*   1 z P L R 0 c d.  */
+   C (0x017a504c5200ULL);
+  mask =
+   /*   1 z P L R 0 c d.  */
+   C (0x8080ULL);
+#undef C
+
+  /* Validate the augmentation length, and return the enconding after
+it.  No check for the return address column because it is
+byte-encoded with CIE version 1.  */
+  if (__builtin_expect ((value & mask) == expected
+   && (cie->augmentation[8] & 0x80) == 0, 1))
+ return cie->augmentation[9];
+}
+#endif
+
+  return get_cie_encoding_slow (cie);
+}
+
 static inline int
 get_fde_encoding (const struct dwarf_fde *f)
 {

base-commit: de84a1e4b107b803ec3b064c3771a6ed8c0e201e

[PATCH] libgcc: Special-case BFD ld unwind table encodings in find_fde_tail

2022-10-17 Thread Florian Weimer via Gcc-patches

BFD ld (and the other linkers) only produce one encoding of these
values.  It is not necessary to use the general
read_encoded_value_with_base decoding routine.  This avoids the
data-dependent branches in its implementation.

libgcc/

* unwind-dw2-fde-dip.c (find_fde_tail): Special-case encoding
values actually used by BFD ld.

---
 libgcc/unwind-dw2-fde-dip.c | 58 +
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 7f9be5e6b02..f370c1279ae 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -396,10 +396,21 @@ find_fde_tail (_Unwind_Ptr pc,
   if (hdr->version != 1)
 return NULL;
 
-  p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
-   base_from_cb_data (hdr->eh_frame_ptr_enc,
-  dbase),
-   p, _frame);
+  if (__builtin_expect (hdr->eh_frame_ptr_enc == (DW_EH_PE_sdata4
+ | DW_EH_PE_pcrel), 1))
+{
+  /* Specialized version of read_encoded_value_with_base, based on what
+BFD ld generates.  */
+  signed value __attribute__ ((mode (SI)));
+  memcpy (, p, sizeof (value));
+  p += sizeof (value);
+  dbase = value;   /* No adjustment because pcrel has base 0.  */
+}
+  else
+p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
+ base_from_cb_data (hdr->eh_frame_ptr_enc,
+dbase),
+ p, _frame);
 
   /* We require here specific table encoding to speed things up.
  Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start
@@ -409,10 +420,20 @@ find_fde_tail (_Unwind_Ptr pc,
 {
   _Unwind_Ptr fde_count;
 
-  p = read_encoded_value_with_base (hdr->fde_count_enc,
-   base_from_cb_data (hdr->fde_count_enc,
-  dbase),
-   p, _count);
+  if (__builtin_expect (hdr->fde_count_enc == DW_EH_PE_udata4, 1))
+   {
+ /* Specialized version of read_encoded_value_with_base, based on
+what BFD ld generates.  */
+ unsigned value __attribute__ ((mode (SI)));
+ memcpy (, p, sizeof (value));
+ p += sizeof (value);
+ fde_count = value;
+   }
+  else
+   p = read_encoded_value_with_base (hdr->fde_count_enc,
+ base_from_cb_data (hdr->fde_count_enc,
+dbase),
+ p, _count);
   /* Shouldn't happen.  */
   if (fde_count == 0)
return NULL;
@@ -454,8 +475,25 @@ find_fde_tail (_Unwind_Ptr pc,
  f = (fde *) (table[mid].fde + data_base);
  f_enc = get_fde_encoding (f);
  f_enc_size = size_of_encoded_value (f_enc);
- read_encoded_value_with_base (f_enc & 0x0f, 0,
-   >pc_begin[f_enc_size], );
+
+ /* BFD ld uses DW_EH_PE_sdata4 | DW_EH_PE_pcrel on non-FDPIC targets,
+so optimize for that.
+
+This optimization is not valid for FDPIC targets.  f_enc & 0x0f as
+passed to read_encoded_value_with_base masks away the base flags,
+but they are implicit for FDPIC.  */
+#ifndef __FDPIC__
+ if (__builtin_expect (f_enc == (DW_EH_PE_sdata4 | DW_EH_PE_pcrel),
+   1))
+   {
+ signed value __attribute__ ((mode (SI)));
+ memcpy (, >pc_begin[f_enc_size], sizeof (value));
+ range = value;
+   }
+ else
+#endif
+   read_encoded_value_with_base (f_enc & 0x0f, 0,
+ >pc_begin[f_enc_size], );
  _Unwind_Ptr func = table[mid].initial_loc + data_base;
  if (pc < table[mid].initial_loc + data_base + range)
{

base-commit: de7d6310862c6045cf2dfb0ef209ff0e0923e648

[PATCH] libgcc: Move cfa_how into potential padding in struct frame_state_reg_info

2022-10-14 Thread Florian Weimer via Gcc-patches

On many architectures, there is a padding gap after the how array
member, and cfa_how can be moved there.  This reduces the size of the
struct and the amount of memory that uw_frame_state_for has to clear.

There is no measurable performance benefit from this on x86-64 (even
though the memset goes from 120 to 112 bytes), but it seems to be a
good idea to do anyway.

libgcc/

* unwind-dw2.h (struct frame_state_reg_info): Move cfa_how member
and reduce its size.

---
 libgcc/unwind-dw2.h | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/libgcc/unwind-dw2.h b/libgcc/unwind-dw2.h
index af34e000f07..a0834b18277 100644
--- a/libgcc/unwind-dw2.h
+++ b/libgcc/unwind-dw2.h
@@ -50,6 +50,12 @@ typedef struct
 } reg[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
 unsigned char how[__LIBGCC_DWARF_FRAME_REGISTERS__+1];
 
+enum {
+  CFA_UNSET,
+  CFA_REG_OFFSET,
+  CFA_EXP
+} cfa_how : 8;
+
 /* Used to implement DW_CFA_remember_state.  */
 struct frame_state_reg_info *prev;
 
@@ -58,11 +64,6 @@ typedef struct
 _Unwind_Sword cfa_offset;
 _Unwind_Word cfa_reg;
 const unsigned char *cfa_exp;
-enum {
-  CFA_UNSET,
-  CFA_REG_OFFSET,
-  CFA_EXP
-} cfa_how;
   } regs;
 
   /* The PC described by the current frame state.  */

base-commit: 3dfeda095bd43c011fdc3834b9cec39bb9a73a1f

Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Mon, Sep 19, 2022 at 11:25:13AM +0200, Florian Weimer wrote:
>> * Jakub Jelinek:
>> 
>> > The disadvantage of the patch is that touching reg[x].loc and how[x]
>> > now means 2 cachelines rather than one as before, and I admit beyond
>> > bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
>> > you retry whatever you measured to get at the 40% of time spent on the
>> > stack clearing to see how the numbers change?
>> 
>> A benchmark that unwinds through 100 frames containing a std::string
>> variable goes from (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84):
>> 
>> min: 24418 ns
>> 25%: 24740 ns
>> 50%: 24790 ns
>> 75%: 24840 ns
>> 95%: 24937 ns
>> 99%: 26174 ns
>> max: 42530 ns
>> avg:   24826.1 ns
>> 
>> to (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84 with this patch):
>> 
>> min: 22307 ns
>> 25%: 22640 ns
>> 50%: 22713 ns
>> 75%: 22787 ns
>> 95%: 22948 ns
>> 99%: 24839 ns
>> max: 52658 ns
>> avg:   22863.4 ns
>> 
>> So 227 ns per frame instead of 248 ns per frame, or ~9% less.
>
> Thanks for doing that.

So it turns out my test program had 100 frames, but not with
std::string.  With std::string objects, the numbers are:

Before:

min: 71236 ns
25%: 71637 ns
50%: 71724 ns
75%: 71857 ns
95%: 73148 ns
99%: 74023 ns
max:120735 ns
avg:   71973.1 ns

After:

min: 69547 ns
25%: 69961 ns
50%: 70034 ns
75%: 70112 ns
95%: 71273 ns
99%: 71511 ns
max: 82691 ns
avg:   70121.3 ns

So slightly less improvement per frame, but it's still there.

>> Moving cfa_how after how in struct frame_state_reg_info as an 8-bit
>> bitfield should avoid zeroing another 8 bytes.  This shaves off another
>> 3 ns per frame in my testing (on a Core i9-10900T, so with ERMS).
>
> Good idea.  Won't help always, on some targets how could have size divisible
> by pointer alignment, but when it is at the end it always increases the
> size by alignment of pointer, while after how array it only does so if
> how is multiple of pointer alignment.

Okay, I'll send a separate patch once yours is in, along with some other
simple changes.

Thanks,
Florian

Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> The disadvantage of the patch is that touching reg[x].loc and how[x]
> now means 2 cachelines rather than one as before, and I admit beyond
> bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
> you retry whatever you measured to get at the 40% of time spent on the
> stack clearing to see how the numbers change?

A benchmark that unwinds through 100 frames containing a std::string
variable goes from (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84):

min: 24418 ns
25%: 24740 ns
50%: 24790 ns
75%: 24840 ns
95%: 24937 ns
99%: 26174 ns
max: 42530 ns
avg:   24826.1 ns

to (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84 with this patch):

min: 22307 ns
25%: 22640 ns
50%: 22713 ns
75%: 22787 ns
95%: 22948 ns
99%: 24839 ns
max: 52658 ns
avg:   22863.4 ns

So 227 ns per frame instead of 248 ns per frame, or ~9% less.

Moving cfa_how after how in struct frame_state_reg_info as an 8-bit
bitfield should avoid zeroing another 8 bytes.  This shaves off another
3 ns per frame in my testing (on a Core i9-10900T, so with ERMS).

The REP STOS still dominates uw_frame_state_for execution time, but this
seems to be a profiling artifact.  Replacing it with PXOR and seven
MOVUPS instructions makes the hotspot go away, but performance does not
improve.  Odd.

Thanks,
Florian

Re: [PATCH v3] Many pages: Document fixed-width types with ISO C naming

2022-08-25 Thread Florian Weimer via Gcc-patches

* Greg Kroah-Hartman:

> On Thu, Aug 25, 2022 at 01:36:10AM +0200, Alejandro Colomar wrote:
>> But from your side what do we have?  Just direct NAKs without much
>> explanation.  The only one who gave some explanation was Greg, and he
>> vaguely pointed to Linus's comments about it in the past, with no precise
>> pointer to it.  I investigated a lot before v2, and could not find anything
>> strong enough to recommend using kernel types in user space, so I pushed v2,
>> and the discussion was kept.
>
> So despite me saying that "this is not ok", and many other maintainers
> saying "this is not ok", you applied a patch with our objections on it?
> That is very odd and a bit rude.

The justifications brought forward are just regurgitating previous
misinformation.  If you do that, it's hard to take you seriously.

There is actually a good reason for using __u64: it's always based on
long long, so the format strings are no longer architecture-specific,
and those ugly macro hacks are not needed to achieve portability.  But
that's really the only reason I'm aware of.  Admittedly, it's a pretty
good reason.

>> I would like that if you still oppose to the patch, at least were able to
>> provide some facts to this discussion.
>
> The fact is that the kernel can not use the namespace that userspace has
> with ISO C names.  It's that simple as the ISO standard does NOT
> describe the variable types for an ABI that can cross the user/kernel
> boundry.

You cannot avoid using certain ISO C names with current GCC or Clang,
however hard you try.  But currently, the kernel does not try at all,
not really: it is not using -ffreestanding and -fno-builtin, at least
not consistently.  This means that if the compiler sees a known function
(with the right name and a compatible prototype), it will optimize based
on that.  What kind of headers you use does not matter.

, ,  are compiler-provided headers that
are designed to be safe to use for bare-metal contexts (like in
kernels).  Avoiding them is not necessary per se.  However, 
is not particularly useful if you want to use your own printf-style
functions with the usual format specifiers (see above for __u64).  But
on its own, it's perfectly safe to use.  You have problems with
 *because* you use well-known, standard facilities in kernel
space (the printf format specifiers), not because you avoid them.  So
exactly the opposite of what you say.

> But until then, we have to stick to our variable name types,
> just like all other operating systems have to (we are not alone here.)

FreeBSD uses  and the  formatting macros in kernel
space.  I don't think that's unusual at all for current kernels.  It's
particularly safe for FreeBSD because they use a monorepo and toolchain
variance among developers is greatly reduced.  Linux would need to
provide its own  equivalent for the formatting macros
(as it's not a compiler header; FreeBSD has ).

At this point and with the current ABIs we have for Linux, it makes
equal (maybe more) sense to avoid the  types altogether and
use Linux-specific typedefs with have architecture-independent format
strings.

Thanks,
Florian

Re: [PATCH] Add optional __Bfloat16 support

2022-06-10 Thread Florian Weimer via Gcc-patches

* liuhongt via Libc-alpha:

> +\subsubsection{Special Types}
> +
> +The \code{__Bfloat16} type uses a 8-bit exponent and 7-bit mantissa.
> +It is used for \code{BF16} related intrinsics, it cannot be
> +used with standard C operators.

I think it's not necessary to specify whether the type supports certain
C operators (surely assignment will work?).  If they are added later,
the ABI won't need changing.

Thanks,
Florian

Re: [PATCH v2] x86: Document -mcet-switch

2022-05-19 Thread Florian Weimer via Gcc-patches

* H. J. Lu:

> How about this?
>
> @item -mcet-switch
> @opindex mcet-switch
> By default, CET instrumentation is turned off on switch statements that
> use a jump table and indirect branch track is disabled.

Maybe add here: “Since jump tables are stored in read-only memory, this
does not result in a direct loss of hardening.  But if the jump table
index is attacker-controlled, the indirect jump may not be constrained
by CET.”

> This option turns on CET instrumentation to enable indirect branch
> track for switch statements with jump tables.

“This results in a loss of hardening because the jump targets are mow
reachable via all indirect jumps.”

Maybe GCC should just emit a forced (unoptimized) bounds check for jump
tables in CET mode …

Thanks,
Florian

Re: Supporting RISC-V Vendor Extensions in the GNU Toolchain

2022-05-13 Thread Florian Weimer via Gcc-patches

* Christoph Müllner via Binutils:

> I'd like to add two points to this topic and raise two questions.
>
> 1) Accepting vendor extensions = avoidance of fragmentation
>
> RISC-V implementors are actively encouraged to implement their
> own ISA extensions. To avoid fragmentation in the SW ecosystem
> (every vendor maintains a fork of tools, distros and binaries) there
> needs to be a principle acceptance to get vendor extension support
> upstream.

If you eventually want portable binaries, it's necessary to converge on
a small set of widely implemented extensions.  x86 didn't have this, and
adoption was poor outside specialized libraries (and JIT, of course).
Yet everything was as upstream as possible (ISA manuals, assemblers,
compiler intrinsics, even automated adoption by optimizers).  So
upstreaming is only the first step.

Not every useful CPU feature can be adopted through run-time dispatching
(IFUNCs, function multi-versionining).

Thanks,
Florian

Re: [PATCH] x86: Document -mno-cet-switch

2022-05-11 Thread Florian Weimer via Gcc-patches

* H. J. Lu:

> On Wed, May 11, 2022 at 11:45 AM Florian Weimer  wrote:
>>
>> * H. J. Lu:
>>
>> >> NOTRACK avoids the need for ENDBR instructions, right?  That's a
>> >> hardening improvement, so it should be used by default.
>> >
>> > NOTRACK weakens IBT since it disables IBT on the indirect jump instruction.
>> > GCC uses it in the jump table to avoid ENDBR.
>>
>> Typical jump table code looks like this:
>>
>> Dump of assembler code for function __cache_sysconf:
>>0x000f7a80 <+0>: endbr64
>>0x000f7a84 <+4>: sub$0xb9,%edi
>>0x000f7a8a <+10>:cmp$0xc,%edi
>>0x000f7a8d <+13>:ja 0xf7b70 <__cache_sysconf+240>
>>0x000f7a93 <+19>:lea0xba926(%rip),%rdx# 0x1b23c0
>>0x000f7a9a <+26>:movslq (%rdx,%rdi,4),%rax
>>0x000f7a9e <+30>:add%rdx,%rax
>>0x000f7aa1 <+33>:notrack jmp *%rax
>>
>> There's no ENDBR instruction between range check, the address
>> computation, and the NOTRACK JMP, so it's not possible to redirect that
>> JMP to some other place.
>
> That is the assumption we made.   RAX will always point to the valid
> address.

Which means that NOTRACK should not weaken anything here.  What am I
missing?

Thanks,
Florian

Re: [PATCH] x86: Document -mno-cet-switch

2022-05-11 Thread Florian Weimer via Gcc-patches

* H. J. Lu:

>> NOTRACK avoids the need for ENDBR instructions, right?  That's a
>> hardening improvement, so it should be used by default.
>
> NOTRACK weakens IBT since it disables IBT on the indirect jump instruction.
> GCC uses it in the jump table to avoid ENDBR.

Typical jump table code looks like this:

Dump of assembler code for function __cache_sysconf:
   0x000f7a80 <+0>: endbr64 
   0x000f7a84 <+4>: sub$0xb9,%edi
   0x000f7a8a <+10>:cmp$0xc,%edi
   0x000f7a8d <+13>:ja 0xf7b70 <__cache_sysconf+240>
   0x000f7a93 <+19>:lea0xba926(%rip),%rdx# 0x1b23c0
   0x000f7a9a <+26>:movslq (%rdx,%rdi,4),%rax
   0x000f7a9e <+30>:add%rdx,%rax
   0x000f7aa1 <+33>:notrack jmp *%rax

There's no ENDBR instruction between range check, the address
computation, and the NOTRACK JMP, so it's not possible to redirect that
JMP to some other place.

I don't know if GCC systematically enforces this in its optimizers,
though.

Thanks,
Florian

Re: [PATCH] x86: Document -mno-cet-switch

2022-05-11 Thread Florian Weimer via Gcc-patches

* H. J. Lu:

>> >> > Generate jump tables with ENDBR and skip the NOTRACK prefix for indirect
>> >> > jump.  Document -mno-cet-switch to turn off CET instrumentation on jump
>> >> > tables for switch statements.
>> >>
>> >> Of course, that is a slight regression in security hardening.
>> >>
>> >> Quite frankly, I'm puzzled why the kernel decided to require these
>> >> additional ENDBR instructions.
>> >
>> > Kernel is using -mcet-switch today.   Should we document -mcet-switch
>> > and keep it off by default instead?
>>
>> Sorry, I'm not 100% certain of the mechanics/options involved.
>>
>> I think the default should reflect userspace requirements, like with the
>> red zone and vector register usage for integer code.
>
> The question is if the compiler should use NOTRACK by default for
> the jump table.   It is independent of whether NOTRACK is enabled or
> not.

NOTRACK avoids the need for ENDBR instructions, right?  That's a
hardening improvement, so it should be used by default.

Thanks,
Florian

Re: [PATCH] x86: Document -mno-cet-switch

2022-05-11 Thread Florian Weimer via Gcc-patches

* H. J. Lu:

> On Wed, May 11, 2022 at 1:12 AM Florian Weimer  wrote:
>>
>> * H. J. Lu via Gcc-patches:
>>
>> > When -fcf-protection=branch is used, the compiler will generate jump
>> > tables where the indirect jump is prefixed with the NOTRACK prefix, so
>> > it can jump to non-ENDBR targets. Yet, for NOTRACK prefixes to work, the
>> > NOTRACK specific enable bit must be set, what renders the binary broken
>> > on any environment where this is not the case. In fact, having NOTRACK
>> > disabled was a design choice for the Linux kernel CET support.
>>
>> Why isn't that a kernel bug?  It doesn't match what is in the current
>> glibc sources.
>
> User space uses NOTRACK in the jump table in assembly codes.

And is expected to continue to use it?

>> > Generate jump tables with ENDBR and skip the NOTRACK prefix for indirect
>> > jump.  Document -mno-cet-switch to turn off CET instrumentation on jump
>> > tables for switch statements.
>>
>> Of course, that is a slight regression in security hardening.
>>
>> Quite frankly, I'm puzzled why the kernel decided to require these
>> additional ENDBR instructions.
>
> Kernel is using -mcet-switch today.   Should we document -mcet-switch
> and keep it off by default instead?

Sorry, I'm not 100% certain of the mechanics/options involved.

I think the default should reflect userspace requirements, like with the
red zone and vector register usage for integer code.

Thanks,
Florian

Re: [PATCH] x86: Document -mno-cet-switch

2022-05-11 Thread Florian Weimer via Gcc-patches

* H. J. Lu via Gcc-patches:

> When -fcf-protection=branch is used, the compiler will generate jump
> tables where the indirect jump is prefixed with the NOTRACK prefix, so
> it can jump to non-ENDBR targets. Yet, for NOTRACK prefixes to work, the
> NOTRACK specific enable bit must be set, what renders the binary broken
> on any environment where this is not the case. In fact, having NOTRACK
> disabled was a design choice for the Linux kernel CET support.

Why isn't that a kernel bug?  It doesn't match what is in the current
glibc sources.

> Generate jump tables with ENDBR and skip the NOTRACK prefix for indirect
> jump.  Document -mno-cet-switch to turn off CET instrumentation on jump
> tables for switch statements.

Of course, that is a slight regression in security hardening.

Quite frankly, I'm puzzled why the kernel decided to require these
additional ENDBR instructions.

Thanks,
Florian

Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-03 Thread Florian Weimer via Gcc-patches

* Thomas Neumann:

>>> +// Common logic for version locks
>>> +struct version_lock
>>> +{
>>> +  // The lock itself
>>> +  uintptr_t version_lock;
>>> +};
>> version_lock must not overflow, right?  This means we need a wider
>> counter on 32-bit, too.  glibc contains a 62-bit counter that it uses
>> for its own data structure.
>
> an overflow is not a problem per se, it is only problematic if we hit
> exactly the same value again between lock_optimistic and
> validate. Note that these ranges are usually a handful of assembler
> instructions, and we would have to see 4 billion frame registrations
> in that short time span. I don't think that is a problem in
> practice. But I can switch to 64bit, of course, if you think the risk
> is too high.

This is more of a GCC policy question, which I cannot answer.

At least it needs a comment explaining the decision to ignore overflows.

> But we could eliminate the spinlock aspect by using a global mutex,
> which would guarantee us that nothing is locked exclusively and thus
> no spinning is required. That would also allow us to fix the
> async-signal-safety at the same time if needed by blocking signals
> globally during updates.

Again, I don't know if people consider the spinning a problem.  In
glibc, we would use that mutex.

> Note that the current code is not async-signal-safe either, it simply
> grabs a mutex. If a signal happens while calling __register_frame, and 
> that handler tries to unwind, the current code will deadlock, too.

Yes, understood.

>>> +// Validate a previously acquire lock
>>> +static inline bool
>>> +version_lock_validate (const struct version_lock *vl, uintptr_t lock)
>>> +{
>>> +  // Check that the node is still in the same state
>>> +  uintptr_t state = __atomic_load_n (&(vl->version_lock), 
>>> __ATOMIC_SEQ_CST);
>>> +  return (state == lock);
>>> +}
>> I think an acquire fence is missing before the __atomic_load_n.  We
>> learned this the hard way in the glibc implementation.  The reference
>> that Szabolcs found is:
>>   Hans Boehm, Can Seqlocks Get Along with Programming Language
>>   Memory Models?, Section 4.
>
> thanks for the pointer, I will look at this more carefully. When I
> read the text correctly we need a
> __atomic_thread_fence(__ATOMIC_ACQUIRE) before the load, but I will
> double check that.

The equivalent glibc function looks like this:

/* Return true if the read was successful, given the start
   version.  */
static inline bool
_dlfo_read_success (uint64_t start_version)
{
  /* See Hans Boehm, Can Seqlocks Get Along with Programming Language
 Memory Models?, Section 4.  This is necessary so that loads in
 the TM region are not ordered past the version check below.  */
  atomic_thread_fence_acquire ();

  /* Synchronizes with the fences in _dlfo_mappings_begin_update,
 _dlfo_mappings_end_update.  It is important that all stores from
 the last update have become visible, and stores from the next
 update to this version are not before the version number is
 updated.

 Unlike with seqlocks, there is no check for odd versions here
 because we have read the unmodified copy (confirmed to be
 unmodified by the unchanged version).  */
  return _dlfo_read_start_version () == start_version;
}

>>> +// Allocate a node. This node will be returned in locked exclusive state
>>> +static struct btree_node *
>>> +btree_allocate_node (struct btree *t, bool inner)
>>> +{
>> 
>>> +  // No free page available, allocate a new one
>>> +  struct btree_node *new_node
>>> +   = (struct btree_node *) (malloc (sizeof (struct btree_node)));
>>> +  version_lock_initialize_locked_exclusive (
>>> +   &(new_node->version_lock)); // initialize the node in locked state
>>> +  new_node->entry_count = 0;
>>> +  new_node->type = inner ? btree_node_inner : btree_node_leaf;
>>> +  return new_node;
>>> +}
>>> +}
>> This needs some form of error checking for malloc.  But I see the
>> existing code does not have that, either. 8-(
>
> and I do not see how we can really handle a malloc failure here. What
> should we do except die?

We may have to add a new interface.  In some other cases, I've seen
errno being used for error reporting, but that requires changes in
applications to recognize the error.  It's probably better to crash here
than to fail mysteriously later.

Out of curiosity, how many times do you call the registration functions
from your application?  Would a batch registration interface help?  It
could address error reporting as well.

>>> +// Find the corresponding entry the given address
>>> +static struct object *
>>> +btree_lookup (const struct btree *t, uintptr_t target_addr)
>>> +{
>> 
>>> +  if (type == btree_node_inner)
>>> +   {
>>> + // We cannot call find_inner_slot here because we can only trust our
>>> + // validated entries
>>> + unsigned slot = 0;
>>> + while (((slot + 1) < entry_count)
>>> +&&

Re: [PATCH] eliminate mutex in fast path of __register_frame

2022-03-02 Thread Florian Weimer via Gcc-patches

* Thomas Neumann:

> +#ifndef HIDE_EXPORTS
> +#pragma GCC visibility push(default)
> +#endif

All defined functions are static, right?  Then this isn't necessary.

> +// Common logic for version locks
> +struct version_lock
> +{
> +  // The lock itself
> +  uintptr_t version_lock;
> +};

version_lock must not overflow, right?  This means we need a wider
counter on 32-bit, too.  glibc contains a 62-bit counter that it uses
for its own data structure.

> +// Lock the node exclusive, blocking as needed
> +static void
> +version_lock_lock_exclusive (struct version_lock *vl)
> +{
> +  // We should virtually never get contention here, as frame
> +  // changes are rare. Thus we use a simple spinlock

Isn't this problematic if there are multiple compiler threads that race
to register their output with the unwinder?

If updates are rare, is per-node locking really needed?

What can we do to make this async-signal-safe, so that a call into the
unwinder from a signal handler does not spin endlessly if the receiving
thread is currently registering or deregistering a frame?  Is simply
calling sigprocmask around the register and unregister operations
enough?  (These operations don't need to be async-signal-safe, only
lookup should be.)  This can be a future change if we feel confident
that it's possible to rectify this later if needed.

> +// Validate a previously acquire lock
> +static inline bool
> +version_lock_validate (const struct version_lock *vl, uintptr_t lock)
> +{
> +  // Check that the node is still in the same state
> +  uintptr_t state = __atomic_load_n (&(vl->version_lock), __ATOMIC_SEQ_CST);
> +  return (state == lock);
> +}

I think an acquire fence is missing before the __atomic_load_n.  We
learned this the hard way in the glibc implementation.  The reference
that Szabolcs found is:

 Hans Boehm, Can Seqlocks Get Along with Programming Language
 Memory Models?, Section 4.

> +static void
> +btree_release_tree_recursively (struct btree *t, struct btree_node *n);
> +
> +// Destroy a tree and release all nodes. Not used currently, but could be 
> called
> +// at shutdown to destroy the frame lookup
> +static void
> +btree_destroy (struct btree *t)
> +{

The comment seems to be incorrect, it is used?  Otherwise there should
be a compiler warning.

> +// Allocate a node. This node will be returned in locked exclusive state
> +static struct btree_node *
> +btree_allocate_node (struct btree *t, bool inner)
> +{

> +  // No free page available, allocate a new one
> +  struct btree_node *new_node
> + = (struct btree_node *) (malloc (sizeof (struct btree_node)));
> +  version_lock_initialize_locked_exclusive (
> + &(new_node->version_lock)); // initialize the node in locked state
> +  new_node->entry_count = 0;
> +  new_node->type = inner ? btree_node_inner : btree_node_leaf;
> +  return new_node;
> +}
> +}

This needs some form of error checking for malloc.  But I see the
existing code does not have that, either. 8-(

> +// Find the corresponding entry the given address
> +static struct object *
> +btree_lookup (const struct btree *t, uintptr_t target_addr)
> +{

> +  if (type == btree_node_inner)
> + {
> +   // We cannot call find_inner_slot here because we can only trust our
> +   // validated entries
> +   unsigned slot = 0;
> +   while (((slot + 1) < entry_count)
> +  && (iter->content.children[slot].separator < target_addr))
> + ++slot;
> +   struct btree_node *child = iter->content.children[slot].child;
> +   if (!btree_node_validate (iter, lock))
> + goto restart;

I don't understand the comment about not using find_inner_slot.

I think if you use separate free lists for inner nodes and leaf nodes,
you never need to structurally modify the data.  Then you can avoid
validating on every level (with proper relaxed MO everywhere; right now
GCC might turn some of the array copies into memcpy, which may use
weaker-than-relaxed MO on some targets and actually produce
out-of-thin-air values).

> diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
> index 8ee55be5675..56be596713b 100644
> --- a/libgcc/unwind-dw2-fde.c
> +++ b/libgcc/unwind-dw2-fde.c
> @@ -42,15 +42,34 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>   #endif
>   #endif
>   
> +#ifdef ATOMIC_FDE_FAST_PATH
> +#include "unwind-dw2-btree.h"
> +
> +static struct btree registered_frames;
> +
> +static void
> +release_registered_frames (void) __attribute__ ((destructor (110)));
> +static void
> +release_registered_frames (void)
> +{
> +  /* Release the b-tree and all frames. Frame releases that happen later are
> +   * silently ignored */
> +  btree_destroy (_frames);
> +}

Is the comment correct?  Won't a subsequent frame release necessarily
involve a use-after-free bug?  btree_destroy is only safe to call from a
quiesced process.  Deallocating these data structure is mainly relevant
for mtrace and valgrind

[PATCH] libgcc: Fix _Unwind_Find_FDE for missing unwind data with glibc 2.35

2022-01-24 Thread Florian Weimer via Gcc-patches

_dl_find_object returns success even if no unwind information has been
found, and dlfo_eh_frame is NULL.

libgcc/ChangeLog:

PR libgcc/104207
* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Add NULL check.

---
 libgcc/unwind-dw2-fde-dip.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 7de847cb120..3d6f39f5460 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -509,7 +509,7 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
 #ifdef DLFO_STRUCT_HAS_EH_DBASE
   {
 struct dl_find_object dlfo;
-if (_dl_find_object (pc, ) == 0)
+if (_dl_find_object (pc, ) == 0 && dlfo.dlfo_eh_frame != NULL)
   return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
 # if DLFO_STRUCT_HAS_EH_DBASE
(_Unwind_Ptr) dlfo.dlfo_eh_dbase,

Re: [PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]

2021-12-20 Thread Florian Weimer via Gcc-patches

* Fāng-ruì Sòng:

>> I *think* you can get what you need via existing GLIBC_PRIVATE
>> interfaces.  But in order to describe how to caox the data out of glibc,
>> I need to know what you need.
>
> Unfortunate no, not reliably. Currently _dl_get_tls_static_info
> (https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp#L207)
> is used. It has the size information but not the start address, so the
> sanitizer runtime is doing very ugly/error-prone probing of the start
> address by reading the thread pointer and hard coding the `struct
> pthread` size for various glibc versions.
> Every time `struct pthreads` increases in glibc, sanitizer runtime has to 
> adapt.
>
> __libc_get_static_tls_bounds if supported by glibc, can replace
> _dl_get_tls_static_info and the existing glibc specific GetTls hacks,
> and provide a bit more compatibility when glibc struct pthreads
> increases.

We already export the size of struct pthread, though, as a GLIBC_PRIVATE
symbol.

  const unsigned int *psize = dlvsym("_thread_db_sizeof_pthread",
 "GLIBC_PRIVATE");
  if (psize != nullptr) {
val = *psize;
atomic_store_relaxed(_descriptor_size, val);
return val;
  }

in ThreadDescriptorSize() in
compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp should work
with any further glibc changes today.  It definitely beats hard-coding
the architecture-specific struct pthread size.

Using _thread_db_sizeof_pthread does not remove all magic constants from
the code, and there is of course the _dl_get_tls_static_info symbol
usage.

This is a fairly recent change (glibc 2.34, I believe) that was
introduced to improve thread debugging without debuginfo present in the
glibc objects (which is why it's a GLIBC_PRIVATE symbol).  libthread_db
is bundled with glibc, so the _thread_db_* symbols are not gurantueed to
be stable, but there are no immediate plans to change this interface.

Thanks,
Florian

Re: [PATCH] libstdc++: Allow std::condition_variable waits to be cancelled [PR103382]

2021-12-07 Thread Florian Weimer via Gcc-patches

* Jonathan Wakely:

> On Tue, 7 Dec 2021, 21:20 Florian Weimer via Libstdc++, 
> 
> wrote:
>
>  * Jonathan Wakely via Libstdc:
>
>  > If necessary we could keep the terminate-on-cancellation behaviour as
>  > _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE@GLIBCXX_3.4.11
>  > and export the new behaviour as @@GLIBCXX_3.4.30, although this patch
>  > doesn't do that.
>
>  Note that if this fix escapes into the wild and then you have to make
>  the symbol version change, you will break newer applications.  In a few
>  cases in glibc, we proactively added aliases at a different symbol
>  version, but with the same implementation (at first).
>
> To be safe, we probably should preserve the old behaviour for the old
> version of the symbol. If we decide that the new behaviour is always
> preferable, we could change that later by making the old symbol an
> alias for the new. If we don't decide that, we'll be glad we made it a
> separate symbol.

On the other hand, with separate versions, it's possible to reintroduce
the old behavior at a later date, as a bugfix.  It's not strictly
necessary to do that work upfront.  It's just nice to have this option.

> I'll see if I can get it working with two versioned symbols. We don't
> actually do that in libstdc++ currently, we only have a single version
> of every symbol.

Ping me if you want to discuss options. 8->

Out of curiosity—do you support building libstdc++ (the shared object)
with a different compiler than the included GCC?

Thanks,
Florian

Re: [PATCH] libstdc++: Allow std::condition_variable waits to be cancelled [PR103382]

2021-12-07 Thread Florian Weimer via Gcc-patches

* Jonathan Wakely via Libstdc:

> If necessary we could keep the terminate-on-cancellation behaviour as
> _ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE@GLIBCXX_3.4.11
> and export the new behaviour as @@GLIBCXX_3.4.30, although this patch
> doesn't do that.

Note that if this fix escapes into the wild and then you have to make
the symbol version change, you will break newer applications.  In a few
cases in glibc, we proactively added aliases at a different symbol
version, but with the same implementation (at first).

Thanks,
Florian

[PATCH v3] elf: Add _dl_find_object function

2021-12-03 Thread Florian Weimer via Gcc-patches

It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.

---
v3: Introduce _dlfo_lookup, to consolidate the core object-finding
logic, as suggested by Adhemerval.  Added some struct padding
suggested by Jakub.

 NEWS   |   4 +
 bits/dl_find_object.h  |  32 +
 dlfcn/Makefile |   2 +-
 dlfcn/dlfcn.h  |  27 +
 elf/Makefile   |  47 +-
 elf/Versions   |   3 +
 elf/dl-close.c |   4 +
 elf/dl-find_object.c   | 842 +
 elf/dl-find_object.h   | 115 +++
 elf/dl-libc_freeres.c  |   2 +
 elf/dl-open.c  |   5 +
 elf/dl-support.c   |   3 +
 elf/libc-dl_find_object.c  |  26 +
 elf/rtld.c |  11 +
 elf/rtld_static_init.c |   1 +
 elf/tst-dl_find_object-mod1.c  |  10 +
 elf/tst-dl_find_object-mod2.c  |  15 +
 elf/tst-dl_find_object-mod3.c  |  10 +
 elf/tst-dl_find_object-mod4.c  |  10 +
 elf/tst-dl_find_object-mod5.c  |  11 +
 elf/tst-dl_find_object-mod6.c  |  11 +
 elf/tst-dl_find_object-mod7.c  |  10 +
 elf/tst-dl_find_object-mod8.c  |  10 +
 elf/tst-dl_find_object-mod9.c  |  10 +
 elf/tst-dl_find_object-static.c|  22 +
 elf/tst-dl_find_object-threads.c   | 275 +++
 elf/tst-dl_find_object.c   | 240 ++
 include/atomic_wide_counter.h  |  14 +
 include/bits/dl_find_object.h  |   1 +
 include/dlfcn.h|   2 +
 include/link.h |   3 +
 manual/Makefile|   2 +-
 manual/dynlink.texi| 137 
 manual/libdl.texi  |  10 -
 manual/probes.texi |   2 +-
 manual/threads.texi|   2 +-
 sysdeps/arm/bits/dl_find_object.h  |  25 +
 sysdeps/generic/ldsodefs.h |   5 +
 sysdeps/mach/hurd/i386/libc.abilist|   1 +
 sysdeps/nios2/bits/dl_find_object.h|  25 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/microblaze/be/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/microblaze/le/libc.abilist |   1 +
 .../unix/sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../unix/sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../unix/sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../unix/sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist |   1 +
 .../sysv/linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../linux/powerpc/powerpc32/nofpu/libc.abilist |   1 +
 .../sysv/linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../sysv/linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist|

Re: [committed] libstdc++: Optimize ref-count updates in COW std::string

2021-12-01 Thread Florian Weimer via Gcc-patches

* Jonathan Wakely via Libstdc:

> diff --git a/libstdc++-v3/include/bits/cow_string.h 
> b/libstdc++-v3/include/bits/cow_string.h
> index ced395b80b8..4fae1d02981 100644
> --- a/libstdc++-v3/include/bits/cow_string.h
> +++ b/libstdc++-v3/include/bits/cow_string.h
> @@ -105,7 +105,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> *  destroy the empty-string _Rep object.
> *
> *  All but the last paragraph is considered pretty conventional
> -   *  for a C++ string implementation.
> +   *  for a Copy-On-Write C++ string implementation.
>*/
>// 21.3  Template class basic_string
>template
> @@ -207,10 +207,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> // so we need to use an atomic load. However, _M_is_leaked
> // predicate does not change concurrently (i.e. the string is either
> // leaked or not), so a relaxed load is enough.
> -   return __atomic_load_n(>_M_refcount, __ATOMIC_RELAXED) < 0;
> -#else
> -   return this->_M_refcount < 0;
> +   if (!__gnu_cxx::__is_single_threaded())
> + return __atomic_load_n(>_M_refcount, __ATOMIC_RELAXED) < 0;
>  #endif
> +   return this->_M_refcount < 0;
>   }

Relaxed MO loads of word-size values on all current architectures only
have a compiler barrier, so I think the optimization makes things worse?
(I doubt the conditional lack of a compiler barrier leads to
optimization improvements elsewhere.)

Thanks,
Florian

Re: [PATCH] Fix alignment of stack slots for overaligned types [PR103500]

2021-11-30 Thread Florian Weimer via Gcc-patches

* Alex Coplan via Gcc-patches:

> Bootstrapped and regtested on aarch64-linux-gnu, x86_64-linux-gnu, and
> arm-linux-gnueabihf: no regressions.
>
> I'd appreciate any feedback. Is it OK for trunk?

Does this need an ABI warning?

Thanks,
Florian

Re: [PATCH] elf: Add __libc_get_static_tls_bounds [BZ #16291]

2021-11-29 Thread Florian Weimer via Gcc-patches

* Fāng-ruì Sòng:

> PING^3

I think the core issue with this patch is like this:

* I do not want to commit glibc to a public API that disallows future
  changes to the way we allocate static TLS.  While static TLS objects
  cannot move in memory, the extent of the static TLS area (minimum and
  maximum address) is not fixed by ABI necessities.

* Joseph does not want to add a GLIBC_PRIVATE ABI that is exclusively
  used externally.

I have tried repeatly to wrap my head around how the sanitizers use the
static TLS boundary information.  Based on what I can tell, there are
several applications:

1. Thead Sanitizer needs to know application-accessible thread-specific
   memory that is carried via the glibc thread (stack) cache from one
   thread to another one, seemingly without synchronization.  Since the
   synchronization happens internally within glibc, without that extra
   knowledge, Thread Sanitizer would report a false positive.  This
   covers only data to which the application has direct access, internal
   access by glibc does not count because it is not going to be
   instrumented.

2. Address Sanitizer needs TLS boundary information for bounds checking.
   Again this only applies to accesses that can be instrumented, so only
   objects whose addresses leak to application code count.  (Maybe this
   is a fringe use case, though: it seems to apply only to “extern
   __thread int a[];“ and similar declarations, where the declared type
   is not enough.)

3. Leak Sanitizer needs to find all per-thread pointers pointing into
   the malloc heap, within memory not itself allocated by malloc.  This
   includes the DTV, pthread_getspecific metadata, and perhaps also user
   data set by pthread_getspecific, and of course static TLS variables.
   This goes someone beyond what people would usually consider static
   TLS: the DTV and struct pthread are not really part of static TLS as
   such.  And struct pthread has several members that contain malloc'ed
   pointers.

Is this a complete list of uses?

I *think* you can get what you need via existing GLIBC_PRIVATE
interfaces.  But in order to describe how to caox the data out of glibc,
I need to know what you need.

(Cc:ing a few people from a recent GCC thread.)

Thanks,
Florian

Re: [PATCH v2] elf: Add _dl_find_object function

2021-11-26 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Thu, Nov 25, 2021 at 09:35:14PM +0100, Florian Weimer wrote:
>> +struct dl_find_object
>> +{
>> +  unsigned long long int dlfo_flags;/* Currently 0. */
>> +  void *dlfo_map_start; /* Beginning of mapping containing 
>> address.  */
>> +  void *dlfo_map_end;   /* End of mapping.  */
>> +  struct link_map *dlfo_link_map;
>> +  void *dlfo_eh_frame;  /* Exception handling data of the 
>> object.  */
>> +# if DLFO_STRUCT_HAS_EH_DBASE
>> +  void *dlfo_eh_dbase;  /* Base address for DW_EH_PE_datarel.  
>> */
>> +# endif
>> +# if DLFO_STRUCT_HAS_EH_COUNT
>> +  int dlfo_eh_count;/* Number of exception handling 
>> entries.  */
>> +# endif
>> +};
>
> I must say I still don't really like these conditionally included
> fields, if in the future one needs some of them on some other architecture,
> we'd need to add another API or symbol version it etc.

That suggests to me that I should add a few __dwlfo_unused members.  And
if the fields are actually used, a future version would set a flag (in
case a zero value for the field has meaning).  We don't even have to
initialize these members today.

(Although I do not see much need for members like dbase: we are copying
a value that the link editor has computed.  It could have easily written
that to the EH segment, too.)

Thanks,
Florian

Re: [PATCH 4/4] libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE

2021-11-25 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

>> +/* Fallback declaration for old glibc headers.  DL_FIND_EH_FRAME_DBASE is 
>> used
>> +   as a proxy to determine if  declares _dl_find_eh_frame.  */
>> +#if defined __GLIBC__ && !defined DL_FIND_EH_FRAME_DBASE
>> +#if NEED_DBASE_MEMBER
>> +void *_dl_find_eh_frame (void *__pc, void **__dbase) __attribute__ ((weak));
>> +#else
>> +void *_dl_find_eh_frame (void *__pc) __attribute__ ((weak));
>> +#endif
>> +#define USE_DL_FIND_EH_FRAME 1
>> +#define DL_FIND_EH_FRAME_CONDITION (_dl_find_eh_frame != NULL)
>> +#endif
>
> I'd prefer not to do this.  If we find glibc with the support in the
> headers, let's use it, otherwise let's keep using what we were doing before.

I've included a simplified version below, based on the _dl_find_object
patch for glibc.

This is a bit difficult to test, but I ran a full toolchain bootstrap
with GCC + glibc on all glibc-supported architectures (except Hurd and
one m68k variant; they do not presnetly build, see Joseph's testers).

I also tested this by copying the respective GCC-built libgcc_s into a
glibc build tree for run-time testing on i686-linux-gnu and
x86_64-linux-gnu.  There weren't any issues.  There are a buch of
unwinder tests in glibc, giving at least some coverage.

Thanks,
Florian

Subject: libgcc: Use _dl_find_object in _Unwind_Find_FDE

libgcc/ChangeLog:

* unwind-dw2-fde-dip.c (_Unwind_Find_FDE): Call _dl_find_object
if available.

---
 libgcc/unwind-dw2-fde-dip.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index fbb0fbdebb9..b837d8e4904 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -504,6 +504,24 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   if (ret != NULL)
 return ret;
 
+  /* Use DLFO_STRUCT_HAS_EH_DBASE as a proxy for the existence of a glibc-style
+ _dl_find_object function.  */
+#ifdef DLFO_STRUCT_HAS_EH_DBASE
+  {
+struct dl_find_object dlfo;
+if (_dl_find_object (pc, ) == 0)
+  return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
+# if DLFO_STRUCT_HAS_EH_DBASE
+   (_Unwind_Ptr) dlfo.dlfo_eh_dbase,
+# else
+   NULL,
+# endif
+   bases);
+else
+  return NULL;
+}
+#endif /* DLFO_STRUCT_HAS_EH_DBASE */
+
   data.pc = (_Unwind_Ptr) pc;
 #if NEED_DBASE_MEMBER
   data.dbase = NULL;

[PATCH v2] elf: Add _dl_find_object function

2021-11-25 Thread Florian Weimer via Gcc-patches

I have reword the previous patch to make the interface more generally
useful.  Since there are now four words in the core arrays, I did away
with the separate base address array.  (We can bring it back in the
future if necessary.)  I fixed a bug in the handling of proxy map (by
not copying proxy maps during the dlopen update).  The placement of the
function is also different, as explained in the commit message.

The performance seems unchanged.

I haven't included the obvious future performance enhancements in this
patch, and also did not update to Arm's __gnu_Unwind_Find_exidx to use
the new interface.  I think this work can be done in follow-up patches.

Thanks,
Florian

Subject: elf: Add _dl_find_object function

It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.

---
 NEWS   |   4 +
 bits/dl_find_object.h  |  32 +
 dlfcn/Makefile |   2 +-
 dlfcn/dlfcn.h  |  22 +
 elf/Makefile   |  47 +-
 elf/Versions   |   3 +
 elf/dl-close.c |   4 +
 elf/dl-find_object.c   | 841 +
 elf/dl-find_object.h   | 115 +++
 elf/dl-libc_freeres.c  |   2 +
 elf/dl-open.c  |   5 +
 elf/dl-support.c   |   3 +
 elf/libc-dl_find_object.c  |  26 +
 elf/rtld.c |  11 +
 elf/rtld_static_init.c |   1 +
 elf/tst-dl_find_object-mod1.c  |  10 +
 elf/tst-dl_find_object-mod2.c  |  15 +
 elf/tst-dl_find_object-mod3.c  |  10 +
 elf/tst-dl_find_object-mod4.c  |  10 +
 elf/tst-dl_find_object-mod5.c  |  11 +
 elf/tst-dl_find_object-mod6.c  |  11 +
 elf/tst-dl_find_object-mod7.c  |  10 +
 elf/tst-dl_find_object-mod8.c  |  10 +
 elf/tst-dl_find_object-mod9.c  |  10 +
 elf/tst-dl_find_object-static.c|  22 +
 elf/tst-dl_find_object-threads.c   | 275 +++
 elf/tst-dl_find_object.c   | 240 ++
 include/atomic_wide_counter.h  |  14 +
 include/bits/dl_find_object.h  |   1 +
 include/dlfcn.h|   2 +
 include/link.h |   3 +
 manual/Makefile|   2 +-
 manual/dynlink.texi| 137 
 manual/libdl.texi  |  10 -
 manual/probes.texi |   2 +-
 manual/threads.texi|   2 +-
 sysdeps/arm/bits/dl_find_object.h  |  25 +
 sysdeps/generic/ldsodefs.h |   5 +
 sysdeps/mach/hurd/i386/libc.abilist|   1 +
 sysdeps/nios2/bits/dl_find_object.h|  25 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/arc/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist|   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/microblaze/be/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/microblaze/le/libc.abilist |   1 +

Re: [PATCH 3/4] libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup

2021-11-23 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Wed, Nov 03, 2021 at 05:28:48PM +0100, Florian Weimer wrote:
>> @@ -383,12 +376,34 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info 
>> *info, size_t size, void *ptr)
>>  # endif
>>  #endif
>>  
>> -  _Unwind_Ptr dbase = unw_eh_callback_data_dbase (data);
>> +  return 1;
>> +}
>> +
>> +/* Result type of find_fde_tail below.  */
>> +struct find_fde_tail_result
>> +{
>> +  const fde *entry;
>> +  void *func;
>> +};
>> +
>> +/* Find the FDE for the program counter PC, in a previously located
>> +   PT_GNU_EH_FRAME data region.  */
>> +static struct find_fde_tail_result
>> +find_fde_tail (_Unwind_Ptr pc,
>> +   const struct unw_eh_frame_hdr *hdr,
>> +   _Unwind_Ptr dbase)
>
> I think returning a struct like find_fde_tail_result can work nicely
> on certain targets, but on many others the psABI forces such returns through
> stack etc.
> Wouldn't it be better to return const fde * instead of
> struct find_fde_tail_result, pass in struct dwarf_eh_bases *bases
> as another argument to find_fde_tail, just return NULL on the failure
> cases and return some fde pointer and set bases->func on success?

I've refactored it further in the version below.  I introduced the
struct to consolidate the *bases struct update among the success cases,
but I think it's okay to do this inline.  I didn't introduce a separate
function for that.

I tested this in isolation on x86-64 and i386, with no apparent
regressions.

I think the changes are compatible with the fourth patch (which I still
have to rebase on top of that).

Thanks,
Florian

8<--8<
This allows switching to a different implementation for
PT_GNU_EH_FRAME lookup in a subsequent commit.

This moves some of the PT_GNU_EH_FRAME parsing out of the glibc loader
lock that is implied by dl_iterate_phdr.  However, the FDE is already
parsed outside the lock before this change, so this does not introduce
additional crashes in case of a concurrent dlclose.

libunwind/ChangeLog

* unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Add hdr.
Remove func, ret.
(find_fde_tail): New function.  Split from
_Unwind_IteratePhdrCallback. Move the result initialization
from _Unwind_Find_FDE.
(_Unwind_Find_FDE): Updated to call find_fde_tail.

---
 libgcc/unwind-dw2-fde-dip.c | 92 -
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 3f302826d2d..fbb0fbdebb9 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -113,8 +113,7 @@ struct unw_eh_callback_data
 #if NEED_DBASE_MEMBER
   void *dbase;
 #endif
-  void *func;
-  const fde *ret;
+  const struct unw_eh_frame_hdr *hdr;
   int check_cache;
 };
 
@@ -197,10 +196,6 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 #else
   _Unwind_Ptr load_base;
 #endif
-  const unsigned char *p;
-  const struct unw_eh_frame_hdr *hdr;
-  _Unwind_Ptr eh_frame;
-  struct object ob;
   _Unwind_Ptr pc_low = 0, pc_high = 0;
 
   struct ext_dl_phdr_info
@@ -348,10 +343,8 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 return 0;
 
   /* Read .eh_frame_hdr header.  */
-  hdr = (const struct unw_eh_frame_hdr *)
+  data->hdr = (const struct unw_eh_frame_hdr *)
 __RELOC_POINTER (p_eh_frame_hdr->p_vaddr, load_base);
-  if (hdr->version != 1)
-return 1;
 
 #ifdef CRT_GET_RFIB_DATA
 # if defined __i386__ || defined __nios2__
@@ -383,12 +376,30 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 # endif
 #endif
 
-  _Unwind_Ptr dbase = unw_eh_callback_data_dbase (data);
+  return 1;
+}
+
+/* Find the FDE for the program counter PC, in a previously located
+   PT_GNU_EH_FRAME data region.  *BASES is updated if an FDE to return is
+   found.  */
+
+static const fde *
+find_fde_tail (_Unwind_Ptr pc,
+  const struct unw_eh_frame_hdr *hdr,
+  _Unwind_Ptr dbase,
+  struct dwarf_eh_bases *bases)
+{
+  const unsigned char *p = (const unsigned char *) (hdr + 1);
+  _Unwind_Ptr eh_frame;
+  struct object ob;
+
+  if (hdr->version != 1)
+return NULL;
+
   p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
base_from_cb_data (hdr->eh_frame_ptr_enc,
   dbase),
-   (const unsigned char *) (hdr + 1),
-   _frame);
+   p, _frame);
 
   /* We require here specific table encoding to speed things up.
  Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start
@@ -404,7 +415,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
p, _count);
   /* Shouldn't happen.  */
   if

Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function

2021-11-18 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> dl_iterate_phdr is declared in link.h and without the _ prefix, shouldn't
> dl_find_eh_frame follow the suit and be declared in the same header and
> also without the prefix?

We need to use the _ prefix due to this bug:

  dl_iterate_phdr namespace violation 
  

Not sure about moving to .  The interface is a bit like dladdr,
and that lives in .

> Also, shouldn't the DL_FIND_EH_FRAME_DBASE macro on the other side have
> __ prefix?  We have one DL_* macro, DL_CALL_FCT, so perhaps it is fine
> for -D_GNU_SOURCE, but various other projects do use macros with DL_*
> prefix, like boost or python.

 is not covered by standards, and we rarely use _GNU_SOURCE
conditionals in such headers.  The _ prefix is only needed because of
external linkage.

Thanks,
Florian

Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function

2021-11-18 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Wed, Nov 03, 2021 at 05:28:02PM +0100, Florian Weimer wrote:
>> This function is similar to __gnu_Unwind_Find_exidx as used on arm.
>> It can be used to speed up the libgcc unwinder.
>
> I'm little bit worried that this trades the speed of exceptions for
> speed of dlopen/dlclose and extra memory use in each process.
> I admit I haven't been paying close attention to how many shared libraries
> apps typically link against and how many dlopen/dlclose calls they do
> in the last decade and half, but I'd think more applications don't use
> exceptions compared to apps that do use them, and of many of those that do
> use them don't use them for really exceptional cases, so speeding those
> is a good thing.

dlopen has many sources of quadratic behavior already, and many involve
chasing pointers.  The new data structure is very compact, so the new
work during dlopen does not show up prominently in profiles.

> So, I'd wonder, could this overhead be added lazily, when _dl_find_eh_frame
> is called for the first time just take the rtld lock, prepare anything you
> populate right now already from the process start up and every
> dlopen/dlclose before the first _dl_find_eh_frame call and only since then
> keep it updated on dlopen/dlclose?

I think it's possible to do this lazily (except the memory allocation).
But I don't want to do this unless we have performance numbers that
suggest it is actually required.

> Thus, for the expected majority of apps that aren't using exceptions at all
> nothing would change for dlopen/dlclose overhead, while all but the first
> _dl_find_eh_frame would be faster and with no locking?

One thing I'd like to do is to use the data structure in
_dl_find_dso_for_object, and that is actually called during dlopen to
determine the caller DSO.  _dl_find_dso_for_object can show up in
profiles with a lot of dlopen calls, particularly if an object loaded
later calls dlopen, so that the current implementation takes more time
to find the object.  _dl_find_dso_for_object is also used in dlsym,
although we skip it if the caller passes an explicit handle (but
RTLD_DEFAULT, RTLD_NEXT, etc. definitely need it).

We can also replace the soname and file identity lookup with a hash
table.  *That* will definitely recover any losses from
_dl_find_eh_frame_update.  In my profiles strcmp always shows up higher
than _dl_find_eh_frame_update.

Thanks,
Florian

Re: [PATCH 1/3] nptl: Extract from pthread_cond_common.c

2021-11-18 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Wed, Nov 03, 2021 at 05:27:42PM +0100, Florian Weimer wrote:
>> +  /* S3.  See __condvar_load_64_relaxed.  */
>
> Shouldn't that be See __atomic_wide_counter_load_relaxed ?

I'm going to fix the stale reference, thanks.

>> +  if (((l >> 31) > 0) && ((h >> 31) > 0))
>
> Any reason not to write it as (int) l < 0 && (int) h < 0?

This requires a GCC extension (which active by default, even without
-fwrapv).  Obviously we relay on this extension in many other place, but
still …

Thanks,
Florian

Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function

2021-11-17 Thread Florian Weimer via Gcc-patches

* Adhemerval Zanella via Libc-alpha:

> However the code is somewhat complex and I would like to have some feedback
> if gcc will be willing to accept this change (I assume it would require
> this code merge on glibc beforehand).

There's a long review queue on the GCC side due to the stage1 close.
It may still be considered for GCC 12.  Jakub has also requested that
we hold off committing the glibc side until the GCC side is reviewed.

I'll flesh out the commit message and NEWS entry once we have agreed
upon the interface.

>> new file mode 100644
>> index 00..c7313c122d
>> --- /dev/null
>> +++ b/elf/dl-find_eh_frame.c

>> +/* Data for the main executable.  There is usually a large gap between
>> +   the main executable and initially loaded shared objects.  Record
>> +   the main executable separately, to increase the chance that the
>> +   range for the non-closeable mappings below covers only the shared
>> +   objects (and not also the gap between main executable and shared
>> +   objects).  */
>> +static uintptr_t _dl_eh_main_map_start attribute_relro;
>> +static struct dl_eh_frame_info _dl_eh_main_info attribute_relro;
>> +
>> +/* Data for initally loaded shared objects that cannot be unlaoded.
>
> s/initally/initially and s/unlaoded/unloaded.

Fixed.

>
>> +   The mapping base addresses are stored in address order in the
>> +   _dl_eh_nodelete_mappings_bases array (containing
>> +   _dl_eh_nodelete_mappings_size elements).  The EH data for a base
>> +   address is stored in the parallel _dl_eh_nodelete_mappings_infos.
>> +   These arrays are not modified after initialization.  */
>> +static uintptr_t _dl_eh_nodelete_mappings_end attribute_relro;
>> +static size_t _dl_eh_nodelete_mappings_size attribute_relro;
>> +static uintptr_t *_dl_eh_nodelete_mappings_bases attribute_relro;
>> +static struct dl_eh_frame_info *_dl_eh_nodelete_mappings_infos
>> +  attribute_relro;
>> +
>> +/* Mappings created by dlopen can go away with dlclose, so a data
>> +   dynamic data structure with some synchronization is needed.
>
> This sounds strange ("a data dynamic data").

I dropped the first data.

>
>> +   Individual segments are similar to the _dl_eh_nodelete_mappings
>
> Maybe use _dl_eh_nodelete_mappings_*, because '_dl_eh_nodelete_mappings'
> itself if not defined anywhere.

Right.

>> +   Adding new elements to this data structure is another source of
>> +   quadratic behavior for dlopen.  If the other causes of quadratic
>> +   behavior are eliminated, a more complicated data structure will be
>> +   needed.  */
>
> This worries me, specially we have reports that python and other dynamic
> environments do use a lot of plugin and generates a lot of dlopen() calls.
> What kind of performance implication do you foresee here?

The additional overhead is not disproportionate to the other sources of
quadratic behavior.  With 1,000 dlopen'ed objects, overall run-time
seems to be comparable to the strcmp time required soname matching, for
example, and is quite difficult to measure.  So we could fix the
performance regression if we used a hash table for that …

It's just an undesirable complexity class.  The implementation is not
actually slow because it's a mostly-linear copy (although a backwards
one).  Other parts of dlopen involve pointer chasing and are much
slower.

>> +/* Allocate an empty segment that is at least SIZE large.  PREVIOUS */
>
> What this PREVIOUS refer to?

Oops, it's now:

/* Allocate an empty segment that is at least SIZE large.  PREVIOUS
   points to the chain of previously allocated segments and can be
   NULL.  */

>> +/* Update the version to reflect that an update is happening.  This
>> +   does not change the bit that controls the active segment chain.
>> +   Returns the index of the currently active segment chain.  */
>> +static inline unsigned int
>> +_dl_eh_mappings_begin_update (void)
>> +{
>> +  unsigned int v
>> += __atomic_wide_counter_fetch_add_relaxed 
>> (&_dl_eh_loaded_mappings_version,
>> +   2);
>
> Why use an 'unsigned int' for the wide counter here?

Because …

>> +  /* Subsequent stores to the TM data must not be reordered before the
>> + store above with the version update.  */
>> +  atomic_thread_fence_release ();
>> +  return v & 1;
>> +}

… we only need the lower bit.

>> +  /* Other initially loaded objects.  */
>> +  if (pc >= *_dl_eh_nodelete_mappings_bases
>> +  && pc < _dl_eh_nodelete_mappings_end)
>> +{
>> +  size_t idx = _dl_eh_find_lower_bound (pc,
>> +_dl_eh_nodelete_mappings_bases,
>> +_dl_eh_nodelete_mappings_size);
>> +  const struct dl_eh_frame_info *info
>> += _dl_eh_nodelete_mappings_infos + idx;
>
> Ins't a UB if idx is not a valid one?

idx is always valid here.

>> +  bool match;
>> +  if (idx < _dl_eh_nodelete_mappings_size
>> +  && pc ==

Re: [PATCH 1/5] libstdc++: Import the fast_float library

2021-11-16 Thread Florian Weimer via Gcc-patches

* Jonathan Wakely:

> On Tue, 16 Nov 2021 at 08:01, Florian Weimer wrote:
>>
>> * Patrick Palka via Libstdc:
>>
>> > This copies the fast_float library[1] into the compiled-in library
>> > sources.  We're going to use this library in our floating-point
>> > std::from_chars implementation for faster and more portable parsing of
>> > binary32/64 decimal strings.
>> >
>> > [1]: https://github.com/fastfloat/fast_float
>> >
>> > Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it
>> > look OK for trunk?
>>
>> Missing Signed-off-by:?
>
> That's not needed if Patrick is still covered by an FSF assignment.

But the submission is not covered by the FSF assignment.

> I think we could use Apache as well, because this code isn't going to
> appear in public headers so the problematic clause doesn't apply. But
> MIT is simpler.

Okay, so you consider dynamic linking only?  I think the historic
libstdc++ license is more permissive than Apache or MIT when used with
GCC.  There aren't any notification or other requirements.

Thanks,
Florian

Re: [PATCH 1/5] libstdc++: Import the fast_float library

2021-11-16 Thread Florian Weimer via Gcc-patches

* Patrick Palka via Libstdc:

> This copies the fast_float library[1] into the compiled-in library
> sources.  We're going to use this library in our floating-point
> std::from_chars implementation for faster and more portable parsing of
> binary32/64 decimal strings.
>
> [1]: https://github.com/fastfloat/fast_float
>
> Series tested on x86_64, i686, ppc64, ppc64le and aarch64, does it
> look OK for trunk?

Missing Signed-off-by:?

> diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE 
> b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> new file mode 100644
> index 000..26f4398f249
> --- /dev/null
> +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-APACHE
> @@ -0,0 +1,190 @@
> + Apache License
> +   Version 2.0, January 2004
> +http://www.apache.org/licenses/

> diff --git a/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT 
> b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> new file mode 100644
> index 000..2fb2a37ad7f
> --- /dev/null
> +++ b/libstdc++-v3/src/c++17/fast_float/LICENSE-MIT
> @@ -0,0 +1,27 @@
> +MIT License
> +
> +Copyright (c) 2021 The fast_float authors

You also need to include the README file, which makes it clear that
recipients can choose between Apache and MIT.  GCC needs to use the MIT
option, I think.

Thanks,
Florian

[RFC PATCH] Implement #pragma GCC noexpand

2021-11-05 Thread Florian Weimer via Gcc-patches

This can be used to avoid excessive __ mangling of identifiers
merely to guard against accidental macro expansion.  (Identifiers in
the global scope or with external linkage may still need mangling.)

In order to support -fdirectives-only without introducing
new line change flags, #include is not permitted in noexpand mode.

I think this could be particularly useful for writing libstdc++ headers.
Is this something we want?  Then I'll figure out how to add some tests.

Thanks,
Florian

libcpp/ChangeLog

* include/cpplib. (enum cpp_warning_reason): Add
CPP_W_EXPANSION_TO_DEFINED.
(NODE_EXPAND): Define new node flag.
(struct cpp_hashnode): Expand width of flags member.
* internal.h (struct cpp_read): Add noexpand member.
* directives.c (struct pragma_entry): Add
pass_if_directives_only.
(do_include_common): Error out in noexpand mode.
(do_linemarker): Reset noexpand flag when leaving files.
(_cpp_pop_buffer): Likewise.
(register_pragma_internal_not_directive_only)
(do_pragma_expand, do_pragma_noexpand): New functions.
(_cpp_init_internal_pragmas): Register "GCC expand",
"GCC noexpand" pragmata.
(do_pragma): Use default callback for pass_if_directives_only
pragmata with -fdirectives-only.
* init.c (cpp_init_builtins): Define _Expand helper macro.
* macro.c (enter_macro_context): Enable nested macro
expansion by disabling noexpand.
(cpp_get_token_1): Skip macro expansion in noexpand mode,
except for NODE_EXPAND macros.

gcc/ChangeLog

* c-family/c.opt (Wcpp-noexpand): Add.
* common.opt (Wcpp-noexpand): Likewise.
* doc/cpp.texi (Common Predefined Macros): Document _Expand.
(Pragmas): Document "GCC noexpand", "GCC expand".
* doc/invoke.texi (Option Summary): Add -Wno-cpp-noexpand.
(Warning Options): Document -Wno-cpp-noexpand.

---
 gcc/c-family/c.opt  |  4 +++
 gcc/common.opt  |  4 +++
 gcc/doc/cpp.texi| 26 +++
 gcc/doc/invoke.texi |  7 +-
 libcpp/directives.c | 66 +
 libcpp/include/cpplib.h |  6 +++--
 libcpp/init.c   |  6 +
 libcpp/internal.h   |  3 +++
 libcpp/macro.c  | 14 ---
 9 files changed, 130 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 06457ac739e..4cb9811db3a 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -511,6 +511,10 @@ Wcpp
 C ObjC C++ ObjC++ CppReason(CPP_W_WARNING_DIRECTIVE)
 ; Documented in common.opt
 
+Wcpp-noexpand
+C ObjC C++ ObjC++ CppReason(CPP_W_NOEXPAND)
+; Documented in common.opt.
+
 Wctad-maybe-unsupported
 C++ ObjC++ Var(warn_ctad_maybe_unsupported) Warning
 Warn when performing class template argument deduction on a type with no
diff --git a/gcc/common.opt b/gcc/common.opt
index 1a5b9bfcca9..d5f6a5c296d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -574,6 +574,10 @@ Wcpp
 Common Var(warn_cpp) Init(1) Warning
 Warn when a #warning directive is encountered.
 
+Wcpp-noexpand
+Common Var(warn_cpp_noexpand) Init(1) Warning
+Warn about misuses of the noexpand preprocessor feature.
+
 Wattribute-warning
 Common Var(warn_attribute_warning) Init(1) Warning
 Warn about uses of __attribute__((warning)) declarations.
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 53f7204504c..037dc51cba3 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -1941,6 +1941,11 @@ generate unique identifiers.  Care must be taken to 
ensure that
 @code{__COUNTER__} is not expanded prior to inclusion of precompiled headers
 which use it.  Otherwise, the precompiled headers will not be used.
 
+@item _Expand
+@samp{_Expand (@var{source})} can be used to macro-expand @var{source}
+after a @samp{#pragma GCC noexpand} directive, where macro expansion is
+inhibited otherwise.
+
 @item __GFORTRAN__
 The GNU Fortran compiler defines this.
 
@@ -3843,6 +3848,27 @@ file will never be read again, no matter what.  It is a 
less-portable
 alternative to using @samp{#ifndef} to guard the contents of header files
 against multiple inclusions.
 
+@item #pragma GCC noexpand
+Temporarily turns off macro expansion.  In order to expand @var{source}
+in this mode, write @samp{_Expand (@var{source})}.  For example,
+
+@smallexample
+#define A a
+#define B b
+#define C() A B
+#pragma GCC noexpand
+A _Expand (A B C()) B
+@end smallexample
+
+expands to @samp{A a b a b B}.
+
+It is an error to include other files in @code{noexpand}, until macro
+expansion has been turned on again using @samp{#pragma GCC expand}.
+
+@item #pragma GCC expand
+Turns on macro expansion again after it has been turned off using
+@samp{#pragma GCC noexpand}.
+
 @end ftable
 
 @node Other Directives
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9fb74d34920..4e6c5210161 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -338,7

Re: Invalid -Wstringop-overread warning for valid POSIX constructs

2021-11-04 Thread Florian Weimer via Gcc-patches

* Martin Sebor:

> Thanks for the reminder.  I have not forgotten about this.
> I agreed in our discussion and in the GCC bug report where this
> came up (PR 101751) that the GCC logic here is wrong and should
> be relaxed.  I consider it a GCC bug so I plan to make the change
> in the bug fixing stage 3.  GCC is in the development stage until
> the 15th and I've been busy trying to wrap up what I'm working on.
> Once it's changed in GCC 12 I'll backport it to GCC 11.3.  Does
> this timeframe work for you?

GCC 11.3 will likely be released in spring 2022, right?  That's rather
far in the future for any planning.  But then the fix is not terribly
urgent.  At this point I want to make sure that we get the necessary
capability in GCC at one pointer.  GCC 12 plus backport is fine.

Thanks,
Florian

Invalid -Wstringop-overread warning for valid POSIX constructs

2021-11-04 Thread Florian Weimer via Gcc-patches

This code:

#include 
#include 

void
f (pthread_key_t key)
{
  pthread_setspecific (key, MAP_FAILED);
}

Results in a warning:

t.c: In function ‘f’:
t.c:7:3: warning: ‘pthread_setspecific’ expecting 1 byte in a region of size 0 
[-Wstringop-overread]
7 |   pthread_setspecific (key, MAP_FAILED);
  |   ^
In file included from t.c:1:
/usr/include/pthread.h:1308:12: note: in a call to function 
‘pthread_setspecific’ declared with attribute ‘access (none, 2)’
 1308 | extern int pthread_setspecific (pthread_key_t __key,
  |^~~


This also results in the same warning, for different reasons:

#include 

extern int x[1];

void
f (pthread_key_t key)
{
  pthread_setspecific (key, [1]);
}

t.c: In function ‘f’:
t.c:8:3: warning: ‘pthread_setspecific’ expecting 1 byte in a region of size 0 
[-Wstringop-overread]
8 |   pthread_setspecific (key, [1]);
  |   ^~~~
t.c:3:12: note: at offset 4 into source object ‘x’ of size 4
3 | extern int x[1];
  |^
In file included from t.c:1:
/usr/include/pthread.h:1308:12: note: in a call to function 
‘pthread_setspecific’ declared with attribute ‘access (none, 2)’
 1308 | extern int pthread_setspecific (pthread_key_t __key,
  |^~~

The original argument justifying this warning was that passing
non-pointer constants is invalid.  But MAP_FAILED is a valid POSIX
pointer constant, so it is allowed here as well.  And the second example
shows that the warning also fires for completely valid pointers.  So the
none access attribute is clearly not correct here.  (“none” requires
that the pointer is valid, there just aren't any accesses to the object
it points to, but the object must exist.  Apparently, this is what the
kernel expects for its use of the annotation.)

The root of the problem is the const void * pointer argument.  Without
the access attribute, we warn for other examples:

typedef unsigned int pthread_key_t;
int pthread_setspecific (pthread_key_t __key, const void *);

void
f (pthread_key_t key)
{
  int x;
  pthread_setspecific (key, );
}

t.c: In function ‘f’:
t.c:10:3: warning: ‘x’ may be used uninitialized [-Wmaybe-uninitialized]
   10 |   pthread_setspecific (key, );
  |   ^
t.c:4:5: note: by argument 2 of type ‘const void *’ to ‘pthread_setspecific’ 
declared here
4 | int pthread_setspecific (pthread_key_t __key, const void *);
  | ^~~
t.c:9:7: note: ‘x’ declared here
9 |   int x;
  |   ^

This is why we added the none access attribute, but this leads to the
other problem.

We could change glibc to use a different attribute (preferable one that
we can probe using __has_attribute) if one were to become available, and
backport that.  But so far, I see nothing on the GCC side, and
GCC PR 102329 seems to have stalled.

Thanks,
Florian

[PATCH 4/4] libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE

2021-11-03 Thread Florian Weimer via Gcc-patches

libgcc/ChangeLog

* unwind-dw2-fde-dip.c (USE_DL_FIND_EH_FRAME)
(DL_FIND_EH_FRAME_CONDITION): New macros.
[__GLIBC__ && !DL_FIND_EH_FRAME_DBASE] (_dl_find_eh_frame):
Declare weak function.
(_Unwind_Find_FDE): Call _dl_find_eh_frame if available.
---
 libgcc/unwind-dw2-fde-dip.c | 50 +
 1 file changed, 50 insertions(+)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 272c0ec46c0..b5b4a23dc56 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -129,6 +129,30 @@ unw_eh_callback_data_dbase (const struct 
unw_eh_callback_data *data
 #endif
 }
 
+#ifdef DL_FIND_EH_FRAME_DBASE
+#if DL_FIND_EH_FRAME_DBASE != NEED_DBASE_MEMBER
+#error "DL_FIND_EH_FRAME_DBASE != NEED_DBASE_MEMBER"
+#endif
+#define USE_DL_FIND_EH_FRAME 1
+#define DL_FIND_EH_FRAME_CONDITION 1
+#endif
+
+/* Fallback declaration for old glibc headers.  DL_FIND_EH_FRAME_DBASE is used
+   as a proxy to determine if  declares _dl_find_eh_frame.  */
+#if defined __GLIBC__ && !defined DL_FIND_EH_FRAME_DBASE
+#if NEED_DBASE_MEMBER
+void *_dl_find_eh_frame (void *__pc, void **__dbase) __attribute__ ((weak));
+#else
+void *_dl_find_eh_frame (void *__pc) __attribute__ ((weak));
+#endif
+#define USE_DL_FIND_EH_FRAME 1
+#define DL_FIND_EH_FRAME_CONDITION (_dl_find_eh_frame != NULL)
+#endif
+
+#ifndef USE_DL_FIND_EH_FRAME
+#define USE_DL_FIND_EH_FRAME 0
+#endif
+
 struct unw_eh_frame_hdr
 {
   unsigned char version;
@@ -501,6 +525,32 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   if (ret != NULL)
 return ret;
 
+#if USE_DL_FIND_EH_FRAME
+  if (DL_FIND_EH_FRAME_CONDITION)
+{
+  void *dbase;
+  void *eh_frame;
+#if NEED_DBASE_MEMBER
+  eh_frame = _dl_find_eh_frame (pc, );
+#else
+  dbase = NULL;
+  eh_frame = _dl_find_eh_frame (pc);
+#endif
+  if (eh_frame == NULL)
+   return NULL;
+
+  struct find_fde_tail_result result
+   = find_fde_tail ((_Unwind_Ptr) pc, eh_frame, (_Unwind_Ptr) dbase);
+  if (result.entry != NULL)
+   {
+ bases->tbase = NULL;
+ bases->dbase = (void *) dbase;
+ bases->func = result.func;
+   }
+  return result.entry;
+}
+#endif
+
   data.pc = (_Unwind_Ptr) pc;
 #if NEED_DBASE_MEMBER
   data.dbase = NULL;
-- 
2.31.1

[PATCH 3/4] libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup

2021-11-03 Thread Florian Weimer via Gcc-patches

This allows switching to a different implementation for
PT_GNU_EH_FRAME lookup in a subsequent commit.

This moves some of the PT_GNU_EH_FRAME parsing out of the glibc loader
lock that is implied by dl_iterate_phdr.  However, the FDE is already
parsed outside the lock before this change, so this does not introduce
additional crashes in case of a concurrent dlclose.

libunwind/ChangeLog

* unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Add hdr.
Remove func, ret.
(struct find_fde_tail_result): New.
(find_fde_tail): New function.  Split from
_Unwind_IteratePhdrCallback.
(_Unwind_Find_FDE): Add call to find_fde_tail.
---
 libgcc/unwind-dw2-fde-dip.c | 91 +
 1 file changed, 52 insertions(+), 39 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 3f302826d2d..272c0ec46c0 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -113,8 +113,7 @@ struct unw_eh_callback_data
 #if NEED_DBASE_MEMBER
   void *dbase;
 #endif
-  void *func;
-  const fde *ret;
+  const struct unw_eh_frame_hdr *hdr;
   int check_cache;
 };
 
@@ -197,10 +196,6 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 #else
   _Unwind_Ptr load_base;
 #endif
-  const unsigned char *p;
-  const struct unw_eh_frame_hdr *hdr;
-  _Unwind_Ptr eh_frame;
-  struct object ob;
   _Unwind_Ptr pc_low = 0, pc_high = 0;
 
   struct ext_dl_phdr_info
@@ -348,10 +343,8 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 return 0;
 
   /* Read .eh_frame_hdr header.  */
-  hdr = (const struct unw_eh_frame_hdr *)
+  data->hdr = (const struct unw_eh_frame_hdr *)
 __RELOC_POINTER (p_eh_frame_hdr->p_vaddr, load_base);
-  if (hdr->version != 1)
-return 1;
 
 #ifdef CRT_GET_RFIB_DATA
 # if defined __i386__ || defined __nios2__
@@ -383,12 +376,34 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 # endif
 #endif
 
-  _Unwind_Ptr dbase = unw_eh_callback_data_dbase (data);
+  return 1;
+}
+
+/* Result type of find_fde_tail below.  */
+struct find_fde_tail_result
+{
+  const fde *entry;
+  void *func;
+};
+
+/* Find the FDE for the program counter PC, in a previously located
+   PT_GNU_EH_FRAME data region.  */
+static struct find_fde_tail_result
+find_fde_tail (_Unwind_Ptr pc,
+  const struct unw_eh_frame_hdr *hdr,
+  _Unwind_Ptr dbase)
+{
+  const unsigned char *p = (const unsigned char *) (hdr + 1);
+  _Unwind_Ptr eh_frame;
+  struct object ob;
+
+  if (hdr->version != 1)
+return (struct find_fde_tail_result) { NULL, };
+
   p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
base_from_cb_data (hdr->eh_frame_ptr_enc,
   dbase),
-   (const unsigned char *) (hdr + 1),
-   _frame);
+   p, _frame);
 
   /* We require here specific table encoding to speed things up.
  Also, DW_EH_PE_datarel here means using PT_GNU_EH_FRAME start
@@ -404,7 +419,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
p, _count);
   /* Shouldn't happen.  */
   if (fde_count == 0)
-   return 1;
+   return (struct find_fde_tail_result) { NULL, };
   if _Unwind_Ptr) p) & 3) == 0)
{
  struct fde_table {
@@ -419,9 +434,9 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
  _Unwind_Ptr range;
 
  mid = fde_count - 1;
- if (data->pc < table[0].initial_loc + data_base)
-   return 1;
- else if (data->pc < table[mid].initial_loc + data_base)
+ if (pc < table[0].initial_loc + data_base)
+   return (struct find_fde_tail_result) { NULL, };
+ else if (pc < table[mid].initial_loc + data_base)
{
  lo = 0;
  hi = mid;
@@ -429,9 +444,9 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
  while (lo < hi)
{
  mid = (lo + hi) / 2;
- if (data->pc < table[mid].initial_loc + data_base)
+ if (pc < table[mid].initial_loc + data_base)
hi = mid;
- else if (data->pc >= table[mid + 1].initial_loc + data_base)
+ else if (pc >= table[mid + 1].initial_loc + data_base)
lo = mid + 1;
  else
break;
@@ -445,10 +460,11 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
  f_enc_size = size_of_encoded_value (f_enc);
  read_encoded_value_with_base (f_enc & 0x0f, 0,
>pc_begin[f_enc_size], );
- if (data->pc <

[PATCH 2/4] libgcc: Remove dbase member from struct unw_eh_callback_data if NULL

2021-11-03 Thread Florian Weimer via Gcc-patches

Only i386 and nios2 need this member at present.

libgcc/ChangeLog

* unwind-dw2-fde-dip.c (NEED_DBASE_MEMBER): Define.
(struct unw_eh_callback_data): Make dbase member conditional.
(unw_eh_callback_data_dbase): New function.
(base_from_cb_data): Simplify for the non-dbase case.
(_Unwind_IteratePhdrCallback): Adjust.
(_Unwind_Find_FDE): Likewise.
---
 libgcc/unwind-dw2-fde-dip.c | 46 +++--
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 4a4d990f455..3f302826d2d 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -101,15 +101,35 @@ static const fde * _Unwind_Find_registered_FDE (void *pc, 
struct dwarf_eh_bases
 #define PT_GNU_EH_FRAME (PT_LOOS + 0x474e550)
 #endif
 
+#ifdef CRT_GET_RFIB_DATA
+#define NEED_DBASE_MEMBER 1
+#else
+#define NEED_DBASE_MEMBER 0
+#endif
+
 struct unw_eh_callback_data
 {
   _Unwind_Ptr pc;
+#if NEED_DBASE_MEMBER
   void *dbase;
+#endif
   void *func;
   const fde *ret;
   int check_cache;
 };
 
+/* Returns DATA->dbase if available, else NULL.  */
+static inline _Unwind_Ptr
+unw_eh_callback_data_dbase (const struct unw_eh_callback_data *data
+   __attribute__ ((unused)))
+{
+#if NEED_DBASE_MEMBER
+  return (_Unwind_Ptr) data->dbase;
+#else
+  return 0;
+#endif
+}
+
 struct unw_eh_frame_hdr
 {
   unsigned char version;
@@ -139,9 +159,11 @@ static struct frame_hdr_cache_element 
*frame_hdr_cache_head;
 /* Like base_of_encoded_value, but take the base from a struct
unw_eh_callback_data instead of an _Unwind_Context.  */
 
-static _Unwind_Ptr
-base_from_cb_data (unsigned char encoding, struct unw_eh_callback_data *data)
+static inline _Unwind_Ptr
+base_from_cb_data (unsigned char encoding __attribute__ ((unused)),
+  _Unwind_Ptr dbase __attribute__ ((unused)))
 {
+#if NEED_DBASE_MEMBER
   if (encoding == DW_EH_PE_omit)
 return 0;
 
@@ -155,10 +177,13 @@ base_from_cb_data (unsigned char encoding, struct 
unw_eh_callback_data *data)
 case DW_EH_PE_textrel:
   return 0;
 case DW_EH_PE_datarel:
-  return (_Unwind_Ptr) data->dbase;
+  return dbase;
 default:
   gcc_unreachable ();
 }
+#else /* !NEED_DBASE_MEMBER */
+  return 0;
+#endif
 }
 
 static int
@@ -358,9 +383,10 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 # endif
 #endif
 
+  _Unwind_Ptr dbase = unw_eh_callback_data_dbase (data);
   p = read_encoded_value_with_base (hdr->eh_frame_ptr_enc,
base_from_cb_data (hdr->eh_frame_ptr_enc,
-  data),
+  dbase),
(const unsigned char *) (hdr + 1),
_frame);
 
@@ -374,7 +400,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
 
   p = read_encoded_value_with_base (hdr->fde_count_enc,
base_from_cb_data (hdr->fde_count_enc,
-  data),
+  dbase),
p, _count);
   /* Shouldn't happen.  */
   if (fde_count == 0)
@@ -431,7 +457,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
  removed, we could cache this (and thus use search_object).  */
   ob.pc_begin = NULL;
   ob.tbase = NULL;
-  ob.dbase = data->dbase;
+  ob.dbase = (void *) dbase;
   ob.u.single = (fde *) eh_frame;
   ob.s.i = 0;
   ob.s.b.mixed_encoding = 1;  /* Need to assume worst case.  */
@@ -442,7 +468,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
   unsigned int encoding = get_fde_encoding (data->ret);
 
   read_encoded_value_with_base (encoding,
-   base_from_cb_data (encoding, data),
+   base_from_cb_data (encoding, dbase),
data->ret->pc_begin, );
   data->func = (void *) func;
 }
@@ -460,7 +486,9 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
 return ret;
 
   data.pc = (_Unwind_Ptr) pc;
+#if NEED_DBASE_MEMBER
   data.dbase = NULL;
+#endif
   data.func = NULL;
   data.ret = NULL;
   data.check_cache = 1;
@@ -471,7 +499,11 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
   if (data.ret)
 {
   bases->tbase = NULL;
+#if NEED_DBASE_MEMBER
   bases->dbase = data.dbase;
+#else
+  bases->dbase = NULL;
+#endif
   bases->func = data.func;
 }
   return data.ret;
-- 
2.31.1

[PATCH 1/4] libgcc: Remove tbase member from struct unw_eh_callback_data

2021-11-03 Thread Florian Weimer via Gcc-patches

It is always a null pointer.

libgcc/ChangeLog

* unwind-dw2-fde-dip.c (struct unw_eh_callback_data): Remove
tbase member.
(base_from_cb_data): Adjust.
(_Unwind_IteratePhdrCallback): Likewise.
(_Unwind_Find_FDE): Likewise.
---
 libgcc/unwind-dw2-fde-dip.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/libgcc/unwind-dw2-fde-dip.c b/libgcc/unwind-dw2-fde-dip.c
index 5095b6830bf..4a4d990f455 100644
--- a/libgcc/unwind-dw2-fde-dip.c
+++ b/libgcc/unwind-dw2-fde-dip.c
@@ -104,7 +104,6 @@ static const fde * _Unwind_Find_registered_FDE (void *pc, 
struct dwarf_eh_bases
 struct unw_eh_callback_data
 {
   _Unwind_Ptr pc;
-  void *tbase;
   void *dbase;
   void *func;
   const fde *ret;
@@ -154,7 +153,7 @@ base_from_cb_data (unsigned char encoding, struct 
unw_eh_callback_data *data)
   return 0;
 
 case DW_EH_PE_textrel:
-  return (_Unwind_Ptr) data->tbase;
+  return 0;
 case DW_EH_PE_datarel:
   return (_Unwind_Ptr) data->dbase;
 default:
@@ -431,7 +430,7 @@ _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, 
size_t size, void *ptr)
  As soon as GLIBC will provide API so to notify that a library has been
  removed, we could cache this (and thus use search_object).  */
   ob.pc_begin = NULL;
-  ob.tbase = data->tbase;
+  ob.tbase = NULL;
   ob.dbase = data->dbase;
   ob.u.single = (fde *) eh_frame;
   ob.s.i = 0;
@@ -461,7 +460,6 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
 return ret;
 
   data.pc = (_Unwind_Ptr) pc;
-  data.tbase = NULL;
   data.dbase = NULL;
   data.func = NULL;
   data.ret = NULL;
@@ -472,7 +470,7 @@ _Unwind_Find_FDE (void *pc, struct dwarf_eh_bases *bases)
 
   if (data.ret)
 {
-  bases->tbase = data.tbase;
+  bases->tbase = NULL;
   bases->dbase = data.dbase;
   bases->func = data.func;
 }
-- 
2.31.1

[PATCH 0/4] Use _dl_find_eh_frame to locate DWARF EH data in the unwinder

2021-11-03 Thread Florian Weimer via Gcc-patches

This is the GCC side of the patch series.

To simplify testing, a weak reference to _dl_find_eh_frame is used to
enable this feature when running on newer glibc even if built for older
glibc.

The first three patches are cleanups/refactorings to simplify the actual
change in the last patch.

Benchmarking-wise, the new unwinder is slightly faster even in the
optimal case for the old implementation (single-threaded, 100% cache hit
rate).  The old implementation performs really poorly once the cache hit
rate drops and the number of shared objects participating in unwinding
increases, so that's not a fair comparison.  Old performance with
multiple threads is also poor due to the global loader lock implied by
dl_iterate_phdr (which is necessary to serialize access to the libgcc
unwinder cache), and I haven't bother to benchmark that.

Thanks,
Florian

Florian Weimer (4):
  libgcc: Remove tbase member from struct unw_eh_callback_data
  libgcc: Remove dbase member from struct unw_eh_callback_data if NULL
  libgcc: Split FDE search code from PT_GNU_EH_FRAME lookup
  libgcc: Use _dl_find_eh_frame in _Unwind_Find_FDE

 libgcc/unwind-dw2-fde-dip.c | 185 +++-
 1 file changed, 139 insertions(+), 46 deletions(-)


base-commit: 6b8b25575570ffde37cc8997af096514b929779d
-- 
2.31.1

[PATCH 3/3] elf: Add _dl_find_eh_frame function

2021-11-03 Thread Florian Weimer via Gcc-patches

This function is similar to __gnu_Unwind_Find_exidx as used on arm.
It can be used to speed up the libgcc unwinder.
---
 NEWS  |   4 +
 bits/dlfcn_eh_frame.h |  33 +
 dlfcn/Makefile|   2 +-
 dlfcn/dlfcn.h |   2 +
 elf/Makefile  |  31 +-
 elf/Versions  |   3 +
 elf/dl-close.c|   4 +
 elf/dl-find_eh_frame.c| 864 ++
 elf/dl-find_eh_frame.h|  90 ++
 elf/dl-find_eh_frame_slow.h   |  55 ++
 elf/dl-libc_freeres.c |   2 +
 elf/dl-open.c |   5 +
 elf/rtld.c|   7 +
 elf/tst-dl_find_eh_frame-mod1.c   |  10 +
 elf/tst-dl_find_eh_frame-mod2.c   |  10 +
 elf/tst-dl_find_eh_frame-mod3.c   |  10 +
 elf/tst-dl_find_eh_frame-mod4.c   |  10 +
 elf/tst-dl_find_eh_frame-mod5.c   |  11 +
 elf/tst-dl_find_eh_frame-mod6.c   |  11 +
 elf/tst-dl_find_eh_frame-mod7.c   |  10 +
 elf/tst-dl_find_eh_frame-mod8.c   |  10 +
 elf/tst-dl_find_eh_frame-mod9.c   |  10 +
 elf/tst-dl_find_eh_frame-threads.c| 237 +
 elf/tst-dl_find_eh_frame.c| 179 
 include/atomic_wide_counter.h |  14 +
 include/bits/dlfcn_eh_frame.h |   1 +
 include/link.h|   3 +
 manual/Makefile   |   2 +-
 manual/dynlink.texi   |  69 ++
 manual/libdl.texi |  10 -
 manual/probes.texi|   2 +-
 manual/threads.texi   |   2 +-
 sysdeps/i386/bits/dlfcn_eh_frame.h|  34 +
 sysdeps/mach/hurd/i386/ld.abilist |   1 +
 sysdeps/nios2/bits/dlfcn_eh_frame.h   |  34 +
 sysdeps/unix/sysv/linux/aarch64/ld.abilist|   1 +
 sysdeps/unix/sysv/linux/alpha/ld.abilist  |   1 +
 sysdeps/unix/sysv/linux/arc/ld.abilist|   1 +
 sysdeps/unix/sysv/linux/arm/be/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/arm/le/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/csky/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/hppa/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/i386/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/ia64/ld.abilist   |   1 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |   1 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist|   1 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |   1 +
 .../unix/sysv/linux/mips/mips32/ld.abilist|   1 +
 .../sysv/linux/mips/mips64/n32/ld.abilist |   1 +
 .../sysv/linux/mips/mips64/n64/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/nios2/ld.abilist  |   1 +
 .../sysv/linux/powerpc/powerpc32/ld.abilist   |   1 +
 .../linux/powerpc/powerpc64/be/ld.abilist |   1 +
 .../linux/powerpc/powerpc64/le/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist |   1 +
 .../unix/sysv/linux/s390/s390-32/ld.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-64/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/sh/be/ld.abilist  |   1 +
 sysdeps/unix/sysv/linux/sh/le/ld.abilist  |   1 +
 .../unix/sysv/linux/sparc/sparc32/ld.abilist  |   1 +
 .../unix/sysv/linux/sparc/sparc64/ld.abilist  |   1 +
 sysdeps/unix/sysv/linux/x86_64/64/ld.abilist  |   1 +
 sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist |   1 +
 64 files changed, 1795 insertions(+), 16 deletions(-)
 create mode 100644 bits/dlfcn_eh_frame.h
 create mode 100644 elf/dl-find_eh_frame.c
 create mode 100644 elf/dl-find_eh_frame.h
 create mode 100644 elf/dl-find_eh_frame_slow.h
 create mode 100644 elf/tst-dl_find_eh_frame-mod1.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod2.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod3.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod4.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod5.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod6.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod7.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod8.c
 create mode 100644 elf/tst-dl_find_eh_frame-mod9.c
 create mode 100644 elf/tst-dl_find_eh_frame-threads.c
 create mode 100644 elf/tst-dl_find_eh_frame.c
 create mode 100644 include/bits/dlfcn_eh_frame.h
 create mode 100644 manual/dynlink.texi
 delete mode 100644 manual/libdl.texi
 create mode 100644 sysdeps/i386/bits/dlfcn_eh_frame.h
 create mode 100644 sysdeps/nios2/bits/dlfcn_eh_frame.h

diff --git a/NEWS b/NEWS
index 82b7016aef..68c9c21458 100644
--- a/NEWS
+++ b/NEWS
@@ -64,6 +64,10 @@ Major new features:
   to be used by compilers for optimizing usage of 'memcmp' when its
   return value is only used for its boolean status.
 
+* The function _dl_find_eh_frame has

[PATCH 2/3] elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres

2021-11-03 Thread Florian Weimer via Gcc-patches

---
 elf/Makefile   |  2 +-
 elf/dl-libc_freeres.c  | 24 
 elf/rtld.c |  1 +
 malloc/set-freeres.c   |  5 +
 sysdeps/generic/ldsodefs.h |  7 +++
 5 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 elf/dl-libc_freeres.c

diff --git a/elf/Makefile b/elf/Makefile
index cb9bcfb799..1c768bdf47 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -68,7 +68,7 @@ elide-routines.os = $(all-dl-routines) dl-support enbl-secure 
dl-origin \
 rtld-routines  = rtld $(all-dl-routines) dl-sysdep dl-environ dl-minimal \
   dl-error-minimal dl-conflict dl-hwcaps dl-hwcaps_split dl-hwcaps-subdirs \
   dl-usage dl-diagnostics dl-diagnostics-kernel dl-diagnostics-cpu \
-  dl-mutex
+  dl-mutex dl-libc_freeres
 all-rtld-routines = $(rtld-routines) $(sysdep-rtld-routines)
 
 CFLAGS-dl-runtime.c += -fexceptions -fasynchronous-unwind-tables
diff --git a/elf/dl-libc_freeres.c b/elf/dl-libc_freeres.c
new file mode 100644
index 00..68f305a6f9
--- /dev/null
+++ b/elf/dl-libc_freeres.c
@@ -0,0 +1,24 @@
+/* Deallocating malloc'ed memory from the dynamic loader.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include 
+
+void
+__rtld_libc_freeres (void)
+{
+}
diff --git a/elf/rtld.c b/elf/rtld.c
index be2d5d8e74..847141e21d 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -378,6 +378,7 @@ struct rtld_global_ro _rtld_global_ro attribute_relro =
 ._dl_catch_error = _rtld_catch_error,
 ._dl_error_free = _dl_error_free,
 ._dl_tls_get_addr_soft = _dl_tls_get_addr_soft,
+._dl_libc_freeres = __rtld_libc_freeres,
 #ifdef HAVE_DL_DISCOVER_OSVERSION
 ._dl_discover_osversion = _dl_discover_osversion
 #endif
diff --git a/malloc/set-freeres.c b/malloc/set-freeres.c
index 5c19a2725c..856ff7831f 100644
--- a/malloc/set-freeres.c
+++ b/malloc/set-freeres.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "../nss/nsswitch.h"
 #include "../libio/libioP.h"
@@ -67,6 +68,10 @@ __libc_freeres (void)
 
   call_function_static_weak (__libc_dlerror_result_free);
 
+#ifdef SHARED
+  GLRO (dl_libc_freeres) ();
+#endif
+
   for (p = symbol_set_first_element (__libc_freeres_ptrs);
!symbol_set_end_p (__libc_freeres_ptrs, p); ++p)
 free (*p);
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 1318c36dce..c26860430c 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -712,6 +712,10 @@ struct rtld_global_ro
  namespace.  */
   void (*_dl_error_free) (void *);
   void *(*_dl_tls_get_addr_soft) (struct link_map *);
+
+  /* Called from __libc_shared to deallocate malloc'ed memory.  */
+  void (*_dl_libc_freeres) (void);
+
 #ifdef HAVE_DL_DISCOVER_OSVERSION
   int (*_dl_discover_osversion) (void);
 #endif
@@ -1416,6 +1420,9 @@ __rtld_mutex_init (void)
 }
 #endif /* !PTHREAD_IN_LIBC */
 
+/* Implementation of GL (dl_libc_freeres).  */
+void __rtld_libc_freeres (void) attribute_hidden;
+
 void __thread_gscope_wait (void) attribute_hidden;
 # define THREAD_GSCOPE_WAIT() __thread_gscope_wait ()
 
-- 
2.31.1

[PATCH 1/3] nptl: Extract from pthread_cond_common.c

2021-11-03 Thread Florian Weimer via Gcc-patches

And make it an installed header.  This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.

The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.
---
 bits/atomic_wide_counter.h  |  35 
 include/atomic_wide_counter.h   |  89 +++
 include/bits/atomic_wide_counter.h  |   1 +
 misc/Makefile   |   3 +-
 misc/atomic_wide_counter.c  | 127 +++
 nptl/Makefile   |  13 +-
 nptl/pthread_cond_common.c  | 204 
 nptl/tst-cond22.c   |  14 +-
 sysdeps/nptl/bits/thread-shared-types.h |  22 +--
 9 files changed, 310 insertions(+), 198 deletions(-)
 create mode 100644 bits/atomic_wide_counter.h
 create mode 100644 include/atomic_wide_counter.h
 create mode 100644 include/bits/atomic_wide_counter.h
 create mode 100644 misc/atomic_wide_counter.c

diff --git a/bits/atomic_wide_counter.h b/bits/atomic_wide_counter.h
new file mode 100644
index 00..0687eb554e
--- /dev/null
+++ b/bits/atomic_wide_counter.h
@@ -0,0 +1,35 @@
+/* Monotonically increasing wide counters (at least 62 bits).
+   Copyright (C) 2016-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef _BITS_ATOMIC_WIDE_COUNTER_H
+#define _BITS_ATOMIC_WIDE_COUNTER_H
+
+/* Counter that is monotonically increasing (by less than 2**31 per
+   increment), with a single writer, and an arbitrary number of
+   readers.  */
+typedef union
+{
+  __extension__ unsigned long long int __value64;
+  struct
+  {
+unsigned int __low;
+unsigned int __high;
+  } __value32;
+} __atomic_wide_counter;
+
+#endif /* _BITS_ATOMIC_WIDE_COUNTER_H */
diff --git a/include/atomic_wide_counter.h b/include/atomic_wide_counter.h
new file mode 100644
index 00..31f009d5e6
--- /dev/null
+++ b/include/atomic_wide_counter.h
@@ -0,0 +1,89 @@
+/* Monotonically increasing wide counters (at least 62 bits).
+   Copyright (C) 2016-2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef _ATOMIC_WIDE_COUNTER_H
+#define _ATOMIC_WIDE_COUNTER_H
+
+#include 
+#include 
+
+#if __HAVE_64B_ATOMICS
+
+static inline uint64_t
+__atomic_wide_counter_load_relaxed (__atomic_wide_counter *c)
+{
+  return atomic_load_relaxed (>__value64);
+}
+
+static inline uint64_t
+__atomic_wide_counter_fetch_add_relaxed (__atomic_wide_counter *c,
+ unsigned int val)
+{
+  return atomic_fetch_add_relaxed (>__value64, val);
+}
+
+static inline uint64_t
+__atomic_wide_counter_fetch_add_acquire (__atomic_wide_counter *c,
+ unsigned int val)
+{
+  return atomic_fetch_add_acquire (>__value64, val);
+}
+
+static inline void
+__atomic_wide_counter_add_relaxed (__atomic_wide_counter *c,
+   unsigned int val)
+{
+  atomic_store_relaxed (>__value64,
+atomic_load_relaxed (>__value64) + val);
+}
+
+static uint64_t __attribute__ ((unused))
+__atomic_wide_counter_fetch_xor_release (__atomic_wide_counter *c,
+ unsigned int val)
+{
+  return atomic_fetch_xor_release (>__value64, val);
+}
+
+#else /* !__HAVE_64B_ATOMICS */
+
+uint64_t __atomic_wide_counter_load_relaxed (__atomic_wide_counter *c)
+  attribute_hidden;
+
+uint64_t

[PATCH 0/3] Add _dl_find_eh_frame function for unwinder optimization

2021-11-03 Thread Florian Weimer via Gcc-patches

This patch series implements a new function, _dl_find_eh_frame, for use
by in-process unwinders.  The new function is lock-free and
async-signal-safe, and it scales logarithmically with the number of
shared objects in the process.  It does not write to global data at all,
unlike the current libgcc unwinder, which has a small global cache and
relies on the unwinder lock.  dlclose is fully supported.  The interface
itself is modelled on the __gnu_Unwind_Find_exidx function.

I'm going to post the GCC patches that enable use of this function
separately.

We may want to tweak the _dl_find_eh_frame interface somewhat.  For
example, we could also provide a pointer to the link map, and perhaps
the in-memory boundaries of the object.  With the link map, we can use
it to accelerate _dl_find_dso_for_object.  And we should really use it
to implement __gnu_Unwind_Find_exidx on arm.  Something like this:

struct dl_find_object
{
  struct link_map *dl_link_map;
  void *dl_eh_frame;
  void *dl_map_start;
  void *dl_map_end;
  void *dl_dbase;   // optional, i386 & nios2
  unsigned long int dl_pcount;  // optional, arm
};

int _dl_find_object (void *__address, struct dl_find_object *__result) __TRHOW;

I wanted to post this version so that the review of the GCC patches can
start now and hopefully finish in time for stage 1 close.

Tested on i686-linux-gnu, x86_64-linux-gnu with the old and new
unwinder.  Tested on powerpc-linux-gnu, powerpc64le-linux-gnu,
aarch64-linux-gnu with the old unwinder.

Thanks,
Florian

Florian Weimer (3):
  nptl: Extract  from pthread_cond_common.c
  elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres
  elf: Add _dl_find_eh_frame function

 NEWS  |   4 +
 bits/atomic_wide_counter.h|  35 +
 bits/dlfcn_eh_frame.h |  33 +
 dlfcn/Makefile|   2 +-
 dlfcn/dlfcn.h |   2 +
 elf/Makefile  |  33 +-
 elf/Versions  |   3 +
 elf/dl-close.c|   4 +
 elf/dl-find_eh_frame.c| 864 ++
 elf/dl-find_eh_frame.h|  90 ++
 elf/dl-find_eh_frame_slow.h   |  55 ++
 elf/dl-libc_freeres.c |  26 +
 elf/dl-open.c |   5 +
 elf/rtld.c|   8 +
 elf/tst-dl_find_eh_frame-mod1.c   |  10 +
 elf/tst-dl_find_eh_frame-mod2.c   |  10 +
 elf/tst-dl_find_eh_frame-mod3.c   |  10 +
 elf/tst-dl_find_eh_frame-mod4.c   |  10 +
 elf/tst-dl_find_eh_frame-mod5.c   |  11 +
 elf/tst-dl_find_eh_frame-mod6.c   |  11 +
 elf/tst-dl_find_eh_frame-mod7.c   |  10 +
 elf/tst-dl_find_eh_frame-mod8.c   |  10 +
 elf/tst-dl_find_eh_frame-mod9.c   |  10 +
 elf/tst-dl_find_eh_frame-threads.c| 237 +
 elf/tst-dl_find_eh_frame.c| 179 
 include/atomic_wide_counter.h | 103 +++
 include/bits/atomic_wide_counter.h|   1 +
 include/bits/dlfcn_eh_frame.h |   1 +
 include/link.h|   3 +
 malloc/set-freeres.c  |   5 +
 manual/Makefile   |   2 +-
 manual/dynlink.texi   |  69 ++
 manual/libdl.texi |  10 -
 manual/probes.texi|   2 +-
 manual/threads.texi   |   2 +-
 misc/Makefile |   3 +-
 misc/atomic_wide_counter.c| 127 +++
 nptl/Makefile |  13 +-
 nptl/pthread_cond_common.c| 204 +
 nptl/tst-cond22.c |  14 +-
 sysdeps/generic/ldsodefs.h|   7 +
 sysdeps/i386/bits/dlfcn_eh_frame.h|  34 +
 sysdeps/mach/hurd/i386/ld.abilist |   1 +
 sysdeps/nios2/bits/dlfcn_eh_frame.h   |  34 +
 sysdeps/nptl/bits/thread-shared-types.h   |  22 +-
 sysdeps/unix/sysv/linux/aarch64/ld.abilist|   1 +
 sysdeps/unix/sysv/linux/alpha/ld.abilist  |   1 +
 sysdeps/unix/sysv/linux/arc/ld.abilist|   1 +
 sysdeps/unix/sysv/linux/arm/be/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/arm/le/ld.abilist |   1 +
 sysdeps/unix/sysv/linux/csky/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/hppa/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/i386/ld.abilist   |   1 +
 sysdeps/unix/sysv/linux/ia64/ld.abilist   |   1 +
 .../unix/sysv/linux/m68k/coldfire/ld.abilist  |   1 +
 .../unix/sysv/linux/m68k/m680x0/ld.abilist|   1 +
 sysdeps/unix/sysv/linux/microblaze/ld.abilist |   1 +
 .../unix/sysv/linux/mips/mips32/ld.abilist|   1 +
 .../sysv/linux/mips/mips64/n32/ld.abilist |   1 +

Re: [PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches

* Florian Weimer:

> * Florian Weimer:
>
>> I tested this on csky-linux-gnuabiv2 with the glibc version that failed
>> before, and it works.  So I guess your version is fine, too.
>
> Build on powerpc64-linux-gnu and other targets now fails with:
>
> /home/bmg/src/gcc/libgomp/config/linux/affinity.c: In function 
> ‘gomp_init_affini
> ty’:
> /home/bmg/src/gcc/libgomp/config/linux/affinity.c:53:41: error: ‘ULONG_MAX’ 
> unde
> clared (first use in this function)
>53 |   if (!gomp_affinity_init_level (1, ULONG_MAX, true))
>   | ^
>
> So affinity.c will need to include .

I verifed that this change on top successfully builds GCC for all glibc
targets:

diff --git a/libgomp/config/linux/affinity.c b/libgomp/config/linux/affinity.c
index c5abdce23..1b636c613 100644
--- a/libgomp/config/linux/affinity.c
+++ b/libgomp/config/linux/affinity.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef HAVE_PTHREAD_AFFINITY_NP
 

Thanks,
Florian

Re: [PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches

* Florian Weimer:

> I tested this on csky-linux-gnuabiv2 with the glibc version that failed
> before, and it works.  So I guess your version is fine, too.

Build on powerpc64-linux-gnu and other targets now fails with:

/home/bmg/src/gcc/libgomp/config/linux/affinity.c: In function ‘gomp_init_affini
ty’:
/home/bmg/src/gcc/libgomp/config/linux/affinity.c:53:41: error: ‘ULONG_MAX’ unde
clared (first use in this function)
   53 |   if (!gomp_affinity_init_level (1, ULONG_MAX, true))
  | ^

So affinity.c will need to include .

Thanks,
Florian

Re: [PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches

* Jakub Jelinek:

> On Mon, Jul 12, 2021 at 10:26:47AM +0200, Florian Weimer via Gcc-patches 
> wrote:
>>  is included indirectly in the #pragma GCC visibility hidden
>> block.  With glibc 2.34,  needs a declaration of the sysconf
>> function, and including it under hidden visibility turns other calls
>> to sysconf into hidden references, leading to a linker failure.
>> 
>> libgomp/ChangeLog:
>> 
>>  * libgomp.h: Include .
>
> If this is because of the config/linux/sem.h #include ,
> I'd prefer not to include that header instead, we rely on being compiled
> by GCC anyway (and clang/icc support __INT_MAX__ anyway).
>
> Or e.g. config/posix/sem.h uses
> #ifdef HAVE_ATTRIBUTE_VISIBILITY
> # pragma GCC visibility push(default)
> #endif
>
> #include 
>
> #ifdef HAVE_ATTRIBUTE_VISIBILITY
> # pragma GCC visibility pop
> #endif
>
> 2021-07-12  Jakub Jelinek  
>   Florian Weimer  
>
>   * config/linux/sem.h: Don't include limits.h.
>   (SEM_WAIT): Define to -__INT_MAX__ - 1 instead of INT_MIN.
>
> --- libgomp/config/linux/sem.h.jj 2021-01-18 07:18:42.360339646 +0100
> +++ libgomp/config/linux/sem.h2021-07-12 15:18:10.121178404 +0200
> @@ -33,10 +33,8 @@
>  #ifndef GOMP_SEM_H
>  #define GOMP_SEM_H 1
>  
> -#include  /* For INT_MIN */
> -
>  typedef int gomp_sem_t;
> -#define SEM_WAIT INT_MIN
> +#define SEM_WAIT (-__INT_MAX__ - 1)
>  #define SEM_INC 1
>  
>  extern void gomp_sem_wait_slow (gomp_sem_t *, int);

I tested this on csky-linux-gnuabiv2 with the glibc version that failed
before, and it works.  So I guess your version is fine, too.

Thanks,
Florian

Re: [PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches

* Florian Weimer:

>  is included indirectly in the #pragma GCC visibility hidden
> block.  With glibc 2.34,  needs a declaration of the sysconf
> function, and including it under hidden visibility turns other calls
> to sysconf into hidden references, leading to a linker failure.
>
> libgomp/ChangeLog:
>
>   * libgomp.h: Include .
>
> ---
>  libgomp/libgomp.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index 8d25dc8e2a8..1fe209429d1 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -46,6 +46,7 @@
>  #include "libgomp-plugin.h"
>  #include "gomp-constants.h"
>  
> +#include 
>  #ifdef HAVE_PTHREAD_H
>  #include 
>  #endif

I think this is a real libgomp bug, but if this glibc patch is accepted,
at least libgomp will build again:

  Reduce  pollution due to dynamic PTHREAD_STACK_MIN
  

So while I still think libgomp should be fixed, it won't need
backporting to release branches (assuming the glibc workaround goes in,
of course).

Thanks,
Florian

[PATCH] libgomp: Include early to avoid link failure with glibc 2.34

2021-07-12 Thread Florian Weimer via Gcc-patches

 is included indirectly in the #pragma GCC visibility hidden
block.  With glibc 2.34,  needs a declaration of the sysconf
function, and including it under hidden visibility turns other calls
to sysconf into hidden references, leading to a linker failure.

libgomp/ChangeLog:

* libgomp.h: Include .

---
 libgomp/libgomp.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 8d25dc8e2a8..1fe209429d1 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -46,6 +46,7 @@
 #include "libgomp-plugin.h"
 #include "gomp-constants.h"
 
+#include 
 #ifdef HAVE_PTHREAD_H
 #include 
 #endif

Re: [PATCH][libsanitizer]: Guard cyclades inclusion in sanitizer

2021-05-20 Thread Florian Weimer via Gcc-patches

* Tamar Christina via Gcc-patches:

> Hi All,
>
> libsanitizer: Guard cyclades inclusion in sanitizer
>
> The Linux kernel has removed the interface to cyclades from
> the latest kernel headers[1] due to them being orphaned for the
> past 13 years.

Nit: The commit subject doesn't match the patch because it removes the
functionality unconditionally (which is fine).

Thanks,
Florian

Re: [RFC v2] bpf.2: Use standard types and attributes

2021-05-04 Thread Florian Weimer via Gcc-patches

* Alejandro Colomar:

> The thing is, in all of those threads, the only reasons to avoid
>  types in the kernel (at least, the only explicitly
> mentioned ones) are (a bit simplified, but this is the general idea of
> those threads):
>
> * Possibly breaking something in such a big automated change.
> * Namespace collision with userspace (the C standard allows defining
>   uint32_t for nefarious purposes as long as you don't include
>  .   POSIX prohibits that, though)
> * Uglier

__u64 can't be formatted with %llu on all architectures.  That's not
true for uint64_t, where you have to use %lu on some architectures to
avoid compiler warnings (and technically undefined behavior).  There are
preprocessor macros to get the expected format specifiers, but they are
clunky.  I don't know if the problem applies to uint32_t.  It does
happen with size_t and ptrdiff_t on 32-bit targets (both vary between
int and long).

Thanks,
Florian

Re: mention x86-64-vX micro-architecture levels in the release notes

2021-04-27 Thread Florian Weimer via Gcc-patches

* Matthias Klose:

> Just saw, these are not mentioned on the release notes. Ok to document these?
>
> Matthias
>
> --- a/htdocs/gcc-11/changes.html
> +++ b/htdocs/gcc-11/changes.html
> @@ -690,6 +690,10 @@ You may also want to check out our
>GCC now supports AMD CPUs based on the znver3 core
>  via -march=znver3.
>
> +  GCC now supports micro-architecture levels defined in the x86-64 psABI
> +via -march=x86-64-v2, -march=x86-64-v3 and
> +-march=x86-64-v4.
> +  
>  

Looks okay to me.  Thanks.

Florian

Re: [PATCH, V2] Add conversions between _Float128 and Decimal.

2021-03-27 Thread Florian Weimer via Gcc-patches

* Michael Meissner via Gcc-patches:

> On Wed, Feb 24, 2021 at 11:12:54PM +, Joseph Myers wrote:
>> This change appears to have broken builds for powerpc in a configuration 
>> that bootstraps a cross toolchain starting with a GCC build with no libc 
>> available.
>> 
>> Specifically, such a bootstrap build uses --disable-decimal-float among 
>> other options (in the first GCC build before libc has been built), to 
>> disable GCC target library code that has any dependence on libc or libc 
>> headers - dfp-bit.c uses libc headers, without an inhibit_libc 
>> conditional, so cannot be used in such a bootstrap configuration.  Most of 
>> the DFP code in libgcc is disabled by --disable-decimal-float, but it 
>> seems the new conversions are not.  This results in errors of the form:
>> 
>> In file included from 
>> /scratch/jmyers/glibc-bot/src/gcc/libgcc/config/rs6000/_kf_to_sd.c:37:
>> /scratch/jmyers/glibc-bot/src/gcc/libgcc/dfp-bit.c:32:10: fatal error: 
>> stdio.h: No such file or directory
>>32 | #include 
>>   |  ^
>> compilation terminated.
>> 
>> The appropriate fix is not to build any of these new conversions in the 
>> --disable-decimal-float case.  (That clearly makes sense anyway, even in 
>> the absence of the bootstrap issue.)
>> 
>> https://sourceware.org/pipermail/libc-testresults/2021q1/007576.html
>
> Thanks.  I agree if --disable-decimal-float is used we should not build the
> conversions.  And we want to eliminate the stdio.h dependency.  I will look at
> it unless somebody has already fixed it.

This issue is still present.

What about the patch below?

Thanks,
Florian

rs6000: Do not build _Float128/Decimal routines with --disable-decimal-float

Fixes commit 781183595acba67a37c66f59a0c1d9b5fee7e248 ("Add conversions
between _Float128 and Decimal.).

libgcc/
* config/rs6000/t-float128 (fp128_ppc_funcs): Add decimal floating
point functions for $(decimal_float) only.

diff --git a/libgcc/config/rs6000/t-float128 b/libgcc/config/rs6000/t-float128
index 6fb1a3d871b..b1f00accdf1 100644
--- a/libgcc/config/rs6000/t-float128
+++ b/libgcc/config/rs6000/t-float128
@@ -37,8 +37,11 @@ ibm128_dec_funcs = _tf_to_sd _tf_to_dd _tf_to_td \
 # New functions for software emulation
 fp128_ppc_funcs= floattikf floatuntikf fixkfti fixunskfti \
  extendkftf2-sw trunctfkf2-sw \
- sfp-exceptions _mulkc3 _divkc3 _powikf2 \
- $(fp128_dec_funcs) $(fp128_decstr_funcs)
+ sfp-exceptions _mulkc3 _divkc3 _powikf2
+
+ifeq ($(decimal_float),yes)
+fp128_ppc_funcs+= $(fp128_dec_funcs) $(fp128_decstr_funcs)
+endif
 
 fp128_ppc_src  = $(addprefix $(srcdir)/config/rs6000/,$(addsuffix \
.c,$(fp128_ppc_funcs)))

Re: [PATCH] improve warning suppression for inlined functions (PR 98465, 98512)

2021-02-19 Thread Florian Weimer via Gcc-patches

* Jeff Law via Gcc-patches:

> I'd lean towards deferring to gcc12 stage1 given the libstdc++ hack is
> in place.  That does mean that glibc will need to work around the
> instance they've stumbled over in ppc's rawmemchr.

We'll need to work around this in the glibc build, too.  I'll check if
the suggested alternative (have the alias covered by the pragma) works
there as well.

Thanks,
Florian

Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Florian Weimer via Gcc-patches

* Peter Bergner:

> On 2/5/21 4:28 AM, Florian Weimer wrote:
>> * Peter Bergner via Gcc-patches:
>> 
>>> The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
>>> __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
>>> __builtin_vsx_disassemble_pair respectively.  It's too late to remove the
>>> old names, so this patch adds support for creating compatibility built-ins
>>> (ie, multiple built-in functions generate the same code) and then creates
>>> compatibility built-ins using the new names.
>> 
>> Maybe add a check that the compatibility builtins are flagged as
>> availble using __has_builtin?
>
> Do you mean add a test in the testsuite for this?  I can check on
> adding that to the test case.

Right, in the test case.  Given that it's a new kind of built-in.
(Not sure if it makes sense.)

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

Re: [PATCH] rs6000: Fix MMA API - Add support for compatibility built-ins

2021-02-05 Thread Florian Weimer via Gcc-patches

* Peter Bergner via Gcc-patches:

> The LLVM and GCC teams agreed to rename the __builtin_mma_assemble_pair and
> __builtin_mma_disassemble_pair built-ins to __builtin_vsx_assemble_pair and
> __builtin_vsx_disassemble_pair respectively.  It's too late to remove the
> old names, so this patch adds support for creating compatibility built-ins
> (ie, multiple built-in functions generate the same code) and then creates
> compatibility built-ins using the new names.

Maybe add a check that the compatibility builtins are flagged as
availble using __has_builtin?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

1 2 >

1 - 100 of 122 matches

Mail list logo