[PATCH 2/2] Ada: Finalization of constrained subtypes of unconstrained synchronized private extensions

2023-08-09 Thread Richard Wai
When generating TSS address finalization bodies for a tagged class-wide
subtype, GNAT climbs the parent chain looking for the first
"non-constrained" type. That type's underlying type's class-wide type is
used as a "designated" type for a dispatching TSS deep finalize call to the
designated class-wide type. In the case of a constrained subtype of an
unconstrained synchronized private extension, this ends up designating the
underlying type of that private extension. This means it targets the
class-wide type of the actual underlying concurrent type rather than the
corresponding record. Ultimately it ends up generating a call to the
corresponding record's deep finalizer, but with incompatible types
(concurrent_type'Class -> concurrent_typeV'Class). This causes compilation
to fail.

 

This patch adds extra logic to exp_ch7(Make_Finalize_Address_Stmts) to
identify such cases and ensure that the designated type is the corresponding
record type's class-wide type in that situation.

 

Patch file is attached.

 

--  Begin change log entry -

 

ada: TSS finalize address subprogram generation for constrained subtypes of
unconstrained synchronized private extensions should take care to designate
the corresponding record of the underlying concurrent type.

 

When generating TSS finalize address subprograms for class-wide types of
constrained root types, it follows the parent chain looking for the first
"non-constrained" type. It is possible that such a type is a private
extension with the "synchronized" keyword, in which case the underlying type
is a concurrent type. When that happens, the designated type of the finalize
address subprogram should be the corresponding record's class-wide-type.

 

Gcc/ada/

* exp_ch3(Expand_Freeze_Class_Wide_Type): Expanded comments
explaining why TSS Finalize_Address is not generated for concurrent
class-wide types.

* exp_ch7(Make_Finalize_Address_Stmts): Handle cases where
the underlying non-constrained parent type is a concurrent type, and adjust
the designated type to be the corresponding record's class-wide type.

 

--  End change log entry -

 

This patch was bootstrapped on x86_64-*-freebsd13.2. One new test cases was
added. Note that 4 gnat test cases fail currently on master and are
unrelated to this patch.

 

Check-ada output of this patch:

 

=== acats tests ===

Running chapter a ...

Running chapter c2 ...

Running chapter c3 ...

Running chapter c4 ...

Running chapter c5 ...

Running chapter c6 ...

Running chapter c7 ...

Running chapter c8 ...

Running chapter c9 ...

Running chapter ca ...

Running chapter cb ...

Running chapter cc ...

Running chapter cd ...

Running chapter ce ...

Running chapter cxa ...

Running chapter cxb ...

Running chapter cxf ...

Running chapter cxg ...

Running chapter cxh ...

Running chapter cz ...

Running chapter d ...

Running chapter e ...

Running chapter l ...

=== acats Summary ===

# of expected passes   2328

# of unexpected failures 0

 

Native configuration is x86_64-unknown-freebsd13.2

 

=== gnat tests ===

 

Schedule of variations:

unix

 

Running target unix

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 14)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 20)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 38)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 42)

 

=== gnat Summary ===

 

# of expected passes   3401

# of unexpected failures 4

# of expected failures  23

# of unsupported tests   10

gnatmake version 14.0.0 20230809 (experimental)

 

 

Richard Wai

ANNEXI-STRAYLINE



ada-tss-constrained-subtype-of-private-synchronized-extention.patch
Description: Binary data


[PATCH] sso-string@gnu-versioned-namespace [PR83077]

2023-08-09 Thread François Dumont via Gcc-patches

Hi

I've eventually completed this work.

This evolution will allow to build libstdc++ without dual abi and using 
cxx11 abi. For the moment such a config is only accessible through the 
--enable-symvers=gnu-versioned-namespace configuration.


    libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi [PR83077]

    Use cxx11 abi when activating versioned namespace mode.

    libstdcxx-v3/ChangeLog:

    PR libstdc++/83077
    * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]: Default 
to "new" libstdcxx abi.
    * config/locale/dragonfly/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Define money_base

    members.
    * config/locale/generic/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.
    * config/locale/gnu/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.

    * config/locale/gnu/numeric_members.cc
    [!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
    * configure: Regenerate.
    * include/bits/c++config
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11, 
_GLIBCXX_BEGIN_NAMESPACE_CXX11):

    Define empty.
[_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11, 
_GLIBCXX_DEFAULT_ABI_TAG):

    Likewise.
    * include/bits/cow_string.h [!_GLIBCXX_USE_CXX11_ABI]: 
Define a light version of COW

    basic_string as __std_cow_string for use in stdexcept.
    * include/std/stdexcept [_GLIBCXX_USE_CXX11_ABI]: Define 
__cow_string.

    (__cow_string(const char*)): New.
    (__cow_string::c_str()): New.
    * python/libstdcxx/v6/printers.py 
(StdStringPrinter::__init__): Set self.new_string to True

    when std::__8::basic_string type is found.
    * src/Makefile.am 
[ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define empty.

    * src/Makefile.in: Regenerate.
    * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
    (dual_abi_sources): ...this. Also move cow-local_init.cc, 
cxx11-hash_tr1.cc,

    cxx11-ios_failure.cc entries to...
    (sources): ...this.
    (extra_string_inst_sources): Move cow-fstream-inst.cc, 
cow-sstream-inst.cc, cow-string-inst.cc,
    cow-string-io-inst.cc, cow-wtring-inst.cc, 
cow-wstring-io-inst.cc, cxx11-locale-inst.cc,

    cxx11-wlocale-inst.cc entries to...
    (inst_sources): ...this.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++11/cow-fstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-sstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-stdexcept.cc [_GLIBCXX_USE_CXX11_ABI]: 
Include .
    [_GLIBCXX_USE_DUAL_ABI || 
_GLIBCXX_USE_CXX11_ABI](__cow_string): Redefine before including
    . Define 
_GLIBCXX_DEFINE_STDEXCEPT_INSTANTIATIONS so that __cow_string definition

    in  is skipped.
    [_GLIBCXX_USE_CXX11_ABI]: Skip Transaction Memory TS 
definitions.
    * src/c++11/cow-string-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-string-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-wstring-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-wstring-io-inst.cc 
[_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/cxx11-hash_tr1.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cxx11-ios_failure.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.

    [!_GLIBCXX_USE_DUAL_ABI] (__ios_failure): Remove.
    * src/c++11/cxx11-locale-inst.cc: Cleanup, just include 
locale-inst.cc.
    * src/c++11/cxx11-stdexcept.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cxx11-wlocale-inst.cc 
[!_GLIBCXX_USE_CXX11_ABI]: Skip definitions.

    * src/c++11/locale-inst-numeric.h
[!_GLIBCXX_USE_DUAL_ABI](std::use_facet>, 
std::use_facet>): Instantiate.
[!_GLIBCXX_USE_DUAL_ABI](std::has_facet>, 
std::has_facet>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](std::num_getistreambuf_iterator>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](std::num_putostreambuf_iterator>): Instantiate.
    * src/c++11/locale-inst.cc [!_GLIBCXX_USE_DUAL_ABI]: Build 
only when configured

    _GLIBCXX_USE_CXX11_ABI is equal to currently built abi.
    [!_GLIBCXX_USE_DUAL_ABI](__moneypunct_cache): 
Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__moneypunct_cache): 
Instantiate.

    [!_GLIBCXX_USE_DUAL_ABI](__numpunct_cache): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__timepunct): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__timepunct_cache): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](time_putostreambuf_iterator>): Instantiate.
    

[PATCH v1] RISC-V: Support RVV VFMACC rounding mode intrinsic API

2023-08-09 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the rounding mode API for the
VFMACC for the below samples.

* __riscv_vfmacc_vv_f32m1_rm
* __riscv_vfmacc_vv_f32m1_rm_m
* __riscv_vfmacc_vf_f32m1_rm
* __riscv_vfmacc_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmacc_frm): New class for vfmacc frm.
(vfmacc_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmacc_frm): New function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-macc.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 25 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  3 ++
 .../riscv/rvv/base/float-point-macc.c | 47 +++
 4 files changed, 76 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index afe3735f5ee..1695d77e8bd 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -356,6 +356,29 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfmacc
+*/
+class vfmacc_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander ) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (true,
+   code_for_pred_mul_scalar (PLUS,
+ e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (true,
+   code_for_pred_mul (PLUS, e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2116,6 +2139,7 @@ static CONSTEXPR const reverse_binop_frm 
vfrdiv_frm_obj;
 static CONSTEXPR const widen_binop vfwmul_obj;
 static CONSTEXPR const widen_binop_frm vfwmul_frm_obj;
 static CONSTEXPR const vfmacc vfmacc_obj;
+static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
 static CONSTEXPR const vfnmsac vfnmsac_obj;
 static CONSTEXPR const vfmadd vfmadd_obj;
 static CONSTEXPR const vfnmsub vfnmsub_obj;
@@ -2351,6 +2375,7 @@ BASE (vfrdiv_frm)
 BASE (vfwmul)
 BASE (vfwmul_frm)
 BASE (vfmacc)
+BASE (vfmacc_frm)
 BASE (vfnmsac)
 BASE (vfmadd)
 BASE (vfnmsub)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 2d2b52a312c..67d18412b4c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -160,6 +160,7 @@ extern const function_base *const vfrdiv_frm;
 extern const function_base *const vfwmul;
 extern const function_base *const vfwmul_frm;
 extern const function_base *const vfmacc;
+extern const function_base *const vfmacc_frm;
 extern const function_base *const vfnmsac;
 extern const function_base *const vfmadd;
 extern const function_base *const vfnmsub;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index d43b33ded17..92ecf8a9065 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -349,6 +349,9 @@ DEF_RVV_FUNCTION (vfnmadd, alu, full_preds, f_vvfv_ops)
 DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f__ops)
 DEF_RVV_FUNCTION (vfmsub, alu, full_preds, f_vvfv_ops)
 
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfmacc_frm, alu_frm, full_preds, f_vvfv_ops)
+
 // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwfv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
new file mode 100644
index 000..df29f4d240f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-macc.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmacc_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+  vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmacc_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmacc_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+  vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmacc_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+

[PATCH 1/2] Ada: Synchronized private extensions are always limited

2023-08-09 Thread Richard Wai
GNAT currently considers a synchronized private extension that derives from
an interface to be limited only when said interface is a concurrent
interface. However it is of course legal for a synchronized private
extension to derive from a limited interface. In this case GNAT fails to
correctly determine that the private extension is limited.

 

This causes two separate problems that makes discriminated types in such a
case impossible:

1.  GNAT inappropriately rejects compilation, claiming default
discriminants on such a private extension are illegal.
2.  GNAT fails to generate the expected discriminals for the
unconstrained discriminanted case, leading to the corresponding
discriminants of the "corresponding record" of the underlying concurrent
type to have no identifiers, and thus compilation fails.

 

Fairly simple fix. If "synchronized" appears in the private extension
declaration, it is limited. This is explicit in the RM as well (7.3(6/2)).

 

Fixing this bug uncovered of a related bug wrt. TSS address finalizer
generation for constrained subtypes of synchronized private extensions with
no default discriminants. That patch is to follow separately.

 

Patch file is attached.

 

--  Begin change log entry --

 

ada: Private extensions with the keyword "synchronized" are always limited.

 

GNAT was relying on synchronized private type extensions deriving from a
concurrent interface to determine its limitedness. This does not cover the
case where such an extension derives a limited interface. RM-7.6(6/2) makes
is clear that "synchronized" in a private extension implies the derived type
is limited. GNAT should explicitly check for the presence of "synchronized"
in a private extension declaration, and it should have the same effect as
the presence of "limited".

 

gcc/ada/

* sem_ch3.adb (Build_Derived_Record_Type): Treat presence of
keyword "synchronized" the same as "limited" when determining if a private
extension is limited.

 

-- End change log entry --

 

This patch was bootstrapped on x86_64-*-freebsd13.2. Two new test cases were
added. Note that 4 gnat test cases fail currently on master and are
unrelated to this patch.

 

Check-ada output of this patch:

=== acats tests ===

Running chapter a ...

Running chapter c2 ...

Running chapter c3 ...

Running chapter c4 ...

Running chapter c5 ...

Running chapter c6 ...

Running chapter c7 ...

Running chapter c8 ...

Running chapter c9 ...

Running chapter ca ...

Running chapter cb ...

Running chapter cc ...

Running chapter cd ...

Running chapter ce ...

Running chapter cxa ...

Running chapter cxb ...

Running chapter cxf ...

Running chapter cxg ...

Running chapter cxh ...

Running chapter cz ...

Running chapter d ...

Running chapter e ...

Running chapter l ...

=== acats Summary ===

# of expected passes   2328

# of unexpected failures 0

 

Native configuration is x86_64-unknown-freebsd13.2

 

=== gnat tests ===

 

Schedule of variations:

unix

 

Running target unix

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 14)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 20)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 38)

FAIL: gnat.dg/specs/alignment2.ads  (test for warnings, line 42)

 

=== gnat Summary ===

 

# of expected passes   3402

# of unexpected failures4

# of expected failures      23

# of unsupported tests   10

gnatmake version 14.0.0 20230809 (experimental)

 

 

Richard Wai

ANNEXI-STRAYLINE



ada-synchronized-private-types-are-limited.patch
Description: Binary data


RE: [PATCH v3] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Li, Pan2 via Gcc-patches
Committed with some cleanup in change log, thanks Kito.

Pan

From: Kito Cheng 
Sent: Thursday, August 10, 2023 11:14 AM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Jeff Law 
; Wang, Yanzhang ; Kito Cheng 

Subject: Re: [PATCH v3] RISC-V: Refactor RVV frm_mode attr for rounding mode 
intrinsic

LGTM

mailto:pan2...@intel.com>> 於 2023年8月10日 週四 11:12 寫道:
From: Pan Li mailto:pan2...@intel.com>>

The frm_mode attr has some assumptions for each define insn as below.

1. The define insn has at least 9 operands.
2. The operands[9] must be frm reg.
3. The operands[9] must be const int.

Actually, the frm operand can be operands[8], operands[9] or
operands[10], and not all the define insn has frm operands.

This patch would like to refactor frm and eliminate the above
assumptions, as well as unblock the underlying rounding mode intrinsic
API support.

After refactor, the default frm will be none, and the selected insn type
will be dyn. For the floating point which honors the frm, we will
set the frm_mode attr explicitly in define_insn.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>
Co-Authored-by: Kito Cheng mailto:kito.ch...@sifive.com>>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_frm_mode): Remove operand
assumptions.
* config/riscv/riscv-v.cc (get_frm_mode): New function.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn):
* config/riscv/vector.md: Set frm_mode attr explicitly.

gcc/ChangeLog:

* config/riscv/riscv-protos.h
(enum floating_point_rounding_mode): Add NONE, DYN_EXIT and DYN_CALL.
(get_frm_mode): New declaration.
* config/riscv/riscv-v.cc (get_frm_mode): New function to get frm mode.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn): Take care of frm reg.
* config/riscv/riscv.cc (riscv_static_frm_mode_p): Migrate to FRM_XXX.
(riscv_emit_frm_mode_set): Ditto.
(riscv_emit_mode_set): Ditto.
(riscv_frm_adjust_mode_after_call): Ditto.
(riscv_frm_mode_needed): Ditto.
(riscv_frm_mode_after): Ditto.
(riscv_mode_entry): Ditto.
(riscv_mode_exit): Ditto.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
* config/riscv/vector.md
(rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none): Removed
(symbol_ref): * config/riscv/vector.md: Set frm_mode attr explicitly.
---
 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-v.cc   |  29 
 gcc/config/riscv/riscv-vector-builtins.cc |  22 ++-
 gcc/config/riscv/riscv.cc |  46 +++---
 gcc/config/riscv/riscv.h  |   2 +-
 gcc/config/riscv/vector.md| 172 ++
 6 files changed, 186 insertions(+), 89 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ac9c4b02f17..2fbed04ff84 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -349,8 +349,12 @@ enum floating_point_rounding_mode
   FRM_DYN = 7, /* Aka 0b111.  */
   FRM_STATIC_MIN = FRM_RNE,
   FRM_STATIC_MAX = FRM_RMM,
+  FRM_DYN_EXIT = 8,
+  FRM_DYN_CALL = 9,
+  FRM_NONE = 10,
 };

+enum floating_point_rounding_mode get_frm_mode (rtx);
 opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
 poly_uint64);
 unsigned int autovectorize_vector_modes (vec *, bool);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 0bea04c1967..c9f0a4a9e7b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -112,6 +112,7 @@ public:
   {
 m_has_fp_rounding_mode_p = true;
 m_fp_rounding_mode = mode;
+gcc_assert (mode <= FRM_DYN);
   }

   void add_output_operand (rtx x, machine_mode mode)
@@ -1590,6 +1591,34 @@ expand_const_vector (rtx target, rtx src)
 gcc_unreachable ();
 }

+/* Get the frm mode with given CONST_INT rtx, the default mode is
+   FRM_DYN.  */
+enum floating_point_rounding_mode
+get_frm_mode (rtx operand)
+{
+  gcc_assert (CONST_INT_P (operand));
+
+  switch (INTVAL (operand))
+{
+case FRM_RNE:
+  return FRM_RNE;
+case FRM_RTZ:
+  return FRM_RTZ;
+case FRM_RDN:
+  return FRM_RDN;
+case FRM_RUP:
+  return FRM_RUP;
+case FRM_RMM:
+  return FRM_RMM;
+case FRM_DYN:
+  return FRM_DYN;
+default:
+  gcc_unreachable ();
+}
+
+  gcc_unreachable ();
+}
+
 /* Expand a pre-RA RVV data move from SRC to DEST.
It expands move for RVV fractional vector modes.  */
 bool
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 528dca7ae85..abab06c00ed 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3730,17 +3730,29 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
 }

   for (int argno = arg_offset; 

RE: Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread Li, Pan2 via Gcc-patches
Thanks Jeff. Ported to gcc-13 with minor changes to test cases.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of juzhe.zh...@rivai.ai
Sent: Thursday, August 10, 2023 8:50 AM
To: jeffreyalaw ; gcc-patches 
Cc: kito.cheng ; Kito.cheng ; 
Robin Dapp 
Subject: Re: Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate 
[VSETVL PASS]

Yes. I think so. Will backport GCC 13 soon.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-10 01:01
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL 
PASS]
 
 
On 8/9/23 04:51, Juzhe-Zhong wrote:
> Realize we have a bug in VSETVL PASS which is triggered by 
> strided_load_run-1.c in RV32 system.
> 
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> 
> This is because VSETVL PASS incorrect hoist vsetvl instruction:
> 
> ...
> 10156: 0d9075d7  vsetvli a1,zero,e64,m2,ta,ma ---> pollute 'a1' 
> register which will be used by following insns.
> 1015a: 01d586b3  add a3,a1,t4  > use 'a1'
> 1015e: 5e070257  vmv.v.v v4,v14
> 10162: b7032257  vmacc.vv v4,v6,v16
> 10166: 26440257  vand.vv v4,v4,v8
> 1016a: 22880227  vs2r.v v4,(a6)
> 1016e: 00b6b7b3  sltu a5,a3,a1
> 10172: 22888227  vs2r.v v4,(a7)
> 10176: 9e60b157  vmv2r.v v2,v6
> 1017a: 97baadd a5,a5,a4
> 1017c: a6a62157  vmadd.vv v2,v12,v10
> 10180: 26240157  vand.vv v2,v2,v8
> 10184: 22830127  vs2r.v v2,(t1)
> 10188: 873emv a4,a5
> 1018a: 982aadd a6,a6,a0
> 1018c: 98aaadd a7,a7,a0
> 1018e: 932aadd t1,t1,a0
> 10190: 85b6mv a1,a3   -> set 'a1'
> ...
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix 
> incorrect anticipate info.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
> Adapt test.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-36.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
OK.
 
Do we need to backport this to gcc-13?
 
jeff
 


Re: [PATCH v3] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Kito Cheng via Gcc-patches
LGTM

 於 2023年8月10日 週四 11:12 寫道:

> From: Pan Li 
>
> The frm_mode attr has some assumptions for each define insn as below.
>
> 1. The define insn has at least 9 operands.
> 2. The operands[9] must be frm reg.
> 3. The operands[9] must be const int.
>
> Actually, the frm operand can be operands[8], operands[9] or
> operands[10], and not all the define insn has frm operands.
>
> This patch would like to refactor frm and eliminate the above
> assumptions, as well as unblock the underlying rounding mode intrinsic
> API support.
>
> After refactor, the default frm will be none, and the selected insn type
> will be dyn. For the floating point which honors the frm, we will
> set the frm_mode attr explicitly in define_insn.
>
> Signed-off-by: Pan Li 
> Co-Authored-by: Kito Cheng 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (get_frm_mode): Remove operand
> assumptions.
> * config/riscv/riscv-v.cc (get_frm_mode): New function.
> * config/riscv/riscv-vector-builtins.cc
> (function_expander::use_ternop_insn):
> * config/riscv/vector.md: Set frm_mode attr explicitly.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h
> (enum floating_point_rounding_mode): Add NONE, DYN_EXIT and
> DYN_CALL.
> (get_frm_mode): New declaration.
> * config/riscv/riscv-v.cc (get_frm_mode): New function to get frm
> mode.
> * config/riscv/riscv-vector-builtins.cc
> (function_expander::use_ternop_insn): Take care of frm reg.
> * config/riscv/riscv.cc (riscv_static_frm_mode_p): Migrate to
> FRM_XXX.
> (riscv_emit_frm_mode_set): Ditto.
> (riscv_emit_mode_set): Ditto.
> (riscv_frm_adjust_mode_after_call): Ditto.
> (riscv_frm_mode_needed): Ditto.
> (riscv_frm_mode_after): Ditto.
> (riscv_mode_entry): Ditto.
> (riscv_mode_exit): Ditto.
> * config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
> * config/riscv/vector.md
> (rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none): Removed
> (symbol_ref): * config/riscv/vector.md: Set frm_mode attr
> explicitly.
> ---
>  gcc/config/riscv/riscv-protos.h   |   4 +
>  gcc/config/riscv/riscv-v.cc   |  29 
>  gcc/config/riscv/riscv-vector-builtins.cc |  22 ++-
>  gcc/config/riscv/riscv.cc |  46 +++---
>  gcc/config/riscv/riscv.h  |   2 +-
>  gcc/config/riscv/vector.md| 172 ++
>  6 files changed, 186 insertions(+), 89 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h
> b/gcc/config/riscv/riscv-protos.h
> index ac9c4b02f17..2fbed04ff84 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -349,8 +349,12 @@ enum floating_point_rounding_mode
>FRM_DYN = 7, /* Aka 0b111.  */
>FRM_STATIC_MIN = FRM_RNE,
>FRM_STATIC_MAX = FRM_RMM,
> +  FRM_DYN_EXIT = 8,
> +  FRM_DYN_CALL = 9,
> +  FRM_NONE = 10,
>  };
>
> +enum floating_point_rounding_mode get_frm_mode (rtx);
>  opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
>  poly_uint64);
>  unsigned int autovectorize_vector_modes (vec *, bool);
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 0bea04c1967..c9f0a4a9e7b 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -112,6 +112,7 @@ public:
>{
>  m_has_fp_rounding_mode_p = true;
>  m_fp_rounding_mode = mode;
> +gcc_assert (mode <= FRM_DYN);
>}
>
>void add_output_operand (rtx x, machine_mode mode)
> @@ -1590,6 +1591,34 @@ expand_const_vector (rtx target, rtx src)
>  gcc_unreachable ();
>  }
>
> +/* Get the frm mode with given CONST_INT rtx, the default mode is
> +   FRM_DYN.  */
> +enum floating_point_rounding_mode
> +get_frm_mode (rtx operand)
> +{
> +  gcc_assert (CONST_INT_P (operand));
> +
> +  switch (INTVAL (operand))
> +{
> +case FRM_RNE:
> +  return FRM_RNE;
> +case FRM_RTZ:
> +  return FRM_RTZ;
> +case FRM_RDN:
> +  return FRM_RDN;
> +case FRM_RUP:
> +  return FRM_RUP;
> +case FRM_RMM:
> +  return FRM_RMM;
> +case FRM_DYN:
> +  return FRM_DYN;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  gcc_unreachable ();
> +}
> +
>  /* Expand a pre-RA RVV data move from SRC to DEST.
> It expands move for RVV fractional vector modes.  */
>  bool
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 528dca7ae85..abab06c00ed 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -3730,17 +3730,29 @@ function_expander::use_ternop_insn (bool
> vd_accum_p, insn_code icode)
>  }
>
>for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++)
> -add_input_operand (argno);
> +{
> +  if (base->has_rounding_mode_operand_p ()
> +

RE: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Li, Pan2 via Gcc-patches
Thanks Kito, update it in PATCH v3, and passed riscv/rvv.exp already.

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626918.html

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 10, 2023 10:21 AM
To: Li, Pan2 
Cc: Kito Cheng ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, Yanzhang 

Subject: Re: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode 
intrinsic

Yeah, no further comment from me :)

On Thu, Aug 10, 2023 at 10:16 AM Li, Pan2  wrote:
>
> Thanks kito. It makes sense, should not reach default, may I prepare v3(add 
> gcc_unreachable to default) if no more comments?
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Thursday, August 10, 2023 10:12 AM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; 
> Wang, Yanzhang ; Kito Cheng 
> Subject: Re: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode 
> intrinsic
>
> > +/* Get the frm mode with given CONST_INT rtx, the default mode is
> > +   FRM_DYN.  */
> > +enum floating_point_rounding_mode
> > +get_frm_mode (rtx operand)
> > +{
> > +  gcc_assert (CONST_INT_P (operand));
> > +
> > +  switch (INTVAL (operand))
> > +{
> > +case FRM_RNE:
> > +  return FRM_RNE;
> > +case FRM_RTZ:
> > +  return FRM_RTZ;
> > +case FRM_RDN:
> > +  return FRM_RDN;
> > +case FRM_RUP:
> > +  return FRM_RUP;
> > +case FRM_RMM:
> > +  return FRM_RMM;
> > +case FRM_DYN:
> > +  return FRM_DYN;
> > +default:
> > +  return FRM_DYN;
>
> Should we put a gcc_unreachable or gcc_assert here? I am not sure if
> another value is valid when it appeared for this operand?
>
> > +}
> > +
> > +  gcc_unreachable ();
> > +}


[PATCH v3] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Pan Li via Gcc-patches
From: Pan Li 

The frm_mode attr has some assumptions for each define insn as below.

1. The define insn has at least 9 operands.
2. The operands[9] must be frm reg.
3. The operands[9] must be const int.

Actually, the frm operand can be operands[8], operands[9] or
operands[10], and not all the define insn has frm operands.

This patch would like to refactor frm and eliminate the above
assumptions, as well as unblock the underlying rounding mode intrinsic
API support.

After refactor, the default frm will be none, and the selected insn type
will be dyn. For the floating point which honors the frm, we will
set the frm_mode attr explicitly in define_insn.

Signed-off-by: Pan Li 
Co-Authored-by: Kito Cheng 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_frm_mode): Remove operand
assumptions.
* config/riscv/riscv-v.cc (get_frm_mode): New function.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn):
* config/riscv/vector.md: Set frm_mode attr explicitly.

gcc/ChangeLog:

* config/riscv/riscv-protos.h
(enum floating_point_rounding_mode): Add NONE, DYN_EXIT and DYN_CALL.
(get_frm_mode): New declaration.
* config/riscv/riscv-v.cc (get_frm_mode): New function to get frm mode.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn): Take care of frm reg.
* config/riscv/riscv.cc (riscv_static_frm_mode_p): Migrate to FRM_XXX.
(riscv_emit_frm_mode_set): Ditto.
(riscv_emit_mode_set): Ditto.
(riscv_frm_adjust_mode_after_call): Ditto.
(riscv_frm_mode_needed): Ditto.
(riscv_frm_mode_after): Ditto.
(riscv_mode_entry): Ditto.
(riscv_mode_exit): Ditto.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
* config/riscv/vector.md
(rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none): Removed
(symbol_ref): * config/riscv/vector.md: Set frm_mode attr explicitly.
---
 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-v.cc   |  29 
 gcc/config/riscv/riscv-vector-builtins.cc |  22 ++-
 gcc/config/riscv/riscv.cc |  46 +++---
 gcc/config/riscv/riscv.h  |   2 +-
 gcc/config/riscv/vector.md| 172 ++
 6 files changed, 186 insertions(+), 89 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index ac9c4b02f17..2fbed04ff84 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -349,8 +349,12 @@ enum floating_point_rounding_mode
   FRM_DYN = 7, /* Aka 0b111.  */
   FRM_STATIC_MIN = FRM_RNE,
   FRM_STATIC_MAX = FRM_RMM,
+  FRM_DYN_EXIT = 8,
+  FRM_DYN_CALL = 9,
+  FRM_NONE = 10,
 };
 
+enum floating_point_rounding_mode get_frm_mode (rtx);
 opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
 poly_uint64);
 unsigned int autovectorize_vector_modes (vec *, bool);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 0bea04c1967..c9f0a4a9e7b 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -112,6 +112,7 @@ public:
   {
 m_has_fp_rounding_mode_p = true;
 m_fp_rounding_mode = mode;
+gcc_assert (mode <= FRM_DYN);
   }
 
   void add_output_operand (rtx x, machine_mode mode)
@@ -1590,6 +1591,34 @@ expand_const_vector (rtx target, rtx src)
 gcc_unreachable ();
 }
 
+/* Get the frm mode with given CONST_INT rtx, the default mode is
+   FRM_DYN.  */
+enum floating_point_rounding_mode
+get_frm_mode (rtx operand)
+{
+  gcc_assert (CONST_INT_P (operand));
+
+  switch (INTVAL (operand))
+{
+case FRM_RNE:
+  return FRM_RNE;
+case FRM_RTZ:
+  return FRM_RTZ;
+case FRM_RDN:
+  return FRM_RDN;
+case FRM_RUP:
+  return FRM_RUP;
+case FRM_RMM:
+  return FRM_RMM;
+case FRM_DYN:
+  return FRM_DYN;
+default:
+  gcc_unreachable ();
+}
+
+  gcc_unreachable ();
+}
+
 /* Expand a pre-RA RVV data move from SRC to DEST.
It expands move for RVV fractional vector modes.  */
 bool
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 528dca7ae85..abab06c00ed 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3730,17 +3730,29 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
 }
 
   for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++)
-add_input_operand (argno);
+{
+  if (base->has_rounding_mode_operand_p ()
+ && argno == call_expr_nargs (exp) - 2)
+   {
+ /* Since the rounding mode argument position is not consistent with
+the instruction pattern, we need to skip rounding mode argument
+here.  */
+ continue;
+   }
+  add_input_operand (argno);
+  

[PATCH 1/1] RISC-V: Make "prefetch.i" built-in usable

2023-08-09 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

The "__builtin_riscv_zicbop_cbo_prefetchi" built-in function was terribly
broken so that practically unusable.  It emitted "prefetch.i" but with no
meaningful arguments.

Though incompatible, this commit completely changes the function prototype
of this built-in and makes it usable.  To minimize the functionality issues,
it renames the built-in to "__builtin_riscv_zicbop_prefetch_i".

gcc/ChangeLog:

* config/riscv/riscv-cmo.def: Fix function prototype.
* config/riscv/riscv.md (riscv_prefetchi_): Fix instruction
prototype.  Remove possible prefectch type argument
* doc/extend.texi: Document __builtin_riscv_zicbop_prefetch_i.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-1.c: Reflect new built-in prototype.
* gcc.target/riscv/cmo-zicbop-2.c: Likewise.
---
 gcc/config/riscv/riscv-cmo.def|  4 ++--
 gcc/config/riscv/riscv.md |  5 ++---
 gcc/doc/extend.texi   |  7 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  8 +---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 10 ++
 5 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-cmo.def b/gcc/config/riscv/riscv-cmo.def
index b92044dc6ff9..2286c8a25544 100644
--- a/gcc/config/riscv/riscv-cmo.def
+++ b/gcc/config/riscv/riscv-cmo.def
@@ -13,8 +13,8 @@ RISCV_BUILTIN (zero_si, "zicboz_cbo_zero", 
RISCV_BUILTIN_DIRECT_NO_TARGET, RISCV
 RISCV_BUILTIN (zero_di, "zicboz_cbo_zero", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE_VOID_PTR, zero64),
 
 // zicbop
-RISCV_BUILTIN (prefetchi_si, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_SI_FTYPE_SI, prefetchi32),
-RISCV_BUILTIN (prefetchi_di, "zicbop_cbo_prefetchi", RISCV_BUILTIN_DIRECT, 
RISCV_DI_FTYPE_DI, prefetchi64),
+RISCV_BUILTIN (prefetchi_si, "zicbop_prefetch_i", 
RISCV_BUILTIN_DIRECT_NO_TARGET, RISCV_VOID_FTYPE_VOID_PTR, prefetchi32),
+RISCV_BUILTIN (prefetchi_di, "zicbop_prefetch_i", 
RISCV_BUILTIN_DIRECT_NO_TARGET, RISCV_VOID_FTYPE_VOID_PTR, prefetchi64),
 
 // zbkc or zbc
 RISCV_BUILTIN (clmul_si, "clmul", RISCV_BUILTIN_DIRECT, RISCV_SI_FTYPE_SI_SI, 
clmul_zbkc32_or_zbc32),
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688fd697255b..5658c7b7e113 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3273,9 +3273,8 @@
 })
 
 (define_insn "riscv_prefetchi_"
-  [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
-  (match_operand:X 1 "imm5_operand" "i")]
-  UNSPECV_PREI)]
+  [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")]
+UNSPECV_PREI)]
   "TARGET_ZICBOP"
   "prefetch.i\t%a0"
 )
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89c5b4ea2b20..0eb98fc89e3f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21575,6 +21575,13 @@ Xgnuzihintpausestate extension, which redefines the 
@code{pause} instruction to
 change architectural state.
 @enddefbuiltin
 
+@defbuiltin{void __builtin_riscv_zicbop_prefetch_i (void *@var{addr})}
+Generates the @code{prefetch.i} machine instruction to instruct the hardware
+that a cache block containing @var{addr} is likely to be accessed by an
+instruction fetch in the near future.
+Available if the Zicbop extension is enabled.
+@enddefbuiltin
+
 @node RISC-V Vector Intrinsics
 @subsection RISC-V Vector Intrinsics
 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
index c5d78c1763d3..0d5b58c4fadf 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c
@@ -13,11 +13,13 @@ void foo (char *p)
   __builtin_prefetch (p, 1, 3);
 }
 
-int foo1()
+void foo1 ()
 {
-  return __builtin_riscv_zicbop_cbo_prefetchi(1);
+  __builtin_riscv_zicbop_prefetch_i(0);
+  __builtin_riscv_zicbop_prefetch_i();
+  __builtin_riscv_zicbop_prefetch_i((void*)0x111);
 }
 
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.i" 3 } } */
 /* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
 /* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
index 6576365b39ca..09655c4b8593 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c
@@ -13,11 +13,13 @@ void foo (char *p)
   __builtin_prefetch (p, 1, 3);
 }
 
-int foo1()
+void foo1 ()
 {
-  return __builtin_riscv_zicbop_cbo_prefetchi(1);
+  __builtin_riscv_zicbop_prefetch_i(0);
+  __builtin_riscv_zicbop_prefetch_i();
+  __builtin_riscv_zicbop_prefetch_i((void*)0x111);
 }
 
-/* { dg-final { scan-assembler-times "prefetch.i" 1 } } */
+/* { dg-final { scan-assembler-times "prefetch.i" 3 } } */
 /* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
-/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */ 
+/* { 

[PATCH 0/1] RISC-V: Make "prefetch.i" built-in usable

2023-08-09 Thread Tsukasa OI via Gcc-patches
Hello,

I found that a built-in function "__builtin_riscv_zicbop_cbo_prefetchi" is
terribly broken so that this is practically unusable.  It emits the
"prefetch.i" machine instruction HINT but with no usable arguments.

Contents of a.c:

> void function_to_be_called(void);
> 
> void sample(void)
> {
> __builtin_riscv_zicbop_cbo_prefetchi(_to_be_called);
> function_to_be_called();
> }


Compiling with the 'Zicbop' extension fails.

> $ riscv64-unknown-elf-gcc -O2 -march=rv64i_zicbop -mabi=lp64 -c a.c
> a.c: In function 'sample':
> a.c:5:42: warning: passing argument 1 of 
> '__builtin_riscv_zicbop_cbo_prefetchi' makes integer from pointer without a 
> cast [-Wint-conversion]
> 5 | __builtin_riscv_zicbop_cbo_prefetchi(_to_be_called);
>   |  ^~
>   |  |
>   |  void (*)(void)
> a.c:5:42: note: expected 'long int' but argument is of type 'void (*)(void)'
> a.c:5:5: error: invalid argument to built-in function
> 5 | __builtin_riscv_zicbop_cbo_prefetchi(_to_be_called);
>   | ^~~~

seems the prototype of this built-in function is...

> int  __builtin_riscv_zicbop_cbo_prefetchi(int);  // RV32
> long __builtin_riscv_zicbop_cbo_prefetchi(long); // RV64

I started to have a bad feeling about this.




Looking at the test case, following code compiles (sort of).

> void sample(void)
> {
> __builtin_riscv_zicbop_cbo_prefetchi(1); /* integer constant: 0-4 
> (inclusive) */
> }

WHAT IS THIS PREFETCHING!?

The answer was obvious (tested with GCC 13.2.0).

> $ riscv64-unknown-elf-gcc -O2 -march=rv64i_zicbop -mabi=lp64 -S a.c
> $ cat a.s
> .file   "a.c"
> .option nopic
> .attribute arch, "rv64i2p1_zicbop1p0"
> .attribute unaligned_access, 0
> .attribute stack_align, 16
> .text
> .align  2
> .globl  sample
> .type   sample, @function
> sample:
> li  a5,0
> prefetch.i  0(a5)
> ret
> .size   sample, .-sample
> .ident  "GCC: (gc891d8dc23e1) 13.2.0"

... none (NULL).  Even the prefetching constant (1) is ignored.




It makes no sense whatsoever.  This patch set is definitely compatibility-
breaking but necessary to make "prefetch.i" work.

This patch set makes following built-in function.

> void  __builtin_riscv_zicbop_prefetch_i(void*);

Did you notice that I renamed the function?  There are three reasons:

1.  We needed to break the compatibility anyway.
2.  To avoid functionality problems.
Co-existing functional and non-functional
__builtin_riscv_zicbop_cbo_prefetchi built-in felt like a nightmare.
3.  Although this is also a cache block operation, the instruction
"prefetch.i" does not have the name "cbo" in it
(unlike "__builtin_riscv_zicbom_cbo_clean" corresponding "cbo.clean").


Let's make a working sample (a.c):

> void function_to_be_called(void);
> 
> void sample(void)
> {
> __builtin_riscv_zicbop_prefetch_i(_to_be_called);
> function_to_be_called();
> }

And take look at the output assembly and its relocations.

> $ riscv64-unknown-elf-gcc -O2 -march=rv64i_zicbop -mabi=lp64 -S a.c
> $ cat a.s
> .file   "a.c"
> .option nopic
> .attribute arch, "rv64i2p1_zicbop1p0"
> .attribute unaligned_access, 0
> .attribute stack_align, 16
> .text
> .align  2
> .globl  sample
> .type   sample, @function
> sample:
> lui a5,%hi(function_to_be_called)
> addia5,a5,%lo(function_to_be_called)
> prefetch.i  0(a5)
> tailfunction_to_be_called
> .size   sample, .-sample
> .ident  "GCC: (GNU) 14.0.0 20230810 (experimental)"



I hope this issue is fixed soon.

Sincerely,
Tsukasa




Tsukasa OI (1):
  RISC-V: Make "prefetch.i" built-in usable

 gcc/config/riscv/riscv-cmo.def|  4 ++--
 gcc/config/riscv/riscv.md |  5 ++---
 gcc/doc/extend.texi   |  7 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  8 +---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 10 ++
 5 files changed, 22 insertions(+), 12 deletions(-)


base-commit: 9b099a83b45b8fcdfc07d518e05d36ea741b2227
-- 
2.41.0



Re: [Patch, fortran] PR109684 - compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-09 Thread Jerry D via Gcc-patches

On 8/9/23 7:58 AM, Paul Richard Thomas via Fortran wrote:

I took a look at my calendar and decided to backport right away.

r13-7703-ged049e5d5f36cc0f4318cd93bb6b33ed6f6f2ba7

BTW It is a regression :-)

Paul

On Wed, 9 Aug 2023 at 12:10, Paul Richard Thomas
 wrote:


Committed to trunk as 'obvious' in
r14-3098-gb8ec3c952324f866f191883473922e250be81341

13-branch to follow in a few days.

Paul


Thanks Paul, I was about ready to do it myself.

Jerry


[RFC PATCH 0/2] RISC-V: Make __builtin_riscv_pause 'Zihintpause' only

2023-08-09 Thread Tsukasa OI via Gcc-patches
**WARNING**

Following patch sets are exclusive:

1.  [RFC PATCH v2] RISC-V: __builtin_riscv_pause for all environment
2.  [RFC PATCH] RISC-V: Make __builtin_riscv_pause 'Zihintpause' only (this)

See 
for the background of this patch set.


Changes: (all environment) v1 -> ('Zihintpause'-only) v1

*   Based on the feedback from Jeff Law, I made an alternative patch set
which makes "__builtin_riscv_pause" 'Zihintpause'-only.


Comparison: Patch sets [1] and [2] (this)

*   [1] completely preserves the compatibility with GCC 13
([2] removes __builtin_riscv_pause if the 'Zihintpause' extension is
 absent, making a minor compatibility issue)
*   Because of the nature of the instrinsic, [2] is more natural
("pause" is defined in the 'Zihintpause' extension).


Please consider those patch sets and decide which to apply.

Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: Make __builtin_riscv_pause 'Zihintpause' only
  RISC-V: Fix documentation of __builtin_riscv_pause

 gcc/common/config/riscv/riscv-common.cc|  2 ++
 gcc/config/riscv/riscv-builtins.cc |  4 ++--
 gcc/config/riscv/riscv-opts.h  |  2 ++
 gcc/config/riscv/riscv.md  |  2 +-
 gcc/doc/extend.texi|  6 +++---
 gcc/testsuite/gcc.target/riscv/builtin_pause.c | 10 --
 gcc/testsuite/gcc.target/riscv/zihintpause-1.c | 11 +++
 gcc/testsuite/gcc.target/riscv/zihintpause-2.c | 11 +++
 8 files changed, 32 insertions(+), 16 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/builtin_pause.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-2.c


base-commit: 9b099a83b45b8fcdfc07d518e05d36ea741b2227
-- 
2.41.0



[RFC PATCH 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-09 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.
---
 gcc/doc/extend.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89c5b4ea2b20..7be27430666a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21570,9 +21570,9 @@ Returns the value that is currently set in the 
@samp{tp} register.
 @enddefbuiltin
 
 @defbuiltin{void __builtin_riscv_pause (void)}
-Generates the @code{pause} (hint) machine instruction.  This implies the
-Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
-change architectural state.
+Generates the @code{pause} (hint) machine instruction, indicating that the
+current hart should be temporarily paused or slowed down.
+Available if the Zihintpause extension is enabled.
 @enddefbuiltin
 
 @node RISC-V Vector Intrinsics
-- 
2.41.0



[RFC PATCH 1/2] RISC-V: Make __builtin_riscv_pause 'Zihintpause' only

2023-08-09 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
(in the assembler).  However, GCC emits "pause" unconditionally, making
an assembler error while compiling code with __builtin_riscv_pause while
the 'Zihintpause' extension disabled.

Despite that the "pause" instruction code (0x010f) is a HINT and
emitting its instruction code is safe in any environment, enabling the
built-in only for certain environment is not a good idea.

This commit implements handling for the 'Zihintpause' extension and makes
__built_riscv_pause built-in 'Zihintpause'-only.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Implement the 'Zihintpause' extension,
version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause", corresponding the existence of the
'Zihintpause' extension.
(riscv_builtins) Change che "riscv_pause" requirements.
* config/riscv/riscv-opts.h
(MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Make it only available when
the 'Zihintpause' extension is enabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c:
New test when the 'Zihintpause' extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
---
 gcc/common/config/riscv/riscv-common.cc|  2 ++
 gcc/config/riscv/riscv-builtins.cc |  4 ++--
 gcc/config/riscv/riscv-opts.h  |  2 ++
 gcc/config/riscv/riscv.md  |  2 +-
 gcc/testsuite/gcc.target/riscv/builtin_pause.c | 10 --
 gcc/testsuite/gcc.target/riscv/zihintpause-1.c | 11 +++
 gcc/testsuite/gcc.target/riscv/zihintpause-2.c | 11 +++
 7 files changed, 29 insertions(+), 13 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/builtin_pause.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-2.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index f25131eab28b..b77fdb909567 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -209,6 +209,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zihintntl", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zihintpause", ISA_SPEC_CLASS_NONE, 2, 0},
 
   {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1343,6 +1344,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zkt",_options::x_riscv_zk_subext, MASK_ZKT},
 
   {"zihintntl", _options::x_riscv_zi_subext, MASK_ZIHINTNTL},
+  {"zihintpause", _options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
 
   {"zicboz", _options::x_riscv_zicmo_subext, MASK_ZICBOZ},
   {"zicbom", _options::x_riscv_zicmo_subext, MASK_ZICBOM},
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 79681d759628..73e1512a5992 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -122,7 +122,7 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || TARGET_ZBC) 
&& !TARGET_64BIT)
 AVAIL (clmul_zbkc64_or_zbc64, (TARGET_ZBKC || TARGET_ZBC) && TARGET_64BIT)
 AVAIL (clmulr_zbc32, TARGET_ZBC && !TARGET_64BIT)
 AVAIL (clmulr_zbc64, TARGET_ZBC && TARGET_64BIT)
-AVAIL (always, (!0))
+AVAIL (hint_pause, TARGET_ZIHINTPAUSE)
 
 /* Construct a riscv_builtin_description from the given arguments.
 
@@ -179,7 +179,7 @@ static const struct riscv_builtin_description 
riscv_builtins[] = {
 
   DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
   DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float),
-  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, always),
+  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, hint_pause),
 };
 
 /* Index I is the function declaration for riscv_builtins[I], or null if the
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 28d9b81bd800..a6c3e0c9098f 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -102,10 +102,12 @@ enum riscv_entity
 #define MASK_ZICSR(1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 #define MASK_ZIHINTNTL (1 << 2)
+#define MASK_ZIHINTPAUSE (1 << 3)
 
 #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
 #define TARGET_ZIHINTNTL ((riscv_zi_subext & MASK_ZIHINTNTL) != 0)
+#define TARGET_ZIHINTPAUSE ((riscv_zi_subext & MASK_ZIHINTPAUSE) != 0)
 
 #define MASK_ZAWRS   (1 << 0)
 #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 688fd697255b..44ea1c54b96b 100644
--- 

[RFC PATCH v2 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-09 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
(in the assembler).  However, GCC emits "pause" unconditionally, making
an assembler error while compiling code with __builtin_riscv_pause while
the 'Zihintpause' extension disabled.

However, the "pause" instruction code (0x010f) is a HINT and emitting
its instruction code is safe in any environment.

This commit implements handling for the 'Zihintpause' extension and emits
".insn 0x010f" instead of "pause" only if the extension is disabled
(making the diagnostics better).

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Implement the 'Zihintpause' extension,
version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
* config/riscv/riscv-builtins.cc: Remove availability predicate
"always" and add "hint_pause" and "hint_pause_pseudo", corresponding
the existence of the 'Zihintpause' extension.
(riscv_builtins) Split builtin implementation depending on the
existence of the 'Zihintpause' extension.
* config/riscv/riscv-opts.h
(MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
* config/riscv/riscv.md (riscv_pause): Make it only available when
the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
"pause" implementation when the 'Zihintpause' extension is disabled.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/builtin_pause.c: Removed.
* gcc.target/riscv/zihintpause-1.c:
New test when the 'Zihintpause' extension is enabled.
* gcc.target/riscv/zihintpause-2.c: Likewise.
* gcc.target/riscv/zihintpause-noarch.c:
New test when the 'Zihintpause' extension is disabled.
---
 gcc/common/config/riscv/riscv-common.cc |  2 ++
 gcc/config/riscv/riscv-builtins.cc  |  6 --
 gcc/config/riscv/riscv-opts.h   |  2 ++
 gcc/config/riscv/riscv.md   |  7 ++-
 gcc/testsuite/gcc.target/riscv/builtin_pause.c  | 10 --
 gcc/testsuite/gcc.target/riscv/zihintpause-1.c  | 11 +++
 gcc/testsuite/gcc.target/riscv/zihintpause-2.c  | 11 +++
 gcc/testsuite/gcc.target/riscv/zihintpause-noarch.c | 12 
 8 files changed, 48 insertions(+), 13 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/builtin_pause.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-noarch.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index f25131eab28b..b77fdb909567 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -209,6 +209,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zihintntl", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zihintpause", ISA_SPEC_CLASS_NONE, 2, 0},
 
   {"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1343,6 +1344,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zkt",_options::x_riscv_zk_subext, MASK_ZKT},
 
   {"zihintntl", _options::x_riscv_zi_subext, MASK_ZIHINTNTL},
+  {"zihintpause", _options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
 
   {"zicboz", _options::x_riscv_zicmo_subext, MASK_ZICBOZ},
   {"zicbom", _options::x_riscv_zicmo_subext, MASK_ZICBOM},
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 79681d759628..554fb7f69bb0 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -122,7 +122,8 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || TARGET_ZBC) 
&& !TARGET_64BIT)
 AVAIL (clmul_zbkc64_or_zbc64, (TARGET_ZBKC || TARGET_ZBC) && TARGET_64BIT)
 AVAIL (clmulr_zbc32, TARGET_ZBC && !TARGET_64BIT)
 AVAIL (clmulr_zbc64, TARGET_ZBC && TARGET_64BIT)
-AVAIL (always, (!0))
+AVAIL (hint_pause, TARGET_ZIHINTPAUSE)
+AVAIL (hint_pause_pseudo, !TARGET_ZIHINTPAUSE)
 
 /* Construct a riscv_builtin_description from the given arguments.
 
@@ -179,7 +180,8 @@ static const struct riscv_builtin_description 
riscv_builtins[] = {
 
   DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
   DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float),
-  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, always),
+  RISCV_BUILTIN (pause, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE, hint_pause),
+  RISCV_BUILTIN (pause_insn, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
RISCV_VOID_FTYPE, hint_pause_pseudo),
 };
 
 /* Index I is the function declaration for riscv_builtins[I], or null if the
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 28d9b81bd800..a6c3e0c9098f 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h

[RFC PATCH v2 2/2] RISC-V: Fix documentation of __builtin_riscv_pause

2023-08-09 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

This built-in does not imply the 'Xgnuzihintpausestate' extension.
It does not change architectural state (because all HINTs are prohibited
from doing that).

gcc/ChangeLog:

* doc/extend.texi: Fix the description of __builtin_riscv_pause.
---
 gcc/doc/extend.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89c5b4ea2b20..7ebbe70c78d6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21570,9 +21570,9 @@ Returns the value that is currently set in the 
@samp{tp} register.
 @enddefbuiltin
 
 @defbuiltin{void __builtin_riscv_pause (void)}
-Generates the @code{pause} (hint) machine instruction.  This implies the
-Xgnuzihintpausestate extension, which redefines the @code{pause} instruction to
-change architectural state.
+Generates the @code{pause} (hint) machine instruction.  If the target 
implements
+the Zihintpause extension, it indicates that the current hart should be
+temporarily paused or slowed down.
 @enddefbuiltin
 
 @node RISC-V Vector Intrinsics
-- 
2.41.0



[RFC PATCH v2 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-09 Thread Tsukasa OI via Gcc-patches
**WARNING**

Following patch sets are exclusive:

1.  [RFC PATCH v2] RISC-V: __builtin_riscv_pause for all environment (this)
2.  [RFC PATCH] RISC-V: Make __builtin_riscv_pause 'Zihintpause' only

See 
for the background of this patch set.


Changes: v1 -> v2

*   Improve test case to test both RV32 and RV64 (+'Zihintpause').


Comparison: Patch sets [1] (this) and [2]

*   [1] completely preserves the compatibility with GCC 13
([2] removes __builtin_riscv_pause if the 'Zihintpause' extension is
 absent, making a minor compatibility issue)
*   Because of the nature of the instrinsic, [2] is more natural
("pause" is defined in the 'Zihintpause' extension).


Please consider those patch sets and decide which to apply.

Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: __builtin_riscv_pause for all environment
  RISC-V: Fix documentation of __builtin_riscv_pause

 gcc/common/config/riscv/riscv-common.cc |  2 ++
 gcc/config/riscv/riscv-builtins.cc  |  6 --
 gcc/config/riscv/riscv-opts.h   |  2 ++
 gcc/config/riscv/riscv.md   |  7 ++-
 gcc/doc/extend.texi |  6 +++---
 gcc/testsuite/gcc.target/riscv/builtin_pause.c  | 10 --
 gcc/testsuite/gcc.target/riscv/zihintpause-1.c  | 11 +++
 gcc/testsuite/gcc.target/riscv/zihintpause-2.c  | 11 +++
 gcc/testsuite/gcc.target/riscv/zihintpause-noarch.c | 12 
 9 files changed, 51 insertions(+), 16 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/riscv/builtin_pause.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zihintpause-noarch.c


base-commit: 9b099a83b45b8fcdfc07d518e05d36ea741b2227
-- 
2.41.0



Re: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Kito Cheng via Gcc-patches
Yeah, no further comment from me :)

On Thu, Aug 10, 2023 at 10:16 AM Li, Pan2  wrote:
>
> Thanks kito. It makes sense, should not reach default, may I prepare v3(add 
> gcc_unreachable to default) if no more comments?
>
> Pan
>
> -Original Message-
> From: Kito Cheng 
> Sent: Thursday, August 10, 2023 10:12 AM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; 
> Wang, Yanzhang ; Kito Cheng 
> Subject: Re: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode 
> intrinsic
>
> > +/* Get the frm mode with given CONST_INT rtx, the default mode is
> > +   FRM_DYN.  */
> > +enum floating_point_rounding_mode
> > +get_frm_mode (rtx operand)
> > +{
> > +  gcc_assert (CONST_INT_P (operand));
> > +
> > +  switch (INTVAL (operand))
> > +{
> > +case FRM_RNE:
> > +  return FRM_RNE;
> > +case FRM_RTZ:
> > +  return FRM_RTZ;
> > +case FRM_RDN:
> > +  return FRM_RDN;
> > +case FRM_RUP:
> > +  return FRM_RUP;
> > +case FRM_RMM:
> > +  return FRM_RMM;
> > +case FRM_DYN:
> > +  return FRM_DYN;
> > +default:
> > +  return FRM_DYN;
>
> Should we put a gcc_unreachable or gcc_assert here? I am not sure if
> another value is valid when it appeared for this operand?
>
> > +}
> > +
> > +  gcc_unreachable ();
> > +}


RE: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Li, Pan2 via Gcc-patches
Thanks kito. It makes sense, should not reach default, may I prepare v3(add 
gcc_unreachable to default) if no more comments?

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 10, 2023 10:12 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, 
Yanzhang ; Kito Cheng 
Subject: Re: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode 
intrinsic

> +/* Get the frm mode with given CONST_INT rtx, the default mode is
> +   FRM_DYN.  */
> +enum floating_point_rounding_mode
> +get_frm_mode (rtx operand)
> +{
> +  gcc_assert (CONST_INT_P (operand));
> +
> +  switch (INTVAL (operand))
> +{
> +case FRM_RNE:
> +  return FRM_RNE;
> +case FRM_RTZ:
> +  return FRM_RTZ;
> +case FRM_RDN:
> +  return FRM_RDN;
> +case FRM_RUP:
> +  return FRM_RUP;
> +case FRM_RMM:
> +  return FRM_RMM;
> +case FRM_DYN:
> +  return FRM_DYN;
> +default:
> +  return FRM_DYN;

Should we put a gcc_unreachable or gcc_assert here? I am not sure if
another value is valid when it appeared for this operand?

> +}
> +
> +  gcc_unreachable ();
> +}


Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches

On 8/9/23 19:11, Richard Henderson wrote:

On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote:

+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr)    (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr)    (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+    return true;
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+    return true;
+
+  return false;
+}
+#endif


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


Answering my own question, N1 does not officially have FEAT_LSE2.


r~



Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-08-09 Thread Richard Henderson via Gcc-patches

On 8/4/23 08:05, Wilco Dijkstra via Gcc-patches wrote:

+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+return true;
+  if (!(hwcap & HWCAP_CPUID))
+return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+return true;
+
+  return false;
+}
+#endif


Why would HWCAP_USCAT not be set by the kernel?

Failing that, I would think you would check ID_AA64MMFR2_EL1.AT.


r~


Re: [PATCH v2] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-09 Thread Kito Cheng via Gcc-patches
> +/* Get the frm mode with given CONST_INT rtx, the default mode is
> +   FRM_DYN.  */
> +enum floating_point_rounding_mode
> +get_frm_mode (rtx operand)
> +{
> +  gcc_assert (CONST_INT_P (operand));
> +
> +  switch (INTVAL (operand))
> +{
> +case FRM_RNE:
> +  return FRM_RNE;
> +case FRM_RTZ:
> +  return FRM_RTZ;
> +case FRM_RDN:
> +  return FRM_RDN;
> +case FRM_RUP:
> +  return FRM_RUP;
> +case FRM_RMM:
> +  return FRM_RMM;
> +case FRM_DYN:
> +  return FRM_DYN;
> +default:
> +  return FRM_DYN;

Should we put a gcc_unreachable or gcc_assert here? I am not sure if
another value is valid when it appeared for this operand?

> +}
> +
> +  gcc_unreachable ();
> +}


RE: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Xi Ruoyao 
> Sent: Thursday, August 10, 2023 9:48 AM
> To: Liu, Hongtao ; gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; ubiz...@gmail.com; hubi...@ucw.cz
> Subject: Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable
> vectorization for all gather/scatter instructions.
> 
> On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> > Currently we have 3 different independent tunes for gather
> > "use_gather,use_gather_2parts,use_gather_4parts",
> > similar for scatter, there're
> > "use_scatter,use_scatter_2parts,use_scatter_4parts"
> >
> > The patch support 2 standardizing options to enable/disable
> > vectorization for all gather/scatter instructions. The options is
> > interpreted by driver to 3 tunes.
> >
> > bootstrapped and regtested on x86_64-pc-linux-gnu.
> > Ok for trunk?
> 
> And should we set -mno-gather as the default for GDS affected processors?
> We'll likely apply the ucode update for them, and then the gathering
> instructions will be much slower.
Assume you're talking about 
https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/gather-data-sampling.html
Yes, there will be an separate patch for microarchitecture tuning.
> 
> > gcc/ChangeLog:
> >
> > * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> > GATHER_SCATTER_DRIVER_SELF_SPECS.
> > (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> > * config/i386/i386.opt (mgather): New option.
> > (mscatter): Ditto.
> > ---
> >  gcc/config/i386/i386.h   | 12 +++-
> >  gcc/config/i386/i386.opt |  8 
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index
> > ef342fcee9b..d9ac2c29bde 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
> >  # define SUBTARGET_DRIVER_SELF_SPECS ""
> >  #endif
> >
> > -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> > +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS # define
> > +GATHER_SCATTER_DRIVER_SELF_SPECS \
> > +  "%{mno-gather:-mtune-
> > ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> > +   %{mgather:-mtune-
> > ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> > +   %{mno-scatter:-mtune-
> > ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> > +   %{mscatter:-mtune-
> > ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> > +#endif
> > +
> > +#define DRIVER_SELF_SPECS \
> > +  SUBTARGET_DRIVER_SELF_SPECS " " \
> > +  GATHER_SCATTER_DRIVER_SELF_SPECS
> >
> >  /* -march=native handling only makes sense with compiler running on
> >     an x86 or x86_64 chip.  If changing this condition, also change
> > diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index
> > ddb7f110aa2..99948644a8d 100644
> > --- a/gcc/config/i386/i386.opt
> > +++ b/gcc/config/i386/i386.opt
> > @@ -424,6 +424,14 @@ mdaz-ftz
> >  Target
> >  Set the FTZ and DAZ Flags.
> >
> > +mgather
> > +Target
> > +Enable vectorization for gather instruction.
> > +
> > +mscatter
> > +Target
> > +Enable vectorization for scatter instruction.
> > +
> >  mpreferred-stack-boundary=
> >  Target RejectNegative Joined UInteger
> > Var(ix86_preferred_stack_boundary_arg)
> >  Attempt to keep stack aligned to this power of 2.
> 
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-10 at 09:11 +0800, liuhongt via Gcc-patches wrote:
> Currently we have 3 different independent tunes for gather
> "use_gather,use_gather_2parts,use_gather_4parts",
> similar for scatter, there're
> "use_scatter,use_scatter_2parts,use_scatter_4parts"
> 
> The patch support 2 standardizing options to enable/disable
> vectorization for all gather/scatter instructions. The options is
> interpreted by driver to 3 tunes.
> 
> bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?

And should we set -mno-gather as the default for GDS affected
processors?  We'll likely apply the ucode update for them, and then the
gathering instructions will be much slower.

> gcc/ChangeLog:
> 
> * config/i386/i386.h (DRIVER_SELF_SPECS): Add
> GATHER_SCATTER_DRIVER_SELF_SPECS.
> (GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
> * config/i386/i386.opt (mgather): New option.
> (mscatter): Ditto.
> ---
>  gcc/config/i386/i386.h   | 12 +++-
>  gcc/config/i386/i386.opt |  8 
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index ef342fcee9b..d9ac2c29bde 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
>  # define SUBTARGET_DRIVER_SELF_SPECS ""
>  #endif
>  
> -#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
> +#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
> +# define GATHER_SCATTER_DRIVER_SELF_SPECS \
> +  "%{mno-gather:-mtune-
> ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} \
> +   %{mgather:-mtune-
> ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
> +   %{mno-scatter:-mtune-
> ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} \
> +   %{mscatter:-mtune-
> ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
> +#endif
> +
> +#define DRIVER_SELF_SPECS \
> +  SUBTARGET_DRIVER_SELF_SPECS " " \
> +  GATHER_SCATTER_DRIVER_SELF_SPECS
>  
>  /* -march=native handling only makes sense with compiler running on
>     an x86 or x86_64 chip.  If changing this condition, also change
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index ddb7f110aa2..99948644a8d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -424,6 +424,14 @@ mdaz-ftz
>  Target
>  Set the FTZ and DAZ Flags.
>  
> +mgather
> +Target
> +Enable vectorization for gather instruction.
> +
> +mscatter
> +Target
> +Enable vectorization for scatter instruction.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger
> Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread liuhongt via Gcc-patches
Currently we have 3 different independent tunes for gather
"use_gather,use_gather_2parts,use_gather_4parts",
similar for scatter, there're
"use_scatter,use_scatter_2parts,use_scatter_4parts"

The patch support 2 standardizing options to enable/disable
vectorization for all gather/scatter instructions. The options is
interpreted by driver to 3 tunes.

bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.h (DRIVER_SELF_SPECS): Add
GATHER_SCATTER_DRIVER_SELF_SPECS.
(GATHER_SCATTER_DRIVER_SELF_SPECS): New macro.
* config/i386/i386.opt (mgather): New option.
(mscatter): Ditto.
---
 gcc/config/i386/i386.h   | 12 +++-
 gcc/config/i386/i386.opt |  8 
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index ef342fcee9b..d9ac2c29bde 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -565,7 +565,17 @@ extern GTY(()) tree x86_mfence;
 # define SUBTARGET_DRIVER_SELF_SPECS ""
 #endif
 
-#define DRIVER_SELF_SPECS SUBTARGET_DRIVER_SELF_SPECS
+#ifndef GATHER_SCATTER_DRIVER_SELF_SPECS
+# define GATHER_SCATTER_DRIVER_SELF_SPECS \
+  "%{mno-gather:-mtune-ctrl=^use_gather_2parts,^use_gather_4parts,^use_gather} 
\
+   %{mgather:-mtune-ctrl=use_gather_2parts,use_gather_4parts,use_gather} \
+   
%{mno-scatter:-mtune-ctrl=^use_scatter_2parts,^use_scatter_4parts,^use_scatter} 
\
+   %{mscatter:-mtune-ctrl=use_scatter_2parts,use_scatter_4parts,use_scatter}"
+#endif
+
+#define DRIVER_SELF_SPECS \
+  SUBTARGET_DRIVER_SELF_SPECS " " \
+  GATHER_SCATTER_DRIVER_SELF_SPECS
 
 /* -march=native handling only makes sense with compiler running on
an x86 or x86_64 chip.  If changing this condition, also change
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index ddb7f110aa2..99948644a8d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -424,6 +424,14 @@ mdaz-ftz
 Target
 Set the FTZ and DAZ Flags.
 
+mgather
+Target
+Enable vectorization for gather instruction.
+
+mscatter
+Target
+Enable vectorization for scatter instruction.
+
 mpreferred-stack-boundary=
 Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
 Attempt to keep stack aligned to this power of 2.
-- 
2.31.1



RISC-V: Enable Hoist to GCSE simple constants

2023-08-09 Thread juzhe.zh...@rivai.ai

-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-Os" no-opts "-Oz" 
no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1" no-opts "-O2"  no-opts "-Os" no-opts "-Oz" 
no-opts "-funroll-loops" no-opts "-g" } } } } */

Instead of disabling O2, you should remove these 2 assembly check since you are 
disabling all option now. 
I have checked this patch, this patch didn't make VSETVL  codegen worse.
Ok from myside.


juzhe.zh...@rivai.ai


Re: Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread juzhe.zh...@rivai.ai
Yes. I think so. Will backport GCC 13 soon.



juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-08-10 01:01
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL 
PASS]
 
 
On 8/9/23 04:51, Juzhe-Zhong wrote:
> Realize we have a bug in VSETVL PASS which is triggered by 
> strided_load_run-1.c in RV32 system.
> 
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> 
> This is because VSETVL PASS incorrect hoist vsetvl instruction:
> 
> ...
> 10156: 0d9075d7  vsetvli a1,zero,e64,m2,ta,ma ---> pollute 'a1' 
> register which will be used by following insns.
> 1015a: 01d586b3  add a3,a1,t4  > use 'a1'
> 1015e: 5e070257  vmv.v.v v4,v14
> 10162: b7032257  vmacc.vv v4,v6,v16
> 10166: 26440257  vand.vv v4,v4,v8
> 1016a: 22880227  vs2r.v v4,(a6)
> 1016e: 00b6b7b3  sltu a5,a3,a1
> 10172: 22888227  vs2r.v v4,(a7)
> 10176: 9e60b157  vmv2r.v v2,v6
> 1017a: 97baadd a5,a5,a4
> 1017c: a6a62157  vmadd.vv v2,v12,v10
> 10180: 26240157  vand.vv v2,v2,v8
> 10184: 22830127  vs2r.v v2,(t1)
> 10188: 873emv a4,a5
> 1018a: 982aadd a6,a6,a0
> 1018c: 98aaadd a7,a7,a0
> 1018e: 932aadd t1,t1,a0
> 10190: 85b6mv a1,a3   -> set 'a1'
> ...
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix 
> incorrect anticipate info.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
> Adapt test.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-36.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
OK.
 
Do we need to backport this to gcc-13?
 
jeff
 


[PATCH] i386: Do not sanitize upper part of V2HFmode and V4HFmode reg with -fno-trapping-math [PR110832]

2023-08-09 Thread liuhongt via Gcc-patches
Also add ix86_partial_vec_fp_math to to condition of V2HF/V4HF named
patterns in order to avoid generation of partial vector V8HFmode
trapping instructions.

Bootstrapped and regtseted on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

gcc/ChangeLog:

PR target/110832
* config/i386/mmx.md: (movq__to_sse): Also do not
sanitize upper part of V4HFmode register with
-fno-trapping-math.
(v4hf3): Enable for ix86_partial_vec_fp_math.
(v2hf3): Ditto.
(divv2hf3): Ditto.
(movd_v2hf_to_sse): Do not sanitize upper part of V2HFmode
register with -fno-trapping-math.
---
 gcc/config/i386/mmx.md | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d51b3b9dc71..170432a7128 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -596,7 +596,7 @@ (define_expand "movq__to_sse"
  (match_dup 2)))]
   "TARGET_SSE2"
 {
-  if (mode == V2SFmode
+  if (mode != V2SImode
   && !flag_trapping_math)
 {
   rtx op1 = force_reg (mode, operands[1]);
@@ -1941,7 +1941,7 @@ (define_expand "v4hf3"
(plusminusmult:V4HF
  (match_operand:V4HF 1 "nonimmediate_operand")
  (match_operand:V4HF 2 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V8HFmode);
   rtx op1 = gen_reg_rtx (V8HFmode);
@@ -1961,7 +1961,7 @@ (define_expand "divv4hf3"
(div:V4HF
  (match_operand:V4HF 1 "nonimmediate_operand")
  (match_operand:V4HF 2 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V8HFmode);
   rtx op1 = gen_reg_rtx (V8HFmode);
@@ -1983,14 +1983,22 @@ (define_expand "movd_v2hf_to_sse"
(match_operand:V2HF 1 "nonimmediate_operand"))
  (match_operand:V8HF 2 "reg_or_0_operand")
  (const_int 3)))]
-  "TARGET_SSE")
+  "TARGET_SSE"
+{
+  if (!flag_trapping_math && operands[2] == CONST0_RTX (V8HFmode))
+  {
+rtx op1 = force_reg (V2HFmode, operands[1]);
+emit_move_insn (operands[0], lowpart_subreg (V8HFmode, op1, V2HFmode));
+DONE;
+  }
+})
 
 (define_expand "v2hf3"
   [(set (match_operand:V2HF 0 "register_operand")
(plusminusmult:V2HF
  (match_operand:V2HF 1 "nonimmediate_operand")
  (match_operand:V2HF 2 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V8HFmode);
   rtx op1 = gen_reg_rtx (V8HFmode);
@@ -2009,7 +2017,7 @@ (define_expand "divv2hf3"
(div:V2HF
  (match_operand:V2HF 1 "nonimmediate_operand")
  (match_operand:V2HF 2 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16 && TARGET_AVX512VL"
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
 {
   rtx op2 = gen_reg_rtx (V8HFmode);
   rtx op1 = gen_reg_rtx (V8HFmode);
-- 
2.31.1



[PATCH] RISC-V: Enable Hoist to GCSE simple constants

2023-08-09 Thread Vineet Gupta
Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates . For a const with cost 1 backend currently returns 0,
causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insns to setup were working
already since backend is returning 1 as well. Arguably that needs updating
as well to reflect cost better, but that should be a different change
anyways.

To keep testsuite parity, some V tests need updatinge which started failing
in the new costing regime.

gcc/ChangeLog:
* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
  -O2.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc   |  9 ++---
 gcc/testsuite/gcc.target/riscv/gcse-const.c | 13 +
 .../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c  |  2 +-
 .../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c  |  2 +-
 4 files changed, 17 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/gcse-const.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8b7256108157..1802eef908fc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
 case CONST:
   if ((cost = riscv_const_insns (x)) > 0)
{
- /* If the constant is likely to be stored in a GPR, SETs of
-single-insn constants are as cheap as register sets; we
-never want to CSE them.  */
+ /* Hoist will GCSE constants only if cost is non-zero.  */
  if (cost == 1 && outer_code == SET)
-   *total = 0;
- /* When we load a constant more than once, it usually is better
-to duplicate the last operation in the sequence than to CSE
-the constant itself.  */
+   *total = COSTS_N_INSNS (1);
  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);
}
diff --git a/gcc/testsuite/gcc.target/riscv/gcse-const.c 
b/gcc/testsuite/gcc.target/riscv/gcse-const.c
new file mode 100644
index ..b04707ce9745
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/gcse-const.c
@@ -0,0 +1,13 @@
+/* Slightly modified copy of gcc.target/arm/pr40956.c.  */
+/* { dg-options "-Os" }  */
+/* Make sure the constant "6" is loaded into register only once.  */
+/* { dg-final { scan-assembler-times "\tli.*6" 1 } } */
+
+int foo(int p, int* q)
+{
+  if (p!=9)
+*q = 6;
+  else
+*(q+1) = 6;
+  return 3;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
index 60ad108666f8..83618880c8ea 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
@@ -21,5 +21,5 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t 
n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-Os" no-opts "-Oz" 
no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
index 7b9574cc332d..8aca839c49aa 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
@@ -21,7 +21,7 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t 
n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1" no-opts "-O2"  no-opts "-Os" no-opts "-Oz" 
no-opts "-funroll-loops" no-opts "-g" } } 

[PATCH] Fix PR 110954: wrong code with cmp | !cmp

2023-08-09 Thread Andrew Pinski via Gcc-patches
This was an oversight on my part not realizing that
comparisons in generic can have a non-boolean type.
This means if we have `(f < 0) | !(f < 0)` we would
optimize this to -1 rather than just 1.
This patch just adds the check for the type of the comparisons
to be boolean type to keep the optimization in that case.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR 110954

gcc/ChangeLog:

* generic-match-head.cc (bitwise_inverted_equal_p): Check
the type of the comparison to be boolean too.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr110954-1.c: New test.
---
 gcc/generic-match-head.cc|  3 ++-
 gcc/testsuite/gcc.c-torture/execute/pr110954-1.c | 10 ++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110954-1.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index ddaf22f2179..ac2119bfdd0 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -146,7 +146,8 @@ bitwise_inverted_equal_p (tree expr1, tree expr2)
   && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
 return true;
   if (COMPARISON_CLASS_P (expr1)
-  && COMPARISON_CLASS_P (expr2))
+  && COMPARISON_CLASS_P (expr2)
+  && TREE_CODE (TREE_TYPE (expr1)) == BOOLEAN_TYPE)
 {
   tree op10 = TREE_OPERAND (expr1, 0);
   tree op20 = TREE_OPERAND (expr2, 0);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110954-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr110954-1.c
new file mode 100644
index 000..8aad758e10f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110954-1.c
@@ -0,0 +1,10 @@
+
+#define comparison (f < 0)
+int main() {
+  int f = 0;
+  int d = comparison | !comparison;
+  if (d != 1)
+__builtin_abort();
+  return 0;
+}
+
-- 
2.31.1



RE: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread Li, Pan2 via Gcc-patches
Committed to trunk, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Thursday, August 10, 2023 1:01 AM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL 
PASS]



On 8/9/23 04:51, Juzhe-Zhong wrote:
> Realize we have a bug in VSETVL PASS which is triggered by 
> strided_load_run-1.c in RV32 system.
> 
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
> execution test
> 
> This is because VSETVL PASS incorrect hoist vsetvl instruction:
> 
> ...
> 10156:0d9075d7vsetvli a1,zero,e64,m2,ta,ma ---> 
> pollute 'a1' register which will be used by following insns.
> 1015a:01d586b3add a3,a1,t4  > use 'a1'
> 1015e:5e070257vmv.v.v v4,v14
> 10162:b7032257vmacc.vvv4,v6,v16
> 10166:26440257vand.vv v4,v4,v8
> 1016a:22880227vs2r.v  v4,(a6)
> 1016e:00b6b7b3sltua5,a3,a1
> 10172:22888227vs2r.v  v4,(a7)
> 10176:9e60b157vmv2r.v v2,v6
> 1017a:97baadd a5,a5,a4
> 1017c:a6a62157vmadd.vvv2,v12,v10
> 10180:26240157vand.vv v2,v2,v8
> 10184:22830127vs2r.v  v2,(t1)
> 10188:873emv  a4,a5
> 1018a:982aadd a6,a6,a0
> 1018c:98aaadd a7,a7,a0
> 1018e:932aadd t1,t1,a0
> 10190:85b6mv  a1,a3   -> set 'a1'
> ...
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix 
> incorrect anticipate info.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
> Adapt test.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-36.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
>  * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.
OK.

Do we need to backport this to gcc-13?

jeff


Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-09 Thread Tsukasa OI via Gcc-patches
On 2023/08/10 5:05, Jeff Law wrote:
> 
> 
> On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:
>> Hello,
>>
>> I found that a built-in function "__builtin_riscv_pause" is not usable
>> unless we enable the 'Zihintpause' extension explicitly (still, this
>> built-in exists EVEN IF the 'Zihintpause' extension is disabled).
>>
>> Contents of a.c:
>>
>>> void sample(void)
>>> {
>>>  __builtin_riscv_pause();
>>> }
>>
>>
>> Compiling with the 'Zihintpause' extension is fine.
>>
>>> $ riscv64-unknown-elf-gcc -O2 -march=rv64i_zihintpause -mabi=lp64 -c a.c
>>
>>
>> However, compiling without the 'Zihintpause' causes an assembler error
>> (tested with GNU Binutils 2.41):
>>
>>> $ riscv64-unknown-elf-gcc -O2 -march=rv64i -mabi=lp64 -c a.c
>>> /tmp/ccFjacAz.s: Assembler messages:
>>> /tmp/ccFjacAz.s:11: Error: unrecognized opcode `pause', extension
>>> `zihintpause' required
>>
>>
>> This is because:
>>
>> 1.  GCC does not handle the 'Zihintpause' extension and
>> 2.  "riscv_pause" (insn) unconditionally emits "pause" even if the
>>  assembler does not accept it (when the extension is disabled).
>>
>>
>> This patch set (PATCH 1/2) resolves this issue by:
>>
>> 1.  Handling the 'Zihintpause' extension and
>> 2.  Splitting the "__builtin_riscv_pause" implementation
>>  depending on the existence of the 'Zihintpause' extension.
>>
>> Because a released version of GCC defines "__builtin_riscv_pause"
>> unconditionally, I chose to define no-'Zihintpause' version.
>>
>> There is another option to unconditionally emit ".insn 0x010f"
>> (the machine code of "pause") but I didn't because I wanted to improve
>> the
>> diagnostics (e.g. *.s output).
>>
>> I also fixed the description of this built-in function (in PATCH 2/2).
>>
>>
>> I'm not sure whether this is a good method to split the implementation
>> depending on the 'Zihintpause' extension.  Other than that, I believe
>> that
>> this is okay and approval is appreciated.
>>
>> Note that because I assigned copyright of GCC contribution to FSF, I
>> didn't
>> attach "Signed-off-by" tag.  Tell me if I need it.
> I'd tend to think we do not want to expose the intrinsic unless the
> right extensions are enabled -- even though the encoding is a no-op and
> we could emit it as a .insn.

I think that makes sense.  The only reason I implemented the
no-'Zihintpause' version is because GCC 13 implemented the built-in
unconditionally.  If the compatibility breakage is considered minimum (I
don't know, though), I'm ready to submit 'Zihintpause'-only version of
this patch set.

Thanks,
Tsukasa

> 
> If others think otherwise, I'll go with the consensus opinion.  So let's
> hold off a bit until others have chimed in.
> 
> Thanks,
> jeff
> 


[PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp during the lexing of the tokens, as opposed to being
issued by the frontend during the processing of the pragma, then the
patched-up location is not yet in place, and the user rather sees an invalid
location that is near to the location of the _Pragma string in some cases, or
potentially very far away, depending on the macro expansion history.  For
example:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
1 | _Pragma("GCC diagnostic ignored \"oops")
  |^

with the caret in a nonsensical location, while this one:

=
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=

produces:

file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^

with both the caret in a nonsensical location, and the actual relevant context
completely absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

==
In buffer generated from file.cpp:1:
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"oops")
  | ^~~
==

and

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

So that carets are pointing to something meaningful and all relevant context
appears in the diagnostic.  For the second example, it would be nice if the
macro expansion also output "in expansion of macro S", however doing that for
a general case of macro expansions makes the logic very complicated, since it
has to be done after the fact when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

* directives.cc (get_token_no_padding): Add argument to receive the
virtual location of the token.
(get__Pragma_string): Likewise.
(do_pragma): Set pfile->directive_result->src_loc properly, it should
not be a virtual location.
(destringize_and_run): Update to provide proper locations for the
_Pragma string tokens.  Support raw strings.
(_cpp_do__Pragma): Adapt to changes to the helper functions.
* errors.cc (cpp_diagnostic_at): Support
cpp_reader::diagnostic_rebase_loc.
(cpp_diagnostic_with_line): Likewise.
* include/line-map.h (class rich_location): Add new member
forget_cached_expanded_locations().
* internal.h (struct _cpp__Pragma_state): Define new struct.
(_cpp_rebase_diagnostic_location): Declare new function.
(struct cpp_reader): Add diagnostic_rebase_loc member.
(_cpp_push__Pragma_token_context): Declare new function.
(_cpp_do__Pragma): Adjust prototype.
* macro.cc (pragma_str): New static var.
(builtin_macro): Adapt to new implementation of _Pragma processing.
(_cpp_pop_context): Fix the logic for resetting
pfile->top_most_macro_node, which previously was never triggered,
although the error seems to have been harmless.
(_cpp_push__Pragma_token_context): New function.
(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
macro tracking output for _Pragma directives.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
tracking output for _Pragma directives.
* c-c++-common/cpp/pr57580.c: Likewise.
* c-c++-common/gomp/pragma-3.c: Likewise.

[PATCH v4 6/8] diagnostics: Full support for generated data locations

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Previous patches in this series have laid the groundwork for supporting
source code locations in memory ("generated data") rather than ordinary
files. This patch completes the support by adding awareness of such
locations to all places that need to support them. The main changes are to
diagnostic-show-locus.cc; the others are primarily small tweaks such as
changing from the FILE to the SRC member when inspecting an
expanded_location.

gcc/c-family/ChangeLog:

* c-format.cc (get_corrected_substring): Use the new overload of
location_get_source_line() to support generated data.
* c-indentation.cc (get_visual_column): Likewise.
(get_first_nws_vis_column): Change argument from a plain file name
to a source_id.
(detect_intervening_unindent): Likewise.
(should_warn_for_misleading_indentation): Pass
detect_intervening_unindent() the SRC field rather than the FILE
field from the expanded_location.

gcc/ChangeLog:

* gcc-rich-location.cc (blank_line_before_p): Use the new overload
of location_get_source_line() to support generated data.
* input.cc (get_source_text_between): Likewise.
(get_substring_ranges_for_loc): Likewise.
(get_source_file_content): Change the argument from a plain filename
to a source_id.
(location_missing_trailing_newline): Likewise.
* input.h (get_source_file_content): Adjust prototype.
(location_missing_trailing_newline): Likewise.
* diagnostic-show-locus.cc (layout::calculate_x_offset_display): Use
the new overload of location_get_source_line() to support generated
data.
(layout::print_line): Likewise.
(class line_corrections): Change m_filename from a plain filename to
a source_id.
(source_line::source_line): Change argument from a plain filename to
a source_id.
(line_corrections::add_hint): Adapt to source_line change.
(layout::print_trailing_fixits): Adapt to line_corrections change.
(test_layout_x_offset_display_utf8): Test generated data too.
(test_layout_x_offset_display_tab): Likewise.
(test_diagnostic_show_locus_one_liner): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_add_location_if_nearby): Likewise.
(test_diagnostic_show_locus_fixit_lines): Likewise.
(test_fixit_consolidation): Likewise.
(test_overlapped_fixit_printing): Likewise.
(test_overlapped_fixit_printing_utf8): Likewise.
(test_overlapped_fixit_printing_2): Likewise.
(test_fixit_insert_containing_newline): Likewise.
(test_fixit_insert_containing_newline_2): Likewise.
(test_fixit_replace_containing_newline): Likewise.
(test_fixit_deletion_affecting_newline): Likewise.
(test_tab_expansion): Likewise.
(test_escaping_bytes_1): Likewise.
(test_escaping_bytes_2): Likewise.
(test_line_numbers_multiline_range): Likewise.
(diagnostic_show_locus_cc_tests): Likewise.
---
 gcc/c-family/c-format.cc  |   2 +-
 gcc/c-family/c-indentation.cc |   8 +-
 gcc/diagnostic-show-locus.cc  | 227 ++
 gcc/gcc-rich-location.cc  |   2 +-
 gcc/input.cc  |  21 ++--
 gcc/input.h   |   6 +-
 6 files changed, 136 insertions(+), 130 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 529b1408179..929ec24622c 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4537,7 +4537,7 @@ get_corrected_substring (const substring_loc _loc,
   if (caret.column > finish.column)
 return NULL;
 
-  char_span line = location_get_source_line (start.file, start.line);
+  char_span line = location_get_source_line (start);
   if (!line)
 return NULL;
 
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index fce74991aae..27a90d9cc15 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
   unsigned int *first_nws,
   unsigned int tab_width)
 {
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   if (!line)
 return false;
   if ((size_t)exploc.column > line.length ())
@@ -87,7 +87,7 @@ get_visual_column (expanded_location exploc,
Otherwise, return false, leaving *FIRST_NWS untouched.  */
 
 static bool
-get_first_nws_vis_column (const char *file, int line_num,
+get_first_nws_vis_column (source_id file, int line_num,
  unsigned int *first_nws,
  unsigned int tab_width)
 {
@@ -158,7 +158,7 @@ get_first_nws_vis_column (const char *file, int line_num,
Return true if such an unindent/outdent is detected.  */
 
 static bool
-detect_intervening_unindent (const char *file,

[PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Add selftests for the new capabilities in input.cc related to source code
locations that are stored in memory rather than ordinary files.

gcc/ChangeLog:

* input.cc (temp_source_file::do_linemap_add): New function.
(line_table_case::line_table_case): Add GENERATED_DATA argument.
(line_table_test::line_table_test): Implement new M_GENERATED_DATA
argument.
(for_each_line_table_case): Optionally include generated data
locations in the set of cases.
(test_accessing_ordinary_linemaps): Test generated data locations.
(test_make_location_nonpure_range_endpoints): Likewise.
(test_line_offset_overflow): Likewise.
(input_cc_tests): Likewise.
* selftest.cc (named_temp_file::named_temp_file): Interpret a null
SUFFIX argument as a request to use in-memory data.
(named_temp_file::~named_temp_file): Support in-memory data.
(temp_source_file::temp_source_file): Likewise.
(temp_source_file::~temp_source_file): Likewise.
* selftest.h (struct line_map_ordinary): Foward declare.
(class named_temp_file): Add missing explicit to the constructor.
(class temp_source_file): Add new members to support in-memory data.
(class line_table_test): Likewise.
(for_each_line_table_case): Adjust prototype.
---
 gcc/input.cc| 81 +
 gcc/selftest.cc | 53 +---
 gcc/selftest.h  | 19 ++--
 3 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index 790279d4273..8c4e40aaf23 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2066,6 +2066,20 @@ get_num_source_ranges_for_substring (cpp_reader *pfile,
 
 /* Selftests of location handling.  */
 
+/* Wrapper around linemap_add to handle transparently adding either a tmp file,
+   or in-memory generated content.  */
+const line_map_ordinary *
+temp_source_file::do_linemap_add (int line)
+{
+  const line_map *map;
+  if (content_buf)
+map = linemap_add (line_table, LC_GEN, false, content_buf,
+  line, content_len);
+  else
+map = linemap_add (line_table, LC_ENTER, false, get_filename (), line);
+  return linemap_check_ordinary (map);
+}
+
 /* Verify that compare() on linenum_type handles comparisons over the full
range of the type.  */
 
@@ -2144,13 +2158,16 @@ assert_loceq (const char *exp_filename, int 
exp_linenum, int exp_colnum,
 class line_table_case
 {
 public:
-  line_table_case (int default_range_bits, int base_location)
+  line_table_case (int default_range_bits, int base_location,
+  bool generated_data)
   : m_default_range_bits (default_range_bits),
-m_base_location (base_location)
+m_base_location (base_location),
+m_generated_data (generated_data)
   {}
 
   int m_default_range_bits;
   int m_base_location;
+  bool m_generated_data;
 };
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2167,6 +2184,7 @@ line_table_test::line_table_test ()
   gcc_assert (saved_line_table->round_alloc_size);
   line_table->round_alloc_size = saved_line_table->round_alloc_size;
   line_table->default_range_bits = 0;
+  m_generated_data = false;
 }
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2188,6 +2206,7 @@ line_table_test::line_table_test (const line_table_case 
_)
   line_table->highest_location = case_.m_base_location;
   line_table->highest_line = case_.m_base_location;
 }
+  m_generated_data = case_.m_generated_data;
 }
 
 /* Destructor.  Restore the old value of line_table.  */
@@ -2207,7 +2226,10 @@ test_accessing_ordinary_linemaps (const line_table_case 
_)
   line_table_test ltt (case_);
 
   /* Build a simple linemap describing some locations. */
-  linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
+  if (ltt.m_generated_data)
+linemap_add (line_table, LC_GEN, false, "some data", 0, 10);
+  else
+linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
 
   linemap_line_start (line_table, 1, 100);
   location_t loc_a = linemap_position_for_column (line_table, 1);
@@ -2257,21 +2279,23 @@ test_accessing_ordinary_linemaps (const line_table_case 
_)
   linemap_add (line_table, LC_LEAVE, false, NULL, 0);
 
   /* Verify that we can recover the location info.  */
-  assert_loceq ("foo.c", 1, 1, loc_a);
-  assert_loceq ("foo.c", 1, 23, loc_b);
-  assert_loceq ("foo.c", 2, 1, loc_c);
-  assert_loceq ("foo.c", 2, 17, loc_d);
-  assert_loceq ("foo.c", 3, 700, loc_e);
-  assert_loceq ("foo.c", 4, 100, loc_back_to_short);
+  const auto fname
+= (ltt.m_generated_data ? special_fname_generated () : "foo.c");
+  assert_loceq (fname, 1, 1, loc_a);
+  assert_loceq (fname, 1, 23, loc_b);
+  assert_loceq (fname, 2, 1, loc_c);
+  assert_loceq (fname, 2, 17, loc_d);
+  assert_loceq (fname, 3, 700, loc_e);
+  assert_loceq (fname, 4, 100, loc_back_to_short);
 
   /* In the very wide line, the 

[PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Class file_cache_slot in input.cc is used to query specific lines of source
code from a file when needed by diagnostics infrastructure. This will be
extended in a subsequent patch to support obtaining the source code from
in-memory generated buffers rather than from a file. The present patch
refactors class file_cache_slot, putting most of the logic into a new base
class cache_data_source, in preparation for reusing that code in the next
patch. There is no change in functionality yet.

gcc/ChangeLog:

* input.cc (class file_cache_slot): Refactor functionality into a
new base class...
(class cache_data_source): ...here.
(file_cache::forcibly_evict_file): Adapt for refactoring.
(file_cache_slot::evict): Renamed to...
(file_cache_slot::reset): ...this, and partially refactored into
base class...
(cache_data_source::reset): ...here.
(file_cache_slot::get_full_file_content): Moved into base class...
(cache_data_source::get_full_file_content): ...here.
(file_cache_slot::create): Adapt for refactoring.
(file_cache_slot::file_cache_slot): Refactor partially into...
(cache_data_source::cache_data_source): ...here.
(file_cache_slot::~file_cache_slot): Refactor partially into...
(cache_data_source::~cache_data_source): ...here.
(file_cache_slot::needs_read_p): Remove.
(file_cache_slot::needs_grow_p): Remove.
(file_cache_slot::maybe_grow): Adapt for refactoring.
(file_cache_slot::read_data): Refactored, along with...
(file_cache_slot::maybe_read_data): this, into...
(file_cache_slot::get_more_data): ...here.
(find_end_of_line): Change interface to take a pair of pointers,
rather than a pointer + length.
(file_cache_slot::get_next_line): Refactored into...
(cache_data_source::get_next_line): ...here.
(file_cache_slot::goto_next_line): Refactored into...
(cache_data_source::goto_next_line): ...here.
(file_cache_slot::read_line_num): Refactored into...
(cache_data_source::read_line_num): ...here.
(location_get_source_line): Fix const-correctness as necessitated by
new interface.
---
 gcc/input.cc | 513 +++
 1 file changed, 235 insertions(+), 278 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index c2559614a99..9377020b460 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -55,34 +55,88 @@ file_cache::initialize_input_context 
(diagnostic_input_charset_callback ccb,
   in_context.should_skip_bom = should_skip_bom;
 }
 
-/* This is a cache used by get_next_line to store the content of a
-   file to be searched for file lines.  */
-class file_cache_slot
+/* This is an abstract interface for a class that provides data which we want 
to
+   look up by line number.  Concrete implementations will follow, which handle
+   the cases of reading the data from the input source files, or of reading it
+   from in-memory generated data buffers.  The design is driven with reading
+   from files in mind, in particular it is desirable to read only as much of a
+   file from disk as necessary.  It works like a simplified std::istream, i.e.
+   virtual function calls are only needed when we need to retrieve more data
+   from the underlying source.  */
+
+class cache_data_source
 {
-public:
-  file_cache_slot ();
-  ~file_cache_slot ();
 
-  bool read_line_num (size_t line_num,
- char ** line, ssize_t *line_len);
-
-  /* Accessors.  */
-  const char *get_file_path () const { return m_file_path; }
+public:
+  bool read_line_num (size_t line_num, const char **line, ssize_t *line_len);
   unsigned get_use_count () const { return m_use_count; }
+  void inc_use_count () { m_use_count++; }
+  bool get_next_line (const char **line, ssize_t *line_len);
+  bool goto_next_line ();
   bool missing_trailing_newline_p () const
   {
 return m_missing_trailing_newline;
   }
   char_span get_full_file_content ();
+  bool unused () const { return !m_data_begin; }
+  virtual void reset ();
+
+protected:
+  cache_data_source ();
+  virtual ~cache_data_source ();
+
+  /* These pointers delimit the data that we are processing.  They are
+ maintained by the derived classes, we only ask for more by calling
+ get_more_data().  That function should return TRUE if more data was
+ obtained.  Calling get_more_data () may invalidate these pointers
+ (i.e. reallocating them to a larger buffer).  */
+  const char *m_data_begin;
+  const char *m_data_end;
+  virtual bool get_more_data () = 0;
+
+  /* This is to be called by the derived classes when this object is
+ being activated.  */
+  void on_create (unsigned int use_count, size_t total_lines)
+  {
+m_use_count = use_count;
+m_total_lines = total_lines;
+  }
 
-  void inc_use_count () { m_use_count++; }
+private:
+  /* Non-copyable.  */
+  cache_data_source (const 

[PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-09 Thread Lewis Hyatt via Gcc-patches
This patch enhances location_get_source_line(), which is the primary
interface provided by the diagnostics infrastructure to obtain the line of
source code corresponding to a given location, so that it understands
generated data locations in addition to normal file-based locations. This
involves changing the argument to location_get_source_line() from a plain
file name, to a source_id object that can represent either type of location.

gcc/ChangeLog:

* input.cc (class data_cache_slot): New class.
(file_cache::lookup_data): New function.
(diagnostics_file_cache_forcibly_evict_data): New function.
(file_cache::forcibly_evict_data): New function.
(file_cache::evicted_cache_tab_entry): Generalize (via a template)
to work for both file_cache_slot and data_cache_slot.
(file_cache::add_file): Adapt for new interface to
evicted_cache_tab_entry.
(file_cache::add_data): New function.
(data_cache_slot::create): New function.
(file_cache::file_cache): Support the new m_data_slots member.
(file_cache::~file_cache): Likewise.
(file_cache::lookup_or_add_data): New function.
(file_cache::lookup_or_add): New function that calls either
lookup_or_add_data or lookup_or_add_file as appropriate.
(location_get_source_line): Change the FILE_PATH argument to a
source_id SRC, and use it to support obtaining source lines from
generated data as well as from files.
(location_compute_display_column): Support generated data using the
new features of location_get_source_line.
(dump_location_info): Likewise.
* input.h (location_get_source_line): Adjust prototype. Add a new
convenience overload taking an expanded_location.
(class cache_data_source): Declare.
(class data_cache_slot): Declare.
(class file_cache): Declare new members.
(diagnostics_file_cache_forcibly_evict_data): Declare.
---
 gcc/input.cc | 171 ---
 gcc/input.h  |  23 +--
 2 files changed, 153 insertions(+), 41 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index 9377020b460..790279d4273 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -207,6 +207,28 @@ private:
   void maybe_grow ();
 };
 
+/* This is the implementation of cache_data_source for generated
+   data that is already in memory.  */
+class data_cache_slot final : public cache_data_source
+{
+public:
+  void create (const char *data, unsigned int data_len,
+  unsigned int highest_use_count);
+  bool represents_data (const char *data, unsigned int) const
+  {
+/* We can just use pointer equality here since the generated data lives in
+   memory in one persistent place.  It isn't anticipated there would be
+   several generated data buffers with the same content, so we don't mind
+   that in such a case we will store it twice.  */
+return m_data_begin == data;
+  }
+
+protected:
+  /* In contrast to file_cache_slot, we do not own a buffer.  The buffer
+ passed to create() needs to outlive this object.  */
+  bool get_more_data () override { return false; }
+};
+
 /* Current position in real source file.  */
 
 location_t input_location = UNKNOWN_LOCATION;
@@ -382,6 +404,21 @@ file_cache::lookup_file (const char *file_path)
   return r;
 }
 
+data_cache_slot *
+file_cache::lookup_data (const char *data, unsigned int data_len)
+{
+  for (unsigned int i = 0; i != num_file_slots; ++i)
+{
+  const auto slot = m_data_slots + i;
+  if (slot->represents_data (data, data_len))
+   {
+ slot->inc_use_count ();
+ return slot;
+   }
+}
+  return nullptr;
+}
+
 /* Purge any mention of FILENAME from the cache of files used for
printing source code.  For use in selftests when working
with tempfiles.  */
@@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char 
*file_path)
   global_dc->m_file_cache->forcibly_evict_file (file_path);
 }
 
+void
+diagnostics_file_cache_forcibly_evict_data (const char *data,
+   unsigned int data_len)
+{
+  if (!global_dc->m_file_cache)
+return;
+  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
+}
+
 void
 file_cache::forcibly_evict_file (const char *file_path)
 {
@@ -410,36 +456,36 @@ file_cache::forcibly_evict_file (const char *file_path)
   r->reset ();
 }
 
+void
+file_cache::forcibly_evict_data (const char *data, unsigned int data_len)
+{
+  if (auto r = lookup_data (data, data_len))
+r->reset ();
+}
+
 /* Return the cache that has been less used, recently, or the
first empty one.  If HIGHEST_USE_COUNT is non-null,
*HIGHEST_USE_COUNT is set to the highest use count of the entries
in the cache table.  */
 
-file_cache_slot*
-file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
+template 
+Slot *
+file_cache::evicted_cache_tab_entry (Slot 

[PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output

2023-08-09 Thread Lewis Hyatt via Gcc-patches
The diagnostics routines for SARIF output need to read the source code back
in, so that they can generate "snippet" and "content" records, so they need to
be able to cope with generated data locations.  Add support for that in
diagnostic-format-sarif.cc.

gcc/ChangeLog:

* diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
to support generated data locations.
(sarif_builder::maybe_make_physical_location_object): Change the
m_filenames hash_set to support generated data.
(sarif_builder::make_artifact_location_object): Use a source_id rather
than a plain file name.
(sarif_builder::maybe_make_region_object): Adapt to
expanded_location interface changes.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
(sarif_builder::make_artifact_object): Likewise.
(sarif_builder::make_run_object): Handle generated data.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc| 88 +++
 .../diagnostic-format-sarif-file-5.c  | 31 +++
 2 files changed, 82 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 1eff71962d7..c7c0e5d4b0a 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -174,7 +174,7 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+  json::object *make_artifact_location_object (source_id src);
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -197,9 +197,9 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) 
const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
+  json::object *make_artifact_object (source_id src);
+  json::object *maybe_make_artifact_content_object (source_id src) const;
+  json::object *maybe_make_artifact_content_object (source_id src,
int start_line,
int end_line) const;
   json::object *make_fix_object (const rich_location _loc);
@@ -220,7 +220,11 @@ private:
  diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set  m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+ with that length, not a filename.  */
+  hash_set ,
+  int_hash  >
+   > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set  m_rule_id_set;
   json::array *m_rules_arr;
@@ -787,7 +791,8 @@ sarif_builder::maybe_make_physical_location_object 
(location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  const auto src = LOCATION_SRC (loc);
+  m_filenames.add ({src.get_filename_or_buffer (), src.get_buffer_len ()});
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -811,7 +816,7 @@ sarif_builder::maybe_make_physical_location_object 
(location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (LOCATION_SRC (loc));
 }
 
 /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4)
@@ -823,10 +828,13 @@ sarif_builder::make_artifact_location_object (location_t 
loc)
or return NULL.  */
 
 json::object *
-sarif_builder::make_artifact_location_object (const char *filename)
+sarif_builder::make_artifact_location_object (source_id src)
 {
   json::object *artifact_loc_obj = new json::object ();
 
+  const auto filename = src.is_buffer ()
+? special_fname_generated () : src.get_filename_or_buffer ();
+
   /* "uri" property (SARIF v2.1.0 section 3.4.3).  */
   artifact_loc_obj->set ("uri", new json::string (filename));
 
@@ -912,9 +920,9 @@ sarif_builder::maybe_make_region_object (location_t loc) 

[PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as, for instance, referring to the content
of builtin macros (not yet implemented, but an easy lift after this one.) The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The TO_FILE member of struct line_map_ordinary has been changed to a union
named SRC which can be either a file name, or a pointer to a line_map_data
struct describing the data. There is no space overhead added to the line
maps data structures.

Outside libcpp, this patch includes only the minimal changes implied by the
adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
patches will implement the new functionality.

libcpp/ChangeLog:

* include/line-map.h (enum lc_reason): Add LC_GEN.
(struct line_map_data): New struct.
(struct line_map_ordinary): Change TO_FILE from a char* to a union,
and rename to SRC.
(class source_id): New class.
(ORDINARY_MAP_GENERATED_DATA_P): New function.
(ORDINARY_MAP_GENERATED_DATA): New function.
(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
(ORDINARY_MAP_SOURCE_ID): New function.
(ORDINARY_MAPS_SAME_FILE_P): New function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
(LINEMAP_FILE): Adapt to struct line_map_ordinary change.
(linemap_get_file_highest_location): Likewise.
* line-map.cc (source_id::operator==): New function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
(linemap_add): Support creating LC_GEN maps.
(linemap_line_start): Support LC_GEN maps.
(linemap_check_files_exited): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
* directives.cc (_cpp_do_file_change): Likewise.

gcc/c-family/ChangeLog:

* c-common.cc (try_to_locate_new_include_insertion_point): Recognize
and ignore LC_GEN maps.

gcc/cp/ChangeLog:

* module.cc (module_state::write_ordinary_maps): Recognize and
ignore LC_GEN maps, and adapt to interface change in struct
line_map_ordinary.
(module_state::read_ordinary_maps): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (compatible_locations_p): Adapt to
interface change in struct line_map_ordinary.
* input.cc (special_fname_generated): New function.
(dump_location_info): Support LC_GEN maps.
(get_substring_ranges_for_loc): Adapt to interface change in struct
line_map_ordinary.
* input.h (special_fname_generated): Declare.

gcc/go/ChangeLog:

* go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
LC_GEN maps.
---
 gcc/c-family/c-common.cc |  11 ++-
 gcc/cp/module.cc |   8 +-
 gcc/diagnostic-show-locus.cc |   2 +-
 gcc/go/go-linemap.cc |   3 +-
 gcc/input.cc |  27 +-
 gcc/input.h  |   1 +
 libcpp/directives.cc |   4 +-
 libcpp/include/line-map.h| 144 
 libcpp/line-map.cc   | 181 +--
 9 files changed, 299 insertions(+), 82 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9fbaeb437a1..ecfc2efc29f 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9206,19 +9206,22 @@ try_to_locate_new_include_insertion_point (const char 
*file, location_t loc)
   const line_map_ordinary *ord_map
= LINEMAPS_ORDINARY_MAP_AT (line_table, i);
 
+  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+   continue;
+
   if (const line_map_ordinary *from
  = linemap_included_from_linemap (line_table, ord_map))
/* We cannot use pointer equality, because with preprocessed
   input all filename strings are unique.  */
-   if (0 == strcmp (from->to_file, file))
+   if (ORDINARY_MAP_SOURCE_ID (from) == file)
  {
last_include_ord_map = from;
last_ord_map_after_include = NULL;
  }
 
-  /* Likewise, use strcmp, and reject any line-zero introductory
-map.  */
-  if (ord_map->to_line && 0 == strcmp (ord_map->to_file, file))
+  /* Likewise, use strcmp (via the source_id comparison), and reject any
+line-zero introductory map.  */
+  if (ord_map->to_line && ORDINARY_MAP_SOURCE_ID (ord_map) == file)
{
  if (!first_ord_map_in_file)
first_ord_map_in_file = ord_map;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ea362bdffa4..ff17cd57016 100644
--- 

[PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations

2023-08-09 Thread Lewis Hyatt via Gcc-patches
The previous patch in this series introduced the concept of LC_GEN line
maps. This patch continues on the path to using them to improve _Pragma
diagnostics, by adding a new source_id SRC member to struct
expanded_location, which is populated by linemap_expand_location. This
member allows call sites to detect and handle when a location refers to
generated data rather than a plain file name.

The previous FILE member of expanded_location is preserved (although
redundant with SRC), so that call sites which do not and never will care
about generated data do not need to be concerned about it. Call sites that
will care are modified here, to use SRC rather than FILE for comparing
locations.

libcpp/ChangeLog:

* include/line-map.h (struct expanded_location): Add SRC member. Add
zero-initializers for all members, since source_id is not a POD
type.
(class fixit_hint): Adjust prototype.
* line-map.cc (linemap_expand_location): Populate the new SRC member
in the expanded_location.
(rich_location::maybe_add_fixit): Compare explocs with the new SRC
field instead of the FILE field.
(fixit_hint::affects_line_p): Accept a source_id instead of a file
name, and use it for the comparisons.

gcc/c-family/ChangeLog:

* c-format.cc (get_corrected_substring): Compare explocs with the
new SRC field instead of the FILE field.
* c-indentation.cc (should_warn_for_misleading_indentation): Likewise.
(assert_get_visual_column_succeeds): Initialize the SRC field in the
test expanded_location.
(assert_get_visual_column_fails): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (make_range): Adapt to the new
constructor semantics for struct expanded_location.
(layout::maybe_add_location_range): Compare explocs with the new SRC
field instead of the FILE field.
(layout::validate_fixit_hint_p): Likewise.
(layout::print_leading_fixits): Use the SRC field in struct
expanded_location to query fixit_hint::affects_line_p.
(layout::print_trailing_fixits): Likewise.
* diagnostic.cc (diagnostic_report_current_module): Use the new SRC
field in expanded_location to detect LC_GEN locations and identify
them as such.
(assert_location_text): Adapt to the new constructor semantics for
struct expanded_location.
* input.cc (expand_location_1): Likewise. And when libcpp's
linemap_expand_location returns a null FILE for generated data,
replace it with special_fname_generated ().
(total_lines_num): Handle a generic source_id argument rather than a
file name only.
(get_source_text_between): Compare explocs with the new SRC field
instead of the FILE field.
(get_substring_ranges_for_loc): Likewise.
* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.
* input.h (LOCATION_SRC): New accessor macro.
---
 gcc/c-family/c-format.cc  |  4 ++--
 gcc/c-family/c-indentation.cc | 10 +-
 gcc/diagnostic-show-locus.cc  | 30 +-
 gcc/diagnostic.cc | 19 ---
 gcc/edit-context.cc   |  2 +-
 gcc/input.cc  | 21 +++--
 gcc/input.h   |  1 +
 libcpp/include/line-map.h | 24 ++--
 libcpp/line-map.cc| 15 +++
 9 files changed, 70 insertions(+), 56 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index b4eeebcb30e..529b1408179 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4522,9 +4522,9 @@ get_corrected_substring (const substring_loc _loc,
 = expand_location_to_spelling_point (fmt_substring_range.m_start);
   expanded_location finish
 = expand_location_to_spelling_point (fmt_substring_range.m_finish);
-  if (caret.file != start.file)
+  if (caret.src != start.src)
 return NULL;
-  if (start.file != finish.file)
+  if (start.src != finish.src)
 return NULL;
   if (caret.line != start.line)
 return NULL;
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index e8d3dece770..fce74991aae 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -334,7 +334,7 @@ should_warn_for_misleading_indentation (const 
token_indent_info _tinfo,
   const unsigned int tab_width = global_dc->tabstop;
 
   /* They must be in the same file.  */
-  if (next_stmt_exploc.file != body_exploc.file)
+  if (next_stmt_exploc.src != body_exploc.src)
 return false;
 
   /* If NEXT_STMT_LOC and BODY_LOC are on the same line, consider
@@ -363,7 +363,7 @@ should_warn_for_misleading_indentation (const 
token_indent_info _tinfo,
   ^ DON'T WARN HERE.  */
   if (next_stmt_exploc.line == body_exploc.line)
 {
-  if (guard_exploc.file != 

[PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-08-09 Thread Lewis Hyatt via Gcc-patches
On Mon, Jul 31, 2023 at 06:39:15PM -0400, Lewis Hyatt wrote:
> On Fri, Jul 28, 2023 at 6:58 PM David Malcolm  wrote:
> >
> > On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > > Add a new linemap reason LC_GEN which enables encoding the location
> > > of data
> > > that was generated during compilation and does not appear in any
> > > source file.
> > > There could be many use cases, such as, for instance, referring to
> > > the content
> > > of builtin macros (not yet implemented, but an easy lift after this
> > > one.) The
> > > first intended application is to create a place to store the input to
> > > a
> > > _Pragma directive, so that proper locations can be assigned to those
> > > tokens. This will be done in a subsequent commit.
> > >
> > > The actual change needed to the line-maps API in libcpp is not too
> > > large and
> > > requires no space overhead in the line map data structures (on 64-bit
> > > systems
> > > that is; one newly added data member to class line_map_ordinary sits
> > > inside
> > > former padding bytes.) An LC_GEN map is just an ordinary map like any
> > > other,
> > > but the TO_FILE member that normally points to the file name points
> > > instead to
> > > the actual data.  This works automatically with PCH as well, for the
> > > same
> > > reason that the file name makes its way into a PCH.  In order to
> > > avoid
> > > confusion, the member has been renamed from TO_FILE to DATA, and
> > > associated
> > > accessors adjusted.
> > >
> > > Outside libcpp, there are many small changes but most of them are to
> > > selftests, which are necessarily more sensitive to implementation
> > > details. From the perspective of the user (the "user", here, being a
> > > frontend
> > > using line maps or else the diagnostics infrastructure), the chief
> > > visible
> > > change is that the function location_get_source_line() should be
> > > passed an
> > > expanded_location object instead of a separate filename and line
> > > number.  This
> > > is not a big change because in most cases, this information came
> > > anyway from a
> > > call to expand_location and the needed expanded_location object is
> > > readily
> > > available. The new overload of location_get_source_line() uses the
> > > extra
> > > information in the expanded_location object to obtain the data from
> > > the
> > > in-memory buffer when it originated from an LC_GEN map.
> > >
> > > Until the subsequent patch that starts using LC_GEN maps, none are
> > > yet
> > > generated within GCC, hence nothing is added to the testsuite here;
> > > but all
> > > relevant selftests have been extended to cover generated data maps in
> > > addition
> > > to normal files.
> >
> > [..snip...]
> >
> > Thanks for the updated patch.
> >
> > Reading this patch, it felt a bit unnatural to me to have an
> >   (exploded location, source line)
> > pair where the exploded location seems to be representing "which source
> > file or generated buffer", but the line/column info in that
> > exploded_location is to be ignored in favor of the 2nd source line.
> >
> > I think we're missing a class: something that identifies either a
> > specific source file, or a specific generated buffer.
> >
> > How about something like either:
> >
> > class source_id
> > {
> > public:
> >   source_id (const char *filename)
> >   : m_filename_or_buffer (filename),
> > m_len (0)
> >   {
> >   }
> >
> >   explicit source_id (const char *buffer, unsigned buffer_len)
> >   : m_filename_or_buffer (buffer),
> > m_len (buffer_len)
> >   {
> > linemap_assert (buffer_len > 0);
> >   }
> >
> > private:
> >   const char *m_filename_or_buffer;
> >   unsigned m_len;  // where 0 means "it's a filename"
> > };
> >
> > or:
> >
> > class source_id
> > {
> > public:
> >   source_id (const char *filename)
> >   : m_ptr (filename),
> > m_is_buffer (false)
> >   {
> >   }
> >
> >   explicit source_id (const linemap_ordinary *buffer_linemap)
> >   : m_ptr (buffer_linemap),
> > m_is_buffer (true)
> >   {
> >   }
> >
> > private:
> >   const void *m_ptr;
> >   bool m_is_buffer;
> > };
> >
> > and use one of these "source_id file" in place of "const char *file",
> > rather than replacing such things with expanded_location?
> >
> > > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > > index e8d3dece770..4164fa0b1ba 100644
> > > --- a/gcc/c-family/c-indentation.cc
> > > +++ b/gcc/c-family/c-indentation.cc
> > > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> > >  unsigned int *first_nws,
> > >  unsigned int tab_width)
> > >  {
> > > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > > +  char_span line = location_get_source_line (exploc);
> >
> > ...so this might contine to be:
> >
> >   char_span line = location_get_source_line (exploc.file, exploc.line);
> >
> > ...but expanded_location's "file" field would become a source_id,
> > rather than a const char *.  It 

Re: [PATCH V2] riscv: generate builtin macro for compilation with strict alignment:

2023-08-09 Thread Edwin Lu

Hi Vineet,

On 8/8/2023 2:02 PM, Vineet Gupta wrote:


Maybe add a comment that in absence of -m[no-]strict-align, we use the 
cpu tune param -> slow_unaligned_access and that default mcpu is rocket 
which has that set to _slow.




That sounds good to me!


+#if defined(__riscv_unaligned_avoid)
+#error "__riscv_unaligned_avoid is unexpectedly set"
+#endif


Lets first check what is really expected.
#if !defined (_slow) #error


+
+/* Check __riscv_unaligned_slow xor __riscv_unaligned_fast is set.  */
+#if !defined(__riscv_unaligned_slow) && !defined(__riscv_unaligned_fast)
+#error "either __riscv_unaligned_slow or__riscv_unaligned_fast is not 
set"

+#endif
+
+#if defined(__riscv_unaligned_slow) && defined(__riscv_unaligned_fast)
+#error "both __riscv_unaligned_slow and__riscv_unaligned_fast are set"
+#endif
+


I think we can simplify this a bit (sorry I if wasn't clear enough in 
our off-list discussions).
We now need to ensure that unexpected toggles are NOT defined: #if 
defined(_avoid) || defined(_fast) #error
I don't think we need to check for the combinations as that is covered 
by first one and this.


While separate #error prints for 2 failures are ideal, personally it 
feels excessive given that the current implementation will only ever 
generate 1 of them. If a future code change breaks that assumption, the 
onus is on that change to fix/update the errors.




That makes sense to me. I was thinking that adding some more checks 
would help clarify the assumptions but it did just make things overly 
verbose.




Same as my comment for attribute-1. Please recheck all of them.



Thanks for the feedback! I will be sure to update all of them in the 
next iteration.


Edwin



Re: [PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-09 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-09 at 15:22 -0400, Eric Feng wrote:
> Thank you for your help in getting dg-require-python-h working! I can
> confirm that the FAILs are related to differences between the --
> cflags
> affecting the gimple seen by the analyzer. For this reason, I have
> changed it to --includes for now. 

Sounds good.

Eventually we'll probably want to support --cflags, but given that
every distribution probably has its own set of flags, it's a recipe for
an unpleasantly large test matrix, so just using --includes is a good
compromise.

> To be sure, I tested on Python 3.8 as
> well and it works as expected. I have also addressed the following
> comments on the WIP patch as you described.
> 
> -- Update Changelog entry to list new functions being simulated.
> -- Update region_model::get_or_create_region_for_heap_alloc leading
> comment.
> -- Change register_alloc to update_state_machine.
> -- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
> -- Static helper functions for:
> -- Initializing ob_refcnt field.
> -- Setting ob_type field.
> -- Getting ob_base field.
> -- Initializing heap allocated region for PyObjects.
> -- Incrementing a field by one.
> -- Change arg_is_long_p to arg_is_integral_p.
> -- Extract common failure scenario for reusability.
> 
> The initial WIP patch using 
> 
> /* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */
> 
> have been bootstrapped and regtested on aarch64-unknown-linux-gnu.
> Since
> we did not change any core logic in the revision and the only changes
> within the analyzer core are changing variable names, is it OK for
> trunk. In the mean time, the revised patch is currently going through
> bootstrap and regtest process.

Thanks for the updated patch.

Unfortunately I just pushed a largish analyzer patch (r14-3114-
g73da34a538ddc2) which may well conflict with your patch, so please
rebase to beyond that.  

Sorry about this.

In particular note that there's no longer a default assignment to the
LHS at a call-site in region_model::on_call_pre; known_function
subclasses are now responsible for assigning to the LHS of the
callsite.  But I suspect that all the known_function subclasses in the
cpython plugin already do that.

Some nits inline below...

[...snip...]

> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with
> the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every
> time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual
> implementation
> follows.

Might be worth capturing these notes as comments in the source (for
class kf_PyList_Append), rather than just within the commit message.

[...snip...]
> 
> diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-
> model.cc
> index e92b3f7b074..c338f045d92 100644
> --- a/gcc/analyzer/region-model.cc
> +++ b/gcc/analyzer/region-model.cc
> @@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats
> (const svalue *size_in_bytes,
>     Use CTXT to complain about tainted sizes.
>  
>     Reuse an existing heap_allocated_region if it's not being
> referenced by
> -   this region_model; otherwise create a new one.  */
> +   this region_model; otherwise create a new one.
> +
> +   Optionally (update_state_machine) transitions the pointer
> pointing to the
> +   heap_allocated_region from start to assumed non-null.  */
>  
>  const region *
>  region_model::get_or_create_region_for_heap_alloc (const svalue
> *size_in_bytes,
> - 
> region_model_context *ctxt)
> +   region_model_context *ctxt,
> +   bool update_state_machine,
> +   const call_details *cd)
>  {
>    /* Determine which regions are referenced in this region_model, so
> that
>   we can reuse an existing heap_allocated_region if it's not in
> use on
> @@ -5153,6 +5158,17 @@
> region_model::get_or_create_region_for_heap_alloc (const svalue
> *size_in_bytes,
>    if (size_in_bytes)
>  if (compat_types_p (size_in_bytes->get_type (), size_type_node))
>    set_dynamic_extents (reg, size_in_bytes, ctxt);
> +
> +   if (update_state_machine && cd)
> +   {
> +   const svalue *ptr_sval = nullptr;
> +   if (cd->get_lhs_type ())
> +   ptr_sval = m_mgr->get_ptr_svalue (cd->get_lhs_type (), reg);
> +   else
> +   ptr_sval = m_mgr->get_ptr_svalue (NULL_TREE, reg);
> +   transition_ptr_sval_non_null (ctxt,
> ptr_sval);

This if/else is redundant: the "else" is only reached if cd-
>get_lhs_type () is null, in which case you pass in NULL_TREE, so it
works the same either way.  Or am I missing something?

Also, it looks like something weird's happening with 

Re: [PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-09 Thread Joseph Myers
On Wed, 9 Aug 2023, Jakub Jelinek via Gcc-patches wrote:

> - _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
>   mention how those should be passed/returned; in a limited way they are
>   supported internally because the internal functions into which
>   __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
>   hack to return 2 values without using references/pointers

What happens when the usual arithmetic conversions are applied to 
operands, one of which is a complex integer type and the other of which is 
a wider _BitInt type?  I don't see anything in the code to disallow this 
case (which would produce an expression with a _Complex _BitInt type), or 
any testcases for it.

Other testcases I think should be present (along with any corresponding 
changes needed to the code itself):

* Verifying that the new integer constant suffix is rejected for C++.

* Verifying appropriate pedwarn-if-pedantic for the new constant suffix 
for versions of C before C2x (and probably for use of _BitInt type 
specifiers before C2x as well) - along with the expected -Wc11-c2x-compat 
handling (in C2x mode) / -pedantic -Wno-c11-c2x-compat in older modes.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PING 2][PATCH 0/9] Add btf_decl_tag C attribute

2023-08-09 Thread David Faust via Gcc-patches
Ping for this series.

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624156.html

Thanks

On 7/11/23 14:57, David Faust via Gcc-patches wrote:
> Hello,
> 
> This series adds support for a new attribute, "btf_decl_tag" in GCC.
> The same attribute is already supported in clang, and is used by various
> components of the BPF ecosystem.
> 
> The purpose of the attribute is to allow to associate (to "tag")
> declarations with arbitrary string annotations, which are emitted into
> debugging information (DWARF and/or BTF) to facilitate post-compilation
> analysis (the motivating use case being the Linux kernel BPF verifier).
> Multiple tags are allowed on the same declaration.
> 
> These strings are not interpreted by the compiler, and the attribute
> itself has no effect on generated code, other than to produce additional
> DWARF DIEs and/or BTF records conveying the annotations.
> 
> This entails:
> 
> - A new C-language-level attribute which allows to associate (to "tag")
>   particular declarations with arbitrary strings.
> 
> - The conveyance of that information in DWARF in the form of a new DIE,
>   DW_TAG_GNU_annotation, with tag number (0x6000) and format matching
>   that of the DW_TAG_LLVM_annotation extension supported in LLVM for
>   the same purpose. These DIEs are already supported by BPF tooling,
>   such as pahole.
> 
> - The conveyance of that information in BTF debug info in the form of
>   BTF_KIND_DECL_TAG records. These records are already supported by
>   LLVM and other tools in the eBPF ecosystem, such as the Linux kernel
>   eBPF verifier.
> 
> 
> Background
> ==
> 
> The purpose of these tags is to convey additional semantic information
> to post-compilation consumers, in particular the Linux kernel eBPF
> verifier. The verifier can make use of that information while analyzing
> a BPF program to aid in determining whether to allow or reject the
> program to be run. More background on these tags can be found in the
> early support for them in the kernel here [1] and [2].
> 
> The "btf_decl_tag" attribute is half the story; the other half is a
> sibling attribute "btf_type_tag" which serves the same purpose but
> applies to types. Support for btf_type_tag will come in a separate
> patch series, since it is impaced by GCC bug 110439 which needs to be
> addressed first.
> 
> I submitted an initial version of this work (including btf_type_tag)
> last spring [3], however at the time there were some open questions
> about the behavior of the btf_type_tag attribute and issues with its
> implementation. Since then we have clarified these details and agreed
> to solutions with the BPF community and LLVM BPF folks.
> 
> The main motivation for emitting the tags in DWARF is that the Linux
> kernel generates its BTF information via pahole, using DWARF as a source:
> 
> ++  BTF  BTF   +--+
> | pahole |---> vmlinux.btf --->| verifier |
> ++ +--+
> ^^
> ||
>   DWARF |BTF |
> ||
>   vmlinux  +-+
>   module1.ko   | BPF program |
>   module2.ko   +-+
> ...
> 
> This is because:
> 
> a)  pahole adds additional kernel-specific information into the
> produced BTF based on additional analysis of kernel objects.
> 
> b)  Unlike GCC, LLVM will only generate BTF for BPF programs.
> 
> b)  GCC can generate BTF for whatever target with -gbtf, but there is no
> support for linking/deduplicating BTF in the linker.
> 
> In the scenario above, the verifier needs access to the pointer tags of
> both the kernel types/declarations (conveyed in the DWARF and translated
> to BTF by pahole) and those of the BPF program (available directly in BTF).
> 
> 
> DWARF Representation
> 
> 
> As noted above, btf_decl_tag is represented in DWARF via a new DIE
> DW_TAG_GNU_annotation, with identical format to the LLVM DWARF
> extension DW_TAG_LLVM_annotation serving the same purpose. The DIE has
> the following format:
> 
>   DW_TAG_GNU_annotation (0x6000)
> DW_AT_name: "btf_decl_tag"
> DW_AT_const_value: 
> 
> These DIEs are placed in the DWARF tree as children of the DIE for the
> appropriate declaration, and one such DIE is created for each occurrence
> of the btf_decl_tag attribute on a declaration.
> 
> For example:
> 
>   const int * c __attribute__((btf_decl_tag ("__c"), btf_decl_tag 
> ("devicemem")));
> 
> This declaration produces the following DWARF:
> 
>  <1><1e>: Abbrev Number: 2 (DW_TAG_variable)
> <1f>   DW_AT_name: c
> <24>   DW_AT_type: <0x49>
> ...
>  <2><36>: Abbrev Number: 3 (User TAG value: 0x6000)
> <37>   DW_AT_name: (indirect string, 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 08:43:00PM +, Joseph Myers wrote:
> At this point it seems appropriate to remind people of another ABI 
> consideration for vector extensions.  glibc's libmvec defines vector 
> versions of various functions, including AVX512 ones (of course those 
> function versions only work on hardware with the relevant instructions).  
> glibc's headers use both _Pragma ("omp declare simd notinbranch") and 
> __attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler 
> including those headers, what function variants are available in glibc.

For omp declare simd or simd attribute that simply implies that the
variants with 512-bit vectors may only be called from -mavx512f or
-mavx10.1-512 (or how the switch will be called code), not from -mavx10.1.
We shouldn't change that ABI because of AVX10.

Jakub



Re: [patch] libgomp: cuda.h and omp_target_memcpy_rect cleanup (was: [patch] OpenMP: Call cuMemcpy2D/cuMemcpy3D for nvptx for omp_target_memcpy_rect)

2023-08-09 Thread Thomas Schwinge
Hi Tobias!

On 2023-07-28T13:51:41+0200, Tobias Burnus  wrote:
> On 27.07.23 23:00, Thomas Schwinge wrote:
>>> +  else if (src_devicep != NULL
>>> +&& (dst_devicep == NULL
>>> +|| (dst_devicep->capabilities
>>> +& GOMP_OFFLOAD_CAP_SHARED_MEM)))
>> Are these 'GOMP_OFFLOAD_CAP_SHARED_MEM' actually reachable, given that
>> 'omp_target_memcpy_check' (via 'omp_target_memcpy_rect_check') clears out
>> the device to 'NULL' for 'GOMP_OFFLOAD_CAP_SHARED_MEM'?
>
> I have now undone this change – I did not dig deep enough into the
> function calls.
>
>
>>> +  else if (dst_devicep == NULL && src_devicep == NULL)
>>> + {
>>> +   memcpy ((char *) dst + dst_off, (const char *) src + src_off,
>>> +   length);
>>> +   ret = 1;
>>> + }
>>> else if (src_devicep == dst_devicep)
>>>ret = src_devicep->dev2dev_func (src_devicep->target_id,
>>> (char *) dst + dst_off,
>>> (const char *) src + src_off,
>>> length);
>> ..., but also left the intra-device case here -- which should now be dead
>> code here?
>
> Why? Unless I missed something, the old, the current, and the proposed
> (= old) code do still run this code.

It is now again reachable, but wasn't in the "intermediate state"
(without your follow-on change) -- at least per my understanding, which
may be gappy.

> I have not added an assert to confirm, but in any case, it is tested for
> in my recently added testcase - thus, we could add a 'printf' to confirm.

We're now back to the original code, which should be fine.

>>> +   else if (*tmp_size < length)
>>> + {
>>> +   *tmp_size = length;
>>> +   *tmp = realloc (*tmp, length);
>>> +   if (*tmp == NULL)
>>> + return ENOMEM;
>> If 'realloc' returns 'NULL', we should 'free' the original '*tmp'?
>>
>> Do we really need here the property here that if the re-allocation can't
>> be done in-place, 'realloc' copies the original content to the new?  In
>> other words, should we just unconditionally 'free' and re-'malloc' here,
>> instead of 'realloc'?
> I have now done so – but I am not really sure what's faster on average.
> If it can be enlarged, 'realloc' is faster, if it cannot free+malloc is
> better.

I have no proof, but would assume that the C library handles as
efficiently as 'realloc' the case of 'free' plus subsequent 'malloc' if
there's space available after the original allocation.

>> I haven't looked whether the re-use of 'tmp' for multiple calls to this
>> is then actually useful, or whether we should just always 'malloc', use,
>> 'free' the buffer here?

..., hence that suggestion.  But I agree what we've got now is good, just
one more idea: couldn't we now actually unify the (original) 'malloc' and
(original) 'realloc' case:

-if (*tmp_size == 0)
-  {
-*tmp_size = length;
-*tmp = malloc (length);
-if (*tmp == NULL)
-  return ENOMEM;
-  }
-else if (*tmp_size < length)
+if (*tmp_size < length)
   {
 *tmp_size = length;
 free (*tmp);
 *tmp = malloc (length);
 if (*tmp == NULL)
   return ENOMEM;
   }

(Untested.)

> Well, it can run in a hot loop – assume a C-array array[1024][1024][2]
> and copying array[:1024,:1024,0:1] (using OpenMP syntax) – i.e. 1048576
> times every other element. And therefore I would like to avoid repeated
> malloc/free in such a case. (But in general, interdevice copying should
> be very rare.)
>
> Actually, I think the realloc case is unreachable: for rectangular
> copies, as implied both by 'target update' with strided access and by
> 'omp_target_memcpy_rect', the size should be constant.

Worth an 'assert'?

Now I "only" don't understand the '__builtin_mul_overflow' computations
and error checks ('return EINVAL;') for the four cases handled in
'libgomp/target.c:omp_target_memcpy_rect_worker': for example, the
generic loop at end of function iterates over all 'num_dims', but the
specific earlier 'num_dims == N' cases don't.  But I suppose that's just
my own problem not understanding this API/code, and I'm not currently
planning on looking into this any further.  ;-)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


RE: Intel AVX10.1 Compiler Design and Support

2023-08-09 Thread Joseph Myers
On Wed, 9 Aug 2023, Wang, Phoebe via Gcc-patches wrote:

> Proposal 3: Change the ABI of 512-bit vector and always be 
> passed/returned from memory.

Changing ABIs like that for existing code that has worked for some time on 
existing hardware is a bad idea.

At this point it seems appropriate to remind people of another ABI 
consideration for vector extensions.  glibc's libmvec defines vector 
versions of various functions, including AVX512 ones (of course those 
function versions only work on hardware with the relevant instructions).  
glibc's headers use both _Pragma ("omp declare simd notinbranch") and 
__attribute__ ((__simd__ ("notinbranch"))) to declare, to the compiler 
including those headers, what function variants are available in glibc.

Existing glibc versions need to continue to work with new compiler 
versions.  That is, it's part of the ABI, which must remain stable, 
exactly which function versions the above pragma and attribute imply are 
available - and of course the details of how those functions versions take 
arguments / return results are also part of the ABI (it would be OK for a 
new compiler to choose not to use some of those vector versions, but not 
to start calling them with a different ABI).

Maybe you'll want to add new vector function versions, with different 
interfaces, to libmvec in future.  If so, you need a *different* pragma or 
attribute to declare to the compiler that the libmvec version using that 
pragma or attribute has the additional functions - so new compilers using 
the existing header will not try to generate calls to new function 
versions that don't exist in that glibc version (but new compilers using a 
new header version from new glibc will see the new pragma or attribute and 
so be able to generate the relevant calls to new functions).  And once 
you've defined the ABI for such a new pragma or attribute, that itself 
then becomes a stable interface - so if you end up with vector extensions 
involving yet another set of interfaces, they need another corresponding 
new pragma / attribute for libmvec to declare to the compiler that the new 
interfaces exist.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Qing Zhao via Gcc-patches


> On Aug 8, 2023, at 10:54 AM, Martin Uecker  wrote:
> 
> 
> 
> I am sure this has been discussed before, but seeing that you
> test for a specific formula, let me point out the following:
> 
> There at least three different size expression which could
> make sense. Consider
> 
> short foo { int a; short b; char t[]; }; 
> 
> Most people seem to use
> 
> sizeof(struct foo) + N * sizeof(foo->t);
> 
> which for N == 3 yields 11 bytes on x86-64 because the formula
> adds the padding of the original struct. There is an example
> in the  C standard that uses this formula.
> 
> 
> But he minimum size of an object which stores N elements is
> 
> max(sizeof (struct s), offsetof(struct s, t[n]))
> 
> which is 9 bytes. 
> 
> This is what clang uses for statically allocated objects with
> initialization, while GCC uses the rule above (11 bytes). But 
> bdos / bos  then returns the smaller size of 9 which is a bit
> confusing.

As I checked the algorithm for bos in GCC,  it uses a similar formula as the 
following to compute the object size:

offset(struct foo, t[0]) + N * sizeof(*foo->t);

Which seems correct to me.  (Therefore bos returns 9 for this example).
> 
> 
> https://godbolt.org/z/K1hvaK1ns
> 
> https://github.com/llvm/llvm-project/issues/62929
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956
> 
> 
> Then there is also the size of a similar array where the FAM
> is replaced with an array of static size:
> 
> struct foo { int a; short b; char t[3]; }; 
> 
> This would make the most sense to me, but it has 12 bytes
> because the padding is according to the usual alignment
> rules.

My understanding is, since a structure with FAM cannot be an element of an 
array, 
 the tailing padding might not be necessary?

Qing

> 
> 
> Martin
> 
> 
> 
> Am Montag, dem 07.08.2023 um 09:16 -0700 schrieb Kees Cook:
>> On Fri, Aug 04, 2023 at 07:44:28PM +, Qing Zhao wrote:
>>> This is the 2nd version of the patch, per our discussion based on the
>>> review comments for the 1st version, the major changes in this version
>>> are:
>> 
>> Thanks for the update!
>> 
>>> 
>>> 1. change the name "element_count" to "counted_by";
>>> 2. change the parameter for the attribute from a STRING to an
>>> Identifier;
>>> 3. Add logic and testing cases to handle anonymous structure/unions;
>>> 4. Clarify documentation to permit the situation when the allocation
>>> size is larger than what's specified by "counted_by", at the same time,
>>> it's user's error if allocation size is smaller than what's specified by
>>> "counted_by";
>>> 5. Add a complete testing case for using counted_by attribute in
>>> __builtin_dynamic_object_size when there is mismatch between the
>>> allocation size and the value of "counted_by", the expecting behavior
>>> for each case and the explanation on why in the comments. 
>> 
>> All the "normal" test cases I have are passing; this is wonderful! :)
>> 
>> I'm still seeing unexpected situations when I've intentionally set
>> counted_by to be smaller than alloc_size, but I assume it's due to not
>> yet having the patch you mention below.
>> 
>>> As discussed, I plan to add two more separate patch sets after this initial
>>> patch set is approved and committed.
>>> 
>>> set 1. A new warning option and a new sanitizer option for the user error
>>>when the allocation size is smaller than the value of "counted_by".
>>> set 2. An improvement to __builtin_dynamic_object_size  for the following
>>>case:
>>> 
>>> struct A
>>> {
>>> size_t foo;
>>> int array[] __attribute__((counted_by (foo)));
>>> };
>>> 
>>> extern struct fix * alloc_buf ();
>>> 
>>> int main ()
>>> {
>>> struct fix *p = alloc_buf ();
>>> __builtin_object_size(p->array, 0) == sizeof(struct A) + p->foo * 
>>> sizeof(int);
>>>   /* with the current algorithm, it’s UNKNOWN */ 
>>> __builtin_object_size(p->array, 2) == sizeof(struct A) + p->foo * 
>>> sizeof(int);
>>>   /* with the current algorithm, it’s UNKNOWN */
>>> }
>> 
>> Should the above be bdos instead of bos?
>> 
>>> Bootstrapped and regression tested on both aarch64 and X86, no issue.
>> 
>> I've updated the Linux kernel's macros for the name change and done
>> build tests with my first pass at "easy" cases for adding counted_by:
>> https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=devel/counted_by=adc5b3cb48a049563dc673f348eab7b6beba8a9b
>> 
>> Everything is working as expected. :)
>> 
>> -Kees
>> 
> 
> -- 
> Univ.-Prof. Dr. rer. nat. Martin Uecker
> Graz University of Technology
> Institute of Biomedical Imaging



Re: [PATCH] testsuite: Fix gcc.dg/analyzer/allocation-size-multiline-[123].c [PR 110426]

2023-08-09 Thread David Malcolm via Gcc-patches
On Tue, 2023-08-08 at 15:01 +, Christophe Lyon wrote:
> For 32-bit newlib targets (e.g. arm-eabi)  int32_t is "long int".
> 
> Like previous patches in these tests, update the matching regexps to
> match "aka (long )?int".
> 
> Tested on arm-eabi and aarch64-linux-gnu.

Sorry about this breakage.

These tests used to emit the infomation as multiple messages, but were
consolidated as a side-effect of r14-3001-g021077b94741c9.

I've just committed r14-3114-g73da34a538ddc2, a cleanup of the analyzer
code, which has a side-effect of splitting the messages back up.  I
believe that r14-3114 restores these tests to their pre-r14-3001 state,
but I might have messed up.

Does r14-3114-g73da34a538ddc2 fix the issues for you, or is some
patching still needed?

Dave


> 
> 2023-08-08  Christophe Lyon  
> 
> gcc/testsuite/
> PR analyzer/110426
> * gcc.dg/analyzer/allocation-size-multiline-1.c: Handle
> int32_t being "long int".
> * gcc.dg/analyzer/allocation-size-multiline-2.c: Likewise.
> * gcc.dg/analyzer/allocation-size-multiline-3.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c | 6 +++-
> --
>  gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c | 6 +++-
> --
>  gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-3.c | 4 ++--
>  3 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-
> 1.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c
> index 9938ba237a0..b56e4b4e8e1 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-1.c
> @@ -16,7 +16,7 @@ void test_constant_1 (void)
>  |   int32_t *ptr = __builtin_malloc (1);
>  |  ^~~~
>  |  |
> -    |  (1) allocated 1 bytes and assigned to
> 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> +    |  (1) allocated 1 bytes and assigned to
> 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> {re:long :re?}int})' is '4'
>  |
>     { dg-end-multiline-output "" } */
>  
> @@ -34,7 +34,7 @@ void test_constant_2 (void)
>  |   int32_t *ptr = __builtin_malloc (2);
>  |  ^~~~
>  |  |
> -    |  (1) allocated 2 bytes and assigned to
> 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> +    |  (1) allocated 2 bytes and assigned to
> 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> {re:long :re?}int})' is '4'
>  |
>     { dg-end-multiline-output "" } */
>  
> @@ -52,6 +52,6 @@ void test_symbolic (int n)
>  |   int32_t *ptr = __builtin_malloc (n * 2);
>  |  ^~~~
>  |  |
> -    |  (1) allocated 'n * 2' bytes and assigned to
> 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> +    |  (1) allocated 'n * 2' bytes and assigned to
> 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> {re:long :re?}int})' is '4'
>  |
>     { dg-end-multiline-output "" } */
> diff --git a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-
> 2.c b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c
> index 9e1269cbb7a..8912913a78c 100644
> --- a/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c
> +++ b/gcc/testsuite/gcc.dg/analyzer/allocation-size-multiline-2.c
> @@ -16,7 +16,7 @@ void test_constant_1 (void)
>  |   int32_t *ptr = __builtin_alloca (1);
>  |  ^~~~
>  |  |
> -    |  (1) allocated 1 bytes and assigned to
> 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> +    |  (1) allocated 1 bytes and assigned to
> 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> {re:long :re?}int})' is '4'
>  |
>     { dg-end-multiline-output "" } */
>  
> @@ -33,7 +33,7 @@ void test_constant_2 (void)
>  |   int32_t *ptr = __builtin_alloca (2);
>  |  ^~~~
>  |  |
> -    |  (1) allocated 2 bytes and assigned to
> 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> +    |  (1) allocated 2 bytes and assigned to
> 'int32_t *' {aka '{re:long :re?}int *'} here; 'sizeof (int32_t {aka
> {re:long :re?}int})' is '4'
>  |
>     { dg-end-multiline-output "" } */
>  
> @@ -50,7 +50,7 @@ void test_symbolic (int n)
>  |   int32_t *ptr = __builtin_alloca (n * 2);
>  |  ^~~~
>  |  |
> -    |  (1) allocated 'n * 2' bytes and assigned to
> 'int32_t *' {aka 'int *'} here; 'sizeof (int32_t {aka int})' is '4'
> +    |  (1) 

Re: [PATCH 00/24] Sync shared build infrastructure with binutils-gdb

2023-08-09 Thread Joseph Myers
On Wed, 9 Aug 2023, Arsen Arsenović via Gcc-patches wrote:

> Joseph Myers  writes:
> 
> > On Tue, 8 Aug 2023, Arsen Arsenović via Gcc-patches wrote:
> >
> >> Yes.  Libtool was forked over a decade ago.  My next project is syncing
> >> upstream and us back up.  Unsure about pkg.m4.
> >
> > Note as per previous discussions that libtool commit 
> > 3334f7ed5851ef1e96b052f2984c4acdbf39e20c will need reverting in GCC when 
> > updating libtool because of incompatible usage of --with-sysroot.
> 
> I wanted to, somehow, coalesce the two back together, so that both are
> available.  Presumably, this means adding an option to libtool to accept
> host-sysroot or such, but I haven't done too much looking into this.
> 
> Is my interpretation of the issue correct?  (i.e. GCC uses sysroot to
> mean *target* sysroot rather than host sysroot)

GCC's sysroot is a sysroot containing target headers and libraries, yes.  
Also note:

* The --with-sysroot directory is the directory that would be used by GCC 
if installed in the configured --prefix (relocated if GCC ends up running 
from a different prefix).  At build time, --with-build-sysroot takes 
precedence.

* The --with-sysroot directory (and likewise the --with-build-sysroot 
directory or any directory specified with --sysroot= passed to GCC 
programs) may sometimes contain multiple sysroots in subdirectories; that 
directory gets combined with a suffix from SYSROOT_SUFFIX_SPEC to 
determine the actual subdirectory for the current multilib, within which 
there are subdirectories such as usr/include or usr/lib.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 3/3] testsuite: Use distinct explicit error codes in value_9.f90

2023-08-09 Thread Mikael Morin via Gcc-patches
Use distinct error codes, so that we can spot directly from the
testsuite log which case is failing.

gcc/testsuite/ChangeLog:

* gfortran.dg/value_9.f90 (val, val4, sub, sub4): Take the error
codes from the arguments.
(p): Update calls: pass explicit distinct error codes.
---
 gcc/testsuite/gfortran.dg/value_9.f90 | 108 +-
 1 file changed, 56 insertions(+), 52 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/value_9.f90 
b/gcc/testsuite/gfortran.dg/value_9.f90
index 1a2fa80ed0d..4813250ebaa 100644
--- a/gcc/testsuite/gfortran.dg/value_9.f90
+++ b/gcc/testsuite/gfortran.dg/value_9.f90
@@ -20,78 +20,82 @@ program p
   ! Check len=1 actual argument cases first
   ca  =   "a"; cp  =   "b"; cd  =   "c"
   ca4 = 4_"d"; cp4 = 4_"e"; cd4 = 4_"f"
-  call val  ("B","B")
-  call val  ("A",char(65))
-  call val  ("A",char(a))
-  call val  ("A",mychar(65))
-  call val  ("A",mychar(a))
-  call val  ("1",c)
-  call val  ("1",(c))
-  call val4 (4_"C",4_"C")
-  call val4 (4_"A",char(65,kind=4))
-  call val4 (4_"A",char(a, kind=4))
-  call val4 (4_"4",c4)
-  call val4 (4_"4",(c4))
-  call val  (ca,ca)
-  call val  (cp,cp)
-  call val  (cd,cd)
-  call val  (ca,(ca))
-  call val4 (ca4,ca4)
-  call val4 (cp4,cp4)
-  call val4 (cd4,cd4)
-  call val4 (cd4,(cd4))
-  call sub  ("S")
-  call sub4 (4_"T")
+  call val  ("B","B", 1, 2)
+  call val  ("A",char(65), 3, 4)
+  call val  ("A",char(a), 5, 6)
+  call val  ("A",mychar(65), 7, 8)
+  call val  ("A",mychar(a), 9, 10)
+  call val  ("1",c, 11, 12)
+  call val  ("1",(c), 13, 14)
+  call val4 (4_"C",4_"C", 15, 16)
+  call val4 (4_"A",char(65,kind=4), 17, 18)
+  call val4 (4_"A",char(a, kind=4), 19, 20)
+  call val4 (4_"4",c4, 21, 22)
+  call val4 (4_"4",(c4), 23, 24)
+  call val  (ca,ca, 25, 26)
+  call val  (cp,cp, 27, 28)
+  call val  (cd,cd, 29, 30)
+  call val  (ca,(ca), 31, 32)
+  call val4 (ca4,ca4, 33, 34)
+  call val4 (cp4,cp4, 35, 36)
+  call val4 (cd4,cd4, 37, 38)
+  call val4 (cd4,(cd4), 39, 40)
+  call sub  ("S", 41, 42)
+  call sub4 (4_"T", 43, 44)
 
   ! Check that always the first character of the string is finally used
-  call val  (  "U++",  "U--")
-  call val4 (4_"V**",4_"V//")
-  call sub  (  "WTY")
-  call sub4 (4_"ZXV")
-  call val  (  "234",  d)
-  call val4 (4_"345",  d4   )
-  call val  (  "234", (d)   )
-  call val4 (4_"345", (d4)  )
-  call val  (  "234",  d (1:2))
-  call val4 (4_"345",  d4(1:2))
-  call val  (  "234",  d (1:l))
-  call val4 (4_"345",  d4(1:l))
-  call val  ("1",c // d)
-  call val  ("1",trim (c // d))
-  call val4 (4_"4",c4 // d4)
-  call val4 (4_"4",trim (c4 // d4))
+  call val  (  "U++",  "U--", 45, 46)
+  call val4 (4_"V**",4_"V//", 47, 48)
+  call sub  (  "WTY", 49, 50)
+  call sub4 (4_"ZXV", 51, 52)
+  call val  (  "234",  d, 53, 54)
+  call val4 (4_"345",  d4   , 55, 56)
+  call val  (  "234", (d)   , 57, 58)
+  call val4 (4_"345", (d4)  , 59, 60)
+  call val  (  "234",  d (1:2), 61, 62)
+  call val4 (4_"345",  d4(1:2), 63, 64)
+  call val  (  "234",  d (1:l), 65, 66)
+  call val4 (4_"345",  d4(1:l), 67, 68)
+  call val  ("1",c // d, 69, 70)
+  call val  ("1",trim (c // d), 71, 72)
+  call val4 (4_"4",c4 // d4, 73, 74)
+  call val4 (4_"4",trim (c4 // d4), 75, 76)
   cd = "gkl"; cd4 = 4_"hmn"
-  call val  (cd,cd)
-  call val4 (cd4,cd4)
-  call sub  (cd)
-  call sub4 (cd4)
+  call val  (cd,cd, 77, 78)
+  call val4 (cd4,cd4, 79, 80)
+  call sub  (cd, 81, 82)
+  call sub4 (cd4, 83, 84)
   deallocate (ca, cp, ca4, cp4, cd, cd4)
 contains
-  subroutine val (x, c)
+  subroutine val (x, c, err1, err2)
 character(kind=1), intent(in) :: x  ! control: pass by reference
 character(kind=1), value  :: c
+integer, intent(in) :: err1, err2
 print *, "by value(kind=1): ", c
-if (c /= x)   stop 1
+if (c /= x)   stop err1
 c = "*"
-if (c /= "*") stop 2
+if (c /= "*") stop err2
   end
 
-  subroutine val4 (x, c)
+  subroutine val4 (x, c, err1, err2)
 character(kind=4), intent(in) :: x  ! control: pass by reference
 character(kind=4), value  :: c
+integer, intent(in) :: err1, err2
 print *, "by value(kind=4): ", c
-if (c /= x) stop 3
+if (c /= x) stop err1
 c = 4_"#"
-if (c /= 4_"#") stop 4
+if (c /= 4_"#") stop err2
   end
 
-  subroutine sub (s)
+  subroutine sub (s, err1, err2)
 character(*), intent(in) :: s
-call val (s, s)
+integer,  intent(in) :: err1, err2
+call val (s, s, err1, err2)
   end
-  subroutine sub4 (s)
+  subroutine sub4 (s, err1, err2)
 character(kind=4,len=*), intent(in) :: s
-call val4 (s, s)
+integer, intent(in) :: err1, err2
+call val4 (s, s, err1, err2)
   end
 
   character function mychar (i)
-- 
2.40.1



[PATCH 1/3] fortran: New predicate gfc_length_one_character_type_p

2023-08-09 Thread Mikael Morin via Gcc-patches
Introduce a new predicate to simplify conditionals checking for
a character type whose length is the constant one.

gcc/fortran/ChangeLog:

* gfortran.h (gfc_length_one_character_type_p): New inline
function.
* check.cc (is_c_interoperable): Use
gfc_length_one_character_type_p.
* decl.cc (verify_bind_c_sym): Same.
* trans-expr.cc (gfc_conv_procedure_call): Same.
---
 gcc/fortran/check.cc  |  7 +++
 gcc/fortran/decl.cc   |  4 +---
 gcc/fortran/gfortran.h| 15 +++
 gcc/fortran/trans-expr.cc |  8 ++--
 4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index 4086dc71d34..6c45e6542f0 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -5250,10 +5250,9 @@ is_c_interoperable (gfc_expr *expr, const char **msg, 
bool c_loc, bool c_f_ptr)
&& !gfc_simplify_expr (expr->ts.u.cl->length, 0))
   gfc_internal_error ("is_c_interoperable(): gfc_simplify_expr failed");
 
-if (!c_loc && expr->ts.u.cl
-   && (!expr->ts.u.cl->length
-   || expr->ts.u.cl->length->expr_type != EXPR_CONSTANT
-   || mpz_cmp_si (expr->ts.u.cl->length->value.integer, 1) != 0))
+if (!c_loc
+   && expr->ts.u.cl
+   && !gfc_length_one_character_type_p (>ts))
   {
*msg = "Type shall have a character length of 1";
return false;
diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index 844345df77e..8182ef29f43 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -6064,9 +6064,7 @@ verify_bind_c_sym (gfc_symbol *tmp_sym, gfc_typespec *ts,
 
   /* BIND(C) functions cannot return a character string.  */
   if (bind_c_function && tmp_sym->ts.type == BT_CHARACTER)
-   if (tmp_sym->ts.u.cl == NULL || tmp_sym->ts.u.cl->length == NULL
-   || tmp_sym->ts.u.cl->length->expr_type != EXPR_CONSTANT
-   || mpz_cmp_si (tmp_sym->ts.u.cl->length->value.integer, 1) != 0)
+   if (!gfc_length_one_character_type_p (_sym->ts))
  gfc_error ("Return type of BIND(C) function %qs of character "
 "type at %L must have length 1", tmp_sym->name,
 &(tmp_sym->declared_at));
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6482a885211..d44e5286626 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3181,6 +3181,21 @@ gfc_finalizer;
 
 / Function prototypes */
 
+
+/* Returns true if the type specified in TS is a character type whose length
+   is the constant one.  Otherwise returns false.  */
+
+inline bool
+gfc_length_one_character_type_p (gfc_typespec *ts)
+{
+  return ts->type == BT_CHARACTER
+&& ts->u.cl
+&& ts->u.cl->length
+&& ts->u.cl->length->expr_type == EXPR_CONSTANT
+&& ts->u.cl->length->ts.type == BT_INTEGER
+&& mpz_cmp_ui (ts->u.cl->length->value.integer, 1) == 0;
+}
+
 /* decl.cc */
 bool gfc_in_match_data (void);
 match gfc_match_char_spec (gfc_typespec *);
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index ef3e6d08f78..6da3975f77c 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6453,12 +6453,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   dummy arguments are actually passed by value.
   Strings are truncated to length 1.
   The BIND(C) case is handled elsewhere.  */
-   if (fsym->ts.type == BT_CHARACTER
-   && !fsym->ts.is_c_interop
-   && fsym->ts.u.cl->length->expr_type == EXPR_CONSTANT
-   && fsym->ts.u.cl->length->ts.type == BT_INTEGER
-   && (mpz_cmp_ui
-   (fsym->ts.u.cl->length->value.integer, 1) == 0))
+   if (!fsym->ts.is_c_interop
+   && gfc_length_one_character_type_p (>ts))
  {
if (e->expr_type != EXPR_CONSTANT)
  {
-- 
2.40.1



[PATCH 2/3] fortran: Fix length one character dummy arg type [PR110419]

2023-08-09 Thread Mikael Morin via Gcc-patches
Revision r14-2171-g8736d6b14a4dfdfb58c80ccd398981b0fb5d00aa
changed the argument passing convention for length 1 value dummy
arguments to pass just the single character by value.  However, the
procedure declarations weren't updated to reflect the change in the
argument types.
This change does the missing argument type update.

The change of argument types generated an internal error in
gfc_conv_string_parameter with value_9.f90.  Indeed, that function is
not prepared for bare character type, so it is updated as well.

The condition guarding the single character argument passing code
is loosened to not exclude non-interoperable kind (this fixes
a regression with c_char_tests_2.f03).

Finally, the constant string argument passing code is updated as well
to extract the single char and pass it instead of passing it as
a length one string.  As the code taking care of non-constant arguments
was already doing this, the condition guarding it is just removed.

With these changes, value_9.f90 passes on 32 bits big-endian powerpc.

PR fortran/110360
PR fortran/110419

gcc/fortran/ChangeLog:

* trans-types.cc (gfc_sym_type): Use a bare character type for length
one value character dummy arguments.
* trans-expr.cc (gfc_conv_string_parameter): Handle single character
case.
(gfc_conv_procedure_call): Don't exclude interoperable kinds
from single character handling.  For single character dummy arguments,
extend the existing handling of non-constant expressions to constant
expressions.

gcc/testsuite/ChangeLog:

* gfortran.dg/bind_c_usage_13.f03: Update tree dump patterns.
---
 gcc/fortran/trans-expr.cc | 35 +++
 gcc/fortran/trans-types.cc|  5 ++-
 gcc/testsuite/gfortran.dg/bind_c_usage_13.f03 |  8 ++---
 3 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 6da3975f77c..d91cc9da221 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6451,26 +6451,24 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 
/* ABI: actual arguments to CHARACTER(len=1),VALUE
   dummy arguments are actually passed by value.
-  Strings are truncated to length 1.
-  The BIND(C) case is handled elsewhere.  */
-   if (!fsym->ts.is_c_interop
-   && gfc_length_one_character_type_p (>ts))
+  Strings are truncated to length 1.  */
+   if (gfc_length_one_character_type_p (>ts))
  {
-   if (e->expr_type != EXPR_CONSTANT)
- {
-   tree slen1 = build_int_cst (gfc_charlen_type_node, 
1);
-   gfc_conv_string_parameter ();
-   parmse.expr = gfc_string_to_single_character (slen1,
- 
parmse.expr,
- 
e->ts.kind);
-   /* Truncate resulting string to length 1.  */
-   parmse.string_length = slen1;
- }
-   else if (e->value.character.length > 1)
+   if (e->expr_type == EXPR_CONSTANT
+   && e->value.character.length > 1)
  {
e->value.character.length = 1;
gfc_conv_expr (, e);
  }
+
+   tree slen1 = build_int_cst (gfc_charlen_type_node, 1);
+   gfc_conv_string_parameter ();
+   parmse.expr
+   = gfc_string_to_single_character (slen1,
+ parmse.expr,
+ e->ts.kind);
+   /* Truncate resulting string to length 1.  */
+   parmse.string_length = slen1;
  }
 
if (fsym->attr.optional
@@ -10610,6 +10608,13 @@ gfc_conv_string_parameter (gfc_se * se)
 {
   tree type;
 
+  if (TREE_CODE (TREE_TYPE (se->expr)) == INTEGER_TYPE
+  && integer_onep (se->string_length))
+{
+  se->expr = gfc_build_addr_expr (NULL_TREE, se->expr);
+  return;
+}
+
   if (TREE_CODE (se->expr) == STRING_CST)
 {
   type = TREE_TYPE (TREE_TYPE (se->expr));
diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 987e3d26c46..084b8c3ae2c 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2313,7 +2313,10 @@ gfc_sym_type (gfc_symbol * sym, bool is_bind_c)
  && sym->ns->proc_name
  && sym->ns->proc_name->attr.is_bind_c)
  || 

[PATCH 0/3] fortran: fix length one character dummy args [PR110419]

2023-08-09 Thread Mikael Morin via Gcc-patches
Hello,

I propose with this patchset a fix for the test value_9.f90 which has been
failing on 32 bits powerpc since it was added a few weeks back (see PR110360
and PR110419).

The problem is an argument type mismatch between a procedure declaration,
and the argument value for a call of that same procedure, in the specific
case of length one character dummy arguments with the value attribute.
Admittedly, our argument passing conventions [1] for those are currently
unspecified.

Before PR110360, character dummy arguments with value attribute were
arrays passed by value, but the actual argument was still passed as
reference.  PR110360 changed that to pass length one dummies as bare
character (i.e. scalar integer), like in the bind(c) case (but with length
argument still present).  However, the argument type in the function declaration
wasn't changed at the same time, so the test was failing on big-endian 32 bits
targets.  Surprisingly, on most targets the middle-end, back-end and runtime
are happy to get a scalar value passed where a length one array is expected.

This can be fixed, either by reverting back to arguments represented as
arrays passed by value with calls fixed, or by keeping the new
representation with single characters for arguments and fixing the procedure
types accordingly.

I haven't really tried the first way, this is using the second one.
The first patch is a preliminary refactoring.  The main change is the
second patch.  It modifies the types of length one character dummy argsuments
with value attribute in the procedure declarations, so that they are scalar
integer types, consistently with how arguments are passed for calls.
The third patch is a change of error codes in the testcase.

I have regression tested this on x86_64-unknown-linux-gnu, and
powerpc64-unknown-linux-gnu (both -m32 and -m64).
OK for master?

[1] https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html


Mikael Morin (3):
  fortran: New predicate gfc_length_one_character_type_p
  fortran: Fix length one character dummy arg type [PR110419]
  testsuite: Use distinct explicit error codes in value_9.f90

 gcc/fortran/check.cc  |   7 +-
 gcc/fortran/decl.cc   |   4 +-
 gcc/fortran/gfortran.h|  15 +++
 gcc/fortran/trans-expr.cc |  39 ---
 gcc/fortran/trans-types.cc|   5 +-
 gcc/testsuite/gfortran.dg/bind_c_usage_13.f03 |   8 +-
 gcc/testsuite/gfortran.dg/value_9.f90 | 108 +-
 7 files changed, 103 insertions(+), 83 deletions(-)

-- 
2.40.1



[pushed] analyzer: remove default return value from region_model::on_call_pre

2023-08-09 Thread David Malcolm via Gcc-patches
Previously, the code for simulating calls to external functions in
region_model::on_call_pre wrote a default svalue to the LHS of the
call statement, which could be further overwritten by known_function
subclasses.

Unfortunately, this led to messy hacks, such as when the default svalue
was an allocation: the LHS would be written to with two different
heap-allocated regions, requiring special-case cleanups to avoid the
stray state from the first heap allocation leading to state explosions;
see r14-3001-g021077b94741c9.

The following patch eliminates this write of a default svalue to the LHS
of callsite.  Instead, all known_function implementations that have a
return value are now responsible for set the LHS themselves.  A new
call_details::set_any_lhs_with_defaults function is provided to make it
easy to get the old behavior.

On working through the various known_function subclasses, I noticed that
memset was using the default behavior.  That patch updates this so that
it's now known to return its first parameter.

Cleaning this up eliminates various doubling of saved_diagnostics (e.g.
for dubious_allocation_size) where it was generating a diagnostic for
both writes to the LHS, deduplicating them to the first diagnostic (with
the default LHS), and then failing to create a region_creation_event
when emitting the diagnostic, leading to the fallback wording in
dubious_allocation_size::describe_final_event, such as:

  (1) allocated 42 bytes and assigned to ‘int32_t *’ {aka ‘int *’} here; 
‘sizeof (int32_t {aka int})’ is ‘4’

Without the double write to the LHS, it creates a region_creation_event,
so we get the allocation and the assignment as two separate events in
the diagnostic path, e.g.:

  (1) allocated 42 bytes here
  (2) assigned to ‘int32_t *’ {aka ‘int *’} here; ‘sizeof (int32_t {aka int})’ 
is ‘4’

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3114-g73da34a538ddc2.

gcc/analyzer/ChangeLog:
* analyzer.h (class pure_known_function_with_default_return): New
subclass.
* call-details.cc (const_fn_p): Move here from region-model.cc.
(maybe_get_const_fn_result): Likewise.
(get_result_size_in_bytes): Likewise.
(call_details::set_any_lhs_with_defaults): New function, based on
code in region_model::on_call_pre.
* call-details.h (call_details::set_any_lhs_with_defaults): New
decl.
* diagnostic-manager.cc
(diagnostic_manager::emit_saved_diagnostic): Log the index of the
saved_diagnostic.
* kf.cc (pure_known_function_with_default_return::impl_call_pre):
New.
(kf_memset::impl_call_pre): Set the LHS to the first param.
(kf_putenv::impl_call_pre): Call cd.set_any_lhs_with_defaults.
(kf_sprintf::impl_call_pre): Call cd.set_any_lhs_with_defaults.
(class kf_stack_restore): Derive from
pure_known_function_with_default_return.
(class kf_stack_save): Likewise.
(kf_strlen::impl_call_pre): Call cd.set_any_lhs_with_defaults.
* region-model-reachability.cc (reachable_regions::handle_sval):
Remove logic for symbolic regions for pointers.
* region-model.cc (region_model::canonicalize): Remove purging of
dynamic extents workaround for surplus values from
region_model::on_call_pre's default LHS code.
(const_fn_p): Move to call-details.cc.
(maybe_get_const_fn_result): Likewise.
(get_result_size_in_bytes): Likewise.
(region_model::update_for_nonzero_return): Call
cd.set_any_lhs_with_defaults.
(region_model::on_call_pre): Remove the assignment to the LHS of a
default return value, instead requiring all known_function
implementations to write to any LHS of the call.  Use
cd.set_any_lhs_with_defaults on the non-kf paths.
* sm-fd.cc (kf_socket::outcome_of_socket::update_model): Use
cd.set_any_lhs_with_defaults when failing to get at fd state.
(kf_bind::outcome_of_bind::update_model): Likewise.
(kf_listen::outcome_of_listen::update_model): Likewise.
(kf_accept::outcome_of_accept::update_model): Likewise.
(kf_connect::outcome_of_connect::update_model): Likewise.
(kf_read::impl_call_pre): Use cd.set_any_lhs_with_defaults.
* sm-file.cc (class kf_stdio_output_fn): Derive from
pure_known_function_with_default_return.
(class kf_ferror): Likewise.
(class kf_fileno): Likewise.
(kf_fgets::impl_call_pre): Use cd.set_any_lhs_with_defaults.
(kf_read::impl_call_pre): Likewise.
(class kf_getc): Derive from
pure_known_function_with_default_return.
(class kf_getchar): Likewise.
* varargs.cc (kf_va_arg::impl_call_pre): Use
cd.set_any_lhs_with_defaults.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/allocation-size-1.c: Update expected results
to reflect splitting of allocation size 

Re: [PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 01:06:22PM -0700, Andrew Pinski wrote:
> That was my next thought, returning NULL from
> single_non_singleton_phi_for_edges if it would return a virtual OP
> might be better even.
> Either version of these patches are ok with me (though I am not the
> maintainer here).

In patch form that would be (but so far untested):

2023-08-09  

PR c/102989
* tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges): Never
return virtual phis and return NULL if there is a virtual phi
where the arguments from E0 and E1 edges aren't equal.

--- gcc/tree-ssa-phiopt.cc.jj   2023-08-09 22:08:07.974563266 +0200
+++ gcc/tree-ssa-phiopt.cc  2023-08-09 22:11:37.291517911 +0200
@@ -63,7 +63,13 @@ single_non_singleton_phi_for_edges (gimp
   gimple_stmt_iterator i;
   gphi *phi = NULL;
   if (gimple_seq_singleton_p (seq))
-return as_a  (gsi_stmt (gsi_start (seq)));
+{
+  phi = as_a  (gsi_stmt (gsi_start (seq)));
+  /* Never return virtual phis.  */
+  if (virtual_operand_p (gimple_phi_result (phi)))
+   return NULL;
+  return phi;
+}
   for (i = gsi_start (seq); !gsi_end_p (i); gsi_next ())
 {
   gphi *p = as_a  (gsi_stmt (i));
@@ -72,6 +78,10 @@ single_non_singleton_phi_for_edges (gimp
   gimple_phi_arg_def (p, e1->dest_idx)))
continue;
 
+  /* Punt on virtual phis with different arguments from the edges.  */
+  if (virtual_operand_p (gimple_phi_result (p)))
+   return NULL;
+
   /* If we already have a PHI that has the two edge arguments are
 different, then return it is not a singleton for these PHIs. */
   if (phi)


Jakub



Re: [RFC] GCC Security policy

2023-08-09 Thread Siddhesh Poyarekar

On 2023-08-09 14:17, David Edelsohn wrote:
On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar > wrote:


On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
 >> Do you have a suggestion for the language to address libgcc,
 >> libstdc++, etc. and libiberty, libbacktrace, etc.?
 >
 > I'll work on this a bit and share a draft.

Hi David,

Here's what I came up with for different parts of GCC, including the
runtime libraries.  Over time we may find that specific parts of
runtime
libraries simply cannot be used safely in some contexts and flag that.

Sid


Hi, Sid

Thanks for iterating on this.


"""
What is a GCC security bug?
===

      A security bug is one that threatens the security of a system or
      network, or might compromise the security of data stored on it.
      In the context of GCC there are multiple ways in which this might
      happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

      The compiler driver processes source code, invokes other programs
      such as the assembler and linker and generates the output result,
      which may be assembly code or machine code.  It is necessary that
      all source code inputs to the compiler are trusted, since it is
      impossible for the driver to validate input source code beyond
      conformance to a programming language standard.

      The GCC JIT implementation, libgccjit, is intended to be plugged
      into applications to translate input source code in the
application
      context.  Limitations that apply to the compiler
      driver, apply here too in terms of sanitizing inputs, so it is
      recommended that inputs are either sanitized by an external
program
      to allow only trusted, safe execution in the context of the
      application or the JIT execution context is appropriately
sandboxed
      to contain the effects of any bugs in the JIT or its generated
code
      to the sandboxed environment.

      Support libraries such as libiberty, libcc1 libvtv and libcpp have
      been developed separately to share code with other tools such as
      binutils and gdb.  These libraries again have similar
challenges to
      compiler drivers.  While they are expected to be robust against
      arbitrary input, they should only be used with trusted inputs.

      Libraries such as zlib and libffi that bundled into GCC to
build it
      will be treated the same as the compiler drivers and programs
as far
      as security coverage is concerned.


Should we direct people to the upstream projects for their security 
policies?


We bundle zlib and libffi so regardless of whether it's a security issue 
in those libraries (because security impact of memory safety bugs in 
general use libraries will be context dependent and hence get assigned 
CVEs more often than not), the context in gcc is well defined as a local 
unprivileged executable and hence not security-relevant.


That said, we could add something like:

However if you find a issue in these libraries independent of their
use in GCC you should reach out to their upstream projects to report
them.




      As a result, the only case for a potential security issue in all
      these cases is when it ends up generating vulnerable output for
      valid input source code.


Language runtime libraries
--

      GCC also builds and distributes libraries that are intended to be
      used widely to implement runtime support for various programming
      languages.  These include the following:

      * libada
      * libatomic
      * libbacktrace
      * libcc1
      * libcody
      * libcpp
      * libdecnumber
      * libgcc
      * libgfortran
      * libgm2
      * libgo
      * libgomp
      * libiberty
      * libitm
      * libobjc
      * libphobos
      * libquadmath
      * libssp
      * libstdc++

      These libraries are intended to be used in arbitrary contexts
and as
      a result, bugs in these libraries may be evaluated for security
      impact.  However, some of these libraries, e.g. libgo, libphobos,
      etc.  are not maintained in the GCC project, due to which the GCC
      project may not be the correct point of contact for them.  You are
      encouraged to look at README files within those library
directories
      to locate the canonical security contact point for those projects.


As Richard mentioned, should GCC make a specific statement about the 
security policy / response for issues that are discovered and fixed in 
the upstream 

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Qing Zhao via Gcc-patches


> On Aug 9, 2023, at 12:21 PM, Michael Matz  wrote:
> 
> Hello,
> 
> On Wed, 9 Aug 2023, Qing Zhao wrote:
> 
>> Although this is an old FAM related issue that does not relate to my current 
>> patch 
>> (and might need to be resolved in a separate patch).  I think that it’s 
>> necessary to have
>> more discussion on this old issue and resolve it. 
>> 
>> The first thing that I’d like to confirm is:
>> 
>> What the exact memory layout for the following structure x?
>> 
>> struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
>> 
>> And the key that is confusing me is, where should the field “t” start? 
>> 
>> A.  Starting at offset 8 as the following:
>> 
>> a4-bytes
>> b2-bytes
>> padding   2-bytes
>> t3-bytes
> 
> Why should there be padding before 't'?  It's a char array (FAM or not), 
> so it can be (and should be) placed directly after 'b'.  So ...

Yes, you are right. My mistake -:)
> 
>> B. Starting at offset 6 as the following:
>> 
>> a4-bytes
>> b2-bytes
>> t3-bytes
> 
> ... this is the correct layout, when seen in isolation.  The discussion 
> revolves around what should come after 't': if it's a non-FAM struct (with 
> t[3]), then it's clear that there needs to be padding after it, so to pad 
> out the whole struct to be 12 bytes long (for sizeof() purpose), as 
> required by its alignment (due to the int field 'a').

right.
> 
> So, should the equivalent FAM struct also have this sizeof()?  If no: 
> there should be a good argument why it shouldn't be similar to the non-FAM 
> one.

The sizeof() of a structure with FAM is defined as: (after I searched online,
 I think that the one in Wikipedia is the most reasonable one):
https://en.wikipedia.org/wiki/Flexible_array_member

==
Effect on struct size and padding
The sizeof operator on such a struct gives the size of the structure as if the 
flexible 
array member were empty. This may include padding added to accommodate the
 flexible member; the compiler is also free to re-use such padding as part of 
the array itself.[2]

It is common to allocate sizeof(struct) + array_len*sizeof(array element) 
bytes. 

This is not wrong, but it may allocate a few more bytes than necessary: the 
compiler
 may be re-purposing some of the padding that is included in sizeof(struct). 
Should 
this be a concern, macros are available[3] to compute the minimum size while 
ensuring that the compiler's padding is not disrupted. 

As the array may start in the padding before the end of the structure, its 
content
 should always be accessed via indexing (arr[i]) or  offsetof, not sizeof.
==

By definition, the sizeof() of a struct with FAM might not be the same as the 
non-FAM one. 
i.e, for the following two structures, one with FAM, the other with fixed array:

struct foo_flex { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
struct foo_fix {int a; short b; char t[3]; } 

With current GCC:
sizeof(foo_flex) == 8
sizeof(foo_fix) == 12

I think that the current behavior of sizeof for structure with FAM in GCC is 
correct. 

The major issue is what was pointed out by Martin in the previous email:

Whether using the following formula is correct to compute the allocation?

sizeof(struct foo_flex) + N * sizeof(foo->t);

As pointed out  in the wikipedia, the value computed by this formula might
 be bigger than the actual size since “sizeof(struct foo_flex)” might include 
paddings that are used as part of the array.

So the more accurate formula should be

offset(struct foo_flex, t[0]) + N * sizeof(foo->t);

With GCC, offset(struct foo_flex,t[0]) == 6, which is also correct. 


> Then there is an argument that the compiler would be fine, when allocating 
> a single object of such type (not as part of an array!), to only reserve 9 
> bytes of space for the FAM-struct.  Then the question is: should it also 
> do that for a non-FAM struct (based on the argument that the padding 
> behind 't' isn't accessible, and hence doesn't need to be alloced).  I 
> think it would be quite unexpected for the compiler to actually reserve 
> less space than sizeof() implies, so I personally don't buy that argument.  
> For FAM or non-FAM structs.

For the question, whether the compiler needs to allocate paddings after the FAM 
field,
 I don’t know the answer, and it’s not specified in the standard either. 
Does it matter?

> 
> Note that if one choses to allocate less space than sizeof implies that 
> this will have quite some consequences for code generation, in that 
> sometimes the instruction sequences (e.g. for copying) need to be careful 
> to never access tail padding that should be there in array context, but 
> isn't there in single-object context.  I think this alone should make it 
> clear that it's advisable that sizeof() and allocated size agree.

Sizeof by definition is return the size of the TYPE, not the size of the 
allocated object.

Another thing I need to point out is,  a 

Re: [PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 9, 2023 at 1:01 PM Jakub Jelinek  wrote:
>
> On Wed, Aug 09, 2023 at 11:27:48AM -0700, Andrew Pinski wrote:
> > Maybe it is better to punt for VOPS after the call to
> > single_non_singleton_phi_for_edges since none of functions called
> > afterwards support VOPs.
> > That is something like:
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index ff36bb0119b..d0b659042a7 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -4165,6 +4165,10 @@ pass_phiopt::execute (function *)
> >arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
> >arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> >
> > +  /* Can't do anything with a VOP here.  */
> > +  if (SSA_NAME_IS_VIRTUAL_OPERAND (arg0))
> > +   continue;
> > +
>
> That would ICE if arg0 isn't SSA_NAME (e.g. is INTEGER_CST).
> I think more canonical test for virtual phis is
> if (virtual_operand_p (gimple_phi_result (phi)))
>
> Shall already single_non_singleton_phi_for_edges punt if there is
> a virtual phi with different arguments from the edges (or if there
> is a single virtual phi)?

That was my next thought, returning NULL from
single_non_singleton_phi_for_edges if it would return a virtual OP
might be better even.
Either version of these patches are ok with me (though I am not the
maintainer here).

Thanks,
Andrew

>
> Jakub
>


Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 00:11, Tsukasa OI via Gcc-patches wrote:

Hello,

I found that a built-in function "__builtin_riscv_pause" is not usable
unless we enable the 'Zihintpause' extension explicitly (still, this
built-in exists EVEN IF the 'Zihintpause' extension is disabled).

Contents of a.c:


void sample(void)
{
 __builtin_riscv_pause();
}



Compiling with the 'Zihintpause' extension is fine.


$ riscv64-unknown-elf-gcc -O2 -march=rv64i_zihintpause -mabi=lp64 -c a.c



However, compiling without the 'Zihintpause' causes an assembler error
(tested with GNU Binutils 2.41):


$ riscv64-unknown-elf-gcc -O2 -march=rv64i -mabi=lp64 -c a.c
/tmp/ccFjacAz.s: Assembler messages:
/tmp/ccFjacAz.s:11: Error: unrecognized opcode `pause', extension `zihintpause' 
required



This is because:

1.  GCC does not handle the 'Zihintpause' extension and
2.  "riscv_pause" (insn) unconditionally emits "pause" even if the
 assembler does not accept it (when the extension is disabled).


This patch set (PATCH 1/2) resolves this issue by:

1.  Handling the 'Zihintpause' extension and
2.  Splitting the "__builtin_riscv_pause" implementation
 depending on the existence of the 'Zihintpause' extension.

Because a released version of GCC defines "__builtin_riscv_pause"
unconditionally, I chose to define no-'Zihintpause' version.

There is another option to unconditionally emit ".insn 0x010f"
(the machine code of "pause") but I didn't because I wanted to improve the
diagnostics (e.g. *.s output).

I also fixed the description of this built-in function (in PATCH 2/2).


I'm not sure whether this is a good method to split the implementation
depending on the 'Zihintpause' extension.  Other than that, I believe that
this is okay and approval is appreciated.

Note that because I assigned copyright of GCC contribution to FSF, I didn't
attach "Signed-off-by" tag.  Tell me if I need it.
I'd tend to think we do not want to expose the intrinsic unless the 
right extensions are enabled -- even though the encoding is a no-op and 
we could emit it as a .insn.


If others think otherwise, I'll go with the consensus opinion.  So let's 
hold off a bit until others have chimed in.


Thanks,
jeff


Re: [PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 11:27:48AM -0700, Andrew Pinski wrote:
> Maybe it is better to punt for VOPS after the call to
> single_non_singleton_phi_for_edges since none of functions called
> afterwards support VOPs.
> That is something like:
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index ff36bb0119b..d0b659042a7 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -4165,6 +4165,10 @@ pass_phiopt::execute (function *)
>arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
>arg1 = gimple_phi_arg_def (phi, e2->dest_idx);
> 
> +  /* Can't do anything with a VOP here.  */
> +  if (SSA_NAME_IS_VIRTUAL_OPERAND (arg0))
> +   continue;
> +

That would ICE if arg0 isn't SSA_NAME (e.g. is INTEGER_CST).
I think more canonical test for virtual phis is
if (virtual_operand_p (gimple_phi_result (phi)))

Shall already single_non_singleton_phi_for_edges punt if there is
a virtual phi with different arguments from the edges (or if there
is a single virtual phi)?

Jakub



Re: [PATCH] RISC-V: Remove non-existing 'Zve32d' extension

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 00:09, Tsukasa OI via Gcc-patches wrote:

Since this extension does not exist, this commit prunes this from
the defined extension version table.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc(riscv_ext_version_table):
Remove 'Zve32d' from the version list.

Thanks.  Installed.
jeff


Re: [PATCH] RISC-V: Handle no_insn in TARGET_SCHED_VARIABLE_ISSUE.

2023-08-09 Thread Jeff Law via Gcc-patches



On 5/29/23 06:46, Jeff Law wrote:



On 5/29/23 05:01, Jin Ma wrote:
Reference: 
https://github.com/gcc-mirror/gcc/commit/d0bc0cb66bcb0e6a5a5a31a9e900e8ccc98e34e5


RISC-V should also be implemented to handle no_insn patterns for 
pipelining.


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): New function.
(TARGET_SCHED_VARIABLE_ISSUE): New macro.
---
  gcc/config/riscv/riscv.cc | 21 +
  1 file changed, 21 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3954fc07a8b..559fa9cd7e0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6225,6 +6225,24 @@ riscv_issue_rate (void)
    return tune_param->issue_rate;
  }
+/* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
+
+static int
+riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
+{
+  if (DEBUG_INSN_P (insn))
+    return more;
+
+  rtx_code code = GET_CODE (PATTERN (insn));
+  if (code == USE || code == CLOBBER)
+    return more;
+
+  if (get_attr_type (insn) == TYPE_UNKNOWN)
+    return more;
+
+  return more - 1;
+}
The problem is that INSN is *much* more likely to be a real instruction 
that takes real resources, even if it is TYPE_UNKNOWN.
TYPE_UNKNOWN here is actually an indicator of what I would consider a 
bug in the backend, specifically that we have INSNs that do not provide 
a mapping for the schedulers to suitable types.


With that in mind I'd much rather get to the point where we can do 
something like this for TYPE_UNKNOWN:


type = get_attr_type (insn);
gcc_assert (type != TYPE_UNKNOWN);

That way if we ever encounter a TYPE_UNKNOWN during development, we can 
fix it in the md files in a sensible manner.  I don't know if we are 
close to being able to do that.  We fixed a ton of stuff in bitmanip.md, 
but I don't think there's been a thorough review of the port to find 
other instances of TYPE_UNKNOWN INSNs.



The other thing if this code probably wants to handle GHOST type 
instructions.  While GHOST is used for instructions which generate no 
code, it might seem they should return "more" as those INSNs take no 
resources.  But GHOST is actually used for things like the blockage insn 
which should end a cycle from an issue standpoint.  So the right 
handling of a GHOST is something like this:


if (type == TYPE_GHOST)
   return 0;
So there wasn't ever any follow-up.  Given this was something Ventana 
was also carrying locally (with very minor differences) I went ahead and 
merged up the implementations and pushed the final result to the trunk.



Attached is the patch that was actually committed.

Jeffcommit f088b768d01ae42385697584a2bcac141685dce2
Author: Jin Ma 
Date:   Wed Aug 9 13:52:06 2023 -0600

RISC-V: Handle no_insn in TARGET_SCHED_VARIABLE_ISSUE.

Reference: 
https://github.com/gcc-mirror/gcc/commit/d0bc0cb66bcb0e6a5a5a31a9e900e8ccc98e34e5

RISC-V should also be implemented to handle no_insn patterns for pipelining.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_sched_variable_issue): New function.
(TARGET_SCHED_VARIABLE_ISSUE): New macro.

Co-authored-by: Philipp Tomsich 
Co-authored-by: Jeff Law 

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7f2041a54ba..dfb519ab9a8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6698,6 +6698,31 @@ riscv_issue_rate (void)
   return tune_param->issue_rate;
 }
 
+/* Implement TARGET_SCHED_VARIABLE_ISSUE.  */
+static int
+riscv_sched_variable_issue (FILE *, int, rtx_insn *insn, int more)
+{
+  if (DEBUG_INSN_P (insn))
+return more;
+
+  rtx_code code = GET_CODE (PATTERN (insn));
+  if (code == USE || code == CLOBBER)
+return more;
+
+  /* GHOST insns are used for blockage and similar cases which
+ effectively end a cycle.  */
+  if (get_attr_type (insn) == TYPE_GHOST)
+return 0;
+
+#if 0
+  /* If we ever encounter an insn with an unknown type, trip
+ an assert so we can find and fix this problem.  */
+  gcc_assert (get_attr_type (insn) != TYPE_UNKNOWN);
+#endif
+
+  return more - 1;
+}
+
 /* Auxiliary function to emit RISC-V ELF attribute. */
 static void
 riscv_emit_attribute ()
@@ -8420,6 +8445,9 @@ riscv_frame_pointer_required (void)
 #undef TARGET_SCHED_ISSUE_RATE
 #define TARGET_SCHED_ISSUE_RATE riscv_issue_rate
 
+#undef  TARGET_SCHED_VARIABLE_ISSUE
+#define TARGET_SCHED_VARIABLE_ISSUE riscv_sched_variable_issue
+
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL riscv_function_ok_for_sibcall
 


Re: RISC-V: Folding memory for FP + constant case

2023-08-09 Thread Jeff Law via Gcc-patches



On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:

Accessing local arrays element turned into load form (fp + (index << C1)) +
C2 address.
In the case when access is in the loop we got loop invariant computation.
For some reason, moving out that part cannot be done in
loop-invariant passes.
But we can handle that in target-specific hook (legitimize_address).
That provides an opportunity to rewrite memory access more suitable for the
target architecture.

This patch solves the mentioned case by rewriting mentioned case to ((fp +
C2) + (index << C1))
I have evaluated it on SPEC2017 and got an improvement on leela (over 7b
instructions,
.39% of the dynamic count) and dwarfs the regression for gcc (14m
instructions, .0012%
of the dynamic count).


gcc/ChangeLog:
 * config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
 (mem_shadd_or_shadd_rtx_p): New predicate.

So I poked a bit more in this space today.

As you may have noted, Manolis's patch still needs another rev.  But I 
was able to test this patch in conjunction with the f-m-o patch as well 
as the additional improvements made to hard register cprop.  The net 
result was that this patch still shows a nice decrease in instruction 
counts on leela.  It's a bit of a mixed bag elsewhere.


I dove a bit deeper into the small regression in x264.  In the case I 
looked at the reason the patch regresses is the original form of the 
address calculations exposes a common subexpression ie


addr1 = (reg1 << 2) + fp + C1
addr2 = (reg1 << 2) + fp + C2

(reg1 << 2) + fp is a common subexpression resulting in something like 
this as we leave CSE:


t = (reg1 << 2) + fp;
addr1 = t + C1
addr2 = t + C2
mem (addr1)
mem (addr2)

C1 and C2 are small constants, so combine generates

t = (reg1 << 2) + fp;
mem (t+C1)
mem (t+C2)

FP elimination occurs after IRA and we get:

t2 = sp + C3
t = (reg << 2) + t2
mem (t + C1)
mem (t + C2)


Not bad.  Manolis's work should allow us to improve that a bit more.


With this patch we don't capture the CSE and ultimately generate 
slightly worse code.  This kind of issue is fairly inherent in 
reassociations -- and given the regression is 2 orders of magnitude 
smaller than the improvement my inclination is to go forward with this 
patch.




I've fixed a few formatting issues and changed once conditional to use 
CONST_INT_P rather than checking the code directory and pushed the final 
version to the trunk.


Thanks for your patience.

jeff
commit a16dc729fda9fabd6472d50cce45791cb3b6ada8
Author: Jivan Hakobyan 
Date:   Wed Aug 9 13:26:58 2023 -0600

RISC-V: Folding memory for FP + constant case

Accessing local arrays element turned into load form (fp + (index << C1)) +
C2 address.

In the case when access is in the loop we got loop invariant computation.  
For
some reason, moving out that part cannot be done in loop-invariant passes.  
But
we can handle that in target-specific hook (legitimize_address).  That 
provides
an opportunity to rewrite memory access more suitable for the target
architecture.

This patch solves the mentioned case by rewriting mentioned case to ((fp +
C2) + (index << C1))

I have evaluated it on SPEC2017 and got an improvement on leela (over 7b
instructions, .39% of the dynamic count) and dwarfs the regression for gcc 
(14m
instructions, .0012% of the dynamic count).

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
(mem_shadd_or_shadd_rtx_p): New function.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 77892da2920..7f2041a54ba 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1805,6 +1805,22 @@ riscv_shorten_lw_offset (rtx base, HOST_WIDE_INT offset)
   return addr;
 }
 
+/* Helper for riscv_legitimize_address. Given X, return true if it
+   is a left shift by 1, 2 or 3 positions or a multiply by 2, 4 or 8.
+
+   This respectively represent canonical shift-add rtxs or scaled
+   memory addresses.  */
+static bool
+mem_shadd_or_shadd_rtx_p (rtx x)
+{
+  return ((GET_CODE (x) == ASHIFT
+  || GET_CODE (x) == MULT)
+ && CONST_INT_P (XEXP (x, 1))
+ && ((GET_CODE (x) == ASHIFT && IN_RANGE (INTVAL (XEXP (x, 1)), 1, 3))
+ || (GET_CODE (x) == MULT
+ && IN_RANGE (exact_log2 (INTVAL (XEXP (x, 1))), 1, 3;
+}
+
 /* This function is used to implement LEGITIMIZE_ADDRESS.  If X can
be legitimized in a way that the generic machinery might not expect,
return a new address, otherwise return NULL.  MODE is the mode of
@@ -1830,6 +1846,32 @@ riscv_legitimize_address (rtx x, rtx oldx 
ATTRIBUTE_UNUSED,
   rtx base = XEXP (x, 0);
   HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
 
+  /* Handle (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case.  
*/
+  if (GET_CODE (base) == PLUS && mem_shadd_or_shadd_rtx_p (XEXP (base, 0))
+ 

[PATCH v2] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-09 Thread Eric Feng via Gcc-patches
Thank you for your help in getting dg-require-python-h working! I can
confirm that the FAILs are related to differences between the --cflags
affecting the gimple seen by the analyzer. For this reason, I have
changed it to --includes for now. To be sure, I tested on Python 3.8 as
well and it works as expected. I have also addressed the following
comments on the WIP patch as you described.

-- Update Changelog entry to list new functions being simulated.
-- Update region_model::get_or_create_region_for_heap_alloc leading
comment.
-- Change register_alloc to update_state_machine.
-- Change move_ptr_sval_non_null to transition_ptr_sval_non_null.
-- Static helper functions for:
-- Initializing ob_refcnt field.
-- Setting ob_type field.
-- Getting ob_base field.
-- Initializing heap allocated region for PyObjects.
-- Incrementing a field by one.
-- Change arg_is_long_p to arg_is_integral_p.
-- Extract common failure scenario for reusability.

The initial WIP patch using 

/* { dg-options "-fanalyzer -I/usr/include/python3.9" }. */

have been bootstrapped and regtested on aarch64-unknown-linux-gnu. Since
we did not change any core logic in the revision and the only changes
within the analyzer core are changing variable names, is it OK for
trunk. In the mean time, the revised patch is currently going through
bootstrap and regtest process.

Best,
Eric

---
This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
│
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
│
   21 |   return list;
  │
  |  ^~~~
│
  ‘create_py_object’: events 1-4
│
|
│
|4 |   PyObject* item = PyLong_FromLong(10);
│
|  |^~~
│
|  ||
│
|  |(1) allocated here
│
|  |(2) when ‘PyLong_FromLong’ succeeds
│
|5 |   PyObject* list = PyList_New(2);
│
|  |~
│
|  ||
│
|  |(3) when ‘PyList_New’ fails
│
|..
│
|   21 |   return list;
│
|  |  
│
|  |  |
│
|  |  (4) ‘item’ leaks here; was allocated at (1)
│

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/region-model.cc  |  20 +-
 gcc/analyzer/region-model.h   |  10 +-
 gcc/analyzer/sm-malloc.cc |  40 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 711 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 7 files changed, 881 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index e92b3f7b074..c338f045d92 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5127,11 +5127,16 @@ region_model::check_dynamic_size_for_floats (const 
svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing 

[PATCH] MATCH: [PR110937/PR100798] (a ? ~b : b) should be optimized to b ^ -(a)

2023-08-09 Thread Andrew Pinski via Gcc-patches
This adds a simple match pattern for this case.
I noticed it a couple of different places.
One while I was looking at code generation of a parser and
also while I was looking at locations where bitwise_inverted_equal_p
should be used more.

Committed as approved after bootstrapped and tested on x86_64-linux-gnu with no 
regressions.

PR tree-optimization/110937
PR tree-optimization/100798

gcc/ChangeLog:

* match.pd (`a ? ~b : b`): Handle this
case.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bool-14.c: New test.
* gcc.dg/tree-ssa/bool-15.c: New test.
* gcc.dg/tree-ssa/phi-opt-33.c: New test.
* gcc.dg/tree-ssa/20030709-2.c: Update testcase
so `a ? -1 : 0` is not used to hit the match
pattern.
---
 gcc/match.pd   | 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c |  5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/bool-14.c| 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/bool-15.c| 18 ++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c | 13 +
 5 files changed, 63 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 9b4819e5be7..fc630b63563 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6460,6 +6460,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (cmp == NE_EXPR)
{ constant_boolean_node (true, type); })))
 
+#if GIMPLE
+/* a?~t:t -> (-(a))^t */
+(simplify
+ (cond @0 @1 @2)
+ (if (INTEGRAL_TYPE_P (type)
+  && bitwise_inverted_equal_p (@1, @2))
+  (with {
+auto prec = TYPE_PRECISION (type);
+auto unsign = TYPE_UNSIGNED (type);
+tree inttype = build_nonstandard_integer_type (prec, unsign);
+   }
+   (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))
+#endif
+
 /* Simplify pointer equality compares using PTA.  */
 (for neeq (ne eq)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
index 5009cd69cfe..78938f919d4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
@@ -29,15 +29,16 @@ union tree_node
 };
 int make_decl_rtl (tree, int);
 void *
-get_alias_set (t)
+get_alias_set (t, t1)
  tree t;
+ void *t1;
 {
   long set;
   if (t->decl.rtl)
 return (t->decl.rtl->fld[1].rtmem 
? 0
: (((t->decl.rtl ? t->decl.rtl: (make_decl_rtl (t, 0), 
t->decl.rtl)))->fld[1]).rtmem);
-  return (void*)-1;
+  return t1;
 }
 
 /* There should be precisely one load of ->decl.rtl.  If there is
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
new file mode 100644
index 000..0149380a63b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/110937 */
+
+_Bool f2(_Bool a, _Bool b)
+{
+if (a)
+  return !b;
+return b;
+}
+
+/* We should be able to remove the conditional and convert it to an xor. */
+/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bool-15.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
new file mode 100644
index 000..1f496663863
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/110937 */
+
+_Bool f2(int x, int y, int w, int z)
+{
+  _Bool a = x == y;
+  _Bool b = w == z;
+  if (a)
+return !b;
+  return b;
+}
+
+/* We should be able to remove the conditional and convert it to an xor. */
+/* { dg-final { scan-tree-dump-not "gimple_cond " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "gimple_phi " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "ne_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
new file mode 100644
index 000..b79fe44187a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/100798 */
+
+int f(int a, int t)
+{
+  return (a=='s' ? ~t : t);
+}
+
+/* This should be convert into t^-(a=='s').  */
+/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "negate_expr, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr, " "optimized" } } 

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Kees Cook via Gcc-patches
On Mon, Aug 07, 2023 at 04:33:13PM +, Qing Zhao wrote:
> What’s the testing case for the one that failed? 
> If it’s 
> 
> __builtin_dynamic_object_size(p->array, 0/2) without the allocation 
> information in the routine, 
> then with the current algorithm, gcc cannot deduce the size for the whole 
> object.
> 
> If not such case, let me know.

I found some more bugs in my tests (now fixed), but I'm left with a single
failure case, which is think it going to boil down to pointer/pointee
issue we discussed earlier.

Using my existing testing tool:
https://github.com/kees/kernel-tools/blob/trunk/fortify/array-bounds.c

I see this error with the "counted_by_seen_by_bdos" case:

Expected __builtin_dynamic_object_size(p, 1) (18446744073709551615) == 
sizeof(*p) + p->count * sizeof(*p->array) (80)

A reduced view of the code is:

struct annotated *p;
int index = MAX_INDEX + unconst;

p = alloc_annotated(index);

EXPECT_EQ(__builtin_dynamic_object_size(p, 1), sizeof(*p) + p->count * 
sizeof(*p->array));

It looks like __bdos will not use the __counted_by information from the
pointee if the argument is only the pointer. i.e. this test works:

EXPECT_EQ(__builtin_dynamic_object_size(p->array, 1), p->count * 
sizeof(*p->array));

However, I thought if any part of the pointee was used (e.g. p->count,
p->array), GCC would be happy to start using the pointee details?

And, again, for the initial version at this feature, I'm fine with this
failing test being declared not a valid test. :) But I'll need some
kind of builtin that can correctly interrogate a pointer to find the
full runtime size with the assumption that pointer is valid, but that
can come later.

And as a side note, I am excited to see the very correct warnings for
bad (too late) assignment of the __counted_by member:

p->array[0] = 0;
p->count = 1;

array-bounds.c: In function 'invalid_assignment_order':
array-bounds.c:366:17: warning: '*p.count' is used uninitialized 
[-Wuninitialized]
  366 | p->array[0] = 0;
  | ^~~

Yay! :)

-Kees

-- 
Kees Cook


PING^3: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-08-09 Thread Pat Haugen via Gcc-patches

Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:

Updated from prior version to address latest review comment (simplify
umod3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod3, *mod3): Disable.
     (define_expand umod3): New.
     (define_insn umod3): Rename to *umod3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.




Re: [PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 9, 2023 at 11:17 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os
> when enabling expensive tests, and unfortunately I can't reproduce without
> _BitInt.  The IL before phiopt3 has:
>[local count: 203190070]:
>   # .MEM_428 = VDEF <.MEM_367>
>   bitint.159 = VIEW_CONVERT_EXPR(*.LC3);
>   goto ; [100.00%]
>
>[local count: 203190070]:
>   # .MEM_427 = VDEF <.MEM_367>
>   bitint.159 = VIEW_CONVERT_EXPR(*.LC4);
>
>[local count: 406380139]:
>   # .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)>
>   # VUSE <.MEM_368>
>   _123 = VIEW_CONVERT_EXPR(r495[i_107].D.2780)[0];
> and factor_out_conditional_operation is called on the vop PHI, it
> sees it has exactly two operands and defining statements of both
> PHI arguments are converts (VCEs in this case), so it thinks it is
> a good idea to try to optimize that and while doing that it constructs
> void type SSA_NAMEs and the like.

Maybe it is better to punt for VOPS after the call to
single_non_singleton_phi_for_edges since none of functions called
afterwards support VOPs.
That is something like:
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index ff36bb0119b..d0b659042a7 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -4165,6 +4165,10 @@ pass_phiopt::execute (function *)
   arg0 = gimple_phi_arg_def (phi, e1->dest_idx);
   arg1 = gimple_phi_arg_def (phi, e2->dest_idx);

+  /* Can't do anything with a VOP here.  */
+  if (SSA_NAME_IS_VIRTUAL_OPERAND (arg0))
+   continue;
+
   /* Something is wrong if we cannot find the arguments in the PHI
  node.  */
   gcc_assert (arg0 != NULL_TREE && arg1 != NULL_TREE);

Thanks,
Andrew Pinski

>
> 2023-08-09  
>
> PR c/102989
> * tree-ssa-phiopt.cc (factor_out_conditional_operation): Punt for
> vops.
>
> --- gcc/tree-ssa-phiopt.cc.jj   2023-08-08 15:55:09.508122417 +0200
> +++ gcc/tree-ssa-phiopt.cc  2023-08-09 15:55:23.762314103 +0200
> @@ -241,6 +241,7 @@ factor_out_conditional_operation (edge e
>  }
>
>if (TREE_CODE (arg0) != SSA_NAME
> +  || SSA_NAME_IS_VIRTUAL_OPERAND (arg0)
>|| (TREE_CODE (arg1) != SSA_NAME
>   && TREE_CODE (arg1) != INTEGER_CST))
>  return NULL;
>
> Jakub
>


[PATCH 10/12] C _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the C FE support, c-family support, small libcpp change
so that 123wb and 42uwb suffixes are handled plus glimits.h change
to define BITINT_MAXWIDTH macro.

The previous patches really do nothing without this, which enables
all the support.

2023-08-09  Jakub Jelinek  

PR c/102989
gcc/
* glimits.h (BITINT_MAXWIDTH): Define if __BITINT_MAXWIDTH__ is
predefined.
gcc/c-family/
* c-common.cc (c_common_reswords): Add _BitInt as keyword.
(c_common_signed_or_unsigned_type): Handle BITINT_TYPE.
(check_builtin_function_arguments): Handle BITINT_TYPE like
INTEGER_TYPE.
(sync_resolve_size): Add ORIG_FORMAT argument.  If
FETCH && !ORIG_FORMAT, type is BITINT_TYPE, return -1 if size isn't
one of 1, 2, 4, 8 or 16 or if it is 16 but TImode is not supported.
(atomic_bitint_fetch_using_cas_loop): New function.
(resolve_overloaded_builtin): Adjust sync_resolve_size caller.  If
-1 is returned, use atomic_bitint_fetch_using_cas_loop to lower it.
Formatting fix.
(keyword_begins_type_specifier): Handle RID_BITINT.
* c-common.h (enum rid): Add RID_BITINT enumerator.
* c-cppbuiltin.cc (c_cpp_builtins): For C call
targetm.c.bitint_type_info and predefine __BITINT_MAXWIDTH__
and for -fbuilding-libgcc also __LIBGCC_BITINT_LIMB_WIDTH__ and
__LIBGCC_BITINT_ORDER__ macros if _BitInt is supported.
* c-lex.cc (interpret_integer): Handle CPP_N_BITINT.
* c-pretty-print.cc (c_pretty_printer::simple_type_specifier,
c_pretty_printer::direct_abstract_declarator): Handle BITINT_TYPE.
(pp_c_integer_constant): Handle printing of large precision wide_ints
which would buffer overflow digit_buffer.
gcc/c/
* c-convert.cc (c_convert): Handle BITINT_TYPE like INTEGER_TYPE.
* c-decl.cc (check_bitfield_type_and_width): Allow BITINT_TYPE
bit-fields.
(finish_struct): Prefer to use BITINT_TYPE for BITINT_TYPE bit-fields
if possible.
(declspecs_add_type): Formatting fixes.  Handle cts_bitint.  Adjust
for added union in *specs.  Handle RID_BITINT.
(finish_declspecs): Handle cts_bitint.  Adjust for added union
in *specs.
* c-parser.cc (c_keyword_starts_typename, c_token_starts_declspecs,
c_parser_declspecs, c_parser_gnu_attribute_any_word): Handle
RID_BITINT.
* c-tree.h (enum c_typespec_keyword): Mention _BitInt in comment.
Add cts_bitint enumerator.
(struct c_declspecs): Move int_n_idx and floatn_nx_idx into a union
and add bitint_prec there as well.
* c-typeck.cc (composite_type, c_common_type, comptypes_internal):
Handle BITINT_TYPE.
(perform_integral_promotions): Promote BITINT_TYPE bit-fields to
their declared type.
(build_array_ref, build_unary_op, build_conditional_expr,
convert_for_assignment, digest_init, build_binary_op): Likewise.
libcpp/
* expr.cc (interpret_int_suffix): Handle wb and WB suffixes.
* include/cpplib.h (CPP_N_BITINT): Define.

--- gcc/glimits.h.jj2023-08-08 15:54:34.481612931 +0200
+++ gcc/glimits.h   2023-08-08 16:12:02.321939910 +0200
@@ -157,6 +157,11 @@ see the files COPYING3 and COPYING.RUNTI
 # undef BOOL_WIDTH
 # define BOOL_WIDTH 1
 
+# ifdef __BITINT_MAXWIDTH__
+#  undef BITINT_MAXWIDTH
+#  define BITINT_MAXWIDTH __BITINT_MAXWIDTH__
+# endif
+
 # define __STDC_VERSION_LIMITS_H__ 202311L
 #endif
 
--- gcc/c-family/c-common.cc.jj 2023-08-08 15:55:05.243182143 +0200
+++ gcc/c-family/c-common.cc2023-08-08 16:19:29.102683903 +0200
@@ -349,6 +349,7 @@ const struct c_common_resword c_common_r
   { "_Alignas",RID_ALIGNAS,   D_CONLY },
   { "_Alignof",RID_ALIGNOF,   D_CONLY },
   { "_Atomic", RID_ATOMIC,D_CONLY },
+  { "_BitInt", RID_BITINT,D_CONLY },
   { "_Bool",   RID_BOOL,  D_CONLY },
   { "_Complex",RID_COMPLEX,0 },
   { "_Imaginary",  RID_IMAGINARY, D_CONLY },
@@ -2728,6 +2729,9 @@ c_common_signed_or_unsigned_type (int un
   || TYPE_UNSIGNED (type) == unsignedp)
 return type;
 
+  if (TREE_CODE (type) == BITINT_TYPE)
+return build_bitint_type (TYPE_PRECISION (type), unsignedp);
+
 #define TYPE_OK(node)  \
   (TYPE_MODE (type) == TYPE_MODE (node)
\
&& TYPE_PRECISION (type) == TYPE_PRECISION (node))
@@ -6341,8 +6345,10 @@ check_builtin_function_arguments (locati
  code0 = TREE_CODE (TREE_TYPE (args[0]));
  code1 = TREE_CODE (TREE_TYPE (args[1]));
  if (!((code0 == REAL_TYPE && code1 == REAL_TYPE)
-   || (code0 == REAL_TYPE && code1 == INTEGER_TYPE)
-   || (code0 == INTEGER_TYPE && code1 == REAL_TYPE)))
+   || (code0 == REAL_TYPE
+   

[PATCH 9/12] libgcc _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch adds the library helpers for multiplication, division + modulo
and casts from and to floating point (both binary and decimal).
As described in the intro, the first step is try to reduce further the
passed in precision by skipping over most significant limbs with just zeros
or sign bit copies.  For multiplication and division I've implemented
a simple algorithm, using something smarter like Karatsuba or Toom N-Way
might be faster for very large _BitInts (which we don't support right now
anyway), but could mean more code in libgcc, which maybe isn't what people
are willing to accept.
For the to/from floating point conversions the patch uses soft-fp, because
it already has tons of handy macros which can be used for that.  In theory
it could be implemented using {,unsigned} long long or {,unsigned} __int128
to/from floating point conversions with some frexp before/after, but at that
point we already need to force it into integer registers and analyze it
anyway.  Plus, for 32-bit arches there is no __int128 that could be used
for XF/TF mode stuff.
I know that soft-fp is owned by glibc and I think the op-common.h change
should be propagated there, but the bitint stuff is really GCC specific
and IMHO doesn't belong into the glibc copy.

2023-08-09  Jakub Jelinek  

PR c/102989
libgcc/
* config/aarch64/t-softfp (softfp_extras): Use += rather than :=.
* config/i386/64/t-softfp (softfp_extras): Likewise.
* config/i386/libgcc-glibc.ver (GCC_14.0.0): Export _BitInt support
routines.
* config/i386/t-softfp (softfp_extras): Add fixxfbitint and
bf, hf and xf mode floatbitint.
(CFLAGS-floatbitintbf.c, CFLAGS-floatbitinthf.c): Add -msse2.
* config/riscv/t-softfp32 (softfp_extras): Use += rather than :=.
* config/rs6000/t-e500v1-fp (softfp_extras): Likewise.
* config/rs6000/t-e500v2-fp (softfp_extras): Likewise.
* config/t-softfp (softfp_floatbitint_funcs): New.
(softfp_bid_list): New.
(softfp_func_list): Add sf and df mode from and to _BitInt libcalls.
(softfp_bid_file_list): New.
(LIB2ADD_ST): Add $(softfp_bid_file_list).
* config/t-softfp-sfdftf (softfp_extras): Add fixtfbitint and
floatbitinttf.
* config/t-softfp-tf (softfp_extras): Likewise.
* libgcc2.c (bitint_reduce_prec): New inline function.
(BITINT_INC, BITINT_END): Define.
(bitint_mul_1, bitint_addmul_1): New helper functions.
(__mulbitint3): New function.
(bitint_negate, bitint_submul_1): New helper functions.
(__divmodbitint4): New function.
* libgcc2.h (LIBGCC2_UNITS_PER_WORD): When building _BitInt support
libcalls, redefine depending on __LIBGCC_BITINT_LIMB_WIDTH__.
(__mulbitint3, __divmodbitint4): Declare.
* libgcc-std.ver.in (GCC_14.0.0): Export _BitInt support routines.
* Makefile.in (lib2funcs): Add _mulbitint3.
(LIB2_DIVMOD_FUNCS): Add _divmodbitint4.
* soft-fp/bitint.h: New file.
* soft-fp/fixdfbitint.c: New file.
* soft-fp/fixsfbitint.c: New file.
* soft-fp/fixtfbitint.c: New file.
* soft-fp/fixxfbitint.c: New file.
* soft-fp/floatbitintbf.c: New file.
* soft-fp/floatbitintdf.c: New file.
* soft-fp/floatbitinthf.c: New file.
* soft-fp/floatbitintsf.c: New file.
* soft-fp/floatbitinttf.c: New file.
* soft-fp/floatbitintxf.c: New file.
* soft-fp/op-common.h (_FP_FROM_INT): Add support for rsize up to
4 * _FP_W_TYPE_SIZE rather than just 2 * _FP_W_TYPE_SIZE.
* soft-fp/bitintpow10.c: New file.
* soft-fp/fixsdbitint.c: New file.
* soft-fp/fixddbitint.c: New file.
* soft-fp/fixtdbitint.c: New file.
* soft-fp/floatbitintsd.c: New file.
* soft-fp/floatbitintdd.c: New file.
* soft-fp/floatbitinttd.c: New file.

--- libgcc/config/aarch64/t-softfp.jj   2023-08-08 15:54:35.737595343 +0200
+++ libgcc/config/aarch64/t-softfp  2023-08-08 16:12:02.346939560 +0200
@@ -3,7 +3,7 @@ softfp_int_modes := si di ti
 softfp_extensions := sftf dftf hftf bfsf
 softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floatdibf floatundibf floattibf floatuntibf
 
 TARGET_LIBGCC2_CFLAGS += -Wno-missing-prototypes
--- libgcc/config/i386/64/t-softfp.jj   2023-08-08 15:54:35.766594936 +0200
+++ libgcc/config/i386/64/t-softfp  2023-08-08 16:12:02.346939560 +0200
@@ -1,4 +1,4 @@
-softfp_extras := fixhfti fixunshfti floattihf floatuntihf \
+softfp_extras += fixhfti fixunshfti floattihf floatuntihf \
 floattibf floatuntibf
 
 CFLAGS-fixhfti.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj  2023-08-08 15:54:35.831594026 
+0200
+++ libgcc/config/i386/libgcc-glibc.ver 

[PATCH 7/12] ubsan: _BitInt -fsanitize=undefined support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch introduces some -fsanitize=undefined support for _BitInt,
but some of the diagnostics is limited by lack of proper support in the
library.
I've filed https://github.com/llvm/llvm-project/issues/64100 to request
proper support, for now some of the diagnostics might have less or more
confusing or inaccurate wording but UB should still be diagnosed when it
happens.

2023-08-09  Jakub Jelinek  

PR c/102989
gcc/
* internal-fn.cc (expand_ubsan_result_store): Add LHS, MODE and
DO_ERROR arguments.  For non-mode precision BITINT_TYPE results
check if all padding bits up to mode precision are zeros or sign
bit copies and if not, jump to DO_ERROR.
(expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
Adjust expand_ubsan_result_store callers.
* ubsan.cc: Include target.h and langhooks.h.
(ubsan_encode_value): Pass BITINT_TYPE values which fit into pointer
size converted to pointer sized integer, pass BITINT_TYPE values
which fit into TImode (if supported) or DImode as those integer types
or otherwise for now punt (pass 0).
(ubsan_type_descriptor): Handle BITINT_TYPE.  For pstyle of
UBSAN_PRINT_FORCE_INT use TK_Integer (0x) mode with a
TImode/DImode precision rather than TK_Unknown used otherwise for
large/huge BITINT_TYPEs.
(instrument_si_overflow): Instrument BITINT_TYPE operations even when
they don't have mode precision.
* ubsan.h (enum ubsan_print_style): New enumerator.
gcc/c-family/
* c-ubsan.cc (ubsan_instrument_shift): Use UBSAN_PRINT_FORCE_INT
for type0 type descriptor.

--- gcc/ubsan.cc.jj 2023-08-08 15:54:35.443599459 +0200
+++ gcc/ubsan.cc2023-08-08 16:12:02.329939798 +0200
@@ -50,6 +50,8 @@ along with GCC; see the file COPYING3.
 #include "gimple-fold.h"
 #include "varasm.h"
 #include "realmpfr.h"
+#include "target.h"
+#include "langhooks.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
 
@@ -125,6 +127,25 @@ tree
 ubsan_encode_value (tree t, enum ubsan_encode_value_phase phase)
 {
   tree type = TREE_TYPE (t);
+  if (TREE_CODE (type) == BITINT_TYPE)
+{
+  if (TYPE_PRECISION (type) <= POINTER_SIZE)
+   {
+ type = pointer_sized_int_node;
+ t = fold_build1 (NOP_EXPR, type, t);
+   }
+  else
+   {
+ scalar_int_mode arith_mode
+   = (targetm.scalar_mode_supported_p (TImode) ? TImode : DImode);
+ if (TYPE_PRECISION (type) > GET_MODE_PRECISION (arith_mode))
+   return build_zero_cst (pointer_sized_int_node);
+ type
+   = build_nonstandard_integer_type (GET_MODE_PRECISION (arith_mode),
+ TYPE_UNSIGNED (type));
+ t = fold_build1 (NOP_EXPR, type, t);
+   }
+}
   scalar_mode mode = SCALAR_TYPE_MODE (type);
   const unsigned int bitsize = GET_MODE_BITSIZE (mode);
   if (bitsize <= POINTER_SIZE)
@@ -355,14 +376,32 @@ ubsan_type_descriptor (tree type, enum u
 {
   /* See through any typedefs.  */
   type = TYPE_MAIN_VARIANT (type);
+  tree type3 = type;
+  if (pstyle == UBSAN_PRINT_FORCE_INT)
+{
+  /* Temporary hack for -fsanitize=shift with _BitInt(129) and more.
+libubsan crashes if it is not TK_Integer type.  */
+  if (TREE_CODE (type) == BITINT_TYPE)
+   {
+ scalar_int_mode arith_mode
+   = (targetm.scalar_mode_supported_p (TImode)
+  ? TImode : DImode);
+ if (TYPE_PRECISION (type) > GET_MODE_PRECISION (arith_mode))
+   type3 = build_qualified_type (type, TYPE_QUAL_CONST);
+   }
+  if (type3 == type)
+   pstyle = UBSAN_PRINT_NORMAL;
+}
 
-  tree decl = decl_for_type_lookup (type);
+  tree decl = decl_for_type_lookup (type3);
   /* It is possible that some of the earlier created DECLs were found
  unused, in that case they weren't emitted and varpool_node::get
  returns NULL node on them.  But now we really need them.  Thus,
  renew them here.  */
   if (decl != NULL_TREE && varpool_node::get (decl))
-return build_fold_addr_expr (decl);
+{
+  return build_fold_addr_expr (decl);
+}
 
   tree dtype = ubsan_get_type_descriptor_type ();
   tree type2 = type;
@@ -370,6 +409,7 @@ ubsan_type_descriptor (tree type, enum u
   pretty_printer pretty_name;
   unsigned char deref_depth = 0;
   unsigned short tkind, tinfo;
+  char tname_bitint[sizeof ("unsigned _BitInt(2147483647)")];
 
   /* Get the name of the type, or the name of the pointer type.  */
   if (pstyle == UBSAN_PRINT_POINTER)
@@ -403,8 +443,18 @@ ubsan_type_descriptor (tree type, enum u
 }
 
   if (tname == NULL)
-/* We weren't able to determine the type name.  */
-tname = "";
+{
+  if (TREE_CODE (type2) == BITINT_TYPE)
+   {
+ snprintf (tname_bitint, sizeof (tname_bitint),
+   "%s_BitInt(%d)", TYPE_UNSIGNED (type2) ? "unsigned " : "",
+

[PATCH 6/12] i386: Enable _BitInt on x86-64 [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch enables _BitInt support on x86-64, the only
target which has _BitInt specified in psABI.

2023-08-09  Jakub Jelinek  

PR c/102989
* config/i386/i386.cc (classify_argument): Handle BITINT_TYPE.
(ix86_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Redefine.

--- gcc/config/i386/i386.cc.jj  2023-08-08 15:55:05.627176766 +0200
+++ gcc/config/i386/i386.cc 2023-08-08 16:12:02.308940091 +0200
@@ -2121,7 +2121,8 @@ classify_argument (machine_mode mode, co
return 0;
 }
 
-  if (type && AGGREGATE_TYPE_P (type))
+  if (type && (AGGREGATE_TYPE_P (type)
+  || (TREE_CODE (type) == BITINT_TYPE && words > 1)))
 {
   int i;
   tree field;
@@ -2270,6 +2271,14 @@ classify_argument (machine_mode mode, co
}
  break;
 
+   case BITINT_TYPE:
+ /* _BitInt(N) for N > 64 is passed as structure containing
+(N + 63) / 64 64-bit elements.  */
+ if (words > 2)
+   return 0;
+ classes[0] = classes[1] = X86_64_INTEGER_CLASS;
+ return 2;
+
default:
  gcc_unreachable ();
}
@@ -24842,6 +24851,26 @@ ix86_get_excess_precision (enum excess_p
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Return true if _BitInt(N) is supported and fill details about it into
+   *INFO.  */
+bool
+ix86_bitint_type_info (int n, struct bitint_info *info)
+{
+  if (!TARGET_64BIT)
+return false;
+  if (n <= 8)
+info->limb_mode = QImode;
+  else if (n <= 16)
+info->limb_mode = HImode;
+  else if (n <= 32)
+info->limb_mode = SImode;
+  else
+info->limb_mode = DImode;
+  info->big_endian = false;
+  info->extended = false;
+  return true;
+}
+
 /* Implement PUSH_ROUNDING.  On 386, we have pushw instruction that
decrements by exactly 2 no matter what the position was, there is no pushb.
 
@@ -25446,6 +25475,8 @@ ix86_run_selftests (void)
 
 #undef TARGET_C_EXCESS_PRECISION
 #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision
+#undef TARGET_C_BITINT_TYPE_INFO
+#define TARGET_C_BITINT_TYPE_INFO ix86_bitint_type_info
 #undef TARGET_PROMOTE_PROTOTYPES
 #define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
 #undef TARGET_PUSH_ARGUMENT

Jakub



Re: [RFC] GCC Security policy

2023-08-09 Thread David Edelsohn via Gcc-patches
On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar 
wrote:

> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
> >> Do you have a suggestion for the language to address libgcc,
> >> libstdc++, etc. and libiberty, libbacktrace, etc.?
> >
> > I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for different parts of GCC, including the
> runtime libraries.  Over time we may find that specific parts of runtime
> libraries simply cannot be used safely in some contexts and flag that.
>
> Sid
>

Hi, Sid

Thanks for iterating on this.


>
> """
> What is a GCC security bug?
> ===
>
>  A security bug is one that threatens the security of a system or
>  network, or might compromise the security of data stored on it.
>  In the context of GCC there are multiple ways in which this might
>  happen and they're detailed below.
>
> Compiler drivers, programs, libgccjit and support libraries
> ---
>
>  The compiler driver processes source code, invokes other programs
>  such as the assembler and linker and generates the output result,
>  which may be assembly code or machine code.  It is necessary that
>  all source code inputs to the compiler are trusted, since it is
>  impossible for the driver to validate input source code beyond
>  conformance to a programming language standard.
>
>  The GCC JIT implementation, libgccjit, is intended to be plugged
>  into applications to translate input source code in the application
>  context.  Limitations that apply to the compiler
>  driver, apply here too in terms of sanitizing inputs, so it is
>  recommended that inputs are either sanitized by an external program
>  to allow only trusted, safe execution in the context of the
>  application or the JIT execution context is appropriately sandboxed
>  to contain the effects of any bugs in the JIT or its generated code
>  to the sandboxed environment.
>
>  Support libraries such as libiberty, libcc1 libvtv and libcpp have
>  been developed separately to share code with other tools such as
>  binutils and gdb.  These libraries again have similar challenges to
>  compiler drivers.  While they are expected to be robust against
>  arbitrary input, they should only be used with trusted inputs.
>
>  Libraries such as zlib and libffi that bundled into GCC to build it
>  will be treated the same as the compiler drivers and programs as far
>  as security coverage is concerned.
>

Should we direct people to the upstream projects for their security
policies?


>  As a result, the only case for a potential security issue in all
>  these cases is when it ends up generating vulnerable output for
>  valid input source code.


> Language runtime libraries
> --
>
>  GCC also builds and distributes libraries that are intended to be
>  used widely to implement runtime support for various programming
>  languages.  These include the following:
>
>  * libada
>  * libatomic
>  * libbacktrace
>  * libcc1
>  * libcody
>  * libcpp
>  * libdecnumber
>  * libgcc
>  * libgfortran
>  * libgm2
>  * libgo
>  * libgomp
>  * libiberty
>  * libitm
>  * libobjc
>  * libphobos
>  * libquadmath
>  * libssp
>  * libstdc++
>
>  These libraries are intended to be used in arbitrary contexts and as
>  a result, bugs in these libraries may be evaluated for security
>  impact.  However, some of these libraries, e.g. libgo, libphobos,
>  etc.  are not maintained in the GCC project, due to which the GCC
>  project may not be the correct point of contact for them.  You are
>  encouraged to look at README files within those library directories
>  to locate the canonical security contact point for those projects.
>

As Richard mentioned, should GCC make a specific statement about the
security policy / response for issues that are discovered and fixed in the
upstream projects from which the GCC libraries are imported?


>
> Diagnostic libraries
> 
>
>  The sanitizer library bundled in GCC is intended to be used in
>  diagnostic cases and not intended for use in sensitive environments.
>  As a result, bugs in the sanitizer will not be considered security
>  sensitive.
>
> GCC plugins
> ---
>
>  It should be noted that GCC may execute arbitrary code loaded by a
>  user through the GCC plugin mechanism or through system preloading
>  mechanism.  Such custom code should be vetted by the user for safety
>  as bugs exposed through such code will not be considered security
>  issues.
>

Thanks, David


[PATCH 4/12] Middle-end _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch introduces the middle-end part of the _BitInt
support, a new BITINT_TYPE, handling it where needed, except the lowering
pass and sanitizer support.

2023-08-09  Jakub Jelinek  

PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include
BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if
they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst,
build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal,
type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type):
Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE
INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1,
convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise.  For BITINT_TYPE don't check
GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die,
gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or
handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce
to bitfield precision reads from BITINT_TYPE vars, parameters or
memory locations.  Expand large/huge BITINT_TYPE INTEGER_CSTs into
memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle
BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than
GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr):
Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE
to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need
to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode
precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT,
expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT,
BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT,
expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary):
Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision
wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
(layout_type): Handle BITINT_TYPE.  Handle COMPLEX_TYPE with BLKmode
element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE.
Handle printing large wide_ints which would buffer overflow
digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge
BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than
64-bit BITINT_TYPE subtract low bound from expression and cast to
64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE 

[PATCH 3/12] phiopt: Fix phiopt ICE on vops [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os
when enabling expensive tests, and unfortunately I can't reproduce without
_BitInt.  The IL before phiopt3 has:
   [local count: 203190070]:
  # .MEM_428 = VDEF <.MEM_367>
  bitint.159 = VIEW_CONVERT_EXPR(*.LC3);
  goto ; [100.00%]

   [local count: 203190070]:
  # .MEM_427 = VDEF <.MEM_367>
  bitint.159 = VIEW_CONVERT_EXPR(*.LC4);

   [local count: 406380139]:
  # .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)>
  # VUSE <.MEM_368>
  _123 = VIEW_CONVERT_EXPR(r495[i_107].D.2780)[0];
and factor_out_conditional_operation is called on the vop PHI, it
sees it has exactly two operands and defining statements of both
PHI arguments are converts (VCEs in this case), so it thinks it is
a good idea to try to optimize that and while doing that it constructs
void type SSA_NAMEs and the like.

2023-08-09  

PR c/102989
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Punt for
vops.

--- gcc/tree-ssa-phiopt.cc.jj   2023-08-08 15:55:09.508122417 +0200
+++ gcc/tree-ssa-phiopt.cc  2023-08-09 15:55:23.762314103 +0200
@@ -241,6 +241,7 @@ factor_out_conditional_operation (edge e
 }
 
   if (TREE_CODE (arg0) != SSA_NAME
+  || SSA_NAME_IS_VIRTUAL_OPERAND (arg0)
   || (TREE_CODE (arg1) != SSA_NAME
  && TREE_CODE (arg1) != INTEGER_CST))
 return NULL;

Jakub



[PATCH 2/12] lto-streamer-in: Adjust assert [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

With _BitInt(575) or any other _BitInt(513) or larger constants we can
run into this assertion.  MAX_BITSIZE_MODE_ANY_INT is just a value from
which WIDE_INT_MAX_PRECISION is derived.

2023-08-09  Jakub Jelinek  

PR c/102989
* lto-streamer-in.cc (lto_input_tree_1): Assert TYPE_PRECISION
is up to WIDE_INT_MAX_PRECISION rather than MAX_BITSIZE_MODE_ANY_INT.

--- gcc/lto-streamer-in.cc.jj   2023-07-17 09:07:42.078283882 +0200
+++ gcc/lto-streamer-in.cc  2023-07-27 15:03:24.255234159 +0200
@@ -1888,7 +1888,7 @@ lto_input_tree_1 (class lto_input_block
 
   for (i = 0; i < len; i++)
a[i] = streamer_read_hwi (ib);
-  gcc_assert (TYPE_PRECISION (type) <= MAX_BITSIZE_MODE_ANY_INT);
+  gcc_assert (TYPE_PRECISION (type) <= WIDE_INT_MAX_PRECISION);
   result = wide_int_to_tree (type, wide_int::from_array
 (a, len, TYPE_PRECISION (type)));
   streamer_tree_cache_append (data_in->reader_cache, result, hash);

Jakub



[PATCH 1/12] expr: Small optimization [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

Small optimization to avoid testing modifier multiple times.

2023-08-09  Jakub Jelinek  

PR c/102989
* expr.cc (expand_expr_real_1) : Add an early return for
EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple
times.

--- gcc/expr.cc.jj  2023-08-08 15:55:06.499164554 +0200
+++ gcc/expr.cc 2023-08-08 15:59:36.594382141 +0200
@@ -11248,17 +11248,15 @@ expand_expr_real_1 (tree exp, rtx target
set_mem_addr_space (temp, as);
if (TREE_THIS_VOLATILE (exp))
  MEM_VOLATILE_P (temp) = 1;
-   if (modifier != EXPAND_WRITE
-   && modifier != EXPAND_MEMORY
-   && !inner_reference_p
+   if (modifier == EXPAND_WRITE || modifier == EXPAND_MEMORY)
+ return temp;
+   if (!inner_reference_p
&& mode != BLKmode
&& align < GET_MODE_ALIGNMENT (mode))
  temp = expand_misaligned_mem_ref (temp, mode, unsignedp, align,
modifier == EXPAND_STACK_PARM
? NULL_RTX : target, alt_rtl);
-   if (reverse
-   && modifier != EXPAND_MEMORY
-   && modifier != EXPAND_WRITE)
+   if (reverse)
  temp = flip_storage_order (mode, temp);
return temp;
   }

Jakub



[PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-09 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch series introduces support for C23 bit-precise integer
types.  In short, they are similar to other integral types in many ways,
just aren't subject for integral promotions if smaller than int and they can
have even much wider precisions than ordinary integer types.

This series includes and thus subsumes all so far uncommitted _BitInt related
patches.  Compared to the last posted series, there is bit-field _BitInt
support, _Atomic/stdatomic.h support, conversions between _Decimal{32,64,128}
and _BitInt and vice versa (this particular item compared to what has been
posted before has a fix for the large powers of 10 computations which
with the _BitInt(575) limitation can't be really seen so far, but I've tried
to call the underlying routines with very large arrays of limbs, and in
addition to that the generated tables header has been made more compact) and
Richard's patch review feedback has been incorporated and series has been
further split into more patches.

It is enabled only on targets which have agreed on processor specific
ABI how to lay those out or pass as function arguments/return values,
which currently is just x86-64 I believe, would be nice if target maintainers
helped to get agreement on psABI changes and GCC 14 could enable it on far
more architectures than just one.

C23 says that  defines BITINT_MAXWIDTH macro and that is the
largest supported precision of the _BitInt types, smallest is precision
of unsigned long long (but due to lack of psABI agreement we'll violate
that on architectures which don't have the support done yet).
The following series uses for the time just WIDE_INT_MAX_PRECISION as
that BITINT_MAXWIDTH, with the intent to increase it incrementally later
on.  WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller
on lots of architectures.  This is the largest precision we can support
without changes of wide_int/widest_int representation (to make those non-POD
and allow use of some allocated buffer rather than the included fixed size
one).  Once that would be overcome, there is another internal enforced limit,
INTEGER_CST in current layout allows at most 255 64-bit limbs, which is
16320 bits as another cap.  And if that is overcome, then we have limitation
of TYPE_PRECISION being 16-bit, so 65535 as maximum precision.  Perhaps
we could make TYPE_PRECISION dependent on BITINT_TYPE vs. others and use
32-bit precision in that case later.  Latest Clang/LLVM I think supports
on paper up to 8388608 bits, but is hardly usable even with much shorter
precisions.

Besides this hopefully temporary cap on supported precision and support
only on targets which buy into it, the support has the following limitations:

- _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
  mention how those should be passed/returned; in a limited way they are
  supported internally because the internal functions into which
  __builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
  hack to return 2 values without using references/pointers

- vectors of _BitInt(N) aren't supported, both because psABIs don't specify
  how that works and because I'm not really sure it would be useful given
  lack of hw support for anything but bit-precise integers with the same
  bit precision as standard integer types

Because the bit-precise types have different behavior both in the C FE
(e.g. the lack of promotion) and do or can have different behavior in type
layout and function argument passing/returning values, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE and not say INTEGRAL_TYPE_P macro need to be adjusted.
Also the assumption that all integral types have scalar integer type mode
is no longer true, larger BITINT_TYPEs have BLKmode type.

The patch makes 4 different categories of _BitInt depending on the target hook
decisions and their precision.  The x86-64 psABI says that _BitInt which fit
into signed/unsigned char, short, int, long and long long are laid out and
passed as those types (with padding bits undefined if they don't have mode
precision).  Such smallest precision bit-precise integer types are categorized
as small, the target hook gives for specific precision a scalar integral mode
where a single such mode contains all the bits.  Such small _BitInt types are
generally kept in the IL until expansion into RTL, with minor tweaks during
expansion to avoid relying on the padding bit values.  All larger precision
_BitInt types are supposed to be handled as structure containing an array
of limbs or so, where a limb has some integral mode (for libgcc purposes
best if it has word-size) and the limbs have either little or big endian
ordering in the array.  The padding bits in the most significant limb if any
are either undefined or should be always sign/zero extended (but support for 
this
isn't in yet, we don't know if any psABI will require it).  As mentioned in
some 

Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-08-09 Thread Vineet Gupta

Hi Jin Ma,

On 5/16/23 00:06, jinma via Gcc-patches wrote:

On 5/15/23 07:16, Jin Ma wrote:


Do we also need to check Z[FDH]INX too?

Otherwise it looks pretty good.  We just need to wait for everything to
freeze and finalization on the assembler interface.

jeff

Yes, you are right, we also need to check Z[FDH]INX. I will send a patch
again to fix it after others give some review comments.


Can we please revisit this and get this merged upstream.
Seems like gcc is supporting frozen but not ratified extensions.

Thx,
-Vineet


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 06:27:20PM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
> >> Jakub: do you remember what the reason was?  I don't mind dropping
> >> "function", but it feels weird to drop the quotes around "simd".
> >> Seems like, if we do that, there'll one day be a patch to add
> >> them back. :)
> >
> > Because in OpenMP their are % functions, not %
> > %functions, but we also have the %/% attribute as
> > extension.
> 
> Yeah, I can understand dropping the "function" bit.  But why
> s/unsupported ... for %/unsupported ... for simd/?
> Even if it's only a partial syntax quote, it is still a syntax quote.

% in OpenMP is something very different though, so I think it is
better to use it as a generic term which covers the different syntax cases.

Jakub



Re: [RFC] GCC Security policy

2023-08-09 Thread Siddhesh Poyarekar

On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
Do you have a suggestion for the language to address libgcc, 
libstdc++, etc. and libiberty, libbacktrace, etc.?


I'll work on this a bit and share a draft.


Hi David,

Here's what I came up with for different parts of GCC, including the 
runtime libraries.  Over time we may find that specific parts of runtime 
libraries simply cannot be used safely in some contexts and flag that.


Sid

"""
What is a GCC security bug?
===

A security bug is one that threatens the security of a system or
network, or might compromise the security of data stored on it.
In the context of GCC there are multiple ways in which this might
happen and they're detailed below.

Compiler drivers, programs, libgccjit and support libraries
---

The compiler driver processes source code, invokes other programs
such as the assembler and linker and generates the output result,
which may be assembly code or machine code.  It is necessary that
all source code inputs to the compiler are trusted, since it is
impossible for the driver to validate input source code beyond
conformance to a programming language standard.

The GCC JIT implementation, libgccjit, is intended to be plugged
into applications to translate input source code in the application
context.  Limitations that apply to the compiler
driver, apply here too in terms of sanitizing inputs, so it is
recommended that inputs are either sanitized by an external program
to allow only trusted, safe execution in the context of the
application or the JIT execution context is appropriately sandboxed
to contain the effects of any bugs in the JIT or its generated code
to the sandboxed environment.

Support libraries such as libiberty, libcc1 libvtv and libcpp have
been developed separately to share code with other tools such as
binutils and gdb.  These libraries again have similar challenges to
compiler drivers.  While they are expected to be robust against
arbitrary input, they should only be used with trusted inputs.

Libraries such as zlib and libffi that bundled into GCC to build it
will be treated the same as the compiler drivers and programs as far
as security coverage is concerned.

As a result, the only case for a potential security issue in all
these cases is when it ends up generating vulnerable output for
valid input source code.

Language runtime libraries
--

GCC also builds and distributes libraries that are intended to be
used widely to implement runtime support for various programming
languages.  These include the following:

* libada
* libatomic
* libbacktrace
* libcc1
* libcody
* libcpp
* libdecnumber
* libgcc
* libgfortran
* libgm2
* libgo
* libgomp
* libiberty
* libitm
* libobjc
* libphobos
* libquadmath
* libssp
* libstdc++

These libraries are intended to be used in arbitrary contexts and as
a result, bugs in these libraries may be evaluated for security
impact.  However, some of these libraries, e.g. libgo, libphobos,
etc.  are not maintained in the GCC project, due to which the GCC
project may not be the correct point of contact for them.  You are
encouraged to look at README files within those library directories
to locate the canonical security contact point for those projects.

Diagnostic libraries


The sanitizer library bundled in GCC is intended to be used in
diagnostic cases and not intended for use in sensitive environments.
As a result, bugs in the sanitizer will not be considered security
sensitive.

GCC plugins
---

It should be noted that GCC may execute arbitrary code loaded by a
user through the GCC plugin mechanism or through system preloading
mechanism.  Such custom code should be vetted by the user for safety
as bugs exposed through such code will not be considered security
issues.


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> Jakub: do you remember what the reason was?  I don't mind dropping
>> "function", but it feels weird to drop the quotes around "simd".
>> Seems like, if we do that, there'll one day be a patch to add
>> them back. :)
>
> Because in OpenMP their are % functions, not %
> %functions, but we also have the %/% attribute as
> extension.

Yeah, I can understand dropping the "function" bit.  But why
s/unsupported ... for %/unsupported ... for simd/?
Even if it's only a partial syntax quote, it is still a syntax quote.

Thanks,
Richard


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
> Jakub: do you remember what the reason was?  I don't mind dropping
> "function", but it feels weird to drop the quotes around "simd".
> Seems like, if we do that, there'll one day be a patch to add
> them back. :)

Because in OpenMP their are % functions, not %
%functions, but we also have the %/% attribute as
extension.

Jakub



Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Andre Vieira (lists) via Gcc-patches




On 09/08/2023 17:55, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:


On 08/08/2023 11:51, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:



warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "unsupported return type %qT for % functions",
+   "unsupported return type %qT for simd",
ret_type);


What's the reason for s/% functions/simd/, in particular for
dropping the quotes around simd?


It's to align with i386's error message, this helps with testing as then
I can avoid having different tests for the same error.

I asked Jakub which one he preferred, and he gave me an explanation why
the i386's one was preferable, ... but I didn't write it down unfortunately.


Jakub: do you remember what the reason was?  I don't mind dropping
"function", but it feels weird to drop the quotes around "simd".
Seems like, if we do that, there'll one day be a patch to add
them back. :)


After some IRC scrolling, unfortunately my client doesn't have a fancy 
search :(


avieira> Andre Vieira
jakub: which one do you prefer?
1:59 PM
"unsupported argument type %qT for simd" (i386)
1:59 PM
 "unsupported argument type %qT for % functions", (aarch64)
1:59 PM
Gonna change one to be the same as the other ...
2:04 PM
→ gaius joined  ⇐ lh_ideapad, egallager and jwakely_ quit
2:36 PM 
I'd just go with for simd; % functions isn't an established term, 
it would be either % functions, but we have also simd 
attribute...


Re: [PATCH] RISC-V: Fix VLMAX AVL incorrect local anticipate [VSETVL PASS]

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 04:51, Juzhe-Zhong wrote:

Realize we have a bug in VSETVL PASS which is triggered by strided_load_run-1.c 
in RV32 system.

FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
execution test

This is because VSETVL PASS incorrect hoist vsetvl instruction:

...
10156:  0d9075d7vsetvli a1,zero,e64,m2,ta,ma ---> 
pollute 'a1' register which will be used by following insns.
1015a:  01d586b3add a3,a1,t4  > use 'a1'
1015e:  5e070257vmv.v.v v4,v14
10162:  b7032257vmacc.vvv4,v6,v16
10166:  26440257vand.vv v4,v4,v8
1016a:  22880227vs2r.v  v4,(a6)
1016e:  00b6b7b3sltua5,a3,a1
10172:  22888227vs2r.v  v4,(a7)
10176:  9e60b157vmv2r.v v2,v6
1017a:  97baadd a5,a5,a4
1017c:  a6a62157vmadd.vvv2,v12,v10
10180:  26240157vand.vv v2,v2,v8
10184:  22830127vs2r.v  v2,(t1)
10188:  873emv  a4,a5
1018a:  982aadd a6,a6,a0
1018c:  98aaadd a7,a7,a0
1018e:  932aadd t1,t1,a0
10190:  85b6mv  a1,a3   -> set 'a1'
...

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (anticipatable_occurrence_p): Fix 
incorrect anticipate info.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c: 
Adapt test.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-24.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-36.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-14.c: Ditto.
 * gcc.target/riscv/rvv/vsetvl/vlmax_switch_vtype-15.c: Ditto.

OK.

Do we need to backport this to gcc-13?

jeff


Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Here is my new version, see inline response to your comments.
>
> New cover letter:
>
> This patch enables the use of mixed-types for simd clones for AArch64, 
> adds aarch64 as a target_vect_simd_clones and corrects the way the 
> simdlen is chosen for non-specified simdlen clauses according to the 
> 'Vector Function Application Binary Interface Specification for AArch64'.
>
> gcc/ChangeLog:
>
>  * config/aarch64/aarch64.cc (currently_supported_simd_type): 
> Remove.
>  (aarch64_simd_clone_compute_vecsize_and_simdlen): Determine 
> simdlen according to NDS rule.
>  (lane_size): New function.
>
> gcc/testsuite/ChangeLog:
>
>  * lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
>  * c-c++-common/gomp/declare-variant-14.c: Add aarch64 checks 
> and remove warning check.
>  * g++.dg/gomp/attrs-10.C: Likewise.
>  * g++.dg/gomp/declare-simd-1.C: Likewise.
>  * g++.dg/gomp/declare-simd-3.C: Likewise.
>  * g++.dg/gomp/declare-simd-4.C: Likewise.
>  * gcc.dg/gomp/declare-simd-3.c: Likewise.
>  * gcc.dg/gomp/simd-clones-2.c: Likewise.
>  * gfortran.dg/gomp/declare-variant-14.f90: Likewise.
>  * c-c++-common/gomp/pr60823-1.c: Remove warning check.
>  * c-c++-common/gomp/pr60823-3.c: Likewise.
>  * g++.dg/gomp/declare-simd-7.C: Likewise.
>  * g++.dg/gomp/declare-simd-8.C: Likewise.
>  * g++.dg/gomp/pr88182.C: Likewise.
>  * gcc.dg/declare-simd.c: Likewise.
>  * gcc.dg/gomp/declare-simd-1.c: Likewise.
>  * gcc.dg/gomp/pr87895-1.c: Likewise.
>  * gfortran.dg/gomp/declare-simd-2.f90: Likewise.
>  * gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
>  * gfortran.dg/gomp/pr79154-1.f90: Likewise.
>  * gfortran.dg/gomp/pr83977.f90: Likewise.
>  * gcc.dg/gomp/pr87887-1.c: Add warning test.
>  * gcc.dg/gomp/pr89246-1.c: Likewise.
>  * gcc.dg/gomp/pr99542.c: Update warning test.
>
>
>
> On 08/08/2023 11:51, Richard Sandiford wrote:
>> "Andre Vieira (lists)"  writes:
>
>>> warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
>>> -   "unsupported return type %qT for % functions",
>>> +   "unsupported return type %qT for simd",
>>> ret_type);
>> 
>> What's the reason for s/% functions/simd/, in particular for
>> dropping the quotes around simd?
>
> It's to align with i386's error message, this helps with testing as then 
> I can avoid having different tests for the same error.
>
> I asked Jakub which one he preferred, and he gave me an explanation why 
> the i386's one was preferable, ... but I didn't write it down unfortunately.

Jakub: do you remember what the reason was?  I don't mind dropping
"function", but it feels weird to drop the quotes around "simd".
Seems like, if we do that, there'll one day be a patch to add
them back. :)

Thanks,
Richard


[PATCH v3] tree-optimization/110279- Check for nested FMA in reassoc

2023-08-09 Thread Di Zhao OS via Gcc-patches
Hi,

The previous version of this patch tries to solve two problems
at the same time. For better clarity, I'll separate them and 
only deal with the "nested" FMA in this version. I plan to
propose another patch in avoiding bad shaped FMA (deferring FMA).

Other changes:

1. Added new testcases for the "nested" FMA issue. For the
   following code:

tmp1 = a + c * c + d * d + x * y;
tmp2 = x * tmp1;
result += (a + c + d + tmp2);

   , when "tmp1 = ..." is not rewritten, tmp1 will be result of
   an FMA, and there will be a list of consecutive FMAs: 

_1 = .FMA (c, c, a_39);
_2 = .FMA (d, d, _1);
tmp1 = .FMA (x, y, _2);
_3 = .FMA (tmp1, x, d);
...
   
   If "tmp1 = ..." is rewritten to parallel, tmp1 will be result
   of a PLUS_EXPR between FMAs:

_1 = .FMA (c, c, a_39);
_2 = x * y;
_3 = .FMA (d, d, _2);
 tmp1 = _3 + _1;
 _4 = .FMA (tmp1, x, d);
...

   It seems the register pressure of the latter is higher than
   the former. On the test machines we have (including Ampere1,
   Neoverse-n1 and Intel Xeon), with "tmp1 = ..." is rewritten to
   parallel, the run time all increased significantly. In
   contrast, when "tmp1" is not the 1st or 2nd operand of another
   FMA (pr110279-1.c), rewriting it results in better performance.
   (I'll also append the testcases in the bug tracker.)

2. Enhanced checking for nested FMA by: 1) Modified
   convert_mult_to_fma so it can return multiple LHS.  2) Check
   NEGATE_EXPRs for nested FMA.

(I think maybe this can be further refined by enabling rewriting
to parallel for very long op list. )

Bootstrapped and regression tested on x86_64-linux-gnu.

Thanks,
Di Zhao



PR tree-optimization/110279

gcc/ChangeLog:

* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added
new parameter collect_lhs.
(struct fma_transformation_info): Moved to header.
(class fma_deferring_state): Moved to header.
(convert_mult_to_fma): Added new parameter collect_lhs.
* tree-ssa-math-opts.h (struct fma_transformation_info):
(class fma_deferring_state): Moved from .cc.
(convert_mult_to_fma): Moved from .cc.
* tree-ssa-reassoc.cc (enum fma_state): Defined enum to
describe the state of FMA candidates for a list of
operands.
(rewrite_expr_tree_parallel): Changed boolean parameter
to enum type.
(has_nested_fma_p): New function to check for nested FMA
on given multiplication statement.
(rank_ops_for_fma): Return enum fma_state.
(reassociate_bb): Avoid rewriting to parallel if nested
FMAs are found.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110279-1.c: New test.
* gcc.dg/pr110279-2.c: New test.


tree-optimization-110279-Check-for-nested-FMA-in-rea.patch
Description: tree-optimization-110279-Check-for-nested-FMA-in-rea.patch


Re: [PATCH] Handle in-order reductions when SLP vectorizing non-loops

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/9/23 07:51, Alexander Monakov wrote:


On Wed, 9 Aug 2023, Richard Biener via Gcc-patches wrote:


The following teaches the non-loop reduction vectorization code to
handle non-associatable reductions.  Using the existing FOLD_LEFT_PLUS
internal functions might be possible but I'd have to convince myself
that +0.0 + x[0] is a safe extra operation in ever rounding mode
(I also have no way to test the resulting code).


It's not. Under our default -fno-signaling-nans -fno-rounding-math
negative zero is the neutral element for addition, so '-0.0 + x[0]'
might be (but negative zero costs more to materialize).

If the reduction has at least two elements, then

-0.0 + x[0] + x[1]

has the same behavior w.r.t SNaNs as 'x[0] + x[1]', but unfortunately
yields negative zero when x[0] = x[1] = +0.0 and rounding towards
negative infinity (unlike x[0] + x[1], which is +0.0).
Hmm, then there's a bug in an non-released port I worked on a while 
back.  It supports FOLD_LEFT_PLUS by starting the sequence with a +0.0 
in the destination register.


I guess if that port ever gets upstreamed I'll have to keep an eye out 
for that problem.  Luckily I think they can synthesize a -0.0 trivially, 
potentially even zero cost.


Thanks!
Jeff


Re: [PATCH v4] Implement new RTL optimizations pass: fold-mem-offsets.

2023-08-09 Thread Jeff Law via Gcc-patches




On 8/7/23 08:33, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

We still have the m68k issue to deal with.

We've got these key insns going into fold-mem-offsets:


(insn 39 38 41 3 (set (reg:SI 13 %a5 [orig:36 _14 ] [36])
(plus:SI (reg:SI 10 %a2 [47])
(const_int 1 [0x1]))) "j.c":19:3 157 {*addsi3_internal}
 (nil))
(insn 41 39 42 3 (set (reg:SI 8 %a0 [61])
(plus:SI (reg/f:SI 12 %a4 [52])
(reg:SI 13 %a5 [orig:36 _14 ] [36]))) "j.c":19:3 discrim 1 157 
{*addsi3_internal}
 (nil))
(insn 42 41 43 3 (set (mem:SI (reg:SI 8 %a0 [61]) [0 MEM  [(void 
*)_15]+0 S4 A8])
(const_int 1633837924 [0x61626364])) "j.c":19:3 discrim 1 55 
{*movsi_m68k2}
 (nil))
(insn 43 42 45 3 (set (mem:QI (plus:SI (reg:SI 8 %a0 [61])
(const_int 4 [0x4])) [0 MEM  [(void *)_15]+4 S1 A8])
(const_int 101 [0x65])) "j.c":19:3 discrim 1 62 {*m68k.md:1130}
 (expr_list:REG_DEAD (reg:SI 8 %a0 [61])
(nil)))


[ ... ]

(insn 58 57 59 3 (set (reg:SI 8 %a0 [72])
(plus:SI (plus:SI (reg:SI 13 %a5 [orig:36 _14 ] [36])
(reg:SI 11 %a3 [49]))
(const_int 5 [0x5]))) "j.c":24:3 421 {*lea}
 (expr_list:REG_DEAD (reg:SI 13 %a5 [orig:36 _14 ] [36])
(expr_list:REG_DEAD (reg:SI 11 %a3 [49])
(nil



f-m-o will propagate the (const_int 1) from insn 39 into insns 42 & 43 
and modifies insn 39 into a simple copy.  Note carefully that turning 
insn 39 into a simple copy changes the value in a5.


As a result insn 58 computes the wrong value and the test (strlenopt-2) 
fails on the m68k.


In fold_offsets we have this code:


 /* We only fold through instructions that are transitively used as
 memory addresses and do not have other uses.  Use the same logic
 from offset calculation to visit instructions that can propagate
 offsets and keep track of them in CAN_FOLD_INSNS.  */
  bool success;
  struct df_link *uses = get_uses (def, dest, ), *ref_link;

  if (!success)
return 0;

  for (ref_link = uses; ref_link; ref_link = ref_link->next)
{
  rtx_insn* use = DF_REF_INSN (ref_link->ref);

  if (DEBUG_INSN_P (use))
continue;

  /* Punt if the use is anything more complicated than a set
 (clobber, use, etc).  */
  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
return 0;

  /* This use affects instructions outside of CAN_FOLD_INSNS.  */
  if (!bitmap_bit_p (_fold_insns, INSN_UID (use)))
return 0;

  rtx use_set = PATTERN (use);

  /* Special case: A foldable memory store is not foldable if it
 mentions DEST outside of the address calculation.  */
  if (use_set && MEM_P (SET_DEST (use_set))
  && reg_mentioned_p (dest, SET_SRC (use_set)))
return 0;
}


AFAICT that code is supposed to detect if there are uses outside of the 
set of insns that can be optimized.  In our case DEF is a0 (the base of 
the memory references) and its def is insn 41.  The assignment of a0 in 
insn 41 only reaches insns 42 and 43 which are marked in can_fold_insns.


That's find and good, but insufficient for correctness.  ISTM we must 
look at the uses of any insn where we change the result of the 
computation, not just the def of the base register in the memory reference.


In this particular case we're going to modify insn 39, so we need to 
look at the uses of a5 (the value defined by insn 39).  If we did that 
we'd see the use of a5 at insn 38 and insn 38 and reject the optimization.




Testcase below.  You should be able to see the incorrect transformation 
with a m68k-linux-gnu cross compiler with -O2.




typedef unsigned int size_t;
extern void abort (void);
void *calloc (size_t, size_t);
void 

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Michael Matz via Gcc-patches
Hello,

On Wed, 9 Aug 2023, Qing Zhao wrote:

> Although this is an old FAM related issue that does not relate to my current 
> patch 
> (and might need to be resolved in a separate patch).  I think that it’s 
> necessary to have
> more discussion on this old issue and resolve it. 
> 
> The first thing that I’d like to confirm is:
> 
> What the exact memory layout for the following structure x?
> 
> struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };
> 
> And the key that is confusing me is, where should the field “t” start? 
> 
> A.  Starting at offset 8 as the following:
> 
> a 4-bytes
> b 2-bytes
> padding   2-bytes
> t 3-bytes

Why should there be padding before 't'?  It's a char array (FAM or not), 
so it can be (and should be) placed directly after 'b'.  So ...

> B. Starting at offset 6 as the following:
> 
> a 4-bytes
> b 2-bytes
> t 3-bytes

... this is the correct layout, when seen in isolation.  The discussion 
revolves around what should come after 't': if it's a non-FAM struct (with 
t[3]), then it's clear that there needs to be padding after it, so to pad 
out the whole struct to be 12 bytes long (for sizeof() purpose), as 
required by its alignment (due to the int field 'a').

So, should the equivalent FAM struct also have this sizeof()?  If no: 
there should be a good argument why it shouldn't be similar to the non-FAM 
one.

Then there is an argument that the compiler would be fine, when allocating 
a single object of such type (not as part of an array!), to only reserve 9 
bytes of space for the FAM-struct.  Then the question is: should it also 
do that for a non-FAM struct (based on the argument that the padding 
behind 't' isn't accessible, and hence doesn't need to be alloced).  I 
think it would be quite unexpected for the compiler to actually reserve 
less space than sizeof() implies, so I personally don't buy that argument.  
For FAM or non-FAM structs.

Note that if one choses to allocate less space than sizeof implies that 
this will have quite some consequences for code generation, in that 
sometimes the instruction sequences (e.g. for copying) need to be careful 
to never access tail padding that should be there in array context, but 
isn't there in single-object context.  I think this alone should make it 
clear that it's advisable that sizeof() and allocated size agree.

As in: I think sizeof for both structs should return 12, and 12 bytes 
should be reserved for objects of such types.

And then the next question is what __builtin_object_size should do with 
these: should it return the size with or without padding at end (i.e. 
could/should it return 9 even if sizeof is 12).  I can see arguments for 
both.


Ciao,
Michael.


[PATCH] VR-VALUES: Simplify comparison using range pairs

2023-08-09 Thread Andrew Pinski via Gcc-patches
If `A` has a range of `[0,0][100,INF]` and the comparison
of `A < 50`. This should be optimized to `A <= 0` (which then
will be optimized to just `A == 0`).
This patch implement this via a new function which sees if
the constant of a comparison is in the middle of 2 range pairs
and change the constant to the either upper bound of the first pair
or the lower bound of the second pair depending on the comparison.

This is the first step in fixing the following PRS:
PR 110131, PR 108360, and PR 108397.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* vr-values.cc (simplify_compare_using_range_pairs): New function.
(simplify_using_ranges::simplify_compare_using_ranges_1): Call
it.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp124.c: New test.
* gcc.dg/pr21643.c: Disable VRP.
---
 gcc/testsuite/gcc.dg/pr21643.c |  6 ++-
 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
 gcc/vr-values.cc   | 65 ++
 3 files changed, 114 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c

diff --git a/gcc/testsuite/gcc.dg/pr21643.c b/gcc/testsuite/gcc.dg/pr21643.c
index 4e7f93d351a..7f121d7006f 100644
--- a/gcc/testsuite/gcc.dg/pr21643.c
+++ b/gcc/testsuite/gcc.dg/pr21643.c
@@ -1,6 +1,10 @@
 /* PR tree-optimization/21643 */
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-reassoc1-details --param 
logical-op-non-short-circuit=1" } */
+/* Note VRP is able to transform `c >= 0x20` in f7
+   to `c >= 0x21` since we want to test
+   reassociation and not VRP, turn it off. */
+
+/* { dg-options "-O2 -fdump-tree-reassoc1-details --param 
logical-op-non-short-circuit=1 -fno-tree-vrp" } */
 
 int
 f1 (unsigned char c)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
new file mode 100644
index 000..6ccbda35d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Should be optimized to a == -100 */
+int g(int a)
+{
+  if (a == -100 || a >= 0)
+;
+  else
+return 0;
+  return a < 0;
+}
+
+/* Should optimize to a == 0 */
+int f(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 50;
+}
+
+/* Should be optimized to a == 0. */
+int f2(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 100;
+}
+
+/* Should optimize to a == 100 */
+int f1(int a)
+{
+  if (a < 0 || a == 100)
+;
+  else
+return 0;
+  return a > 50;
+}
+
+/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index a4fddd62841..1262e7cf9f0 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -968,9 +968,72 @@ test_for_singularity (enum tree_code cond_code, tree op0,
   if (operand_equal_p (min, max, 0) && is_gimple_min_invariant (min))
return min;
 }
+
   return NULL;
 }
 
+/* Simplify integer comparisons such that the constant is one of the range 
pairs.
+   For an example, 
+   A has a range of [0,0][100,INF]
+   and the comparison of `A < 50`.
+   This should be optimized to `A <= 0`
+   and then test_for_singularity can optimize it to `A == 0`.   */
+
+static bool
+simplify_compare_using_range_pairs (tree_code _code, tree , tree ,
+   const value_range *vr)
+{
+  if (TREE_CODE (op1) != INTEGER_CST
+  || vr->num_pairs () < 2)
+return false;
+  auto val_op1 = wi::to_wide (op1);
+  tree type = TREE_TYPE (op0);
+  auto sign = TYPE_SIGN (type);
+  auto p = vr->num_pairs ();
+  /* Find the value range pair where op1
+ is in the middle of if one exist. */
+  for (unsigned i = 1; i < p; i++)
+{
+  auto lower = vr->upper_bound (i - 1);
+  auto upper = vr->lower_bound (i);
+  if (wi::lt_p (val_op1, lower, sign))
+   continue;
+  if (wi::gt_p (val_op1, upper, sign))
+   continue;
+  if (cond_code == LT_EXPR
+  && val_op1 != lower)
+{
+ op1 = wide_int_to_tree (type, lower);
+ cond_code = LE_EXPR;
+ return true;
+}
+  if (cond_code == LE_EXPR
+  && val_op1 != upper
+  && val_op1 != lower)
+{
+ op1 = wide_int_to_tree (type, lower);
+ cond_code = LE_EXPR;
+ return true;
+}
+  if (cond_code == GT_EXPR
+  && val_op1 != upper)
+{
+ op1 = wide_int_to_tree (type, upper);
+ cond_code = GE_EXPR;
+ return true;
+}
+  if (cond_code == GE_EXPR
+  && val_op1 != lower
+  && val_op1 != upper)
+{
+ op1 = wide_int_to_tree (type, upper);
+ cond_code = GE_EXPR;
+ return true;
+}
+}
+  return false;
+}
+
 /* Return whether the value range *VR fits in an integer type specified
by PRECISION and UNSIGNED_P.  */
 
@@ -1235,6 

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-09 Thread Qing Zhao via Gcc-patches
Hi, Martin,

Thanks for raising this issue.

Although this is an old FAM related issue that does not relate to my current 
patch 
(and might need to be resolved in a separate patch).  I think that it’s 
necessary to have
more discussion on this old issue and resolve it. 

The first thing that I’d like to confirm is:

What the exact memory layout for the following structure x?

struct foo { int a; short b; char t[]; } x = { .t = { 1, 2, 3 } };

And the key that is confusing me is, where should the field “t” start? 

A.  Starting at offset 8 as the following:

a   4-bytes
b   2-bytes
padding   2-bytes
t   3-bytes

B. Starting at offset 6 as the following:

a   4-bytes
b   2-bytes
t   3-bytes

From my understanding, A should be correct. However, when I debugged into gcc, 
I found that the following

tree
byte_position (const_tree field)
{
  return byte_from_pos (DECL_FIELD_OFFSET (field),
DECL_FIELD_BIT_OFFSET (field));
}

Returned 6 for the field “t”:

498   tree pos = byte_position (last);
(gdb) n
499   size = fold_build2 (PLUS_EXPR, TREE_TYPE (size), pos, compsize);
(gdb) call debug_generic_expr(pos)
6

So, I suspect that there is a bug in GCC which incorrectly represent the offset 
of the FAM field in the IR.

Thanks.

Qing
> On Aug 8, 2023, at 10:54 AM, Martin Uecker  wrote:
> 
> 
> 
> I am sure this has been discussed before, but seeing that you
> test for a specific formula, let me point out the following:
> 
> There at least three different size expression which could
> make sense. Consider
> 
> short foo { int a; short b; char t[]; }; 
> 
> Most people seem to use
> 
> sizeof(struct foo) + N * sizeof(foo->t);
> 
> which for N == 3 yields 11 bytes on x86-64 because the formula
> adds the padding of the original struct. There is an example
> in the  C standard that uses this formula.
> 
> 
> But he minimum size of an object which stores N elements is
> 
> max(sizeof (struct s), offsetof(struct s, t[n]))
> 
> which is 9 bytes. 
> 
> This is what clang uses for statically allocated objects with
> initialization, while GCC uses the rule above (11 bytes). But 
> bdos / bos  then returns the smaller size of 9 which is a bit
> confusing.
> 
> 
> https://godbolt.org/z/K1hvaK1ns
> 
> https://github.com/llvm/llvm-project/issues/62929
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956
> 
> 
> Then there is also the size of a similar array where the FAM
> is replaced with an array of static size:
> 
> struct foo { int a; short b; char t[3]; }; 
> 
> This would make the most sense to me, but it has 12 bytes
> because the padding is according to the usual alignment
> rules.
> 
> 
> Martin
> 
> 
> 
> Am Montag, dem 07.08.2023 um 09:16 -0700 schrieb Kees Cook:
>> On Fri, Aug 04, 2023 at 07:44:28PM +, Qing Zhao wrote:
>>> This is the 2nd version of the patch, per our discussion based on the
>>> review comments for the 1st version, the major changes in this version
>>> are:
>> 
>> Thanks for the update!
>> 
>>> 
>>> 1. change the name "element_count" to "counted_by";
>>> 2. change the parameter for the attribute from a STRING to an
>>> Identifier;
>>> 3. Add logic and testing cases to handle anonymous structure/unions;
>>> 4. Clarify documentation to permit the situation when the allocation
>>> size is larger than what's specified by "counted_by", at the same time,
>>> it's user's error if allocation size is smaller than what's specified by
>>> "counted_by";
>>> 5. Add a complete testing case for using counted_by attribute in
>>> __builtin_dynamic_object_size when there is mismatch between the
>>> allocation size and the value of "counted_by", the expecting behavior
>>> for each case and the explanation on why in the comments. 
>> 
>> All the "normal" test cases I have are passing; this is wonderful! :)
>> 
>> I'm still seeing unexpected situations when I've intentionally set
>> counted_by to be smaller than alloc_size, but I assume it's due to not
>> yet having the patch you mention below.
>> 
>>> As discussed, I plan to add two more separate patch sets after this initial
>>> patch set is approved and committed.
>>> 
>>> set 1. A new warning option and a new sanitizer option for the user error
>>>when the allocation size is smaller than the value of "counted_by".
>>> set 2. An improvement to __builtin_dynamic_object_size  for the following
>>>case:
>>> 
>>> struct A
>>> {
>>> size_t foo;
>>> int array[] __attribute__((counted_by (foo)));
>>> };
>>> 
>>> extern struct fix * alloc_buf ();
>>> 
>>> int main ()
>>> {
>>> struct fix *p = alloc_buf ();
>>> __builtin_object_size(p->array, 0) == sizeof(struct A) + p->foo * 
>>> sizeof(int);
>>>   /* with the current algorithm, it’s UNKNOWN */ 
>>> __builtin_object_size(p->array, 2) == sizeof(struct A) + p->foo * 
>>> sizeof(int);
>>>   /* with the current algorithm, it’s UNKNOWN */
>>> }
>> 
>> Should the above be bdos instead of bos?
>> 
>>> 

verify

2023-08-09 Thread Saurabh Jha via Gcc-patches
verify


[PATCH] rs6000, add overloaded DFP quantize support

2023-08-09 Thread Carl Love via Gcc-patches


GCC maintainers:

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The Patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love


--
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-DFP operands.  In each case, there is an immediate version and a
variable version of the bult-in.  The RM value is a 2-bit const int which
specifies the rounding mode to use.  For the immediate versions of the
built-in, TE field is a 5-bit constant that specifies the value of the
ideal exponent for the result.  The built-in specifications are:

  __Decimal64 builtin_dfp_quantize (_Decimal64, _Decimal64,
const int RM)
  __Decimal64 builtin_dfp_quantize (const int TE, _Decimal64,
const int)
  __Decimal128 builtin_dfpq_quantize (_Decimal128, _Decimal128,
  const int RM)
  __Decimal128 builtin_dfpq_quantize (const int TE, _Decimal128,
  const int)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPECDQUAN.
(dfp_quan_, dfp_quan_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
__builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
__builtin_dfp_quantize_128i): New buit-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
__builtin_dfpq_quantize): New overloaded definitions.

gcc/testsuite/
 * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
case.
---
 gcc/config/rs6000/dfp.md  |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 .../powerpc/builtin-dfp-quantize-runnable.c   | 198 ++
 4 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/builtin-dfp-quantize-runnable.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..254c22a5c20 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_quan_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:QI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_quan_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "const_int_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 35c4cdf74c5..36a56311643 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_quantize_64 (_Decimal64, _Decimal64, \
+ const int<2>);
+DFPQUAN_64 dfp_quan_dd {}
+
+  const _Decimal64 __builtin_dfp_quantize_64i (const int<5>, _Decimal64, \
+const int<2>);
+DFPQUAN_64i dfp_quan_idd {}
+
+  const _Decimal128 __builtin_dfp_quantize_128 (_Decimal128, _Decimal128, \
+ const int<2>);
+DFPQUAN_128 dfp_quan_td {}
+
+  const _Decimal128 __builtin_dfp_quantize_128i (const int<5>, _Decimal128, \
+  const int<2>);
+DFPQUAN_128i dfp_quan_itd {}
 
 [crypto]
   const vull __builtin_crypto_vcipher (vull, vull);
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index b83946f5ad8..3bb1bedd69d 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -195,6 +195,18 @@
   unsigned 

Re: [PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-09 Thread Carl Love via Gcc-patches
Kewen:

On Wed, 2023-08-09 at 16:47 +0800, Kewen.Lin wrote:


> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> 
> Okay for trunk with two nits below fixed, thanks!

Thanks for all the help with the patch.  Fixed the nits below, compiled
and reran the test cases to make sure everything was OK.  Will go ahead
and commit the patch.
> 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh,
> > vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h.  Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> 
>   s/ moved/: Move/ => "... execute_test_functions): Move "
>   
> s/Added/Add/

Fixed both issues.

> 



> >  
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand "altivec_vcmpne_"
> > +  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand"
> > "=v")
> > +   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1
> > "altivec_register_operand" "v")
> > + (match_operand:VSX_EXTRACT_I 2
> > "altivec_register_operand" "v")))
> > +   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand"
> > "=v")
> > +(not:VSX_EXTRACT_I (match_dup 3)))]
> > +  "TARGET_ALTIVEC"
> > +  {
> > +operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> > +  });
> 
> Nit: Useless ";".

removed semicolon.

   Carl 



Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-08-09 Thread Andre Vieira (lists) via Gcc-patches

Here is my new version, see inline response to your comments.

New cover letter:

This patch enables the use of mixed-types for simd clones for AArch64, 
adds aarch64 as a target_vect_simd_clones and corrects the way the 
simdlen is chosen for non-specified simdlen clauses according to the 
'Vector Function Application Binary Interface Specification for AArch64'.


gcc/ChangeLog:

* config/aarch64/aarch64.cc (currently_supported_simd_type): 
Remove.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Determine 
simdlen according to NDS rule.

(lane_size): New function.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
* c-c++-common/gomp/declare-variant-14.c: Add aarch64 checks 
and remove warning check.

* g++.dg/gomp/attrs-10.C: Likewise.
* g++.dg/gomp/declare-simd-1.C: Likewise.
* g++.dg/gomp/declare-simd-3.C: Likewise.
* g++.dg/gomp/declare-simd-4.C: Likewise.
* gcc.dg/gomp/declare-simd-3.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* c-c++-common/gomp/pr60823-1.c: Remove warning check.
* c-c++-common/gomp/pr60823-3.c: Likewise.
* g++.dg/gomp/declare-simd-7.C: Likewise.
* g++.dg/gomp/declare-simd-8.C: Likewise.
* g++.dg/gomp/pr88182.C: Likewise.
* gcc.dg/declare-simd.c: Likewise.
* gcc.dg/gomp/declare-simd-1.c: Likewise.
* gcc.dg/gomp/pr87895-1.c: Likewise.
* gfortran.dg/gomp/declare-simd-2.f90: Likewise.
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
* gfortran.dg/gomp/pr79154-1.f90: Likewise.
* gfortran.dg/gomp/pr83977.f90: Likewise.
* gcc.dg/gomp/pr87887-1.c: Add warning test.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/pr99542.c: Update warning test.



On 08/08/2023 11:51, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:



warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
-   "unsupported return type %qT for % functions",
+   "unsupported return type %qT for simd",
ret_type);


What's the reason for s/% functions/simd/, in particular for
dropping the quotes around simd?


It's to align with i386's error message, this helps with testing as then 
I can avoid having different tests for the same error.


I asked Jakub which one he preferred, and he gave me an explanation why 
the i386's one was preferable, ... but I didn't write it down unfortunately.





return 0;
  }
  
+  nfs_type = ret_type;


Genuine question, but what does nfs stand for in this context?

Was supposed to be nds... my bad.

I don't think this implements the NDS calculation in the spec:

  The `Narrowest Data Size of f`, or ``NDS(f)``, as the minumum of
  the lane size ``LS(P)`` among all input parameters and
  return value  of ``f``.

   ...

   We then define the `Lane Size of P`, or ``LS(P)``, as follows.

   1. If ``MTV(P)`` is ``false`` and ``P`` is a pointer or reference to
  some type ``T`` for which ``PBV(T)`` is ``true``, ``LS(P) =
  sizeof(T)``.
   2. If ``PBV(T(P))`` is ``true``, ``LS(P) = sizeof(P)``.
   3. Otherwise ``LS(P) = sizeof(uintptr_t)``.

AIUI, (1) means that we need to look at the targets of uniform and
linear scalars[*] that have pointer type, so that e.g. a uniform uint8_t *
pointer should cause NDS to be 1.

[*] i.e. arguments that remain scalar in the vector prototype

(2) means that other types of uniform and linear scalars do contribute.
A uniform uint8_t should cause NDS to be 1.


You are right, I misread the ABI description there.



Thanks,
Richarddiff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
7cd230c4602a15980016bdc92e80579be0c07094..458a4dbf76138e329eb99077780089a9b501c046
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27274,28 +27274,57 @@ supported_simd_type (tree t)
   return false;
 }
 
-/* Return true for types that currently are supported as SIMD return
-   or argument types.  */
+/* Determine the lane size for the clone argument/return type.  This follows
+   the LS(P) rule in the VFABIA64.  */
 
-static bool
-currently_supported_simd_type (tree t, tree b)
+static unsigned
+lane_size (cgraph_simd_clone_arg_type clone_arg_type, tree type)
 {
-  if (COMPLEX_FLOAT_TYPE_P (t))
-return false;
+  gcc_assert (clone_arg_type != SIMD_CLONE_ARG_TYPE_MASK);
 
-  if (TYPE_SIZE (t) != TYPE_SIZE (b))
-return false;
+  /* For non map-to-vector types that are pointers we use the element type it
+ points to.  */
+  if (POINTER_TYPE_P (type))
+switch (clone_arg_type)
+  {
+  default:
+   break;
+  case SIMD_CLONE_ARG_TYPE_UNIFORM:
+  case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+  case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
+   type = TREE_TYPE 

Re: [Patch, fortran] PR109684 - compiling failure: complaining about a final subroutine of a type being not PURE (while it is indeed PURE)

2023-08-09 Thread Paul Richard Thomas via Gcc-patches
I took a look at my calendar and decided to backport right away.

r13-7703-ged049e5d5f36cc0f4318cd93bb6b33ed6f6f2ba7

BTW It is a regression :-)

Paul

On Wed, 9 Aug 2023 at 12:10, Paul Richard Thomas
 wrote:
>
> Committed to trunk as 'obvious' in
> r14-3098-gb8ec3c952324f866f191883473922e250be81341
>
> 13-branch to follow in a few days.
>
> Paul


  1   2   >