Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread Kewen.Lin via Gcc-patches
Hi Juzhe,

>>> Yes, we just want to add the whole vector register length in bytes.
> OK, I've learned that, and I appreciate you giving me the information.
> 
>>> I wonder if you also want WHILE_LEN to have the implicit effect
>>> to update the vector length register?
>>> From this perspective, Richi's
>>> suggestion on "tying the scalar result with the uses" looks better IMHO.
> No, I don't want to make WHILE_LEN have an implicit side effect.
> Just tying the scalar result with the uses.
> As for updating the vector length register, I'll let the RISC-V backend port do that.
> I don't want to involve any RISC-V specific feature in the GCC middle-end.
> 
> 

Good, thanks for clarifying, that makes more sense.

>>> No, for both cases, IV is variable; the dumping at loop2_doloop for the
>>> proposed sequence says "Doloop: Possible infinite iteration case.".
>>> It seems to show that for the proposed sequence the compiler isn't able
>>> to figure out that the loop is finite; it may miss the range information
>>> on n, or it isn't able to analyze how the invariant evolves.  But I
>>> didn't look into it; these are all my guesses.
> Ok, I think it may be fixed in the future.

Yeah, it can be.  It only matters for us when adopting --param
vect-partial-vector-usage=2, but that's not the default.

> 
> So, I wonder whether you basically agree with the concept of this patch?
> Would you mind giving more suggestions so that I can fix this patch to
> bring more benefits for IBM (s390 or rs6000)?
> For example, will you try this patch to see whether it can work for IBM
> in the case of multiple rgroups with SLP?

The concept looks good to me.  For IBM ports, it can benefit the length
preparation for the case of --param vect-partial-vector-usage=2 (except
for the possible missed doloop chance); it's neutral for the case of
--param vect-partial-vector-usage=1.  IMHO, if possible you could extend
the current function vect_set_loop_controls_directly rather than adding
a new function vect_set_loop_controls_by_while_len, since that function
does handle both masks and lengths (controls).  And as vect_gen_len's
comments show, once you change the length preparation, you have to
adjust the corresponding costs as well.  And sure, once this becomes
stable (all decisions from the discussions settled down, fully reviewed
in stage 1), I'll test it on Power10 and get back to you.

BR,
Kewen


Re: [PATCH, rs6000] xfail float128 comparison test case that fails on powerpc64 [PR108728]

2023-04-13 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/4/14 09:34, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> On 2023/4/13 16:32, Kewen.Lin wrote:
>> xfail all powerpc*-*-* can have some XPASSes on those ENVs with
>> software emulation.  Since the related hw insn xscmpuqp is guarded
>> with TARGET_FLOAT128_HW, could we use the effective target
>> ppc_float128_hw instead?
> 
> Thanks for your review comments.  It's tricky.  It invokes "__lekf2"
> with "-mno-float128-hardware", but it doesn't always pass the check:
> with the math library on P8 it does, while with the library on P9 it
> fails.

The math library doesn't provide it; __lekf2 is from libgcc (GCC itself).
The reason why __lekf2 behaves differently on P8 and P9 is that we have
SW and HW versions of __lekf2: when the underlying CPU "supports 128-bit
IEEE binary floating point instructions", it will use __lekf2_hw instead
of __lekf2_sw, and the former still uses the insn xscmpuqp, so it fails.

> So it totally depends on the version of the library, which is not
> controlled by GCC.  What's your opinion?

So:
1) for ppc_float128_hw, it generates xscmpuqp and then fails;
2) for !ppc_float128_hw, it uses __lekf2, but if the underlying ENV
   satisfies __builtin_cpu_supports ("ieee128"), that exploits
   xscmpuqp and then fails too.

Ideally we should use an effective target like ppc_ieee128_hw to
indicate that the underlying ENV satisfies __builtin_cpu_supports
("ieee128"), but I think it may not be worth adding that at this stage,
so I'd suggest xfail-ing it for "ppc_float128_hw || (ppc_cpu_supports_hw
&& p9vector_hw)".
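Concretely, the suggested selector could look like the following directive in the test file (a sketch only — the exact dg directive and selector spelling would need to be verified against the testsuite conventions):

```
/* Hypothetical sketch for gcc.dg/torture/float128-cmp-invalid.c:
   xfail execution when either the FP128 hw insns are generated
   directly, or the run-time ENV would pick the hw libgcc variant.  */
/* { dg-xfail-run-if "uses xscmpuqp" { ppc_float128_hw || { ppc_cpu_supports_hw && p9vector_hw } } } */
```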

BR,
Kewen

> 
> Test result on P9
> make check-gcc-c RUNTESTFLAGS="--target_board=unix'{-mno-float128-hardware}' 
> dg-torture.exp=float128-cmp-invalid.c"
> 
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O0  execution test
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O1  execution test
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O2  execution test
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O3 -g  execution test
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -Os  execution test
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none  execution test
> FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects  execution test
> 
> === gcc Summary ===
> 
> # of expected passes            7
> # of unexpected failures        7
> 
> Gui Haochen
> Thanks


[PATCH 2/2] libstdc++: Implement P2278R4 "cbegin should always return a constant iterator"

2023-04-13 Thread Patrick Palka via Gcc-patches
This also implements the approved follow-up LWG issues 3765, 3766, 3769,
3770, 3811, 3850, 3853, 3862 and 3872.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (const_iterator_t): Define for C++23.
(const_sentinel_t): Likewise.
(range_const_reference_t): Likewise.
(constant_range): Likewise.
(__cust_access::__possibly_const_range): Likewise, replacing ...
(__cust_access::__as_const): ... this.
(__cust_access::_CBegin::operator()): Redefine for C++23 as per P2278R4.
(__cust_access::_CEnd::operator()): Likewise.
(__cust_access::_CRBegin::operator()): Likewise.
(__cust_access::_CREnd::operator()): Likewise.
(__cust_access::_CData::operator()): Likewise.
(__cust_access::_CData::__as_const_pointer): Define for C++23.
* include/bits/ranges_util.h (ranges::__detail::__different_from):
Make it an alias of std::__detail::__different_from.
(view_interface::cbegin): Define for C++23.
(view_interface::cend): Likewise.
* include/bits/stl_iterator.h (__detail::__different_from): Define.
(iter_const_reference_t): Define for C++23.
(__detail::__constant_iterator): Likewise.
(__detail::__is_const_iterator): Likewise.
(__detail::__not_a_const_iterator): Likewise.
(__detail::__iter_const_rvalue_reference_t): Likewise.
(__detail::__basic_const_iter_cat): Likewise.
(const_iterator): Likewise.
(__detail::__const_sentinel): Likewise.
(const_sentinel): Likewise.
(basic_const_iterator): Likewise.
(common_type<basic_const_iterator<_Tp>, _Up>): Likewise.
(common_type<_Up, basic_const_iterator<_Tp>>): Likewise.
(common_type<basic_const_iterator<_Tp>, basic_const_iterator<_Up>>):
Likewise.
(make_const_iterator): Define for C++23.
(make_const_sentinel): Likewise.
* include/std/ranges (__cpp_lib_ranges_as_const): Likewise.
(as_const_view): Likewise.
(enable_borrowed_range): Likewise.
(views::__detail::__is_ref_view): Likewise.
(views::__detail::__can_is_const_view): Likewise.
(views::_AsConst, views::as_const): Likewise.
* include/std/span (span::const_iterator): Likewise.
(span::const_reverse_iterator): Likewise.
(span::cbegin): Likewise.
(span::cend): Likewise.
(span::crbegin): Likewise.
(span::crend): Likewise.
* include/std/version (__cpp_lib_ranges_as_const): Likewise.
* testsuite/std/ranges/adaptors/join.cc (test06): Adjust to
behave independently of C++20 vs C++23.
* testsuite/std/ranges/version_c++23.cc: Verify value of
__cpp_lib_ranges_as_const macro.
* testsuite/24_iterators/const_iterator/1.cc: New test.
* testsuite/std/ranges/adaptors/as_const/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_base.h   |  99 +
 libstdc++-v3/include/bits/ranges_util.h   |  22 +-
 libstdc++-v3/include/bits/stl_iterator.h  | 366 ++
 libstdc++-v3/include/std/ranges   | 106 +
 libstdc++-v3/include/std/span |  22 ++
 libstdc++-v3/include/std/version  |   1 +
 .../24_iterators/const_iterator/1.cc  | 140 +++
 .../std/ranges/adaptors/as_const/1.cc |  64 +++
 .../testsuite/std/ranges/adaptors/join.cc |   5 +-
 .../testsuite/std/ranges/version_c++23.cc |   4 +
 10 files changed, 824 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/24_iterators/const_iterator/1.cc
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/as_const/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index c89cb3e976a..b3144bbae4d 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -515,6 +515,17 @@ namespace ranges
   template<range _Range>
     using sentinel_t = decltype(ranges::end(std::declval<_Range&>()));
 
+#if __cplusplus > 202002L
+  template<range _Range>
+    using const_iterator_t = const_iterator<iterator_t<_Range>>;
+
+  template<range _Range>
+    using const_sentinel_t = const_sentinel<sentinel_t<_Range>>;
+
+  template<range _Range>
+    using range_const_reference_t = iter_const_reference_t<iterator_t<_Range>>;
+#endif
+
   template<range _Range>
     using range_difference_t = iter_difference_t<iterator_t<_Range>>;
 
@@ -607,8 +618,25 @@ namespace ranges
     concept common_range
       = range<_Tp> && same_as<iterator_t<_Tp>, sentinel_t<_Tp>>;
 
+#if __cplusplus > 202002L
+  template<typename _Tp>
+    concept constant_range
+      = input_range<_Tp> && std::__detail::__constant_iterator<iterator_t<_Tp>>;
+#endif
+
   namespace __cust_access
   {
+#if __cplusplus > 202002L
+    template<input_range _Range>
+      constexpr auto&
+      __possibly_const_range(_Range& __r) noexcept
+      {
+       if constexpr (constant_range<const _Range> && !constant_range<_Range>)
+         return const_cast<const _Range&>(__r);
+       else
+         return __r;
+      }
+#else
+#else
    // If _To is an lvalue-reference, return const _Tp&, otherwise const _Tp&&.

[PATCH] libstdc++: Implement ranges::fold_* from P2322R6

2023-04-13 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/bits/ranges_algo.h: Include <optional> for C++23.
(__cpp_lib_fold): Define for C++23.
(in_value_result): Likewise.
(__detail::__flipped): Likewise.
(__detail::__indirectly_binary_left_foldable_impl): Likewise.
(__detail::__indirectly_binary_left_foldable): Likewise.
(__detail::__indirectly_binary_right_foldable): Likewise.
(fold_left_with_iter_result): Likewise.
(__fold_left_with_iter_fn, fold_left_with_iter): Likewise.
(__fold_left_fn, fold_left): Likewise.
(__fold_left_first_with_iter_fn, fold_left_first_with_iter):
Likewise.
(__fold_left_first_fn, fold_left_first): Likewise.
(__fold_right_fn, fold_right): Likewise.
(__fold_right_last_fn, fold_right_last): Likewise.
* include/std/version (__cpp_lib_fold): Likewise.
* testsuite/25_algorithms/fold_left/1.cc: New test.
* testsuite/25_algorithms/fold_right/1.cc: New test.
---
 libstdc++-v3/include/bits/ranges_algo.h   | 251 ++
 libstdc++-v3/include/std/version  |   1 +
 .../testsuite/25_algorithms/fold_left/1.cc|  73 +
 .../testsuite/25_algorithms/fold_right/1.cc   |  45 
 4 files changed, 370 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/fold_left/1.cc
 create mode 100644 libstdc++-v3/testsuite/25_algorithms/fold_right/1.cc

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 5d039bd1cd4..f041ff16b0e 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -32,6 +32,9 @@
 
 #if __cplusplus > 201703L
 
+#if __cplusplus > 202002L
+#include <optional>
+#endif
 #include 
 #include 
 #include  // concept uniform_random_bit_generator
@@ -3691,6 +3694,254 @@ namespace ranges
   };
 
   inline constexpr __find_last_if_not_fn find_last_if_not{};
+
+#define __cpp_lib_fold 202207L
+
+  template<typename _Iter, typename _Tp>
+    struct in_value_result
+    {
+      [[no_unique_address]] _Iter in;
+      [[no_unique_address]] _Tp value;
+
+      template<typename _Iter2, typename _Tp2>
+       requires convertible_to<const _Iter&, _Iter2>
+         && convertible_to<const _Tp&, _Tp2>
+      constexpr
+      operator in_value_result<_Iter2, _Tp2>() const &
+      { return {in, value}; }
+
+      template<typename _Iter2, typename _Tp2>
+       requires convertible_to<_Iter, _Iter2>
+         && convertible_to<_Tp, _Tp2>
+      constexpr
+      operator in_value_result<_Iter2, _Tp2>() &&
+      { return {std::move(in), std::move(value)}; }
+    };
+
+  namespace __detail
+  {
+    template<typename _Fp>
+      class __flipped
+      {
+       _Fp _M_f;
+
+      public:
+       template<typename _Tp, typename _Up>
+         requires invocable<_Fp&, _Up, _Tp>
+       invoke_result_t<_Fp&, _Up, _Tp>
+       operator()(_Tp&&, _Up&&); // not defined
+      };
+
+  template<typename _Fp, typename _Tp, typename _Iter, typename _Up>
+  concept __indirectly_binary_left_foldable_impl = movable<_Tp> && movable<_Up>
+   && convertible_to<_Tp, _Up>
+   && invocable<_Fp&, _Up, iter_reference_t<_Iter>>
+   && assignable_from<_Up&, invoke_result_t<_Fp&, _Up, iter_reference_t<_Iter>>>;
+
+  template<typename _Fp, typename _Tp, typename _Iter>
+  concept __indirectly_binary_left_foldable = copy_constructible<_Fp>
+   && indirectly_readable<_Iter>
+   && invocable<_Fp&, _Tp, iter_reference_t<_Iter>>
+   && convertible_to<invoke_result_t<_Fp&, _Tp, iter_reference_t<_Iter>>,
+                     decay_t<invoke_result_t<_Fp&, _Tp, iter_reference_t<_Iter>>>>
+   && __indirectly_binary_left_foldable_impl
+       <_Fp, _Tp, _Iter, decay_t<invoke_result_t<_Fp&, _Tp, iter_reference_t<_Iter>>>>;
+
+  template <typename _Fp, typename _Tp, typename _Iter>
+  concept __indirectly_binary_right_foldable
+   = __indirectly_binary_left_foldable<__flipped<_Fp>, _Tp, _Iter>;
+  } // namespace __detail
+
+  template<typename _Iter, typename _Tp>
+    using fold_left_with_iter_result = in_value_result<_Iter, _Tp>;
+
+  struct __fold_left_with_iter_fn
+  {
+    template<typename _Ret_iter,
+            typename _Iter, typename _Sent, typename _Tp, typename _Fp>
+      static constexpr auto
+      _S_impl(_Iter __first, _Sent __last, _Tp __init, _Fp __f)
+      {
+       using _Up = decay_t<invoke_result_t<_Fp&, _Tp, iter_reference_t<_Iter>>>;
+       using _Ret = fold_left_with_iter_result<_Ret_iter, _Up>;
+
+       if (__first == __last)
+         return _Ret{std::move(__first), _Up(std::move(__init))};
+
+       _Up __accum = std::__invoke(__f, std::move(__init), *__first);
+       for (++__first; __first != __last; ++__first)
+         __accum = std::__invoke(__f, std::move(__accum), *__first);
+       return _Ret{std::move(__first), std::move(__accum)};
+      }
+
+    template<input_iterator _Iter, sentinel_for<_Iter> _Sent, typename _Tp,
+            __detail::__indirectly_binary_left_foldable<_Tp, _Iter> _Fp>
+      constexpr auto
+      operator()(_Iter __first, _Sent __last, _Tp __init, _Fp __f) const
+      {
+       using _Ret_iter = _Iter;
+       return _S_impl<_Ret_iter>(std::move(__first), __last,
+                                 std::move(__init), std::move(__f));
+      }
+
+    template<input_range _Range, typename _Tp,
+            __detail::__indirectly_binary_left_foldable<_Tp, iterator_t<_Range>> _Fp>
+      constexpr auto
+      operator()(_Range&& __r, _Tp __init, _Fp __f) const
+      {
+       using _Ret_iter = borrowed_iterator_t<_Range>;
+       return _S_impl<_Ret_iter>(ranges::begin(__r), 

[PATCH 1/2] libstdc++: Move down definitions of ranges::cbegin/cend/cetc

2023-04-13 Thread Patrick Palka via Gcc-patches
This moves down the definitions of the const range access CPOs to after
the definition of input_range, in preparation for implementing P2278R4,
which redefines these CPOs in a way that indirectly uses input_range.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (__cust_access::__as_const)
(__cust_access::_CBegin, __cust::cbegin)
(__cust_access::_CEnd, __cust::cend)
(__cust_access::_CRBegin, __cust::crbegin)
(__cust_access::_CREnd, __cust::crend)
(__cust_access::_CData, __cust::cdata): Move down definitions to
shortly after the definition of input_range.
---
 libstdc++-v3/include/bits/ranges_base.h | 174 +---
 1 file changed, 91 insertions(+), 83 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index 86952b34096..c89cb3e976a 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -177,45 +177,6 @@ namespace ranges
}
 };
 
-// If _To is an lvalue-reference, return const _Tp&, otherwise const _Tp&&.
-template<typename _To, typename _Tp>
-  constexpr decltype(auto)
-  __as_const(_Tp& __t) noexcept
-  {
-   static_assert(std::is_same_v<_To&, _Tp&>);
-
-   if constexpr (is_lvalue_reference_v<_To>)
- return const_cast<const _Tp&>(__t);
-   else
- return static_cast<const _Tp&&>(__t);
-  }
-
-struct _CBegin
-{
-  template<typename _Tp>
-   [[nodiscard]]
-   constexpr auto
-   operator()(_Tp&& __e) const
-   noexcept(noexcept(_Begin{}(__cust_access::__as_const<_Tp>(__e))))
-   requires requires { _Begin{}(__cust_access::__as_const<_Tp>(__e)); }
-   {
- return _Begin{}(__cust_access::__as_const<_Tp>(__e));
-   }
-};
-
-struct _CEnd final
-{
-  template<typename _Tp>
-   [[nodiscard]]
-   constexpr auto
-   operator()(_Tp&& __e) const
-   noexcept(noexcept(_End{}(__cust_access::__as_const<_Tp>(__e))))
-   requires requires { _End{}(__cust_access::__as_const<_Tp>(__e)); }
-   {
- return _End{}(__cust_access::__as_const<_Tp>(__e));
-   }
-};
-
template<typename _Tp>
  concept __member_rbegin = requires(_Tp& __t)
{
@@ -337,32 +298,6 @@ namespace ranges
}
 };
 
-struct _CRBegin
-{
-  template<typename _Tp>
-   [[nodiscard]]
-   constexpr auto
-   operator()(_Tp&& __e) const
-   noexcept(noexcept(_RBegin{}(__cust_access::__as_const<_Tp>(__e))))
-   requires requires { _RBegin{}(__cust_access::__as_const<_Tp>(__e)); }
-   {
- return _RBegin{}(__cust_access::__as_const<_Tp>(__e));
-   }
-};
-
-struct _CREnd
-{
-  template<typename _Tp>
-   [[nodiscard]]
-   constexpr auto
-   operator()(_Tp&& __e) const
-   noexcept(noexcept(_REnd{}(__cust_access::__as_const<_Tp>(__e))))
-   requires requires { _REnd{}(__cust_access::__as_const<_Tp>(__e)); }
-   {
- return _REnd{}(__cust_access::__as_const<_Tp>(__e));
-   }
-};
-
template<typename _Tp>
  concept __member_size = !disable_sized_range<remove_cvref_t<_Tp>>
&& requires(_Tp& __t)
@@ -547,36 +482,18 @@ namespace ranges
}
 };
 
-struct _CData
-{
-  template<typename _Tp>
-   [[nodiscard]]
-   constexpr auto
-   operator()(_Tp&& __e) const
-   noexcept(noexcept(_Data{}(__cust_access::__as_const<_Tp>(__e))))
-   requires requires { _Data{}(__cust_access::__as_const<_Tp>(__e)); }
-   {
- return _Data{}(__cust_access::__as_const<_Tp>(__e));
-   }
-};
-
   } // namespace __cust_access
 
   inline namespace __cust
   {
 inline constexpr __cust_access::_Begin begin{};
 inline constexpr __cust_access::_End end{};
-inline constexpr __cust_access::_CBegin cbegin{};
-inline constexpr __cust_access::_CEnd cend{};
 inline constexpr __cust_access::_RBegin rbegin{};
 inline constexpr __cust_access::_REnd rend{};
-inline constexpr __cust_access::_CRBegin crbegin{};
-inline constexpr __cust_access::_CREnd crend{};
 inline constexpr __cust_access::_Size size{};
 inline constexpr __cust_access::_SSize ssize{};
 inline constexpr __cust_access::_Empty empty{};
 inline constexpr __cust_access::_Data data{};
-inline constexpr __cust_access::_CData cdata{};
   }
 
   /// [range.range] The range concept.
@@ -690,6 +607,97 @@ namespace ranges
 concept common_range
   = range<_Tp> && same_as<iterator_t<_Tp>, sentinel_t<_Tp>>;
 
+  namespace __cust_access
+  {
+// If _To is an lvalue-reference, return const _Tp&, otherwise const _Tp&&.
+template<typename _To, typename _Tp>
+  constexpr decltype(auto)
+  __as_const(_Tp& __t) noexcept
+  {
+   static_assert(std::is_same_v<_To&, _Tp&>);
+
+   if constexpr (is_lvalue_reference_v<_To>)
+ return const_cast<const _Tp&>(__t);
+   else
+ return static_cast<const _Tp&&>(__t);
+  }
+
+struct _CBegin
+{
+  template<typename _Tp>
+   [[nodiscard]]
+   constexpr auto
+   

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread juzhe.zh...@rivai.ai
Also, I have already decided to remove the WHILE_LEN pattern, since it
seems to be unnecessary.
And as Richard said, it's just simple arithmetic and it's not worthwhile
to add a pattern for that.

So I plan to replace WHILE_LEN with MIN_EXPR and have everything
RVV-specific done in the RISC-V port.
I think that's more reasonable for IBM's use and for more targets in the
future.

So this patch will need to be changed to "introduce a new flow to do
vectorization loop control": a new loop control flow that saturatingly
subtracts n down to zero, plus a target hook for it so that we can
switch to this flow.

Is that more reasonable?
Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-14 10:54
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Hi Juzhe,
 
on 2023/4/13 21:44, 钟居哲 wrote:
> Thanks Kewen.
> 
> The current flow in this patch is like you said:
> 
> len = WHILE_LEN (n,vf);
> ...
> v = len_load (addr,len);
> ..
> addr = addr + vf (in byte align);
> 
> This patch just keeps adding the address with a vector factor (adjusted
> to byte alignment).
> For example, if your vector length = 512 bits, then this patch just
> updates the address as
> addr = addr + 64;
> 
> However, today after I read the RVV ISA more deeply, it seems more
> appropriate that the address should be updated as: addr = addr +
> (len * 4) if len is the element count of INT32.
> The len is the result of WHILE_LEN, which calculates the length.
 
I just read your detailed explanation on the usage of the vsetvli insn
(really appreciate that); it looks like this WHILE_LEN wants some more
semantics than MIN, so I assume you still want to introduce this
WHILE_LEN.
 
> 
> I assume for the IBM target, it's better to just update the address
> directly by adding the whole register byte size in the address IV,
> since I think the second way (address = addr + (len * 4)) is too RVV
> specific and won't be suitable for IBM.  Is that right?
 
Yes, we just want to add the whole vector register length in bytes.
 
> If that is true, I will keep this patch's flow (won't change it to
> address = addr + (len * 4)) and see what else I need to do for IBM.
> I would rather do that in the RISC-V backend port.
 
IMHO, you don't need to push this down to the RV backend; just query
whether these ports have len_{load,store} support with a target hook or
a special operand in optab while_len (see internal_len_load_store_bias)
for this need, and generate different code accordingly.  IIUC, for
WHILE_LEN you want it to have the semantics of what vsetvli performs,
but for IBM ports it would be just like MIN_EXPR; maybe we can also
generate MIN or WHILE_LEN based on this kind of target information.
 
If the above assumption holds, I wonder if you also want WHILE_LEN to
have the implicit effect of updating the vector length register?  If
yes, the code with multiple rgroups looks unexpected:

+ _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
+ _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);

as the latter one seems to override the former.  Besides, if the given
operands are known constants, it can't directly be folded into constants
for further propagation.  From this perspective, Richi's suggestion on
"tying the scalar result with the uses" looks better IMHO.
 
> 
>>> I tried to compile the above source files on Power; the former can
>>> adopt doloop optimization but the latter fails to.
> You mean GCC cannot do hardware loop optimization when the IV loop
> control is variable?
 
No, for both cases, IV is variable; the dumping at loop2_doloop for the
proposed sequence says "Doloop: Possible infinite iteration case.".  It
seems to show that for the proposed sequence the compiler isn't able to
figure out that the loop is finite; it may miss the range information on
n, or it isn't able to analyze how the invariant evolves.  But I didn't
look into it; these are all my guesses.
 
BR,
Kewen
 


RE: [PATCH v2] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread Li, Pan2 via Gcc-patches
Thanks juzhe, I've updated a new version [PATCH v3] with even more checks.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, April 14, 2023 10:46 AM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Wang, Yanzhang 
; Li, Pan2 
Subject: Re: [PATCH v2] RISC-V: Add test cases for the RVV mask insn shortcut.

LGTM. Wait for Kito more comments.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-04-14 10:45
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; 
yanzhang.wang; pan2.li
Subject: [PATCH v2] RISC-V: Add test cases for the RVV mask insn shortcut.
From: Pan Li <pan2...@intel.com>

There are various shortcut codegen cases for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we
must add tests for the existing shortcut optimizations, to ensure we
don't break the existing behavior of the underlying shortcut
optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../riscv/rvv/base/mask_insn_shortcut.c   | 239 ++
1 file changed, 239 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..efc3af39fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,239 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmxor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmxor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmxor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmxor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmxor_case_4(vbool16_t v1, 

[PATCH v3] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread Pan Li via Gcc-patches
From: Pan Li 

There are various shortcut codegen cases for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we
must add tests for the existing shortcut optimizations, to ensure we
don't break the existing behavior of the underlying shortcut
optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/base/mask_insn_shortcut.c   | 241 ++
 1 file changed, 241 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..83cc4a1b5a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,241 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmxor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmxor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmxor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmxor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmxor_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmxor_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmxor_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t 

Re: [PATCH] testsuite: filter out warning noise for CWE-1341 test

2023-04-13 Thread guojiufu via Gcc-patches

Hi,

On 2023-04-13 20:08, Segher Boessenkool wrote:

On Thu, Apr 13, 2023 at 07:39:01AM +, Richard Biener wrote:

On Thu, 13 Apr 2023, Jiufu Guo wrote:
I think this should be fixed in the analyzer, "stripping" malloc
tracking from fopen/fclose since it does this manually.  I've adjusted
the bug accordingly.


Yeah.


> > +/* This case checks double-fclose only, suppress other warning.  */
> > +/* { dg-additional-options -Wno-analyzer-double-free } */


So please add "(PR108722)" or such to the comment here?  That is enough
for future people to see if this is still necessary, to maybe remove it
from the testcase here, but certainly not cargo-cult it to other
testcases!


Good suggestions, thanks!
Committed via r13-7176-gedc6659c97c4a7.

BR,
Jeff (Jiufu)



Thanks,


Segher


Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread juzhe.zh...@rivai.ai
>> Yes, we just want to add the whole vector register length in bytes.
OK, I see, and I appreciate you giving me that information.

>> I wonder if you also want WHILE_LEN to have the implicit effect
>> to update the vector length register?
>> From this perspective, Richi's
>> suggestion on "tying the scalar result with the uses" looks better IMHO.
No, I don't want WHILE_LEN to have an implicit side effect,
just to tie the scalar result to its uses.
I leave updating the vector length register to the RISC-V backend port;
I don't want to introduce any RISC-V-specific feature into the GCC middle end.

>> No, for both cases the IV is variable; the loop2_doloop dump for the
>> proposed sequence says "Doloop: Possible infinite iteration case.", which
>> seems to show that for the proposed sequence the compiler isn't able to
>> figure out that the loop is finite.  It may miss the range information on
>> n, or it may be unable to analyze how the invariant is involved, but I
>> didn't look into it; these are all just my guesses.
OK, I think it can be fixed in the future.

So, I wonder whether you basically agree with the concept of this patch?
Would you mind giving more suggestions so that I can improve this patch to
bring more benefit to IBM (s390 or rs6000)?
For example, will you try this patch to see whether it works for IBM in the
case of multiple rgroups with SLP?
 
Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-14 10:54
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Hi Juzhe,
 
on 2023/4/13 21:44, 钟居哲 wrote:
> Thanks Kewen.
> 
> The current flow in this patch, as you said, is:
> 
> len = WHILE_LEN (n, vf);
> ...
> v = len_load (addr, len);
> ..
> addr = addr + vf (in bytes);
> 
> 
> This patch just keeps advancing the address by the vectorization factor
> (adjusted to bytes).
> For example, if your vector length is 512 bits, then this patch just
> updates the address as
> addr = addr + 64;
> 
> However, after reading the RVV ISA more deeply today, I think it would be
> more appropriate to update the address as addr = addr + (len * 4) if len
> is the number of INT32 elements, where len is the result computed by
> WHILE_LEN.
 
I just read your detailed explanation on the usage of the vsetvli insn
(really appreciated), and it looks like this WHILE_LEN wants somewhat richer
semantics than MIN, so I assume you still want to introduce WHILE_LEN.
 
> 
> I assume that for IBM targets it's better to just advance the address IV
> directly by the whole register size in bytes, since I think the second way
> (address = addr + (len * 4)) is too RVV-specific and won't be suitable for
> IBM.  Is that right?
 
Yes, we just want to add the whole vector register length in bytes.
 
> If that is true, I will keep this patch's flow (i.e. not change to
> address = addr + (len * 4)) and see what else I need to do for IBM.
> I would rather do that in the RISC-V backend port.
 
IMHO, you don't need to push this down to the RISC-V backend; just query
whether these ports have len_{load,store} support, via a target hook or a
special operand in optab while_len (see internal_len_load_store_bias), and
generate different code accordingly.  IIUC, for WHILE_LEN you want it to
have the semantics of what vsetvli performs, but for IBM ports it would be
just like MIN_EXPR; maybe we can also generate MIN or WHILE_LEN based on
this kind of target information.
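For reference, here is a rough scalar model in plain C of the loop flow being discussed (my own sketch; `while_len` and `vec_add` are made-up names, WHILE_LEN is modeled with the MIN-like semantics the IBM ports would want, and the element type is int32_t, so the RVV-style address update would advance by len * 4):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of WHILE_LEN: the number of elements to
   process this iteration, at most vf.  Modeled with MIN semantics;
   RVV's vsetvli is additionally allowed to return a smaller value.  */
static size_t while_len (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Decrement-IV loop shape: the IV counts the remaining elements down
   to zero, and each len_load/len_store is limited by len.  */
void vec_add (int32_t *a, const int32_t *b, const int32_t *c, size_t n)
{
  const size_t vf = 4;                  /* elements per vector; made up */
  for (size_t i = n; i > 0; )
    {
      size_t len = while_len (i, vf);
      size_t base = n - i;              /* addr would advance by len * 4 on RVV */
      for (size_t j = 0; j < len; j++)  /* stands in for len_load/len_store */
        a[base + j] = b[base + j] + c[base + j];
      i -= len;
    }
}
```

With MIN semantics the address could just as well advance by the full vector size each iteration; the RVV-specific part is only that len may come out smaller than the MIN result, which is why the address update has to use len rather than vf there.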
 
If the above assumption holds, I wonder if you also want WHILE_LEN to have
the implicit effect of updating the vector length register.  If yes, the
code with multiple rgroups looks unexpected:

+ _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
+ _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);

as the latter one seems to override the former.  Besides, if the given
operands are known constants, the call can't directly be folded into a
constant for further propagation.  From this perspective, Richi's suggestion
on "tying the scalar result with the uses" looks better IMHO.
 
> 
>>> I tried
>>>to compile the above source files on Power, the former can adopt doloop
>>>optimization but the latter fails to. 
> You mean GCC cannot do hardware loop optimization when the loop-control IV
> is variable?
 
No, for both cases the IV is variable; the loop2_doloop dump for the
proposed sequence says "Doloop: Possible infinite iteration case.", which
seems to show that for the proposed sequence the compiler isn't able to
figure out that the loop is finite.  It may miss the range information on n,
or it may be unable to analyze how the invariant is involved, but I didn't
look into it; these are all just my guesses.
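A sketch (plain C, my own illustration, not the actual vectorizer output) of why the two shapes differ for doloop: with a fixed step the iteration count is computable before the loop, while with a runtime-computed step the compiler cannot rule out an infinite loop unless it proves the step is at least 1:

```c
#include <stddef.h>

/* Shape 1: fixed step.  The iteration count ceil (n / vf) is known on
   loop entry, so a hardware count-down loop (doloop) applies.  */
size_t count_fixed_step (size_t n, size_t vf)
{
  return (n + vf - 1) / vf;
}

/* Shape 2: variable step, as with WHILE_LEN.  The step is computed
   inside the loop; unless the compiler proves len >= 1 whenever i > 0,
   it must assume the loop might not terminate.  */
size_t count_variable_step (size_t n, size_t vf)
{
  size_t iters = 0;
  for (size_t i = n; i > 0; )
    {
      size_t len = i < vf ? i : vf;   /* runtime-computed length */
      i -= len;
      iters++;
    }
  return iters;
}
```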
 
BR,
Kewen
 


Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread Kewen.Lin via Gcc-patches
Hi Juzhe,

on 2023/4/13 21:44, 钟居哲 wrote:
> Thanks Kewen.
> 
> The current flow in this patch, as you said, is:
> 
> len = WHILE_LEN (n, vf);
> ...
> v = len_load (addr, len);
> ..
> addr = addr + vf (in bytes);
> 
> 
> This patch just keeps advancing the address by the vectorization factor
> (adjusted to bytes).
> For example, if your vector length is 512 bits, then this patch just
> updates the address as
> addr = addr + 64;
> 
> However, after reading the RVV ISA more deeply today, I think it would be
> more appropriate to update the address as addr = addr + (len * 4) if len
> is the number of INT32 elements, where len is the result computed by
> WHILE_LEN.

I just read your detailed explanation on the usage of the vsetvli insn
(really appreciated), and it looks like this WHILE_LEN wants somewhat richer
semantics than MIN, so I assume you still want to introduce WHILE_LEN.

> 
> I assume that for IBM targets it's better to just advance the address IV
> directly by the whole register size in bytes, since I think the second way
> (address = addr + (len * 4)) is too RVV-specific and won't be suitable for
> IBM.  Is that right?

Yes, we just want to add the whole vector register length in bytes.

> If that is true, I will keep this patch's flow (i.e. not change to
> address = addr + (len * 4)) and see what else I need to do for IBM.
> I would rather do that in the RISC-V backend port.

IMHO, you don't need to push this down to the RISC-V backend; just query
whether these ports have len_{load,store} support, via a target hook or a
special operand in optab while_len (see internal_len_load_store_bias), and
generate different code accordingly.  IIUC, for WHILE_LEN you want it to
have the semantics of what vsetvli performs, but for IBM ports it would be
just like MIN_EXPR; maybe we can also generate MIN or WHILE_LEN based on
this kind of target information.

If the above assumption holds, I wonder if you also want WHILE_LEN to have
the implicit effect of updating the vector length register.  If yes, the
code with multiple rgroups looks unexpected:

+   _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
+   _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);

as the latter one seems to override the former.  Besides, if the given
operands are known constants, the call can't directly be folded into a
constant for further propagation.  From this perspective, Richi's suggestion
on "tying the scalar result with the uses" looks better IMHO.

> 
>>> I tried
>>>to compile the above source files on Power, the former can adopt doloop
>>>optimization but the latter fails to. 
> You mean GCC cannot do hardware loop optimization when the loop-control IV
> is variable?

No, for both cases the IV is variable; the loop2_doloop dump for the
proposed sequence says "Doloop: Possible infinite iteration case.", which
seems to show that for the proposed sequence the compiler isn't able to
figure out that the loop is finite.  It may miss the range information on n,
or it may be unable to analyze how the invariant is involved, but I didn't
look into it; these are all just my guesses.

BR,
Kewen


Re: [PATCH v2] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread juzhe.zh...@rivai.ai
LGTM. Wait for Kito more comments.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-04-14 10:45
To: gcc-patches
CC: juzhe.zhong; kito.cheng; yanzhang.wang; pan2.li
Subject: [PATCH v2] RISC-V: Add test cases for the RVV mask insn shortcut.
From: Pan Li 
 
There are several kinds of shortcut codegen for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we must
add tests for the existing shortcuts, to ensure we don't break the existing
optimizations while adding new ones.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/base/mask_insn_shortcut.c   | 239 ++
1 file changed, 239 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..efc3af39fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,239 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmxor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmxor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmxor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmxor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmxor_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmxor_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmxor_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t 

[PATCH v2] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread Pan Li via Gcc-patches
From: Pan Li 

There are several kinds of shortcut codegen for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we must
add tests for the existing shortcuts, to ensure we don't break the existing
optimizations while adding new ones.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/base/mask_insn_shortcut.c   | 239 ++
 1 file changed, 239 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..efc3af39fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,239 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmxor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmxor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmxor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmxor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmxor_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmxor_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmxor_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t 

RE: [PATCH] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread Li, Pan2 via Gcc-patches
Sure thing, let me update it ASAP.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, April 14, 2023 10:35 AM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Wang, Yanzhang 
; Li, Pan2 
Subject: Re: [PATCH] RISC-V: Add test cases for the RVV mask insn shortcut.


+/* { dg-final { scan-assembler-not {vmand\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmnand\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmnandn\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-times 
{vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
+/* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */

It would be better to add more assembler checks,
e.g. check how many vmclr.m or vmset.m instructions there should be.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-04-14 10:32
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; 
yanzhang.wang; pan2.li
Subject: [PATCH] RISC-V: Add test cases for the RVV mask insn shortcut.
From: Pan Li mailto:pan2...@intel.com>>

There are several kinds of shortcut codegen for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we must
add tests for the existing shortcuts, to ensure we don't break the existing
optimizations while adding new ones.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.
---
.../riscv/rvv/base/mask_insn_shortcut.c   | 237 ++
1 file changed, 237 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..8310aabaf59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,237 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return 

Re: [PATCH] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread juzhe.zh...@rivai.ai

+/* { dg-final { scan-assembler-not {vmand\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmnand\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmnandn\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-times 
{vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
+/* { dg-final { scan-assembler-not {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */

It would be better to add more assembler checks,
e.g. check how many vmclr.m or vmset.m instructions there should be.
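For what it's worth, the expected folds can be read off a scalar-mask model of the same-operand cases (plain C; the helper names are made up, and RVV tail/mask policy is ignored here):

```c
#include <stdint.h>

/* Scalar model of the RVV mask ops on a 64-bit mask; 0 plays the role
   of vmclr and ~0 the role of vmset.  */
typedef uint64_t mask_t;

static mask_t m_and  (mask_t a, mask_t b) { return a & b; }
static mask_t m_nand (mask_t a, mask_t b) { return ~(a & b); }
static mask_t m_andn (mask_t a, mask_t b) { return a & ~b; }
static mask_t m_xor  (mask_t a, mask_t b) { return a ^ b; }
static mask_t m_or   (mask_t a, mask_t b) { return a | b; }
static mask_t m_nor  (mask_t a, mask_t b) { return ~(a | b); }
static mask_t m_orn  (mask_t a, mask_t b) { return a | ~b; }
static mask_t m_xnor (mask_t a, mask_t b) { return ~(a ^ b); }

/* With both operands equal:
   and, or    -> v          (a plain mask move)
   nand, nor  -> ~v         (a mask complement)
   andn, xor  -> all-zeros  (vmclr candidates)
   orn, xnor  -> all-ones   (vmset candidates)  */
```

So, mathematically, only vmandn/vmxor can fold to vmclr and only vmorn/vmxnor to vmset; vmand/vmor reduce to a move and vmnand/vmnor to a complement.  Whether each fold is actually implemented is exactly what the scan-assembler counts would pin down.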


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-04-14 10:32
To: gcc-patches
CC: juzhe.zhong; kito.cheng; yanzhang.wang; pan2.li
Subject: [PATCH] RISC-V: Add test cases for the RVV mask insn shortcut.
From: Pan Li 
 
There are several kinds of shortcut codegen for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we must
add tests for the existing shortcuts, to ensure we don't break the existing
optimizations while adding new ones.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.
---
.../riscv/rvv/base/mask_insn_shortcut.c   | 237 ++
1 file changed, 237 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..8310aabaf59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,237 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmxor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmxor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmxor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmxor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b8(v1, v1, 

[PATCH] RISC-V: Add test cases for the RVV mask insn shortcut.

2023-04-13 Thread Pan Li via Gcc-patches
From: Pan Li 

There are several kinds of shortcut codegen for the RVV mask insns.  For
example:

vmxor vd, va, va => vmclr vd

We would like to add more optimizations like this, but first of all we must
add tests for the existing shortcuts, to ensure we don't break the existing
optimizations while adding new ones.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.
---
 .../riscv/rvv/base/mask_insn_shortcut.c   | 237 ++
 1 file changed, 237 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
new file mode 100644
index 000..8310aabaf59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
@@ -0,0 +1,237 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3" } */
+
+#include "riscv_vector.h"
+
+vbool1_t test_shortcut_for_riscv_vmand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmnand_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmnand_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmnand_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmnand_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmnand_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmnand_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmnand_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmnand_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmandn_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmandn_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmandn_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmandn_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmandn_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmandn_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmandn_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmandn_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmxor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmxor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmxor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmxor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t test_shortcut_for_riscv_vmxor_case_4(vbool16_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b16(v1, v1, vl);
+}
+
+vbool32_t test_shortcut_for_riscv_vmxor_case_5(vbool32_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b32(v1, v1, vl);
+}
+
+vbool64_t test_shortcut_for_riscv_vmxor_case_6(vbool64_t v1, size_t vl) {
+  return __riscv_vmxor_mm_b64(v1, v1, vl);
+}
+
+vbool1_t test_shortcut_for_riscv_vmor_case_0(vbool1_t v1, size_t vl) {
+  return __riscv_vmor_mm_b1(v1, v1, vl);
+}
+
+vbool2_t test_shortcut_for_riscv_vmor_case_1(vbool2_t v1, size_t vl) {
+  return __riscv_vmor_mm_b2(v1, v1, vl);
+}
+
+vbool4_t test_shortcut_for_riscv_vmor_case_2(vbool4_t v1, size_t vl) {
+  return __riscv_vmor_mm_b4(v1, v1, vl);
+}
+
+vbool8_t test_shortcut_for_riscv_vmor_case_3(vbool8_t v1, size_t vl) {
+  return __riscv_vmor_mm_b8(v1, v1, vl);
+}
+
+vbool16_t 

New Chinese (simplified) PO file for 'gcc' (version 13.1-b20230409)

2023-04-13 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

https://translationproject.org/latest/gcc/zh_CN.po

(This file, 'gcc-13.1-b20230409.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] RISC-V: Update multilib-generator to handle V

2023-04-13 Thread Kito Cheng via Gcc-patches
Thanks for catching this; I haven't enabled multilib for the Linux
toolchain for a while.
I guess we should implement TARGET_COMPUTE_MULTILIB for Linux targets
to simplify the unwieldy multilib files, but I agree it's too late in the
release cycle, so let's fix it this way for now.

So LGTM and OK for trunk, thanks

On Fri, Apr 14, 2023 at 4:55 AM Palmer Dabbelt  wrote:
>
> It looks like multilib-generator hasn't been run for t-linux-multilib in
> a while and it's pretty broken.  In order to regenerate the stub with
> support for V I needed a pair of fixes:
>
> * All extensions were being prefixed with an underscore, which leads to
>   some odd combinations like "rv32gc_v"; this just adds underscores to
>   the multi-letter extensions.
> * The input base ISAs were being canonicalized, which resulted in some
>   odd multilib default search paths.  I'm not sure if anything breaks
>   due to this, but it seems safer to just leave them alone.
>
> We've likely got a bunch more issues here related to multilib path
> mangling in the presence of underscores and for other extensions, but
> this at least lets me run the testsuite with V enabled.
>
> gcc/ChangeLog:
>
> * config/riscv/multilib-generator (maybe_add_underscore): New
>   function
>   (_expand_combination): Canonicalize differently.
> * config/riscv/t-linux-multilib: Regenerate.
> ---
> We're probably going to need a bunch more work here to handle the
> ISA-dependent multilib resolution, but I don't think that's gcc-13
> material -- certainly I don't have the time to do it, and even if it was
> ready now I'd bet it's too invasive for this point in the development
> cycle.
>
> We probably also want to handle the various B extensions in here, since
> those are upstream and useful.  I'm going to hold off on that for a bit
> as I've got some V-related testsuite failures that I'd rather look at
> first.  I figured it'd be better to just send this now, though, as at
> least I can run the V test suite under multilib.
>
> OK for trunk?
>
> ---
>  gcc/config/riscv/multilib-generator | 18 +++---
>  gcc/config/riscv/t-linux-multilib   | 86 ++---
>  2 files changed, 78 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/config/riscv/multilib-generator 
> b/gcc/config/riscv/multilib-generator
> index 0a3d4c07757..58b7198b243 100755
> --- a/gcc/config/riscv/multilib-generator
> +++ b/gcc/config/riscv/multilib-generator
> @@ -62,6 +62,15 @@ def arch_canonicalize(arch, isa_spec):
>out, err = proc.communicate()
>return out.decode().strip()
>
> +#
> +# Multi-letter extensions are separated by underscores, but single-letter
> +# extensions are not.
> +#
> +def maybe_add_underscore(ext):
> +  if len(ext) == 1:
> +return ext
> +  return '_' + ext
> +
>  #
>  # Handle expansion operation.
>  #
> @@ -70,11 +79,7 @@ def arch_canonicalize(arch, isa_spec):
>  #
>  def _expand_combination(ext):
>exts = list(ext.split("*"))
> -
> -  # Add underline to every extension.
> -  # e.g.
> -  #  _b * zvamo => _b * _zvamo
> -  exts = list(map(lambda x: '_' + x, exts))
> +  exts = list(map(lambda x: maybe_add_underscore(x), exts))
>
># No need to expand if there is no `*`.
>if len(exts) == 1:
> @@ -163,14 +168,13 @@ for cmodel in cmodels:
>  if cmodel == "compact" and arch.startswith("rv32"):
>continue
>
> -arch = arch_canonicalize (arch, args.misa_spec)
>  arches[arch] = 1
>  abis[abi] = 1
>  extra = list(filter(None, extra.split(',')))
>  ext_combs = expand_combination(ext)
>  alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], 
> [])
>  alts = filter(lambda x: len(x) != 0, alts)
> -alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
> +alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), 
> alts))
>
>  # Drop duplicated entry.
>  alts = unique(alts)
> diff --git a/gcc/config/riscv/t-linux-multilib 
> b/gcc/config/riscv/t-linux-multilib
> index 298547fee38..400cf7f0634 100644
> --- a/gcc/config/riscv/t-linux-multilib
> +++ b/gcc/config/riscv/t-linux-multilib
> @@ -1,46 +1,94 @@
>  # This file was generated by multilib-generator with the command:
> -#  ./multilib-generator 
> rv32imac-ilp32-rv32ima,rv32imaf,rv32imafd,rv32imafc,rv32imafdc- 
> rv32imafdc-ilp32d-rv32imafd- 
> rv64imac-lp64-rv64ima,rv64imaf,rv64imafd,rv64imafc,rv64imafdc- 
> rv64imafdc-lp64d-rv64imafd-
> -MULTILIB_OPTIONS = 
> march=rv32imac/march=rv32ima/march=rv32imaf/march=rv32imafd/march=rv32imafc/march=rv32imafdc/march=rv32g/march=rv32gc/march=rv64imac/march=rv64ima/march=rv64imaf/march=rv64imafd/march=rv64imafc/march=rv64imafdc/march=rv64g/march=rv64gc
>  mabi=ilp32/mabi=ilp32d/mabi=lp64/mabi=lp64d
> +#  ./multilib-generator 
> rv32imac-ilp32-rv32ima,rv32imaf,rv32imafd,rv32imafc,rv32imafdc-v 
> rv32imafdc-ilp32d-rv32imafd-v 
> rv64imac-lp64-rv64ima,rv64imaf,rv64imafd,rv64imafc,rv64imafdc-v 
> rv64imafdc-lp64d-rv64imafd-v
> 

[PATCH] RISC-V: Support chunk 128

2023-04-13 Thread juzhe . zhong
From: Juzhe-Zhong 

gcc/ChangeLog:

* config/riscv/riscv-modes.def (FLOAT_MODE): Add chunk 128 support.
(VECTOR_BOOL_MODE): Ditto.
(ADJUST_NUNITS): Ditto.
(ADJUST_ALIGNMENT): Ditto.
(ADJUST_BYTESIZE): Ditto.
(ADJUST_PRECISION): Ditto.
(RVV_MODES): Ditto.
(VECTOR_MODE_WITH_PREFIX): Ditto.
* config/riscv/riscv-v.cc (ENTRY): Ditto.
(get_vlmul): Ditto.
(get_ratio): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE): Ditto.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE): Ditto.
(vbool64_t): Ditto.
(vbool32_t): Ditto.
(vbool16_t): Ditto.
(vbool8_t): Ditto.
(vbool4_t): Ditto.
(vbool2_t): Ditto.
(vbool1_t): Ditto.
(vint8mf8_t): Ditto.
(vuint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
* config/riscv/riscv-vector-switch.def (ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Ditto.
(riscv_convert_vector_bits): Ditto.
* config/riscv/riscv.md:
* config/riscv/vector-iterators.md:
* config/riscv/vector.md 
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_reduc_): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr108185-4.c: Adapt testcase.
* gcc.target/riscv/rvv/base/spill-1.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-2.c: Ditto.
* gcc.target/riscv/rvv/base/spill-3.c: Ditto.
* gcc.target/riscv/rvv/base/spill-5.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.

---
 gcc/config/riscv/riscv-modes.def  |  89 +--
 gcc/config/riscv/riscv-v.cc   |  17 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  11 +-
 gcc/config/riscv/riscv-vector-builtins.def| 172 +++---
 gcc/config/riscv/riscv-vector-switch.def  | 105 ++--
 gcc/config/riscv/riscv.cc |  12 +-
 gcc/config/riscv/riscv.md |  14 +-
 gcc/config/riscv/vector-iterators.md  | 571 +++---
 gcc/config/riscv/vector.md| 233 +--
 .../gcc.target/riscv/rvv/base/pr108185-4.c|   2 +-
 .../gcc.target/riscv/rvv/base/spill-1.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |   2 +-
 .../gcc.target/riscv/rvv/base/spill-2.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-3.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-5.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-9.c   |   2 +-
 16 files changed, 783 insertions(+), 455 deletions(-)

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 4cf7cf8b1c6..b1669609eec 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -27,15 +27,16 @@ FLOAT_MODE (TF, 16, ieee_quad_format);
 /* Encode the ratio of SEW/LMUL into the mask types. There are the following
  * mask types.  */
 
-/* | Mode | MIN_VLEN = 32 | MIN_VLEN = 64 |
-   |  | SEW/LMUL  | SEW/LMUL  |
-   | VNx1BI   | 32| 64|
-   | VNx2BI   | 16| 32|
-   | VNx4BI   | 8 | 16|
-   | 

Re: [PATCH, rs6000] xfail float128 comparison test case that fails on powerpc64 [PR108728]

2023-04-13 Thread HAO CHEN GUI via Gcc-patches
Hi Kewen,

On 2023/4/13 16:32, Kewen.Lin wrote:
> xfail all powerpc*-*-* can have some XPASSes on those ENVs with
> software emulation.  Since the related hw insn xscmpuqp is guarded
> with TARGET_FLOAT128_HW, could we use the effective target
> ppc_float128_hw instead?

Thanks for your review comments.  It's tricky.  It invokes "__lekf2"
with "-mno-float128_hw", but it doesn't always pass the check.
With the math library on P8 it passes; with the library on P9 it fails.
So it totally depends on the version of the library, which is not
controlled by GCC.  What's your opinion?

Test result on P9
make check-gcc-c RUNTESTFLAGS="--target_board=unix'{-mno-float128-hardware}' 
dg-torture.exp=float128-cmp-invalid.c"

FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O0  execution test
FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O1  execution test
FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O2  execution test
FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O3 -g  execution test
FAIL: gcc.dg/torture/float128-cmp-invalid.c   -Os  execution test
FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.dg/torture/float128-cmp-invalid.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test

=== gcc Summary ===

# of expected passes            7
# of unexpected failures        7

Gui Haochen
Thanks


[PATCH] aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-13 Thread Philipp Tomsich
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=... to override this

Signed-off-by: Philipp Tomsich 
Co-Authored-By: Di Zhao 

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.

---

 gcc/config/aarch64/aarch64-tuning-flags.def | 3 +++
 gcc/config/aarch64/aarch64.cc   | 8 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 712895a5263..52112ba7c48 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", 
CHEAP_SHIFT_EXTEND)
 /* Disallow load/store pair instructions on Q-registers.  */
 AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
 
+/* Disallow load-pair instructions to be formed in combine/peephole.  */
+AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
+
 AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
 
 AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f4ef22ce02f..8dc1a9ceb17 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
  &ampere1_prefetch_tune
 };
 
@@ -26053,6 +26053,12 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool 
load,
   enum reg_class rclass_1, rclass_2;
   rtx mem_1, mem_2, reg_1, reg_2;
 
+  /* Allow the tuning structure to disable LDP instruction formation
+ from combining instructions (e.g., in peephole2).  */
+  if (load && (aarch64_tune_params.extra_tuning_flags
+  & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
+return false;
+
   if (load)
 {
   mem_1 = operands[1];
-- 
2.34.1



Re: [PATCH,FORTRAN 00/29] Move towards stringpool, part 1

2023-04-13 Thread Bernhard Reutner-Fischer via Gcc-patches
Hi all, Janne!

On Wed, 19 Sep 2018 16:40:01 +0200
Bernhard Reutner-Fischer  wrote:
> On Fri, 7 Sep 2018 at 10:07, Bernhard Reutner-Fischer
>  wrote:
> >
> > On Wed, 5 Sep 2018 at 20:57, Janne Blomqvist  
> > wrote:  
> > >
> > > On Wed, Sep 5, 2018 at 5:58 PM Bernhard Reutner-Fischer 
> > >  wrote:  
> >  
> > >> Bootstrapped and regtested on x86_64-foo-linux.
> > >>
> > >> I'd appreciate if someone could double check for regressions on other
> > >> setups. Git branch:
> > >> https://gcc.gnu.org/git/?p=gcc.git;a=log;h=refs/heads/aldot/fortran-fe-stringpool
> > >>
> > >> Ok for trunk?  
> > >
> > >
> > > Hi,
> > >
> > > this is quite an impressive patch set. I have looked through all the 
> > > patches, and on the surface they all look ok.  
> >
> > Thanks a lot for your appreciation!  
> > >
> > > Unfortunately I don't have any exotic target to test on either, so I 
> > > think you just have to commit it and check for regression reports. Though 
> > > I don't see this set doing anything which would work differently on other 
> > > targets, but you never know..
> > >
> > > I'd say wait a few days in case anybody else wants to comment on it, then 
> > > commit it to trunk.  

Unfortunately I never got to commit it.

Are you still OK with this series?

Iff so, I will refresh it for gcc-14 stage1. I would formally resubmit
the series for approval then, of course, for good measure.

It will need some small adjustments since coarrays were added
afterwards and a few other spots were changed since then.

Note that after your OK I fixed the problem described below with this
patch
https://inbox.sourceware.org/gcc-patches/20180919225533.20009-1-rep.dot@gmail.com/
and so I think this "[PATCH,FORTRAN v2] Use stringpool on loading
module symbols" was not formally OKed yet, FWIW. I believe that this v2 worked
flawlessly.
Note, however, that by adding/changing module names of symbols in the
module files, this will (I think / fear) require a bump of the module
version if we are honest. Ultimately that was the reason why I did not
push the series back then.

Link to the old series:
https://inbox.sourceware.org/gcc-patches/20180905145732.404-1-rep.dot@gmail.com/

thanks and cheers,

> >
> > Upon further testing i encountered a regression in module writing,
> > manifesting itself in a failure to compile ieee_8.f90 (and only this).  
> 
> > Sorry for that, I'll have another look during the weekend.  
> 
> so in free_pi_tree we should not free true_name nor module:
> 
> @@ -239,12 +239,6 @@ free_pi_tree (pointer_info *p)
>free_pi_tree (p->left);
>free_pi_tree (p->right);
> 
> -  if (iomode == IO_INPUT)
> -{
> -  XDELETEVEC (p->u.rsym.true_name);
> -  XDELETEVEC (p->u.rsym.module);
> -}
> -
>free (p);
>  }
> 
> This fixes the module writing but leaks, obviously.
> Now the reason why i initially did not use mio_pool_string for both
> rsym.module and rsym.true_name was that mio_pool_string sets the name
> to NULL if the string is empty.
> Currently these are read by read_string() [which we should get rid of
> entirely, the 2 mentioned fields are the last two who use
> read_string()] which does not nullify the empty string but returns
> just the pointer. For e.g. ieee_8.f90 using mio_pool_string gives us a
> NULL module which leads to gfc_use_module -> load_needed ->
> gfc_find_symbol -> gfc_find_sym_tree -> gfc_find_symtree which tries
> to c = strcmp (name, st->name); where name is NULL.
> 
> The main culprits seem to be class finalization wrapper variables so
> i'm adding modules to those now.
> Which leaves me with regressions like allocate_with_source_14.f03.
> "Fixing" these by falling back to gfc_current_ns->proc_name->name in
> load_needed for !ns->proc_name if the rsym->module is NULL seems to
> work.
> 
> Now there are a number of issues with names of fixups. Like the 9 in e.g.:
> 
> $ zcat /tmp/n/m.mod | egrep -v "^(\(\)|\(\(\)|$)"
> GFORTRAN module version '15' created from generic_27.f90
> (('testif' 'm' 2 3))
> (4 'm' 'm' '' 1 ((MODULE UNKNOWN-INTENT UNKNOWN-PROC UNKNOWN UNKNOWN 0 0)
> 3 'test1' 'm' '' 1 ((PROCEDURE UNKNOWN-INTENT MODULE-PROC DECL UNKNOWN 0
> 0 FUNCTION) () (REAL 4 0 0 0 REAL ()) 5 0 (6) () 3 () () () 0 0)
> 2 'test2' 'm' '' 1 ((PROCEDURE UNKNOWN-INTENT MODULE-PROC DECL UNKNOWN 0
> 0 FUNCTION ARRAY_OUTER_DEPENDENCY) () (REAL 4 0 0 0 REAL ()) 7 0 (8) ()
> 2 () () () 0 0)
> 6 'obj' '' '' 5 ((VARIABLE UNKNOWN-INTENT UNKNOWN-PROC UNKNOWN UNKNOWN 0
> 0 DUMMY) () (REAL 4 0 0 0 REAL ()) 0 0 () () 0 () () () 0 0)
> 8 'pr' '' '' 7 ((PROCEDURE UNKNOWN-INTENT DUMMY-PROC UNKNOWN UNKNOWN 0 0
> EXTERNAL DUMMY FUNCTION PROCEDURE ARRAY_OUTER_DEPENDENCY) () (REAL 4 9 0
> 0 REAL ()) 0 0 () () 8 () () () 0 0)
> 9 '' '' '' 7 ((PROCEDURE UNKNOWN-INTENT UNKNOWN-PROC UNKNOWN UNKNOWN 0 0
> FUNCTION) () (REAL 4 0 0 0 REAL ()) 0 0 () () 0 () () () 0 0)
> )
> ('m' 0 4 'test1' 0 3 'test2' 0 2)
> 
> which is a bit of a complication since we need them to verify proper
> interface types and 

[Patch, committed] Fortran: call of overloaded ‘abs(long long int&)’ is ambiguous [PR109492]

2023-04-13 Thread Harald Anlauf via Gcc-patches
Dear all,

I've committed the attached patch which fixes a portability issue
when bootstrapping on Solaris.  Discussed and confirmed in the PR
by Jonathan for Solaris and regtested by me on x86_64-pc-linux-gnu.

https://gcc.gnu.org/g:43816633afd275a9057232a6ebfdc19e441f09ec

(Unfortunately the commit message contains Unicode characters
that I picked up by copying the error message.  I wonder
if "git gcc-verify" could have warned me ...)

Thanks,
Harald

From 43816633afd275a9057232a6ebfdc19e441f09ec Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 13 Apr 2023 22:42:23 +0200
Subject: [PATCH] Fortran: call of overloaded ‘abs(long long int&)’ is
 ambiguous [PR109492]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

gcc/fortran/ChangeLog:

	PR fortran/109492
	* trans-expr.cc (gfc_conv_power_op): Use absu_hwi and
	unsigned HOST_WIDE_INT for portability.
---
 gcc/fortran/trans-expr.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 79367fa2ae0..09cdd9263c4 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -3400,11 +3400,12 @@ gfc_conv_power_op (gfc_se * se, gfc_expr * expr)
   && TREE_CODE (TREE_TYPE (rse.expr)) == INTEGER_TYPE)
 {
   wi::tree_to_wide_ref wlhs = wi::to_wide (lse.expr);
-  HOST_WIDE_INT v, w;
+  HOST_WIDE_INT v;
+  unsigned HOST_WIDE_INT w;
   int kind, ikind, bit_size;

   v = wlhs.to_shwi ();
-  w = abs (v);
+  w = absu_hwi (v);

   kind = expr->value.op.op1->ts.kind;
   ikind = gfc_validate_kind (BT_INTEGER, kind, false);
--
2.35.3



[PATCH] RISC-V: Update multilib-generator to handle V

2023-04-13 Thread Palmer Dabbelt
It looks like multilib-generator hasn't been run for t-linux-multilib in
a while and it's pretty broken.  In order to regenerate the stub with
support for V I needed a pair of fixes:

* All extensions were being prefixed with an underscore, which leads to
  some odd combinations like "rv32gc_v"; this just adds underscores to
  the multi-letter extensions.
* The input base ISAs were being canonicalized, which resulted in some
  odd multilib default search paths.  I'm not sure if anything breaks
  due to this, but it seems safer to just leave them alone.

We've likely got a bunch more issues here related to multilib path
mangling in the presence of underscores and for other extensions, but
this at least lets me run the testsuite with V enabled.

gcc/ChangeLog:

* config/riscv/multilib-generator (maybe_add_underscore): New
  function
  (_expand_combination): Canonicalize differently.
* config/riscv/t-linux-multilib: Regenerate.
---
We're probably going to need a bunch more work here to handle the
ISA-dependent multilib resolution, but I don't think that's gcc-13
material -- certainly I don't have the time to do it, and even if it was
ready now I'd bet it's too invasive for this point in the development
cycle.

We probably also want to handle the various B extensions in here, since
those are upstream and useful.  I'm going to hold off on that for a bit
as I've got some V-related testsuite failures that I'd rather look at
first.  I figured it'd be better to just send this now, though, as at
least I can run the V test suite under multilib.

OK for trunk?

---
 gcc/config/riscv/multilib-generator | 18 +++---
 gcc/config/riscv/t-linux-multilib   | 86 ++---
 2 files changed, 78 insertions(+), 26 deletions(-)

diff --git a/gcc/config/riscv/multilib-generator 
b/gcc/config/riscv/multilib-generator
index 0a3d4c07757..58b7198b243 100755
--- a/gcc/config/riscv/multilib-generator
+++ b/gcc/config/riscv/multilib-generator
@@ -62,6 +62,15 @@ def arch_canonicalize(arch, isa_spec):
   out, err = proc.communicate()
   return out.decode().strip()
 
+#
+# Multi-letter extensions are separated by underscores, but single-letter
+# extensions are not.
+#
+def maybe_add_underscore(ext):
+  if len(ext) == 1:
+return ext
+  return '_' + ext
+
 #
 # Handle expansion operation.
 #
@@ -70,11 +79,7 @@ def arch_canonicalize(arch, isa_spec):
 #
 def _expand_combination(ext):
   exts = list(ext.split("*"))
-
-  # Add underline to every extension.
-  # e.g.
-  #  _b * zvamo => _b * _zvamo
-  exts = list(map(lambda x: '_' + x, exts))
+  exts = list(map(lambda x: maybe_add_underscore(x), exts))
 
   # No need to expand if there is no `*`.
   if len(exts) == 1:
@@ -163,14 +168,13 @@ for cmodel in cmodels:
 if cmodel == "compact" and arch.startswith("rv32"):
   continue
 
-arch = arch_canonicalize (arch, args.misa_spec)
 arches[arch] = 1
 abis[abi] = 1
 extra = list(filter(None, extra.split(',')))
 ext_combs = expand_combination(ext)
 alts = sum([[x] + [x + y for y in ext_combs] for x in [arch] + extra], [])
 alts = filter(lambda x: len(x) != 0, alts)
-alts = list(map(lambda a : arch_canonicalize(a, args.misa_spec), alts))
+alts = alts + list(map(lambda a : arch_canonicalize(a, args.misa_spec), 
alts))
 
 # Drop duplicated entry.
 alts = unique(alts)
diff --git a/gcc/config/riscv/t-linux-multilib 
b/gcc/config/riscv/t-linux-multilib
index 298547fee38..400cf7f0634 100644
--- a/gcc/config/riscv/t-linux-multilib
+++ b/gcc/config/riscv/t-linux-multilib
@@ -1,46 +1,94 @@
 # This file was generated by multilib-generator with the command:
-#  ./multilib-generator 
rv32imac-ilp32-rv32ima,rv32imaf,rv32imafd,rv32imafc,rv32imafdc- 
rv32imafdc-ilp32d-rv32imafd- 
rv64imac-lp64-rv64ima,rv64imaf,rv64imafd,rv64imafc,rv64imafdc- 
rv64imafdc-lp64d-rv64imafd-
-MULTILIB_OPTIONS = 
march=rv32imac/march=rv32ima/march=rv32imaf/march=rv32imafd/march=rv32imafc/march=rv32imafdc/march=rv32g/march=rv32gc/march=rv64imac/march=rv64ima/march=rv64imaf/march=rv64imafd/march=rv64imafc/march=rv64imafdc/march=rv64g/march=rv64gc
 mabi=ilp32/mabi=ilp32d/mabi=lp64/mabi=lp64d
+#  ./multilib-generator 
rv32imac-ilp32-rv32ima,rv32imaf,rv32imafd,rv32imafc,rv32imafdc-v 
rv32imafdc-ilp32d-rv32imafd-v 
rv64imac-lp64-rv64ima,rv64imaf,rv64imafd,rv64imafc,rv64imafdc-v 
rv64imafdc-lp64d-rv64imafd-v
+MULTILIB_OPTIONS = 

[PATCH] rs6000: Add builtin for mffscrn instructions

2023-04-13 Thread Carl Love via Gcc-patches


GCC maintainers:

The following patch adds an overloaded builtin.  There are two possible
arguments for the builtin.  The builtin definitions are:

  double __builtin_mffscrn (unsigned long int);
  double __builtin_mffscrn (double);

The patch has been tested on Power 10 with no regressions.  

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl 

---
rs6000: Add builtin for mffscrn instructions

This patch adds an overloaded __builtin_mffscrn for the Move From FPSCR
Control & Set RN instruction with an immediate argument.  It also adds the
builtin with a floating point register argument.  A new runnable test is
added for the new builtin.

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
__builtin_mffscrnd): Add builtin definitions.
* config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_mffscrn.

gcc/testsuite/

* gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 ++
 gcc/config/rs6000/rs6000-overload.def |   5 +
 gcc/doc/extend.texi   |   8 ++
 .../gcc.target/powerpc/builtin-mffscrn.c  | 105 ++
 4 files changed, 125 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 03fb194b151..6247cb6c0fe 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2863,6 +2863,13 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+; Immediate instruction only uses the least significant two bits of the
+; const int.
+  double __builtin_mffscrni (const int<2>);
+MFFSCRNI rs6000_mffscrni {}
+
+  double __builtin_mffscrnd (double);
+MFFSCRNF rs6000_mffscrn {}
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..adda2df69ea 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -78,6 +78,11 @@
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
+  double __builtin_mffscrn (const int<2>);
+MFFSCRNI
+  double __builtin_mffscrn (double);
+MFFSCRNF
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3adb67aa47a..168d439c0e4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18317,6 +18317,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned int 
comparison, _Decimal128 value);
 
 double __builtin_mffsl(void);
 
+double __builtin_mffscrn (unsigned long int);
+double __builtin_mffscrn (double);
+
 @end smallexample
 The @code{__builtin_byte_in_set} function requires a
 64-bit environment supporting ISA 3.0 or later.  This function returns
@@ -18373,6 +18376,11 @@ the FPSCR.  The instruction is a lower latency version 
of the @code{mffs}
 instruction.  If the @code{mffsl} instruction is not available, then the
 builtin uses the older @code{mffs} instruction to read the FPSCR.
 
+The @code{__builtin_mffscrn} returns the contents of the control bits in the
+FPSCR, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  The
+contents of bits [62:63] of the unsigned long int or double argument are placed
+into bits [62:63] of the FPSCR (RN).
+
 @node Basic PowerPC Built-in Functions Available on ISA 3.1
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
new file mode 100644
index 000..433a9081499
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
@@ -0,0 +1,105 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9vector_hw } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#endif
+
+#define MASK 0x3
+#define EXPECTED1 0x1
+#define EXPECTED2 0x2
+
+void abort (void);
+
+int
+main()
+{
+  unsigned long mask, result, expected;
+  double double_arg;
+  
+  union convert_t {
+double d;
+unsigned long ul;
+  } val;
+
+  /* Test immediate version of __builtin_mffscrn. */
+  /* Read FPSCR and set RN bits in FPSCR[62:63]. */
+  val.d = __builtin_mffscrn (EXPECTED2);
+
+  /* Read FPSCR, bits [62:63] should have been set to 0x2 by previous builtin
+ call.  */
+  val.d = __builtin_mffscrn (EXPECTED1);
+  /* The expected result is the argument for the previous call to
+ __builtin_mffscrn.  */
+  expected = EXPECTED2;
+  result = MASK & val.ul;
+
+  if (EXPECTED2 != result)
+#ifdef 
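A portable sketch of the round-trip semantics the test above exercises: the builtin returns the previous FPSCR control bits while replacing only the RN field (bits 62:63, i.e. the two low-order bits). The model below is illustrative only; the real builtin is PowerPC-specific and operates on hardware state.

```cpp
#include <cassert>
#include <cstdint>

// Hedged model (not the builtin itself): mffscrn returns the old FPSCR
// control bits and places only the argument's low two bits into RN.
static std::uint64_t fpscr_model = 0;

std::uint64_t mffscrn_model(std::uint64_t arg) {
  std::uint64_t old = fpscr_model;                        // value returned
  fpscr_model = (fpscr_model & ~UINT64_C(3)) | (arg & 3); // update RN only
  return old;
}
```

This mirrors the test's structure: set RN with one call, then read it back masked with 0x3 on the next call.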

[PATCH] rs6000: Fix test gcc.target/powerpc/rs6000-fpint.c test options

2023-04-13 Thread Carl Love via Gcc-patches


GCC maintainers:

The following patch fixes the dg-options for test powerpc/rs6000-
fpint.c.  The test now works correctly on Power10.  The patch has been
tested on Power10 with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl 

-
rs6000: Fix test gcc.target/powerpc/rs6000-fpint.c test options

The test target selector rs6000-*-* is outdated and no longer supported.
The powerpc*-*-* selector is the default, so it doesn't need to be specified.
The dg-options needs to specify an older processor to get the desired
behavior on recent processors.

This patch updates the test specifications so the test will run properly on
Power10LE.  Tested on Power10 LE system with no regression test failures.

gcc/testsuite/:
* gcc.target/powerpc/rs6000-fpint.c: Update dg-options, drop dg-do
compile specifier.
---
 gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c 
b/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c
index 410f780de8b..fdb0a371929 100644
--- a/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c
+++ b/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c
@@ -1,5 +1,4 @@
-/* { dg-do compile { target powerpc*-*-* rs6000-*-* } } */
-/* { dg-options "-mno-powerpc-gfxopt" } */
+/* { dg-options "-mno-powerpc-gfxopt -mdejagnu-cpu=power6" } */
 /* { dg-final { scan-assembler-not "stfiwx" } } */
 
 /* A basic test of the old-style (not stfiwx) fp -> int conversion.  */
-- 
2.37.2




Re: [PATCH] loop-iv: Fix up bounds computation

2023-04-13 Thread Jeff Law via Gcc-patches




On 4/13/23 07:45, Jakub Jelinek wrote:

On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:

Bootstrap was successful with v3, but there are hundreds of testsuite failures
due to the simplify-rtx hunk.  compile/20070520-1.c for example when
compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d

Thursdays are my hell day.  It's unlikely I'd be able to look at this at all
today.


So, seems to me this is because loop-iv.cc asks for invalid RTL to be
simplified, it calls simplify_gen_binary (AND, SImode,
(subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
 (const_int 4294967295 [0x])) 0),
(const_int 4294967295 [0x]))
but 0x is not a valid SImode CONST_INT, and unlike previously, on
WORD_REGISTER_OPERATIONS targets which have DImode word_mode we no
longer optimize that into op0, so the invalid constant is emitted
into the IL and checking fails.

The following patch fixes that (and we optimize that & -1 away even earlier
with that).

Could you please just quickly try to apply this patch, make in the stage3
directory followed by
make check-gcc RUNTESTFLAGS="... compile.exp='20070520-1.c ...'"
(with all tests that regressed previously), whether this is the only spot
or whether we need to fix some other place too?

2023-04-13  Jakub Jelinek  

* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
of GEN_INT.

That fixes all the regressions and looks OK to me.

jeff


Re: [PATCH] c++: 'typename T::X' vs 'struct T::X' lookup [PR109420]

2023-04-13 Thread Jason Merrill via Gcc-patches

On 4/5/23 13:31, Patrick Palka wrote:

On Wed, 5 Apr 2023, Patrick Palka wrote:


r13-6098-g46711ff8e60d64 made make_typename_type no longer ignore
non-types during the lookup, unless the TYPENAME_TYPE in question was
followed by the :: scope resolution operator.  But there is another
exception to this rule: we need to ignore non-types during the lookup
also if the TYPENAME_TYPE was named with a tag other than 'typename',
such as 'struct' or 'enum', as per [dcl.type.elab]/5.
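The distinction can be sketched with portable C++ (an illustration of the rule, not GCC source): a non-type member hides an inherited type of the same name for ordinary qualified lookup, but the elaborated form 'struct T::X' performs type-only lookup.

```cpp
#include <cassert>

// Derived's enumerator X hides the inherited type Base::X for ordinary
// qualified lookup, but 'struct T::X' ignores non-types per
// [basic.lookup.elab] and finds the class type.
struct Base { struct X { enum { v = 1 }; }; };
struct Derived : Base { enum { X = 2 }; };

template<class T> int ordinary_lookup() { return T::X; }  // finds the enumerator
template<class T> int type_only_lookup() {
  typedef struct T::X XT;  // non-types ignored: finds Base::X
  return XT::v;
}
```

With T = Derived, the ordinary lookup yields the enumerator value 2 while the elaborated lookup reaches Base::X and yields 1.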

This patch implements this additional exception.  It occurred to me that
the tf_qualifying_scope flag is probably unnecessary if we'd use the
scope_type tag more thoroughly, but that requires parser changes that
are probably too risky at this stage.  (I'm working on addressing the
FIXME/TODOs here for GCC 14.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/109420

gcc/cp/ChangeLog:

* decl.cc (make_typename_type): Also ignore non-types during
the lookup if tag_type is something other than none_type or
typename_type.
* pt.cc (tsubst) : Pass class_type or
enum_type as tag_type to make_typename_type as appropriate
instead of always passing typename_type.

gcc/testsuite/ChangeLog:

* g++.dg/template/typename27.C: New test.
---
  gcc/cp/decl.cc |  9 -
  gcc/cp/pt.cc   |  9 -
  gcc/testsuite/g++.dg/template/typename27.C | 19 +++
  3 files changed, 35 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/typename27.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 5369714f9b3..a0a20c5accc 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -4307,7 +4307,14 @@ make_typename_type (tree context, tree name, enum 
tag_types tag_type,
   lookup will stop when we hit a dependent base.  */
if (!dependent_scope_p (context))
  {
-  bool want_type = (complain & tf_qualifying_scope);
+  /* As per [dcl.type.elab]/5 and [temp.res.general]/3, ignore non-types if
+the tag corresponds to a class-key or 'enum' (or is scope_type), or if
+this typename is followed by :: as per [basic.lookup.qual.general]/1.
+TODO: If we'd set the scope_type tag accurately on all TYPENAME_TYPEs
+that are followed by :: then we wouldn't need the tf_qualifying_scope
+flag.  */
+  bool want_type = (tag_type != none_type && tag_type != typename_type)
+   || (complain & tf_qualifying_scope);


Here's v2 which just slightly improves this comment.  I reckon 
[basic.lookup.elab]
is a better reference than [dcl.type.elab]/5 for justifying why the
lookup should be type-only for class-key and 'enum' TYPENAME_TYPEs.


OK, thanks.


-- >8 --

PR c++/109420

gcc/cp/ChangeLog:

* decl.cc (make_typename_type): Also ignore non-types during the
lookup if tag_type corresponds to an elaborated-type-specifier.
* pt.cc (tsubst) : Pass class_type or
enum_type as tag_type to make_typename_type as appropriate
instead of always passing typename_type.

gcc/testsuite/ChangeLog:

* g++.dg/template/typename27.C: New test.
---
  gcc/cp/decl.cc | 12 +++-
  gcc/cp/pt.cc   |  9 -
  gcc/testsuite/g++.dg/template/typename27.C | 19 +++
  3 files changed, 38 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/typename27.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 5369714f9b3..772c059dc2c 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -4307,7 +4307,17 @@ make_typename_type (tree context, tree name, enum 
tag_types tag_type,
   lookup will stop when we hit a dependent base.  */
if (!dependent_scope_p (context))
  {
-  bool want_type = (complain & tf_qualifying_scope);
+  /* We generally don't ignore non-types during TYPENAME_TYPE lookup
+(as per [temp.res.general]/3), unless
+  - the tag corresponds to a class-key or 'enum' so
+[basic.lookup.elab] applies, or
+  - the tag corresponds to scope_type or tf_qualifying_scope is
+set so [basic.lookup.qual]/1 applies.
+TODO: If we'd set/track the scope_type tag thoroughly on all
+TYPENAME_TYPEs that are followed by :: then we wouldn't need the
+tf_qualifying_scope flag.  */
+  bool want_type = (tag_type != none_type && tag_type != typename_type)
+   || (complain & tf_qualifying_scope);
t = lookup_member (context, name, /*protect=*/2, want_type, complain);
  }
else
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 821e0035c08..09559c88f29 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -16580,9 +16580,16 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
  return error_mark_node;
  }
  
+	/* FIXME: TYPENAME_IS_CLASS_P conflates 'union' vs 'struct' vs 

[PATCH 2/2] c++: make trait of incomplete type a permerror [PR109277]

2023-04-13 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

An incomplete type argument to several traits is specified to be undefined
behavior in the library; since it's a compile-time property, we diagnose
it.  But apparently some code was relying on the previous behavior of not
diagnosing.  So let's make it a permerror.

The assert in cxx_incomplete_type_diagnostic didn't like that, and I don't
see the point of having the assert, so let's just remove it.

PR c++/109277

gcc/cp/ChangeLog:

* semantics.cc (check_trait_type): Handle incomplete type directly.
* typeck2.cc (cxx_incomplete_type_diagnostic): Remove assert.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_convertible5.C: New test.
---
 gcc/cp/semantics.cc| 7 ++-
 gcc/cp/typeck2.cc  | 4 
 gcc/testsuite/g++.dg/ext/is_convertible5.C | 7 +++
 3 files changed, 13 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_convertible5.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 99a76e3ed65..45e0b0e81d3 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12107,7 +12107,12 @@ check_trait_type (tree type, int kind = 1)
   if (VOID_TYPE_P (type))
 return true;
 
-  return !!complete_type_or_else (strip_array_types (type), NULL_TREE);
+  type = complete_type (strip_array_types (type));
+  if (!COMPLETE_TYPE_P (type)
+  && cxx_incomplete_type_diagnostic (NULL_TREE, type, DK_PERMERROR)
+  && !flag_permissive)
+return false;
+  return true;
 }
 
 /* Process a trait expression.  */
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 76a7a7f6b98..bf03967a71f 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -298,10 +298,6 @@ cxx_incomplete_type_diagnostic (location_t loc, const_tree 
value,
 {
   bool is_decl = false, complained = false;
 
-  gcc_assert (diag_kind == DK_WARNING 
- || diag_kind == DK_PEDWARN 
- || diag_kind == DK_ERROR);
-
   /* Avoid duplicate error message.  */
   if (TREE_CODE (type) == ERROR_MARK)
 return false;
diff --git a/gcc/testsuite/g++.dg/ext/is_convertible5.C 
b/gcc/testsuite/g++.dg/ext/is_convertible5.C
new file mode 100644
index 000..ab9be05afea
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_convertible5.C
@@ -0,0 +1,7 @@
+// PR c++/109277
+// { dg-do compile { target c++11 } }
+// { dg-options -fpermissive }
+
+struct a;
+struct b{};
+static_assert (!__is_convertible (a, b), ""); // { dg-warning "incomplete" }
-- 
2.31.1



[PATCH 1/2] c++: make cxx_incomplete_type_diagnostic return bool

2023-04-13 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Like other diagnostic functions that might be silenced by options, it should
return whether or not it actually emitted a diagnostic.
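The convention can be sketched as follows (illustrative names, not GCC's actual API): the helper reports whether anything was actually printed, so callers can react to option-silenced diagnostics.

```cpp
#include <cassert>
#include <cstdio>

// Sketch of the pattern: a diagnostic helper returns whether it actually
// emitted anything.  DK_IGNORED stands in for a diagnostic silenced by
// command-line options.
enum diag_kind { DK_IGNORED, DK_WARNING, DK_ERROR };

bool emit_diag_model(diag_kind kind, const char *msg) {
  if (kind == DK_IGNORED)
    return false;                    // silenced: nothing was printed
  std::fprintf(stderr, "%s\n", msg);
  return true;                       // a diagnostic was emitted
}
```

A caller can then, for instance, treat a permerror that was downgraded and silenced as acceptance, which is exactly what the follow-up patch relies on.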

gcc/cp/ChangeLog:

* typeck2.cc (cxx_incomplete_type_diagnostic): Return bool.
* cp-tree.h (cxx_incomplete_type_diagnostic): Adjust.
---
 gcc/cp/cp-tree.h  |  8 
 gcc/cp/typeck2.cc | 34 ++
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 622752ae4e6..a14eb8d0b9a 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8155,7 +8155,7 @@ extern void maybe_warn_pessimizing_move(tree, 
tree, bool);
 
 /* in typeck2.cc */
 extern void require_complete_eh_spec_types (tree, tree);
-extern void cxx_incomplete_type_diagnostic (location_t, const_tree,
+extern bool cxx_incomplete_type_diagnostic (location_t, const_tree,
 const_tree, diagnostic_t);
 inline location_t
 loc_or_input_loc (location_t loc)
@@ -8178,12 +8178,12 @@ cp_expr_loc_or_input_loc (const_tree t)
   return cp_expr_loc_or_loc (t, input_location);
 }
 
-inline void
+inline bool
 cxx_incomplete_type_diagnostic (const_tree value, const_tree type,
diagnostic_t diag_kind)
 {
-  cxx_incomplete_type_diagnostic (cp_expr_loc_or_input_loc (value),
- value, type, diag_kind);
+  return cxx_incomplete_type_diagnostic (cp_expr_loc_or_input_loc (value),
+value, type, diag_kind);
 }
 
 extern void cxx_incomplete_type_error  (location_t, const_tree,
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index c56b69164e2..76a7a7f6b98 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -292,7 +292,7 @@ cxx_incomplete_type_inform (const_tree type)
and TYPE is the type that was invalid.  DIAG_KIND indicates the
type of diagnostic (see diagnostic.def).  */
 
-void
+bool
 cxx_incomplete_type_diagnostic (location_t loc, const_tree value,
const_tree type, diagnostic_t diag_kind)
 {
@@ -304,7 +304,7 @@ cxx_incomplete_type_diagnostic (location_t loc, const_tree 
value,
 
   /* Avoid duplicate error message.  */
   if (TREE_CODE (type) == ERROR_MARK)
-return;
+return false;
 
   if (value)
 {
@@ -336,7 +336,7 @@ cxx_incomplete_type_diagnostic (location_t loc, const_tree 
value,
   break;
 
 case VOID_TYPE:
-  emit_diagnostic (diag_kind, loc, 0,
+  complained = emit_diagnostic (diag_kind, loc, 0,
   "invalid use of %qT", type);
   break;
 
@@ -346,7 +346,7 @@ cxx_incomplete_type_diagnostic (location_t loc, const_tree 
value,
  type = TREE_TYPE (type);
  goto retry;
}
-  emit_diagnostic (diag_kind, loc, 0,
+  complained = emit_diagnostic (diag_kind, loc, 0,
   "invalid use of array with unspecified bounds");
   break;
 
@@ -365,12 +365,12 @@ cxx_incomplete_type_diagnostic (location_t loc, 
const_tree value,
   add a fix-it hint.  */
if (type_num_arguments (TREE_TYPE (member)) == 1)
  richloc.add_fixit_insert_after ("()");
-   emit_diagnostic (diag_kind, , 0,
+   complained = emit_diagnostic (diag_kind, , 0,
 "invalid use of member function %qD "
 "(did you forget the %<()%> ?)", member);
  }
else
- emit_diagnostic (diag_kind, loc, 0,
+ complained = emit_diagnostic (diag_kind, loc, 0,
   "invalid use of member %qD "
   "(did you forget the %<&%> ?)", member);
   }
@@ -380,38 +380,38 @@ cxx_incomplete_type_diagnostic (location_t loc, 
const_tree value,
   if (is_auto (type))
{
  if (CLASS_PLACEHOLDER_TEMPLATE (type))
-   emit_diagnostic (diag_kind, loc, 0,
+   complained = emit_diagnostic (diag_kind, loc, 0,
 "invalid use of placeholder %qT", type);
  else
-   emit_diagnostic (diag_kind, loc, 0,
+   complained = emit_diagnostic (diag_kind, loc, 0,
 "invalid use of %qT", type);
}
   else
-   emit_diagnostic (diag_kind, loc, 0,
+   complained = emit_diagnostic (diag_kind, loc, 0,
 "invalid use of template type parameter %qT", type);
   break;
 
 case BOUND_TEMPLATE_TEMPLATE_PARM:
-  emit_diagnostic (diag_kind, loc, 0,
+  complained = emit_diagnostic (diag_kind, loc, 0,
   "invalid use of template template parameter %qT",
   TYPE_NAME (type));
   break;
 
 case TYPE_PACK_EXPANSION:
-  emit_diagnostic (diag_kind, loc, 0,
+  complained = emit_diagnostic (diag_kind, loc, 0,
   "invalid use of pack expansion %qT", type);
   

[PATCH] rs6000: Fix test int_128bit-runnable.c instruction counts

2023-04-13 Thread Carl Love via Gcc-patches
GCC maintainers:

The following fix updates the expected instruction counts for the
test int_128bit-runnable.c.  The counts changed as a result of a
commit to support 128-bit integer divide and modulus.  The change
resulted in two of the tests using vdivsq instructions rather than the
vextsd2q instruction.  This increased the count for vdivsq from 1
to 3 and decreased the count for vextsd2q from 6 to 4.

The patch has been tested on a Power10 system with no new regression
failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

 Carl 



rs6000: Fix test int_128bit-runnable.c instruction counts

The test reports two failures on Power10LE:

FAIL: .../int_128bit-runnable.c scan-assembler-times mvdivsqM 1
FAIL: .../int_128bit-runnable.c scan-assembler-times mvextsd2qM 6

The current counts are :

  vdivsq   3
  vextsd2q 4

The counts changed with commit:

  commit 852b11da11a181df517c0348df044354ff0656d6
  Author: Michael Meissner 
  Date:   Wed Jul 7 21:55:38 2021 -0400

  Generate 128-bit int divide/modulus on power10.

  This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
  instructions to do 128-bit arithmetic.

  2021-07-07  Michael Meissner  

The code generation changed significantly.  There are two places where
the vextsd2q is "replaced" by a vdivsq instruction thus increasing the
vdivsq count from 1 to 3.  The first case is:

expected_result = vec_arg1[0]/4;
1af8:   60 01 df e8 ld  r6,352(r31)
1afc:   68 01 ff e8 ld  r7,360(r31)
1b00:   76 fe e9 7c sradi   r9,r7,63
1b04:   67 4b 00 7c mtvsrdd vs32,0,r9
1b08:   02 06 1b 10 vextsd2q v0,v0 <
1b0c:   03 00 40 39 li  r10,3
1b10:   00 00 60 39 li  r11,0
1b14:   67 00 09 7c mfvrd   r9,v0
1b18:   67 02 08 7c mfvsrld r8,vs32
1b1c:   38 50 08 7d and r8,r8,r10
1b20:   38 58 29 7d and r9,r9,r11
1b24:   78 4b 2b 7d mr  r11,r9
1b28:   78 43 0a 7d mr  r10,r8
1b2c:   14 30 4a 7f addc    r26,r10,r6
1b30:   14 39 6b 7f adde    r27,r11,r7
1b34:   46 f0 69 7b sldi    r9,r27,62
1b38:   82 f0 58 7b srdi    r24,r26,2
1b3c:   78 c3 38 7d or  r24,r9,r24
1b40:   74 16 79 7f sradi   r25,r27,2
1b44:   30 00 1f fb std r24,48(r31)
1b48:   38 00 3f fb std r25,56(r31)

To:

   expected_result = vec_arg1[0]/4;
1af8:   69 01 1f f4 lxv vs32,352(r31)
1afc:   04 00 20 39 li  r9,4
1b00:   00 00 40 39 li  r10,0
1b04:   67 4b 2a 7c mtvsrdd vs33,r10,r9
1b08:   0b 09 00 10 vdivsq  v0,v0,v1   <
1b0c:   3d 00 1f f4 stxv    vs32,48(r31)

The second case where a vextsd2q instruction is replaced with vdivsq:

From:

  expected_result = arg1/16;
1c24:   40 00 df e8 ld  r6,64(r31)
1c28:   48 00 ff e8 ld  r7,72(r31)
1c2c:   76 fe e9 7c sradi   r9,r7,63
1c30:   67 4b 00 7c mtvsrdd vs32,0,r9
1c34:   02 06 1b 10 vextsd2q v0,v0    <---
1c38:   0f 00 40 39 li  r10,15
1c3c:   00 00 60 39 li  r11,0
1c40:   67 00 09 7c mfvrd   r9,v0
1c44:   67 02 08 7c mfvsrld r8,vs32
1c48:   38 50 08 7d and r8,r8,r10
1c4c:   38 58 29 7d and r9,r9,r11
1c50:   78 4b 2b 7d mr  r11,r9
1c54:   78 43 0a 7d mr  r10,r8
1c58:   14 30 ca 7e addc    r22,r10,r6
1c5c:   14 39 eb 7e adde    r23,r11,r7
1c60:   c6 e0 e9 7a sldi    r9,r23,60
1c64:   02 e1 d4 7a srdi    r20,r22,4
1c68:   78 a3 34 7d or  r20,r9,r20
1c6c:   74 26 f5 7e sradi   r21,r23,4
1c70:   30 00 9f fa std r20,48(r31)
1c74:   38 00 bf fa std r21,56(r31)

To:

  expected_result = arg1/16;
1be8:   49 00 1f f4 lxv vs32,64(r31)
1bec:   10 00 20 39 li  r9,16
1bf0:   00 00 40 39 li  r10,0
1bf4:   67 4b 2a 7c mtvsrdd vs33,r10,r9
1bf8:   0b 09 00 10 vdivsq  v0,v0,v1   <---
1bfc:   3d 00 1f f4 stxv    vs32,48(r31)
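The source shape involved can be sketched portably; signed 128-bit divides by small powers of two are what the vdivsq sequences above now implement (function names are illustrative, not from the test):

```cpp
#include <cassert>

// Portable sketch of the construct shown above: signed 128-bit division
// that Power10 can lower to a single vdivsq instruction.  __int128 is a
// GCC/Clang extension on 64-bit targets.
__int128 div_by_4(__int128 x)  { return x / 4; }
__int128 div_by_16(__int128 x) { return x / 16; }
```

Note that signed division truncates toward zero, which is why the pre-vdivsq sequence needed the sign-extension and bias arithmetic shown in the first dump.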

The patch has been tested on Power10LE with no regressions.

gcc/testsuite/
* gcc.target/powerpc/int_128bit-runnable.c: Update expected
instruction counts.
---
 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 1afb00262a1..b2e2da1e013 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ 

New Croatian PO file for 'gcc' (version 13.1-b20230409)

2023-04-13 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Croatian team of translators.  The file is available at:

https://translationproject.org/latest/gcc/hr.po

(This file, 'gcc-13.1-b20230409.hr.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-13 Thread Richard Biener via Gcc-patches



> On 13.04.2023 at 17:49, Andrew MacLeod wrote:
> 
> 
>> On 4/13/23 09:56, Richard Biener wrote:
>>> On Wed, Apr 12, 2023 at 10:55 PM Andrew MacLeod  wrote:
>>> 
>>> On 4/12/23 07:01, Richard Biener wrote:
 On Wed, Apr 12, 2023 at 12:59 PM Jakub Jelinek  wrote:
> Would be nice.
> 
> Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
> it has exactly what happens in this PR, undefined phi arg from the
> pre-header and uses of the previous iteration's value (i.e. across
> backedge).
 Well yes, that's what's not allowed.  So when the PHI dominates the
 to-be-equivalenced argument edge src then the equivalence isn't
 valid because there's a place (that very source block for example) where a use 
 of the
 PHI lhs could appear and where we'd then mix up iterations.
 
>>> If we want to implement this cleaner, then as you say, we don't create
>>> the equivalence if the PHI node dominates the argument edge.  The
>>> attached patch does just that, removing both the "fix" for 108139
>>> and the just committed one for 109462, replacing them with catching this
>>> at the time of equivalence registering.
>>> 
>>> It bootstraps and passes all regressions tests.
>>> Do you want me to check this into trunk?
>> Uh, it looks a bit convoluted.  Wouldn't the following be enough?  OK
>> if that works
>> (or fixed if I messed up trivially)
>> 
>> diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
>> index e81f6b3699e..9c29012e160 100644
>> --- a/gcc/gimple-range-fold.cc
>> +++ b/gcc/gimple-range-fold.cc
>> @@ -776,7 +776,11 @@ fold_using_range::range_of_phi (vrange , gphi
>> *phi, fur_source )
>>   if (!seen_arg)
>> {
>>   seen_arg = true;
>> - single_arg = arg;
>> + // Avoid registering an equivalence if the PHI dominates the
>> + // argument edge.  See PR 108139/109462.
>> + if (dom_info_available_p (CDI_DOMINATORS)
>> + && !dominated_by_p (CDI_DOMINATORS, e->src, gimple_bb 
>> (phi)))
>> +   single_arg = arg;
>> }
>>   else if (single_arg != arg)
>> single_arg = NULL_TREE;
>> 
>> 
> It would expose a slight hole in cases where there is more than one copy 
> of the name, i.e.:
> 
> for a_2 = PHI   we currently will create an equivalence between  
> a_2 and c_3 because its considered a single argument.  Not a big deal for 
> this case since all arguments are c_3, but the hole would be when we have 
> something like:
> 
> a_2 = PHI if d_4 is undefined, then with the above patch 
> we would only check the dominance of the first edge with c_3. we'd need to 
> check all of them.
> 
> The patch is slightly convoluted because we always defer checking the 
> edge/processing single arguments until we think there is a reason to (for 
> performance).  My patch simply does the deferred check on the previous edge 
> and sets the new one so that we would check both edges are valid before 
> setting the equivalence.  Even as it is with this deferred check we're about 
> 0.4% slower in VRP.  If we didn't do this deferring, then every PHI is going to 
> have a check.
> 
> And along the way, remove the boolean seen_arg because having single_arg_edge 
> set produces the same information.
> 
> Perhaps it would be cleaner to simply defer the entire thing to the end, like 
> so.
> Performance is pretty much identical in the end.
> 
> Bootstraps on x86_64-pc-linux-gnu, regressions are running. Assuming no 
> regressions pop up,   OK for trunk?

Yes.  I like this more.  OK

Richard 

> Andrew
> 
> 
> 
> 
> 
> <462c.patch>


Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-13 Thread Richard Biener via Gcc-patches



> On 13.04.2023 at 17:49, Andrew MacLeod wrote:
> 
> 
>> On 4/13/23 09:56, Richard Biener wrote:
>>> On Wed, Apr 12, 2023 at 10:55 PM Andrew MacLeod  wrote:
>>> 
>>> On 4/12/23 07:01, Richard Biener wrote:
 On Wed, Apr 12, 2023 at 12:59 PM Jakub Jelinek  wrote:
> Would be nice.
> 
> Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
> it has exactly what happens in this PR, undefined phi arg from the
> pre-header and uses of the previous iteration's value (i.e. across
> backedge).
 Well yes, that's what's not allowed.  So when the PHI dominates the
 to-be-equivalenced argument edge src then the equivalence isn't
 valid because there's a place (that very source block for example) where a use 
 of the
 PHI lhs could appear and where we'd then mix up iterations.
 
>>> If we want to implement this cleaner, then as you say, we don't create
>>> the equivalence if the PHI node dominates the argument edge.  The
>>> attached patch does just that, removing both the "fix" for 108139
>>> and the just committed one for 109462, replacing them with catching this
>>> at the time of equivalence registering.
>>> 
>>> It bootstraps and passes all regressions tests.
>>> Do you want me to check this into trunk?
>> Uh, it looks a bit convoluted.  Wouldn't the following be enough?  OK
>> if that works
>> (or fixed if I messed up trivially)
>> 
>> diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
>> index e81f6b3699e..9c29012e160 100644
>> --- a/gcc/gimple-range-fold.cc
>> +++ b/gcc/gimple-range-fold.cc
>> @@ -776,7 +776,11 @@ fold_using_range::range_of_phi (vrange , gphi
>> *phi, fur_source )
>>   if (!seen_arg)
>> {
>>   seen_arg = true;
>> - single_arg = arg;
>> + // Avoid registering an equivalence if the PHI dominates the
>> + // argument edge.  See PR 108139/109462.
>> + if (dom_info_available_p (CDI_DOMINATORS)
>> + && !dominated_by_p (CDI_DOMINATORS, e->src, gimple_bb 
>> (phi)))
>> +   single_arg = arg;
>> }
>>   else if (single_arg != arg)
>> single_arg = NULL_TREE;
>> 
>> 
> It would expose a slight hole in cases where there is more than one copy 
> of the name, i.e.:
> 
> for a_2 = PHI   we currently will create an equivalence between  
> a_2 and c_3 because it's considered a single argument.  Not a big deal for 
> this case since all arguments are c_3, but the hole would be when we have 
> something like:
> 
> a_2 = PHI if d_4 is undefined, then with the above patch 
> we would only check the dominance of the first edge with c_3. we'd need to 
> check all of them.

I didn't think so, but on second thought, for multiple backedges and an 
irreducible region you are right.  Once we see an edge with the reverse 
domination we're fine though.  I originally checked all edges but thought 
checking the first was enough.

> The patch is slightly convoluted because we always defer checking the 
> edge/processing single arguments until we think there is a reason to (for 
> performance).  My patch simply does the deferred check on the previous edge 
> and sets the new one so that we would check both edges are valid before 
> setting the equivalence.  Even as it is with this deferred check we're about 
> 0.4% slower in VRP.  If we didn't do this deferring, then every PHI is going to 
> have a check.
> 
> And along the way, remove the boolean seen_arg because having single_arg_edge 
> set produces the same information.
> 
> Perhaps it would be cleaner to simply defer the entire thing to the end, like 
> so.
> Performance is pretty much identical in the end.
> 
> Bootstraps on x86_64-pc-linux-gnu, regressions are running. Assuming no 
> regressions pop up,   OK for trunk?
> 
> Andrew
> 
> 
> 
> 
> 
> <462c.patch>


Re: [PATCH] RISC-V: Set the ABI for the RVV tests

2023-04-13 Thread Kito Cheng via Gcc-patches
Ok, thanks :)

Palmer Dabbelt wrote on Thursday, 2023-04-13 at 23:12:

> The RVV test harness currently sets the ISA according to the target
> tuple, but doesn't also set the ABI.  This just sets the ABI to match
> the ISA, though we should really also be respecting the user's specific
> ISA to test.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp (gcc_mabi): New variable.
> ---
> I've still got some rv32-related multilib failures so there might be
> something else going on here, but I think at least this is going to be
> necessary.
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> index 7a9a2b6ac48..4b5509db385 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
> @@ -31,15 +31,17 @@ if ![info exists DEFAULT_CFLAGS] then {
>  }
>
>  set gcc_march "rv64gcv_zfh"
> +set gcc_mabi  "lp64d"
>  if [istarget riscv32-*-*] then {
>set gcc_march "rv32gcv_zfh"
> +  set gcc_mabi  "ilp32d"
>  }
>
>  # Initialize `dg'.
>  dg-init
>
>  # Main loop.
> -set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -O3"
> +set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
> "" $CFLAGS
>  gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]]
> \
> --
> 2.39.2
>
>


[PATCH] aarch64: Don't trust TYPE_ALIGN for pointers [PR108910]

2023-04-13 Thread Richard Sandiford via Gcc-patches
The aarch64 PCS rules ignore user alignment for scalars and
vectors and use the "natural" alignment of the type.  GCC tried
to calculate that natural alignment using:

  TYPE_ALIGN (TYPE_MAIN_VARIANT (type))

But as discussed in the PR, it's possible that the main variant
of a pointer type is an overaligned type (although that's usually
accidental).

This isn't known to be a problem for other types, so this patch
changes the bare minimum.  It might be that we need to ignore
TYPE_ALIGN in other cases too.
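The situation can be illustrated portably: user alignment attached to a pointer typedef inflates the recorded alignment of the type, while the natural ABI alignment of any pointer is just its size. The names below are illustrative; the attribute syntax is a GCC extension.

```cpp
#include <cassert>
#include <cstddef>

// A pointer typedef carrying user alignment: queries analogous to
// TYPE_ALIGN see 64 bytes here, even though the PCS wants the pointer's
// natural alignment.
typedef float *overaligned_fp __attribute__((aligned(64)));

std::size_t user_align()    { return __alignof__(overaligned_fp); } // 64
std::size_t natural_align() { return __alignof__(float *); }        // sizeof(void*)
```

This is the shape of the counterexample in the PR: when such a typedef ends up as the main variant of a pointer type, TYPE_ALIGN of the main variant no longer gives the natural alignment.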

Tested on aarch64-linux-gnu & pushed to trunk so far.  Will backport
to GCC 12 soon.

Richard


gcc/
PR target/108910
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Do
not trust TYPE_ALIGN for pointer types; use POINTER_SIZE instead.

gcc/testsuite/
PR target/108910
* gcc.dg/torture/pr108910.c: New test.
---
 gcc/config/aarch64/aarch64.cc   | 15 ++-
 gcc/testsuite/gcc.dg/torture/pr108910.c |  8 
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr108910.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 42617ced73a..f4ef22ce02f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -7484,7 +7484,20 @@ aarch64_function_arg_alignment (machine_mode mode, 
const_tree type,
   gcc_assert (TYPE_MODE (type) == mode);
 
   if (!AGGREGATE_TYPE_P (type))
-return TYPE_ALIGN (TYPE_MAIN_VARIANT (type));
+{
+  /* The ABI alignment is the natural alignment of the type, without
+any attributes applied.  Normally this is the alignment of the
+TYPE_MAIN_VARIANT, but not always; see PR108910 for a counterexample.
+For now we just handle the known exceptions explicitly.  */
+  type = TYPE_MAIN_VARIANT (type);
+  if (POINTER_TYPE_P (type))
+   {
+ gcc_assert (known_eq (POINTER_SIZE, GET_MODE_BITSIZE (mode)));
+ return POINTER_SIZE;
+   }
+  gcc_assert (!TYPE_USER_ALIGN (type));
+  return TYPE_ALIGN (type);
+}
 
   if (TREE_CODE (type) == ARRAY_TYPE)
 return TYPE_ALIGN (TREE_TYPE (type));
diff --git a/gcc/testsuite/gcc.dg/torture/pr108910.c 
b/gcc/testsuite/gcc.dg/torture/pr108910.c
new file mode 100644
index 000..59735488c2e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr108910.c
@@ -0,0 +1,8 @@
+extern void foo (float, float *, float *);
+
+void
+bar (void *p)
+{
+  float *__attribute__((aligned (64))) q = __builtin_assume_aligned (p, 64);
+  foo (0.0f, q, q);
+}
-- 
2.25.1



Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-13 Thread Andrew MacLeod via Gcc-patches


On 4/13/23 09:56, Richard Biener wrote:

On Wed, Apr 12, 2023 at 10:55 PM Andrew MacLeod  wrote:


On 4/12/23 07:01, Richard Biener wrote:

On Wed, Apr 12, 2023 at 12:59 PM Jakub Jelinek  wrote:

Would be nice.

Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
it has exactly what happens in this PR, undefined phi arg from the
pre-header and uses of the previous iteration's value (i.e. across
backedge).

Well yes, that's what's not allowed.  So when the PHI dominates the
to-be-equivalenced argument edge src then the equivalence isn't
valid because there's a place (that very source block for example) a use of the
PHI lhs could appear and where we'd then mixup iterations.


If we want to implement this cleaner, then as you say, we don't create
the equivalence if the PHI node dominates the argument edge.  The
attached patch does just that, removing the both the "fix" for 108139
and the just committed one for 109462, replacing them with catching this
at the time of equivalence registering.

It bootstraps and passes all regressions tests.
Do you want me to check this into trunk?

Uh, it looks a bit convoluted.  Wouldn't the following be enough?  OK
if that works
(or fixed if I messed up trivially)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index e81f6b3699e..9c29012e160 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -776,7 +776,11 @@ fold_using_range::range_of_phi (vrange &r, gphi
*phi, fur_source &src)
   if (!seen_arg)
 {
   seen_arg = true;
- single_arg = arg;
+ // Avoid registering an equivalence if the PHI dominates the
+ // argument edge.  See PR 108139/109462.
+ if (dom_info_available_p (CDI_DOMINATORS)
+ && !dominated_by_p (CDI_DOMINATORS, e->src, gimple_bb (phi)))
+   single_arg = arg;
 }
   else if (single_arg != arg)
 single_arg = NULL_TREE;


It would expose a slight hole: in cases where there is more than one 
copy of the name, i.e.:


for a_2 = PHI <c_3, c_3> we currently will create an equivalence 
between a_2 and c_3 because it's considered a single argument.  Not a 
big deal for this case since all arguments are c_3, but the hole would 
be when we have something like:


a_2 = PHI <c_3, d_4>: if d_4 is undefined, then with the above 
patch we would only check the dominance of the first edge with c_3; we'd 
need to check all of them.
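For reference, a self-contained sketch of the loop shape at issue (my own illustrative example, not the PR101912 testcase): the variable is undefined coming from the pre-header, but every read of it happens on a later iteration, i.e. the use only sees the previous iteration's value across the backedge:

```cpp
// `prev` is undefined on loop entry (from the pre-header), but the guard
// ensures it is only read on iterations after it has been assigned, so
// each use sees the previous iteration's value via the backedge.
int sum_prev(int n) {
  int prev;          // deliberately uninitialized
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    if (i != 0)      // skip the first iteration: prev not yet written
      sum += prev;
    prev = i;        // value consumed by the *next* iteration
  }
  return sum;
}
```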


The patch is slightly convoluted because we always defer checking the 
edge/processing single arguments until we think there is a reason to 
(for performance).  My patch simply does the deferred check on the 
previous edge and sets the new one so that we check both edges are 
valid before setting the equivalence.  Even as it is, with this deferred 
check we're about 0.4% slower in VRP.  If we didn't do this deferring, 
then every PHI would get a check.


And along the way, remove the boolean seen_arg because having 
single_arg_edge set produces the same information.


Perhaps it would be cleaner to simply defer the entire thing to the end, 
like so.

Performance is pretty much identical in the end.

Bootstraps on x86_64-pc-linux-gnu, regressions are running. Assuming no 
regressions pop up,   OK for trunk?


Andrew





commit 9e16ef8e4de26bdc6e570bd327bbe15845491169
Author: Andrew MacLeod 
Date:   Wed Apr 12 13:10:55 2023 -0400

Ensure PHI equivalencies do not dominate the argument edge.

When we create an equivalency between a PHI definition and an argument,
ensure the definition does not dominate the incoming argument edge.

PR tree-optimization/108139
PR tree-optimization/109462
* gimple-range-cache.cc (ranger_cache::fill_block_cache): Remove
equivalency check for PHI nodes.
* gimple-range-fold.cc (fold_using_range::range_of_phi): Ensure def
does not dominate single-arg equivalency edges.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 3b52f1e734c..2314478d558 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -1220,7 +1220,7 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
   // See if any equivalences can refine it.
   // PR 109462, like 108139 below, a one way equivalence introduced
   // by a PHI node can also be through the definition side.  Disallow it.
-  if (m_oracle && !is_a <gphi *> (SSA_NAME_DEF_STMT (name)))
+  if (m_oracle)
 	{
 	  tree equiv_name;
 	  relation_kind rel;
@@ -1237,13 +1237,6 @@ ranger_cache::fill_block_cache (tree name, basic_block bb, basic_block def_bb)
 	  if (!m_gori.has_edge_range_p (equiv_name))
 		continue;
 
-	  // PR 108139. It is hazardous to assume an equivalence with
-	  // a PHI is the same value.  The PHI may be an equivalence
-	  // via UNDEFINED arguments which is really a one way equivalence.
-	  // PHIDEF == name, but 

[PATCH] RISC-V: Set the ABI for the RVV tests

2023-04-13 Thread Palmer Dabbelt
The RVV test harness currently sets the ISA according to the target
tuple, but doesn't also set the ABI.  This just sets the ABI to match
the ISA, though we should really also be respecting the user's specific
ISA to test.
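The ISA-to-ABI mapping being applied can be sketched as a tiny helper (illustrative only; the actual change is in the Tcl below):

```cpp
#include <string>

// Pick a default ABI matching the ISA string, mirroring what rvv.exp
// now does: rv32* targets get ilp32d, everything else gets lp64d.
std::string mabi_for(const std::string &march) {
  return march.compare(0, 4, "rv32") == 0 ? "ilp32d" : "lp64d";
}
```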

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp (gcc_mabi): New variable.
---
I've still got some rv32-related multilib failures so there might be
something else going on here, but I think at least this is going to be
necessary.
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp 
b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 7a9a2b6ac48..4b5509db385 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -31,15 +31,17 @@ if ![info exists DEFAULT_CFLAGS] then {
 }
 
 set gcc_march "rv64gcv_zfh"
+set gcc_mabi  "lp64d"
 if [istarget riscv32-*-*] then {
   set gcc_march "rv32gcv_zfh"
+  set gcc_mabi  "ilp32d"
 }
 
 # Initialize `dg'.
 dg-init
 
 # Main loop.
-set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -O3"
+set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/base/*.\[cS\]]] \
"" $CFLAGS
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vsetvl/*.\[cS\]]] \
-- 
2.39.2



Re: [PATCH] loop-iv: Fix up bounds computation

2023-04-13 Thread Jeff Law via Gcc-patches




On 4/13/23 07:45, Jakub Jelinek wrote:

On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:

Bootstrap was successful with v3, but there's hundreds of testsuite failures
due to the simplify-rtx hunk.  compile/20070520-1.c for example when
compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d

Thursdays are my hell day.  It's unlikely I'd be able to look at this at all
today.


So, seems to me this is because loop-iv.cc asks for invalid RTL to be
simplified, it calls simplify_gen_binary (AND, SImode,
(subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
 (const_int 4294967295 [0xffffffff])) 0),
(const_int 4294967295 [0xffffffff]))
but 0xffffffff is not a valid SImode CONST_INT, and unlike previously,
on WORD_REGISTER_OPERATIONS targets which have DImode word_mode we no
longer optimize that into op0, so the invalid constant is emitted
into the IL and checking fails.

The following patch fixes that (and we optimize that & -1 away even earlier
with that).

Could you please just quickly try to apply this patch, make in the stage3
directory followed by
make check-gcc RUNTESTFLAGS="... compile.exp='20070520-1.c ...'"
(with all tests that regressed previously), whether this is the only spot
or whether we need to fix some other place too?

2023-04-13  Jakub Jelinek  

* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
of GEN_INT.
I'll try to apply this and do just an incremental build & test to see if 
it resolves all the regressions.  It should complete while I'm in my 
meeting hell.


jeff


Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Andre Vieira (lists) via Gcc-patches




On 13/04/2023 15:00, Richard Biener wrote:

On Thu, Apr 13, 2023 at 3:00 PM Andre Vieira (lists) via Gcc-patches
 wrote:




On 13/04/2023 11:01, Andrew Stubbs wrote:

Hi Andre,

I don't have a cascadelake device to test on, nor any knowledge about
what makes it different from regular x86_64.


Not sure you need one, but yeah I don't know either, it looks like it
fails because:
in-branch vector clones are not yet supported for integer mask modes.

A quick look tells me this is because mask_mode is not VOIDmode.
i386.cc's TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN will set
mask_mode to either DI or SI mode when TARGET_AVX512F. So I suspect
cascadelake is TARGET_AVX512F.

This is where I bail out as I really don't want to dive into the target
specific simd clone handling of x86 ;)



If the cascadelake device is supposed to work the same as other x86_64
devices for these vectors then the test has found a bug in the compiler
and you should be looking to fix that, not fudge the testcase.

Alternatively, if the device's capabilities really are different and the
tests should behave differently, then the actual expectations need to be
encoded in the dejagnu directives. If you can't tell the difference by
looking at the "x86_64*-*-*" target selector alone then the correct
solution is to invent a new "effective-target" selector. There are lots
of examples of using these throughout the testsuite (you could use
dg-require-effective-target to disable the whole testcase, or just use
the name in the scan-tree-dump-times directive to customise the
expectations), and the definitions can be found in the
lib/target-supports.exp and lib/target-supports-dg.exp scripts. Some are
fixed expressions and some run the compiler to probe the configuration,
but in this case you probably want to do something with "check-flags".


Even though I agree with you, I'm not the right person to do this
digging for such target specific stuff. So for now I'd probably suggest
xfailing this for avx512f.


For the unroll problem, you can probably tweak the optimization options
to disable that, the same as has been done for the epilogues feature
that had the same problem.


I mistook the current behaviour for unrolling; it's actually because of
a latent bug. The vectorizer calls `vect_get_smallest_scalar_type` to
determine the vectype of a stmt. For a function like foo, which has the
same type (long long) everywhere, this wouldn't be a problem; however,
because you transformed it into a MASK_CALL, it now has a function pointer
(which is 32-bit in -m32) that becomes the 'smallest' type.

This is all a red herring though; I don't think we should be calling
this function for potential simdclone calls, as the type on which the
veclen is based is not necessarily the 'smallest' type. And some arguments (like
uniform and linear) should be ignored anyway as they won't be mapped to
vectors.  So I do think this might have been broken even before your
changes, but needs further investigation.

Since these are new tests for a new feature, I don't really understand
why this is classed as a regression?

Andrew

P.S. there was a commit to these tests in the last few days, so make
sure you pull that before making changes.


The latest commit to these tests was mine, it's the one Haochen is
reporting this regression against. My commit was to fix the issue richi
had introduced that was preventing the feature you introduced from
working. The reason nobody noticed was that the tests you introduced
didn't actually test your feature: since you didn't specify 'inbranch',
the omp declare simd pragma was allowing the use of not-inbranch simd
clones, and the vectorizer was smart enough to circumvent the
conditional and still use simdclones (non-inbranch ones), so
when inbranch stopped working, the test didn't notice.

The other changes to this test were already after the fix for 10
that broke the inbranch feature you added, and so it was fixing a
cascadelake testism but for the not-inbranch simdclones. So basically
fixing a testism of a testism :/


I am working on simdclone's for AArch64 for next Stage 1 so I don't mind
looking at the issue with the vectype being chosen wrongly, as for the
other x86 specific testisms I'll leave them to someone else.


Btw, the new testsuite FAILs could be just epilogue vectorizations, so
maybe try the usual --param vect-epilogues-nomask=0 ...

It already has those, Jakub added them.

But that's not it. I've been looking at it, and there is code in place 
that does what I expected, which is to defer the choice of vectype for simd 
clones until vectorizable_simd_clone_call; unfortunately, it has a 
mistaken assumption that simdclones don't return :/

see vect_get_vector_types_for_stmt:
...
  if (gimple_get_lhs (stmt) == NULL_TREE
  /* MASK_STORE has no lhs, but is ok.  */
  && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
{
  if (is_a <gcall *> (stmt))
{
  /* Ignore calls with no lhs.  These must be 

Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Apr 2023, Richard Sandiford wrote:

> 钟居哲 writes:
> > Yeah, like kito said.
> > Turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > And we like ARM SVE style implmentation.
> >
> > And now we see swapping rtx_code and mode in rtx_def can make rtx_def 
> > overal not exceed 64 bit.
> > But it seems that there is still problem in tree_type_common and 
> > tree_decl_common, is that right?
> 
> I thought upthread we had a way forward for tree_type_common and
> tree_decl_common too, but maybe I only convinced myself. :)
> 
> > After several trys (remove all redundant TI/TF vector modes and FP16 vector 
> > mode), now there are 252 modes
> > in RISC-V port. Basically, I can keep supporting new RVV intrinsisc 
> > features recently.
> > However, we can't support more in the future, for example, FP16 vector, 
> > BF16 vector, matrix modes, VLS modes,...etc.
> 
> I agree it doesn't make sense to try to squeeze modes out like this.
> It's a bit artificial, and like you say, it's likely only putting
> off the inevitable.

Agreed.  Let's do the proposed TYPE_PRECISION change first and then
see how bad 16bit mode will be.

Richard.


Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Richard Biener via Gcc-patches
On Thu, Apr 13, 2023 at 3:00 PM Andre Vieira (lists) via Gcc-patches
 wrote:
>
>
>
> On 13/04/2023 11:01, Andrew Stubbs wrote:
> > Hi Andre,
> >
> > I don't have a cascadelake device to test on, nor any knowledge about
> > what makes it different from regular x86_64.
>
> Not sure you need one, but yeah I don't know either, it looks like it
> fails because:
> in-branch vector clones are not yet supported for integer mask modes.
>
> A quick look tells me this is because mask_mode is not VOIDmode.
> i386.cc's TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN will set
> mask_mode to either DI or SI mode when TARGET_AVX512F. So I suspect
> cascadelake is TARGET_AVX512F.
>
> This is where I bail out as I really don't want to dive into the target
> specific simd clone handling of x86 ;)
>
> >
> > If the cascadelake device is supposed to work the same as other x86_64
> > devices for these vectors then the test has found a bug in the compiler
> > and you should be looking to fix that, not fudge the testcase.
> >
> > Alternatively, if the device's capabilities really are different and the
> > tests should behave differently, then the actual expectations need to be
> > encoded in the dejagnu directives. If you can't tell the difference by
> > looking at the "x86_64*-*-*" target selector alone then the correct
> > solution is to invent a new "effective-target" selector. There are lots
> > of examples of using these throughout the testsuite (you could use
> > dg-require-effective-target to disable the whole testcase, or just use
> > the name in the scan-tree-dump-times directive to customise the
> > expectations), and the definitions can be found in the
> > lib/target-supports.exp and lib/target-supports-dg.exp scripts. Some are
> > fixed expressions and some run the compiler to probe the configuration,
> > but in this case you probably want to do something with "check-flags".
>
> Even though I agree with you, I'm not the right person to do this
> digging for such target specific stuff. So for now I'd probably suggest
> xfailing this for avx512f.
> >
> > For the unroll problem, you can probably tweak the optimization options
> > to disable that, the same as has been done for the epilogues feature
> > that had the same problem.
>
> I mistook the current behaviour for unrolling; it's actually because of
> a latent bug. The vectorizer calls `vect_get_smallest_scalar_type` to
> determine the vectype of a stmt. For a function like foo, which has the
> same type (long long) everywhere, this wouldn't be a problem; however,
> because you transformed it into a MASK_CALL, it now has a function pointer
> (which is 32-bit in -m32) that becomes the 'smallest' type.
>
> This is all a red herring though; I don't think we should be calling
> this function for potential simdclone calls, as the type on which the
> veclen is based is not necessarily the 'smallest' type. And some arguments (like
> uniform and linear) should be ignored anyway as they won't be mapped to
> vectors.  So I do think this might have been broken even before your
> changes, but needs further investigation.
> > Since these are new tests for a new feature, I don't really understand
> > why this is classed as a regression?
> >
> > Andrew
> >
> > P.S. there was a commit to these tests in the last few days, so make
> > sure you pull that before making changes.
>
> The latest commit to these tests was mine, it's the one Haochen is
> reporting this regression against. My commit was to fix the issue richi
> had introduced that was preventing the feature you introduced from
> working. The reason nobody noticed was that the tests you introduced
> didn't actually test your feature: since you didn't specify 'inbranch',
> the omp declare simd pragma was allowing the use of not-inbranch simd
> clones, and the vectorizer was smart enough to circumvent the
> conditional and still use simdclones (non-inbranch ones), so
> when inbranch stopped working, the test didn't notice.
>
> The other changes to this test were already after the fix for 10
> that broke the inbranch feature you added, and so it was fixing a
> cascadelake testism but for the not-inbranch simdclones. So basically
> fixing a testism of a testism :/
>
>
> I am working on simdclone's for AArch64 for next Stage 1 so I don't mind
> looking at the issue with the vectype being chosen wrongly, as for the
> other x86 specific testisms I'll leave them to someone else.

Btw, the new testsuite FAILs could be just epilogue vectorizations, so
maybe try the usual --param vect-epilogues-nomask=0 ...

> Kind Regards,
> Andre


Re: [PATCH] PR tree-optimization/109462 - Don't use ANY PHI equivalences in range-on-entry.

2023-04-13 Thread Richard Biener via Gcc-patches
On Wed, Apr 12, 2023 at 10:55 PM Andrew MacLeod  wrote:
>
>
> On 4/12/23 07:01, Richard Biener wrote:
> > On Wed, Apr 12, 2023 at 12:59 PM Jakub Jelinek  wrote:
> >>
> >> Would be nice.
> >>
> >> Though, I'm afraid it still wouldn't fix the PR101912 testcase, because
> >> it has exactly what happens in this PR, undefined phi arg from the
> >> pre-header and uses of the previous iteration's value (i.e. across
> >> backedge).
> > Well yes, that's what's not allowed.  So when the PHI dominates the
> > to-be-equivalenced argument edge src then the equivalence isn't
> > valid because there's a place (that very source block for example) a use of 
> > the
> > PHI lhs could appear and where we'd then mixup iterations.
> >
> If we want to implement this cleaner, then as you say, we don't create
> the equivalence if the PHI node dominates the argument edge.  The
> attached patch does just that, removing the both the "fix" for 108139
> and the just committed one for 109462, replacing them with catching this
> at the time of equivalence registering.
>
> It bootstraps and passes all regressions tests.
> Do you want me to check this into trunk?

Uh, it looks a bit convoluted.  Wouldn't the following be enough?  OK
if that works
(or fixed if I messed up trivially)

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index e81f6b3699e..9c29012e160 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -776,7 +776,11 @@ fold_using_range::range_of_phi (vrange &r, gphi
*phi, fur_source &src)
  if (!seen_arg)
{
  seen_arg = true;
- single_arg = arg;
+ // Avoid registering an equivalence if the PHI dominates the
+ // argument edge.  See PR 108139/109462.
+ if (dom_info_available_p (CDI_DOMINATORS)
+ && !dominated_by_p (CDI_DOMINATORS, e->src, gimple_bb (phi)))
+   single_arg = arg;
}
  else if (single_arg != arg)
single_arg = NULL_TREE;


> Andrew
>
> PSOf course, we still fail 101912.   The only way I see us being
> able to do anything with that is to effectively peel the first iteration
> off, either physically,  or logically with the path ranger to determine
> if a given use  is actually reachable by the undefined value.
>
> 
>
> :
># prevcorr_7 = PHI 
># leapcnt_8 = PHI <0(2), leapcnt_26(8)>
>if (leapcnt_8 < n_16)   // 0 < n_16
>  goto ; [INV]
>
> :
>corr_22 = getint ();
>if (corr_22 <= 0)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>_1 = corr_22 == 1;
>_2 = leapcnt_8 != 0;  // [0, 0] = 0 != 0
>_3 = _1 & _2; // [0, 0] = 0 & _2
>if (_3 != 0)// 4->5 is not taken on the path starting
> 2->9
>  goto ; [INV]
>else
>  goto ; [INV]
>
> : // We know this path is not taken when
> prevcorr_7  == prevcorr_19(D)(2)
>if (prevcorr_7 != 1)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> :
>_5 = prevcorr_7 + -1;
>if (prevcorr_7 != 2)
>  goto ; [INV]
>else
>  goto ; [INV]
>
> Using the path ranger (would it even need tweaks, Aldy?), before issuing
> the warning the uninit code could easily start at each use, construct
> the path(s) to that use from the uninitialized value, and determine that
> when prevcorr is uninitialized, 2->9->3->4->5 will not be executed, and of
> course, neither will 2->9->3->4->5->6
>
>I think threading already does something similar?

It does quite more convoluted things than that - it computes predicates on
paths with its own representation & simplifications.  It might be worth
trying if replacing this with a lot of path rangers would help - but then
it heavily relies on relation simplification it implements (and at the
same time it does that very imperfectly).

Richard.

>
>


[PATCH] loop-iv: Fix up bounds computation

2023-04-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 13, 2023 at 06:35:07AM -0600, Jeff Law wrote:
> Bootstrap was successful with v3, but there's hundreds of testsuite failures
> due to the simplify-rtx hunk.  compile/20070520-1.c for example when
> compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d
> 
> Thursdays are my hell day.  It's unlikely I'd be able to look at this at all
> today.

So, seems to me this is because loop-iv.cc asks for invalid RTL to be
simplified, it calls simplify_gen_binary (AND, SImode,
(subreg:SI (plus:DI (reg:DI 289 [ ivtmp_312 ])
(const_int 4294967295 [0xffffffff])) 0),
(const_int 4294967295 [0xffffffff]))
but 0xffffffff is not a valid SImode CONST_INT, and unlike previously,
on WORD_REGISTER_OPERATIONS targets which have DImode word_mode we no
longer optimize that into op0, so the invalid constant is emitted
into the IL and checking fails.

The following patch fixes that (and we optimize that & -1 away even earlier
with that).
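The distinction is that CONST_INTs are stored sign-extended from the mode's precision, so the canonical SImode form of the all-ones value is -1. A rough model of what gen_int_mode does for SImode (my sketch, not GCC's actual implementation):

```cpp
#include <cstdint>

// Truncate a value to 32 bits and sign-extend it back: the canonical
// CONST_INT representation for SImode.  GEN_INT would keep the raw
// 0xffffffff, which is not a valid SImode constant.
int64_t si_mode_const(uint64_t value) {
  return static_cast<int32_t>(static_cast<uint32_t>(value));
}
```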

Could you please just quickly try to apply this patch, make in the stage3
directory followed by
make check-gcc RUNTESTFLAGS="... compile.exp='20070520-1.c ...'"
(with all tests that regressed previously), whether this is the only spot
or whether we need to fix some other place too?

2023-04-13  Jakub Jelinek  

* loop-iv.cc (iv_number_of_iterations): Use gen_int_mode instead
of GEN_INT.

--- gcc/loop-iv.cc.jj   2023-01-02 09:32:23.0 +0100
+++ gcc/loop-iv.cc  2023-04-13 15:34:11.939045804 +0200
@@ -2617,7 +2617,7 @@ iv_number_of_iterations (class loop *loo
  d *= 2;
  size--;
}
-  bound = GEN_INT (((uint64_t) 1 << (size - 1 ) << 1) - 1);
+  bound = gen_int_mode (((uint64_t) 1 << (size - 1) << 1) - 1, mode);
 
   tmp1 = lowpart_subreg (mode, iv1.base, comp_mode);
   tmp = simplify_gen_binary (UMOD, mode, tmp1, gen_int_mode (d, mode));


Jakub



Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread 钟居哲
Thanks Kewen.

The current flow in this patch, as you said, is:

len = WHILE_LEN (n,vf);
...
v = len_load (addr,len);
..
addr = addr + vf (in byte align);


This patch just keeps advancing the address by the vector factor (adjusted to 
byte alignment).
For example, if your vector length is 512 bits, then this patch just updates the 
address as
addr = addr + 64;

However, after reading the RVV ISA more deeply today, it would be more appropriate 
for the address to be updated as: addr = addr + (len * 4), if len is the element 
count of INT32.
Here len is the result of WHILE_LEN, which calculates the length.

I assume that for the IBM target it's better to just update the address directly 
by adding the whole register byte size in the address IV, since I think the second 
way (addr = addr + (len * 4)) is too RVV specific and won't be suitable for IBM. 
Is that right?
If so, I will keep this patch's flow (won't change to addr = addr + (len * 4)) 
to see what else I need to do for IBM.
I would rather do that in the RISC-V backend port.
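To make the two schemes concrete, here is a scalar model of the loop (my own sketch; while_len stands in for the proposed WHILE_LEN, i.e. MIN (remaining, vf)), using the RVV-style update where the address advances by the number of elements actually processed:

```cpp
#include <cstddef>

// Scalar stand-in for WHILE_LEN: the number of elements this iteration
// will process, capped by the vectorization factor vf.
std::size_t while_len(std::size_t n, std::size_t vf) {
  return n < vf ? n : vf;
}

// Strip-mined reduction: each iteration handles `len` elements and the
// pointer advances by len elements (len * 4 bytes for int32), i.e. the
// RVV-style update; the patch as posted bumps by the full vf instead.
long sum_chunks(const int *a, std::size_t n, std::size_t vf) {
  long sum = 0;
  while (n > 0) {
    std::size_t len = while_len(n, vf);
    for (std::size_t i = 0; i < len; ++i)  // models len_load + reduce
      sum += a[i];
    a += len;                              // addr = addr + len * 4
    n -= len;
  }
  return sum;
}
```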

>> I tried
>>to compile the above source files on Power, the former can adopt doloop
>>optimization but the latter fails to. 
You mean GCC cannot do hardware loop optimization when the IV loop control is 
variable?


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-13 15:29
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Hi Juzhe,
 
on 2023/4/12 21:22, 钟居哲 wrote:
> Thanks Kewen. 
> 
> It seems that this proposal WHILE_LEN can help s390 when using --param 
> vect-partial-vector-usage=2 compile option.
> 
 
Yeah, IMHO, the previous sequence vs. the proposed sequence are like:
 
int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
return 0;
 
  int iv = 0;
  int len = MIN (n, 16);
  int sum = 0;
  do
{
  sum += a[len] + b[len];
  iv += 16;
  int n1 = MIN (n, iv);   // line A
  int n2 = n - n1;
  len = MIN (n2, 16);
}
  while (n > iv);
 
  return sum;
}
 
vs.
 
int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
return 0;
 
  int len;
  int sum = 0;
  do
{
  len = MIN (n, 16);
  sum += a[len] + b[len];
  n -= len;
}
  while (n > 0);
 
  return sum;
}
 
it at least saves one MIN (at line A) and one length preparation in the
last iteration (it's useless since loop ends).  But I think the concern
that this proposed IV isn't recognized as simple iv may stay.  I tried
to compile the above source files on Power, the former can adopt doloop
optimization but the latter fails to.
 
> Would you mind apply this patch && support WHILE_LEN in s390 backend and test 
> it to see the overal benefits for s390
> as well as the correctness of this sequence ? 
 
Sure, if all of you think this approach and this revision is good enough to go 
forward for this kind of evaluation,
I'm happy to give it a shot, but only for rs6000. ;-)  I noticed that there are 
some discussions on withdrawing this
WHILE_LEN by using MIN_EXPR instead, I'll stay tuned.
 
btw, now we only adopt vector with length on the epilogues rather than the main 
vectorized loops, because of the
non-trivial extra costs for length preparation than just using the normal 
vector load/store (all lanes), so we don't
care about the performance with --param vect-partial-vector-usage=2 much.  Even 
if this new proposal can optimize
the length preparation for --param vect-partial-vector-usage=2, the extra costs 
for length preparation is still
unavoidable (MIN, shifting, one more GPR used), we would still stay with 
default --param vect-partial-vector-usage=1
(which can't benefit from this new proposal).
 
BR,
Kewen
 


[RFC] c++/new-warning: Additional warning for name-hiding [PR12341]

2023-04-13 Thread Benjamin Priour via Gcc-patches
I've tried my hand at this first patch, adding new warnings for
name hiding, i.e.
when a derived class's field shares the name of a base class's field.

I have currently put it under -Wshadow, but I could instead add a
-Wname-hiding warning; what do you think about this?

At the moment, I'm using protect = 2 in a call to
cp/search.cc:lookup_field, meaning that I'm looking for a similarly
named field independently of its visibility (whether it is public,
protected or private within the base class) (1).
However, if the inheritance itself is not visible from the current
class, then I dismiss the warning (2).

I justify (1) with the code below.

class Base {
public:
friend void polymorphic_parameter_friend(Base *);
private:
  int x;
  int z;
};
class Derived : private Base {
public:
  int x; // warning emitted
  int y;
};
/* polymorphic_parameter_friend(new Derived()) has ambiguous access to field x,
hence the need for the warning */

Extending the code (1) shows behavior of (2):
class Grand : public Derived {
  float x; // issue warning from Derived only.
  float z; // no warning issued because Grand doesn't know of Base.
};
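For completeness, a standalone sketch of the member-hiding behaviour itself (names invented for illustration): the derived member hides the base one for unqualified access, while the base member stays reachable through a qualified name, which is why a warning rather than an error is appropriate:

```cpp
struct B {
  int x = 1;
};

struct D : B {
  int x = 2;  // hides B::x; the proposed -Wshadow diagnostic would flag this
};

int unqualified_x(D &d)   { return d.x; }     // unqualified lookup finds D::x
int qualified_base_x(D &d) { return d.B::x; } // B::x is still reachable
```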

Anonymous bit-fields and members of anonymous unions are correctly
dealt with (see pr12341-2.C).

I've also taken Jason's previous feedback and now use lookup_field.
I've also added an optional parameter 'once_suffices'. I describe as
follow in the docstring:
> ONCE_SUFFICES is 1 when we should return upon first find of the member in a 
> branch of the
> inheritance hierarchy tree, rather than collect all similarly named members 
> further in that branch.
> Does not impede other parallel branches of the tree.

I'd very much welcome your comments on this patch, and any feedback on what
to change or improve. Bootstrapped OK on today's trunk; regression tests
will run through the night.
The git gcc-mklog alias loops infinitely on my machine, so I generated
the ChangeLog below by hand.

Changelog:
PR c++/12341
* search.cc (lookup_member):
New optional parameter to preempt too deep inheritance
tree processing.
(lookup_field): Likewise.
(dfs_walk_all): Likewise.
* cp-tree.h: Complete the above declarations.
* class.cc (warn_name_hiding): New function.
(finish_struct_1): Call warn_name_hiding if -Wshadow.
* testsuite/g++.dg/warn/pr12341-1.C: New file.
* testsuite/g++.dg/warn/pr12341-2.C: New file.

The mail is already long enough as it is, thus I'm attaching the git
diff in a text file.

Thanks,
Benjamin.
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index 68b62086340..1e3efc028a6 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -3080,6 +3080,80 @@ warn_hidden (tree t)
   }
 }
 
+/* Warn about non-static fields name hiding. */
+static void
+warn_name_hiding (tree t)
+{
+  if (is_empty_class (t) || CLASSTYPE_NEARLY_EMPTY_P (t))
+return;
+
+  for (tree field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field))
+{
+/* Skip if field is not a user-defined non-static data member. */
+if (TREE_CODE (field) != FIELD_DECL || DECL_ARTIFICIAL (field))
+  continue;
+
+unsigned j;
+tree name = DECL_NAME (field);
+/* Skip if field is anonymous. */
+if (!name || !identifier_p (name))
+  continue;
+
+auto_vec base_vardecls;
+tree binfo;
+tree base_binfo;
+  /* Iterate through all of the base classes looking for possibly
+  shadowed non-static data members. */
+for (binfo = TYPE_BINFO (t), j = 0;
+BINFO_BASE_ITERATE (binfo, j, base_binfo); j++)
+{
+  tree basetype = BINFO_TYPE (base_binfo);
+  tree candidate = lookup_field (basetype,
+name,
+/* protect */ 2,
+/* want_type */ 0,
+/* once_suffices */ true);
+  if (candidate)
+  {
+/* 
+if we went up the hierarchy to a base class with multiple inheritance,
+there could be multiple matches, in which case a TREE_LIST is returned
+*/
+if (TREE_TYPE (candidate) == error_mark_node)
+{
+  for (; candidate; candidate = TREE_CHAIN (candidate))
+  {
+tree candidate_field = TREE_VALUE (candidate);
+tree candidate_klass = DECL_CONTEXT (candidate_field);
+if (accessible_base_p (t, candidate_klass, true))
+  base_vardecls.safe_push (candidate_field);
+  }
+}
+else if (accessible_base_p (t, DECL_CONTEXT (candidate), true))
+  base_vardecls.safe_push (candidate);
+  }
+}
+
+/* field was not found among the base classes */
+if (base_vardecls.is_empty ())
+  continue;
+
+/* Emit a warning for each field similarly named
+found in the base class hierarchy */
+for (tree base_vardecl : base_vardecls)
+  if (base_vardecl)
+  {
+auto_diagnostic_group d;
+if (warning_at (location_of (field),
+OPT_Wshadow,
+"%qD might shadow %qD", field, base_vardecl))
+inform (location_of (base_vardecl),
+  

Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-04-13 Thread Richard Sandiford via Gcc-patches
钟居哲  writes:
> Yeah, like kito said.
> Turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> And we like ARM SVE style implementation.
>
> And now we see swapping rtx_code and mode in rtx_def can make rtx_def overall 
> not exceed 64 bits.
> But it seems that there is still a problem in tree_type_common and 
> tree_decl_common, is that right?

I thought upthread we had a way forward for tree_type_common and
tree_decl_common too, but maybe I only convinced myself. :)

> After several tries (removing all redundant TI/TF vector modes and the FP16 
> vector mode), there are now 252 modes
> in the RISC-V port. Basically, I can keep supporting the new RVV intrinsic 
> features added recently.
> However, we can't support more in the future, for example, FP16 vector, BF16 
> vector, matrix modes, VLS modes,...etc.

I agree it doesn't make sense to try to squeeze modes out like this.
It's a bit artificial, and like you say, it's likely only putting
off the inevitable.

Thanks,
Richard

>
> From the RVV side, I think extending the machine mode by 1 more bit should be 
> enough for RVV (512 modes overall).
> Is it possible make it happen in tree_type_common and tree_decl_common, 
> Richards?
>
> Thank you so much for all comments.
>
>
> juzhe.zh...@rivai.ai


[PATCH] tree-optimization/109491 - ICE in expressions_equal_p

2023-04-13 Thread Richard Biener via Gcc-patches
At some point I elided the NULL pointer check in expressions_equal_p
because it shouldn't be necessary not realizing that for example
TARGET_MEM_REF has optional operands we cannot substitute with
something non-NULL with the same semantics.  The following does the
simple thing and restore the check removed in r11-4982.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109491
* tree-ssa-sccvn.cc (expressions_equal_p): Restore the
NULL operands test.
---
 gcc/tree-ssa-sccvn.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 99609538f54..9692911e31b 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -6407,6 +6407,13 @@ expressions_equal_p (tree e1, tree e2, bool 
match_vn_top_optimistically)
   && (e1 == VN_TOP || e2 == VN_TOP))
 return true;
 
+  /* If only one of them is null, they cannot be equal.  While in general
+ this should not happen for operations like TARGET_MEM_REF some
+ operands are optional and an identity value we could substitute
+ has differing semantics.  */
+  if (!e1 || !e2)
+return false;
+
   /* SSA_NAME compare pointer equal.  */
   if (TREE_CODE (e1) == SSA_NAME || TREE_CODE (e2) == SSA_NAME)
 return false;
-- 
2.35.3


Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Andre Vieira (lists) via Gcc-patches




On 13/04/2023 11:01, Andrew Stubbs wrote:

Hi Andre,

I don't have a cascadelake device to test on, nor any knowledge about 
what makes it different from regular x86_64.


Not sure you need one, but yeah I don't know either, it looks like it 
fails because:

in-branch vector clones are not yet supported for integer mask modes.

A quick look tells me this is because mask_mode is not VOIDmode. 
i386.cc's TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN will set 
mask_mode to either DI or SI mode when TARGET_AVX512F. So I suspect 
cascadelake is TARGET_AVX512F.


This is where I bail out as I really don't want to dive into the target 
specific simd clone handling of x86 ;)




If the cascadelake device is supposed to work the same as other x86_64 
devices for these vectors then the test has found a bug in the compiler 
and you should be looking to fix that, not fudge the testcase.


Alternatively, if the device's capabilities really are different and the 
tests should behave differently, then the actual expectations need to be 
encoded in the dejagnu directives. If you can't tell the difference by 
looking at the "x86_64*-*-*" target selector alone then the correct 
solution is to invent a new "effective-target" selector. There are lots 
of examples of using these throughout the testsuite (you could use 
dg-require-effective-target to disable the whole testcase, or just use 
the name in the scan-tree-dump-times directive to customise the 
expectations), and the definitions can be found in the 
lib/target-supports.exp and lib/target-supports-dg.exp scripts. Some are 
fixed expressions and some run the compiler to probe the configuration, 
but in this case you probably want to do something with "check-flags".


Even though I agree with you, I'm not the right person to do this 
digging for such target specific stuff. So for now I'd probably suggest 
xfailing this for avx512f.


For the unroll problem, you can probably tweak the optimization options 
to disable that, the same as has been done for the epilogues feature 
that had the same problem.


I mistook the current behaviour for unrolling; it's actually because of 
a latent bug. The vectorizer calls `vect_get_smallest_scalar_type` to 
determine the vectype of a stmt. For a function like foo, which has the 
same type (long long) everywhere, this wouldn't be a problem; however, 
you transformed it into a MASK_CALL that carries a function pointer 
(which is 32-bit with -m32), and that pointer now becomes the 'smallest' type.


This is all a red herring though; I don't think we should be calling 
this function for potential simdclone calls, as the type on which the 
veclen is based is not necessarily the 'smallest' type. And some arguments (like 
uniform and linear) should be ignored anyway as they won't be mapped to 
vectors.  So I do think this might have been broken even before your 
changes, but needs further investigation.
Since these are new tests for a new feature, I don't really understand 
why this is classed as a regression?


Andrew

P.S. there was a commit to these tests in the last few days, so make 
sure you pull that before making changes.


The latest commit to these tests was mine, it's the one Haochen is 
reporting this regression against. My commit was to fix the issue richi 
had introduced that was preventing the feature you introduced from 
working. The reason nobody noticed was because the tests you introduced 
didn't actually test your feature, since you didn't specify 'inbranch' 
the omp declare simd pragma was allowing the use of not-inbranch simd 
clones and the vectorizer was being smart enough to circumvent the 
conditional and was still able to use simdclones (non inbranch ones) so 
when the inbranch stopped working, the test didn't notice.


The other changes to this test were already after the fix for 10 
that broke the inbranch feature you added, and so it was fixing a 
cascadelake testism but for the not-inbranch simdclones. So basically 
fixing a testism of a testism :/



I am working on simdclone's for AArch64 for next Stage 1 so I don't mind 
looking at the issue with the vectype being chosen wrongly, as for the 
other x86 specific testisms I'll leave them to someone else.


Kind Regards,
Andre


Re: [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-13 Thread Jeff Law via Gcc-patches




On 4/13/23 04:57, Segher Boessenkool wrote:

On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:

On 4/12/23 10:58, Jakub Jelinek wrote:

Seems my cross defaulted to 32-bit compilation, reproduced it with
additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
So, let's include that test in the patch too:

2023-04-12  Jeff Law  
Jakub Jelinek  

PR target/108947
PR target/109040
* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
smaller than word_mode.
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
: Likewise.

* gcc.dg/pr108947.c: New test.
* gcc.c-torture/execute/pr109040.c: New test.

Bootstrap of the v3 patch has completed.  Regression testing is still
spinning.   It should be done and waiting for me when I wake up in the
morning.


It's still okay for trunk (of course) if the bootstrap doesn't fail (of
course).  Thanks guys!
Bootstrap was successful with v3, but there's hundreds of testsuite 
failures due to the simplify-rtx hunk.  compile/20070520-1.c for example 
when compiled with:  -O3 -funroll-loops -march=rv64gc -mabi=lp64d


Thursdays are my hell day.  It's unlikely I'd be able to look at this at 
all today.



typedef unsigned char uint8_t;
extern uint8_t ff_cropTbl[256 + 2 * 1024];

void ff_pred8x8_plane_c(uint8_t *src, int stride){
  int j, k;
  int a;
  uint8_t *cm = ff_cropTbl + 1024;
  const uint8_t * const src0 = src+3-stride;
  const uint8_t *src1 = src+4*stride-1;
  const uint8_t *src2 = src1-2*stride;
  int H = src0[1] - src0[-1];
  int V = src1[0] - src2[ 0];
  for(k=2; k<=4; ++k) {
src1 += stride; src2 -= stride;
H += k*(src0[k] - src0[-k]);
V += k*(src1[0] - src2[ 0]);
  }
  H = ( 17*H+16 ) >> 5;
  V = ( 17*V+16 ) >> 5;

  a = 16*(src1[0] + src2[8]+1) - 3*(V+H);
  for(j=8; j>0; --j) {
int b = a;
a += V;
src[0] = cm[ (b ) >> 5 ];
src[1] = cm[ (b+ H) >> 5 ];
src[2] = cm[ (b+2*H) >> 5 ];
src[3] = cm[ (b+3*H) >> 5 ];
src[4] = cm[ (b+4*H) >> 5 ];
src[5] = cm[ (b+5*H) >> 5 ];
src[6] = cm[ (b+6*H) >> 5 ];
src[7] = cm[ (b+7*H) >> 5 ];
src += stride;
  }
}

Jeff


Re: [PATCH] LoongArch: Remove the definition of the macro LOGICAL_OP_NON_SHORT_CIRCUIT under the architecture and use the default definition instead.

2023-04-13 Thread Lulu Cheng



在 2023/4/13 下午8:24, Xi Ruoyao 写道:

On Thu, 2023-04-13 at 19:51 +0800, Lulu Cheng wrote:

In some cases, using the default definition of this macro can reduce the 
number of conditional
branch instructions.

gcc/ChangeLog:

 * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove 
the macro
 definition.

I think it's OK for GCC 13.  At least the result is better for simple
cases like "x >= a && x < b".

I also want to merge it to GCC 13. :-D



---
  gcc/config/loongarch/loongarch.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f8167875646..6b7dbecd3ff 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -836,7 +836,6 @@ typedef struct {
     1 is the default; other values are interpreted relative to that.  */
  
  #define BRANCH_COST(speed_p, predictable_p) loongarch_branch_cost

-#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
  
  /* Return the asm template for a conditional branch instruction.

     OPCODE is the opcode's mnemonic and OPERANDS is the asm template for




[PATCH] RISC-V: Support chunk 128

2023-04-13 Thread juzhe . zhong
From: Juzhe-Zhong 

Since there were multiple conflicts with the previous patch,
I have rebased it onto trunk and am resending it.

gcc/ChangeLog:

* config/riscv/riscv-modes.def (FLOAT_MODE): Add chunk 128 modes.
(VECTOR_BOOL_MODE): Ditto.
(ADJUST_NUNITS): Ditto.
(ADJUST_ALIGNMENT): Ditto.
(ADJUST_BYTESIZE): Ditto.
(ADJUST_PRECISION): Ditto.
(RVV_MODES): Ditto.
(VECTOR_MODE_WITH_PREFIX): Ditto.
* config/riscv/riscv-v.cc (ENTRY): Ditto.
(get_vlmul): Ditto.
(get_ratio): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TYPE): Ditto.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE): Ditto.
(vbool64_t): Ditto.
(vbool32_t): Ditto.
(vbool16_t): Ditto.
(vbool8_t): Ditto.
(vbool4_t): Ditto.
(vbool2_t): Ditto.
(vbool1_t): Ditto.
(vint8mf8_t): Ditto.
(vuint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
* config/riscv/riscv-vector-switch.def (ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_legitimize_poly_move): Ditto.
(riscv_convert_vector_bits): Ditto.
* config/riscv/riscv.md: Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md 
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_indexed_store): Ditto.
(@pred_reduc_): Ditto.
(@pred_widen_reduc_plus): Ditto.
(@pred_reduc_plus): Ditto.
(@pred_widen_reduc_plus): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr108185-4.c: Adapt test.
* gcc.target/riscv/rvv/base/spill-1.c: Ditto.
* gcc.target/riscv/rvv/base/spill-11.c: Ditto.
* gcc.target/riscv/rvv/base/spill-2.c: Ditto.
* gcc.target/riscv/rvv/base/spill-3.c: Ditto.
* gcc.target/riscv/rvv/base/spill-5.c: Ditto.
* gcc.target/riscv/rvv/base/spill-9.c: Ditto.

---
 gcc/config/riscv/riscv-modes.def  |  89 +--
 gcc/config/riscv/riscv-v.cc   |  17 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  11 +-
 gcc/config/riscv/riscv-vector-builtins.def| 172 +++---
 gcc/config/riscv/riscv-vector-switch.def  | 102 ++--
 gcc/config/riscv/riscv.cc |  12 +-
 gcc/config/riscv/riscv.md |  14 +-
 gcc/config/riscv/vector-iterators.md  | 571 +++---
 gcc/config/riscv/vector.md| 233 +--
 .../gcc.target/riscv/rvv/base/pr108185-4.c|   2 +-
 .../gcc.target/riscv/rvv/base/spill-1.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-11.c  |   2 +-
 .../gcc.target/riscv/rvv/base/spill-2.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-3.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-5.c   |   2 +-
 .../gcc.target/riscv/rvv/base/spill-9.c   |   2 +-
 16 files changed, 781 insertions(+), 454 deletions(-)

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 4cf7cf8b1c6..b1669609eec 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -27,15 +27,16 @@ FLOAT_MODE (TF, 16, ieee_quad_format);
 /* Encode the ratio of SEW/LMUL into the mask types. There are the following
  * mask types.  */
 
-/* | Mode | MIN_VLEN = 32 | MIN_VLEN = 64 |
-   |  | SEW/LMUL  | SEW/LMUL  |
-   | VNx1BI   | 32| 64|
-   | VNx2BI   

Re: [PATCH] LoongArch: Remove the definition of the macro LOGICAL_OP_NON_SHORT_CIRCUIT under the architecture and use the default definition instead.

2023-04-13 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-04-13 at 19:51 +0800, Lulu Cheng wrote:
> In some cases, using the default definition of this macro can reduce the 
> number of conditional
> branch instructions.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove 
> the macro
> definition.

I think it's OK for GCC 13.  At least the result is better for simple
cases like "x >= a && x < b". 

> ---
>  gcc/config/loongarch/loongarch.h | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/gcc/config/loongarch/loongarch.h 
> b/gcc/config/loongarch/loongarch.h
> index f8167875646..6b7dbecd3ff 100644
> --- a/gcc/config/loongarch/loongarch.h
> +++ b/gcc/config/loongarch/loongarch.h
> @@ -836,7 +836,6 @@ typedef struct {
>     1 is the default; other values are interpreted relative to that.  */
>  
>  #define BRANCH_COST(speed_p, predictable_p) loongarch_branch_cost
> -#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
>  
>  /* Return the asm template for a conditional branch instruction.
>     OPCODE is the opcode's mnemonic and OPERANDS is the asm template for

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] testsuite: filter out warning noise for CWE-1341 test

2023-04-13 Thread Segher Boessenkool
On Thu, Apr 13, 2023 at 07:39:01AM +, Richard Biener wrote:
> On Thu, 13 Apr 2023, Jiufu Guo wrote:
> I think this should be fixed in the analyzer, "stripping" malloc
> tracking from fopen/fclose since it does this manually.  I've adjusted
> the bug accordingly.

Yeah.

> > > +/* This case checks double-fclose only, suppress other warning.  */
> > > +/* { dg-additional-options -Wno-analyzer-double-free } */

So please add "(PR108722)" or such to the comment here?  That is enough
for future people to see if this is still necessary, to maybe remove it
from the testcase here, but certainly not cargo-cult it to other
testcases!

Thanks,


Segher


[PATCH] LoongArch: Remove the definition of the macro LOGICAL_OP_NON_SHORT_CIRCUIT under the architecture and use the default definition instead.

2023-04-13 Thread Lulu Cheng
In some cases, using the default definition of this macro can reduce the 
number of conditional
branch instructions.

gcc/ChangeLog:

* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove 
the macro
definition.
---
 gcc/config/loongarch/loongarch.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index f8167875646..6b7dbecd3ff 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -836,7 +836,6 @@ typedef struct {
1 is the default; other values are interpreted relative to that.  */
 
 #define BRANCH_COST(speed_p, predictable_p) loongarch_branch_cost
-#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
 
 /* Return the asm template for a conditional branch instruction.
OPCODE is the opcode's mnemonic and OPERANDS is the asm template for
-- 
2.31.1



Re: [PATCH v6] RISC-V: Add support for experimental zfa extension.

2023-04-13 Thread jinma via Gcc-patches
Thank you very much for your comments. Since a long time has passed and this is 
an initial version, I will update this patch.
--
From:Christoph Müllner 
Sent At:2023 Apr. 13 (Thu.) 17:22
To:Jin Ma 
Cc:gcc-patches ; kito.cheng ; 
kito.cheng ; palmer 
Subject:Re: [PATCH v6] RISC-V: Add support for experimental zfa extension.
On Fri, Mar 10, 2023 at 1:41 PM Jin Ma via Gcc-patches
 wrote:
>
> This patch adds the 'Zfa' extension for riscv, which is based on:
> https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7
> latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
> of this writing.
>
> The Wiki Page (details):
> https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa
>
> The binutils-gdb for 'Zfa' extension:
> https://sourceware.org/pipermail/binutils/2022-September/122938.html
>
> Implementation of zfa extension on LLVM:
> https://reviews.llvm.org/rGc0947dc44109252fcc0f68a542fc6ef250d4d3a9
>
> There are three points that need to be discussed here.
> 1. According to riscv-spec, "The FCVTMOD.W.D instruction was added 
> principally to
> accelerate the processing of JavaScript Numbers.", so it seems that no 
> implementation
> is required in the compiler.
> 2. The FROUND and FROUNDNX instructions in this patch use related functions in 
> the math
> library, such as round, floor, ceil, etc. Since there is no interface for 
> half-precision in
> the math library, the instructions FROUND.H and FROUNDNX.H have not been 
> implemented for
> the time being. Is it necessary to add a built-in interface belonging to 
> riscv such as
> __builtin_roundhf or __builtin_roundf16 to generate half floating point 
> instructions?
> 3. As far as I know, FMINM and FMAXM instructions correspond to C23 library 
> function fminimum
> and fmaximum. Therefore, I have not dealt with such instructions for the time 
> being, but have
> simply implemented the patterns fminm<mode>3 and fmaxm<mode>3. Is 
> it necessary to
> add a built-in interface belonging to riscv such as __builtin_fminm to 
> generate half
> floating-point instructions?
I have rebased and tested this patch.
Here are my observations (with fixes below at the actual code):
* There is a compiler warning because of a missing "fallthrough" comment
* There are merge conflicts with a current master
* The constant operand of the fli instruction uses the constant index
in the rs1-field, but not the constant in hex FP literal form
A patch that addresses these issues can also be found here:
 https://github.com/cmuellner/gcc/tree/riscv-zfa
Additionally I observe the following failing test cases with this patch applied:
 === gcc: Unexpected fails for rv64gc lp64d medlow ===
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O0 (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O1 (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O1 (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O2 (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O2 (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O2 -flto
-fno-use-linker-plugin -flto-partition=none (internal compiler error:
Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O2 -flto
-fno-use-linker-plugin -flto-partition=none (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects (internal compiler error:
Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O3 -g (internal
compiler error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -O3 -g (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -Os (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -Os (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -Og -g (internal
compiler error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -Og -g (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -Oz (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c -Oz (test for excess errors)
I have not analysed these ICEs so far.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zfa extension.
> * config/riscv/constraints.md (Zf): Constrain the floating point number that 
> the FLI instruction can load.
> * config/riscv/iterators.md (round_pattern): New.
> * config/riscv/predicates.md: Predicate the floating point number that the 
> FLI instruction can load.
> * config/riscv/riscv-opts.h (MASK_ZFA): New.
> (TARGET_ZFA): New.
> * config/riscv/riscv-protos.h 

Re: [PATCH] combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]

2023-04-13 Thread Segher Boessenkool
On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
> On 4/12/23 10:58, Jakub Jelinek wrote:
> >Seems my cross defaulted to 32-bit compilation, reproduced it with
> >additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> >So, let's include that test in the patch too:
> >
> >2023-04-12  Jeff Law  
> > Jakub Jelinek  
> >
> > PR target/108947
> > PR target/109040
> > * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> > word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> > smaller than word_mode.
> > * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> > : Likewise.
> >
> > * gcc.dg/pr108947.c: New test.
> > * gcc.c-torture/execute/pr109040.c: New test.
> Bootstrap of the v3 patch has completed.  Regression testing is still 
> spinning.   It should be done and waiting for me when I wake up in the 
> morning.

It's still okay for trunk (of course) if the bootstrap doesn't fail (of
course).  Thanks guys!


Segher


Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Andrew Stubbs

Hi Andre,

I don't have a cascadelake device to test on, nor any knowledge about 
what makes it different from regular x86_64.


If the cascadelake device is supposed to work the same as other x86_64 
devices for these vectors then the test has found a bug in the compiler 
and you should be looking to fix that, not fudge the testcase.


Alternatively, if the device's capabilities really are different and the 
tests should behave differently, then the actual expectations need to be 
encoded in the dejagnu directives. If you can't tell the difference by 
looking at the "x86_64*-*-*" target selector alone then the correct 
solution is to invent a new "effective-target" selector. There are lots 
of examples of using these throughout the testsuite (you could use 
dg-require-effective-target to disable the whole testcase, or just use 
the name in the scan-tree-dump-times directive to customise the 
expectations), and the definitions can be found in the 
lib/target-supports.exp and lib/target-supports-dg.exp scripts. Some are 
fixed expressions and some run the compiler to probe the configuration, 
but in this case you probably want to do something with "check-flags".


For the unroll problem, you can probably tweak the optimization options 
to disable that, the same as has been done for the epilogues feature 
that had the same problem.


Since these are new tests for a new feature, I don't really understand 
why this is classed as a regression?


Andrew

P.S. there was a commit to these tests in the last few days, so make 
sure you pull that before making changes.

On 13/04/2023 10:15, Andre Simoes Dias Vieira wrote:

Hi,

@Andrew: Could you have a look at these? I had a quick look at 17f.c and it 
looks to me like the target selectors aren't specific enough. Unfortunately I 
am not familiar enough with target selectors (or targets for that matter) for 
x86_64. From what I could tell with -m32 gcc decides to unroll the vectorized 
loop so you end up with 4 simdclones rather than the 2 it tests for, GCC 
probably uses a different cost structure for -m32 that decides it is profitable 
to unroll?

As for -march=cascadelake, that seems to prevent gcc from using the inbranch 
simdclones altogether, so I suspect that cascadelake doesn't support these 
inbranch simdclones or the vector types it is trying to use.

Kind regards,
Andre


From: haochen.jiang 
Sent: Thursday, April 13, 2023 2:48 AM
To: Andre Simoes Dias Vieira; gcc-regress...@gcc.gnu.org; 
gcc-patches@gcc.gnu.org; haochen.ji...@intel.com
Subject: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c 
scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

On Linux/x86_64,

58c8c1b383bc3c286d6527fc6e8fb62463f9a877 is the first bad commit
commit 58c8c1b383bc3c286d6527fc6e8fb62463f9a877
Author: Andre Vieira 
Date:   Tue Apr 11 10:07:43 2023 +0100

 if-conv: Restore MASK_CALL conversion [PR10]

caused

FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* 
= foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* 
= foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* 
= foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* 
= foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* 
= foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* 
= foo\\.simdclone" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-7135/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c --target_board='unix{-m32\ 

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread juzhe.zh...@rivai.ai
Thank you Richard.

>> But the issue is the same in the reverse with WHILE_LEN, no?
>>WHILE_LEN just computes a scalar value - you seem to suggest
>>there's a hidden side-effect of "coalescing" the result with
>>a hardware vector length register?  I don't think that's good design.
No, I don't plan to suggest there's a hidden side-effect of "coalescing"
the result with a hardware vector length register.

Today, I read the RVV ISA deeply again. I realize that this patch is not 
absolutely correct for 
all RVV hardware.

According to the RVV ISA, the definition of a vsetvli instruction
(vsetvli vl, avl, vtype) is:
vl = AVL if AVL ≤ VLMAX
ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)
vl = VLMAX if AVL ≥ (2 * VLMAX)
Deterministic on any given implementation for same input AVL and VLMAX values
The second constraint make the result of vsetvli is not necessary to be VLMAX 
(the maximum number of elements will be updated of specific vector-length RVV 
CPU).

So for a vsetvli instruction (vsetvli vl, avl, vtype), the "vl" value can vary 
among RVV CPUs, depending on the implementation of the downstream RVV hardware.


Now I think I should fix this patch, since it is not suitable for all hardware.

So according to RVV ISA:
For example, this permits an implementation to set vl = ceil(AVL / 2) for VLMAX 
< AVL < 2*VLMAX in order to evenly distribute work over the last two iterations 
of a stripmine loop.

We can have these 2 following different RVV CPU:

Suppose the number of elements that needs to be updated is 10 (int32_t), and 
the vector length is 256 bits (so at most 8 INT32 elements per update).

Then we need 2 iterations, and the number of elements updated in each 
iteration depends on the hardware implementation.

So the following two hardware implementations are both legal for the RVV 
standard:

RVV CPU 1:
1st iteration updates 5 elements (satisfying the constraint 
ceil (AVL/2) <= vl <= VLMAX, with vl = ceil (AVL/2) = 5).
2nd iteration updates the remaining 5 elements.

RVV CPU 2:
1st iteration updates 8 elements (vl = VLMAX = 8).
2nd iteration updates the remaining 2 elements.

Both RVV CPUs are legal according to the RVV specification, and it's obvious 
this patch is correct for RVV CPU 2 but incorrect for RVV CPU 1.

Since the current flow of this patch is as follows:

+   <bb 2>:
+   _19 = (unsigned long) n_5(D);
+   ...
+
+   <bb 3>:
+   ...
+   # ivtmp_20 = PHI <_19(2), ivtmp_21(3)>
+   ...
+   _22 = .WHILE_LEN (ivtmp_20, vf);
+   ...
+   LEN_LOAD (addr, _22);
+   ...
+   addr = addr + vf;
+   ivtmp_21 = ivtmp_20 - _22;
+   ...
+   if (ivtmp_21 != 0)
+ goto <bb 3>; [75.00%]
+   else
+ goto <bb 4>; [25.00%]
+
+   <bb 4>:
+   return;

Here _22, the output of WHILE_LEN, is only used in "ivtmp_21 = ivtmp_20 - _22;",
which serves as the saturating-to-zero subtraction, while in 
"addr = addr + vf;" the address is advanced by a fixed vf each iteration.
Such a sequence is OK for most of the RVV CPUs so far, I think.
However, for future compatibility, we should make the WHILE_LEN output serve 
as the address IV increment too.

So we should abandon the current addressing scheme that just keeps adding vf.

Instead of "addr = addr + vf;", we should do this:

_22 = .WHILE_LEN (ivtmp_20, vf);
LEN_LOAD (addr, _22);
tmp = _22 * 4;   /* assuming INT32 elements: convert the element count _22
                    into bytes for the address calculation */
addr = addr + tmp;

making the result of WHILE_LEN used not only in "remain = remain - len" but 
also in the address calculation: tmp = len * (element byte size); 
addr = addr + tmp.
Then this flow is correct for all RVV CPUs.
This flow is exactly the same as the example in the RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
I think I need to change this patch as described above to make it suitable for 
every RVV CPU in the world, but I am not sure whether the GCC community will 
accept this flow, so I propose it now before I do it.
I didn't realize this before, since my downstream RVV hardware and the 
open-source simulator always generate "vl" = VLMAX. (Sorry about that.)


Expecting any suggestions and comments.
Thank you so much.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-04-13 14:47
To: 钟居哲
CC: richard.sandiford; gcc-patches; Jeff Law; rdapp; linkw; kito.cheng
Subject: Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support 
for auto-vectorization
On Wed, 12 Apr 2023, 钟居哲 wrote:
 
> >> It's not so much that we need to do that.  But normally it's only worth
> >> adding internal functions if they do something that is too complicated
> >> to express in simple gimple arithmetic.  The UQDEC case I mentioned:
> 
> >>z = MAX (x, y) - y
> 
> >> fell into the "simple arithmetic" category for me.  We could have added
> >> an ifn for unsigned saturating decrement, but it didn't seem complicated
> >> enough to merit its own ifn.
> 
> Ah, I know your concern. I should admit 

Re: [PATCH v6] RISC-V: Add support for experimental zfa extension.

2023-04-13 Thread Christoph Müllner
On Fri, Mar 10, 2023 at 1:41 PM Jin Ma via Gcc-patches
 wrote:
>
> This patch adds the 'Zfa' extension for riscv, which is based on:
>  
> https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7
> latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
> of this writing.
>
> The Wiki Page (details):
>  https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa
>
> The binutils-gdb for 'Zfa' extension:
>  https://sourceware.org/pipermail/binutils/2022-September/122938.html
>
> Implementation of zfa extension on LLVM:
>   https://reviews.llvm.org/rGc0947dc44109252fcc0f68a542fc6ef250d4d3a9
>
> There are three points that need to be discussed here.
> 1. According to riscv-spec, "The FCVTMOD.W.D instruction was added 
> principally to
>   accelerate the processing of JavaScript Numbers.", so it seems that no 
> implementation
>   is required in the compiler.
> 2. The FROUND and FROUNDNX instructions in this patch use related functions 
> in the math
>   library, such as round, floor, ceil, etc. Since there is no interface for 
> half-precision in
>   the math library, the instructions FROUND.H and FROUNDNX.H have not been 
> implemented for
>   the time being. Is it necessary to add a built-in interface belonging to 
> riscv such as
>   __builtin_roundhf or __builtin_roundf16 to generate half floating point 
> instructions?
> 3. As far as I know, the FMINM and FMAXM instructions correspond to the C23 
> library functions fminimum
>   and fmaximum. Therefore, I have not dealt with such instructions for the 
> time being, but have
>   simply implemented the patterns fminm<mode>3 and fmaxm<mode>3. Is 
> it necessary to
>   add a built-in interface belonging to riscv such as __builtin_fminm to 
> generate half
>   floating-point instructions?


I have rebased and tested this patch.
Here are my observations (with fixes below at the actual code):
* There is a compiler warning because of a missing "fallthrough" comment
* There are merge conflicts with a current master
* The constant operand of the fli instruction uses the constant index
in the rs1-field, but not the constant in hex FP literal form

A patch that addresses these issues can also be found here:
  https://github.com/cmuellner/gcc/tree/riscv-zfa

Additionally I observe the following failing test cases with this patch applied:

=== gcc: Unexpected fails for rv64gc lp64d medlow ===
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O0  (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O0  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O1  (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O1  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2  (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal compiler error:
Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error:
Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O3 -g  (internal
compiler error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -O3 -g  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -Os  (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c   -Os  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c  -Og -g  (internal
compiler error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c  -Og -g  (test for excess errors)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c  -Oz  (internal compiler
error: Segmentation fault)
FAIL: gcc.target/riscv/zero-scratch-regs-3.c  -Oz  (test for excess errors)

I have not analysed these ICEs so far.


>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zfa extension.
> * config/riscv/constraints.md (Zf): Constrain the floating point 
> number that the FLI instruction can load.
> * config/riscv/iterators.md (round_pattern): New.
> * config/riscv/predicates.md: Predicate the floating point number 
> that the FLI instruction can load.
> * config/riscv/riscv-opts.h (MASK_ZFA): New.
> (TARGET_ZFA): New.
> * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): 
> Get the index of the
>   floating-point number that the FLI instruction can load.
> * config/riscv/riscv.cc (find_index_in_array): New.
> (riscv_float_const_rtx_index_for_fli): New.
> 

Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Andre Simoes Dias Vieira via Gcc-patches
Hi,

@Andrew: Could you have a look at these? I had a quick look at 17f.c and it 
looks to me like the target selectors aren't specific enough. Unfortunately I 
am not familiar enough with target selectors (or targets for that matter) for 
x86_64. From what I could tell with -m32 gcc decides to unroll the vectorized 
loop so you end up with 4 simdclones rather than the 2 it tests for, GCC 
probably uses a different cost structure for -m32 that decides it is profitable 
to unroll?

As for -march=cascadelake, that seems to prevent gcc from using the inbranch 
simdclones altogether, so I suspect that cascadelake doesn't support these 
inbranch simdclones or the vector types it is trying to use.

Kind regards,
Andre


From: haochen.jiang 
Sent: Thursday, April 13, 2023 2:48 AM
To: Andre Simoes Dias Vieira; gcc-regress...@gcc.gnu.org; 
gcc-patches@gcc.gnu.org; haochen.ji...@intel.com
Subject: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c 
scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

On Linux/x86_64,

58c8c1b383bc3c286d6527fc6e8fb62463f9a877 is the first bad commit
commit 58c8c1b383bc3c286d6527fc6e8fb62463f9a877
Author: Andre Vieira 
Date:   Tue Apr 11 10:07:43 2023 +0100

if-conv: Restore MASK_CALL conversion [PR10]

caused

FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] 
[^\\n]* = foo\\.simdclone" 2

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-7135/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH, rs6000] xfail float128 comparison test case that fails on powerpc64 [PR108728]

2023-04-13 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/4/12 10:27, HAO CHEN GUI wrote:
> Hi,
>   This patch xfails a float128 comparison test case on powerpc64 that
> fails due to a longstanding issue with floating-point compares.
> 
>   See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more
> information.
> 
>   The patch passed regression test on Power Linux platforms.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: xfail float128 comparison test case that fails on powerpc64.
> 
> This patch xfails a float128 comparison test case on powerpc64 that
> fails due to a longstanding issue with floating-point compares.
> 
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more information.
> 
> gcc/testsuite/
>   PR target/108728
>   * gcc.dg/torture/float128-cmp-invalid.c: Add xfail.
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c 
> b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> index 1f675efdd61..f52686e0a24 100644
> --- a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> +++ b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> @@ -1,5 +1,5 @@
>  /* Test for "invalid" exceptions from __float128 comparisons.  */
> -/* { dg-do run } */
> +/* { dg-do run { xfail { powerpc*-*-* } } } */

xfail all powerpc*-*-* can have some XPASSes on those ENVs with
software emulation.  Since the related hw insn xscmpuqp is guarded
with TARGET_FLOAT128_HW, could we use the effective target
ppc_float128_hw instead?

Maybe something like:

/* { dg-xfail-run-if "unexpected xscmpuqp" { ppc_float128_hw } } */

BR,
Kewen

>  /* { dg-options "" } */
>  /* { dg-require-effective-target __float128 } */
>  /* { dg-require-effective-target base_quadfloat_support } */




Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-13 Thread Kewen.Lin via Gcc-patches
Hi Jeff,

on 2023/4/13 15:45, guojiufu wrote:
> Hi,
> 
> On 2023-04-12 20:47, Kewen.Lin wrote:
>> Hi Segher & Jeff,
>>
>> on 2023/4/11 23:13, Segher Boessenkool wrote:
>>> On Tue, Apr 11, 2023 at 05:40:09PM +0800, Kewen.Lin wrote:
 on 2023/4/11 17:14, guojiufu wrote:
> Thanks for raising this concern.
> The behavior to check about bif on FLOAT128_HW and emit an error message 
> for
> requirements on quad-precision is added in gcc12. This is why gcc12 fails 
> to
> compile the case on -m32.
>
> Before gcc12, altivec_resolve_overloaded_builtin will return the 
> overloaded
> result directly, and does not check more about the result function.

 Thanks for checking, I wonder which commit caused this behavior change and 
 what's
 the underlying justification?  I know there is one new bif handling 
 framework
>>
>> Answered this question by myself with some diggings, test case
>> float128-cmp2-runnable.c started to fail from r12-5752-gd08236359eb229 which
>> exactly makes new bif framework start to take effect and the reason why the
>> behavior changes is the condition change from **TARGET_P9_VECTOR** to
>> **TARGET_FLOAT128_HW**.
>>
>> With r12-5751-gc9dd01314d8467 (still old bif framework):
>>
>> $ grep -r scalar_cmp_exp_qp gcc/config/rs6000/rs6000-builtin.def
>> BU_P9V_VSX_2 (VSCEQPGT, "scalar_cmp_exp_qp_gt", CONST,  xscmpexpqp_gt_kf)
>> BU_P9V_VSX_2 (VSCEQPLT, "scalar_cmp_exp_qp_lt", CONST,  xscmpexpqp_lt_kf)
>> BU_P9V_VSX_2 (VSCEQPEQ, "scalar_cmp_exp_qp_eq", CONST,  xscmpexpqp_eq_kf)
>> BU_P9V_VSX_2 (VSCEQPUO, "scalar_cmp_exp_qp_unordered",  CONST,
>> xscmpexpqp_unordered_kf)
>> BU_P9V_OVERLOAD_2 (VSCEQPGT,    "scalar_cmp_exp_qp_gt")
>> BU_P9V_OVERLOAD_2 (VSCEQPLT,    "scalar_cmp_exp_qp_lt")
>> BU_P9V_OVERLOAD_2 (VSCEQPEQ,    "scalar_cmp_exp_qp_eq")
>> BU_P9V_OVERLOAD_2 (VSCEQPUO,    "scalar_cmp_exp_qp_unordered")
>>
>> There were only 13 bifs requiring TARGET_FLOAT128_HW in old bif framework.
>>
>> $ grep ^BU_FLOAT128_HW gcc/config/rs6000/rs6000-builtin.def
>> BU_FLOAT128_HW_VSX_1 (VSEEQP,   "scalar_extract_expq",  CONST,  xsxexpqp_kf)
>> BU_FLOAT128_HW_VSX_1 (VSESQP,   "scalar_extract_sigq",  CONST,  xsxsigqp_kf)
>> BU_FLOAT128_HW_VSX_1 (VSTDCNQP, "scalar_test_neg_qp",   CONST,  
>> xststdcnegqp_kf)
>> BU_FLOAT128_HW_VSX_2 (VSIEQP,   "scalar_insert_exp_q",  CONST,  xsiexpqp_kf)
>> BU_FLOAT128_HW_VSX_2 (VSIEQPF,  "scalar_insert_exp_qp", CONST,  xsiexpqpf_kf)
>> BU_FLOAT128_HW_VSX_2 (VSTDCQP, "scalar_test_data_class_qp", CONST,
>>  xststdcqp_kf)
>> BU_FLOAT128_HW_1 (SQRTF128_ODD,  "sqrtf128_round_to_odd",  FP, sqrtkf2_odd)
>> BU_FLOAT128_HW_1 (TRUNCF128_ODD, "truncf128_round_to_odd", FP, 
>> trunckfdf2_odd)
>> BU_FLOAT128_HW_2 (ADDF128_ODD,   "addf128_round_to_odd",   FP, addkf3_odd)
>> BU_FLOAT128_HW_2 (SUBF128_ODD,   "subf128_round_to_odd",   FP, subkf3_odd)
>> BU_FLOAT128_HW_2 (MULF128_ODD,   "mulf128_round_to_odd",   FP, mulkf3_odd)
>> BU_FLOAT128_HW_2 (DIVF128_ODD,   "divf128_round_to_odd",   FP, divkf3_odd)
>> BU_FLOAT128_HW_3 (FMAF128_ODD,   "fmaf128_round_to_odd",   FP, fmakf4_odd)
>>
>> Starting from r12-5752-gd08236359eb229, these
>> scalar_cmp_exp_qp_{gt,lt,eq,unordered}
>> bifs were put under stanza ieee128-hw, it makes ieee128-hw to have 17 bifs,
>> comparing to the previous, the extra four ones were exactly these
>> scalar_cmp_exp_qp_{gt,lt,eq,unordered}.
>>
 introduced in gcc12, not sure the checking condition was changed together 
 or by
 a standalone commit.  Anyway, apparently the conditions for the support of 
 these
 bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As 
 mentioned
 above, PR108758's c#1 said this case (bifs) work well on gcc-11, I 
 suspected the
 condition change was an overkill, that's why I asked.
>>>
>>> It almost certainly was an oversight.  The new builtin framework changed
>>> so many things, there was bound to be some breakage to go with all the
>>> good things it brought.
>>
>> Yeah, as the above findings, also I found that
>> r12-3126-g2ed356a4c9af06 introduced
>> power9 related stanzas and r12-3167-g2f9489a1009d98 introduced ieee128-hw 
>> stanza
>> including these four bifs, both of them don't have any notes on why we
>> would change
>> the condition for these scalar_cmp_exp_qp_{gt,lt,eq,unordered} from
>> power9-vector to
>> ieee128-hw, so I think it's just an oversight (ieee128-hw is an
>> overkill comparing
>> to power9-vector :)).
>>
>>>
>>> So what is the actual thing going wrong?  QP insns work fine and are
>>> valid on all systems and environments, BE or LE, 32-bit or 64-bit.  Of
>>> course you cannot use the "long double" type for those everywhere, but
>>> that is a very different thing.
>>
>> The actual thing going wrong is that: the test case float128-cmp2-runnable.c
>> runs well on BE -m32 and -m64 with gcc-11, but meets failures on BE -m32 with
>> latest gcc-12 and trunk during compilation, having the error messages like:
>>
>> 

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread Kewen.Lin via Gcc-patches
Hi Juzhe,

on 2023/4/12 21:22, 钟居哲 wrote:
> Thanks Kewen. 
> 
> It seems that this proposal WHILE_LEN can help s390 when using --param 
> vect-partial-vector-usage=2 compile option.
> 

Yeah, IMHO, the previous sequence vs. the proposed sequence are like:

int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
return 0;

  int iv = 0;
  int len = MIN (n, 16);
  int sum = 0;
  do
{
  sum += a[len] + b[len];
  iv += 16;
  int n1 = MIN (n, iv);   // line A
  int n2 = n - n1;
  len = MIN (n2, 16);
}
  while (n > iv);

  return sum;
}

vs.

int
foo (int *__restrict a, int *__restrict b, int n)
{
  if (n <= 0)
return 0;

  int len;
  int sum = 0;
  do
{
  len = MIN (n, 16);
  sum += a[len] + b[len];
  n -= len;
}
  while (n > 0);

  return sum;
}

it at least saves one MIN (at line A) and one length preparation in the
last iteration (it's useless since loop ends).  But I think the concern
that this proposed IV isn't recognized as simple iv may stay.  I tried
to compile the above source files on Power, the former can adopt doloop
optimization but the latter fails to.

> Would you mind applying this patch and supporting WHILE_LEN in the s390 
> backend, to test it and see the overall benefits for s390
> as well as the correctness of this sequence? 

Sure, if all of you think this approach and this revision are good enough to 
go forward for this kind of evaluation,
I'm happy to give it a shot, but only for rs6000. ;-)  I noticed that there 
are some discussions on withdrawing this
WHILE_LEN in favor of MIN_EXPR instead; I'll stay tuned.

btw, now we only adopt vector with length on the epilogues rather than the main 
vectorized loops, because of the
non-trivial extra costs for length preparation than just using the normal 
vector load/store (all lanes), so we don't
care about the performance with --param vect-partial-vector-usage=2 much.  Even 
if this new proposal can optimize
the length preparation for --param vect-partial-vector-usage=2, the extra costs 
for length preparation is still
unavoidable (MIN, shifting, one more GPR used), we would still stay with 
default --param vect-partial-vector-usage=1
(which can't benefit from this new proposal).

BR,
Kewen


Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-13 Thread guojiufu via Gcc-patches

Hi,

On 2023-04-12 20:47, Kewen.Lin wrote:


Re: [PATCH] testsuite: update requires for powerpc/float128-cmp2-runnable.c

2023-04-13 Thread Kewen.Lin via Gcc-patches
on 2023/4/12 20:47, Kewen.Lin wrote:
> Hi Segher & Jeff,
> 
> on 2023/4/11 23:13, Segher Boessenkool wrote:
>> On Tue, Apr 11, 2023 at 05:40:09PM +0800, Kewen.Lin wrote:
>>> on 2023/4/11 17:14, guojiufu wrote:
 Thanks for raising this concern.
 The behavior to check about bif on FLOAT128_HW and emit an error message 
 for
 requirements on quad-precision is added in gcc12. This is why gcc12 fails 
 to
 compile the case on -m32.

 Before gcc12, altivec_resolve_overloaded_builtin will return the overloaded
 result directly, and does not check more about the result function.
>>>
>>> Thanks for checking, I wonder which commit caused this behavior change and 
>>> what's
>>> the underlying justification?  I know there is one new bif handling 
>>> framework
> 
> Answered this question by myself with some diggings, test case
> float128-cmp2-runnable.c started to fail from r12-5752-gd08236359eb229 which
> exactly makes new bif framework start to take effect and the reason why the
> behavior changes is the condition change from **TARGET_P9_VECTOR** to
> **TARGET_FLOAT128_HW**.
> 
> With r12-5751-gc9dd01314d8467 (still old bif framework):
> 
> $ grep -r scalar_cmp_exp_qp gcc/config/rs6000/rs6000-builtin.def
> BU_P9V_VSX_2 (VSCEQPGT, "scalar_cmp_exp_qp_gt", CONST,  xscmpexpqp_gt_kf)
> BU_P9V_VSX_2 (VSCEQPLT, "scalar_cmp_exp_qp_lt", CONST,  xscmpexpqp_lt_kf)
> BU_P9V_VSX_2 (VSCEQPEQ, "scalar_cmp_exp_qp_eq", CONST,  xscmpexpqp_eq_kf)
> BU_P9V_VSX_2 (VSCEQPUO, "scalar_cmp_exp_qp_unordered",  CONST,  
> xscmpexpqp_unordered_kf)
> BU_P9V_OVERLOAD_2 (VSCEQPGT,"scalar_cmp_exp_qp_gt")
> BU_P9V_OVERLOAD_2 (VSCEQPLT,"scalar_cmp_exp_qp_lt")
> BU_P9V_OVERLOAD_2 (VSCEQPEQ,"scalar_cmp_exp_qp_eq")
> BU_P9V_OVERLOAD_2 (VSCEQPUO,"scalar_cmp_exp_qp_unordered")
> 
> There were only 13 bifs requiring TARGET_FLOAT128_HW in old bif framework.
> 
> $ grep ^BU_FLOAT128_HW gcc/config/rs6000/rs6000-builtin.def
> BU_FLOAT128_HW_VSX_1 (VSEEQP,   "scalar_extract_expq",  CONST,  xsxexpqp_kf)
> BU_FLOAT128_HW_VSX_1 (VSESQP,   "scalar_extract_sigq",  CONST,  xsxsigqp_kf)
> BU_FLOAT128_HW_VSX_1 (VSTDCNQP, "scalar_test_neg_qp",   CONST,  
> xststdcnegqp_kf)
> BU_FLOAT128_HW_VSX_2 (VSIEQP,   "scalar_insert_exp_q",  CONST,  xsiexpqp_kf)
> BU_FLOAT128_HW_VSX_2 (VSIEQPF,  "scalar_insert_exp_qp", CONST,  xsiexpqpf_kf)
> BU_FLOAT128_HW_VSX_2 (VSTDCQP, "scalar_test_data_class_qp", CONST,  
> xststdcqp_kf)
> BU_FLOAT128_HW_1 (SQRTF128_ODD,  "sqrtf128_round_to_odd",  FP, sqrtkf2_odd)
> BU_FLOAT128_HW_1 (TRUNCF128_ODD, "truncf128_round_to_odd", FP, trunckfdf2_odd)
> BU_FLOAT128_HW_2 (ADDF128_ODD,   "addf128_round_to_odd",   FP, addkf3_odd)
> BU_FLOAT128_HW_2 (SUBF128_ODD,   "subf128_round_to_odd",   FP, subkf3_odd)
> BU_FLOAT128_HW_2 (MULF128_ODD,   "mulf128_round_to_odd",   FP, mulkf3_odd)
> BU_FLOAT128_HW_2 (DIVF128_ODD,   "divf128_round_to_odd",   FP, divkf3_odd)
> BU_FLOAT128_HW_3 (FMAF128_ODD,   "fmaf128_round_to_odd",   FP, fmakf4_odd)
> 
> Starting from r12-5752-gd08236359eb229, these 
> scalar_cmp_exp_qp_{gt,lt,eq,unordered}
> bifs were put under stanza ieee128-hw, it makes ieee128-hw to have 17 bifs,
> comparing to the previous, the extra four ones were exactly these
> scalar_cmp_exp_qp_{gt,lt,eq,unordered}.
> 
>>> introduced in gcc12, not sure the checking condition was changed together 
>>> or by
>>> a standalone commit.  Anyway, apparently the conditions for the support of 
>>> these
>>> bifs are different on gcc-11 and gcc-12, I wonder why it changed.  As 
>>> mentioned
>>> above, PR108758's c#1 said this case (bifs) work well on gcc-11, I 
>>> suspected the
>>> condition change was an overkill, that's why I asked.
>>
>> It almost certainly was an oversight.  The new builtin framework changed
>> so many things, there was bound to be some breakage to go with all the
>> good things it brought.
> 
> Yeah, in line with the findings above, I also found that r12-3126-g2ed356a4c9af06
> introduced the power9 related stanzas and r12-3167-g2f9489a1009d98 introduced the
> ieee128-hw stanza including these four bifs.  Neither of them has any notes on
> why we would change the condition for these scalar_cmp_exp_qp_{gt,lt,eq,unordered}
> from power9-vector to ieee128-hw, so I think it's just an oversight (ieee128-hw
> is an overkill compared to power9-vector :)).
> 
>>
>> So what is the actual thing going wrong?  QP insns work fine and are
>> valid on all systems and environments, BE or LE, 32-bit or 64-bit.  Of
>> course you cannot use the "long double" type for those everywhere, but
>> that is a very different thing.
> 
> The actual thing going wrong is that the test case float128-cmp2-runnable.c
> runs well on BE -m32 and -m64 with gcc-11, but fails to compile on BE -m32
> with the latest gcc-12 and trunk, with error messages like:
> 
> gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: In function 'main':
> gcc/testsuite/gcc.target/powerpc/float128-cmp2-runnable.c:155:3: 

Re: [PATCH] testsuite: filter out warning noise for CWE-1341 test

2023-04-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Apr 2023, Jiufu Guo wrote:

> 
> Add more reviewers. :)
> 
> Jiufu Guo  writes:
> 
> > Hi,
> >
> > The test case file-CWE-1341-example.c checks [CWE-1341] (`double-fclose`).
> > On some systems, however, a [CWE-415] message is reported in addition to
> > [CWE-1341].  On those systems, the attribute `malloc` may be attached to
> > fopen:
> > ```
> > # 258 "/usr/include/stdio.h" 3 4
> > extern FILE *fopen (const char *__restrict __filename,
> >   const char *__restrict __modes)
> >   __attribute__ ((__malloc__)) __attribute__ ((__malloc__ (fclose, 1))) ;

Ouch.

I think this should be fixed in the analyzer, "stripping" malloc
tracking from fopen/fclose since it does this manually.  I've adjusted
the bug accordingly.

The workaround in the testsuite is OK for trunk.

Thanks,
Richard.

> > or say: __attribute_malloc__ __attr_dealloc_fclose __wur;
> > ```
> >
> > It would be ok to suppress messages other than CWE-1341 for this case.
> > This patch adds -Wno-analyzer-double-free to make this case pass on
> > those systems.
> >
> > Tested on ppc64 both BE and LE.
> > Is this ok for trunk?
> >
> > BR,
> > Jeff (Jiufu)
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/108722
> > * gcc.dg/analyzer/file-CWE-1341-example.c: Update.
> >
> > ---
> >  gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c 
> > b/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
> > index 2add3cb109b..830cb0376ea 100644
> > --- a/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
> > +++ b/gcc/testsuite/gcc.dg/analyzer/file-CWE-1341-example.c
> > @@ -19,6 +19,9 @@
> >  
> > IN NO EVENT SHALL THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 
> > OR IS SPONSORED BY (IF ANY), THE MITRE CORPORATION, ITS BOARD OF TRUSTEES, 
> > OFFICERS, AGENTS, AND EMPLOYEES BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
> > LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> > FROM, OUT OF OR IN CONNECTION WITH THE INFORMATION OR THE USE OR OTHER 
> > DEALINGS IN THE CWE.  */
> >  
> > +/* This case checks double-fclose only, suppress other warning.  */
> > +/* { dg-additional-options -Wno-analyzer-double-free } */
> > +
> >  #include 
> >  #include 
> >  #include 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-13 Thread Richard Biener via Gcc-patches
On Wed, 12 Apr 2023, ??? wrote:

> >> It's not so much that we need to do that.  But normally it's only worth
> >> adding internal functions if they do something that is too complicated
> >> to express in simple gimple arithmetic.  The UQDEC case I mentioned:
> 
> >>z = MAX (x, y) - y
> 
> >> fell into the "simple arithmetic" category for me.  We could have added
> >> an ifn for unsigned saturating decrement, but it didn't seem complicated
> >> enough to merit its own ifn.
> 
> Ah, I know your concern.  I should admit that WHILE_LEN is a simple
> arithmetic operation that just takes its result from
> 
> min (remain, vf).
> 
> A possible solution is to just use MIN_EXPR (remain, vf), then add special
> handling in the umin_optab pattern to recognize "vf" in the backend, and
> finally generate vsetvl in the RISC-V backend.
> 
> The "vf" would be recognized by checking that the operand of umin is a
> const_int/const_poly_int operand; otherwise, just generate a scalar umin
> instruction.
> 
> However, there are cases where I can't tell whether umin should generate
> vsetvl or a scalar umin, such as the following:
> int32_t foo (int32_t a)
> {
>   return min (a, 4);
> }
> 
> In this case I should generate:
> li a1,4
> umin a1,a0,a1
> 
> instead of generating vsetvl
> 
> However, in this case:
> 
> void foo (int32_t *a...)
> for (int i = 0; i < n; i++)
>   a[i] = b[i] + c[i];
> 
> with -mriscv-vector-bits=128 (which means each vector can handle 4 INT32),
> the VF will be 4 too.  If we use MIN_EXPR instead of WHILE_LEN:
> 
> ...
> len = MIN_EXPR (n,4)
> v = len_load (len)
> 
> ...
> 
> In this case, MIN_EXPR should emit vsetvl.
> 
> It's hard for me to tell the difference between these 2 cases...

But the issue is the same in the reverse with WHILE_LEN, no?
WHILE_LEN just computes a scalar value - you seem to suggest
there's a hidden side-effect of "coalescing" the result with
a hardware vector length register?  I don't think that's good design.

IMHO tying the scalar result with the uses has to be done where
you emit the other vsetvl instructions.

One convenient thing we have with WHILE_LEN is that it is a key
for the vectorizer to query target capabilities (and preferences).
But of course collecting whether stmts can be vectorized
with length and/or with mask would be better.

Richard.

> CC RISC-V port backend maintainer: Kito.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-04-12 20:24
> To: juzhe.zhong\@rivai.ai
> CC: rguenther; gcc-patches; jeffreyalaw; rdapp; linkw
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
> auto-vectorization
> "juzhe.zh...@rivai.ai"  writes:
> >>> I think that already works for them (could be misremembering).
> >>> However, IIUC, they have no special instruction to calculate the
> >>> length (unlike for RVV), and so it's open-coded using vect_get_len.
> >
> > Yeah, the current flow using min, sub, and then min in vect_get_len
> > is working for IBM.  But I wonder whether switching the current
> > length-loop-control flow to the WHILE_LEN pattern in this patch could
> > improve their performance.
> >
> >>> (1) How easy would it be to express WHILE_LEN in normal gimple?
> >>> I haven't thought about this at all, so the answer might be
> >>> "very hard".  But it reminds me a little of UQDEC on AArch64,
> >>> which we open-code using MAX_EXPR and MINUS_EXPR (see
> >>> vect_set_loop_controls_directly).
> >>>
> >>> I'm not saying WHILE_LEN is the same operation, just that it seems
> >>> like it might be open-codeable in a similar way.
> >>>
> >>> Even if we can open-code it, we'd still need some way for the
> >>> target to select the "RVV way" from the "s390/PowerPC way".
> >
> > WHILE_LEN as I define it in the doc is
> >
> > operand0 = MIN (operand1, operand2)
> >
> > operand1 is the residual number of scalar elements that still need to be
> > processed.  operand2 is the vectorization factor (vf) for a single rgroup;
> > for multiple rgroups, operand2 = vf * nitems_per_ctrl.
> >
> > You mean such a pattern is not well expressed, so we need to replace it
> > with normal tree code (MIN or MAX) and let the RISC-V backend optimize
> > it into vsetvl?  Sorry, maybe I am not on the same page.
>  
> It's not so much that we need to do that.  But normally it's only worth
> adding internal functions if they do something that is too complicated
> to express in simple gimple arithmetic.  The UQDEC case I mentioned:
>  
>z = MAX (x, y) - y
>  
> fell into the "simple arithmetic" category for me.  We could have added
> an ifn for unsigned saturating decrement, but it didn't seem complicated
> enough to merit its own ifn.
>  
> >>> (2) What effect does using a variable IV step (the result of
> >>> the WHILE_LEN) have on ivopts?  I remember experimenting with
> >>> something similar once (can't remember the context) and not
> >>> having a constant step prevented ivopts from making good
> >>> addresing-mode choices.
> >
> > Thank you so much for pointing out 

Re: [Patch, fortran] PR109451 - ICE in gfc_conv_expr_descriptor with ASSOCIATE and substrings

2023-04-13 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

That's interesting - the string length of 'q' is not set for either of the
associate blocks. I'm onto it.

Thanks

Paul


On Wed, 12 Apr 2023 at 20:26, Harald Anlauf  wrote:

> Hi Paul,
>
> On 4/12/23 17:25, Paul Richard Thomas via Gcc-patches wrote:
> > Hi All,
> >
> > I think that the changelog says it all. OK for mainline?
>
> this looks almost fine, but still fails if one directly uses the
> dummy argument as the ASSOCIATE target, as in:
>
> program p
>implicit none
>character(4) :: c(2) = ["abcd","efgh"]
>call dcs0 (c)
> ! call dcs0 (["abcd","efgh"])
> contains
>subroutine dcs0(a)
>  character(len=*), intent(in) :: a(:)
>  print *, size(a),len(a)
>  associate (q => a(:))
>print *, size(q),len(q)
>  end associate
>  associate (q => a(:)(:))
>print *, size(q),len(q)
>  end associate
>  return
>end subroutine dcs0
> end
>
> This prints e.g.
>
> 2   4
> 2   0
> 2   0
>
> (sometimes I also get junk values for the character length).
>
> Can you please have another look?
>
> Thanks,
> Harald
>
>
> > Paul
> >
> > Fortran: Fix some deferred character problems in associate [PR109451]
> >
> > 2023-04-07  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/109451
> > * trans-array.cc (gfc_conv_expr_descriptor): Guard expression
> > character length backend decl before using it. Suppress the
> > assignment if lhs equals rhs.
> > * trans-io.cc (gfc_trans_transfer): Scalarize transfer of
> > associate variables pointing to a variable. Add comment.
> >
> >
> > gcc/testsuite/
> > PR fortran/109451
> > * gfortran.dg/associate_61.f90: New test.
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein