date:20230816

[PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

2023-08-16 Thread Juzhe-Zhong

void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}

zve64f:
foo:
vsetivli zero,1,e16,mf4,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vsetvli zero,zero,e64,m1,ta,ma
vadd.vv v1,v1,v1

zve64d:
foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vadd.vv v1,v1,v1

PR target/111037

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (float_insn_valid_sew_p): New function.
(second_sew_less_than_first_sew_p): Fix bug.
(first_sew_less_than_second_sew_p): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111037-1.c: New test.
* gcc.target/riscv/rvv/base/pr111037-2.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 22 +--
 .../gcc.target/riscv/rvv/base/pr111037-1.c| 15 +
 .../gcc.target/riscv/rvv/base/pr111037-2.c|  8 +++
 3 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 08c487d82c0..79cbac01047 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1183,18 +1183,36 @@ second_ratio_invalid_for_first_lmul_p (const vector_insn_info &info1,
   return calculate_sew (info1.get_vlmul (), info2.get_ratio ()) == 0;
 }
 
+static bool
+float_insn_valid_sew_p (const vector_insn_info &info, unsigned int sew)
+{
+  if (info.get_insn () && info.get_insn ()->is_real ()
+  && get_attr_type (info.get_insn ()->rtl ()) == TYPE_VFMOVFV)
+{
+  if (sew == 16)
+   return TARGET_VECTOR_ELEN_FP_16;
+  else if (sew == 32)
+   return TARGET_VECTOR_ELEN_FP_32;
+  else if (sew == 64)
+   return TARGET_VECTOR_ELEN_FP_64;
+}
+  return true;
+}
+
 static bool
 second_sew_less_than_first_sew_p (const vector_insn_info &info1,
                                   const vector_insn_info &info2)
 {
-  return info2.get_sew () < info1.get_sew ();
+  return info2.get_sew () < info1.get_sew ()
+|| !float_insn_valid_sew_p (info1, info2.get_sew ());
 }
 
 static bool
 first_sew_less_than_second_sew_p (const vector_insn_info &info1,
                                   const vector_insn_info &info2)
 {
-  return info1.get_sew () < info2.get_sew ();
+  return info1.get_sew () < info2.get_sew ()
+|| !float_insn_valid_sew_p (info2, info1.get_sew ());
 }
 
 /* return 0 if LMUL1 == LMUL2.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
new file mode 100644
index 000..0b7b32fc3e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zve64f_zvfh -mabi=ilp32d -O3" } */
+
+#include "riscv_vector.h"
+
+void foo(_Float16 y, int64_t *i64p)
+{
+  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
+  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
+  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
+  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
+}
+
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c
new file mode 100644
index 000..ac50da71726
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zve64d_zvfh -mabi=ilp32d -O3" } */
+
+#include "pr111037-1.c"
+
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*1,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli} } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
-- 
2.36.3
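
As a rough illustration of the check this patch introduces, here is a standalone C sketch of the decision made by float_insn_valid_sew_p, written outside of GCC. The fp_elen struct and the is_float_scalar_move flag are hypothetical stand-ins for the TARGET_VECTOR_ELEN_FP_* macros and the TYPE_VFMOVFV attribute test; only the decision logic follows the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for TARGET_VECTOR_ELEN_FP_{16,32,64}:
   which floating-point element widths the target supports.  */
struct fp_elen
{
  bool fp16;
  bool fp32;
  bool fp64;
};

/* Model of the new predicate: a floating-point scalar move may only
   run under a SEW for which the target has FP support.  Any other
   instruction, and any other SEW value, is accepted unchanged.  */
static bool
float_move_valid_sew (bool is_float_scalar_move, unsigned int sew,
                      struct fp_elen target)
{
  if (is_float_scalar_move)
    {
      if (sew == 16)
        return target.fp16;
      else if (sew == 32)
        return target.fp32;
      else if (sew == 64)
        return target.fp64;
    }
  return true;
}
```

With FP16 and FP32 support only (as with zve64f plus zvfh), SEW 64 is rejected for the vfmv.s.f, which corresponds to the separate e16 and e64 vsetvl instructions in the zve64f output above. With FP64 support (zve64d), SEW 64 is accepted and a single e64 setting covers the whole sequence.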

Re: [V2][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-08-16 Thread Kees Cook via Gcc-patches

On Fri, Aug 04, 2023 at 07:44:28PM +, Qing Zhao wrote:
> This is the 2nd version of the patch, per our discussion based on the
> review comments for the 1st version, the major changes in this version

I've been using Coccinelle to find and annotate[1] structures (193 so
far...), and I've encountered 2 cases of GCC internal errors. I'm working
on a minimized test case, but just in case these details are immediately
helpful, here's what I'm seeing:

../drivers/net/wireless/ath/wcn36xx/smd.c: In function 'wcn36xx_smd_rsp_process':
../drivers/net/wireless/ath/wcn36xx/smd.c:3299:5: error: incorrect sharing of tree nodes
 3299 | int wcn36xx_smd_rsp_process(struct rpmsg_device *rpdev,
  | ^~~
MEM[(struct wcn36xx_hal_ind_msg *)_96]
_15 = &MEM[(struct wcn36xx_hal_ind_msg *)_96].msg;
during GIMPLE pass: objsz
../drivers/net/wireless/ath/wcn36xx/smd.c:3299:5: internal compiler error: verify_gimple failed
0xfe97fd verify_gimple_in_cfg(function*, bool, bool)
../../../../gcc/gcc/tree-cfg.cc:5646
0xe84894 execute_function_todo
../../../../gcc/gcc/passes.cc:2088
0xe84dee execute_todo
../../../../gcc/gcc/passes.cc:2142

The associated struct is:

struct wcn36xx_hal_ind_msg {
struct list_head list;
size_t msg_len;
u8 msg[] __counted_by(msg_len);
};



And:

../drivers/usb/gadget/function/f_fs.c: In function '__ffs_epfile_read_data':
../drivers/usb/gadget/function/f_fs.c:900:16: error: incorrect sharing of tree nodes
  900 | static ssize_t __ffs_epfile_read_data(struct ffs_epfile *epfile,
  |^~
MEM[(struct ffs_buffer *)_67]
_5 = &MEM[(struct ffs_buffer *)_67].storage;
during GIMPLE pass: objsz
../drivers/usb/gadget/function/f_fs.c:900:16: internal compiler error: verify_gimple failed
0xfe97fd verify_gimple_in_cfg(function*, bool, bool)
../../../../gcc/gcc/tree-cfg.cc:5646
0xe84894 execute_function_todo
../../../../gcc/gcc/passes.cc:2088
0xe84dee execute_todo
../../../../gcc/gcc/passes.cc:2142

with:

struct ffs_buffer {
size_t length;
char *data;
char storage[] __counted_by(length);
};


[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

-- 
Kees Cook
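
For readers unfamiliar with the annotation, the following is a minimal sketch of a flexible array member tied to its length field. It is not a reproducer for the ICE. The COUNTED_BY macro, struct msg_buf, and msg_buf_alloc are hypothetical names for illustration; the macro expands to nothing on compilers that do not provide the attribute under review in this thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only where the compiler reports support for it;
   otherwise compile the annotation away.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

/* Simplified stand-in for the kernel structures quoted above.  */
struct msg_buf
{
  size_t msg_len;
  unsigned char msg[] COUNTED_BY(msg_len);
};

/* Allocate a buffer with room for LEN payload bytes and record the
   count before the array is ever accessed.  */
static struct msg_buf *
msg_buf_alloc (size_t len)
{
  struct msg_buf *p = malloc (sizeof (*p) + len);
  if (p)
    p->msg_len = len;
  return p;
}
```

The count is set immediately after allocation so that any bounds information derived from msg_len is valid by the time msg is used.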

Re: [PATCH ver 2] rs6000, add overloaded DFP quantize support

2023-08-16 Thread Kewen.Lin via Gcc-patches

on 2023/8/17 11:11, Peter Bergner wrote:
> On 8/16/23 7:19 PM, Carl Love wrote:
>> +(define_insn "dfp_dquan_"
>> +  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
>> +(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
>> +  (match_operand:DDTD 2 "gpc_reg_operand" "d")
>> +  (match_operand:QI 3 "immediate_operand" "i")]
>> + UNSPEC_DQUAN))]
>> +  "TARGET_DFP"
>> +  "dqua %0,%1,%2,%3"
>> +  [(set_attr "type" "dfp")
>> +   (set_attr "size" "")])
> 
> operand 3 refers to the RMC operand field of the insn we are emitting.
> RMC is a two bit unsigned operand, so I think the predicate should be
> const_0_to_3_operand rather than immediate_operand.  It's always best
> to use a tighter predicate if we have one. Ditto for the other patterns
> with an RMC operand.

Good point!  I agree it's better to use a suitable tighter predicate here,
even if for now it's only used for bif expanding and the bif prototype
already restricts it.

> 
> I don't think we allow anything other than an integer for that operand
> value, so I _think_ that "n" is probably a better constraint than "i"?
> Ke Wen/Segher???

Yeah, I agree "n" is better for this context, it better matches your
proposed const_0_to_3_operand/s5bit_cint_operand (const_int).

BR,
Kewen
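
Since RMC is a two-bit unsigned field, the only encodable values are 0 through 3. A tighter predicate amounts to the range test below, shown as plain C. The function name is hypothetical and is not part of GCC.

```c
#include <stdbool.h>

/* True when VALUE can be encoded in a two-bit unsigned field.  */
static bool
rmc_fits_two_bits (long value)
{
  return value >= 0 && value <= 3;
}
```

A bare immediate_operand accepts values this test rejects, which is why the tighter predicate is preferable even when the builtin prototype already restricts the argument.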

Re: [PATCH v2] RISCV: Add rotate immediate regression test

2023-08-16 Thread Jeff Law via Gcc-patches

On 8/16/23 19:17, Patrick O'Neill wrote:

This adds new regression tests to ensure half-register rotations are
correctly optimized into rori instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zbb-rol-ror-08.c: New test.
* gcc.target/riscv/zbb-rol-ror-09.c: New test.

Co-authored-by: Charlie Jenkins 
Signed-off-by: Patrick O'Neill 

OK
jeff
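
For context, a half-register rotation on a 64-bit target is a rotate by 32 bits. The sketch below shows the kind of source idiom involved; it assumes a 64-bit operand, the function name is hypothetical, and it is not the content of the new tests.

```c
#include <stdint.h>

/* Rotate a 64-bit value by half its width.  Rotating left or right
   by 32 gives the same result, so one rotate-immediate suffices.  */
static uint64_t
rotate_half_u64 (uint64_t x)
{
  return (x >> 32) | (x << 32);
}
```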

Re: [PATCH] RISC-V: Support simplify (-1-x) for vector.

2023-08-16 Thread Jeff Law via Gcc-patches

On 8/16/23 02:40, yanzhang.wang--- via Gcc-patches wrote:

From: Yanzhang Wang 

The pattern is enabled for scalar but not for vector. The patch tries to
make it consistent and converts the code below,

shortcut_for_riscv_vrsub_case_1_32:
 vl1re32.v   v1,0(a1)
 vsetvli zero,a2,e32,m1,ta,ma
 vrsub.vi    v1,v1,-1
 vs1r.v  v1,0(a0)
 ret

to,

shortcut_for_riscv_vrsub_case_1_32:
 vl1re32.v   v1,0(a1)
 vsetvli zero,a2,e32,m1,ta,ma
 vnot.v  v1,v1
 vs1r.v  v1,0(a0)
 ret

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
 Get -1 with mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/simplify-vrsub.c: New test.

Just a note.  It is customary to indicate what testing you did for each 
patch.  A patch which changes target independent code should be 
bootstrapped and regression tested on at least one major target (most 
folks use x86_64 or aarch64).


If you change target code it is customary to run the testsuite on that 
target.  Ideally that would include a bootstrap and regression test, but 
that's not always possible (cross compilers) in which case you just 
build the toolchain and run the cross tests.


I went ahead and bootstrapped & regression tested this on 
x86_64-linux-gnu where it passed without regressions.


I'll push this to the trunk.

Thanks,
jeff
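
The simplification relies on the two's-complement identity -1 - x == ~x, which is why vrsub.vi with -1 can become vnot.v. A scalar C sketch of the identity follows; the function names are hypothetical and the vector case applies the same rule per element.

```c
#include <stdint.h>

/* The form written in the source: reverse-subtract from -1.  */
static int32_t
minus_one_minus_x (int32_t x)
{
  return -1 - x;
}

/* The form it simplifies to: bitwise complement.  */
static int32_t
bitwise_not (int32_t x)
{
  return ~x;
}
```

Neither function can overflow: for every int32_t input, -1 - x stays within the int32_t range.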

Re: [PATCH v3] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-16 Thread Xi Ruoyao via Gcc-patches

On Tue, 2023-08-15 at 20:03 +, Joseph Myers wrote:
> On Tue, 15 Aug 2023, chenxiaolong wrote:
> 
> > In the implementation process, the "q" suffix functions are
> > re-registered and the "__float128" type is associated with the
> > "long double" type so that the compiler can handle the
> > corresponding functions correctly. The functions implemented
> > include __builtin_{huge_valq, infq, fabsq, copysignq, nanq, nansq}.
> > On the LoongArch architecture, __builtin_{fabsq,copysignq} can
> > be implemented with the instruction "bstrins.d", which yields
> > optimal code.
> 
> Why?  If long double has binary128 format, you shouldn't need any of these 
> functions at all; if it doesn't, just the C23 _Float128 type name and f128 
> constant suffix, and associated built-in functions defined in 
> builtins.def, should suffice (and since we now have _FloatN support for 
> C++, C++ no longer provides a reason for adding __float128 either).  
> __float128 is a legacy type name and feature and shouldn't be needed on 
> any new architectures, which can just use the standard type name from the 
> start.

For _Float128 GCC already does the correct thing:

_Float128 g(_Float128 x) { return __builtin_fabsf128(x); }

compiled to (with -O2):

g:
.LFB3 = .
.cfi_startproc
bstrpick.d  $r5,$r5,62,0
jr  $r1
.cfi_endproc

So I guess we just need

builtin_define ("__builtin_fabsq=__builtin_fabsf128");
builtin_define ("__builtin_nanq=__builtin_nanf128");

etc. to map the "q" builtins to "f128" builtins if we really need the
"q" builtins.

Joseph: the problem here is many customers of LoongArch CPUs wish to
compile their old code with minimal change.  Is it acceptable to add
these builtin_define's like rs6000-c.cc?  Note "a new architecture" does
not mean we'll only compile post-C2x-era programs onto it.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University
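
The bstrpick.d in the output above keeps bits 62..0 of one 64-bit register, which for a binary128 value held in a register pair should be the upper half, so the effect is to clear the sign bit. A portable C sketch of that bit operation on an explicit pair of halves is below. The struct and function names are hypothetical, and the sketch makes no claim about how GCC represents the value internally.

```c
#include <stdint.h>

/* A binary128 value viewed as two 64-bit halves; the sign bit is
   the top bit of the upper half.  */
struct binary128_bits
{
  uint64_t lo;
  uint64_t hi;
};

/* Absolute value by clearing the sign bit, leaving the exponent and
   significand untouched.  */
static struct binary128_bits
fabs_bits (struct binary128_bits v)
{
  v.hi &= UINT64_C (0x7fffffffffffffff);
  return v;
}
```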

Re: [PATCH v1] RISC-V: Support RVV VFREDUSUM.VS rounding mode intrinsic API

2023-08-16 Thread Kito Cheng via Gcc-patches

Lgtm

On Thu, Aug 17, 2023 at 11:09, Pan Li via Gcc-patches wrote:

> From: Pan Li 
>
> This patch would like to support the rounding mode API for the
> VFREDUSUM.VS as the below samples.
>
> * __riscv_vfredusum_vs_f32m1_f32m1_rm
> * __riscv_vfredusum_vs_f32m1_f32m1_rm_m
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc
> (class freducop): Add frm_op_type template arg.
> (vfredusum_frm_obj): New declaration.
> (BASE): Ditto.
> * config/riscv/riscv-vector-builtins-bases.h: Ditto.
> * config/riscv/riscv-vector-builtins-functions.def
> (vfredusum_frm): New intrinsic function def.
> * config/riscv/riscv-vector-builtins-shapes.cc
> (struct reduc_alu_frm_def): New class for frm shape.
> (SHAPE): New declaration.
> * config/riscv/riscv-vector-builtins-shapes.h: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-redusum.c: New test.
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  |  9 -
>  .../riscv/riscv-vector-builtins-bases.h   |  1 +
>  .../riscv/riscv-vector-builtins-functions.def |  2 +
>  .../riscv/riscv-vector-builtins-shapes.cc | 39 +++
>  .../riscv/riscv-vector-builtins-shapes.h  |  1 +
>  .../riscv/rvv/base/float-point-redusum.c  | 33 
>  6 files changed, 84 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-redusum.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index ad04647f9ba..65f1d9c8ff7 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -1847,10 +1847,15 @@ public:
>  };
>
>  /* Implements floating-point reduction instructions.  */
> -template
> +template
>  class freducop : public function_base
>  {
>  public:
> +  bool has_rounding_mode_operand_p () const override
> +  {
> +return FRM_OP == HAS_FRM;
> +  }
> +
>bool apply_mask_policy_p () const override { return false; }
>
>rtx expand (function_expander &e) const override
> @@ -2532,6 +2537,7 @@ static CONSTEXPR const reducop vredxor_obj;
>  static CONSTEXPR const widen_reducop vwredsum_obj;
>  static CONSTEXPR const widen_reducop vwredsumu_obj;
>  static CONSTEXPR const freducop vfredusum_obj;
> +static CONSTEXPR const freducop
> vfredusum_frm_obj;
>  static CONSTEXPR const freducop vfredosum_obj;
>  static CONSTEXPR const reducop vfredmax_obj;
>  static CONSTEXPR const reducop vfredmin_obj;
> @@ -2789,6 +2795,7 @@ BASE (vredxor)
>  BASE (vwredsum)
>  BASE (vwredsumu)
>  BASE (vfredusum)
> +BASE (vfredusum_frm)
>  BASE (vfredosum)
>  BASE (vfredmax)
>  BASE (vfredmin)
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index c8c649c4bb0..fd1a84f3e68 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -239,6 +239,7 @@ extern const function_base *const vredxor;
>  extern const function_base *const vwredsum;
>  extern const function_base *const vwredsumu;
>  extern const function_base *const vfredusum;
> +extern const function_base *const vfredusum_frm;
>  extern const function_base *const vfredosum;
>  extern const function_base *const vfredmax;
>  extern const function_base *const vfredmin;
> diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
> index cfbc125dcd8..90a83c02d52 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-functions.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
> @@ -500,6 +500,8 @@ DEF_RVV_FUNCTION (vfredosum, reduc_alu, no_mu_preds, f_vs_ops)
>  DEF_RVV_FUNCTION (vfredmax, reduc_alu, no_mu_preds, f_vs_ops)
>  DEF_RVV_FUNCTION (vfredmin, reduc_alu, no_mu_preds, f_vs_ops)
>
> +DEF_RVV_FUNCTION (vfredusum_frm, reduc_alu_frm, no_mu_preds, f_vs_ops)
> +
>  // 14.4. Vector Widening Floating-Point Reduction Instructions
>  DEF_RVV_FUNCTION (vfwredosum, reduc_alu, no_mu_preds, wf_vs_ops)
>  DEF_RVV_FUNCTION (vfwredusum, reduc_alu, no_mu_preds, wf_vs_ops)
> diff --git a/gcc/config/riscv/riscv-vector-builtins-shapes.cc b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> index 80329113af3..f8fdec863e6 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-shapes.cc
> @@ -371,6 +371,44 @@ struct narrow_alu_frm_def : public build_frm_base
>}
>  };
>
> +/* reduc_alu_frm_def class.  */
> +struct reduc_alu_frm_def : public build_frm_base
> +{
> +  char *get_name (function_builder &b, const function_instance &instance,
> + bool overloaded_p) const override
> +  {
> +char base_name[BASE_NAME_MAX_LEN] = {};
> +
> +normalize_base_name (base_name, instance.base_name, sizeof (base_name));
> +
> +b.append_base_name

Re: [PATCH v1] RISC-V: Support RVV VFNCVT.F.{X|XU|F}.W rounding mode intrinsic API

2023-08-16 Thread Kito Cheng via Gcc-patches

Lgtm

On Thu, Aug 17, 2023 at 10:19, Pan Li via Gcc-patches wrote:

> From: Pan Li 
>
> This patch would like to support the rounding mode API for the
> VFNCVT.F.{X|XU|F}.W as the below samples.
>
> * __riscv_vfncvt_f_x_w_f32m1_rm
> * __riscv_vfncvt_f_x_w_f32m1_rm_m
> * __riscv_vfncvt_f_xu_w_f32m1_rm
> * __riscv_vfncvt_f_xu_w_f32m1_rm_m
> * __riscv_vfncvt_f_f_w_f32m1_rm
> * __riscv_vfncvt_f_f_w_f32m1_rm_m
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc
> (class vfncvt_f): Add frm_op_type template arg.
> (vfncvt_f_frm_obj): New declaration.
> (BASE): Ditto.
> * config/riscv/riscv-vector-builtins-bases.h: Ditto.
> * config/riscv/riscv-vector-builtins-functions.def
> (vfncvt_f_frm): New intrinsic function def.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-ncvt-f.c: New test.
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  | 10 ++-
>  .../riscv/riscv-vector-builtins-bases.h   |  1 +
>  .../riscv/riscv-vector-builtins-functions.def |  3 +
>  .../riscv/rvv/base/float-point-ncvt-f.c   | 69 +++
>  4 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-ncvt-f.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index acadec2afca..ad04647f9ba 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -1786,9 +1786,15 @@ public:
>}
>  };
>
> +template
>  class vfncvt_f : public function_base
>  {
>  public:
> +  bool has_rounding_mode_operand_p () const override
> +  {
> +return FRM_OP == HAS_FRM;
> +  }
> +
>rtx expand (function_expander &e) const override
>{
>  if (e.op_info->op == OP_TYPE_f_w)
> @@ -2512,7 +2518,8 @@ static CONSTEXPR const
> vfncvt_x vfncvt_xu_obj;
>  static CONSTEXPR const vfncvt_x
> vfncvt_xu_frm_obj;
>  static CONSTEXPR const vfncvt_rtz_x vfncvt_rtz_x_obj;
>  static CONSTEXPR const vfncvt_rtz_x vfncvt_rtz_xu_obj;
> -static CONSTEXPR const vfncvt_f vfncvt_f_obj;
> +static CONSTEXPR const vfncvt_f vfncvt_f_obj;
> +static CONSTEXPR const vfncvt_f vfncvt_f_frm_obj;
>  static CONSTEXPR const vfncvt_rod_f vfncvt_rod_f_obj;
>  static CONSTEXPR const reducop vredsum_obj;
>  static CONSTEXPR const reducop vredmaxu_obj;
> @@ -2769,6 +2776,7 @@ BASE (vfncvt_xu_frm)
>  BASE (vfncvt_rtz_x)
>  BASE (vfncvt_rtz_xu)
>  BASE (vfncvt_f)
> +BASE (vfncvt_f_frm)
>  BASE (vfncvt_rod_f)
>  BASE (vredsum)
>  BASE (vredmaxu)
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 9bd09a41960..c8c649c4bb0 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -226,6 +226,7 @@ extern const function_base *const vfncvt_xu_frm;
>  extern const function_base *const vfncvt_rtz_x;
>  extern const function_base *const vfncvt_rtz_xu;
>  extern const function_base *const vfncvt_f;
> +extern const function_base *const vfncvt_f_frm;
>  extern const function_base *const vfncvt_rod_f;
>  extern const function_base *const vredsum;
>  extern const function_base *const vredmaxu;
> diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
> index 1e0e989fc2a..cfbc125dcd8 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-functions.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
> @@ -474,6 +474,9 @@ DEF_RVV_FUNCTION (vfncvt_rod_f, narrow_alu, full_preds, f_to_nf_f_w_ops)
>
>  DEF_RVV_FUNCTION (vfncvt_x_frm, narrow_alu_frm, full_preds, f_to_ni_f_w_ops)
>  DEF_RVV_FUNCTION (vfncvt_xu_frm, narrow_alu_frm, full_preds, f_to_nu_f_w_ops)
> +DEF_RVV_FUNCTION (vfncvt_f_frm, narrow_alu_frm, full_preds, i_to_nf_x_w_ops)
> +DEF_RVV_FUNCTION (vfncvt_f_frm, narrow_alu_frm, full_preds, u_to_nf_xu_w_ops)
> +DEF_RVV_FUNCTION (vfncvt_f_frm, narrow_alu_frm, full_preds, f_to_nf_f_w_ops)
>
>  /* 14. Vector Reduction Operations.  */
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-ncvt-f.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-ncvt-f.c
> new file mode 100644
> index 000..d6d4be5e98e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-ncvt-f.c
> @@ -0,0 +1,69 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +vfloat32m1_t
> +test_riscv_vfncvt_f_x_w_f32m1_rm (vint64m2_t op1, size_t vl) {
> +  return __riscv_vfncvt_f_x_w_f32m1_rm (op1, 0, vl);
> +}
> +
> +vfloat32m1_t
> +test_vfncvt_f_x_w_f32m1_rm_m (vbool32_t mask, vint64m2_t op1, size_t vl) {
> +  return __riscv_vfncvt_f_x_w_f32m1_rm_m (mask, op1, 1, vl);
> +}
> +
> +vfloat32m1_t
> +test_riscv_vfncvt_f_xu_w_f32m1_rm (vuint64m2_t op1, size_t vl) {
> +  return

Re: RISC-V: Added support for CRC.

2023-08-16 Thread Jeff Law via Gcc-patches

On 8/16/23 13:10, Alexander Monakov wrote:

On Tue, 15 Aug 2023, Jeff Law wrote:

Because if the compiler can optimize it automatically, then the projects have
to do literally nothing to take advantage of it. They just compile normally
and their bitwise CRC gets optimized down to either a table lookup or a clmul
variant. That's the real goal here.
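
To make "bitwise CRC" concrete, here is a minimal sketch of the kind of loop under discussion: one shift and one conditional XOR per input bit, with no lookup table. It uses the common reflected CRC-32 polynomial 0xEDB88320 purely as an example; the function name is hypothetical and nothing here is taken from the proposed patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time reflected CRC-32.  For each input byte, fold it into
   the low bits of the state, then process eight bits: shift right by
   one and XOR in the polynomial when the bit shifted out was set.  */
static uint32_t
crc32_bitwise (const unsigned char *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  return ~crc;
}
```

For the ASCII string "123456789" this variant yields the standard check value 0xCBF43926.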

The only high-profile FOSS project that carries a bitwise CRC implementation
I'm aware of is the 'xz' compression library. There bitwise CRC is used for
populating the lookup table under './configure --enable-small':

https://github.com/tukaani-project/xz/blob/2b871f4dbffe3801d0da3f89806b5935f758d5f3/src/liblzma/check/crc64_small.c

It's a well-reasoned choice and your compiler would be undoing it
(reintroducing the table when the bitwise CRC is employed specifically
to avoid carrying the table).

If they don't want the table variant, there would obviously be ways to
turn that off. It's essentially no different than any speed improving
optimization that makes things larger.

One final note. Elsewhere in this thread you described performance concerns.
Right now clmuls can be implemented in 4c, fully piped.

Pipelining doesn't matter in the implementation being proposed here, because
the builtin is expanded to

li a4,quotient
li a5,polynomial
xor a0,a1,a0
clmul a0,a0,a4
srli a0,a0,crc_size
clmul a0,a0,a5
slli a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
srli a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size

making CLMULs data-dependent, so the second can only be started one cycle
after the first finishes, and consecutive invocations of __builtin_crc
are likewise data-dependent (with three cycles between CLMUL). So even
when you get CLMUL down to 3c latency, you'll have two CLMULs and 10 cycles
per input block, while state of the art is one widening CLMUL per input block
(one CLMUL per 32-bit block on a 64-bit CPU) limited by throughput, not latency.
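
For readers who have not used carry-less multiplication, the sketch below is a software model of a 64-bit clmul that returns the low 64 bits of the product. The function name is hypothetical; it only illustrates the operation and says nothing about latency.

```c
#include <stdint.h>

/* Carry-less multiply: like schoolbook binary multiplication, but
   partial products are combined with XOR instead of addition.
   Returns the low 64 bits of the result.  */
static uint64_t
clmul64 (uint64_t a, uint64_t b)
{
  uint64_t result = 0;
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      result ^= a << i;
  return result;
}
```

In the expansion quoted above, the second clmul takes the shifted result of the first as its input, which is the data dependence being described.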

I expect it'll actually be 2c latency. We're approaching the point
where it just won't make that much sense to call out to a library when
you can emit the pair of clmuls and a couple shifts.
