PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-04 Thread Prathamesh Kulkarni
Hi,
The attached patch attempts to fix PR111648.
As mentioned in the PR, the issue arises when a1 is a multiple of the
vector length: we end up creating the following encoding in the result:
{ base_elem, arg[0], arg[1], ... } (assuming S = 1),
where arg is the chosen input vector. This is incorrect, since the
encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }

For the test-case mentioned in the PR, the vectorizer pass creates a
VEC_PERM_EXPR where:
arg0: { -16, -9, -10, -11 }
arg1: { -12, -5, -6, -7 }
sel = { 3, 4, 5, 6 }

arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
Since a1 = 4 and arg_len = 4, it ended up creating the result with the
following encoding:
res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
    = { -11, -12, -5 }

So for res[3], it used S = (-5) - (-12) = 7
and hence computed it as -5 + 7 = 2,
instead of selecting arg1[2], i.e., -6.
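
The arithmetic above can be reproduced with a small standalone model. This is
only a sketch and not GCC code: decode_stepped and permute_direct are
hypothetical helpers. The first mimics how a single-pattern encoding with
nelts_per_pattern = 3 extrapolates later elements from the step, and the
second indexes the concatenation of arg0 and arg1 directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: expand a single-pattern stepped encoding
// { e0, e1, e2 } to NELTS elements.  Elements past the encoded ones
// are extrapolated using the step S = e2 - e1.
static std::vector<int>
decode_stepped (int e0, int e1, int e2, std::size_t nelts)
{
  std::vector<int> out = { e0, e1, e2 };
  int step = e2 - e1;
  while (out.size () < nelts)
    out.push_back (out.back () + step);
  out.resize (nelts);
  return out;
}

// Reference semantics: element i of the result is element sel[i] of the
// concatenation of arg0 and arg1.
static std::vector<int>
permute_direct (const std::vector<int> &arg0, const std::vector<int> &arg1,
                const std::vector<std::size_t> &sel)
{
  std::vector<int> out;
  for (std::size_t idx : sel)
    out.push_back (idx < arg0.size () ? arg0[idx] : arg1[idx - arg0.size ()]);
  return out;
}
```

With the inputs from the PR, decode_stepped (-11, -12, -5, 4) ends in 2,
while permute_direct with sel = { 3, 4, 5, 6 } ends in -6.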

The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt (return false)
if a1 is a multiple of the vector length, so that a1 ... ae select elements
only from the stepped part of the pattern in the input vector.

Since the vectors are VLS, fold_vec_perm_cst then sets:
res_npatterns = res_nelts
res_nelts_per_pattern  = 1
which seems to fix the issue by encoding all the elements.

The patch resulted in Case 4 and Case 5 from test_nunits_min_2 failing,
because they used sel = { 0, 0, 1, ... } and { len, 0, 1, ... } respectively,
which used a1 = 0, and thus selected arg0[0].

I removed Case 4 because it was already covered in test_nunits_min_4,
and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
and added a new Case 9 to test for this issue.

Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
and on x86_64-linux-gnu.
Does the patch look OK ?

Thanks,
Prathamesh
[PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.

gcc/ChangeLog:
PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
is a multiple of vector length.
(test_nunits_min_2): Remove Case 4 and move Case 5 to ...
(test_nunits_min_4): ... here and rename case numbers. Also add
Case 9.

gcc/testsuite/ChangeLog:
PR tree-optimization/111648
* gcc.dg/vect/pr111648.c: New test.


diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 4f8561509ff..c5f421d6b76 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
  return false;
}
 
-  /* Ensure that the stepped sequence always selects from the same
-input pattern.  */
+  /* Ensure that the stepped sequence always selects from the stepped
+part of same input pattern.  */
   unsigned arg_npatterns
= ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
  : VECTOR_CST_NPATTERNS (arg1);
@@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
*reason = "step is not multiple of npatterns";
  return false;
}
+
+  /* If a1 is a multiple of len, it will select base element of input
+vector resulting in following encoding:
+{ base_elem, arg[0], arg[1], ... } where arg is the chosen input
+vector. This encoding is not originally present in arg, since it's
+defined as:
+{ arg[0], arg[1], arg[2], ... }.  */
+
+  if (multiple_p (a1, arg_len))
+   {
+ if (reason)
+   *reason = "selecting base element of input vector";
+ return false;
+   }
 }
 
   return true;
@@ -17425,47 +17439,6 @@ test_nunits_min_2 (machine_mode vmode)
tree expected_res[] = { ARG0(0), ARG1(0), ARG0(1), ARG1(1) };
validate_res (2, 2, res, expected_res);
   }
-
-  /* Case 4: mask = {0, 0, 1, ...} // (1, 3)
-Test that the stepped sequence of the pattern selects from
-same input pattern. Since input vectors have npatterns = 2,
-and step (a2 - a1) = 1, step is not a multiple of npatterns
-in input vector. So return NULL_TREE.  */
-  {
-   tree arg0 = build_vec_cst_rand (vmode, 2, 3, 1);
-   tree arg1 = build_vec_cst_rand (vmode, 2, 3, 1);
-   poly_uint64 len = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
-
-   vec_perm_builder builder (len, 1, 3);
-   poly_uint64 mask_elems[] = { 0, 0, 1 };
-   builder_push_elems (builder, mask_elems);
-
-   vec_perm_indices sel (builder, 2, len);
-   const char *reason;
-   tree res = fold_vec_perm_cst (TREE_TYPE (arg0), arg0, arg1, sel,
- &reason);
-   ASSERT_TRUE (res == NULL_TREE);
-   ASSERT_TRUE (!strcmp (reason, "step is not multiple of npatterns"));
-  }
-
-  /* Case 5: mask = {len, 0, 1, ...} // (1, 3)
-Test that stepped sequence of the pattern selects from arg0.
-res = { arg1[0], arg0[0], arg0[1], ... } // (1, 3)  */
-  {
-   tree 

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-11 Thread Prathamesh Kulkarni
On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 4f8561509ff..c5f421d6b76 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> > return false;
> >   }
> >
> > -  /* Ensure that the stepped sequence always selects from the same
> > -  input pattern.  */
> > +  /* Ensure that the stepped sequence always selects from the stepped
> > +  part of same input pattern.  */
> >unsigned arg_npatterns
> >   = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> > : VECTOR_CST_NPATTERNS (arg1);
> > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >   *reason = "step is not multiple of npatterns";
> > return false;
> >   }
> > +
> > +  /* If a1 is a multiple of len, it will select base element of input
> > +  vector resulting in following encoding:
> > +  { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> > +  vector. This encoding is not originally present in arg, since it's
> > +  defined as:
> > +  { arg[0], arg[1], arg[2], ... }.  */
> > +
> > +  if (multiple_p (a1, arg_len))
> > + {
> > +   if (reason)
> > + *reason = "selecting base element of input vector";
> > +   return false;
> > + }
>
> That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
> second argument has 2 stepped patterns.
Ah right, thanks for pointing that out. In the attached patch I extended the
check to r1 < arg_npatterns, which should check if we are choosing base
elements from any of the patterns in arg (an
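
The difference between the two checks discussed above can be sketched as
follows. This is a simplified model, not the GCC implementation: it uses
plain unsigned values where GCC uses poly_uint64, and both function names are
hypothetical. r1 is the remainder of a1 divided by the input vector length,
i.e. the index of the selected element within the chosen input vector.

```cpp
#include <cassert>

// Original check from the first patch: only catches a1 landing exactly
// on element 0 of an input vector.
static bool
is_multiple_of_len (unsigned a1, unsigned arg_len)
{
  return a1 % arg_len == 0;
}

// Extended check: the first ARG_NPATTERNS elements of an input vector
// are the base elements of its patterns, so any r1 below that count
// selects a base element.
static bool
selects_base_element (unsigned a1, unsigned arg_len, unsigned arg_npatterns)
{
  unsigned r1 = a1 % arg_len;
  return r1 < arg_npatterns;
}
```

For example, with arg_len = 8, a1 = 9 and two patterns in the chosen input,
is_multiple_of_len returns false while selects_base_element returns true,
which is the case Richard describes.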

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-11 Thread Prathamesh Kulkarni
On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
 wrote:
>
> On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
>  wrote:
> >
> > Prathamesh Kulkarni  writes:

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-12 Thread Prathamesh Kulkarni
On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
 wrote:
>
> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-17 Thread Prathamesh Kulkarni
On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
> >>  wrote:
> >> >
> >> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> >> >  wrote:
> >> > >
> >> > > Prathamesh Kulkarni  writes:

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-18 Thread Prathamesh Kulkarni
On Wed, 18 Oct 2023 at 23:22, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
> >  wrote:
> >> Prathamesh Kulkarni  writes:
> >> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > index 4f8561509ff..55a6a68c16c 100644
> >> > --- a/gcc/fold-const.cc
> >> > +++ b/gcc/fold-const.cc
> >> > @@ -10684,9 +10684,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> >
> >> >/* Ensure that the stepped sequence always selects from the same
> >> >input pattern.  */
> >> > -  unsigned arg_npatterns
> >> > - = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> >> > -   : VECTOR_CST_NPATTERNS (arg1);
> >> > +  tree arg = ((q1 & 1) == 0) ? arg0 : arg1;
> >> > +  unsigned arg_npatterns = VECTOR_CST_NPATTERNS (arg);
> >> >
> >> >if (!multiple_p (step, arg_npatterns))
> >> >   {
> >> > @@ -10694,6 +10693,29 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> >   *reason = "step is not multiple of npatterns";
> >> > return false;
> >> >   }
> >> > +
> >> > +  /* If a1 chooses base element from arg, ensure that it's a natural
> >> > +  stepped sequence, ie, (arg[2] - arg[1]) == (arg[1] - arg[0])
> >> > +  to preserve arg's encoding.  */
> >> > +
> >> > +  unsigned HOST_WIDE_INT index;
> >> > +  if (!r1.is_constant (&index))
> >> > + return false;
> >> > +  if (index < arg_npatterns)
> >> > + {
> >>
> >> I don't know whether it matters in practice, but I think the two conditions
> >> above are more natural as:
> >>
> >> if (maybe_lt (r1, arg_npatterns))
> >>   {
> >>     unsigned HOST_WIDE_INT index;
> >>     if (!r1.is_constant (&index))
> >>       return false;
> >>
> >>     ...[code below]...
> >>   }
> >>
> >> > +   tree arg_elem0 = vector_cst_elt (arg, index);
> >> > +   tree arg_elem1 = vector_cst_elt (arg, index + arg_npatterns);
> >> > +   tree arg_elem2 = vector_cst_elt (arg, index + arg_npatterns * 2);
> >> > +
> >> > +   if (!operand_equal_p (const_binop (MINUS_EXPR, arg_elem2, arg_elem1),
> >> > + const_binop (MINUS_EXPR, arg_elem1, arg_elem0),
> >> > + 0))
> >>
> >> This needs to check whether const_binop returns null.  Maybe:
> >>
> >>    tree step1, step2;
> >>    if (!(step1 = const_binop (MINUS_EXPR, arg_elem1, arg_elem0))
> >>        || !(step2 = const_binop (MINUS_EXPR, arg_elem2, arg_elem1))
> >>        || !operand_equal_p (step1, step2, 0))
> >>
> >> OK with those changes, thanks.
> > Hi Richard,
> > Thanks for the suggestions, updated the attached patch accordingly.
> > Bootstrapped+tested with and without SVE on aarch64-linux-gnu and
> > x86_64-linux-gnu.
> > OK to commit ?
>
> Yes, thanks.
Thanks, committed to trunk in 3ec8ecb8e92faec889bc6f7aeac9ff59e82b4f7f.

Thanks,
Prathamesh
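
The "natural stepped sequence" test approved in this review can be sketched
outside GCC roughly as below. This is a minimal model under stated
assumptions: elements are plain int, and checked_sub is a hypothetical
stand-in for const_binop (MINUS_EXPR, ...) that returns std::nullopt when the
difference cannot be represented, loosely modelling the case where
const_binop returns null.

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Hypothetical stand-in for const_binop (MINUS_EXPR, a, b).
static std::optional<int>
checked_sub (int a, int b)
{
  long long diff = static_cast<long long> (a) - b;
  if (diff < INT_MIN || diff > INT_MAX)
    return std::nullopt;
  return static_cast<int> (diff);
}

// True if { e0, e1, e2 } is a natural stepped sequence, i.e. both
// differences could be computed and (e1 - e0) == (e2 - e1).
static bool
natural_stepped_p (int e0, int e1, int e2)
{
  std::optional<int> step1 = checked_sub (e1, e0);
  std::optional<int> step2 = checked_sub (e2, e1);
  return step1 && step2 && *step1 == *step2;
}
```

For arg1 = { -12, -5, -6, -7 } from PR111648 the first three elements give
steps 7 and -1, so the sequence is not natural and the fold would punt on the
stepped encoding.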
>
> Richard
>
> >
> > Thanks,
> > Prathamesh
> >>
> >> Richard
> >>
> >> > + {
> >> > +   if (reason)
> >> > + *reason = "not a natural stepped sequence";
> >> > +   return false;
> >> > + }
> >> > + }
> >> >  }
> >> >
> >> >return true;
> >> > @@ -17161,7 +17183,8 @@ namespace test_fold_vec_perm_cst {
> >> >  static tree
> >> >  build_vec_cst_rand (machine_mode vmode, unsigned npatterns,
> >> >   unsigned nelts_per_pattern,
> >> > - int step = 0, int threshold = 100)
> >> > + int step = 0, bool natural_stepped = false,
> >> > + int threshold = 100)
> >> >  {
> >> >tree inner_type = lang_hooks.types.type_for_mode (GET_MODE_INNER (vmode), 1);
> >> >tree vectype = build_vector_type_for_mode (inner_type, vmode);
> >> > @@ -17176,17 +17199,28 @@ bu

PR111754

2023-10-20 Thread Prathamesh Kulkarni
Hi,
For the following test-case:

typedef float __attribute__((__vector_size__ (16))) F;
F foo (F a, F b)
{
  F v = (F) { 9 };
  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
}

Compiling with -O2 results in the following ICE:
foo.c: In function ‘foo’:
foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
  |  ^~
0x7f3185 wi::int_traits
>::decompose(long*, unsigned int, std::pair
const&)
../../gcc/gcc/rtl.h:2314
0x7f3185 wide_int_ref_storage::wide_int_ref_storage
>(std::pair const&)
../../gcc/gcc/wide-int.h:1089
0x7f3185 generic_wide_int
>::generic_wide_int
>(std::pair const&)
../../gcc/gcc/wide-int.h:847
0x7f3185 poly_int<1u, generic_wide_int > >::poly_int
>(poly_int_full, std::pair const&)
../../gcc/gcc/poly-int.h:467
0x7f3185 poly_int<1u, generic_wide_int > >::poly_int
>(std::pair const&)
../../gcc/gcc/poly-int.h:453
0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
../../gcc/gcc/rtl.h:2383
0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
../../gcc/gcc/rtx-vector-builder.h:122
0xfd4e1b vector_builder::elt(unsigned int) const
../../gcc/gcc/vector-builder.h:253
0xfd4d11 rtx_vector_builder::build()
../../gcc/gcc/rtx-vector-builder.cc:73
0xc21d9c const_vector_from_tree
../../gcc/gcc/expr.cc:13487
0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
../../gcc/gcc/expr.cc:11059
0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
../../gcc/gcc/expr.h:310
0xaee682 expand_return
../../gcc/gcc/cfgexpand.cc:3809
0xaee682 expand_gimple_stmt_1
../../gcc/gcc/cfgexpand.cc:3918
0xaee682 expand_gimple_stmt
../../gcc/gcc/cfgexpand.cc:4044
0xaf28f0 expand_gimple_basic_block
../../gcc/gcc/cfgexpand.cc:6100
0xaf4996 execute
../../gcc/gcc/cfgexpand.cc:6835

IIUC, the issue is that fold_vec_perm returns a vector with float element
type and res_nelts_per_pattern == 3, and later ICEs when it tries
to derive element v[3], which is not present in the encoding, while
building the rtx vector in rtx_vector_builder::build():
  for (unsigned int i = 0; i < nelts; ++i)
    RTVEC_ELT (v, i) = elt (i);

The attached patch tries to fix this by returning false from
valid_mask_for_fold_vec_perm_cst_p if sel has a stepped sequence and the
input vector has a non-integral element type, so for VLA vectors, it
will only build the result with a dup sequence (nelts_per_pattern < 3) for
non-integral element types.

For VLS vectors, this will still work for a stepped sequence since it
will then use the "VLS exception" in fold_vec_perm_cst, and set:
res_npatterns = res_nelts and
res_nelts_per_pattern = 1
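
The intended behaviour described above can be summarised as a small decision
function. This is only a sketch of the description in this mail, not the GCC
code: fold_kind and choose_fold are hypothetical names, and all other
validity checks in valid_mask_for_fold_vec_perm_cst_p are ignored.

```cpp
#include <cassert>

enum class fold_kind
{
  use_encoding,   // build the result from the compressed encoding
  enumerate_all,  // "VLS exception": npatterns = nelts, nelts_per_pattern = 1
  give_up         // no fold (variable-length vector, invalid mask)
};

static fold_kind
choose_fold (unsigned sel_nelts_per_pattern, bool elt_is_integral, bool is_vls)
{
  // A selector with three elements per pattern is a stepped sequence.
  bool stepped = sel_nelts_per_pattern == 3;
  // Proposed rule: a stepped selector is only valid for integral elements.
  bool mask_valid = !stepped || elt_is_integral;
  if (mask_valid)
    return fold_kind::use_encoding;
  return is_vls ? fold_kind::enumerate_all : fold_kind::give_up;
}
```

Under this model the float test-case (stepped sel, fixed-length vector) takes
the enumerate_all path, while the same selector on a variable-length float
vector would not be folded.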

and fold the above case to:
F foo (F a, F b)
{
   [local count: 1073741824]:
  return { 0.0, 9.0e+0, 0.0, 0.0 };
}

But I am not sure if this is entirely correct, since:
tree res = out_elts.build ();
will canonicalize the encoding and may result in a stepped sequence
(vector_builder::finalize() may reduce npatterns at the cost of increasing
nelts_per_pattern)?

PS: This issue is now latent after the PR111648 fix, since
valid_mask_for_fold_vec_perm_cst_p with sel = {1, 0, 1, ...} returns
false because the corresponding pattern in arg0 is not a natural
stepped sequence, and the fold then goes through the VLS exception and
gives the correct result. However, I guess the underlying issue of dealing
with non-integral element types in fold_vec_perm_cst still remains?

The patch passes bootstrap+test with and without SVE on aarch64-linux-gnu,
and on x86_64-linux-gnu.

Thanks,
Prathamesh
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 82299bb7f1d..cedfc9616e9 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10642,6 +10642,11 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
   if (sel_nelts_per_pattern < 3)
 return true;
 
+  /* If SEL contains stepped sequence, ensure that we are dealing with
+ integral vector_cst.  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (arg0))))
+return false;
+
   for (unsigned pattern = 0; pattern < sel_npatterns; pattern++)
 {
   poly_uint64 a1 = sel[pattern + sel_npatterns];
diff --git a/gcc/testsuite/gcc.dg/vect/pr111754.c b/gcc/testsuite/gcc.dg/vect/pr111754.c
new file mode 100644
index 000..7c1c16875c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111754.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef float __attribute__((__vector_size__ (16))) F;
+
+F foo (F a, F b)
+{
+  F v = (F) { 9 };
+  return __builtin_shufflevector (v, v, 1, 0, 1, 2);
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump "return \{ 0.0, 9.0e\\+0, 0.0, 0.0 \}" "optimized" } } */


Re: PR111754

2023-10-25 Thread Prathamesh Kulkarni
On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
 wrote:
>
> Hi,
>
> Sorry for the slow review.  I clearly didn't think this through properly
> when doing the review of the original patch, so I wanted to spend
> some time working on the code to get a better understanding of
> the problem.
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > For the following test-case:
> >
> > typedef float __attribute__((__vector_size__ (16))) F;
> > F foo (F a, F b)
> > {
> >   F v = (F) { 9 };
> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > }
> >
> > Compiling with -O2 results in following ICE:
> > foo.c: In function ‘foo’:
> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> >   |  ^~
> > 0x7f3185 wi::int_traits
> >>::decompose(long*, unsigned int, std::pair
> > const&)
> > ../../gcc/gcc/rtl.h:2314
> > 0x7f3185 wide_int_ref_storage > false>::wide_int_ref_storage
> >>(std::pair const&)
> > ../../gcc/gcc/wide-int.h:1089
> > 0x7f3185 generic_wide_int
> >>::generic_wide_int
> >>(std::pair const&)
> > ../../gcc/gcc/wide-int.h:847
> > 0x7f3185 poly_int<1u, generic_wide_int > false> > >::poly_int
> >>(poly_int_full, std::pair const&)
> > ../../gcc/gcc/poly-int.h:467
> > 0x7f3185 poly_int<1u, generic_wide_int > false> > >::poly_int
> >>(std::pair const&)
> > ../../gcc/gcc/poly-int.h:453
> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > ../../gcc/gcc/rtl.h:2383
> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > ../../gcc/gcc/rtx-vector-builder.h:122
> > 0xfd4e1b vector_builder > rtx_vector_builder>::elt(unsigned int) const
> > ../../gcc/gcc/vector-builder.h:253
> > 0xfd4d11 rtx_vector_builder::build()
> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > 0xc21d9c const_vector_from_tree
> > ../../gcc/gcc/expr.cc:13487
> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > expand_modifier, rtx_def**, bool)
> > ../../gcc/gcc/expr.cc:11059
> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
> > ../../gcc/gcc/expr.h:310
> > 0xaee682 expand_return
> > ../../gcc/gcc/cfgexpand.cc:3809
> > 0xaee682 expand_gimple_stmt_1
> > ../../gcc/gcc/cfgexpand.cc:3918
> > 0xaee682 expand_gimple_stmt
> > ../../gcc/gcc/cfgexpand.cc:4044
> > 0xaf28f0 expand_gimple_basic_block
> > ../../gcc/gcc/cfgexpand.cc:6100
> > 0xaf4996 execute
> > ../../gcc/gcc/cfgexpand.cc:6835
> >
> > IIUC, the issue is that fold_vec_perm returns a vector having float element
> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > to derive element v[3], not present in the encoding, while trying to
> > build rtx vector
> > in rtx_vector_builder::build():
> >  for (unsigned int i = 0; i < nelts; ++i)
> > RTVEC_ELT (v, i) = elt (i);
> >
> > The attached patch tries to fix this by returning false from
> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> > input vector has non-integral element type, so for VLA vectors, it
> > will only build result with dup sequence (nelts_per_pattern < 3) for
> > non-integral element type.
> >
> > For VLS vectors, this will still work for stepped sequence since it
> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> > res_npatterns = res_nelts and
> > res_nelts_per_pattern = 1
> >
> > and fold the above case to:
> > F foo (F a, F b)
> > {
> >[local count: 1073741824]:
> >   return { 0.0, 9.0e+0, 0.0, 0.0 };
> > }
> >
> > But I am not sure if this is entirely correct, since:
> > tree res = out_elts.build ();
> > will canonicalize the encoding and may result in a stepped sequence
> > (vector_builder::finalize() may reduce npatterns at the cost of increasing
> > nelts_per_pattern)  ?
> >
> > PS: This issue is now latent after PR111648 fix, since
> > valid_mask_for_fold_vec_perm_cst with  sel = {1, 0, 1, ...} returns
> > false because the corresponding pattern in arg0 is not a natural
> > stepped sequence, and folds correctly using VLS exception. However, I
> > guess the underlying issue of dealing with non-integral element types
> > in fold_vec_perm_cst still 
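
The extrapolation step discussed above can be modelled in a few lines. This is only a sketch under stated assumptions: EncodedVector and its member names are hypothetical, and the model is integer-only, loosely following the element derivation that vector-builder.h appears to perform. It reuses the encoding from the PR111648 description to show how an element past the encoded ones is derived from the last two encoded values. The derivation needs a subtraction on the elements, which is presumably why it cannot be applied to a float CONST_VECTOR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, integer-only model of a pattern-encoded constant vector.
// EncodedVector and its members are illustrative names, not GCC's API.
struct EncodedVector {
  unsigned npatterns;
  unsigned nelts_per_pattern;
  std::vector<long long> encoded;  // npatterns * nelts_per_pattern values

  // Return element I, extrapolating past the encoded elements if needed.
  long long elt(unsigned i) const {
    assert(npatterns > 0 && encoded.size() == npatterns * nelts_per_pattern);
    if (i < encoded.size())
      return encoded[i];

    unsigned pattern = i % npatterns;
    unsigned count = i / npatterns;
    std::size_t final_i = encoded.size() - npatterns + pattern;
    long long final_elt = encoded[final_i];

    // Without a step, each pattern just repeats its last encoded value.
    if (nelts_per_pattern <= 2)
      return final_elt;

    // Stepped pattern: continue the series defined by the last two values.
    long long step = final_elt - encoded[final_i - npatterns];
    return final_elt + static_cast<long long>(count - 2) * step;
  }
};
```

With the encoding { -11, -12, -5 } (npatterns = 1, nelts_per_pattern = 3), elt (3) is derived as -5 + 7 = 2, which is the wrong value described for PR111648 rather than the expected -6.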

Re: PR111754

2023-10-25 Thread Prathamesh Kulkarni
On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> >  wrote:
> >>
> >> Hi,
> >>
> >> Sorry for the slow review.  I clearly didn't think this through properly
> >> when doing the review of the original patch, so I wanted to spend
> >> some time working on the code to get a better understanding of
> >> the problem.
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi,
> >> > For the following test-case:
> >> >
> >> > typedef float __attribute__((__vector_size__ (16))) F;
> >> > F foo (F a, F b)
> >> > {
> >> >   F v = (F) { 9 };
> >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> >> > }
> >> >
> >> > Compiling with -O2 results in following ICE:
> >> > foo.c: In function ‘foo’:
> >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> >> >   |  ^~
> >> > 0x7f3185 wi::int_traits
> >> >>::decompose(long*, unsigned int, std::pair
> >> > const&)
> >> > ../../gcc/gcc/rtl.h:2314
> >> > 0x7f3185 wide_int_ref_storage >> > false>::wide_int_ref_storage
> >> >>(std::pair const&)
> >> > ../../gcc/gcc/wide-int.h:1089
> >> > 0x7f3185 generic_wide_int
> >> >>::generic_wide_int
> >> >>(std::pair const&)
> >> > ../../gcc/gcc/wide-int.h:847
> >> > 0x7f3185 poly_int<1u, generic_wide_int >> > false> > >::poly_int
> >> >>(poly_int_full, std::pair const&)
> >> > ../../gcc/gcc/poly-int.h:467
> >> > 0x7f3185 poly_int<1u, generic_wide_int >> > false> > >::poly_int
> >> >>(std::pair const&)
> >> > ../../gcc/gcc/poly-int.h:453
> >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> >> > ../../gcc/gcc/rtl.h:2383
> >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> >> > ../../gcc/gcc/rtx-vector-builder.h:122
> >> > 0xfd4e1b vector_builder >> > rtx_vector_builder>::elt(unsigned int) const
> >> > ../../gcc/gcc/vector-builder.h:253
> >> > 0xfd4d11 rtx_vector_builder::build()
> >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> >> > 0xc21d9c const_vector_from_tree
> >> > ../../gcc/gcc/expr.cc:13487
> >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> >> > expand_modifier, rtx_def**, bool)
> >> > ../../gcc/gcc/expr.cc:11059
> >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, expand_modifier)
> >> > ../../gcc/gcc/expr.h:310
> >> > 0xaee682 expand_return
> >> > ../../gcc/gcc/cfgexpand.cc:3809
> >> > 0xaee682 expand_gimple_stmt_1
> >> > ../../gcc/gcc/cfgexpand.cc:3918
> >> > 0xaee682 expand_gimple_stmt
> >> > ../../gcc/gcc/cfgexpand.cc:4044
> >> > 0xaf28f0 expand_gimple_basic_block
> >> > ../../gcc/gcc/cfgexpand.cc:6100
> >> > 0xaf4996 execute
> >> > ../../gcc/gcc/cfgexpand.cc:6835
> >> >
> >> > IIUC, the issue is that fold_vec_perm returns a vector having float 
> >> > element
> >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> >> > to derive element v[3], not present in the encoding, while trying to
> >> > build rtx vector
> >> > in rtx_vector_builder::build():
> >> >  for (unsigned int i = 0; i < nelts; ++i)
> >> > RTVEC_ELT (v, i) = elt (i);
> >> >
> >> > The attached patch tries to fix this by returning false from
> >> > valid_mask_for_fold_vec_perm_cst if sel has a stepped sequence and
> >> > input vector has non-integral element type, so for VLA vectors, it
> >> > will only build result with dup sequence (nelts_per_pattern < 3) for
> >> > non-integral element type.
> >> >
> >> > For VLS vectors, this will still work for stepped sequence since it
> >> > will then use the "VLS exception" in fold_vec_perm_cst, and set:
> >> > res_npatterns = res_nelts and
> >> >
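
As a cross-check of what the fold should produce for the test-case quoted above, here is a scalar model of a two-input permute. It is a sketch, not GCC code: the permute helper is a hypothetical name, and it only handles in-range selector indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a two-input permute: selector values 0..N-1 pick from A,
// values N..2N-1 pick from B.  "permute" is an illustrative helper name.
template <std::size_t N>
std::array<float, N> permute(const std::array<float, N>& a,
                             const std::array<float, N>& b,
                             const std::array<unsigned, N>& sel) {
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    assert(sel[i] < 2 * N);  // this model only accepts in-range indices
    result[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
  }
  return result;
}
```

Applied to v = { 9, 0, 0, 0 } with sel = { 1, 0, 1, 2 }, the model gives { 0, 9, 0, 0 }, the same constant that the VLS path folds to.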

PR92163

2019-10-23 Thread Prathamesh Kulkarni
Hi,
The attached patch tries to fix PR92163 by calling
gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
Does it look OK ?

Thanks,
Prathamesh
2019-10-24  Prathamesh Kulkarni  

PR tree-optimization/92163
* tree-if-conv.c (ifcvt_local_dce): Call gimple_purge_dead_eh_edges
if eh cleanup is required.
* tree-ssa-dse.c (delete_dead_or_redundant_assignment): Change return type
to bool and return the return value of gsi_remove.
* tree-ssa-dse.h (delete_dead_or_redundant_assignment): Adjust prototype.

testsuite/
* gcc.dg/tree-ssa/pr92163.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
new file mode 100644
index 000..f64eaea6517
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fopenacc" } */
+
+void
+xr (int *k7)
+{
+  int qa;
+
+#pragma acc parallel
+#pragma acc loop vector
+  for (qa = 0; qa < 3; ++qa)
+if (qa % 2 != 0)
+  k7[qa] = 0;
+else
+  k7[qa] = 1;
+}
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index df9046a3014..3e2769dd02d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2963,6 +2963,7 @@ ifcvt_local_dce (class loop *loop)
}
 }
   /* Delete dead statements.  */
+  bool do_eh_cleanup = false;
   gsi = gsi_start_bb (bb);
   while (!gsi_end_p (gsi))
 {
@@ -2975,7 +2976,7 @@ ifcvt_local_dce (class loop *loop)
 
  if (dse_classify_store (&write, stmt, false, NULL, NULL, latch_vdef)
  == DSE_STORE_DEAD)
-   delete_dead_or_redundant_assignment (&gsi, "dead");
+   do_eh_cleanup |= delete_dead_or_redundant_assignment (&gsi, "dead");
  else
gsi_next (&gsi);
  continue;
@@ -2994,6 +2995,9 @@ ifcvt_local_dce (class loop *loop)
   gsi_remove (&gsi, true);
   release_defs (stmt);
 }
+
+  if (do_eh_cleanup)
+gimple_purge_dead_eh_edges (bb);
 }
 
 /* If-convert LOOP when it is legal.  For the moment this pass has no
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 25cd4709b31..deec6c07c50 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -77,7 +77,6 @@ along with GCC; see the file COPYING3.  If not see
fact, they are the same transformation applied to different views of
the CFG.  */
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
 static void delete_dead_or_redundant_call (gimple_stmt_iterator *, const char *);
 
 /* Bitmap of blocks that have had EH statements cleaned.  We should
@@ -899,7 +898,7 @@ delete_dead_or_redundant_call (gimple_stmt_iterator *gsi, const char *type)
 
 /* Delete a dead store at GSI, which is a gimple assignment. */
 
-void
+bool
 delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type)
 {
   gimple *stmt = gsi_stmt (*gsi);
@@ -915,12 +914,14 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type
 
   /* Remove the dead store.  */
   basic_block bb = gimple_bb (stmt);
-  if (gsi_remove (gsi, true))
+  bool eh_cleanup_required = gsi_remove (gsi, true);
+  if (eh_cleanup_required && need_eh_cleanup)
 bitmap_set_bit (need_eh_cleanup, bb->index);
 
   /* And release any SSA_NAMEs set in this statement back to the
  SSA_NAME manager.  */
   release_defs (stmt);
+  return eh_cleanup_required;
 }
 
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
diff --git a/gcc/tree-ssa-dse.h b/gcc/tree-ssa-dse.h
index a5eccbd746d..80b6d9b2616 100644
--- a/gcc/tree-ssa-dse.h
+++ b/gcc/tree-ssa-dse.h
@@ -31,6 +31,6 @@ enum dse_store_status
 dse_store_status dse_classify_store (ao_ref *, gimple *, bool, sbitmap,
 bool * = NULL, tree = NULL);
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
+bool delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
 
 #endif   /* GCC_TREE_SSA_DSE_H  */
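
The control flow this first version of the patch adds to ifcvt_local_dce can be sketched outside GCC as follows. Everything here is a stand-in: Stmt, delete_dead_stmt and local_dce are hypothetical names, and the "could_throw" flag merely models gsi_remove reporting that EH cleanup is required. The point is only the shape of the change: the deletion helper returns a bool and the caller accumulates it with |=.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a statement in a basic block.
struct Stmt {
  bool dead;
  bool could_throw;
};

// Models the patched helper: remove the statement and report whether
// EH cleanup would be required afterwards.
bool delete_dead_stmt(std::vector<Stmt>& block, std::size_t index) {
  assert(index < block.size());
  bool eh_cleanup_required = block[index].could_throw;
  block.erase(block.begin() + static_cast<std::ptrdiff_t>(index));
  return eh_cleanup_required;
}

// Models the loop in the patch: delete dead statements and remember
// whether any deletion asked for EH cleanup.
bool local_dce(std::vector<Stmt>& block) {
  bool do_eh_cleanup = false;
  std::size_t i = 0;
  while (i < block.size()) {
    if (block[i].dead)
      do_eh_cleanup |= delete_dead_stmt(block, i);  // index now names the next stmt
    else
      ++i;
  }
  return do_eh_cleanup;
}
```

In this model the caller would purge dead EH edges only when local_dce returns true.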


Re: [SVE] PR91272

2019-10-23 Thread Prathamesh Kulkarni
On Tue, 22 Oct 2019 at 13:12, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index acdd90784dc..dfd33b142ed 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -10016,25 +10016,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >/* See whether another part of the vectorized code applies a loop
> >mask to the condition, or to its inverse.  */
> >
> > +  vec_loop_masks *masks = NULL;
> >if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> >   {
> > -   scalar_cond_masked_key cond (cond_expr, ncopies);
> > -   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > - {
> > -   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > -   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > - }
> > +   if (reduction_type == EXTRACT_LAST_REDUCTION)
> > + masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > else
> >   {
> > -   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > -   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > + masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +   else
> >   {
> > -   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > -   loop_mask = vect_get_loop_mask (gsi, masks, ncopies,
> > -   vectype, j);
> > -   cond_code = cond.code;
> > -   swap_cond_operands = true;
> > +   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > +   cond.code = invert_tree_comparison (cond.code, honor_nans);
> > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > + {
> > +   masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +   cond_code = cond.code;
> > +   swap_cond_operands = true;
> > + }
> >   }
> >   }
> >   }
> > @@ -10116,6 +10117,13 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >vec_then_clause = vec_oprnds2[i];
> >vec_else_clause = vec_oprnds3[i];
> >
> > +  if (masks)
> > + {
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
> > + }
> > +
>
> I don't think we need an extra "if" here.  "loop_mask" only feeds
> the later "if (loop_mask)" block, so we might as well change that
> later "if" to "if (masks)" and make the "loop_mask" variable local
> to the "if" body.
>
> > if (swap_cond_operands)
> >   std::swap (vec_then_clause, vec_else_clause);
> >
> > @@ -10194,23 +10202,6 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > vec_compare = tmp;
> >   }
> >
> > -   tree tmp2 = make_ssa_name (vec_cmp_type);
> > -   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
> > - vec_compare, loop_mask);
> > -   vect_finish_stmt_generation (stmt_info, g, gsi);
> > -   vec_compare = tmp2;
> > - }
> > -
> > -   if (reduction_type == EXTRACT_LAST_REDUCTION)
> > - {
> > -   if (!is_gimple_val (vec_compare))
> > - {
> > -   tree vec_compare_name = make_ssa_name (vec_cmp_type);
> > -   gassign *new_stmt = gimple_build_assign (vec_compare_name,
> > -vec_compare);
> > -   vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > -   vec_compare = vec_compare_name;
> > - }
>
> This form is simpler than:
>
>   if (COMPARISON_CLASS_P (vec_compare))
> {
>   tree tmp = make_ssa_name (vec_cmp_type);
>   tree op0 = TREE_OPERAND (vec_compare, 0);
>  

Re: [SVE] PR91272

2019-10-25 Thread Prathamesh Kulkarni
On Fri, 25 Oct 2019 at 14:18, Richard Sandiford
 wrote:
>
> Hi Prathamesh,
>
> I've just committed a patch that fixes a large number of SVE
> reduction-related failures.  Could you rebase and retest on top of that?
> Sorry for messing you around, but regression testing based on the state
> before the patch wouldn't have been that meaningful.  In particular...
>
> Prathamesh Kulkarni  writes:
> > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> > index a70d52eb2ca..82814e2c2af 100644
> > --- a/gcc/tree-vect-loop.c
> > +++ b/gcc/tree-vect-loop.c
> > @@ -6428,6 +6428,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, slp_tree slp_node,
> >if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
> >  {
> >if (reduction_type != FOLD_LEFT_REDUCTION
> > +   && reduction_type != EXTRACT_LAST_REDUCTION
> > && !mask_by_cond_expr
> > && (cond_fn == IFN_LAST
> > || !direct_internal_fn_supported_p (cond_fn, vectype_in,
>
> ...after today's patch, it's instead necessary to remove:
>
>   if (loop_vinfo
>   && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)
>   && reduction_type == EXTRACT_LAST_REDUCTION)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't yet use a fully-masked loop for"
>  " EXTRACT_LAST_REDUCTION.\n");
>   LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
> }
>
> from vectorizable_condition.  We no longer need any changes to
> vectorizable_reduction itself.
>
> > @@ -10180,18 +10181,29 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >vec != { 0, ... } (masked in the MASK_LOAD,
> >unmasked in the VEC_COND_EXPR).  */
> >
> > -   if (loop_mask)
> > +   if (masks)
> >   {
> > -   if (COMPARISON_CLASS_P (vec_compare))
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   loop_mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
>
> Ah... now that the two cases are merged (good!), just "if (masks)" isn't
> right after all, sorry for the misleading comment.  I think this should
> instead be:
>
>   /* Force vec_compare to be an SSA_NAME rather than a comparison,
>  in cases where that's necessary.  */
>   if (masks || reduction_type == EXTRACT_LAST_REDUCTION)
> {
>
> Not doing that would break unmasked EXTRACT_LAST_REDUCTIONs.
Ah right, thanks for pointing out!
>
> Then make the existing:
>
>   tree tmp2 = make_ssa_name (vec_cmp_type);
>   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
> vec_compare, loop_mask);
>   vect_finish_stmt_generation (stmt_info, g, gsi);
>   vec_compare = tmp2;
>
> conditional on "if (masks)" only, and defer the calculation of loop_mask
> to this point too.
>
> [ It would be good to spot-check that aarch64-sve.exp passes after making
>   the changes to the stmt-generation part of vectorizable_condition,
>   but before removing the:
>
> LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
>
>   quoted above.  That would show that unmasked fold-left reductions
>   still work after the changes.
>
>   There are still some lingering cases in which we can test unmasked
>   SVE loops directly, but they're becoming rarer and should eventually
>   go away altogether.  So I don't think it's worth trying to construct
>   an unmasked test for the testsuite. ]
>
> > +
> > +  if (!is_gimple_val (vec_compare))
> > +{
> > +  tree vec_compare_name = make_ssa_name (vec_cmp_type);
> > +  gassign *new_stmt = gimple_build_assign (vec_compare_name,
> > +   vec_compare);
> > +  vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > +  vec_compare = vec_compare_name;
> > +}
>
> Should use tab-based indentation.
Thanks for the suggestions, does the attached version look OK ?
Comparing aarch64-sve.exp before/after patch shows no regressions,
bootstrap+test in progress.

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/testsuite/gc

Re: PR92163

2019-10-25 Thread Prathamesh Kulkarni
On Fri, 25 Oct 2019 at 13:19, Richard Biener  wrote:
>
> On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
>  wrote:
> >
> > Hi,
> > The attached patch tries to fix PR92163 by calling
> > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
> > Does it look OK ?
>
> Hmm.  I think it shows an issue with the return value of
> remove_stmt_from_eh_lp, which is true if the LP index is -1 (externally
> throwing).  We don't need to purge any edges in that case.  That is,
> if-conversion should never need to do EH purging, since that would be
> wrong-code.
>
> As for the segfault, can you please instead either pass down need_eh_cleanup
> as a function parameter (and NULL from ifcvt) or use the return value in DSE
> to set the bit in the caller?
Hi Richard,
Thanks for the suggestions, does the attached patch look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
> > Thanks,
> > Prathamesh
2019-10-25  Prathamesh Kulkarni  

PR tree-optimization/92163
* tree-ssa-dse.c (delete_dead_or_redundant_assignment): New param
need_eh_cleanup with default value NULL. Gate on need_eh_cleanup
before calling bitmap_set_bit.
(dse_optimize_redundant_stores): Pass global need_eh_cleanup to
delete_dead_or_redundant_assignment.
(dse_dom_walker::dse_optimize_stmt): Likewise.
* tree-ssa-dse.h (delete_dead_or_redundant_assignment): Adjust prototype.

testsuite/
* gcc.dg/tree-ssa/pr92163.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
new file mode 100644
index 000..58f548fe76b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
@@ -0,0 +1,16 @@
+/* { dg-do "compile" } */
+/* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fopenacc" } */
+
+void
+xr (int *k7)
+{
+  int qa;
+
+#pragma acc parallel
+#pragma acc loop vector
+  for (qa = 0; qa < 3; ++qa)
+if (qa % 2 != 0)
+  k7[qa] = 0;
+else
+  k7[qa] = 1;
+}
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 25cd4709b31..21a15eef690 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -77,7 +77,6 @@ along with GCC; see the file COPYING3.  If not see
fact, they are the same transformation applied to different views of
the CFG.  */
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
 static void delete_dead_or_redundant_call (gimple_stmt_iterator *, const char *);
 
 /* Bitmap of blocks that have had EH statements cleaned.  We should
@@ -639,7 +638,8 @@ dse_optimize_redundant_stores (gimple *stmt)
{
  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
  if (is_gimple_assign (use_stmt))
-   delete_dead_or_redundant_assignment (&gsi, "redundant");
+   delete_dead_or_redundant_assignment (&gsi, "redundant",
+need_eh_cleanup);
  else if (is_gimple_call (use_stmt))
delete_dead_or_redundant_call (&gsi, "redundant");
  else
@@ -900,7 +900,8 @@ delete_dead_or_redundant_call (gimple_stmt_iterator *gsi, const char *type)
 /* Delete a dead store at GSI, which is a gimple assignment. */
 
 void
-delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type)
+delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type,
+bitmap need_eh_cleanup)
 {
   gimple *stmt = gsi_stmt (*gsi);
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -915,7 +916,7 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, const char *type
 
   /* Remove the dead store.  */
   basic_block bb = gimple_bb (stmt);
-  if (gsi_remove (gsi, true))
+  if (gsi_remove (gsi, true) && need_eh_cleanup)
 bitmap_set_bit (need_eh_cleanup, bb->index);
 
   /* And release any SSA_NAMEs set in this statement back to the
@@ -1059,7 +1060,7 @@ dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
  && !by_clobber_p)
return;
 
-  delete_dead_or_redundant_assignment (gsi, "dead");
+  delete_dead_or_redundant_assignment (gsi, "dead", need_eh_cleanup);
 }
 }
 
diff --git a/gcc/tree-ssa-dse.h b/gcc/tree-ssa-dse.h
index a5eccbd746d..2658f92b1bb 100644
--- a/gcc/tree-ssa-dse.h
+++ b/gcc/tree-ssa-dse.h
@@ -31,6 +31,7 @@ enum dse_store_status
 dse_store_status dse_classify_store (ao_ref *, gimple *, bool, sbitmap,
 bool * = NULL, tree = NULL);
 
-void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *);
+void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, const char *,
+ bitmap = NULL);
 
 #endif   /* GCC_TREE_SSA_DSE_H  */
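
The revised approach, an optional bitmap parameter that defaults to NULL and is checked before use, can be sketched in isolation like this. The names are stand-ins chosen for illustration (BlockSet for GCC's bitmap, remove_stmt for gsi_remove, delete_dead_assignment for the patched function); only the gating pattern is taken from the patch.

```cpp
#include <cassert>
#include <set>

using BlockSet = std::set<int>;  // hypothetical stand-in for GCC's bitmap

// Stand-in for gsi_remove: reports whether EH cleanup is required.
bool remove_stmt(bool stmt_could_throw) { return stmt_could_throw; }

// Record the block only when the caller supplied a set to record into.
// Callers that do no EH cleanup of their own simply omit the argument.
void delete_dead_assignment(int bb_index, bool stmt_could_throw,
                            BlockSet* need_eh_cleanup = nullptr) {
  assert(bb_index >= 0);
  if (remove_stmt(stmt_could_throw) && need_eh_cleanup)
    need_eh_cleanup->insert(bb_index);
}
```

A DSE-like caller passes its own set, while an if-conversion-like caller leaves the default and so never dereferences a null pointer.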


Re: [SVE] PR91272

2019-10-28 Thread Prathamesh Kulkarni
On Sun, 27 Oct 2019 at 06:08, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > @@ -10288,6 +10261,23 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> > vect_finish_stmt_generation (stmt_info, new_stmt, gsi);
> > vec_compare = vec_compare_name;
> >   }
> > +
> > +   if (masks)
> > + {
> > +   unsigned vec_num = vec_oprnds0.length ();
> > +   tree loop_mask
> > + = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
> > +   vectype, vec_num * j + i);
> > +   tree tmp2 = make_ssa_name (vec_cmp_type);
> > +   gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR,
> > + vec_compare, loop_mask);
>
> Nit: misindented line.
>
> OK with that change, thanks.
Thanks, committed in r277524 after bootstrap+test on
x86_64-unknown-linux-gnu and aarch64-linux-gnu.

Thanks,
Prathamesh
>
> Richard
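
For readers following the review, the decision logic that the thread converged on can be summarised with a small scalar model. This is a sketch with hypothetical names (ComparePlan, plan_compare, loop_mask_index, apply_loop_mask); lanes are modelled as bits of a byte, which is an assumption made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Index of the loop mask for operand I of copy J, as in the patch's
// "vec_num * j + i" out of "vec_num * ncopies" masks.
unsigned loop_mask_index(unsigned vec_num, unsigned j, unsigned i) {
  assert(i < vec_num);
  return vec_num * j + i;
}

struct ComparePlan {
  bool force_ssa_name;      // comparison must first be assigned to a name
  bool and_with_loop_mask;  // result is then ANDed with the loop mask
};

// Mirrors the review: force an SSA name if masked or EXTRACT_LAST,
// but apply the loop mask only when masks are in use.
ComparePlan plan_compare(bool have_masks, bool extract_last_reduction) {
  return ComparePlan{have_masks || extract_last_reduction, have_masks};
}

// Lane-wise AND of a comparison result with the loop mask.
std::uint8_t apply_loop_mask(std::uint8_t vec_compare, std::uint8_t loop_mask) {
  return static_cast<std::uint8_t>(vec_compare & loop_mask);
}
```

An unmasked EXTRACT_LAST_REDUCTION therefore still gets its comparison forced to a name, but no BIT_AND_EXPR with a loop mask.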


Re: PR92163

2019-10-28 Thread Prathamesh Kulkarni
On Mon, 28 Oct 2019 at 07:18, Richard Biener  wrote:
>
> On Fri, Oct 25, 2019 at 9:58 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Fri, 25 Oct 2019 at 13:19, Richard Biener  
> > wrote:
> > >
> > > On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > Hi,
> > > > The attached patch tries to fix PR92163 by calling
> > > > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh cleanup.
> > > > Does it look OK ?
> > >
> > > Hmm.  I think it shows an issue with the return value of
> > > remove_stmt_from_eh_lp, which is true if the LP index is -1 (externally
> > > throwing).  We don't need to purge any edges in that case.  That is,
> > > if-conversion should never need to do EH purging, since that would be
> > > wrong-code.
> > >
> > > As for the segfault, can you please instead either pass down need_eh_cleanup
> > > as a function parameter (and NULL from ifcvt) or use the return value in DSE
> > > to set the bit in the caller?
> > Hi Richard,
> > Thanks for the suggestions, does the attached patch look OK ?
> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>
> OK.
Thanks, committed to trunk in r277525 after bootstrap+test on
x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh


Re: PR92163

2019-11-04 Thread Prathamesh Kulkarni
On Mon, 4 Nov 2019 at 18:37, Christophe Lyon  wrote:
>
> On Mon, 28 Oct 2019 at 16:03, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 28 Oct 2019 at 07:18, Richard Biener  
> > wrote:
> > >
> > > On Fri, Oct 25, 2019 at 9:58 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Fri, 25 Oct 2019 at 13:19, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > > The attached patch tries to fix PR92163 by calling
> > > > > > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh 
> > > > > > cleanup.
> > > > > > Does it look OK ?
> > > > >
> > > > > Hmm.  I think it shows an issue with the return value of
> > > > > remove_stmt_from_eh_lp, which is true if the LP index is -1
> > > > > (externally throwing).  We don't need to purge any edges in that
> > > > > case.  That is, if-conversion should never need to do EH purging,
> > > > > since that would be wrong-code.
> > > > >
> > > > > As for the segfault, can you please instead either pass down
> > > > > need_eh_cleanup as a function parameter (and NULL from ifcvt) or
> > > > > use the return value in DSE to set the bit in the caller?
> > > > Hi Richard,
> > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> > >
> > > OK.
> > Thanks, committed to trunk in r277525 after bootstrap+test on
> > x86_64-unknown-linux-gnu.
> >
>
> Hi Prathamesh,
>
> There's a problem with the new test you added: it uses -fopenacc which
> is not supported by arm-eabi or aarch64-elf targets for instance.
> You probably want to move the test to gcc.dg/goacc or add
> dg-require-effective-target fopenacc.
Oops, sorry about that. Could you please confirm if attached patch
fixes the issue ?
I added dg-require-effective-target fopenacc.

Thanks,
Prathamesh
>
> Thanks,
>
> Christophe
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
index 58f548fe76b..227c09255e4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92163.c
@@ -1,4 +1,5 @@
 /* { dg-do "compile" } */
+/* { dg-require-effective-target fopenacc } */
 /* { dg-options "-O2 -fexceptions -fnon-call-exceptions -fopenacc" } */
 
 void


Re: PR92163

2019-11-06 Thread Prathamesh Kulkarni
On Tue, 5 Nov 2019 at 18:36, Christophe Lyon  wrote:
>
> On Tue, 5 Nov 2019 at 05:46, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 4 Nov 2019 at 18:37, Christophe Lyon  
> > wrote:
> > >
> > > On Mon, 28 Oct 2019 at 16:03, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 28 Oct 2019 at 07:18, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Oct 25, 2019 at 9:58 PM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, 25 Oct 2019 at 13:19, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Oct 23, 2019 at 11:45 PM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > > The attached patch tries to fix PR92163 by calling
> > > > > > > > gimple_purge_dead_eh_edges from ifcvt_local_dce if we need eh 
> > > > > > > > cleanup.
> > > > > > > > Does it look OK ?
> > > > > > >
> > > > > > > Hmm.  I think it shows an issue with the return value of
> > > > > > > remove_stmt_from_eh_lp, which is true if the LP index is -1
> > > > > > > (externally throwing).  We don't need to purge any edges in
> > > > > > > that case.  That is, if-conversion should never need to do EH
> > > > > > > purging, since that would be wrong-code.
> > > > > > >
> > > > > > > As for the segfault, can you please instead either pass down
> > > > > > > need_eh_cleanup as a function parameter (and NULL from ifcvt)
> > > > > > > or use the return value in DSE to set the bit in the caller?
> > > > > > Hi Richard,
> > > > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > > > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> > > > >
> > > > > OK.
> > > > Thanks, committed to trunk in r277525 after bootstrap+test on
> > > > x86_64-unknown-linux-gnu.
> > > >
> > >
> > > Hi Prathamesh,
> > >
> > > There's a problem with the new test you added: it uses -fopenacc which
> > > is not supported by arm-eabi or aarch64-elf targets for instance.
> > > You probably want to move the test to gcc.dg/goacc or add
> > > dg-require-effective-target fopenacc.
> > Oops, sorry about that. Could you please confirm if attached patch
> > fixes the issue ?
> > I added dg-require-effective-target fopenacc.
> >
>
> Yes that works. Maybe you can commit it as obvious?
Thanks, committed in r277906.

Thanks,
Prathamesh
>
> Thanks,
>
> Christophe
>
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Richard.
> > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Richard.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prathamesh


Re: [PR47785] COLLECT_AS_OPTIONS

2020-01-14 Thread Prathamesh Kulkarni
On Wed, 8 Jan 2020 at 15:50, Prathamesh Kulkarni
 wrote:
>
> On Tue, 5 Nov 2019 at 17:38, Richard Biener  
> wrote:
> >
> > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> >  wrote:
> > >
> > > Hi,
> > > Thanks for the review.
> > >
> > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  wrote:
> > > >
> > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > >  wrote:
> > > > >
> > > > > Thanks for the reviews.
> > > > >
> > > > >
> > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu  wrote:
> > > > > >
> > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu  wrote:
> > > > > > > >
> > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Richard,
> > > > > > > > >
> > > > > > > > > Thanks for the review.
> > > > > > > > >
> > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan Vivekanandarajah
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Richard,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard Biener 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan 
> > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As mentioned in the PR, attached patch adds 
> > > > > > > > > > > > > > > COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > > passing assembler options specified with -Wa, to 
> > > > > > > > > > > > > > > the link-time driver.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The proposed solution only works for uniform -Wa 
> > > > > > > > > > > > > > > options across all
> > > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, supporting 
> > > > > > > > > > > > > > > non-uniform -Wa flags
> > > > > > > > > > > > > > > would require either adjusting partitioning 
> > > > > > > > > > > > > > > according to flags or
> > > > > > > > > > > > > > > emitting multiple object files  from a single 
> > > > > > > > > > > > > > >

[RFC] [c-family] PR92867 - Add returns_arg attribute

2020-01-20 Thread Prathamesh Kulkarni
Hi,
This patch attempts to add returns_arg attribute for c-family
languages. For C++ methods, first arg is assumed to be this pointer,
similar to alloc_size.
I have a couple of doubts:

(a) I am not sure why DECL_ARGUMENTS (decl) in
handle_returns_arg_attribute returns NULL ? My intent was to check
that the return-type and argument-types are compatible.

(b) AFAIU the bottom two bits of call return flags are used for storing
args from 0 - 3 and the arg number can be obtained by
masking with ERF_RETURN_ARG_MASK.
So, does that mean we can only allow first 4 arguments to be annotated
with returns_arg attribute ?
In the patch, I use fn_spec ('argnum'), which gimple_call_return_flags
uses to mark the corresponding arg with ERF_RETURNS_ARG.

(c) How can I write a DejaGnu test to check that fn_spec ('argnum') has
been correctly applied to the corresponding parameter? In the patch, I
am only testing attribute validation.
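
For readers following question (b), here is a minimal standalone sketch of how a two-bit argument field limits the attribute to arguments 0 - 3. The SKETCH_* names and both helper functions are hypothetical and exist only for this illustration; the bit position used for the no-alias flag is an assumption, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the flag layout discussed above.  These are NOT
   GCC's definitions; they are redefined so the sketch stands alone.  */
#define SKETCH_ERF_RETURN_ARG_MASK 3u        /* low two bits: argument 0..3 */
#define SKETCH_ERF_RETURNS_ARG     (1u << 2) /* low bits are meaningful */
#define SKETCH_ERF_NOALIAS         (1u << 3) /* assumed position */

/* Return true and store the encoded flags if ARGNUM fits in the mask.  */
static bool
sketch_encode_returns_arg (unsigned argnum, unsigned *flags)
{
  assert (flags != 0);
  if (argnum > SKETCH_ERF_RETURN_ARG_MASK)
    return false;
  *flags = SKETCH_ERF_RETURNS_ARG | argnum;
  return true;
}

/* Return the argument number, or -1 if FLAGS does not carry one.  */
static int
sketch_decode_returns_arg (unsigned flags)
{
  if (!(flags & SKETCH_ERF_RETURNS_ARG))
    return -1;
  return (int) (flags & SKETCH_ERF_RETURN_ARG_MASK);
}
```

With this layout, argument number 4 cannot be encoded without spilling into the flag bits, which is where the four-argument limit in the question comes from.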

Thanks,
Prathamesh
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index dc9579c5c60..3fe3d2a6298 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -150,6 +150,7 @@ static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *);
 static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
 		   int, bool *);
 static tree handle_copy_attribute (tree *, tree, tree, int, bool *);
+static tree handle_returns_arg_attribute (tree *, tree, tree, int, bool *);
 
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)	\
@@ -484,6 +485,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_noinit_attribute, attr_noinit_exclusions },
   { "access",		  1, 3, false, true, true, false,
 			  handle_access_attribute, NULL },
+  { "returns_arg",	  1, 1, true, false, false, false,
+			  handle_returns_arg_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -4603,6 +4606,80 @@ handle_patchable_function_entry_attribute (tree *, tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle a "returns_arg" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_returns_arg_attribute (tree *node, tree name, tree args,
+			  int flags, bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree rettype = TREE_TYPE (decl);
+
+  if (TREE_CODE (rettype) == METHOD_TYPE
+      || TREE_CODE (rettype) == FUNCTION_TYPE)
+    rettype = TREE_TYPE (rettype);
+
+  if (VOID_TYPE_P (rettype))
+    {
+      warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
+                  "%qE attribute ignored on a function returning %qT",
+                  name, rettype);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  gcc_assert (args);
+  tree val = TREE_VALUE (args);
+  if (TREE_CODE (val) != INTEGER_CST)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+                "%qE attribute requires integer constant.", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  int argnum = TREE_INT_CST_LOW (val);
+  if (argnum >= 4)
+    {
+      warning (OPT_Wattributes, "%qE attribute can only be applied"
+               " to first 4 arguments.", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  tree parm;
+  int i;
+
+  for (i = 0, parm = DECL_ARGUMENTS (decl);
+       i < argnum && parm;
+       i++, parm = DECL_CHAIN (parm))
+    ;
+
+  if (parm &&
+      !types_compatible_p (TREE_TYPE (parm), rettype))
+    {
+      warning (OPT_Wattributes, "%qE attribute parameter type %qT is"
+               " incompatible with return-type %qT", name, TREE_TYPE (parm),
+               rettype);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  *no_add_attrs = false;
+
+  char s[2];
+  s[0] = argnum + '0';
+  s[1] = '\0';
+
+
+  tree attr = tree_cons (get_identifier ("fn spec"),
+			 build_tree_list (NULL_TREE, build_string (1, s)),
+			 NULL_TREE);
+  decl_attributes (node, attr, flags);
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
it were to be applied to an entity OPER.  */
 
diff --git a/gcc/testsuite/g++.dg/Wattributes-6.C b/gcc/testsuite/g++.dg/Wattributes-6.C
new file mode 100644
index 000..fcf660a4684
--- /dev/null
+++ b/gcc/testsuite/g++.dg/Wattributes-6.C
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* Check that 'this' is counted as first arg to the attribute.  */
+
+struct X
+{
+  X *f() __attribute__((returns_arg(1)));
+};
+
+int main()
+{
+  X x;
+  x.f();
+}
diff --git a/gcc/testsuite/gcc.dg/Wattributes-11.c b/gcc/testsuite/gcc.dg/Wattributes-11.c
new file mode 100644
index 000..e291107243f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wattributes-11.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Wattributes" } */
+
+int f1 (int) __attribute__((returns_arg)); /* { dg-error "wrong number of arguments specified for 'returns_arg' attribute" } */
+
+void f2 (int) __attribute__((returns_arg(1))); /* { dg-warning "'returns_arg' attr

Re: [PR47785] COLLECT_AS_OPTIONS

2020-01-23 Thread Prathamesh Kulkarni
On Mon, 20 Jan 2020 at 15:44, Richard Biener  wrote:
>
> On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 5 Nov 2019 at 17:38, Richard Biener  
> > wrote:
> > >
> > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > Hi,
> > > > Thanks for the review.
> > > >
> > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  wrote:
> > > > >
> > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > Thanks for the reviews.
> > > > > >
> > > > > >
> > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Richard,
> > > > > > > > > >
> > > > > > > > > > Thanks for the review.
> > > > > > > > > >
> > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard Biener 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > As mentioned in the PR, attached patch adds 
> > > > > > > > > > > > > > > > COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > > > passing assembler options specified with -Wa, 
> > > > > > > > > > > > > > > > to the link-time driver.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The proposed solution only works for uniform 
> > > > > > > > > > > > > > > > -Wa options across all
> > > > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, supporting 
> > > >

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-01-24 Thread Prathamesh Kulkarni
On Tue, 21 Jan 2020 at 04:35, Joseph Myers  wrote:
>
> On Mon, 20 Jan 2020, Prathamesh Kulkarni wrote:
>
> > Hi,
> > This patch attempts to add returns_arg attribute for c-family
> > languages. For C++ methods, first arg is assumed to be this pointer,
>
> This is missing .texi documentation explaining the attribute and the cases
> for which it would be useful.
>
> A restriction to the first 4 arguments is not a good design of a language
> feature, whatever implementation issues there may be.
>
> Do you intend to update builtins.def in a followup patch for the various
> built-in functions (e.g. memcpy) for which such an attribute would be
> applicable?
>
> When extracting an integer value from an INTEGER_CST provided in user
> source code, you should always use tree_to_uhwi / tree_to_shwi as
> appropriate, after checking the relevant tree_fits_*, rather than using
> TREE_INT_CST_LOW directly, to avoid mishandling arguments that have a
> small number in the low part of the INTEGER_CST but have bits set in the
> high part as well.  Any direct use of TREE_INT_CST_LOW should have a
> specific justification for why it is correct to discard the high part of
> the integer.  See c-attribs.c:positional_argument, and try to use that
> function if possible rather than reimplementing bits of it, so that
> handling of attribute arguments giving the position of a function argument
> can be as consistent as possible between different attributes.
>
> There are coding style issues, e.g. diagnostics should not end with '.'
> and lines should be broken before not after an operator.
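
A minimal standalone sketch of the check-then-extract pattern described in the review above. It does not use GCC's tree API: struct sketch_wide_cst and both functions are hypothetical stand-ins that model a constant wider than the host integer, to show why reading only the low part can silently accept an out-of-range value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a wide integer constant: two 64-bit halves.  */
struct sketch_wide_cst
{
  uint64_t low;
  uint64_t high;
};

/* Analogue of a "fits" predicate: true if the value is representable
   in a single uint64_t.  */
static bool
sketch_fits_u64 (struct sketch_wide_cst c)
{
  return c.high == 0;
}

/* Extract the value only after the fits check succeeds.  */
static bool
sketch_to_argnum (struct sketch_wide_cst c, uint64_t *out)
{
  assert (out != 0);
  if (!sketch_fits_u64 (c))
    return false;
  *out = c.low;
  return true;
}
```

A constant with low == 1 and high == 1 has a small low part but is far too large to be an argument position; reading low directly would treat it as 1.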
Hi Joseph,
Thanks for the suggestions. Using positional_argument helped to
simplify the patch, and it also catches the case where the return type
and the argument type differ.
Does it look OK?
I will update builtins.def in a follow-up patch.

The middle-end representation issue with ERF_RETURNS_ARG still remains,
which restricts the attribute to the first four arguments. The patch
simply emits sorry () for arguments beyond the first four.
I will try to address this in a follow-up patch.

Thanks,
Prathamesh
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index dc9579c5c60..baed1b811ba 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -150,6 +150,7 @@ static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *);
 static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
 		   int, bool *);
 static tree handle_copy_attribute (tree *, tree, tree, int, bool *);
+static tree handle_returns_arg_attribute (tree *, tree, tree, int, bool *);
 
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)	\
@@ -484,6 +485,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_noinit_attribute, attr_noinit_exclusions },
   { "access",		  1, 3, false, true, true, false,
 			  handle_access_attribute, NULL },
+  { "returns_arg",	  1, 1, true, false, false, false,
+			  handle_returns_arg_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -4603,6 +4606,55 @@ handle_patchable_function_entry_attribute (tree *, tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle a "returns_arg" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_returns_arg_attribute (tree *node, tree name, tree args,
+			  int flags, bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree rettype = TREE_TYPE (decl);
+
+  if (TREE_CODE (rettype) == METHOD_TYPE
+      || TREE_CODE (rettype) == FUNCTION_TYPE)
+    rettype = TREE_TYPE (rettype);
+
+  if (VOID_TYPE_P (rettype))
+    {
+      warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
+                  "%qE attribute ignored on a function returning %qT.",
+                  name, rettype);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (!positional_argument (TREE_TYPE (decl), name, TREE_VALUE (args),
+                            TREE_CODE (rettype)))
+    {
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  *no_add_attrs = false;
+  gcc_assert (args);
+  tree val = TREE_VALUE (args);
+  unsigned HOST_WIDE_INT argnum = tree_to_uhwi (val);
+
+  if (argnum >= 4)
+    sorry ("returns_arg attr can only be applied to first four args.\n");
+
+  char s[2];
+  s[0] = argnum + '0';
+  s[1] = '\0';
+
+  tree attr = tree_cons (get_identifier ("fn spec"),
+			 build_tree_list (NULL_TREE, build_string (1, s)),
+			 NULL_TREE);
+  decl_attributes (node, attr, flags);
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
it were to be applied to an entity OPER.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ec99c38a607..3531e0c82

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-01-28 Thread Prathamesh Kulkarni
On Mon, 27 Jan 2020 at 17:36, Richard Biener  wrote:
>
> On Fri, Jan 24, 2020 at 11:53 PM Joseph Myers  wrote:
> >
> > On Fri, 24 Jan 2020, Prathamesh Kulkarni wrote:
> >
> > > The middle-end representation issue of ERF_RETURNS_ARG still remains,
> > > which restricts the attribute till first four args. The patch simply
> > > emits sorry(), for arguments beyond first four..
> >
> > I think this should be fixed (e.g. make the middle-end check for the
> > attribute, or something like that).
>
> Since it's pure optimization you can also simply choose to ignore this without
> notice.
>
> Note ERF_RETURN_ARG_MASK is limited to the number of available
> bits in an int and practically the only current setter was via "fn spec"
> which uses a single digit [1-9] to denote the argument (so limiting to
> three is indeed an odd choice but matches builtins using this at the moment).
>
> Feel free to up ERF_RETURN_ARG_MASK (but then you need to adjust
> the ERF_ flag defines).
Hi,
Thanks for the suggestions. In the attached patch I bumped up the value
of ERF_RETURN_ARG_MASK to UINT_MAX >> 2, and use the highest two bits
for ERF_NOALIAS and ERF_RETURNS_ARG.
The patch uses fn spec "Z<argnum>" to store the argument number in
fn-spec format.
Does that look OK?

In gimple_call_return_flags, I didn't remove the existing fn spec
"0-3" handling in this patch, since RET1 (and possibly others?) depend
on it. If that's OK, I will remove it and adjust the other cases to use
fn spec "Z<argnum>" in a follow-up patch.
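
As an illustration of the string format described above ('Z' followed by the decimal argument number), here is a minimal standalone sketch that formats it into a fixed-size automatic buffer. SKETCH_SPEC_BUFSZ and sketch_format_spec are hypothetical names for this example; the buffer size uses the common rule of thumb of three decimal digits per 8 bits, plus two bytes for 'Z' and the terminating NUL.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Room for 'Z', the decimal digits of an unsigned long (about 3 per
   8 bits), and the terminating NUL.  */
#define SKETCH_SPEC_BUFSZ (sizeof (unsigned long) * CHAR_BIT / 8 * 3 + 2)

/* Write "Z<argnum>" into BUF and return its length, or -1 if the
   result would not fit in BUFSZ bytes.  */
static int
sketch_format_spec (char *buf, size_t bufsz, unsigned long argnum)
{
  assert (buf != 0);
  int n = snprintf (buf, bufsz, "Z%lu", argnum);
  if (n < 0 || (size_t) n >= bufsz)
    return -1;
  return n;
}
```

A caller would declare char buf[SKETCH_SPEC_BUFSZ] on the stack and pass sizeof buf, so no heap allocation or explicit free is involved.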

Thanks,
Prathamesh
>
> >  The language semantics of the
> > attribute should not be driven by such internal implementation details;
> > rather, implementation details should be determined by the language
> > semantics to be implemented.
> >
> > The sorry () has coding style issues.  Diagnostics should not end with '.'
> > or '\n', should use full words (attribute not attr, arguments not args)
> > and programming language text in them should be surrounded by %<%> (so
> > %<returns_arg%>).
> >
> > --
> > Joseph S. Myers
> > jos...@codesourcery.com
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index dc9579c5c60..2ed41ed136d 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -150,6 +150,7 @@ static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *);
 static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
 		   int, bool *);
 static tree handle_copy_attribute (tree *, tree, tree, int, bool *);
+static tree handle_returns_arg_attribute (tree *, tree, tree, int, bool *);
 
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)	\
@@ -484,6 +485,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_noinit_attribute, attr_noinit_exclusions },
   { "access",		  1, 3, false, true, true, false,
 			  handle_access_attribute, NULL },
+  { "returns_arg",	  1, 1, true, false, false, false,
+			  handle_returns_arg_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -4603,6 +4606,53 @@ handle_patchable_function_entry_attribute (tree *, tree name, tree args,
   return NULL_TREE;
 }
 
+/* Handle a "returns_arg" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_returns_arg_attribute (tree *node, tree name, tree args,
+			  int flags, bool *no_add_attrs)
+{
+  tree decl = *node;
+  tree rettype = TREE_TYPE (decl);
+
+  if (TREE_CODE (rettype) == METHOD_TYPE
+      || TREE_CODE (rettype) == FUNCTION_TYPE)
+    rettype = TREE_TYPE (rettype);
+
+  if (VOID_TYPE_P (rettype))
+    {
+      warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
+                  "%qE attribute ignored on a function returning %qT.",
+                  name, rettype);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (!positional_argument (TREE_TYPE (decl), name, TREE_VALUE (args),
+                            TREE_CODE (rettype)))
+    {
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  *no_add_attrs = false;
+  gcc_assert (args);
+  tree val = TREE_VALUE (args);
+  unsigned HOST_WIDE_INT argnum = tree_to_uhwi (val);
+  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
+  s[0] = 'Z';
+  sprintf (s + 1, "%lu", argnum);
+
+  tree attr = tree_cons (get_identifier ("fn spec"),
+			 build_tree_list (NULL_TREE,
+	  build_string (strlen (s), s)),
+			 NULL_TREE);
+  decl_attributes (node, attr, flags);
+  free (s);
+  return NULL_TREE;
+}
+
 /* Attempt to partially validate a single attribute ATTR as if
it were to be applied to an entity OPER.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ec9

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-01-28 Thread Prathamesh Kulkarni
On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  wrote:
>
> On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni wrote:
> > Thanks for the suggestions. In the attached patch I bumped up value of
> > ERF_RETURNS_ARG_MASK
> > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS and 
> > ERF_RETURNS_ARG.
> > And use fn spec "Z<argnum>" to store the argument number in fn-spec format.
> > Does that look OK ?
>
> No.
>
> +#define ERF_RETURN_ARG_MASK (UINT_MAX >> 2)
>
>  /* Nonzero if the return value is equal to the argument number
> flags & ERF_RETURN_ARG_MASK.  */
> -#define ERF_RETURNS_ARG (1 << 2)
> +#define ERF_RETURNS_ARG (1 << (BITS_PER_WORD - 2))
>
> How is size of host int related to BITS_PER_WORD?  Not to mention that
> if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) is UB.
Oops, sorry. I should have used HOST_BITS_PER_INT.
I assume that would be correct?
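
A small standalone sketch of the shift concern raised above: in C, shifting by an amount greater than or equal to the width of the promoted left operand is undefined, and shifting a signed 1 into the sign bit is also problematic. The SKETCH_* macro and both functions are hypothetical; the sketch uses an unsigned operand, derives the width from the host's unsigned int, and assumes that type has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Width of the host's unsigned int in bits (assumes no padding bits,
   which holds on mainstream hosts).  */
#define SKETCH_UINT_BITS ((unsigned) (sizeof (unsigned) * CHAR_BIT))

/* Return a mask with only bit POS set, or 0 if POS is out of range.
   The unsigned operand keeps the shift well defined for every
   in-range POS, including the top bit.  */
static unsigned
sketch_bit (unsigned pos)
{
  if (pos >= SKETCH_UINT_BITS)
    return 0;
  return 1u << pos;
}

/* Place a flag in the second-highest bit, leaving the top bit unused.  */
static unsigned
sketch_second_highest_bit (void)
{
  unsigned flag = sketch_bit (SKETCH_UINT_BITS - 2);
  assert (flag != 0);
  return flag;
}
```

The width here comes from the host type that actually holds the flags, not from a target property, which is the distinction the review comment draws.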

Thanks,
Prathamesh
>
> Jakub
>


Re: [PR47785] COLLECT_AS_OPTIONS

2020-01-29 Thread Prathamesh Kulkarni
On Tue, 28 Jan 2020 at 17:17, Richard Biener  wrote:
>
> On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 20 Jan 2020 at 15:44, Richard Biener  
> > wrote:
> > >
> > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > > Thanks for the review.
> > > > > >
> > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Thanks for the reviews.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu  
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard Biener 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan 
> > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi,
> > > > > > &

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-01-30 Thread Prathamesh Kulkarni
On Wed, 29 Jan 2020 at 14:38, Richard Biener  wrote:
>
> On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  wrote:
> >
> > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni wrote:
> > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  wrote:
> > > >
> > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni wrote:
> > > > > Thanks for the suggestions. In the attached patch I bumped up value of
> > > > > ERF_RETURNS_ARG_MASK
> > > > > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS and 
> > > > > ERF_RETURNS_ARG.
> > > > > And use fn spec "Z<argnum>" to store the argument number in fn-spec
> > > > > format.
> > > > > Does that look OK ?
> > > >
> > > > No.
> > > >
> > > > +#define ERF_RETURN_ARG_MASK (UINT_MAX >> 2)
> > > >
> > > >  /* Nonzero if the return value is equal to the argument number
> > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > -#define ERF_RETURNS_ARG (1 << 2)
> > > > +#define ERF_RETURNS_ARG (1 << (BITS_PER_WORD - 2))
> > > >
> > > > How is size of host int related to BITS_PER_WORD?  Not to mention that
> > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) is UB.
> > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > I assume that'd be correct ?
> >
> > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative number, you'd
> > need either 1U and verify all ERF_* flags uses, or avoid using the sign bit.
> > The patch has other issues, you don't verify that the argnum fits into
> > the bits (doesn't overflow into the other ERF_* bits), in
> > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > +  s[0] = 'Z';
> > +  sprintf (s + 1, "%lu", argnum);
> > 1) sizeof (char) is 1 by definition
> > 2) it is pointless to allocate it and then deallocate (just use automatic
> > array)
> > 3) it is unclear how is BITS_PER_WORD related to the length of decimal
> > encoded string + Z char + terminating '\0'.  The usual way is for unsigned
> > numbers to estimate number of digits by counting 3 digits per each 8 bits,
> > in your case of course + 2.
> >
> > More importantly, the "fn spec" attribute isn't used just in
> > gimple_call_return_flags, but also in e.g. gimple_call_arg_flags which
> > assumes that the return stuff in there is a single char and the remaining
> > chars are for argument descriptions, or in decl_return_flags which you
> > haven't modified.
> >
> > I must say I fail to see the point in trying to glue this together into the
> > "fn spec" argument so incompatibly, why can't we handle the fn spec with its
> > 1-4 returns_arg and returns_arg attribute with arbitrary position
> > side-by-side?
>
> Yeah, I wouldn't have added "fn spec" for "returns_arg" but kept the
> query interface thus access it via gimple_call_return_flags and use
> ERF_*.  For the flags adjustment just up the maximum argument
> to 1<<15 then the argument number is also nicely aligned, no need
> to do fancy limiting that depends on the host.  For too large
> argument numbers just warn the attribute is ignored.  I'd say even
> a max of 255 is sane just the existing limit is a bit too low.
Hi,
Thanks for the suggestions. In the attached patch, I use TREE_VALUE
(attr) to store and retrieve an arbitrary argument position, and have
bumped ERF_RETURN_ARG_MASK to 0x3fff.
Does it look OK?
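
A minimal standalone sketch of a flag layout with a 0x3fff argument mask, as described above. The attached patch is truncated in this archive, so the positions of the two flag bits (14 and 15) are assumptions made for illustration, and all SKETCH_* names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RETURN_ARG_MASK 0x3fffu     /* low 14 bits: argument number */
#define SKETCH_RETURNS_ARG     (1u << 14)  /* assumed position */
#define SKETCH_NOALIAS         (1u << 15)  /* assumed position */

/* Pack ARGNUM and the flags; return false if ARGNUM is too large, in
   which case a caller would warn and ignore the attribute.  */
static bool
sketch_pack (unsigned argnum, bool noalias, unsigned *flags)
{
  assert (flags != 0);
  if (argnum > SKETCH_RETURN_ARG_MASK)
    return false;
  *flags = SKETCH_RETURNS_ARG | argnum | (noalias ? SKETCH_NOALIAS : 0u);
  return true;
}

/* Recover the argument number from FLAGS.  */
static unsigned
sketch_argnum (unsigned flags)
{
  assert (flags & SKETCH_RETURNS_ARG);
  return flags & SKETCH_RETURN_ARG_MASK;
}
```

Because the mask and both flag bits fit in 16 bits, this layout does not depend on the width of the host int.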

Thanks,
Prathamesh
>
> Richard.
>
> > Jakub
> >
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index dc9579c5c60..c6d5bbd1d7a 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -150,6 +150,7 @@ static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *);
 static tree handle_patchable_function_entry_attribute (tree *, tree, tree,
 		   int, bool *);
 static tree handle_copy_attribute (tree *, tree, tree, int, bool *);
+static tree handle_returns_arg_attribute (tree *, tree, tree, int, bool *);
 
 /* Helper to define attribute exclusions.  */
 #define ATTR_EXCL(name, function, type, variable)	\
@@ -484,6 +485,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_noinit_attribute, attr_noinit_exclusions },
   { "access",		  1, 3, false, true, true, false,
 			  handle_access_attribute, NULL },
+  { "returns_arg",	  1, 1, true, false, false,

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-02-03 Thread Prathamesh Kulkarni
On Thu, 30 Jan 2020 at 19:17, Richard Biener  wrote:
>
> On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Wed, 29 Jan 2020 at 14:38, Richard Biener  
> > wrote:
> > >
> > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  wrote:
> > > >
> > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni wrote:
> > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  wrote:
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni wrote:
> > > > > > > Thanks for the suggestions. In the attached patch I bumped up 
> > > > > > > value of
> > > > > > > ERF_RETURNS_ARG_MASK
> > > > > > > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS and 
> > > > > > > ERF_RETURNS_ARG.
> > > > > > > And use fn spec "Z<argnum>" to store the argument number in
> > > > > > > fn-spec format.
> > > > > > > Does that look OK ?
> > > > > >
> > > > > > No.
> > > > > >
> > > > > > +#define ERF_RETURN_ARG_MASK (UINT_MAX >> 2)
> > > > > >
> > > > > >  /* Nonzero if the return value is equal to the argument number
> > > > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > > > -#define ERF_RETURNS_ARG (1 << 2)
> > > > > > +#define ERF_RETURNS_ARG (1 << (BITS_PER_WORD - 2))
> > > > > >
> > > > > > How is size of host int related to BITS_PER_WORD?  Not to mention 
> > > > > > that
> > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) is UB.
> > > > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > > > I assume that'd be correct ?
> > > >
> > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative number, 
> > > > you'd
> > > > need either 1U and verify all ERF_* flags uses, or avoid using the sign 
> > > > bit.
> > > > The patch has other issues, you don't verify that the argnum fits into
> > > > the bits (doesn't overflow into the other ERF_* bits), in
> > > > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > > > +  s[0] = 'Z';
> > > > +  sprintf (s + 1, "%lu", argnum);
> > > > 1) sizeof (char) is 1 by definition
> > > > 2) it is pointless to allocate it and then deallocate (just use 
> > > > automatic
> > > > array)
> > > > 3) it is unclear how is BITS_PER_WORD related to the length of decimal
> > > > encoded string + Z char + terminating '\0'.  The usual way is for 
> > > > unsigned
> > > > numbers to estimate number of digits by counting 3 digits per each 8 
> > > > bits,
> > > > in your case of course + 2.
> > > >
> > > > More importantly, the "fn spec" attribute isn't used just in
> > > > gimple_call_return_flags, but also in e.g. gimple_call_arg_flags which
> > > > assumes that the return stuff in there is a single char and the remaining
> > > > chars are for argument descriptions, or in decl_return_flags which you
> > > > haven't modified.
> > > >
> > > > I must say I fail to see the point in trying to glue this together into 
> > > > the
> > > > "fn spec" argument so incompatibly, why can't we handle the fn spec 
> > > > with its
> > > > 1-4 returns_arg and returns_arg attribute with arbitrary position
> > > > side-by-side?
> > >
> > > Yeah, I wouldn't have added "fn spec" for "returns_arg" but kept the
> > > query interface thus access it via gimple_call_return_flags and use
> > > ERF_*.  For the flags adjustment just up the maximum argument
> > > to 1<<15 then the argument number is also nicely aligned, no need
> > > to do fancy limiting that depends on the host.  For too large
> > > argument numbers just warn the attribute is ignored.  I'd say even
> > > a max of 255 is sane just the existing limit is a bit too low.
> > Hi,
> > Thanks for the suggestions. In the attached patch, I use TREE_VALUE
> > (attr) to store/retrieve
> > arbitrary argument position, and have bumped ERF_RETURNS_ARG_MASK to 

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-02-03 Thread Prathamesh Kulkarni
On Mon, 3 Feb 2020 at 14:41, Prathamesh Kulkarni
 wrote:
>
> On Thu, 30 Jan 2020 at 19:17, Richard Biener  
> wrote:
> >
> > On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Wed, 29 Jan 2020 at 14:38, Richard Biener  
> > > wrote:
> > > >
> > > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  wrote:
> > > > >
> > > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni wrote:
> > > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni 
> > > > > > > wrote:
> > > > > > > > Thanks for the suggestions. In the attached patch I bumped up 
> > > > > > > > value of
> > > > > > > > ERF_RETURNS_ARG_MASK
> > > > > > > > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS and 
> > > > > > > > ERF_RETURNS_ARG.
> > > > > > > > And use fn spec "Z<argnum>" to store the argument number in
> > > > > > > > fn-spec format.
> > > > > > > > Does that look OK ?
> > > > > > >
> > > > > > > No.
> > > > > > >
> > > > > > > +#define ERF_RETURN_ARG_MASK (UINT_MAX >> 2)
> > > > > > >
> > > > > > >  /* Nonzero if the return value is equal to the argument number
> > > > > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > > > > -#define ERF_RETURNS_ARG (1 << 2)
> > > > > > > +#define ERF_RETURNS_ARG (1 << (BITS_PER_WORD - 2))
> > > > > > >
> > > > > > > How is size of host int related to BITS_PER_WORD?  Not to mention 
> > > > > > > that
> > > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) is 
> > > > > > > UB.
> > > > > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > > > > I assume that'd be correct ?
> > > > >
> > > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative number, 
> > > > > you'd
> > > > > need either 1U and verify all ERF_* flags uses, or avoid using the 
> > > > > sign bit.
> > > > > The patch has other issues, you don't verify that the argnum fits into
> > > > > the bits (doesn't overflow into the other ERF_* bits), in
> > > > > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > > > > +  s[0] = 'Z';
> > > > > +  sprintf (s + 1, "%lu", argnum);
> > > > > 1) sizeof (char) is 1 by definition
> > > > > 2) it is pointless to allocate it and then deallocate (just use 
> > > > > automatic
> > > > > array)
> > > > > 3) it is unclear how is BITS_PER_WORD related to the length of decimal
> > > > > encoded string + Z char + terminating '\0'.  The usual way is for 
> > > > > unsigned
> > > > > numbers to estimate number of digits by counting 3 digits per each 8 
> > > > > bits,
> > > > > in your case of course + 2.
> > > > >
> > > > > More importantly, the "fn spec" attribute isn't used just in
> > > > > gimple_call_return_flags, but also in e.g. gimple_call_arg_flags which
> > > > > assumes that the return stuff in there is a single char and the 
> > > > > > remaining
> > > > > chars are for argument descriptions, or in decl_return_flags which you
> > > > > haven't modified.
> > > > >
> > > > > I must say I fail to see the point in trying to glue this together 
> > > > > into the
> > > > > "fn spec" argument so incompatibly, why can't we handle the fn spec 
> > > > > with its
> > > > > 1-4 returns_arg and returns_arg attribute with arbitrary position
> > > > > side-by-side?
> > > >
> > > > Yeah, I wouldn't have added "fn spec" for "returns_arg" but kept the
> > > > query interface thus access it via gimple_call_return_flags and use
> > > > ERF_*.  For the flags adjustmen

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-03 Thread Prathamesh Kulkarni
On Thu, 30 Jan 2020 at 19:10, Richard Biener  wrote:
>
> On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 28 Jan 2020 at 17:17, Richard Biener  
> > wrote:
> > >
> > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > > Thanks for the review.
> > > > > > > >
> > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks for the reviews.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu  
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan 
> > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > >

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-02-04 Thread Prathamesh Kulkarni
On Mon, 3 Feb 2020 at 14:56, Prathamesh Kulkarni
 wrote:
>
> On Mon, 3 Feb 2020 at 14:41, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 30 Jan 2020 at 19:17, Richard Biener  
> > wrote:
> > >
> > > On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 29 Jan 2020 at 14:38, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni wrote:
> > > > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni 
> > > > > > > > wrote:
> > > > > > > > > Thanks for the suggestions. In the attached patch I bumped up 
> > > > > > > > > value of
> > > > > > > > > ERF_RETURNS_ARG_MASK
> > > > > > > > > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS 
> > > > > > > > > and ERF_RETURNS_ARG.
> > > > > > > > > And use fn spec "Z" to store the argument number in 
> > > > > > > > > fn-spec format.
> > > > > > > > > Does that look OK ?
> > > > > > > >
> > > > > > > > No.
> > > > > > > >
> > > > > > > > > +#define ERF_RETURN_ARG_MASK (UINT_MAX >> 2)
> > > > > > > >
> > > > > > > >  /* Nonzero if the return value is equal to the argument number
> > > > > > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > > > > > > -#define ERF_RETURNS_ARG (1 << 2)
> > > > > > > > > +#define ERF_RETURNS_ARG (1 << (BITS_PER_WORD - 2))
> > > > > > > >
> > > > > > > > How is size of host int related to BITS_PER_WORD?  Not to 
> > > > > > > > mention that
> > > > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) is 
> > > > > > > > UB.
> > > > > > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > > > > > I assume that'd be correct ?
> > > > > >
> > > > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative number, 
> > > > > > you'd
> > > > > > need either 1U and verify all ERF_* flags uses, or avoid using the 
> > > > > > sign bit.
> > > > > > The patch has other issues, you don't verify that the argnum fits 
> > > > > > into
> > > > > > the bits (doesn't overflow into the other ERF_* bits), in
> > > > > > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > > > > > +  s[0] = 'Z';
> > > > > > +  sprintf (s + 1, "%lu", argnum);
> > > > > > 1) sizeof (char) is 1 by definition
> > > > > > 2) it is pointless to allocate it and then deallocate (just use 
> > > > > > automatic
> > > > > > array)
> > > > > > 3) it is unclear how is BITS_PER_WORD related to the length of 
> > > > > > decimal
> > > > > > encoded string + Z char + terminating '\0'.  The usual way is for 
> > > > > > unsigned
> > > > > > numbers to estimate number of digits by counting 3 digits per each 
> > > > > > 8 bits,
> > > > > > in your case of course + 2.
> > > > > >
> > > > > > More importantly, the "fn spec" attribute isn't used just in
> > > > > > gimple_call_return_flags, but also in e.g. gimple_call_arg_flags 
> > > > > > which
> > > > > > assumes that the return stuff in there is a single char and the 
> > > > > > > remaining
> > > > > > chars are for argument descriptions, or in decl_return_flags which 
> > > > > > you
> > > > > > haven't modified.
> > > > > >
> > > > > > I must say I fail t

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-06 Thread Prathamesh Kulkarni
On Tue, 4 Feb 2020 at 19:44, Richard Biener  wrote:
>
> On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 30 Jan 2020 at 19:10, Richard Biener  
> > wrote:
> > >
> > > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Tue, 28 Jan 2020 at 17:17, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > Thanks for the review.
> > > > > > > > > >
> > > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the pointers.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Fri, 11 Oct 20

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-06 Thread Prathamesh Kulkarni
On Thu, 6 Feb 2020 at 18:42, Richard Biener  wrote:
>
> On Thu, Feb 6, 2020 at 1:48 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Tue, 4 Feb 2020 at 19:44, Richard Biener  
> > wrote:
> > >
> > > On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Thu, 30 Jan 2020 at 19:10, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, 28 Jan 2020 at 17:17, Richard Biener 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard 
> > > > > > > > > > > > > > > > > > Biener  wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan 
> > > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > >

Re: [RFC] [c-family] PR92867 - Add returns_arg attribute

2020-02-11 Thread Prathamesh Kulkarni
On Tue, 4 Feb 2020 at 14:54, Prathamesh Kulkarni
 wrote:
>
> On Mon, 3 Feb 2020 at 14:56, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 3 Feb 2020 at 14:41, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 30 Jan 2020 at 19:17, Richard Biener  
> > > wrote:
> > > >
> > > > On Thu, Jan 30, 2020 at 11:49 AM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Wed, 29 Jan 2020 at 14:38, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Jan 28, 2020 at 1:02 PM Jakub Jelinek  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Tue, Jan 28, 2020 at 05:09:36PM +0530, Prathamesh Kulkarni 
> > > > > > > wrote:
> > > > > > > > On Tue, 28 Jan 2020 at 17:00, Jakub Jelinek  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jan 28, 2020 at 04:56:59PM +0530, Prathamesh Kulkarni 
> > > > > > > > > wrote:
> > > > > > > > > > Thanks for the suggestions. In the attached patch I bumped 
> > > > > > > > > > up value of
> > > > > > > > > > ERF_RETURNS_ARG_MASK
> > > > > > > > > > to UINT_MAX >> 2, and use highest two bits for ERF_NOALIAS 
> > > > > > > > > > and ERF_RETURNS_ARG.
> > > > > > > > > > And use fn spec "Z" to store the argument number in 
> > > > > > > > > > fn-spec format.
> > > > > > > > > > Does that look OK ?
> > > > > > > > >
> > > > > > > > > No.
> > > > > > > > >
> > > > > > > > > > +#define ERF_RETURN_ARG_MASK (UINT_MAX >> 2)
> > > > > > > > >
> > > > > > > > >  /* Nonzero if the return value is equal to the argument 
> > > > > > > > > number
> > > > > > > > > flags & ERF_RETURN_ARG_MASK.  */
> > > > > > > > > > -#define ERF_RETURNS_ARG (1 << 2)
> > > > > > > > > > +#define ERF_RETURNS_ARG (1 << (BITS_PER_WORD - 2))
> > > > > > > > >
> > > > > > > > > How is size of host int related to BITS_PER_WORD?  Not to 
> > > > > > > > > mention that
> > > > > > > > > if BITS_PER_WORD is 64 and host int is 32-bit, 1 << (64 - 2) 
> > > > > > > > > is UB.
> > > > > > > > Oops sorry. I should have used HOST_BITS_PER_INT.
> > > > > > > > I assume that'd be correct ?
> > > > > > >
> > > > > > > It still wouldn't, 1 << (HOST_BITS_PER_INT - 1) is negative 
> > > > > > > number, you'd
> > > > > > > need either 1U and verify all ERF_* flags uses, or avoid using 
> > > > > > > the sign bit.
> > > > > > > The patch has other issues, you don't verify that the argnum fits 
> > > > > > > into
> > > > > > > the bits (doesn't overflow into the other ERF_* bits), in
> > > > > > > +  char *s = (char *) xmalloc (sizeof (char) * BITS_PER_WORD);
> > > > > > > +  s[0] = 'Z';
> > > > > > > +  sprintf (s + 1, "%lu", argnum);
> > > > > > > 1) sizeof (char) is 1 by definition
> > > > > > > 2) it is pointless to allocate it and then deallocate (just use 
> > > > > > > automatic
> > > > > > > array)
> > > > > > > 3) it is unclear how is BITS_PER_WORD related to the length of 
> > > > > > > decimal
> > > > > > > encoded string + Z char + terminating '\0'.  The usual way is for 
> > > > > > > unsigned
> > > > > > > numbers to estimate number of digits by counting 3 digits per 
> > > > > > > each 8 bits,
> > > > > > > in your case of course + 2.
> > > > > > >
> > > > > > > More importantly, the "fn spec" attribute isn't used just in
> > > > > > >

Re: [PR47785] COLLECT_AS_OPTIONS

2020-02-17 Thread Prathamesh Kulkarni
On Thu, 6 Feb 2020 at 20:03, Prathamesh Kulkarni
 wrote:
>
> On Thu, 6 Feb 2020 at 18:42, Richard Biener  
> wrote:
> >
> > On Thu, Feb 6, 2020 at 1:48 PM Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Tue, 4 Feb 2020 at 19:44, Richard Biener  
> > > wrote:
> > > >
> > > > On Mon, Feb 3, 2020 at 12:37 PM Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Thu, 30 Jan 2020 at 19:10, Richard Biener 
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Jan 30, 2020 at 5:31 AM Prathamesh Kulkarni
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Tue, 28 Jan 2020 at 17:17, Richard Biener 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Jan 24, 2020 at 7:04 AM Prathamesh Kulkarni
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 20 Jan 2020 at 15:44, Richard Biener 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Jan 8, 2020 at 11:20 AM Prathamesh Kulkarni
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 5 Nov 2019 at 17:38, Richard Biener 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 5 Nov 2019 at 03:57, H.J. Lu 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, Nov 3, 2019 at 6:45 PM Kugan 
> > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the reviews.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu 
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu 
> > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan 
> > > > > > > > > > > > > > > > > > Vivekanandarajah
> > > > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, 23 Oc

Re: [SVE] PR89007 - Implement generic vector average expansion

2019-12-05 Thread Prathamesh Kulkarni
On Fri, 29 Nov 2019 at 15:41, Richard Biener  wrote:
>
> On Fri, Nov 22, 2019 at 12:40 PM Prathamesh Kulkarni
>  wrote:
> >
> > On Wed, 20 Nov 2019 at 16:54, Richard Biener  wrote:
> > >
> > > On Wed, 20 Nov 2019, Richard Sandiford wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for doing this.  Adding Richard on cc:, since the SVE subject
> > > > tag might have put him off.  There's not really anything SVE-specific
> > > > here apart from the testcase.
> > >
> > > Ah.
> > >
> > > > > 2019-11-19  Prathamesh Kulkarni  
> > > > >
> > > > > PR tree-optimization/89007
> > > > > * tree-vect-patterns.c (vect_recog_average_pattern): If there is 
> > > > > no
> > > > > target support available, generate code to distribute rshift over 
> > > > > plus
> > > > > and add one depending upon floor or ceil rounding.
> > > > >
> > > > > testsuite/
> > > > > * gcc.target/aarch64/sve/pr89007.c: New test.
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
> > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > > new file mode 100644
> > > > > index 000..32095c63c61
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > > @@ -0,0 +1,29 @@
> > > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve 
> > > > > --save-temps" } */
> > > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > > +
> > > > > +#define N 1024
> > > > > +unsigned char dst[N];
> > > > > +unsigned char in1[N];
> > > > > +unsigned char in2[N];
> > > > > +
> > > > > +/*
> > > > > +**  foo:
> > > > > +** ...
> > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > +** add (z[0-9]+\.b), \1, \2
> > > > > +** orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > > +** add z0.b, \3, \5
> > > >
> > > > It'd probably be more future-proof to allow (\1, \2|\2, \1) and
> > > > (\3, \5|\5, \3).  Same for the other testcase.
> > > >
> > > > > +** ...
> > > > > +*/
> > > > > +void
> > > > > +foo ()
> > > > > +{
> > > > > +  for( int x = 0; x < N; x++ )
> > > > > > +    dst[x] = (in1[x] + in2[x] + 1) >> 1;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c 
> > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > > new file mode 100644
> > > > > index 000..cc40f45046b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > > @@ -0,0 +1,29 @@
> > > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve 
> > > > > --save-temps" } */
> > > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > > +
> > > > > +#define N 1024
> > > > > +unsigned char dst[N];
> > > > > +unsigned char in1[N];
> > > > > +unsigned char in2[N];
> > > > > +
> > > > > +/*
> > > > > +**  foo:
> > > > > +** ...
> > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > +** add (z[0-9]+\.b), \1, \2
> > > > > +** and (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > > +** add z0.b, \3, \5
> > > > > +** ...
> > > > > +*/
> > > > > +void
> > > > > >

Re: [SVE] PR89007 - Implement generic vector average expansion

2019-12-09 Thread Prathamesh Kulkarni
On Thu, 5 Dec 2019 at 18:17, Richard Biener  wrote:
>
> On Thu, 5 Dec 2019, Prathamesh Kulkarni wrote:
>
> > On Fri, 29 Nov 2019 at 15:41, Richard Biener  
> > wrote:
> > >
> > > On Fri, Nov 22, 2019 at 12:40 PM Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Wed, 20 Nov 2019 at 16:54, Richard Biener  wrote:
> > > > >
> > > > > On Wed, 20 Nov 2019, Richard Sandiford wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for doing this.  Adding Richard on cc:, since the SVE subject
> > > > > > tag might have put him off.  There's not really anything 
> > > > > > SVE-specific
> > > > > > here apart from the testcase.
> > > > >
> > > > > Ah.
> > > > >
> > > > > > > 2019-11-19  Prathamesh Kulkarni  
> > > > > > >
> > > > > > > PR tree-optimization/89007
> > > > > > > * tree-vect-patterns.c (vect_recog_average_pattern): If there 
> > > > > > > is no
> > > > > > > target support available, generate code to distribute rshift 
> > > > > > > over plus
> > > > > > > and add one depending upon floor or ceil rounding.
> > > > > > >
> > > > > > > testsuite/
> > > > > > > * gcc.target/aarch64/sve/pr89007.c: New test.
> > > > > > >
> > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
> > > > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > > > > new file mode 100644
> > > > > > > index 000..32095c63c61
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > > > > @@ -0,0 +1,29 @@
> > > > > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve 
> > > > > > > --save-temps" } */
> > > > > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > > > > +
> > > > > > > +#define N 1024
> > > > > > > +unsigned char dst[N];
> > > > > > > +unsigned char in1[N];
> > > > > > > +unsigned char in2[N];
> > > > > > > +
> > > > > > > +/*
> > > > > > > +**  foo:
> > > > > > > +** ...
> > > > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > > > > +** add (z[0-9]+\.b), \1, \2
> > > > > > > +** orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > > > > +** add z0.b, \3, \5
> > > > > >
> > > > > > It'd probably be more future-proof to allow (\1, \2|\2, \1) and
> > > > > > (\3, \5|\5, \3).  Same for the other testcase.
> > > > > >
> > > > > > > +** ...
> > > > > > > +*/
> > > > > > > +void
> > > > > > > +foo ()
> > > > > > > +{
> > > > > > > +  for( int x = 0; x < N; x++ )
> > > > > > > > +    dst[x] = (in1[x] + in2[x] + 1) >> 1;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c 
> > > > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > > > > new file mode 100644
> > > > > > > index 000..cc40f45046b
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > > > > @@ -0,0 +1,29 @@
> > > > > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve 
> > > > >

Re: [PR47785] COLLECT_AS_OPTIONS

2020-01-08 Thread Prathamesh Kulkarni
On Tue, 5 Nov 2019 at 17:38, Richard Biener  wrote:
>
> On Tue, Nov 5, 2019 at 12:17 AM Kugan Vivekanandarajah
>  wrote:
> >
> > Hi,
> > Thanks for the review.
> >
> > On Tue, 5 Nov 2019 at 03:57, H.J. Lu  wrote:
> > >
> > > On Sun, Nov 3, 2019 at 6:45 PM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > Thanks for the reviews.
> > > >
> > > >
> > > > On Sat, 2 Nov 2019 at 02:49, H.J. Lu  wrote:
> > > > >
> > > > > On Thu, Oct 31, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, 30 Oct 2019 at 03:11, H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Sun, Oct 27, 2019 at 6:33 PM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Hi Richard,
> > > > > > > >
> > > > > > > > Thanks for the review.
> > > > > > > >
> > > > > > > > On Wed, 23 Oct 2019 at 23:07, Richard Biener 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Oct 21, 2019 at 10:04 AM Kugan Vivekanandarajah
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Richard,
> > > > > > > > > >
> > > > > > > > > > Thanks for the pointers.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, 11 Oct 2019 at 22:33, Richard Biener 
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Oct 11, 2019 at 6:15 AM Kugan Vivekanandarajah
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Richard,
> > > > > > > > > > > > Thanks for the review.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2 Oct 2019 at 20:41, Richard Biener 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As mentioned in the PR, attached patch adds 
> > > > > > > > > > > > > > COLLECT_AS_OPTIONS for
> > > > > > > > > > > > > > passing assembler options specified with -Wa, to 
> > > > > > > > > > > > > > the link-time driver.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The proposed solution only works for uniform -Wa 
> > > > > > > > > > > > > > options across all
> > > > > > > > > > > > > > TUs. As mentioned by Richard Biener, supporting 
> > > > > > > > > > > > > > non-uniform -Wa flags
> > > > > > > > > > > > > > would require either adjusting partitioning 
> > > > > > > > > > > > > > according to flags or
> > > > > > > > > > > > > > emitting multiple object files  from a single 
> > > > > > > > > > > > > > LTRANS CU. We could
> > > > > > > > > > > > > > consider this as a follow up.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Bootstrapped and regression tests on  
> > > > > > > > > > > > > > arm-linux-gcc. Is this OK for trunk?
> > > > > > > > > > > > >
> > > > > > > > > > > > > While it works for your simple cases it is unlikely 
> > > > > > > > > > > > > to work in practice since
> > > > > > > > > > > > > your implementation needs the assembler options be 
> > > > > > > > > > > > > present at the link
> > > > > > > > > > > > > command line.  I agree that this might be the way for 
> > > > > > > > > > > > > people to go when
> > > > > > > > > > > > > they face the issue but then it needs to be 
> > > > > > > > > > > > > documented somewhere
> > > > > > > > > > > > > in the manual.
> > > > > > > > > > > > >
> > > > > > > > > > > > > That is, with COLLECT_AS_OPTION (why singular?  I'd 
> > > > > > > > > > > > > expected
> > > > > > > > > > > > > COLLECT_AS_OPTIONS) available to cc1 we could stream 
> > > > > > > > > > > > > this string
> > > > > > > > > > > > > to lto_options and re-materialize it at link time 
> > > > > > > > > > > > > (and diagnose mismatches
> > > > > > > > > > > > > even if we like).
> > > > > > > > > > > > OK. I will try to implement this. So the idea is if we 
> > > > > > > > > > > > provide
> > > > > > > > > > > > -Wa,options as part of the lto compile, this should be 
> > > > > > > > > > > > available
> > > > > > > > > > > > during link time. Like in:
> > > > > > > > > > > >
> > > > > > > > > > > > arm-linux-gnueabihf-gcc -march=armv7-a -mthumb -O2 -flto
> > > > > > > > > > > > -Wa,-mimplicit-it=always,-mthumb -c test.c
> > > > > > > > > > > > arm-linux-gnueabihf-gcc  -flto  test.o
> > > > > > > > > > > >
> > > > > > > > > > > > I am not sure where should we stream this. Currently, 
> > > > > > > > > > > > cl_optimization
> > > > > > > > > > > > has all the optimization flag provided for compiler and 
> > > > > > > > > > > > it is
> > > > > > > > > > > > autogenerated and all the flags are integer values. Do 
> > > > > > > > > > > > you have any
> > > > > > > > > > > > preference or example where this should be done.
> > > > > > > > > > >
> > > > > > > > > > > In lto_write_options, I'd simply append the contents of 
> > > > > > > > > > > COLLECT_AS_OPTIONS
> > > > > > > > > > > (with -Wa, prepended to each of them),

Re: [PATCH v4 4/5] Add tests for C/C++ musttail attributes

2024-02-02 Thread Prathamesh Kulkarni
On Fri, 2 Feb 2024 at 14:44, Andi Kleen  wrote:
>
> Mostly adopted from the existing C musttail plugin tests.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/musttail1.c: New test.
> * c-c++-common/musttail2.c: New test.
> * c-c++-common/musttail3.c: New test.
> * c-c++-common/musttail4.c: New test.
> * c-c++-common/musttail5.c: New test.
> ---
>  gcc/testsuite/c-c++-common/musttail1.c | 15 
>  gcc/testsuite/c-c++-common/musttail2.c | 34 ++
>  gcc/testsuite/c-c++-common/musttail3.c | 29 ++
>  gcc/testsuite/c-c++-common/musttail4.c | 17 +
>  gcc/testsuite/c-c++-common/musttail5.c | 25 +++
>  5 files changed, 120 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/musttail1.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail2.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail3.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail4.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail5.c
>
> diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
> b/gcc/testsuite/c-c++-common/musttail1.c
> new file mode 100644
> index ..ac92f9f74616
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
> +
> +int __attribute__((noinline,noclone,noipa))
> +callee (int i)
Hi Andi,
Sorry, I wasn't clear about this in the previous patch -- noipa
subsumes the other IPA attributes, so there's no need to have
noinline and noclone along with noipa.
int __attribute__((noipa)) callee (int i) should be sufficient for
disabling IPA optimizations involving callee.
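
For illustration, a minimal sketch of the simplified declaration, assuming a GCC version that supports the noipa attribute. The musttail marker is left out here so that the snippet does not depend on the patch under review:

```c
/* noipa alone is enough: it keeps callee from being inlined, cloned or
   otherwise analysed interprocedurally.  */
int __attribute__((noipa))
callee (int i)
{
  return i * i;
}

int __attribute__((noipa))
caller (int i)
{
  return callee (i + 1);
}
```

The behaviour of the functions is unchanged; only the redundant noinline and noclone attributes are dropped.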

Thanks,
Prathamesh

> +{
> +  return i * i;
> +}
> +
> +int __attribute__((noinline,noclone,noipa))
> +caller (int i)
> +{
> +  [[gnu::musttail]] return callee (i + 1);
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
> b/gcc/testsuite/c-c++-common/musttail2.c
> new file mode 100644
> index ..058329b69cc2
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail2.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +
> +struct box { char field[256]; int i; };
> +
> +int __attribute__((noinline,noclone,noipa))
> +test_2_callee (int i, struct box b)
> +{
> +  if (b.field[0])
> +return 5;
> +  return i * i;
> +}
> +
> +int __attribute__((noinline,noclone,noipa))
> +test_2_caller (int i)
> +{
> +  struct box b;
> +  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: " } */
> +}
> +
> +extern void setjmp (void);
> +void
> +test_3 (void)
> +{
> +  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
> +}
> +
> +typedef void (fn_ptr_t) (void);
> +volatile fn_ptr_t fn_ptr;
> +
> +void
> +test_5 (void)
> +{
> +  [[gnu::musttail]] return fn_ptr (); /* { dg-error "cannot tail-call: " } */
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
> b/gcc/testsuite/c-c++-common/musttail3.c
> new file mode 100644
> index ..ea9589c59ef2
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail3.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +
> +extern int foo2 (int x, ...);
> +
> +struct str
> +{
> +  int a, b;
> +};
> +
> +struct str
> +cstruct (int x)
> +{
> +  if (x < 10)
> +[[clang::musttail]] return cstruct (x + 1);
> +  return ((struct str){ x, 0 });
> +}
> +
> +int
> +foo (int x)
> +{
> +  if (x < 10)
> +[[clang::musttail]] return foo2 (x, 29);
> +  if (x < 100)
> +{
> +  int k = foo (x + 1);
> +  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } */
> +}
> +  return x;
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
> b/gcc/testsuite/c-c++-common/musttail4.c
> new file mode 100644
> index ..23f4b5e1cd68
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { tail_call && { c || c++11 } } } } */
> +
> +struct box { char field[64]; int i; };
> +
> +struct box __attribute__((noinline,noclone,noipa))
> +returns_struct (int i)
> +{
> +  struct box b;
> +  b.i = i * i;
> +  return b;
> +}
> +
> +int __attribute__((noinline,noclone))
> +test_1 (int i)
> +{
> +  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "cannot tail-call: " } */
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail5.c 
> b/gcc/testsuite/c-c++-common/musttail5.c
> new file mode 100644
> index ..71f4de40fc6d
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail5.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c23" { target c } } */
> +/* { dg-options "-std=gnu++11" { target c++ } } */
> +
> +[[musttail]] int j; /* { dg-warning "attribute" } */
> +__attribute__((musttail)) int k; /*

[aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-27 Thread Prathamesh Kulkarni
Hi,
The test passes the -mlittle-endian option but has no target check
for aarch64_little_endian, and thus fails to compile on
aarch64_be-linux-gnu. The patch adds the missing aarch64_little_endian
target check, which makes the test unsupported on that target.
OK to commit ?

Thanks,
Prathamesh
PR112950: Add aarch64_little_endian target check for dupq_5.c

gcc/testsuite/ChangeLog:
PR target/112950
* gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
aarch64_little_endian target check.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
index 6ae8d4c60b2..1990412d0e5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
+/* { dg-require-effective-target aarch64_little_endian } */
 
 #include <arm_sve.h>
 


Re: [aarch64] PR112950: gcc.target/aarch64/sve/acle/general/dupq_5.c fails on aarch64_be-linux-gnu

2024-01-29 Thread Prathamesh Kulkarni
On Sat, 27 Jan 2024 at 21:19, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > The test passes -mlittle-endian option but doesn't have target check
> > for aarch64_little_endian and thus fails to compile on
> > aarch64_be-linux-gnu. The patch adds the missing aarch64_little_endian
> > target check, which makes it unsupported on the target.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > PR112950: Add aarch64_little_endian target check for dupq_5.c
> >
> > gcc/testsuite/ChangeLog:
> >   PR target/112950
> >   * gcc.target/aarch64/sve/acle/general/dupq_5.c: Add
> >   aarch64_little_endian target check.
>
> If we add this requirement, then there's no need to pass -mlittle-endian
> in the dg-options.
>
> But dupq_6.c (the corresponding big-endian test) has:
>
>   /* To avoid needing big-endian header files.  */
>   #pragma GCC aarch64 "arm_sve.h"
>
> instead of:
>
>   #include <arm_sve.h>
>
> Could you do the same thing here?
That worked, thanks! And it also makes dupq_5.c pass on aarch64_be-linux-gnu.

Thanks,
Prathamesh

>
> Thanks,
> Richard
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> > index 6ae8d4c60b2..1990412d0e5 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
> > @@ -1,5 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mlittle-endian" } */
> > +/* { dg-require-effective-target aarch64_little_endian } */
> >
> >  #include <arm_sve.h>
> >
PR112950: Use #pragma GCC for including arm_sve.h. 

gcc/testsuite/ChangeLog:
PR target/112950
* gcc.target/aarch64/sve/acle/general/dupq_5.c: Remove include directive
and instead use #pragma GCC for including arm_sve.h.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c 
b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
index 6ae8d4c60b2..e88477b6379 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_5.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mlittle-endian" } */
 
-#include <arm_sve.h>
+#pragma GCC aarch64 "arm_sve.h"
 
 svint32_t
 dupq (int x1, int x2, int x3, int x4)


Re: [PATCH] aarch64: Fix ICE in poly-int.h due to SLP.

2024-01-30 Thread Prathamesh Kulkarni
On Tue, 30 Jan 2024 at 20:13, Richard Ball  wrote:
>
> Adds a check to ensure that the input vector arguments
> to a function are not variable length. Previously, only the
> output vector of a function was checked.
Hi,
Quoting from patch:
@@ -8989,6 +8989,14 @@ vectorizable_slp_permutation_1 (vec_info
*vinfo, gimple_stmt_iterator *gsi,
   instead of relying on the pattern described above.  */
   if (!nunits.is_constant (&npatterns))
  return -1;
+  FOR_EACH_VEC_ELT (children, i, child)
+ if (SLP_TREE_VECTYPE (child))
+   {
+ tree child_vectype = SLP_TREE_VECTYPE (child);
+ poly_uint64 child_nunits = TYPE_VECTOR_SUBPARTS (child_vectype);
+ if (!child_nunits.is_constant ())
+   return -1;
+   }

Just wondering if that'd be equivalent to checking:
if (!TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
  return -1;
instead of iterating over the children again, since the function
already bails out earlier if SLP_TREE_VECTYPE (child) and op_vectype
are not compatible types?
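
To make the shape of the quoted check concrete, here is a toy model in plain C. It is only a sketch under stated assumptions: toy_poly, toy_is_constant and toy_check_children are hypothetical stand-ins, not GCC's poly_int API, and the model ignores the quoted hunk's handling of children without a vector type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a two-coefficient poly_uint64:
   value = coeffs[0] + coeffs[1] * X, where X is only known at run time.  */
struct toy_poly
{
  uint64_t coeffs[2];
};

/* The number of lanes is a compile-time constant only if the
   runtime-dependent coefficient is zero.  */
bool
toy_is_constant (struct toy_poly p)
{
  return p.coeffs[1] == 0;
}

/* Same shape as the loop in the quoted hunk: return -1 (bail out) as
   soon as any child has a variable number of lanes, otherwise 0.  */
int
toy_check_children (const struct toy_poly *child_nunits, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (!toy_is_constant (child_nunits[i]))
      return -1;
  return 0;
}

/* Hypothetical usage check with one fixed-length child.  */
void
check_toy_children (void)
{
  struct toy_poly fixed = { { 4, 0 } };
  assert (toy_check_children (&fixed, 1) == 0);
}
```

Whether a single check on op_vectype gives the same answer depends on the earlier compatibility test in the real function, which this model does not capture.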

Also, could you please include the offending test-case in the patch ?

Thanks,
Prathamesh

>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vectorizable_slp_permutation_1):
> Add variable-length check for vector input arguments
> to a function.


Re: [PATCH v3 4/5] Add tests for C/C++ musttail attributes

2024-01-31 Thread Prathamesh Kulkarni
On Wed, 31 Jan 2024 at 07:49, Andi Kleen  wrote:
>
> Mostly adopted from the existing C musttail plugin tests.
> ---
>  gcc/testsuite/c-c++-common/musttail1.c  | 17 
>  gcc/testsuite/c-c++-common/musttail2.c  | 36 +
>  gcc/testsuite/c-c++-common/musttail3.c  | 31 +
>  gcc/testsuite/c-c++-common/musttail4.c  | 19 +
>  gcc/testsuite/gcc.dg/musttail-invalid.c | 17 
>  5 files changed, 120 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/musttail1.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail2.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail3.c
>  create mode 100644 gcc/testsuite/c-c++-common/musttail4.c
>  create mode 100644 gcc/testsuite/gcc.dg/musttail-invalid.c
>
> diff --git a/gcc/testsuite/c-c++-common/musttail1.c 
> b/gcc/testsuite/c-c++-common/musttail1.c
> new file mode 100644
> index ..476185e3ed4b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target tail_call } } */
> +/* { dg-options "-O2" } */
> +/* { dg-additional-options "-std=c++11" { target c++ } } */
> +/* { dg-additional-options "-std=c23" { target c } } */
> +/* { dg-additional-options "-fdelayed-branch" { target sparc*-*-* } } */
> +
> +int __attribute__((noinline,noclone))
Hi,
Sorry to nitpick -- Just wondering if it'd be slightly better to use
noipa attribute instead, assuming the intent is to disable IPA opts ?

Thanks,
Prathamesh


> +callee (int i)
> +{
> +  return i * i;
> +}
> +
> +int __attribute__((noinline,noclone))
> +caller (int i)
> +{
> +  [[gnu::musttail]] return callee (i + 1);
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail2.c 
> b/gcc/testsuite/c-c++-common/musttail2.c
> new file mode 100644
> index ..28f2f68ef13d
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail2.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile { target tail_call } } */
> +/* { dg-additional-options "-std=c++11" { target c++ } } */
> +/* { dg-additional-options "-std=c23" { target c } } */
> +
> +struct box { char field[256]; int i; };
> +
> +int __attribute__((noinline,noclone))
> +test_2_callee (int i, struct box b)
> +{
> +  if (b.field[0])
> +return 5;
> +  return i * i;
> +}
> +
> +int __attribute__((noinline,noclone))
> +test_2_caller (int i)
> +{
> +  struct box b;
> +  [[gnu::musttail]] return test_2_callee (i + 1, b); /* { dg-error "cannot tail-call: " } */
> +}
> +
> +extern void setjmp (void);
> +void
> +test_3 (void)
> +{
> +  [[gnu::musttail]] return setjmp (); /* { dg-error "cannot tail-call: " } */
> +}
> +
> +typedef void (fn_ptr_t) (void);
> +volatile fn_ptr_t fn_ptr;
> +
> +void
> +test_5 (void)
> +{
> +  [[gnu::musttail]] return fn_ptr (); /* { dg-error "cannot tail-call: " } */
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail3.c 
> b/gcc/testsuite/c-c++-common/musttail3.c
> new file mode 100644
> index ..fdbb292944ad
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail3.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile { target tail_call } } */
> +/* { dg-additional-options "-std=c++11" { target c++ } } */
> +/* { dg-additional-options "-std=c23" { target c } } */
> +
> +extern int foo2 (int x, ...);
> +
> +struct str
> +{
> +  int a, b;
> +};
> +
> +struct str
> +cstruct (int x)
> +{
> +  if (x < 10)
> +[[clang::musttail]] return cstruct (x + 1);
> +  return ((struct str){ x, 0 });
> +}
> +
> +int
> +foo (int x)
> +{
> +  if (x < 10)
> +[[clang::musttail]] return foo2 (x, 29);
> +  if (x < 100)
> +{
> +  int k = foo (x + 1);
> +  [[clang::musttail]] return k;/* { dg-error "cannot tail-call: " } */
> +}
> +  return x;
> +}
> diff --git a/gcc/testsuite/c-c++-common/musttail4.c 
> b/gcc/testsuite/c-c++-common/musttail4.c
> new file mode 100644
> index ..7bf44816f14a
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/musttail4.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target tail_call } } */
> +/* { dg-additional-options "-std=c++11" { target c++ } } */
> +/* { dg-additional-options "-std=c23" { target c } } */
> +
> +struct box { char field[64]; int i; };
> +
> +struct box __attribute__((noinline,noclone))
> +returns_struct (int i)
> +{
> +  struct box b;
> +  b.i = i * i;
> +  return b;
> +}
> +
> +int __attribute__((noinline,noclone))
> +test_1 (int i)
> +{
> +  [[gnu::musttail]] return returns_struct (i * 5).i; /* { dg-error "cannot tail-call: " } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/musttail-invalid.c 
> b/gcc/testsuite/gcc.dg/musttail-invalid.c
> new file mode 100644
> index ..c4725b4b8226
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/musttail-invalid.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c23" } */
> +
> +[[musttail]] int j; /* { dg-warning "attribute ignored" } */
> +__attribute__((musttail)) int k; /* { dg-warning "attribute directive ignored" } */
> +
> +void foo(void)
> +{
> +   [[mus

Re: [aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch

2023-12-19 Thread Prathamesh Kulkarni
On Mon, 4 Dec 2023 at 14:44, Prathamesh Kulkarni
 wrote:
>
> On Thu, 23 Nov 2023 at 17:06, Prathamesh Kulkarni
>  wrote:
> >
> > Hi Richard,
> > For the test-case mentioned in PR111702, compiling with -O2
> > -frounding-math -fstack-protector-all results in following ICE during
> > cse2 pass:
> >
> > test.c: In function 'foo':
> > test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
> >   119 | }
> >   | ^
> > 0xb7ebb0 insert_regs
> > ../../gcc/gcc/cse.cc:1120
> > 0x1f95134 merge_equiv_classes
> > ../../gcc/gcc/cse.cc:1764
> > 0x1f9b9ab cse_insn
> > ../../gcc/gcc/cse.cc:4793
> > 0x1f9fe30 cse_extended_basic_block
> > ../../gcc/gcc/cse.cc:6577
> > 0x1f9fe30 cse_main
> > ../../gcc/gcc/cse.cc:6722
> > 0x1fa0984 rest_of_handle_cse2
> > ../../gcc/gcc/cse.cc:7620
> > 0x1fa0984 execute
> > ../../gcc/gcc/cse.cc:7675
> >
> > This happens only with interleave+zip1 vector initialization with
> > -frounding-math -fstack-protector-all, while it compiles OK without
> > -fstack-protector-all. Also, it compiles OK with fallback sequence
> > code-gen (with or without -fstack-protector-all). Unfortunately, I
> > haven't been able to reduce the test-case further :/
> >
> > From the test-case, it seems only the vector initializer for type J
> > uses interleave+zip1 approach, while rest of the vector initializers
> > use fallback sequence.
> >
> > J is defined as:
> > typedef _Float16 __attribute__((__vector_size__ (16))) J;
> >
> > and the initializer is:
> > (J) { 11654, 4801, 5535, 9743, 61680}
> >
> > interleave+zip1 sequence for above initializer J:
> > mode = V8HF
> >
> > vals: (parallel:V8HF [
> > (reg:HF 642)
> > (reg:HF 645)
> > (reg:HF 648)
> > (reg:HF 651)
> > (reg:HF 654)
> > (const_double:HF 0.0 [0x0.0p+0]) repeated x3
> > ])
> >
> > target: (reg:V8HF 641)
> > seq:
> > (insn 1058 0 1059 (set (reg:V4HF 657)
> > (const_vector:V4HF [
> > (const_double:HF 0.0 [0x0.0p+0]) repeated x4
> > ])) "test.c":81:8 -1
> >  (nil))
> > (insn 1059 1058 1060 (set (reg:V4HF 657)
> > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
> > (reg:V4HF 657)
> > (const_int 1 [0x1]))) "test.c":81:8 -1
> >  (nil))
> > (insn 1060 1059 1061 (set (reg:V4HF 657)
> > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
> > (reg:V4HF 657)
> > (const_int 2 [0x2]))) "test.c":81:8 -1
> >  (nil))
> > (insn 1061 1060 1062 (set (reg:V4HF 657)
> > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
> > (reg:V4HF 657)
> > (const_int 4 [0x4]))) "test.c":81:8 -1
> >  (nil))
> > (insn 1062 1061 1063 (set (reg:V4HF 658)
> > (const_vector:V4HF [
> > (const_double:HF 0.0 [0x0.0p+0]) repeated x4
> > ])) "test.c":81:8 -1
> >  (nil))
> > (insn 1063 1062 1064 (set (reg:V4HF 658)
> > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
> > (reg:V4HF 658)
> > (const_int 1 [0x1]))) "test.c":81:8 -1
> >  (nil))
> > (insn 1064 1063 1065 (set (reg:V4HF 658)
> > (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
> > (reg:V4HF 658)
> > (const_int 2 [0x2]))) "test.c":81:8 -1
> >  (nil))
> > (insn 1065 1064 0 (set (reg:V8HF 641)
> > (unspec:V8HF [
> > (subreg:V8HF (reg:V4HF 657) 0)
> > (subreg:V8HF (reg:V4HF 658) 0)
> > ] UNSPEC_ZIP1)) "test.c":81:8 -1
> >  (nil))
> >
> > It seems to me that the above sequence correctly initializes the
> > vector into r641 ?
> > insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
> > insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
> > and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
> >
> > For the above test, it seems that with interleave+zip1 approach and
> > -fstack-protector-all,
> > in cse pass, there are two separate equivalence classes created for
> > (const_int 1), that need
> > to be merged in cse_insn:
> >
> >if (elt->first_same_value != src_eqv_elt->first_same_value)
> > {
> > 

Re: [PATCH] cse: Fix handling of fake vec_select sets [PR111702]

2023-12-26 Thread Prathamesh Kulkarni
On Thu, 21 Dec 2023 at 00:00, Richard Sandiford
 wrote:
>
> If cse sees:
>
>   (set (reg R) (const_vector [A B ...]))
>
> it creates fake sets of the form:
>
>   (set R[0] A)
>   (set R[1] B)
>   ...
>
> (with R[n] replaced by appropriate rtl) and then adds them to the tables
> in the same way as for normal sets.  This allows a sequence like:
>
>   (set (reg R2) A)
>   ...(reg R2)...
>
> to try to use R[0] instead of (reg R2).
>
> But the pass was taking the analogy too far, and was trying to simplify
> these fake sets based on costs.  That is, if there was an earlier:
>
>   (set (reg T) A)
>
> the pass would go to considerable effort trying to work out whether:
>
>   (set R[0] A)
>
> or:
>
>   (set R[0] (reg T))
>
> was more profitable.  This included running validate*_change on the sets,
> which has no meaning given that the sets are not part of the insn.
>
> In this example, the equivalence A == T is already known, and the
> purpose of the fake sets is to add A == T == R[0].  We can do that
> just as easily (or, as the PR shows, more easily) if we keep the
> original form of the fake set, with A instead of T.
>
> The problem in the PR occurred if we had:
>
> (1) something that establishes an equivalence between a vector V1 of
> M-bit scalar integers and a hard register H
>
> (2) something that establishes an equivalence between a vector V2 of
> N-bit scalar integers, where N < M and where V2 contains at least 2
> instances of V1[0]
>
> (1) established an equivalence between V1[0] and H in M bits.
> (2) then triggered a search for an equivalence of V1[0] in N bits.
> This included:
>
>   /* See if we have a CONST_INT that is already in a register in a
>  wider mode.  */
>
> which (correctly) found that the low N bits of H contain the right value.
> But because it came from a wider mode, this equivalence between N-bit H
> and N-bit V1[0] was not yet in the hash table.  It therefore survived
> the purge in:
>
>   /* At this point, ELT, if nonzero, points to a class of expressions
>  equivalent to the source of this SET and SRC, SRC_EQV, SRC_FOLDED,
>  and SRC_RELATED, if nonzero, each contain additional equivalent
>  expressions.  Prune these latter expressions by deleting expressions
>  already in the equivalence class.
>
> And since more than 1 set found the same N-bit equivalence between
> H and V1[0], the pass tried to add it more than once.
>
> Things were already wrong at this stage, but an ICE was only triggered
> later when trying to merge this N-bit equivalence with another one.
>
> We could avoid the double registration by adding:
>
>   for (elt = classp; elt; elt = elt->next_same_value)
> if (rtx_equal_p (elt->exp, x))
>   return elt;
>
> to insert_with_costs, or by making cse_insn check whether previous
> sets have recorded the same equivalence.  The latter seems more
> appealing from a compile-time perspective.  But in this case,
> doing that would be adding yet more spurious work to the handling
> of fake sets.
>
> The handling of fake sets therefore seems like the more fundamental bug.
>
> While there, the patch also makes sure that we don't apply REG_EQUAL
> notes to these fake sets.  They only describe the "real" (first) set.
Hi Richard,
Thanks for the detailed explanation and fix!
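
For readers following along, the "double registration" alternative mentioned in the quoted explanation (walking the class and returning an existing element instead of inserting a duplicate) can be sketched with a toy equivalence class. This is not what the patch does, since the patch instead stops optimizing fake sets; the names toy_elt, toy_insert and toy_class_size are hypothetical, and strings stand in for rtx expressions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy element of an equivalence class; EXP stands in for an rtx.  */
struct toy_elt
{
  const char *exp;
  struct toy_elt *next_same_value;
};

/* Add NODE to the class *CLASSP unless an equal expression is already
   present.  Return the element that represents NODE->exp.  */
struct toy_elt *
toy_insert (struct toy_elt **classp, struct toy_elt *node)
{
  for (struct toy_elt *elt = *classp; elt; elt = elt->next_same_value)
    if (strcmp (elt->exp, node->exp) == 0)
      return elt;   /* Already registered: do not add it twice.  */
  node->next_same_value = *classp;
  *classp = node;
  return node;
}

/* Count the elements in a class.  */
size_t
toy_class_size (const struct toy_elt *classp)
{
  size_t n = 0;
  for (; classp; classp = classp->next_same_value)
    n++;
  return n;
}

/* Hypothetical usage check: a second insert of an equal expression
   returns the first element and leaves the class unchanged.  */
void
check_toy_insert (void)
{
  struct toy_elt *cls = NULL;
  struct toy_elt first = { "(reg:QI 32 v0)", NULL };
  struct toy_elt again = { "(reg:QI 32 v0)", NULL };
  assert (toy_insert (&cls, &first) == &first);
  assert (toy_insert (&cls, &again) == &first);
  assert (toy_class_size (cls) == 1);
}
```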

Thanks,
Prathamesh
>
> gcc/
> PR rtl-optimization/111702
> * cse.cc (set::mode): Move earlier.
> (set::src_in_memory, set::src_volatile): Convert to bitfields.
> (set::is_fake_set): New member variable.
> (add_to_set): Add an is_fake_set parameter.
> (find_sets_in_insn): Update calls accordingly.
> (cse_insn): Do not apply REG_EQUAL notes to fake sets.  Do not
> try to optimize them either, or validate changes to them.
>
> gcc/
> PR rtl-optimization/111702
> * gcc.dg/rtl/aarch64/pr111702.c: New test.
> ---
>  gcc/cse.cc  | 38 +++---
>  gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c | 43 +
>  2 files changed, 67 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/rtl/aarch64/pr111702.c
>
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index f9603fdfd43..9fd51ca2832 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -4128,13 +4128,17 @@ struct set
>unsigned dest_hash;
>/* The SET_DEST, with SUBREG, etc., stripped.  */
>rtx inner_dest;
> +  /* Original machine mode, in case it becomes a CONST_INT.  */
> +  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
>/* Nonzero if the SET_SRC is in memory.  */
> -  char src_in_memory;
> +  unsigned int src_in_memory : 1;
>/* Nonzero if the SET_SRC contains something
>   whose value cannot be predicted and understood.  */
> -  char src_volatile;
> -  /* Original machine mode, in case it becomes a CONST_INT.  */
> -  ENUM_BITFIELD(machine_mode) mode : MACHINE_MODE_BITSIZE;
> +  unsigned int src_volatile : 1;
> +  /* Nonzero if RTL is an artifical set that has been c

Re: [aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch

2023-12-04 Thread Prathamesh Kulkarni
On Thu, 23 Nov 2023 at 17:06, Prathamesh Kulkarni
 wrote:
>
> Hi Richard,
> For the test-case mentioned in PR111702, compiling with -O2
> -frounding-math -fstack-protector-all results in following ICE during
> cse2 pass:
>
> test.c: In function 'foo':
> test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
>   119 | }
>   | ^
> 0xb7ebb0 insert_regs
> ../../gcc/gcc/cse.cc:1120
> 0x1f95134 merge_equiv_classes
> ../../gcc/gcc/cse.cc:1764
> 0x1f9b9ab cse_insn
> ../../gcc/gcc/cse.cc:4793
> 0x1f9fe30 cse_extended_basic_block
> ../../gcc/gcc/cse.cc:6577
> 0x1f9fe30 cse_main
> ../../gcc/gcc/cse.cc:6722
> 0x1fa0984 rest_of_handle_cse2
> ../../gcc/gcc/cse.cc:7620
> 0x1fa0984 execute
> ../../gcc/gcc/cse.cc:7675
>
> This happens only with interleave+zip1 vector initialization with
> -frounding-math -fstack-protector-all, while it compiles OK without
> -fstack-protector-all. Also, it compiles OK with fallback sequence
> code-gen (with or without -fstack-protector-all). Unfortunately, I
> haven't been able to reduce the test-case further :/
>
> From the test-case, it seems only the vector initializer for type J
> uses interleave+zip1 approach, while rest of the vector initializers
> use fallback sequence.
>
> J is defined as:
> typedef _Float16 __attribute__((__vector_size__ (16))) J;
>
> and the initializer is:
> (J) { 11654, 4801, 5535, 9743, 61680}
>
> interleave+zip1 sequence for above initializer J:
> mode = V8HF
>
> vals: (parallel:V8HF [
> (reg:HF 642)
> (reg:HF 645)
> (reg:HF 648)
> (reg:HF 651)
> (reg:HF 654)
> (const_double:HF 0.0 [0x0.0p+0]) repeated x3
> ])
>
> target: (reg:V8HF 641)
> seq:
> (insn 1058 0 1059 (set (reg:V4HF 657)
> (const_vector:V4HF [
> (const_double:HF 0.0 [0x0.0p+0]) repeated x4
> ])) "test.c":81:8 -1
>  (nil))
> (insn 1059 1058 1060 (set (reg:V4HF 657)
> (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
> (reg:V4HF 657)
> (const_int 1 [0x1]))) "test.c":81:8 -1
>  (nil))
> (insn 1060 1059 1061 (set (reg:V4HF 657)
> (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
> (reg:V4HF 657)
> (const_int 2 [0x2]))) "test.c":81:8 -1
>  (nil))
> (insn 1061 1060 1062 (set (reg:V4HF 657)
> (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
> (reg:V4HF 657)
> (const_int 4 [0x4]))) "test.c":81:8 -1
>  (nil))
> (insn 1062 1061 1063 (set (reg:V4HF 658)
> (const_vector:V4HF [
> (const_double:HF 0.0 [0x0.0p+0]) repeated x4
> ])) "test.c":81:8 -1
>  (nil))
> (insn 1063 1062 1064 (set (reg:V4HF 658)
> (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
> (reg:V4HF 658)
> (const_int 1 [0x1]))) "test.c":81:8 -1
>  (nil))
> (insn 1064 1063 1065 (set (reg:V4HF 658)
> (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
> (reg:V4HF 658)
> (const_int 2 [0x2]))) "test.c":81:8 -1
>  (nil))
> (insn 1065 1064 0 (set (reg:V8HF 641)
> (unspec:V8HF [
> (subreg:V8HF (reg:V4HF 657) 0)
> (subreg:V8HF (reg:V4HF 658) 0)
> ] UNSPEC_ZIP1)) "test.c":81:8 -1
>  (nil))
>
> It seems to me that the above sequence correctly initializes the
> vector into r641 ?
> insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
> insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
> and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
>
> For the above test, it seems that with interleave+zip1 approach and
> -fstack-protector-all,
> in cse pass, there are two separate equivalence classes created for
> (const_int 1), that need
> to be merged in cse_insn:
>
>if (elt->first_same_value != src_eqv_elt->first_same_value)
> {
>   /* The REG_EQUAL is indicating that two formerly distinct
>  classes are now equivalent.  So merge them.  */
>   merge_equiv_classes (elt, src_eqv_elt);
>
> elt equivalence chain:
> Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> (subreg:QI (reg:V16QI 671) 0)
> (const_int 1 [0x1])
>
> src_eqv_elt equivalence chain:
> Equivalence chain for (const_int 1 [0x1]):
> (reg:QI 34 v2)
> (reg:QI 32 v0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
> (vec_select:QI (reg:V16QI 671)
> (parallel [
> (const_int 1 [0x1])
> ]))
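
The lane ordering claimed in the quoted analysis (r657 and r658 built by vec_merge, then interleaved by zip1) can be checked with a small plain-C model. It is a sketch only: ints stand in for the HF values, and vec_merge4, zip1_low and build_interleaved are hypothetical helpers, not GCC or ACLE functions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of (vec_merge (vec_duplicate X) V MASK) on 4 lanes:
   lane i takes X when bit i of MASK is set, otherwise keeps V[i].  */
void
vec_merge4 (int v[4], int x, unsigned mask)
{
  for (size_t i = 0; i < 4; i++)
    if (mask & (1u << i))
      v[i] = x;
}

/* Toy model of ZIP1 on the low 4 lanes of each input:
   out = { a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3] }.  */
void
zip1_low (const int a[4], const int b[4], int out[8])
{
  for (size_t i = 0; i < 4; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Replay the quoted sequence for a 5-element initializer padded with
   zeros: even-indexed elements go to one half, odd-indexed to the other.  */
void
build_interleaved (const int elts[5], int out[8])
{
  int even[4] = { 0, 0, 0, 0 };   /* plays the role of r657 */
  int odd[4] = { 0, 0, 0, 0 };    /* plays the role of r658 */
  vec_merge4 (even, elts[0], 1);
  vec_merge4 (even, elts[2], 2);
  vec_merge4 (even, elts[4], 4);
  vec_merge4 (odd, elts[1], 1);
  vec_merge4 (odd, elts[3], 2);
  zip1_low (even, odd, out);
}

/* Hypothetical usage check: elements come back in source order.  */
void
check_build_interleaved (void)
{
  const int in[5] = { 1, 2, 3, 4, 5 };
  int out[8];
  build_interleaved (in, out);
  for (size_t i = 0; i < 8; i++)
    assert (out[i] == (i < 5 ? in[i] : 0));
}
```

In this model the result is the original elements in order followed by zeros, which agrees with the reading that the sequence itself initializes the vector correctly.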

Re: [PATCH v8] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-12-06 Thread Prathamesh Kulkarni
On Tue, 5 Dec 2023 at 06:18, Marek Polacek  wrote:
>
> On Mon, Dec 04, 2023 at 04:49:29PM -0500, Jason Merrill wrote:
> > On 12/4/23 15:23, Marek Polacek wrote:
> > > +/* FN is not a consteval function, but may become one.  Remember to
> > > +   escalate it after all pending templates have been instantiated.  */
> > > +
> > > +void
> > > +maybe_store_immediate_escalating_fn (tree fn)
> > > +{
> > > +  if (unchecked_immediate_escalating_function_p (fn))
> > > +remember_escalating_expr (fn);
> > > +}
> >
> > > +++ b/gcc/cp/decl.cc
> > > @@ -18441,7 +18441,10 @@ finish_function (bool inline_p)
> > >if (!processing_template_decl
> > >&& !DECL_IMMEDIATE_FUNCTION_P (fndecl)
> > >&& !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
> > > -cp_fold_function (fndecl);
> > > +{
> > > +  cp_fold_function (fndecl);
> > > +  maybe_store_immediate_escalating_fn (fndecl);
> > > +}
> >
> > I think maybe_store_, and the call to it from finish_function, are unneeded;
> > we will have already decided whether we need to remember the function during
> > the call to cp_fold_function.
>
> 'Tis true.
>
> > OK with that change.
>
> Here's what I pushed after another regtest.  Thanks!
Hi Marek,
It seems the patch caused the following regressions on aarch64:

Running g++:g++.dg/modules/modules.exp ...
FAIL: g++.dg/modules/xtreme-header-4_b.C -std=c++2b (internal compiler
error: tree check: expected class 'type', have 'declaration'
(template_decl) in get_originating_module_decl, at cp/module.cc:18659)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2b (internal compiler
error: tree check: expected class 'type', have 'declaration'
(template_decl) in get_originating_module_decl, at cp/module.cc:18659)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (internal compiler
error: tree check: expected class 'type', have 'declaration'
(template_decl) in get_originating_module_decl, at cp/module.cc:18659)

Log files: 
https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/1299/artifact/artifacts/00-sumfiles/

Thanks,
Prathamesh
>
> -- >8 --
> This patch implements P2564, described at , whereby
> certain functions are promoted to consteval.  For example:
>
>   consteval int id(int i) { return i; }
>
>   template <typename T>
>   constexpr int f(T t)
>   {
> return t + id(t); // id causes f to be promoted to consteval
>   }
>
>   void g(int i)
>   {
> f (3);
>   }
>
> now compiles.  Previously the code was ill-formed: we would complain
> that 't' in 'f' is not a constant expression.  Since 'f' is now
> consteval, it means that the call to id(t) is in an immediate context,
> so doesn't have to produce a constant -- this is how we allow consteval
> functions composition.  But making 'f' consteval also means that
> the call to 'f' in 'g' must yield a constant; failure to do so results
> in an error.  I made the effort to have cc1plus explain to us what's
> going on.  For example, calling f(i) produces this neat diagnostic:
>
> w.C:11:11: error: call to consteval function 'f(i)' is not a constant expression
>11 | f (i);
>   | ~~^~~
> w.C:11:11: error: 'i' is not a constant expression
> w.C:6:22: note: 'constexpr int f(T) [with T = int]' was promoted to an immediate function because its body contains an immediate-escalating expression 'id(t)'
> 6 | return t + id(t); // id causes f to be promoted to consteval
>   |~~^~~
>
> which hopefully makes it clear what's going on.
>
> Implementing this proposal has been tricky.  One problem was delayed
> instantiation: instantiating a function can set off a domino effect
> where one call promotes a function to consteval but that then means
> that another function should also be promoted, etc.
>
> In v1, I addressed the delayed instantiation problem by instantiating
> trees early, so that we can escalate functions right away.  That caused
> a number of problems, and in certain cases, like consteval-prop3.C, it
> can't work, because we need to wait till EOF to see the definition of
> the function anyway.  Overeager instantiation tends to cause diagnostic
> problems too.
>
> In v2, I attempted to move the escalation to the gimplifier, at which
> point all templates have been instantiated.  That attempt flopped,
> however, because once we've gimplified a function, its body is discarded
> and as a consequence, you can no longer evaluate a call to that function
> which is required for escalating, which needs to decide if a call is
> a constant expression or not.
>
> Therefore, we have to perform the escalation before gimplifying, but
> after instantiate_pending_templates.  That's not easy because we have
> no way to walk all the trees.  In the v2 patch, I use two vectors: one
> to store function decls that may become consteval, and another to
> remember references to immediate-escalating functions.  Unfortunately
> the latter must also stash functions that call immediate-escalating
> functions.  Consider:
>
>   

Re: [PATCH v3 10/11] c: Turn -Wincompatible-pointer-types into a permerror

2023-12-06 Thread Prathamesh Kulkarni
On Mon, 20 Nov 2023 at 15:28, Florian Weimer  wrote:
>
> The change to build_conditional_expr drops the downgrade
> from a pedwarn to warning for builtins for C99 and later
> language dialects.  It remains a warning in C89 mode (not
> a permerror), as the -std=gnu89 -fno-permissive test shows.
Hi Florian,
It seems this patch caused fallout in
gcc.dg/fixed-point/composite-type.c on arm, where the tests for
warnings now fail.
For instance:
FAIL: gcc.dg/fixed-point/composite-type.c  (test for warnings, line 71)
Excess errors:
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/fixed-point/composite-type.c:71:3:
error: passing argument 1 of 'f2_sf' from incompatible pointer type
[-Wincompatible-pointer-types]
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/gcc.dg/fixed-point/composite-type.c:71:3:
error: passing argument 1 of 'f2_sf' from incompatible pointer type
[-Wincompatible-pointer-types]
(snipped rest)

Should these warnings be now upgraded to dg-error ?
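
For context, a minimal sketch of the kind of call that -Wincompatible-pointer-types covers. The names store_one and demo_incompatible_pointer and the SHOW_DIAGNOSTIC macro are hypothetical; the mismatching call is guarded so that the snippet builds cleanly by default.

```c
#include <assert.h>

void
store_one (int *p)
{
  *p = 1;
}

int
demo_incompatible_pointer (void)
{
  int ok = 0;
#ifdef SHOW_DIAGNOSTIC
  /* 'long *' passed where 'int *' is expected.  With the change
     discussed in this thread this is diagnosed as an error by default
     rather than a warning, unless -fpermissive is used.  */
  long wrong = 0;
  store_one (&wrong);
#endif
  store_one (&ok);
  return ok;
}

/* Hypothetical usage check for the well-typed path.  */
void
check_demo_incompatible_pointer (void)
{
  assert (demo_incompatible_pointer () == 1);
}
```

Building with -DSHOW_DIAGNOSTIC should reproduce the diagnostic class seen in the log above, assuming a compiler that includes the permerror change.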

Thanks,
Prathamesh
>
> gcc/
>
> * doc/invoke.texi (Warning Options): Document changes.
>
> gcc/c/
>
> PR c/96284
> * c-typeck.cc (build_conditional_expr): Upgrade most pointer
> type mismatches to a permerror.
> (convert_for_assignment): Use permerror_opt and
> permerror_init for OPT_Wincompatible_pointer_types warnings.
>
> gcc/testsuite/
>
> * gcc.dg/permerror-default.c (incompatible_pointer_types):
> Expect new permerror.
> * gcc.dg/permerror-gnu89-nopermissive.c
> (incompatible_pointer_types): Likewise.
> * gcc.dg/permerror-pedantic.c (incompatible_pointer_types):
> Likewise.
> * gcc.dg/permerror-system.c: Likewise.
> * gcc.dg/Wincompatible-pointer-types-2.c: Compile with
> -fpermissive due to expected errors.
> * gcc.dg/Wincompatible-pointer-types-5.c: New test.  Copied
> from gcc.dg/Wincompatible-pointer-types-2.c.  Expect errors.
> * gcc.dg/anon-struct-11.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/anon-struct-11a.c: New test.  Copied from
> gcc.dg/anon-struct-11.c.  Expect errors.
> * gcc.dg/anon-struct-13.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/anon-struct-13a.c: New test.  Copied from
> gcc.dg/anon-struct-13.c.  Expect errors.
> * gcc.dg/builtin-arith-overflow-4.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/builtin-arith-overflow-4a.c: New test.  Copied from
> gcc.dg/builtin-arith-overflow-4.c.  Expect errors.
> * gcc.dg/c23-qual-4.c: Expect -Wincompatible-pointer-types errors.
> * gcc.dg/dfp/composite-type.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/dfp/composite-type-2.c: New test.  Copied from
> gcc.dg/dfp/composite-type.c.  Expect errors.
> * gcc.dg/diag-aka-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/diag-aka-1a.c: New test.  Copied from gcc.dg/diag-aka-1.c.
> Expect errors.
> * gcc.dg/enum-compat-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/enum-compat-2.c: New test.  Copied from
> gcc.dg/enum-compat-1.c.  Expect errors.
> * gcc.dg/func-ptr-conv-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/func-ptr-conv-2.c: New test.  Copied from
> gcc.dg/func-ptr-conv-1.c.  Expect errors.
> * gcc.dg/init-bad-7.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/init-bad-7a.c: New test.  Copied from gcc.dg/init-bad-7.c.
> Expect errors.
> * gcc.dg/noncompile/incomplete-3.c (foo): Expect
> -Wincompatible-pointer-types error.
> * gcc.dg/param-type-mismatch-2.c (test8): Likewise.
> * gcc.dg/pointer-array-atomic.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/pointer-array-atomic-2.c: New test.  Copied from
> gcc.dg/pointer-array-atomic.c.  Expect errors.
> * gcc.dg/pointer-array-quals-1.c (test): Expect
> -Wincompatible-pointer-types errors.
> * gcc.dg/transparent-union-1.c: Compile with -fpermissive
> due to expected errors.
> * gcc.dg/transparent-union-1a.c: New test.  Copied from
> gcc.dg/transparent-union-1.c.  Expect errors.
> * gcc.target/aarch64/acle/memtag_2a.c
> (test_memtag_warning_return_qualifier): Expect additional
> errors.
> * gcc.target/aarch64/sve/acle/general-c/load_2.c (f1): Likewise.
> * gcc.target/aarch64/sve/acle/general-c/load_ext_gather_offset_1.c
> (f1): Likewise.
> * gcc.target/aarch64/sve/acle/general-c/load_ext_gather_offset_2.c
> (f1): Likewise.
> * gcc.target/aarch64/sve/acle/general-c/load

Re: [PATCH 4/5] aarch64: rcpc3: add Neon ACLE wrapper functions to `arm_neon.h'

2023-12-07 Thread Prathamesh Kulkarni
On Thu, 9 Nov 2023 at 19:44, Victor Do Nascimento
 wrote:
>
> Create the necessary mappings from the ACLE-defined Neon intrinsics
> names[1] to the internal builtin function names.
>
> [1] https://arm-software.github.io/acle/neon_intrinsics/advsimd.html
Hi Victor,
It seems this patch broke the kernel build following the recent patch to
upgrade -Wincompatible-pointer-types to an error:

00:00:56 
/home/tcwg-buildslave/workspace/tcwg_kernel_1/abe/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-linux-gnu/14.0.0/include/arm_neon.h:
In function ‘vldap1_lane_s64’:
00:00:56 
/home/tcwg-buildslave/workspace/tcwg_kernel_1/abe/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-linux-gnu/14.0.0/include/arm_neon.h:13474:48:
error: passing argument 1 of ‘__builtin_aarch64_vec_ldap1_lanev1di’
from incompatible pointer type [-Wincompatible-pointer-types]
00:00:56 13474 |   return __builtin_aarch64_vec_ldap1_lanev1di (__src,
__vec, __lane);
00:00:56   |^
00:00:56   ||
00:00:56   |const
int64_t * {aka const long long int *}
00:00:56 
/home/tcwg-buildslave/workspace/tcwg_kernel_1/abe/builds/destdir/x86_64-pc-linux-gnu/lib/gcc/aarch64-linux-gnu/14.0.0/include/arm_neon.h:13474:48:
note: expected ‘const long int *’ but argument is of type ‘const
int64_t *’ {aka ‘const long long int *’}

From a cursory look at the code, should __src be cast to
(__builtin_aarch64_simd_di *) before being passed to
__builtin_aarch64_vec_ldap1_lanev1di, as the u64 variants in the patch
already do?
For more details, please see:
https://ci.linaro.org/job/tcwg_kernel--gnu-master-aarch64-next-defconfig-build/91/artifact/artifacts/notify/mail-body.txt/*view*/
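
For illustration, here is a minimal sketch of the kind of cast being
suggested. It assumes a target where int64_t and the element type in the
callee's prototype are distinct 64-bit integer types. The names hyp_di,
hyp_load_di and load_lane_s64 are hypothetical stand-ins, not the
arm_neon.h builtins:

```c
#include <assert.h>   /* used by the usage example below */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the element type in a builtin's prototype.
   On LP64 targets int64_t is usually 'long', so 'const int64_t *' and
   'const hyp_di *' are incompatible pointer types there.  */
typedef long long hyp_di;

/* Hypothetical stand-in for the builtin: loads one 64-bit element.
   memcpy keeps the access independent of the pointer's static type.  */
static int64_t
hyp_load_di (const hyp_di *src)
{
  int64_t val;
  memcpy (&val, src, sizeof val);
  return val;
}

/* Wrapper in the style of the intrinsics.  Without the explicit cast,
   a compiler that treats -Wincompatible-pointer-types as an error
   rejects the call on targets where the two pointer types differ.  */
static int64_t
load_lane_s64 (const int64_t *src)
{
  return hyp_load_di ((const hyp_di *) src);
}
```

With this, something like `int64_t x = 42; assert (load_lane_s64 (&x) == 42);`
is expected to hold regardless of whether int64_t is long or long long.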

Thanks,
Prathamesh


>
> gcc/ChangeLog:
>
> * gcc/config/aarch64/arm_neon.h (vldap1_lane_u64): New.
> (vldap1q_lane_u64): Likewise.
> (vldap1_lane_s64): Likewise.
> (vldap1q_lane_s64): Likewise.
> (vldap1_lane_f64): Likewise.
> (vldap1q_lane_f64): Likewise.
> (vldap1_lane_p64): Likewise.
> (vldap1q_lane_p64): Likewise.
> (vstl1_lane_u64): Likewise.
> (vstl1q_lane_u64): Likewise.
> (vstl1_lane_s64): Likewise.
> (vstl1q_lane_s64): Likewise.
> (vstl1_lane_f64): Likewise.
> (vstl1q_lane_f64): Likewise.
> (vstl1_lane_p64): Likewise.
> (vstl1q_lane_p64): Likewise.
> ---
>  gcc/config/aarch64/arm_neon.h | 129 ++
>  1 file changed, 129 insertions(+)
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 349f3167699..ef0d75e07ce 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -13446,6 +13446,135 @@ vld1q_lane_u64 (const uint64_t *__src, uint64x2_t 
> __vec, const int __lane)
>return __aarch64_vset_lane_any (*__src, __vec, __lane);
>  }
>
> +#pragma GCC push_options
> +#pragma GCC target ("+nothing+rcpc3+simd")
> +
> +/* vldap1_lane.  */
> +
> +__extension__ extern __inline uint64x1_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
> +{
> +  return __builtin_aarch64_vec_ldap1_lanev1di_usus (
> + (__builtin_aarch64_simd_di *) __src, __vec, __lane);
> +}
> +
> +__extension__ extern __inline uint64x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane)
> +{
> +  return __builtin_aarch64_vec_ldap1_lanev2di_usus (
> + (__builtin_aarch64_simd_di *) __src, __vec, __lane);
> +}
> +
> +__extension__ extern __inline int64x1_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane)
> +{
> +  return __builtin_aarch64_vec_ldap1_lanev1di (__src, __vec, __lane);
> +}
> +
> +__extension__ extern __inline int64x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane)
> +{
> +  return __builtin_aarch64_vec_ldap1_lanev2di (__src, __vec, __lane);
> +}
> +
> +__extension__ extern __inline float64x1_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane)
> +{
> +  return __builtin_aarch64_vec_ldap1_lanev1df (__src, __vec, __lane);
> +}
> +
> +__extension__ extern __inline float64x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int 
> __lane)
> +{
> +  return __builtin_aarch64_vec_ldap1_lanev2df (__src, __vec, __lane);
> +}
> +
> +__extension__ extern __inline poly64x1_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vldap1_lane_p64 (const poly64_t *__src, poly64x1_t __vec, c

Re: PR111754

2023-11-08 Thread Prathamesh Kulkarni
On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
 wrote:
>
> On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
>  wrote:
> >
> > Prathamesh Kulkarni  writes:
> > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > >  wrote:
> > >>
> > >> Hi,
> > >>
> > >> Sorry for the slow review.  I clearly didn't think this through properly
> > >> when doing the review of the original patch, so I wanted to spend
> > >> some time working on the code to get a better understanding of
> > >> the problem.
> > >>
> > >> Prathamesh Kulkarni  writes:
> > >> > Hi,
> > >> > For the following test-case:
> > >> >
> > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > >> > F foo (F a, F b)
> > >> > {
> > >> >   F v = (F) { 9 };
> > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > >> > }
> > >> >
> > >> > Compiling with -O2 results in following ICE:
> > >> > foo.c: In function ‘foo’:
> > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > >> >   |  ^~
> > >> > 0x7f3185 wi::int_traits
> > >> >>::decompose(long*, unsigned int, std::pair
> > >> > const&)
> > >> > ../../gcc/gcc/rtl.h:2314
> > >> > 0x7f3185 wide_int_ref_storage > >> > false>::wide_int_ref_storage
> > >> >>(std::pair const&)
> > >> > ../../gcc/gcc/wide-int.h:1089
> > >> > 0x7f3185 generic_wide_int
> > >> >>::generic_wide_int
> > >> >>(std::pair const&)
> > >> > ../../gcc/gcc/wide-int.h:847
> > >> > 0x7f3185 poly_int<1u, generic_wide_int > >> > false> > >::poly_int
> > >> >>(poly_int_full, std::pair const&)
> > >> > ../../gcc/gcc/poly-int.h:467
> > >> > 0x7f3185 poly_int<1u, generic_wide_int > >> > false> > >::poly_int
> > >> >>(std::pair const&)
> > >> > ../../gcc/gcc/poly-int.h:453
> > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > >> > ../../gcc/gcc/rtl.h:2383
> > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > >> > 0xfd4e1b vector_builder > >> > rtx_vector_builder>::elt(unsigned int) const
> > >> > ../../gcc/gcc/vector-builder.h:253
> > >> > 0xfd4d11 rtx_vector_builder::build()
> > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > >> > 0xc21d9c const_vector_from_tree
> > >> > ../../gcc/gcc/expr.cc:13487
> > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > >> > expand_modifier, rtx_def**, bool)
> > >> > ../../gcc/gcc/expr.cc:11059
> > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > >> > expand_modifier)
> > >> > ../../gcc/gcc/expr.h:310
> > >> > 0xaee682 expand_return
> > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > >> > 0xaee682 expand_gimple_stmt_1
> > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > >> > 0xaee682 expand_gimple_stmt
> > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > >> > 0xaf28f0 expand_gimple_basic_block
> > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > >> > 0xaf4996 execute
> > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > >> >
> > >> > IIUC, the issue is that fold_vec_perm returns a vector having float 
> > >> > element
> > >> > type with res_nelts_per_pattern == 3, and later ICE's when it tries
> > >> > to derive element v[3], not present in the encoding, while trying to
> > >> > build rtx vector
> > >> > in rtx_vector_builder::build():
> > >> >  for (unsigned int i = 0; i < nelts; ++i)
> > >> > RTVEC_ELT (v, i) = elt (i);
> > >> >
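
As an aside on the quoted analysis above, here is a simplified,
integer-only sketch of how an element beyond the encoded ones is derived
when nelts_per_pattern == 3. encoded_elt is a hypothetical helper written
for illustration, not GCC's vector_builder code; it only models the "last
encoded element plus a multiple of the step" rule that the report refers
to:

```c
#include <assert.h>   /* used by the usage example below */

/* Return element I of a vector encoded as NPATTERNS interleaved
   patterns with NELTS_PER_PATTERN encoded elements each.  ENCODED
   holds npatterns * nelts_per_pattern values.  */
static long
encoded_elt (const long *encoded, unsigned npatterns,
	     unsigned nelts_per_pattern, unsigned i)
{
  unsigned encoded_nelts = npatterns * nelts_per_pattern;

  /* Elements present in the encoding are returned directly.  */
  if (i < encoded_nelts)
    return encoded[i];

  /* Otherwise work from the last encoded element of I's pattern.  */
  unsigned pattern = i % npatterns;
  unsigned count = i / npatterns;
  unsigned final_i = encoded_nelts - npatterns + pattern;
  long final = encoded[final_i];

  /* With one or two elements per pattern the tail just repeats.  */
  if (nelts_per_pattern <= 2)
    return final;

  /* With three elements per pattern the tail is a linear series whose
     step is the difference of the last two encoded elements.  */
  long prev = encoded[final_i - npatterns];
  return final + (long) (count - 2) * (final - prev);
}
```

For example, with encoded = { 10, 20, 23 }, npatterns = 1 and
nelts_per_pattern = 3, element 3 comes out as 23 + 3 = 26. The subtraction
that yields the step only makes sense for integer elements; the quoted
report says the ICE occurs at this derivation for a float element type.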
> > >> > The attached patch tries to fix this by returning false from
> > >> > valid_mask_for_fold_vec_perm_cst if 

Re: [PATCH v3] libiberty: Use posix_spawn in pex-unix when available.

2023-11-10 Thread Prathamesh Kulkarni
On Thu, 5 Oct 2023 at 00:00, Brendan Shanks  wrote:
>
> Hi,
>
> This patch implements pex_unix_exec_child using posix_spawn when
> available.
>
> This should especially benefit recent macOS (where vfork just calls
> fork), but should have equivalent or faster performance on all
> platforms.
> In addition, the implementation is substantially simpler than the
> vfork+exec code path.
>
> Tested on x86_64-linux.
Hi Brendan,
It seems this patch caused the following regressions on aarch64:

FAIL: g++.dg/modules/bad-mapper-1.C -std=c++17  at line 3 (test for
errors, line )
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++17 (test for excess errors)
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2a  at line 3 (test for
errors, line )
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2a (test for excess errors)
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2b  at line 3 (test for
errors, line )
FAIL: g++.dg/modules/bad-mapper-1.C -std=c++2b (test for excess errors)

Looking at g++.log:
/home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/modules/bad-mapper-1.C:
error: failed posix_spawnp mapper 'this-will-not-work'
In module imported at
/home/tcwg-buildslave/workspace/tcwg_gnu_2/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/modules/bad-mapper-1.C:2:1:
unique1.bob: error: failed to read compiled module: No such file or directory
unique1.bob: note: compiled module file is 'gcm.cache/unique1.bob.gcm'
unique1.bob: note: imports must be built before being imported
unique1.bob: fatal error: returning to the gate for a mechanical issue
compilation terminated.
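
For reference, here is a minimal sketch of how a missing program surfaces
through posix_spawnp, assuming a POSIX system. run_program is a
hypothetical helper for illustration, not the libiberty implementation:

```c
#include <assert.h>   /* used by the usage example below */
#include <stddef.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Spawn PROG (searched for in PATH) with ARGV and wait for it.
   Returns the child's exit status, or -1 if spawning or waiting
   failed or the child did not exit normally.  */
static int
run_program (const char *prog, char *const argv[])
{
  pid_t pid;
  int status;

  /* posix_spawnp returns an error number directly rather than
     setting errno.  */
  int err = posix_spawnp (&pid, prog, NULL, NULL, argv, environ);
  if (err != 0)
    return -1;

  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Depending on the C library, a nonexistent program may be reported either
as an error return from posix_spawnp itself or as a child that exits with
a nonzero status, so a caller that prints diagnostics has to handle both
paths.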

Link to log files:
https://ci.linaro.org/job/tcwg_gcc_check--master-aarch64-build/1159/artifact/artifacts/00-sumfiles/
Could you please investigate?

Thanks,
Prathamesh
>
> v2: Fix error handling (previously the function would be run twice in
> case of error), and don't use a macro that changes control flow.
>
> v3: Match file style for error-handling blocks, don't close
> in/out/errdes on error, and check close() for errors.
>
> libiberty/
> * configure.ac (AC_CHECK_HEADERS): Add spawn.h.
> (checkfuncs): Add posix_spawn, posix_spawnp.
> (AC_CHECK_FUNCS): Add posix_spawn, posix_spawnp.
> * configure, config.in: Rebuild.
> * pex-unix.c [HAVE_POSIX_SPAWN] (pex_unix_exec_child): New function.
>
> Signed-off-by: Brendan Shanks 
> ---
>  libiberty/configure.ac |   8 +-
>  libiberty/pex-unix.c   | 168 +
>  2 files changed, 173 insertions(+), 3 deletions(-)
>
> diff --git a/libiberty/configure.ac b/libiberty/configure.ac
> index 0748c592704..2488b031bc8 100644
> --- a/libiberty/configure.ac
> +++ b/libiberty/configure.ac
> @@ -289,7 +289,7 @@ AC_SUBST_FILE(host_makefile_frag)
>  # It's OK to check for header files.  Although the compiler may not be
>  # able to link anything, it had better be able to at least compile
>  # something.
> -AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
> unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
> fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
> sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
> sys/prctl.h)
> +AC_CHECK_HEADERS(sys/file.h sys/param.h limits.h stdlib.h malloc.h string.h 
> unistd.h strings.h sys/time.h time.h sys/resource.h sys/stat.h sys/mman.h 
> fcntl.h alloca.h sys/pstat.h sys/sysmp.h sys/sysinfo.h machine/hal_sysinfo.h 
> sys/table.h sys/sysctl.h sys/systemcfg.h stdint.h stdio_ext.h process.h 
> sys/prctl.h spawn.h)
>  AC_HEADER_SYS_WAIT
>  AC_HEADER_TIME
>
> @@ -412,7 +412,8 @@ funcs="$funcs setproctitle"
>  vars="sys_errlist sys_nerr sys_siglist"
>
>  checkfuncs="__fsetlocking canonicalize_file_name dup3 getrlimit getrusage \
> - getsysinfo gettimeofday on_exit pipe2 psignal pstat_getdynamic 
> pstat_getstatic \
> + getsysinfo gettimeofday on_exit pipe2 posix_spawn posix_spawnp psignal \
> + pstat_getdynamic pstat_getstatic \
>   realpath setrlimit spawnve spawnvpe strerror strsignal sysconf sysctl \
>   sysmp table times wait3 wait4"
>
> @@ -435,7 +436,8 @@ if test "x" = "y"; then
>  index insque \
>  memchr memcmp memcpy memmem memmove memset mkstemps \
>  on_exit \
> -pipe2 psignal pstat_getdynamic pstat_getstatic putenv \
> +pipe2 posix_spawn posix_spawnp psignal \
> +pstat_getdynamic pstat_getstatic putenv \
>  random realpath rename rindex \
>  sbrk setenv setproctitle setrlimit sigsetmask snprintf spawnve spawnvpe \
>   stpcpy stpncpy strcasecmp strchr strdup \
> diff --git a/libiberty/pex-unix.c b/libiberty/pex-unix.c
> index 33b5bce31c2..336799d1125 100644
> --- a/libiberty/pex-unix.c
> +++ b/libiberty/pex-unix.c
> @@ -58,6 +58,9 @@ extern int errno;
>  #ifdef HAVE_PROCESS_H
>  #include <process.h>
>  #endif
> +#ifdef HAVE_SPAWN_H
> +#include <spawn.h>
> +#endif
>
>  #ifdef vfork /* Autoconf may define this to fork for us. */
>  # define VFORK_STRING "fork"
> @@ -559,6 +562

Re: [PATCH v3 2/2]middle-end match.pd: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]

2023-11-10 Thread Prathamesh Kulkarni
On Mon, 6 Nov 2023 at 15:50, Tamar Christina  wrote:
>
> Hi All,
>
> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> canonical and allows a target to expand this sequence efficiently.  Such
> sequences are common in scientific code working with gradients.
>
> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
> which I remove since this is a less efficient form.  The testsuite is also
> updated in light of this.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Hi Tamar,
It seems the patch caused the following regressions on arm:

Running gcc:gcc.dg/dg.exp ...
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/pr55152-2.c scan-tree-dump-times optimized "ABS_EXPR" 1

Running gcc:gcc.dg/tree-ssa/tree-ssa.exp ...
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= -" 1
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= .COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/abs-4.c scan-tree-dump-times optimized "= ABS_EXPR" 1
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = -" 4
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = ABS_EXPR <" 1
FAIL: gcc.dg/tree-ssa/backprop-6.c scan-tree-dump-times backprop
"Deleting[^\\n]* = \\.COPYSIGN" 2
FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized ".COPYSIGN" 1
FAIL: gcc.dg/tree-ssa/copy-sign-2.c scan-tree-dump-times optimized "ABS" 1
FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple ".COPYSIGN" 4
FAIL: gcc.dg/tree-ssa/mult-abs-2.c scan-tree-dump-times gimple "ABS" 4
FAIL: gcc.dg/tree-ssa/phi-opt-24.c scan-tree-dump-not phiopt2 "if"
Link to log files:
https://ci.linaro.org/job/tcwg_gcc_check--master-arm-build/1240/artifact/artifacts/00-sumfiles/

Even for the following test-case:
double g (double a)
{
  double t1 = fabs (a);
  double t2 = -t1;
  return t2;
}

It seems the pattern is applied, but the result is never
simplified to copysign (a, -1).
The forwprop dump shows:
Applying pattern match.pd:1131, gimple-match-4.cc:4134
double g (double a)
{
  double t2;
  double t1;

   :
  t1_2 = ABS_EXPR <a_1(D)>;
  t2_3 = -t1_2;
  return t2_3;

}

while on x86_64:
Applying pattern match.pd:1131, gimple-match-4.cc:4134
gimple_simplified to t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
Removing dead stmt:t1_2 = ABS_EXPR <a_1(D)>;
double g (double a)
{
  double t2;
  double t1;

   :
  t2_3 = .COPYSIGN (a_1(D), -1.0e+0);
  return t2_3;

}
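
As a sanity check of the identity the match.pd rule relies on, here is a
small bit-level sketch, assuming IEEE-754 binary64 doubles. neg_abs and
copysign_minus_one are hypothetical helpers that model fneg (fabs (x))
and copysign (x, -1) without calling libm:

```c
#include <assert.h>   /* used by the usage example below */
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C (1) << 63)

/* Reinterpret a double as its 64-bit representation and back.  */
static uint64_t
double_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits;
}

static double
bits_double (uint64_t bits)
{
  double x;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Model of fneg (fabs (x)): clear the sign bit, then negate.  */
static double
neg_abs (double x)
{
  double abs_x = bits_double (double_bits (x) & ~SIGN_BIT);
  return -abs_x;
}

/* Model of copysign (x, -1): force the sign bit on.  */
static double
copysign_minus_one (double x)
{
  return bits_double (double_bits (x) | SIGN_BIT);
}
```

Under that assumption both helpers produce the same bit pattern for any
input, including signed zeros, which is why the two forms are
interchangeable.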

Thanks,
Prathamesh


>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/109154
> * match.pd: Add new neg+abs rule, remove inverse copysign rule.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/109154
> * gcc.dg/fold-copysign-1.c: Updated.
> * gcc.dg/pr55152-2.c: Updated.
> * gcc.dg/tree-ssa/abs-4.c: Updated.
> * gcc.dg/tree-ssa/backprop-6.c: Updated.
> * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
> * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
> * gcc.target/aarch64/fneg-abs_1.c: New test.
> * gcc.target/aarch64/fneg-abs_2.c: New test.
> * gcc.target/aarch64/fneg-abs_3.c: New test.
> * gcc.target/aarch64/fneg-abs_4.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
> * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> db95931df0672cf4ef08cca36085c3aa6831519e..7a023d510c283c43a87b1795a74761b8af979b53
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1106,13 +1106,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (hypots @0 (copysigns @1 @2))
> (hypots @0 @1
>
> -/* copysign(x, CST) -> [-]abs (x).  */
> -(for copysigns (COPYSIGN_ALL)
> - (simplify
> -  (copysigns @0 REAL_CST@1)
> -  (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1)))
> -   (negate (abs @0))
> -   (abs @0
> +/* Transform fneg (fabs (X)) -> copysign (X, -1).  */
> +
> +(simplify
> + (negate (abs @0))
> + (IFN_COPYSIGN @0 { build_minus_one_cst (type); }))
>
>  /* copysign(copysign(x, y), z) -> copysign(x, z).  */
>  (for copysigns (COPYSIGN_ALL)
> diff --git a/gcc/testsuite/gcc.dg/fold-copysign-1.c 
> b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> index 
> f17d65c24ee4dca9867827d040fe0a404c515e7b..f9cafd14ab05f5e8ab2f6f68e62801d21c2df6a6
>  100644
> --- a/gcc/testsuite/gcc.dg/fold-copysign-1.c
> +++ b/gcc/testsuite/gcc.dg/fold-copysign-1.c
> @@ -12,5 +12,5 @@ double bar (double x)
>return __builtin_copysign (x, minuszero);
>  }
>
> -/* { dg-final { scan-tree-dump-times "= -" 1 "cddce1" } } */
> -/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 2 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "__builtin_copysign" 1 "cddce1" } } */
> +/* { dg-final { scan-tree-dump-times "= ABS_EXPR" 1 "cddce1" } } */
> diff --git a/gcc/testsuite/gcc

Re: PR111754

2023-11-15 Thread Prathamesh Kulkarni
On Wed, 8 Nov 2023 at 21:57, Prathamesh Kulkarni
 wrote:
>
> On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> >  wrote:
> > >
> > > Prathamesh Kulkarni  writes:
> > > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > > >  wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> Sorry for the slow review.  I clearly didn't think this through properly
> > > >> when doing the review of the original patch, so I wanted to spend
> > > >> some time working on the code to get a better understanding of
> > > >> the problem.
> > > >>
> > > >> Prathamesh Kulkarni  writes:
> > > >> > Hi,
> > > >> > For the following test-case:
> > > >> >
> > > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > > >> > F foo (F a, F b)
> > > >> > {
> > > >> >   F v = (F) { 9 };
> > > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > >> > }
> > > >> >
> > > >> > Compiling with -O2 results in following ICE:
> > > >> > foo.c: In function ‘foo’:
> > > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > >> >   |  ^~
> > > >> > 0x7f3185 wi::int_traits
> > > >> >>::decompose(long*, unsigned int, std::pair
> > > >> > const&)
> > > >> > ../../gcc/gcc/rtl.h:2314
> > > >> > 0x7f3185 wide_int_ref_storage > > >> > false>::wide_int_ref_storage
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/wide-int.h:1089
> > > >> > 0x7f3185 generic_wide_int
> > > >> >>::generic_wide_int
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/wide-int.h:847
> > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > >> > false> > >::poly_int
> > > >> >>(poly_int_full, std::pair const&)
> > > >> > ../../gcc/gcc/poly-int.h:467
> > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > >> > false> > >::poly_int
> > > >> >>(std::pair const&)
> > > >> > ../../gcc/gcc/poly-int.h:453
> > > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > > >> > ../../gcc/gcc/rtl.h:2383
> > > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > > >> > 0xfd4e1b vector_builder > > >> > rtx_vector_builder>::elt(unsigned int) const
> > > >> > ../../gcc/gcc/vector-builder.h:253
> > > >> > 0xfd4d11 rtx_vector_builder::build()
> > > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > > >> > 0xc21d9c const_vector_from_tree
> > > >> > ../../gcc/gcc/expr.cc:13487
> > > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > > >> > expand_modifier, rtx_def**, bool)
> > > >> > ../../gcc/gcc/expr.cc:11059
> > > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > > >> > expand_modifier)
> > > >> > ../../gcc/gcc/expr.h:310
> > > >> > 0xaee682 expand_return
> > > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > > >> > 0xaee682 expand_gimple_stmt_1
> > > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > > >> > 0xaee682 expand_gimple_stmt
> > > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > > >> > 0xaf28f0 expand_gimple_basic_block
> > > >> > ../../gcc/gcc/cfgexpand.cc:6100
> > > >> > 0xaf4996 execute
> > > >> > ../../gcc/gcc/cfgexpand.cc:6835
> > > >> >
> > > >> > IIUC, the issue is that fold_vec_perm returns a vector having float 
> > > >> > element
> > > >> > type with res_nelts_per_pattern == 3, and later ICE'

[aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch

2023-11-23 Thread Prathamesh Kulkarni
Hi Richard,
For the test-case mentioned in PR111702, compiling with -O2
-frounding-math -fstack-protector-all results in the following ICE during
the cse2 pass:

test.c: In function 'foo':
test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
  119 | }
  | ^
0xb7ebb0 insert_regs
../../gcc/gcc/cse.cc:1120
0x1f95134 merge_equiv_classes
../../gcc/gcc/cse.cc:1764
0x1f9b9ab cse_insn
../../gcc/gcc/cse.cc:4793
0x1f9fe30 cse_extended_basic_block
../../gcc/gcc/cse.cc:6577
0x1f9fe30 cse_main
../../gcc/gcc/cse.cc:6722
0x1fa0984 rest_of_handle_cse2
../../gcc/gcc/cse.cc:7620
0x1fa0984 execute
../../gcc/gcc/cse.cc:7675

This happens only with interleave+zip1 vector initialization combined with
-frounding-math -fstack-protector-all; it compiles OK without
-fstack-protector-all. It also compiles OK with the fallback sequence
code-gen (with or without -fstack-protector-all). Unfortunately, I
haven't been able to reduce the test-case further :/

From the test-case, it seems only the vector initializer for type J
uses the interleave+zip1 approach, while the rest of the vector
initializers use the fallback sequence.

J is defined as:
typedef _Float16 __attribute__((__vector_size__ (16))) J;

and the initializer is:
(J) { 11654, 4801, 5535, 9743, 61680}

The interleave+zip1 sequence for the above initializer of J:
mode = V8HF

vals: (parallel:V8HF [
(reg:HF 642)
(reg:HF 645)
(reg:HF 648)
(reg:HF 651)
(reg:HF 654)
(const_double:HF 0.0 [0x0.0p+0]) repeated x3
])

target: (reg:V8HF 641)
seq:
(insn 1058 0 1059 (set (reg:V4HF 657)
(const_vector:V4HF [
(const_double:HF 0.0 [0x0.0p+0]) repeated x4
])) "test.c":81:8 -1
 (nil))
(insn 1059 1058 1060 (set (reg:V4HF 657)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
(reg:V4HF 657)
(const_int 1 [0x1]))) "test.c":81:8 -1
 (nil))
(insn 1060 1059 1061 (set (reg:V4HF 657)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
(reg:V4HF 657)
(const_int 2 [0x2]))) "test.c":81:8 -1
 (nil))
(insn 1061 1060 1062 (set (reg:V4HF 657)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
(reg:V4HF 657)
(const_int 4 [0x4]))) "test.c":81:8 -1
 (nil))
(insn 1062 1061 1063 (set (reg:V4HF 658)
(const_vector:V4HF [
(const_double:HF 0.0 [0x0.0p+0]) repeated x4
])) "test.c":81:8 -1
 (nil))
(insn 1063 1062 1064 (set (reg:V4HF 658)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
(reg:V4HF 658)
(const_int 1 [0x1]))) "test.c":81:8 -1
 (nil))
(insn 1064 1063 1065 (set (reg:V4HF 658)
(vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
(reg:V4HF 658)
(const_int 2 [0x2]))) "test.c":81:8 -1
 (nil))
(insn 1065 1064 0 (set (reg:V8HF 641)
(unspec:V8HF [
(subreg:V8HF (reg:V4HF 657) 0)
(subreg:V8HF (reg:V4HF 658) 0)
] UNSPEC_ZIP1)) "test.c":81:8 -1
 (nil))

It seems to me that the above sequence correctly initializes the
vector into r641:
insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
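
To make the reasoning above concrete, here is a small scalar model of the
interleave+zip1 approach, with plain ints standing in for the HF
registers. zip1 and init_by_interleave are hypothetical helpers for
illustration, not the aarch64 backend code:

```c
#include <assert.h>   /* used by the usage example below */
#include <stddef.h>

#define NELTS 8

/* Model of zip1 on the low halves: interleave NELTS / 2 elements of
   EVEN and ODD into OUT.  */
static void
zip1 (const int *even, const int *odd, int *out)
{
  for (size_t i = 0; i < NELTS / 2; i++)
    {
      out[2 * i] = even[i];
      out[2 * i + 1] = odd[i];
    }
}

/* Initialize OUT from VALS by first building two half-width vectors
   from the even- and odd-indexed elements and then zipping them.  */
static void
init_by_interleave (const int *vals, int *out)
{
  int even[NELTS / 2], odd[NELTS / 2];

  for (size_t i = 0; i < NELTS / 2; i++)
    {
      even[i] = vals[2 * i];
      odd[i] = vals[2 * i + 1];
    }
  zip1 (even, odd, out);
}
```

Feeding it { 642, 645, 648, 651, 654, 0, 0, 0 }, with register numbers
used as placeholder values, gives the even half { 642, 648, 654, 0 } and
the odd half { 645, 651, 0, 0 }, and the zipped result matches the
original order.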

For the above test, it seems that with the interleave+zip1 approach and
-fstack-protector-all, the cse pass creates two separate equivalence
classes for (const_int 1), which need to be merged in cse_insn:

   if (elt->first_same_value != src_eqv_elt->first_same_value)
{
  /* The REG_EQUAL is indicating that two formerly distinct
 classes are now equivalent.  So merge them.  */
  merge_equiv_classes (elt, src_eqv_elt);

elt equivalence chain:
Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
(subreg:QI (reg:V16QI 671) 0)
(const_int 1 [0x1])

src_eqv_elt equivalence chain:
Equivalence chain for (const_int 1 [0x1]):
(reg:QI 34 v2)
(reg:QI 32 v0)
(reg:QI 34 v2)
(const_int 1 [0x1])
(vec_select:QI (reg:V16QI 671)
(parallel [
(const_int 1 [0x1])
]))
(vec_select:QI (reg:V16QI 32 v0)
(parallel [
(const_int 1 [0x1])
]))
(vec_select:QI (reg:V16QI 33 v1)
(parallel [
(const_int 2 [0x2])
]))
(vec_select:QI (reg:V16QI 33 v1)
(parallel [
(const_int 1 [0x1])
]))

The issue is that merge_equiv_classes doesn't seem to deal correctly with
multiple occurrences of the same register in class2 (src_eqv_elt), which
has two occurrences of (reg:QI 34 v2).

In merge_equiv_classes, on the first iteration, it will remove (reg:QI 34)
from reg_eqv_table by calling delete_reg_equiv (34), and in insert_regs it
will create an entry for (reg:QI 34) in qty_table with a new quantity
number, and create a new equivalence in reg_eqv_table.

When we again come across (reg:QI 34) in class2, it will
unconditional

Re: PR111754

2023-11-23 Thread Prathamesh Kulkarni
On Wed, 15 Nov 2023 at 20:44, Prathamesh Kulkarni
 wrote:
>
> On Wed, 8 Nov 2023 at 21:57, Prathamesh Kulkarni
>  wrote:
> >
> > On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> > >  wrote:
> > > >
> > > > Prathamesh Kulkarni  writes:
> > > > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> > > > >  wrote:
> > > > >>
> > > > >> Hi,
> > > > >>
> > > > >> Sorry for the slow review.  I clearly didn't think this through properly
> > > > >> when doing the review of the original patch, so I wanted to spend
> > > > >> some time working on the code to get a better understanding of
> > > > >> the problem.
> > > > >>
> > > > >> Prathamesh Kulkarni  writes:
> > > > >> > Hi,
> > > > >> > For the following test-case:
> > > > >> >
> > > > >> > typedef float __attribute__((__vector_size__ (16))) F;
> > > > >> > F foo (F a, F b)
> > > > >> > {
> > > > >> >   F v = (F) { 9 };
> > > > >> >   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > > >> > }
> > > > >> >
> > > > >> > Compiling with -O2 results in following ICE:
> > > > >> > foo.c: In function ‘foo’:
> > > > >> > foo.c:6:10: internal compiler error: in decompose, at rtl.h:2314
> > > > >> > 6 |   return __builtin_shufflevector (v, v, 1, 0, 1, 2);
> > > > >> >   |  ^~
> > > > >> > 0x7f3185 wi::int_traits
> > > > >> >>::decompose(long*, unsigned int, std::pair
> > > > >> > const&)
> > > > >> > ../../gcc/gcc/rtl.h:2314
> > > > >> > 0x7f3185 wide_int_ref_storage > > > >> > false>::wide_int_ref_storage
> > > > >> >>(std::pair const&)
> > > > >> > ../../gcc/gcc/wide-int.h:1089
> > > > >> > 0x7f3185 generic_wide_int
> > > > >> >>::generic_wide_int
> > > > >> >>(std::pair const&)
> > > > >> > ../../gcc/gcc/wide-int.h:847
> > > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > > >> > false> > >::poly_int
> > > > >> >>(poly_int_full, std::pair const&)
> > > > >> > ../../gcc/gcc/poly-int.h:467
> > > > >> > 0x7f3185 poly_int<1u, generic_wide_int > > > >> > false> > >::poly_int
> > > > >> >>(std::pair const&)
> > > > >> > ../../gcc/gcc/poly-int.h:453
> > > > >> > 0x7f3185 wi::to_poly_wide(rtx_def const*, machine_mode)
> > > > >> > ../../gcc/gcc/rtl.h:2383
> > > > >> > 0x7f3185 rtx_vector_builder::step(rtx_def*, rtx_def*) const
> > > > >> > ../../gcc/gcc/rtx-vector-builder.h:122
> > > > >> > 0xfd4e1b vector_builder > > > >> > rtx_vector_builder>::elt(unsigned int) const
> > > > >> > ../../gcc/gcc/vector-builder.h:253
> > > > >> > 0xfd4d11 rtx_vector_builder::build()
> > > > >> > ../../gcc/gcc/rtx-vector-builder.cc:73
> > > > >> > 0xc21d9c const_vector_from_tree
> > > > >> > ../../gcc/gcc/expr.cc:13487
> > > > >> > 0xc21d9c expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> > > > >> > expand_modifier, rtx_def**, bool)
> > > > >> > ../../gcc/gcc/expr.cc:11059
> > > > >> > 0xaee682 expand_expr(tree_node*, rtx_def*, machine_mode, 
> > > > >> > expand_modifier)
> > > > >> > ../../gcc/gcc/expr.h:310
> > > > >> > 0xaee682 expand_return
> > > > >> > ../../gcc/gcc/cfgexpand.cc:3809
> > > > >> > 0xaee682 expand_gimple_stmt_1
> > > > >> > ../../gcc/gcc/cfgexpand.cc:3918
> > > > >> > 0xaee682 expand_gimple_stmt
> > > > >> > ../../gcc/gcc/cfgexpand.cc:4044
> > > > >> > 0xaf28f0 exp

Re: PR111754

2023-11-27 Thread Prathamesh Kulkarni
On Fri, 24 Nov 2023 at 03:13, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 26 Oct 2023 at 09:43, Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Thu, 26 Oct 2023 at 04:09, Richard Sandiford
> >>  wrote:
> >> >
> >> > Prathamesh Kulkarni  writes:
> >> > > On Wed, 25 Oct 2023 at 02:58, Richard Sandiford
> >> > >  wrote:
> >> > >> So I think the PR could be solved by something like the attached.
> >> > >> Do you agree?  If so, could you base the patch on this instead?
> >> > >>
> >> > >> Only tested against the self-tests.
> >> > >>
> >> > >> Thanks,
> >> > >> Richard
> >> > >>
> >> > >> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > >> index 40767736389..00fce4945a7 100644
> >> > >> --- a/gcc/fold-const.cc
> >> > >> +++ b/gcc/fold-const.cc
> >> > >> @@ -10743,27 +10743,37 @@ fold_vec_perm_cst (tree type, tree arg0, 
> >> > >> tree arg1, const vec_perm_indices &sel,
> >> > >>unsigned res_npatterns, res_nelts_per_pattern;
> >> > >>unsigned HOST_WIDE_INT res_nelts;
> >> > >>
> >> > >> -  /* (1) If SEL is a suitable mask as determined by
> >> > >> - valid_mask_for_fold_vec_perm_cst_p, then:
> >> > >> - res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> >> > >> - res_nelts_per_pattern = max of nelts_per_pattern between
> >> > >> -ARG0, ARG1 and SEL.
> >> > >> - (2) If SEL is not a suitable mask, and TYPE is VLS then:
> >> > >> - res_npatterns = nelts in result vector.
> >> > >> - res_nelts_per_pattern = 1.
> >> > >> - This exception is made so that VLS ARG0, ARG1 and SEL work as 
> >> > >> before.  */
> >> > >> -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> >> > >> -{
> >> > >> -  res_npatterns
> >> > >> -   = std::max (VECTOR_CST_NPATTERNS (arg0),
> >> > >> -   std::max (VECTOR_CST_NPATTERNS (arg1),
> >> > >> - sel.encoding ().npatterns ()));
> >> > >> +  /* First try to implement the fold in a VLA-friendly way.
> >> > >> +
> >> > >> + (1) If the selector is simply a duplication of N elements, the
> >> > >> +result is likewise a duplication of N elements.
> >> > >> +
> >> > >> + (2) If the selector is N elements followed by a duplication
> >> > >> +of N elements, the result is too.
> >> > >>
> >> > >> -  res_nelts_per_pattern
> >> > >> -   = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
> >> > >> -   std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
> >> > >> - sel.encoding ().nelts_per_pattern ()));
> >> > >> + (3) If the selector is N elements followed by an interleaving
> >> > >> +of N linear series, the situation is more complex.
> >> > >>
> >> > >> +valid_mask_for_fold_vec_perm_cst_p detects whether we
> >> > >> +can handle this case.  If we can, then each of the N linear
> >> > >> +series either (a) selects the same element each time or
> >> > >> +(b) selects a linear series from one of the input patterns.
> >> > >> +
> >> > >> +If (b) holds for one of the linear series, the result
> >> > >> +will contain a linear series, and so the result will have
> >> > >> +the same shape as the selector.  If (a) holds for all of
> >> > >> +the linear series, the result will be the same as (2) above.
> >> > >> +
> >> > >> +(b) can only hold if one of the input patterns has a
> >> > >> +stepped encoding.  */
> >> > >> +  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> >> > >> +{
> >> > >> +  res_npatterns = sel.encoding ().npatterns ();
> >> > >> +  res_nelts_per_pattern = sel.encoding ().nelts_per_pat
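
For readers less familiar with the encoding that the comment in the diff above refers to, here is a minimal sketch of how a constant vector stored as npatterns interleaved patterns, each with 1 to 3 encoded elements, expands to explicit elements. It assumes the usual convention that a 3-element pattern continues as a linear series whose step is the difference between its last two encoded elements. The function name expand_encoding is hypothetical; this is a model for illustration, not GCC's vector_builder interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model only (not GCC code): expand an encoding made of
// `npatterns` interleaved patterns with `nelts_per_pattern` encoded
// elements each into `nelts` explicit elements.
std::vector<long> expand_encoding(const std::vector<long> &encoded,
                                  std::size_t npatterns,
                                  std::size_t nelts_per_pattern,
                                  std::size_t nelts)
{
  assert(npatterns > 0 && nelts_per_pattern >= 1 && nelts_per_pattern <= 3);
  assert(encoded.size() == npatterns * nelts_per_pattern);

  std::vector<long> out;
  out.reserve(nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    {
      // Elements that are stored explicitly are copied as-is.
      if (i < encoded.size())
        {
          out.push_back(encoded[i]);
          continue;
        }
      std::size_t pattern = i % npatterns;  // which interleaved pattern
      std::size_t count = i / npatterns;    // position within that pattern
      long last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
      // 1 or 2 elements per pattern: the last encoded element repeats.
      if (nelts_per_pattern < 3)
        {
          out.push_back(last);
          continue;
        }
      // 3 elements per pattern: continue the series last, last + step, ...
      long prev = encoded[npatterns + pattern];
      long step = last - prev;
      out.push_back(last + static_cast<long>(count - 2) * step);
    }
  return out;
}
```

With the PR111648 values from the start of this thread, expanding { -16, -9, -10 } (npatterns = 1, nelts_per_pattern = 3) to four elements gives back arg0 = { -16, -9, -10, -11 }, whereas expanding the wrongly folded { -11, -12, -5 } gives 2 as the fourth element rather than -6.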

[MAINTAINERS] Update my email address

2024-07-03 Thread Prathamesh Kulkarni
Pushing to trunk.

Signed-off-by: Prathamesh Kulkarni  

Thanks,
Prathamesh


RE: [MAINTAINERS] Update my email address

2024-07-03 Thread Prathamesh Kulkarni
Sorry, forgot to attach diff.


-Original Message-
From: Prathamesh Kulkarni  
Sent: Wednesday, July 3, 2024 7:04 PM
To: gcc-patches@gcc.gnu.org
Subject: [MAINTAINERS] Update my email address


Pushing to trunk.

Signed-off-by: Prathamesh Kulkarni  

Thanks,
Prathamesh
[MAINTAINERS] Update my email address.

* MAINTAINERS: Update my email address and add myself to DCO.

Signed-off-by: Prathamesh Kulkarni  

diff --git a/MAINTAINERS b/MAINTAINERS
index 41319595bb5..2218f81194f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -512,7 +512,7 @@ Matt Kraai  

 Jan Kratochvil 
 Matthias Kretz 
 Louis Krupp
-Prathamesh Kulkarni
+Prathamesh Kulkarni
 Venkataramanan Kumar   
 Doug Kwan  
 Aaron W. LaFramboise   
@@ -792,3 +792,4 @@ Jonathan Wakely 

 Alexander Westbrooks   
 Chung-Ju Wu
 Pengxuan Zheng 
+Prathamesh Kulkarni


PR115394: Remove streamer_debugging and its uses

2024-07-10 Thread Prathamesh Kulkarni
Hi Richard,
As per your suggestion in the PR, the attached patch removes streamer_debugging and
its uses.
Bootstrapped on aarch64-linux-gnu.
OK to commit ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
[PR115394] Remove streamer_debugging and its uses.

gcc/ChangeLog:
PR lto/115394
* lto-streamer.h: Remove streamer_debugging definition.
* lto-streamer-out.cc (stream_write_tree_ref): Remove use of 
streamer_debugging.
(lto_output_tree): Likewise.
* tree-streamer-in.cc (streamer_read_tree_bitfields): Likewise.
(streamer_get_pickled_tree): Likewise.
* tree-streamer-out.cc (pack_ts_base_value_fields): Likewise.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index d4f728094ed..8b4bf9659cb 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -487,8 +487,6 @@ stream_write_tree_ref (struct output_block *ob, tree t)
gcc_checking_assert (tag == LTO_global_stream_ref);
  streamer_write_hwi (ob, -(int)(ix * 2 + id + 1));
}
-  if (streamer_debugging)
-   streamer_write_uhwi (ob, TREE_CODE (t));
 }
 }
 
@@ -1839,9 +1837,6 @@ lto_output_tree (struct output_block *ob, tree expr,
 will instantiate two different nodes for the same object.  */
   streamer_write_record_start (ob, LTO_tree_pickle_reference);
   streamer_write_uhwi (ob, ix);
-  if (streamer_debugging)
-   streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
-lto_tree_code_to_tag (TREE_CODE (expr)));
   lto_stats.num_pickle_refs_output++;
 }
   else
@@ -1882,9 +1877,6 @@ lto_output_tree (struct output_block *ob, tree expr,
}
  streamer_write_record_start (ob, LTO_tree_pickle_reference);
  streamer_write_uhwi (ob, ix);
- if (streamer_debugging)
-   streamer_write_enum (ob->main_stream, LTO_tags, LTO_NUM_TAGS,
-lto_tree_code_to_tag (TREE_CODE (expr)));
}
   in_dfs_walk = false;
   lto_stats.num_pickle_refs_output++;
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index e8dbba471ed..79c44d2cae7 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -126,10 +126,6 @@ along with GCC; see the file COPYING3.  If not see
 
 typedef unsigned char  lto_decl_flags_t;
 
-/* Stream additional data to LTO object files to make it easier to debug
-   streaming code.  This changes object files.  */
-static const bool streamer_debugging = false;
-
 /* Tags representing the various IL objects written to the bytecode file
(GIMPLE statements, basic blocks, EH regions, tree nodes, etc).
 
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index 35341a2b2b6..c248a74f7a1 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -485,15 +485,6 @@ streamer_read_tree_bitfields (class lto_input_block *ib,
 
   /* Read the bitpack of non-pointer values from IB.  */
   bp = streamer_read_bitpack (ib);
-
-  /* The first word in BP contains the code of the tree that we
- are about to read.  */
-  if (streamer_debugging)
-{
-  code = (enum tree_code) bp_unpack_value (&bp, 16);
-  lto_tag_check (lto_tree_code_to_tag (code),
-lto_tree_code_to_tag (TREE_CODE (expr)));
-}
   code = TREE_CODE (expr);
 
   /* Note that all these functions are highly sensitive to changes in
@@ -1110,17 +1101,8 @@ streamer_get_pickled_tree (class lto_input_block *ib, 
class data_in *data_in)
 {
   unsigned HOST_WIDE_INT ix;
   tree result;
-  enum LTO_tags expected_tag;
 
   ix = streamer_read_uhwi (ib);
   result = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
-
-  if (streamer_debugging)
-{
-  expected_tag = streamer_read_enum (ib, LTO_tags, LTO_NUM_TAGS);
-  gcc_assert (result
- && TREE_CODE (result) == lto_tag_to_tree_code (expected_tag));
-}
-
   return result;
 }
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index c30ab62a585..b7205287ffb 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -71,8 +71,6 @@ write_identifier (struct output_block *ob,
 static inline void
 pack_ts_base_value_fields (struct bitpack_d *bp, tree expr)
 {
-  if (streamer_debugging)
-bp_pack_value (bp, TREE_CODE (expr), 16);
   if (!TYPE_P (expr))
 {
   bp_pack_value (bp, TREE_SIDE_EFFECTS (expr), 1);


Lower zeroing array assignment to memset for allocatable arrays

2024-07-10 Thread Prathamesh Kulkarni
Hi,
The attached patch lowers zeroing array assignment to memset for allocatable 
arrays.

For example:
subroutine test(z, n)
implicit none
integer :: n
real(4), allocatable :: z(:,:,:)

allocate(z(n, 8192, 2048))
z = 0
end subroutine

results in the following call to memset instead of three nested loops for z = 0:
(void) __builtin_memset ((void *) z->data, 0, (unsigned long)
  ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
     * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
    * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
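
The length argument of that memset can be read as "product of the extents times the element size". A minimal sketch of the same arithmetic, assuming a simplified descriptor where each dimension only carries a lower and an upper bound (the Dim struct and zeroing_bytes name are hypothetical, not gfortran's descriptor layout):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for one dimension of an array descriptor.
struct Dim
{
  long lbound;
  long ubound;
};

// Number of bytes to zero: for each dimension the extent is
// MAX (ubound - lbound, -1) + 1, so an empty dimension contributes 0.
std::size_t zeroing_bytes(const std::vector<Dim> &dims, std::size_t elem_size)
{
  assert(elem_size > 0);
  std::size_t nelems = 1;
  for (const Dim &d : dims)
    {
      long extent = std::max(d.ubound - d.lbound, -1L) + 1;
      nelems *= static_cast<std::size_t>(extent);
    }
  return nelems * elem_size;
}
```

For z(n, 8192, 2048) with n = 3 and real(4) elements this gives 3 * 8192 * 2048 * 4 bytes; the clamp to -1 makes a zero-sized dimension yield a length of 0.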

The patch yields a significant speedup for an internal Fortran application on
AArch64 -mcpu=grace (and potentially on other AArch64 cores too).
Bootstrapped+tested on aarch64-linux-gnu.
Does the patch look OK to commit ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
Lower zeroing array assignment to memset for allocatable arrays.

gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 605434f4ddb..7773a24f9d4 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -11421,18 +11421,23 @@ gfc_trans_zero_assign (gfc_expr * expr)
   type = TREE_TYPE (dest);
   if (POINTER_TYPE_P (type))
 type = TREE_TYPE (type);
-  if (!GFC_ARRAY_TYPE_P (type))
-return NULL_TREE;
-
-  /* Determine the length of the array.  */
-  len = GFC_TYPE_ARRAY_SIZE (type);
-  if (!len || TREE_CODE (len) != INTEGER_CST)
+  if (GFC_ARRAY_TYPE_P (type))
+{
+  /* Determine the length of the array.  */
+  len = GFC_TYPE_ARRAY_SIZE (type);
+  if (!len || TREE_CODE (len) != INTEGER_CST)
+   return NULL_TREE;
+}
+  else if (GFC_DESCRIPTOR_TYPE_P (type))
+{
+  if (POINTER_TYPE_P (TREE_TYPE (dest)))
+   dest = build_fold_indirect_ref_loc (input_location, dest);
+  len = gfc_conv_descriptor_size (dest, GFC_TYPE_ARRAY_RANK (type));
+  dest = gfc_conv_descriptor_data_get (dest);
+}
+  else
 return NULL_TREE;
 
-  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
-  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, len,
-fold_convert (gfc_array_index_type, tmp));
-
   /* If we are zeroing a local array avoid taking its address by emitting
  a = {} instead.  */
   if (!POINTER_TYPE_P (TREE_TYPE (dest)))
@@ -11440,6 +11445,11 @@ gfc_trans_zero_assign (gfc_expr * expr)
   dest, build_constructor (TREE_TYPE (dest),
  NULL));
 
+  /* Multiply len by element size.  */
+  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
+  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
+len, fold_convert (gfc_array_index_type, tmp));
+
   /* Convert arguments to the correct types.  */
   dest = fold_convert (pvoid_type_node, dest);
   len = fold_convert (size_type_node, len);
diff --git a/gcc/testsuite/gfortran.dg/array_memset_3.f90 b/gcc/testsuite/gfortran.dg/array_memset_3.f90
new file mode 100644
index 000..b750c8de67d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/array_memset_3.f90
@@ -0,0 +1,31 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-original" }
+
+subroutine test1(n)
+  implicit none
+integer(8) :: n
+real(4), allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = 0
+end subroutine
+
+subroutine test2(n)
+  implicit none
+integer(8) :: n
+integer, allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = 0
+end subroutine
+
+subroutine test3(n)
+  implicit none
+integer(8) :: n
+logical, allocatable :: z(:,:,:)
+
+allocate(z(n, 100, 200))
+z = .false. 
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_memset" 3 "original" } }


RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-11 Thread Prathamesh Kulkarni


> -Original Message-
> From: Harald Anlauf 
> Sent: Thursday, July 11, 2024 12:53 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> 
> Hi Prathamesh,
> 
> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni:
> > Hi,
> > The attached patch lowers zeroing array assignment to memset for
> allocatable arrays.
> >
> > For example:
> > subroutine test(z, n)
> >  implicit none
> >  integer :: n
> >  real(4), allocatable :: z(:,:,:)
> >
> >  allocate(z(n, 8192, 2048))
> >  z = 0
> > end subroutine
> >
> > results in following call to memset instead of 3 nested loops for z = 0:
> >  (void) __builtin_memset ((void *) z->data, 0, (unsigned long)
> >    ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
> >       * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
> >      * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
> >
> > The patch significantly improves speedup for an internal Fortran
> application on AArch64 -mcpu=grace (and potentially on other AArch64
> cores too).
> > Bootstrapped+tested on aarch64-linux-gnu.
> > Does the patch look OK to commit ?
> 
> no, it is NOT ok.
> 
> Consider:
> 
> subroutine test0 (n, z)
>implicit none
>integer :: n
>real, pointer :: z(:,:,:) ! need not be contiguous!
>z = 0
> end subroutine
> 
> After your patch this also generates a memset, but this cannot be true
> in general.  One would need to have a test on contiguity of the array
> before memset can be used.
> 
> In principle this is a nice idea, and IIRC there exists a very old PR
> on this (by Thomas König?).  So it might be worth pursuing.
Hi Harald,
Thanks for the suggestions!
The attached patch checks gfc_is_simply_contiguous (expr, true, false) before
lowering to memset, which avoids generating memset for your example above.
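
To illustrate why the contiguity check matters, here is a minimal sketch assuming a one-dimensional strided view over a float buffer (StridedView and the helper names are hypothetical and unrelated to gfortran's internals). Zeroing count * sizeof (float) bytes from the base address is only equivalent to the element-wise loop when the stride is 1:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of a pointer array section: `count` elements starting
// at `base`, `stride` elements apart.
struct StridedView
{
  float *base;
  std::size_t count;
  std::size_t stride;
};

// Always correct: touch exactly the elements the view refers to.
void zero_elementwise(StridedView v)
{
  for (std::size_t i = 0; i < v.count; ++i)
    v.base[i * v.stride] = 0.0f;
}

// Only correct for contiguous views: zeroes the first `count` elements
// of the underlying storage, regardless of the stride.
void zero_with_memset(StridedView v)
{
  std::memset(v.base, 0, v.count * sizeof(float));
}

// The guard a lowering to memset would need in this model.
bool is_contiguous(StridedView v)
{
  assert(v.stride >= 1);
  return v.stride == 1;
}
```

For a view with stride 2 over eight elements, the memset variant clears elements 0 to 3 of the storage: it clobbers elements 1 and 3, which are not part of the view, and leaves elements 4 and 6 untouched.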

Bootstrapped+tested on aarch64-linux-gnu.
Does the attached patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
> 
> Thanks,
> Harald
> 
> 
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh

Lower zeroing array assignment to memset for allocatable arrays.

gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 477c2720187..f9a7f70b2a3 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -11515,18 +11515,24 @@ gfc_trans_zero_assign (gfc_expr * expr)
   type = TREE_TYPE (dest);
   if (POINTER_TYPE_P (type))
 type = TREE_TYPE (type);
-  if (!GFC_ARRAY_TYPE_P (type))
-return NULL_TREE;
-
-  /* Determine the length of the array.  */
-  len = GFC_TYPE_ARRAY_SIZE (type);
-  if (!len || TREE_CODE (len) != INTEGER_CST)
+  if (GFC_ARRAY_TYPE_P (type))
+{
+  /* Determine the length of the array.  */
+  len = GFC_TYPE_ARRAY_SIZE (type);
+  if (!len || TREE_CODE (len) != INTEGER_CST)
+   return NULL_TREE;
+}
+  else if (GFC_DESCRIPTOR_TYPE_P (type)
+ && gfc_is_simply_contiguous (expr, true, false))
+{
+  if (POINTER_TYPE_P (TREE_TYPE (dest)))
+   dest = build_fold_indirect_ref_loc (input_location, dest);
+  len = gfc_conv_descriptor_size (dest, GFC_TYPE_ARRAY_RANK (type));
+  dest = gfc_conv_descriptor_data_get (dest);
+}
+  else
 return NULL_TREE;
 
-  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
-  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, len,
-fold_convert (gfc_array_index_type, tmp));
-
   /* If we are zeroing a local array avoid taking its address by emitting
  a = {} instead.  */
   if (!POINTER_TYPE_P (TREE_TYPE (dest)))
@@ -11534,6 +11540,11 @@ gfc_trans_zero_assign (gfc_expr * expr)
   dest, build_constructor (TREE_TYPE (dest),
  NULL));
 
+  /* Multiply len by element size.  */
+  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
+  len = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
+len, fold_convert (gfc_array_index_type, tmp));
+
   /* Convert arguments to the correct types.  */
   dest = fold_convert (pvoid_type_node, dest);
   len = fold_convert (size_type_node, len);
diff --git a/gcc/testsuite/gfortran.dg/array_memset_3.f90 b/gcc/testsuite/gfortran.dg/array_memset_3.f90
new file mode 100644
index 000..753006

RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-12 Thread Prathamesh Kulkarni


> -Original Message-
> From: Harald Anlauf 
> Sent: Friday, July 12, 2024 1:52 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> 
> Hi Prathamesh!
Hi Harald,
> 
> Am 11.07.24 um 12:16 schrieb Prathamesh Kulkarni:
> >
> >
> >> -Original Message-
> >> From: Harald Anlauf 
> >> Sent: Thursday, July 11, 2024 12:53 AM
> >> To: Prathamesh Kulkarni ; gcc-
> >> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> >> Subject: Re: Lower zeroing array assignment to memset for
> allocatable
> >> arrays
> >>
> >>
> >> Hi Prathamesh,
> >>
> >> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni:
> >>> Hi,
> >>> The attached patch lowers zeroing array assignment to memset for
> >> allocatable arrays.
> >>>
> >>> For example:
> >>> subroutine test(z, n)
> >>>   implicit none
> >>>   integer :: n
> >>>   real(4), allocatable :: z(:,:,:)
> >>>
> >>>   allocate(z(n, 8192, 2048))
> >>>   z = 0
> >>> end subroutine
> >>>
> >>> results in following call to memset instead of 3 nested loops for z = 0:
> >>>   (void) __builtin_memset ((void *) z->data, 0, (unsigned long)
> >>>     ((((MAX_EXPR <z->dim[0].ubound - z->dim[0].lbound, -1> + 1)
> >>>        * (MAX_EXPR <z->dim[1].ubound - z->dim[1].lbound, -1> + 1))
> >>>       * (MAX_EXPR <z->dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
> >>>
> >>> The patch significantly improves speedup for an internal Fortran
> >> application on AArch64 -mcpu=grace (and potentially on other
> AArch64
> >> cores too).
> >>> Bootstrapped+tested on aarch64-linux-gnu.
> >>> Does the patch look OK to commit ?
> >>
> >> no, it is NOT ok.
> >>
> >> Consider:
> >>
> >> subroutine test0 (n, z)
> >> implicit none
> >> integer :: n
> >> real, pointer :: z(:,:,:) ! need not be contiguous!
> >> z = 0
> >> end subroutine
> >>
> >> After your patch this also generates a memset, but this cannot be
> >> true in general.  One would need to have a test on contiguity of
> the
> >> array before memset can be used.
> >>
> >> In principle this is a nice idea, and IIRC there exists a very old
> PR
> >> on this (by Thomas König?).  So it might be worth pursuing.
> > Hi Harald,
> > Thanks for the suggestions!
> > The attached patch checks gfc_is_simply_contiguous(expr, true,
> false)
> > before lowering to memset, which avoids generating memset for your
> example above.
> 
> This is much better, as it avoids generating false memsets where it
> should not.  However, you now miss cases where the array is a
> component reference, as in:
> 
> subroutine test_dt (dt)
>implicit none
>type t
>   real, allocatable :: x(:,:,:) ! contiguous!
>   real, pointer, contiguous :: y(:,:,:) ! contiguous!
>   real, pointer :: z(:,:,:) ! need not be
> contiguous!
>end type t
>type(t) :: dt
>dt% x = 0  ! memset possible!
>dt% y = 0  ! memset possible!
>dt% z = 0  ! memset NOT possible!
> end subroutine
> 
> You'll need to cycle through the component references and apply the
> check for contiguity to the ultimate component, not the top level.
> 
> Can you have another look?
Thanks for the review!
It seems that component references are currently not handled even for
statically sized arrays?
For example:
subroutine test_dt (dt, y)
   implicit none
   real :: y (10, 20, 30)
   type t
  real :: x(10, 20, 30)
   end type t
   type(t) :: dt
   y = 0
   dt% x = 0
end subroutine

With trunk, it generates memset for 'y' but not for dt%x.
That happens because copyable_array_p returns false for dt%x,
because expr->ref->next is non NULL:

  /* First check it's an array.  */
  if (expr->rank < 1 || !expr->ref || expr->ref->next)
    return false;

and gfc_full_array_ref_p (expr) bails out if expr->ref->type != REF_ARRAY.
Looking through the git history, it seems both checks were added in 18eaa2c0cd20
to fix PR33370.
(Even after removing these checks, the previous patch bails out of
gfc_trans_zero_assign, because GFC_DESCRIPTOR_TYPE_P (type) returns false for a
component ref, so it ends up returning NULL_TREE.)
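
The effect of the two checks can be modelled as follows. This is a sketch under simplifying assumptions: the reference chain of an expression is represented as a plain list, and RefKind and passes_single_array_ref_check are hypothetical names that merely mirror the conditions quoted above, not the front end's data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the kinds of reference in the chain.
enum class RefKind
{
  Array,
  Component
};

// Models the combined condition: the expression has rank >= 1, its
// reference chain has exactly one entry (no ->next), and that entry is an
// array reference.
bool passes_single_array_ref_check(int rank, const std::vector<RefKind> &refs)
{
  assert(rank >= 0);
  if (rank < 1 || refs.empty() || refs.size() > 1)
    return false;
  return refs.front() == RefKind::Array;
}
```

In this model `y` has the chain { Array } and passes, while `dt%x` has the chain { Component, Array } and is rejected because a second reference follows the first.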
I am working on extending the patch to handle component refs for statically 
sized as well as allocatable arrays.

Since this looks like a bigger change and an extension of current functionality,
would it be OK to commit the previous patch as-is (if it looks correct)
and address component refs in a follow-up patch?

Thanks,
Prathamesh  
 
> 
> Thanks,
> Harald
> 
> > Bootstrapped+tested on aarch64-linux-gnu.
> > Does the attached patch look OK ?
> >
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Harald
> >>
> >>
> >>> Signed-off-by: Prathamesh Kulkarni 
> >>>
> >>> Thanks,
> >>> Prathamesh
> >



RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-15 Thread Prathamesh Kulkarni
> -Original Message-
> From: Harald Anlauf 
> Sent: Saturday, July 13, 2024 1:15 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> 
> Hi Prathamesh,
Hi Harald,
> 
> Am 12.07.24 um 15:31 schrieb Prathamesh Kulkarni:
> > It seems that component references are not currently handled even
> for static size arrays ?
> > For eg:
> > subroutine test_dt (dt, y)
> > implicit none
> > real :: y (10, 20, 30)
> > type t
> >real :: x(10, 20, 30)
> > end type t
> > type(t) :: dt
> > y = 0
> > dt% x = 0
> > end subroutine
> >
> > With trunk, it generates memset for 'y' but not for dt%x.
> > That happens because copyable_array_p returns false for dt%x,
> because
> > expr->ref->next is non NULL:
> >
> >/* First check it's an array.  */
> >if (expr->rank < 1 || !expr->ref || expr->ref->next)
> >  return false;
> >
> > and gfc_full_array_ref_p(expr) bails out if expr->ref->type !=
> REF_ARRAY.
> 
> Indeed that check (as is) prevents the use of component refs.
> (I just tried to modify this part to cycle through the refs, but then
> I get regressions in the testsuite for some of the coarray tests.
> Furthermore, gfc_trans_zero_assign would need further changes to
> handle even the constant shapes from above.)
> 
> > Looking thru git history, it seems both the checks were added in
> 18eaa2c0cd20 to fix PR33370.
> > (Even after removing these checks, the previous patch bails out from
> > gfc_trans_zero_assign because GFC_DESCRIPTOR_TYPE_P (type) returns
> > false for component ref and ends up returning NULL_TREE) I am
> working on extending the patch to handle component refs for statically
> sized as well as allocatable arrays.
> >
> > Since it looks like a bigger change and an extension to current
> > functionality, will it be OK to commit the previous patch as-is (if
> it looks correct) and address component refs in follow up one ?
> 
> I agree that it is reasonable to defer the handling of arrays as
> components of derived types, and recommend to do the following:
> 
> - replace "&& gfc_is_simply_contiguous (expr, true, false))" in your
>last patch by "&& gfc_is_simply_contiguous (expr, false, false))",
>as that would also allow to treat
> 
>z(:,::1,:) = 0
> 
>as contiguous if z is allocatable or a contiguous pointer.
> 
> - open a PR in bugzilla to track the missed-optimization for
>the cases we discussed here, and link the discussion in the ML.
Done: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115935
> 
> Your patch then will be OK for mainline.
Thanks, does the attached version look OK ?
Bootstrapped+tested on aarch64-linux-gnu, x86_64-linux-gnu.

Thanks,
Prathamesh
> 
> Thanks,
> Harald
> 
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Harald
> >>
> >>> Bootstrapped+tested on aarch64-linux-gnu.
> >>> Does the attached patch look OK ?
> >>>
> >>> Signed-off-by: Prathamesh Kulkarni 
> >>>
> >>> Thanks,
> >>> Prathamesh
> >>>>
> >>>> Thanks,
> >>>> Harald
> >>>>
> >>>>
> >>>>> Signed-off-by: Prathamesh Kulkarni 
> >>>>>
> >>>>> Thanks,
> >>>>> Prathamesh
> >>>
> >

Lower zeroing array assignment to memset for allocatable arrays.

gcc/fortran/ChangeLog:
* trans-expr.cc (gfc_trans_zero_assign): Handle allocatable arrays.

gcc/testsuite/ChangeLog:
* gfortran.dg/array_memset_3.f90: New test.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 477c2720187..a85b41bf815 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -11515,18 +11515,24 @@ gfc_trans_zero_assign (gfc_expr * expr)
   type = TREE_TYPE (dest);
   if (POINTER_TYPE_P (type))
 type = TREE_TYPE (type);
-  if (!GFC_ARRAY_TYPE_P (type))
-return NULL_TREE;
-
-  /* Determine the length of the array.  */
-  len = GFC_TYPE_ARRAY_SIZE (type);
-  if (!len || TREE_CODE (len) != INTEGER_CST)
+  if (GFC_ARRAY_TYPE_P (type))
+{
+  /* Determine the length of the array.  */
+  len = GFC_TYPE_ARRAY_SIZE (type);
+  if (!len || TREE_CODE (len) != INTEGER_CST)
+   return NULL_TREE;
+}
+  else if (GFC_DESCRIPTOR_TYPE_P (typ

RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-16 Thread Prathamesh Kulkarni
> -Original Message-
> From: Harald Anlauf 
> Sent: Tuesday, July 16, 2024 12:06 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> 
> Hi Prathamesh!
> 
> Am 15.07.24 um 15:07 schrieb Prathamesh Kulkarni:
> >> -Original Message-
> >> From: Harald Anlauf 
> >> I agree that it is reasonable to defer the handling of arrays as
> >> components of derived types, and recommend to do the following:
> >>
> >> - replace "&& gfc_is_simply_contiguous (expr, true, false))" in
> your
> >> last patch by "&& gfc_is_simply_contiguous (expr, false,
> false))",
> >> as that would also allow to treat
> >>
> >> z(:,::1,:) = 0
> >>
> >> as contiguous if z is allocatable or a contiguous pointer.
> >>
> >> - open a PR in bugzilla to track the missed-optimization for
> >> the cases we discussed here, and link the discussion in the ML.
> > Done: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115935
> >>
> >> Your patch then will be OK for mainline.
> > Thanks, does the attached version look OK ?
> > Bootstrapped+tested on aarch64-linux-gnu, x86_64-linux-gnu.
> 
> This is now OK.
> 
> Thanks for the patch!
Thanks, committed to trunk in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=616627245fb06106f7c5bc4a36784acc8ec166f0

Thanks,
Prathamesh
> 
> Harald
> 
> > Thanks,
> > Prathamesh
> 



[nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-08 Thread Prathamesh Kulkarni
Hi Richard,
After the fix for differing NUM_POLY_INT_COEFFS in AArch64/nvptx offloading, the
following minimal test:

int main()
{
  int x;
  #pragma omp target map(x)
    x = 5;
  return x;
}

compiled with -fopenmp -foffload=nvptx-none now fails with:
gcc: error: unrecognized command-line option '-m64'
nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status 
compilation terminated.

As mentioned in the RFC email, this happens because compile_native in
nvptx/mkoffload.cc passes -m64 or -m32 to the host compiler, depending on whether
offload_abi is OFFLOAD_ABI_LP64 or OFFLOAD_ABI_ILP32, and the aarch64 backend
doesn't recognize these options.

Based on your suggestion in
https://gcc.gnu.org/pipermail/gcc/2024-July/244470.html,
the attached patch generates a new macro HOST_MULTILIB, derived from
$enable_as_accelerator_for, and mkoffload.cc gates passing -m32/-m64 to
host_compiler on HOST_MULTILIB. I verified that the macro is set to 0 for an
aarch64 host (which avoids the unrecognized command-line option error above)
and to 1 for an x86_64 host.
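
The gating can be sketched in isolation as below. This is an illustration only: host_compiler_args and the OffloadAbi enum are hypothetical names, the argument list is reduced to a few entries, and the host_multilib parameter stands in for the configure-time HOST_MULTILIB macro.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the OFFLOAD_ABI_LP64 / OFFLOAD_ABI_ILP32 values.
enum class OffloadAbi
{
  LP64,
  ILP32
};

// Build a reduced host-compiler command line.  The ABI flag is appended only
// when the host compiler is known to understand -m32/-m64.
std::vector<std::string> host_compiler_args(bool host_multilib, OffloadAbi abi,
                                            const std::string &infile)
{
  assert(!infile.empty());
  std::vector<std::string> args;
  if (host_multilib)
    switch (abi)
      {
      case OffloadAbi::LP64:
        args.push_back("-m64");
        break;
      case OffloadAbi::ILP32:
        args.push_back("-m32");
        break;
      }
  args.push_back(infile);
  args.push_back("-c");
  return args;
}
```

With host_multilib false (the aarch64 case described above) the command line carries no -m64, so the host compiler never sees the option it would reject; with it true (the x86_64 case) the flag is passed as before.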

Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
[nvptx] Pass -m32/-m64 to host_compiler if it has multilib support.

gcc/ChangeLog:
* configure.ac: Generate new macro HOST_MULTILIB.
* config.in: Regenerate.
* configure: Likewise.
* config/nvptx/mkoffload.cc (compile_native): Gate appending
"-m32"/"-m64" to argv_obstack on HOST_MULTILIB.
    (main): Likewise.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/config.in b/gcc/config.in
index 7fcabbe5061..3c509356f0a 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2270,6 +2270,12 @@
 #endif
 
 
+/* Define if host has multilib support. */
+#ifndef USED_FOR_TARGET
+#undef HOST_MULTILIB
+#endif
+
+
 /* Define which stat syscall is able to handle 64bit indodes. */
 #ifndef USED_FOR_TARGET
 #undef HOST_STAT_FOR_64BIT_INODES
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 503b1abcefd..f7d29bd5215 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -607,17 +607,18 @@ compile_native (const char *infile, const char *outfile, 
const char *compiler,
   obstack_ptr_grow (&argv_obstack, ptx_dumpbase);
   obstack_ptr_grow (&argv_obstack, "-dumpbase-ext");
   obstack_ptr_grow (&argv_obstack, ".c");
-  switch (offload_abi)
-{
-case OFFLOAD_ABI_LP64:
-  obstack_ptr_grow (&argv_obstack, "-m64");
-  break;
-case OFFLOAD_ABI_ILP32:
-  obstack_ptr_grow (&argv_obstack, "-m32");
-  break;
-default:
-  gcc_unreachable ();
-}
+  if (HOST_MULTILIB)
+switch (offload_abi)
+  {
+   case OFFLOAD_ABI_LP64:
+ obstack_ptr_grow (&argv_obstack, "-m64");
+ break;
+   case OFFLOAD_ABI_ILP32:
+ obstack_ptr_grow (&argv_obstack, "-m32");
+ break;
+   default:
+ gcc_unreachable ();
+  }
   obstack_ptr_grow (&argv_obstack, infile);
   obstack_ptr_grow (&argv_obstack, "-c");
   obstack_ptr_grow (&argv_obstack, "-o");
@@ -761,17 +762,18 @@ main (int argc, char **argv)
   if (verbose)
 obstack_ptr_grow (&argv_obstack, "-v");
   obstack_ptr_grow (&argv_obstack, "-xlto");
-  switch (offload_abi)
-{
-case OFFLOAD_ABI_LP64:
-  obstack_ptr_grow (&argv_obstack, "-m64");
-  break;
-case OFFLOAD_ABI_ILP32:
-  obstack_ptr_grow (&argv_obstack, "-m32");
-  break;
-default:
-  gcc_unreachable ();
-}
+  if (HOST_MULTILIB)
+switch (offload_abi)
+  {
+   case OFFLOAD_ABI_LP64:
+ obstack_ptr_grow (&argv_obstack, "-m64");
+ break;
+   case OFFLOAD_ABI_ILP32:
+ obstack_ptr_grow (&argv_obstack, "-m32");
+ break;
+   default:
+ gcc_unreachable ();
+  }
   if (fopenmp)
 obstack_ptr_grow (&argv_obstack, "-mgomp");
 
diff --git a/gcc/configure b/gcc/configure
index 557ea5fa3ac..cdfa06f0c80 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -931,6 +931,7 @@ infodir
 docdir
 oldincludedir
 includedir
+runstatedir
 localstatedir
 sharedstatedir
 sysconfdir
@@ -1115,6 +1116,7 @@ datadir='${datarootdir}'
 sysconfdir='${prefix}/etc'
 sharedstatedir='${prefix}/com'
 localstatedir='${prefix}/var'
+runstatedir='${localstatedir}/run'
 includedir='${prefix}/include'
 oldincludedir='/usr/include'
 docdir='${datarootdir}/doc/${PACKAGE}'
@@ -1367,6 +1369,15 @@ do
   | -silent | --silent | --silen | --sile | --sil)
 silent=yes ;;
 
+  -runstatedir | --runstatedir | --runstatedi | --runstated \
+  | --runstate | --runstat | --runsta | --runst | --runs \
+  | --run | --ru | --r)
+ac_prev=runstatedir ;;
+  -

RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-12 Thread Prathamesh Kulkarni


> -Original Message-
> From: Thomas Schwinge 
> Sent: Friday, August 9, 2024 12:55 AM
> To: Prathamesh Kulkarni 
> Cc: Andrew Pinski ; Richard Biener
> ; gcc-patches@gcc.gnu.org; Jakub Jelinek
> 
> Subject: Re: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> 
> Hi Prathamesh!
Hi Thomas,
> 
> On 2024-08-08T06:46:25-0700, Andrew Pinski  wrote:
> > On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
> >  wrote:
> >> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx
> >> offloading, the following minimal test:
> 
> First, thanks for your work on enabling this!  I will say that I had
> the plan to re-engage with Nvidia to hire us (as initial implementors
> of GCC/nvptx offloading) to make AArch64/nvptx offloading work, but
> now that Nvidia has its own GCC team, that's great that you're able to
> work on this yourself!  :-)
> 
> Please CC me for GCC/nvptx issues for (at least potentially...) faster
> response times.
Thanks, will do 😊
> 
> >> compiled with -fopenmp -foffload=nvptx-none now fails with:
> >> gcc: error: unrecognized command-line option '-m64'
> >> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status
> >> compilation terminated.
> 
> Heh.  Yeah...
> 
> >> As mentioned in RFC email, this happens because
> >> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler
> >> depending on whether offload_abi is OFFLOAD_ABI_LP64 or
> >> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
> >> options.
> >>
> >> Based on your suggestion in:
> >> https://gcc.gnu.org/pipermail/gcc/2024-July/244470.html,
> >> The attached patch generates new macro HOST_MULTILIB derived from
> >> $enable_as_accelerator_for, and in mkoffload.cc it gates passing
> >> -m32/-m64 to host_compiler on HOST_MULTILIB. I verified that the
> >> macro is set to 0 for aarch64 host (and thus avoids above unrecognized
> >> command line option error), and is set to 1 for x86_64 host.
> >>
> >> Does the patch look OK ?
> >
> > Note I think the usage of the name MULTILIB here is wrong because
> > aarch64 (and riscv) could have MULTILIB support, just the options are
> > different.
> 
> I also think the proposed patch is not quite the right hammer for the
> issue at hand.
> 
> > For aarch64, it would be -mabi=ilp32/-mabi=lp64 (riscv it is more
> > complex).
> >
> > This most likely should be something more complex due to the above.
> 
> Right.
> 
> > Maybe call it HOST_64_32 but even that seems wrong due to Aarch64
> > having ILP32 support and such.
> 
> Right.
> 
> > What about HOST_64ABI_OPTS="-mabi=lp64"/HOST_32ABI_OPTS="-mabi=ilp32"
> > but I am not sure if that would be enough to support RISCV which
> > requires two options.
> 
> So, my idea is: instead of the current strategy that the host
> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc., which
> the 'mkoffload's then interpret and re-synthesize '-m64' etc. -- how
> about we instead directly tell the 'mkoffload's the relevant ABI
> options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes
> '-foffload-abi=-m64' etc., which the 'mkoffload's can then readily use.
> Could you please
> give that a try, and/or does anyone see any issues with that approach?
> 
> And use something like '-foffload-abi=disable' to replace the current:
> 
> /* PR libgomp/65099: Currently, we only support offloading in 64-bit
>    configurations.  */
> if (offload_abi == OFFLOAD_ABI_LP64)
>   {
> 
> (As discussed before, this should be done differently altogether, but
> that's for another day.)
Sorry, I don't quite follow. Currently we enable offloading if
offload_abi == OFFLOAD_ABI_LP64, which is synthesized from -foffload-abi=lp64.
If we change -foffload-abi to instead specify host-specific ABI opts, I guess
mkoffload will still need to somehow figure out which ABI is used, so it can
disable offloading for 32-bit? I suppose we could adjust TARGET_OFFLOAD_OPTIONS
for each host to pass -foffload-abi=disable if TARGET_ILP32 is set and the
offload target is nvptx, but I am not sure if that'd be correct?

In the attached patch, I added another option, -foffload-abi-host-opts, to
specify the host ABI opts, and left -foffload-abi to specify whether the ABI is
32-bit or 64-bit, which mkoffload can use to enable/disable offloading (as
before).
Does that look OK?
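
To make the split between the two options concrete, here is a rough
standalone sketch (not the attached patch, and not mkoffload's real code).
The OffloadConfig type and the helper names are hypothetical, "gcc" stands in
for the host compiler path, and splitting the host opts on whitespace is my
assumption for hosts that need more than one ABI option:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical model of the ABI information handed to mkoffload.
enum class OffloadAbi { Unset, LP64, ILP32 };

struct OffloadConfig {
  OffloadAbi abi = OffloadAbi::Unset;
  std::vector<std::string> host_abi_opts;  // forwarded verbatim to the host compiler
};

// Pick the two options discussed above out of an argument list.
OffloadConfig parse_offload_args(const std::vector<std::string>& args) {
  const std::string abi_prefix = "-foffload-abi=";
  const std::string opts_prefix = "-foffload-abi-host-opts=";
  OffloadConfig cfg;
  for (const std::string& arg : args) {
    if (arg.compare(0, opts_prefix.size(), opts_prefix) == 0) {
      // Assumption: several host options may be packed into one string.
      std::istringstream words(arg.substr(opts_prefix.size()));
      std::string word;
      while (words >> word)
        cfg.host_abi_opts.push_back(word);
    } else if (arg.compare(0, abi_prefix.size(), abi_prefix) == 0) {
      const std::string value = arg.substr(abi_prefix.size());
      if (value == "lp64")
        cfg.abi = OffloadAbi::LP64;
      else if (value == "ilp32")
        cfg.abi = OffloadAbi::ILP32;
    }
  }
  return cfg;
}

// As before, offloading is only enabled for the 64-bit ABI.
bool offloading_enabled(const OffloadConfig& cfg) {
  return cfg.abi == OffloadAbi::LP64;
}

// Build the host compiler command line without synthesizing -m32/-m64.
std::vector<std::string> host_compiler_args(const OffloadConfig& cfg,
                                            const std::string& infile) {
  assert(!infile.empty());
  std::vector<std::string> argv = {"gcc"};  // placeholder compiler name
  argv.insert(argv.end(), cfg.host_abi_opts.begin(), cfg.host_abi_opts.end());
  argv.push_back(infile);
  argv.push_back("-c");
  return argv;
}
```

With this shape, a host that passes no host opts at all simply gets no ABI
flag on the host compiler command line, which is the behaviour wanted for
the aarch64 case above.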

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh

[optc-save-gen.awk] Fix streaming of command line options for offloading

2024-08-12 Thread Prathamesh Kulkarni
Hi,
As mentioned in:
https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html

AArch64 cl_optimization_stream_out streams out target-specific optimization
options like flag_aarch64_early_ldp_fusion, aarch64_early_ra etc., which breaks
AArch64/nvptx offloading, since nvptx cl_optimization_stream_in doesn't have a
corresponding stream-in for these options and ends up setting invalid values
for ptr->explicit_mask (and subsequent data structures).

This makes even a trivial test like the following cause an ICE in
lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:

int main()
{
  int x;
  #pragma omp target map(x)
x;
}

The attached patch modifies optc-save-gen.awk to generate an
if (!lto_stream_offload_p) check before streaming out each target-specific
option in cl_optimization_stream_out, which fixes the issue.
cl_optimization_stream_out after patch (last few entries):

  bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
  bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
  if (!lto_stream_offload_p)
  bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
  for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
bp_pack_value (bp, ptr->explicit_mask[i], 64);

For target-specific options, streaming out is gated on the
!lto_stream_offload_p check.
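
To illustrate why the gating matters, here is a toy model of the two sides
(purely a sketch: Stream, HostOpts and AccelOpts are made-up stand-ins for
the bitpack stream and the generated cl_optimization structures, with one
common and one target-specific option):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for the bitpack stream: a flat sequence of integers.
using Stream = std::vector<std::int64_t>;

struct HostOpts {              // hypothetical host option set
  int flag_wrapv_pointer;      // common option, known to both compilers
  int aarch64_early_ra;        // target-specific, unknown to the accelerator
  std::uint64_t explicit_mask;
};

struct AccelOpts {             // hypothetical accelerator option set
  int flag_wrapv_pointer;
  std::uint64_t explicit_mask;
};

// Host side: skip target-specific options when streaming for offload.
Stream stream_out(const HostOpts& o, bool stream_offload_p) {
  Stream s;
  s.push_back(o.flag_wrapv_pointer);
  if (!stream_offload_p)
    s.push_back(o.aarch64_early_ra);
  s.push_back(static_cast<std::int64_t>(o.explicit_mask));
  return s;
}

// Accelerator side: reads only the fields it knows about, in order.
AccelOpts accel_stream_in(const Stream& s) {
  assert(s.size() >= 2);
  AccelOpts o;
  o.flag_wrapv_pointer = static_cast<int>(s[0]);
  o.explicit_mask = static_cast<std::uint64_t>(s[1]);
  return o;
}
```

Without the gate, the accelerator reads the value of aarch64_early_ra where
it expects explicit_mask; with the gate, both sides agree on the layout.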

The patch also fixes failures due to the same issue with x86_64->nvptx
offloading for target-print-1.f90 (and a couple more).
Does the patch look OK?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
[optc-save-gen.awk] Fix streaming of command line options for offloading.

The patch modifies optc-save-gen.awk to generate if (!lto_stream_offload_p)
check before streaming out target-specific opt in cl_optimization_stream_out,
when offloading is enabled.

gcc/ChangeLog:
* gcc/optc-save-gen.awk: New array var_target_opt. Use it to generate
if (!lto_stream_offload_p) check in cl_optimization_stream_out.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/optc-save-gen.awk b/gcc/optc-save-gen.awk
index a3af88e3776..228efe2accd 100644
--- a/gcc/optc-save-gen.awk
+++ b/gcc/optc-save-gen.awk
@@ -1307,6 +1307,11 @@ for (i = 0; i < n_opts; i++) {
var_opt_optimize_init[n_opt_val] = init;
}
 
+   # Mark options that are annotated with both Optimization and
+   # Target so we can avoid streaming out target-specific opts when
+   # offloading is enabled.
+   if (flag_set_p("Target", flags[i]))
+   var_target_opt[n_opt_val] = 1;
n_opt_val++;
}
 }
@@ -1384,6 +1389,10 @@ for (i = 0; i < n_opt_val; i++) {
} else {
sgn = "int";
}
+   # Do not stream out target-specific opts if offloading is
+   # enabled.
+   if (var_target_opt[i])
+   print "  if (!lto_stream_offload_p)"
# If applicable, encode the streamed value.
if (var_opt_optimize_init[i]) {
print "  if (" var_opt_optimize_init[i] " > (" var_opt_val_type[i] ") 10)";


RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-16 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 13, 2024 10:06 PM
> To: Thomas Schwinge 
> Cc: Prathamesh Kulkarni ; Andrew Pinski
> ; gcc-patches@gcc.gnu.org; Jakub Jelinek
> 
> Subject: Re: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> 
> > On 13.08.2024 at 17:48, Thomas Schwinge wrote:
> >
> > Hi Prathamesh!
> >
> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni
>  wrote:
> >>> From: Thomas Schwinge 
> >>> Sent: Friday, August 9, 2024 12:55 AM
> >
> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski 
> wrote:
> >>>> On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
> >>>>  wrote:
> >>>>> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx
> >>> offloading, the following minimal test:
> >>>
> >>> First, thanks for your work on enabling this!  I will say that I had
> >>> the plan to re-engage with Nvidia to hire us (as initial
> >>> implementors of GCC/nvptx offloading) to make AArch64/nvptx
> >>> offloading work, but now that Nvidia has its own GCC team, that's
> >>> great that you're able to work on this yourself!  :-)
> >>>
> >>> Please CC me for GCC/nvptx issues for (at least potentially...)
> >>> faster response times.
> >> Thanks, will do 😊
> >
> > Heh, so much for "potentially": I'm not able to spend a lot of time on
> > this right now, as I shall soon be out of office.  Quickly:
> >
> >>>>> compiled with -fopenmp -foffload=nvptx-none now fails with:
> >>>>> gcc: error: unrecognized command-line option '-m64'
> >>>>> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status
> >>>>> compilation terminated.
> >>>
> >>> Heh.  Yeah...
> >>>
> >>>>> As mentioned in RFC email, this happens because
> >>>>> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host
> >>>>> compiler depending on whether offload_abi is OFFLOAD_ABI_LP64 or
> >>>>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
> >>>>> options.
> >
> >>> So, my idea is: instead of the current strategy that the host
> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc.,
> >>> which the 'mkoffload's then interpret and re-synthesize '-m64' etc.
> >>> -- how about we instead directly tell the 'mkoffload's the relevant
> >>> ABI options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes
> >>> '-foffload-abi=-m64' etc., which the 'mkoffload's can then readily
> >>> use.  Could you please give that a try, and/or does anyone see any
> >>> issues with that approach?
> >>>
> >>> And use something like '-foffload-abi=disable' to replace the current:
> >>>
> >>>     /* PR libgomp/65099: Currently, we only support offloading in 64-bit
> >>>        configurations.  */
> >>>     if (offload_abi == OFFLOAD_ABI_LP64)
> >>>       {
> >>>
> >>> (As discussed before, this should be done differently altogether,
> >>> but that's for another day.)
> >> Sorry, I don't quite follow. Currently we enable offloading if
> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from
> >> -foffload-abi=lp64. If we change -foffload-abi to instead specify
> >> host-specific ABI opts, I guess mkoffload will still need to somehow
> >> figure out which ABI is used, so it can disable offloading for 32-bit
> >> ? I suppose we could adjust TARGET_OFFLOAD_OPTIONS for each host to
> >> pass -foffload-abi=disable if TARGET_ILP32 is set and offload target
> >> is nvptx, but not sure if that'd be correct ?
> >
> > Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
> > implementations return either the correct host flags to be used by the
> > 'mkoffload's (the case that offloading is supported for the current
> > host flags/ABI configuration), or otherwise return
> > '-foffload-abi=disable'.
> > For example (untested):
> >
> >> char *
> >> ix86_offload_options (void)
> >> {
> >>   if (TARGET

RE: [optc-save-gen.awk] Fix streaming of command line options for offloading

2024-08-19 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 13, 2024 12:52 PM
> To: Andrew Pinski 
> Cc: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; Thomas Schwinge 
> Subject: Re: [optc-save-gen.awk] Fix streaming of command line options
> for offloading
> 
> 
> > On 13.08.2024 at 08:37, Andrew Pinski wrote:
> >
> > On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
> >  wrote:
> >>
> >> Hi,
> >> As mentioned in:
> >> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
> >>
> >> AArch64 cl_optimization_stream_out streams out target-specific
> >> optimization options like flag_aarch64_early_ldp_fusion,
> >> aarch64_early_ra etc, which breaks AArch64/nvptx offloading, since
> >> nvptx cl_optimization_stream_in doesn't have corresponding stream-in
> >> for these options and ends up setting invalid values for
> >> ptr->explicit_mask (and subsequent data structures).
> >>
> >> This makes even a trivial test like the following cause an ICE in
> >> lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
> >>
> >> int main()
> >> {
> >>  int x;
> >>  #pragma omp target map(x)
> >>x;
> >> }
> >>
> >> The attached patch modifies optc-save-gen.awk to generate if
> >> (!lto_stream_offload_p) check before streaming out target-specific
> >> opt in cl_optimization_stream_out, which fixes the issue.
> >> cl_optimization_stream_out after patch (last few entries):
> >>
> >>   bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> >>   bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
> >>   if (!lto_stream_offload_p)
> >>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
> >>   if (!lto_stream_offload_p)
> >>   bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
> >>   if (!lto_stream_offload_p)
> >>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
> >>   if (!lto_stream_offload_p)
> >>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
> >>   if (!lto_stream_offload_p)
> >>   bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
> >>   if (!lto_stream_offload_p)
> >>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
> >>   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> >>     bp_pack_value (bp, ptr->explicit_mask[i], 64);
> >>
> >> For target-specific options, streaming out is gated on
> >> !lto_stream_offload_p check.
> >>
> >> The patch also fixes failures due to same issue with x86_64->nvptx
> >> offloading for target-print-1.f90 (and couple more).
> >> Does the patch look OK ?
> >
> > I think it seems to be on the right track. One thing that is also
> > going to be an issue is streaming in: there could be a target option
> > on the offload side that is marked as Optimization that might
> > also cause issues. We should check to make sure that also gets fixed
> > here too. Or error out during the build, since offloading targets
> > can't have target options with Optimization on them.
Thanks for the suggestions. The attached patch modifies optc-save-gen.awk
to emit an error if the accel backend marks a target-specific option with
Optimization.
AFAIU, currently neither nvptx nor gcn has target-specific options marked with
Optimization, so this is mostly a safeguard against future additions.

cl_optimization_stream_in after patch for target-specific optimization options:

#ifdef ACCEL_COMPILER
#error accel compiler cannot define Optimization attribute for target-specific option x_flag_aarch64_early_ldp_fusion
#else
  ptr->x_flag_aarch64_early_ldp_fusion = (signed char ) bp_unpack_var_len_int (bp);
#endif

To test if this works, I added -mfoo to nvptx.opt and marked it with both
Target and Optimization, which resulted in the following build error for nvptx:

options-save.cc:13548:2: error: #error accel compiler cannot define Optimization attribute for target-specific option x_flag_nvptx_foo
13548 | #error accel compiler cannot define Optimization attribute for target-specific option x_flag_nvptx_foo
      |  ^
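
For reference, the shape of what the generator emits can be modelled in a
few lines. This is only a C++ approximation of the awk logic, not the script
itself; OptVar and gen_stream_in are hypothetical names, and the emitted
text follows the snippet shown above:

```cpp
#include <string>

// Hypothetical description of one saved option variable.
struct OptVar {
  std::string name;      // e.g. "x_flag_aarch64_early_ldp_fusion"
  std::string type;      // e.g. "signed char"
  bool target_specific;  // option carries the Target flag in its .opt file
};

// Return the stream-in text for one option.  Target-specific options get
// wrapped so that building them into an accel compiler is a hard error.
std::string gen_stream_in(const OptVar& v) {
  const std::string unpack =
      "  ptr->" + v.name + " = (" + v.type + ") bp_unpack_var_len_int (bp);\n";
  if (!v.target_specific)
    return unpack;
  return "#ifdef ACCEL_COMPILER\n"
         "#error accel compiler cannot define Optimization attribute for "
         "target-specific option " + v.name + "\n"
         "#else\n" + unpack + "#endif\n";
}
```

The point of using #error rather than a runtime check is that a backend
adding such an option fails at build time of the accel compiler, before any
mismatched stream can be produced.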
> 
> It may have been misguided to mark target specific flags as
> Optimization.  It might be required to merge those (from all targets)
> into the common optimize enum, like we do for tree codes.  Language
> specific options marked as Optimization possibly have the same issue
> when mixing with other languages and LTO.  Can you assess the
> situation a bit more?
AFAIK, only c-family/c.opt marks few opt

Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-08-19 Thread Prathamesh Kulkarni
Hi Richard,
As mentioned in RFC email, for the following test:

int main()
{
  long c[4];
  #pragma omp target map(c)
c[0] = 0;
  return 0;
}

Compiling for AArch64 host with -O2 -fopenmp -foffload=nvptx-none results in:
lto1: fatal error: nvptx-none - 256-bit integer numbers unsupported (mode 'OI')
compilation terminated.
nvptx mkoffload: fatal error: ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.

This happens because AArch64 uses OImode for an ARRAY_TYPE whose size is
256 bits. OImode is not supported on nvptx, which therefore emits the above
diagnostic.

Following your suggestion, the attached patch streams out VOIDmode from the
host for TYPE_MODE and DECL_MODE for aggregate types with offloading enabled,
and while streaming in on the accel side, it recomputes TYPE_MODE and
DECL_MODE, which fixes the issue.
The patch survives AArch64->nvptx offload testing for libgomp and
bootstrap+test on aarch64-linux-gnu.
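
A simplified model of the idea, in case it helps review (a sketch only: the
Mode enum, Target and the width limits are invented for illustration and do
not reflect the real mode tables of either backend, and mode_for_array_bits
is far cruder than the real mode_for_array):

```cpp
#include <cassert>

enum Mode { VOIDmode, BLKmode, DImode, TImode, OImode };

// Hypothetical target description: widest integer mode usable for arrays.
struct Target { unsigned max_int_mode_bits; };

struct ArrayType { unsigned size_bits; Mode mode; };

// Pick a mode for an array of the given total size on target T.
Mode mode_for_array_bits(const Target& t, unsigned bits) {
  if (bits > t.max_int_mode_bits)
    return BLKmode;
  switch (bits) {
    case 64:  return DImode;
    case 128: return TImode;
    case 256: return OImode;
    default:  return BLKmode;
  }
}

// Host side: never send a host-specific aggregate mode to the accelerator.
Mode host_stream_out_mode(const ArrayType& t, bool offloading) {
  return offloading ? VOIDmode : t.mode;
}

// Accel side: VOIDmode is a placeholder meaning "recompute locally".
ArrayType accel_stream_in(unsigned size_bits, Mode streamed, const Target& accel) {
  ArrayType t{size_bits, streamed};
  if (t.mode == VOIDmode)
    t.mode = mode_for_array_bits(accel, size_bits);
  assert(t.mode != VOIDmode);
  return t;
}
```

In this model a 256-bit array gets OImode on a host with 256-bit integer
modes, but arrives on a narrower accelerator as BLKmode instead of a mode
the accelerator cannot represent.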

Does the patch look in the right direction ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
Recompute TYPE_MODE and DECL_MODE for aggregate type for accelerator.

The patch streams out VOIDmode for aggregate types with offloading enabled,
and recomputes appropriate TYPE_MODE and DECL_MODE while streaming-in on accel
side. The rationale for this change is to avoid streaming out host-specific
modes that may be used for aggregate types, which may not be representable on
the accelerator. For example, AArch64 uses OImode for an ARRAY_TYPE whose size
is 256 bits, and nvptx doesn't have OImode, and thus ends up emitting an error
from lto_input_mode_table.

gcc/ChangeLog:
* lto-streamer-in.cc: Include stor-layout.h.
(lto_read_tree_1): Call relayout_decl if
offloading is enabled.
* stor-layout.cc (layout_type): Move computation of mode for
ARRAY_TYPE from ...
(compute_array_mode): ... to here.
* stor-layout.h (compute_array_mode): Declare.
* tree-streamer-in.cc: Include stor-layout.h.
(unpack_ts_common_value_fields): Call compute_array_mode if offloading
is enabled.
* tree-streamer-out.cc (pack_ts_fixed_cst_value_fields): Stream out
VOIDmode if decl has aggregate type and offloading is enabled.
(pack_ts_type_common_value_fields): Stream out VOIDmode for aggregate
type if offloading is enabled.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index cbf6041fd68..0420183faf8 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 #include "alloc-pool.h"
 #include "toplev.h"
+#include "stor-layout.h"
 
 /* Allocator used to hold string slot entries for line map streaming.  */
 static struct object_allocator *string_slot_allocator;
@@ -1752,6 +1753,17 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
 with -g1, see for example PR113488.  */
   else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
+
+#ifdef ACCEL_COMPILER
+  /* For decl with aggregate type, host streams out VOIDmode.
+Compute the correct DECL_MODE by calling relayout_decl.  */
+  if ((VAR_P (expr)
+  || TREE_CODE (expr) == PARM_DECL
+  || TREE_CODE (expr) == FIELD_DECL)
+ && AGGREGATE_TYPE_P (TREE_TYPE (expr))
+ && DECL_MODE (expr) == VOIDmode)
+   relayout_decl (expr);
+#endif
 }
 }
 
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index 10c0809914c..0ff8bd1171e 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char *name, tree fields,
   layout_decl (TYPE_NAME (type), 0);
 }
 
+/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE).  */
+
+void compute_array_mode (tree type)
+{
+  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
+
+  SET_TYPE_MODE (type, BLKmode);
+  if (TYPE_SIZE (type) != 0
+  && ! targetm.member_type_forces_blk (type, VOIDmode)
+  /* BLKmode elements force BLKmode aggregate;
+else extract/store fields may lose.  */
+  && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
+ || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
+{
+  SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
+  TYPE_SIZE (type)));
+  if (TYPE_MODE (type) != BLKmode
+ && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
+ && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
+   {
+ TYPE_NO_FORCE_BLK (type) = 1;
+ SET_TYPE_MODE (type, BLKmode);
+   }
+}
+}
+
 /* Calculate the mode, size, and alignment for TYPE.
For an array type, calculate the element separation as

RE: [optc-save-gen.awk] Fix streaming of command line options for offloading

2024-08-20 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Monday, August 19, 2024 6:51 PM
> To: Prathamesh Kulkarni 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Thomas
> Schwinge 
> Subject: RE: [optc-save-gen.awk] Fix streaming of command line options
> for offloading
> 
> 
> On Mon, 19 Aug 2024, Prathamesh Kulkarni wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 13, 2024 12:52 PM
> > > To: Andrew Pinski 
> > > Cc: Prathamesh Kulkarni ; gcc-
> > > patc...@gcc.gnu.org; Thomas Schwinge 
> > > Subject: Re: [optc-save-gen.awk] Fix streaming of command line
> > > options for offloading
> > >
> > >
> > > > On 13.08.2024 at 08:37, Andrew Pinski wrote:
> > > >
> > > > On Mon, Aug 12, 2024 at 10:36 PM Prathamesh Kulkarni
> > > >  wrote:
> > > >>
> > > >> Hi,
> > > >> As mentioned in:
> > > >> https://gcc.gnu.org/pipermail/gcc/2024-August/244581.html
> > > >>
> > > >> AArch64 cl_optimization_stream_out streams out target-specific
> > > >> optimization options like flag_aarch64_early_ldp_fusion,
> > > >> aarch64_early_ra etc, which breaks AArch64/nvptx offloading,
> > > >> since nvptx cl_optimization_stream_in doesn't have corresponding
> > > >> stream-in for these options and ends up setting invalid values
> > > >> for ptr->explicit_mask (and subsequent data structures).
> > > >>
> > > >> This makes even a trivial test like the following cause an ICE
> > > >> in lto_read_decls with -O3 -fopenmp -foffload=nvptx-none:
> > > >>
> > > >> int main()
> > > >> {
> > > >>  int x;
> > > >>  #pragma omp target map(x)
> > > >>x;
> > > >> }
> > > >>
> > > >> The attached patch modifies optc-save-gen.awk to generate if
> > > >> (!lto_stream_offload_p) check before streaming out
> > > >> target-specific opt in cl_optimization_stream_out, which fixes
> > > >> the issue. cl_optimization_stream_out after patch (last few
> > > >> entries):
> > > >>
> > > >>   bp_pack_var_len_int (bp, ptr->x_flag_wrapv_pointer);
> > > >>   bp_pack_var_len_int (bp, ptr->x_debug_nonbind_markers_p);
> > > >>   if (!lto_stream_offload_p)
> > > >>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_early_ldp_fusion);
> > > >>   if (!lto_stream_offload_p)
> > > >>   bp_pack_var_len_int (bp, ptr->x_aarch64_early_ra);
> > > >>   if (!lto_stream_offload_p)
> > > >>   bp_pack_var_len_int (bp, ptr->x_flag_aarch64_late_ldp_fusion);
> > > >>   if (!lto_stream_offload_p)
> > > >>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_div);
> > > >>   if (!lto_stream_offload_p)
> > > >>   bp_pack_var_len_int (bp, ptr->x_flag_mrecip_low_precision_sqrt);
> > > >>   if (!lto_stream_offload_p)
> > > >>   bp_pack_var_len_int (bp, ptr->x_flag_mlow_precision_sqrt);
> > > >>   for (size_t i = 0; i < ARRAY_SIZE (ptr->explicit_mask); i++)
> > > >>     bp_pack_value (bp, ptr->explicit_mask[i], 64);
> > > >>
> > > >> For target-specific options, streaming out is gated on
> > > >> !lto_stream_offload_p check.
> > > >>
> > > >> The patch also fixes failures due to same issue with
> > > >> x86_64->nvptx offloading for target-print-1.f90 (and couple
> > > >> more).
> > > >> Does the patch look OK ?
> > > >
> > > > I think it seems to be on the right track. One thing that is also
> > > > going to be an issue is streaming in: there could be a target
> > > > option on the offload side that is marked as Optimization that
> > > > might also cause issues. We should check to make sure that
> > > > also gets fixed here too. Or error out during the build, since
> > > > offloading targets can't have target options with Optimization
> > > > on them.
> > Thanks for the suggestions. The attached patch modifies
> > optc-save-gen.awk to emit an error if accel backend marks target
> > specific option with Optimization.
> > AFAIU, currently neither n

RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-08-21 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, August 20, 2024 10:36 AM
> To: Richard Sandiford 
> Cc: Prathamesh Kulkarni ; Thomas Schwinge
> ; gcc-patches@gcc.gnu.org
> Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
> 
> 
> > On 19.08.2024 at 20:56, Richard Sandiford wrote:
> >
> > Prathamesh Kulkarni  writes:
> >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> >> index cbf6041fd68..0420183faf8 100644
> >> --- a/gcc/lto-streamer-in.cc
> >> +++ b/gcc/lto-streamer-in.cc
> >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
> >> #include "debug.h"
> >> #include "alloc-pool.h"
> >> #include "toplev.h"
> >> +#include "stor-layout.h"
> >>
> >> /* Allocator used to hold string slot entries for line map streaming.  */
> >> static struct object_allocator *string_slot_allocator;
> >> @@ -1752,6 +1753,17 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
> >> with -g1, see for example PR113488.  */
> >>   else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
> >>    DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> >> +
> >> +#ifdef ACCEL_COMPILER
> >> +  /* For decl with aggregate type, host streams out VOIDmode.
> >> + Compute the correct DECL_MODE by calling relayout_decl.  */
> >> +  if ((VAR_P (expr)
> >> +   || TREE_CODE (expr) == PARM_DECL
> >> +   || TREE_CODE (expr) == FIELD_DECL)
> >> +  && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> >> +  && DECL_MODE (expr) == VOIDmode)
> >> +relayout_decl (expr);
> >> +#endif
> >
> > Genuine question, but: is relayout_decl safe in this context?  It does
> > a lot more than just reset the mode.  It also applies the target ABI's
> > preferences wrt alignment, padding, and so on, rather than preserving
> > those of the host's.
> 
> It would be better to just recompute the mode here.
Hi,
The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) in
lto_read_tree_1 instead of calling relayout_decl (expr).
I checked that layout_decl_type does the same thing for setting the decl mode,
except for bit-fields. Since bit-fields cannot have aggregate type, I am
assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE (expr)) would be OK
in this case?
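
To spell out the difference from relayout_decl, here is a toy version of the
fix-up (illustrative only; Type, Decl and fixup_decl_mode are invented
stand-ins, not GCC's tree accessors):

```cpp
#include <cassert>

enum Mode { VOIDmode, BLKmode, SImode, DImode };

// Invented stand-ins for the relevant tree fields.
struct Type { bool aggregate; Mode mode; };
struct Decl { const Type* type; Mode mode; unsigned align_bits; };

// Only replace the VOIDmode placeholder with the type's mode.  Alignment
// and anything else streamed from the host is deliberately left alone.
void fixup_decl_mode(Decl& d) {
  assert(d.type != nullptr);
  if (d.type->aggregate && d.mode == VOIDmode)
    d.mode = d.type->mode;
}
```

The only field written is the mode, so host-chosen alignment survives,
which is the property a full re-layout would not guarantee.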

Sorry if this sounds like a silly question -- why would it be unsafe to call
relayout_decl for variables that are mapped to the accelerator, even if it
would not preserve the host's properties? I assumed we want to assign the
accel's ABI properties for mapped decls (mode being one of them), or am I
misunderstanding?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh  
   
> 
> Richard
> 
> > Thanks,
> > Richard
> >
> >
> >> }
> >> }
> >>
> >> diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
> >> index 10c0809914c..0ff8bd1171e 100644
> >> --- a/gcc/stor-layout.cc
> >> +++ b/gcc/stor-layout.cc
> >> @@ -2396,6 +2396,32 @@ finish_builtin_struct (tree type, const char *name, tree fields,
> >>   layout_decl (TYPE_NAME (type), 0);
> >> }
> >>
> >> +/* Compute TYPE_MODE for TYPE (which is ARRAY_TYPE).  */
> >> +
> >> +void compute_array_mode (tree type)
> >> +{
> >> +  gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
> >> +
> >> +  SET_TYPE_MODE (type, BLKmode);
> >> +  if (TYPE_SIZE (type) != 0
> >> +  && ! targetm.member_type_forces_blk (type, VOIDmode)
> >> +  /* BLKmode elements force BLKmode aggregate;
> >> + else extract/store fields may lose.  */
> >> +  && (TYPE_MODE (TREE_TYPE (type)) != BLKmode
> >> +  || TYPE_NO_FORCE_BLK (TREE_TYPE (type))))
> >> +{
> >> +  SET_TYPE_MODE (type, mode_for_array (TREE_TYPE (type),
> >> +   TYPE_SIZE (type)));
> >> +  if (TYPE_MODE (type) != BLKmode
> >> +  && STRICT_ALIGNMENT && TYPE_ALIGN (type) < BIGGEST_ALIGNMENT
> >> +  && TYPE_ALIGN (type) < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
> >> +{
> >> +  TYPE_NO_FORCE_BLK (type) = 1;
> >> +  SET_TYPE_MODE (type, BLKmode)

RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-08-22 Thread Prathamesh Kulkarni
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, August 21, 2024 5:09 PM
> To: Prathamesh Kulkarni 
> Cc: Richard Sandiford ; Thomas Schwinge
> ; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
> 
> 
> On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> 
> >
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > To: Richard Sandiford 
> > > Cc: Prathamesh Kulkarni ; Thomas Schwinge
> > > ; gcc-patches@gcc.gnu.org
> > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming in
> > > for accelerator
> > >
> > >
> > > > On 19.08.2024 at 20:56, Richard Sandiford wrote:
> > > >
> > > > Prathamesh Kulkarni  writes:
> > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> > > >> index cbf6041fd68..0420183faf8 100644
> > > >> --- a/gcc/lto-streamer-in.cc
> > > >> +++ b/gcc/lto-streamer-in.cc
> > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
> > > >> #include "debug.h"
> > > >> #include "alloc-pool.h"
> > > >> #include "toplev.h"
> > > >> +#include "stor-layout.h"
> > > >>
> > > >> /* Allocator used to hold string slot entries for line map streaming.  */
> > > >> static struct object_allocator *string_slot_allocator;
> > > >> @@ -1752,6 +1753,17 @@ lto_read_tree_1 (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > >> with -g1, see for example PR113488.  */
> > > >>   else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr) == expr)
> > > >>    DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > >> +
> > > >> +#ifdef ACCEL_COMPILER
> > > >> +  /* For decl with aggregate type, host streams out VOIDmode.
> > > >> + Compute the correct DECL_MODE by calling relayout_decl.  */
> > > >> +  if ((VAR_P (expr)
> > > >> +   || TREE_CODE (expr) == PARM_DECL
> > > >> +   || TREE_CODE (expr) == FIELD_DECL)
> > > >> +  && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > >> +  && DECL_MODE (expr) == VOIDmode)
> > > >> +relayout_decl (expr);
> > > >> +#endif
> > > >
> > > > Genuine question, but: is relayout_decl safe in this context?  It
> > > > does a lot more than just reset the mode.  It also applies the
> > > > target ABI's preferences wrt alignment, padding, and so on, rather
> > > > than preserving those of the host's.
> > >
> > > It would be better to just recompute the mode here.
> > Hi,
> > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
> > I checked layout_decl_type does the same thing for setting decl
> > mode, except for bit fields. Since bit-fields cannot have aggregate
> > type, I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > (expr)) would be OK in this case ?
> 
> Yep, that should work.
Thanks, I have committed the patch in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b182871e105f47823b8e72

after verifying it passes bootstrap+test on aarch64-linux-gnu,
and libgomp testing (without GPU) for aarch64->nvptx and x86_64->nvptx.
> 
> > Sorry if this sounds like a silly ques -- Why would it be unsafe to
> > call relayout_decl for variables that are mapped to accelerator even
> > if it'd not preserve host's properties ? I assumed we want to assign
> > accel's ABI properties for mapped decls (mode being one of them), or
> > am I misunderstanding ?
> 
> Structure layout need not be compatible but we are preserving that of
> the host instead of re-layouting in target context.  Likewise type <->
> mode mapping doesn't have to agree.
Ah OK, thanks for clarifying. So IIUC, in future, we might need to change that
if (in theory) host's structure layout for a decl is incompatible with a
particular accel's ABI
and wil

RE: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker plugin hook

2024-08-23 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Thursday, August 22, 2024 2:16 PM
> To: H.J. Lu 
> Cc: gcc-patches@gcc.gnu.org; josmy...@redhat.com
> Subject: Re: [PATCH v3] Update LDPT_REGISTER_CLAIM_FILE_HOOK_V2 linker
> plugin hook
> 
> 
> On Wed, Aug 21, 2024 at 4:25 PM H.J. Lu  wrote:
> >
> > This hook allows the BFD linker plugin to distinguish calls to
> > claim_file_handler that know the object is being used by the linker
> > (from ldmain.c:add_archive_element), from calls that don't know it's
> > being used by the linker (from elf_link_is_defined_archive_symbol);
> in
> > the latter case, the plugin should avoid including the unused LTO
> > archive members in link output.  To get the proper support for
> > archives with LTO common symbols, the linker fix
> 
> OK.
Hi,
After this commit:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=a98dd536b1017c2b814a3465206c6c01b2890998
I am no longer able to see mkoffload (and accel compiler) being invoked for
nvptx (-save-temps also doesn't show accel dumps).
I have attached -v output before and after the commit for x86_64->nvptx
offloading for the following simple test (host doesn't really matter, can also
reproduce with aarch64 host):

int main()
{
  int x = 1;
  #pragma omp target map(x)
    x = 5;
  return x;
}

Thanks,
Prathamesh
> 
> Thanks,
> Richard.
> 
> > commit a6f8fe0a9e9cbe871652e46ba7c22d5e9fb86208
> > Author: H.J. Lu 
> > Date:   Wed Aug 14 20:50:02 2024 -0700
> >
> > lto: Don't include unused LTO archive members in output
> >
> > is required.
> >
> > PR lto/116361
> > * lto-plugin.c (claim_file_handler_v2): Rename claimed to
> > can_be_claimed.  Include the LTO object only if it is known
> to
> > be included in link output.
> >
> > Signed-off-by: H.J. Lu 
> > ---
> >  lto-plugin/lto-plugin.c | 53
> > -
> >  1 file changed, 31 insertions(+), 22 deletions(-)
> >
> > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index
> > 152648338b9..61b0de62f52 100644
> > --- a/lto-plugin/lto-plugin.c
> > +++ b/lto-plugin/lto-plugin.c
> > @@ -1191,16 +1191,19 @@ process_offload_section (void *data, const
> char *name, off_t offset, off_t len)
> >return 1;
> >  }
> >
> > -/* Callback used by a linker to check if the plugin will claim
> FILE. Writes
> > -   the result in CLAIMED.  If KNOWN_USED, the object is known by
> the linker
> > -   to be used, or an older API version is in use that does not
> provide that
> > -   information; otherwise, the linker is only determining whether
> this is
> > -   a plugin object and it should not be registered as having
> offload data if
> > -   not claimed by the plugin.  */
> > +/* Callback used by a linker to check if the plugin can claim FILE.
> > +   Writes the result in CAN_BE_CLAIMED.  If KNOWN_USED != 0, the
> object
> > +   is known by the linker to be included in link output, or an
> older API
> > +   version is in use that does not provide that information.
> Otherwise,
> > +   the linker is only determining whether this is a plugin object
> and
> > +   only the symbol table is needed by the linker.  In this case,
> the
> > +   object should not be included in link output and this function
> will
> > +   be called by the linker again with KNOWN_USED != 0 after the
> linker
> > +   decides the object should be included in link output. */
> >
> >  static enum ld_plugin_status
> > -claim_file_handler_v2 (const struct ld_plugin_input_file *file, int
> *claimed,
> > -  int known_used)
> > +claim_file_handler_v2 (const struct ld_plugin_input_file *file,
> > +  int *can_be_claimed, int known_used)
> >  {
> >enum ld_plugin_status status;
> >struct plugin_objfile obj;
> > @@ -1229,7 +1232,7 @@ claim_file_handler_v2 (const struct
> ld_plugin_input_file *file, int *claimed,
> >  }
> >lto_file.handle = file->handle;
> >
> > -  *claimed = 0;
> > +  *can_be_claimed = 0;
> >obj.file = file;
> >obj.found = 0;
> >obj.offload = false;
> > @@ -1286,15 +1289,19 @@ claim_file_handler_v2 (const struct
> ld_plugin_input_file *file, int *claimed,
> >   lto_file.symtab.syms);
> >check (status == LDPS_OK, LDPL_FATAL, "could not add
> symbols");
> >
> > -  LOCK_SECTION;
> > -  num_claimed_files++;
> > -  claimed_files =
> > -   xrealloc (claimed_files,
> > - num_claimed_files * sizeof (struct
> plugin_file_info));
> > -  claimed_files[num_claimed_files - 1] = lto_file;
> > -  UNLOCK_SECTION;
> > +  /* Include it only if it is known to be used for link output.
> */
> > +  if (known_used)
> > +   {
> > + LOCK_SECTION;
> > + num_claimed_files++;
> > + claimed_files =
> > +   xrealloc (claimed_files,
> > + num_claimed_files * sizeof (struct
> plugin_file_info));
> > +   

RE: [PATCH] lto: Don't check obj.found for offload section

2024-08-23 Thread Prathamesh Kulkarni
> -Original Message-
> From: H.J. Lu 
> Sent: Friday, August 23, 2024 6:07 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Prathamesh Kulkarni ;
> richard.guent...@gmail.com
> Subject: [PATCH] lto: Don't check obj.found for offload section
> 
> 
> obj.found is the number of LTO symbols.  We should include the offload
> section when it is used by linker even if there are no LTO symbols.
> 
> PR lto/116361
> * lto-plugin.c (claim_file_handler_v2): Don't check obj.found
> for the offload section.
Hi,
I applied your patch locally, and can confirm this fixes the issue with 
offloading, thanks!

Thanks,
Prathamesh
> 
> Signed-off-by: H.J. Lu 
> ---
>  lto-plugin/lto-plugin.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index
> 61b0de62f52..c564b36eb92 100644
> --- a/lto-plugin/lto-plugin.c
> +++ b/lto-plugin/lto-plugin.c
> @@ -1320,7 +1320,7 @@ claim_file_handler_v2 (const struct
> ld_plugin_input_file *file,
>if (*can_be_claimed && !obj.offload && offload_files_last_lto ==
> NULL)
>  offload_files_last_lto = offload_files_last;
> 
> -  if (obj.offload && known_used && obj.found > 0)
> +  if (obj.offload && known_used)
>  {
>/* Add file to the list.  The order must be exactly the same as
> the final
>  order after recompilation and linking, otherwise host and
> target tables
> --
> 2.46.0



[nvptx] Fix code-gen for alias attribute

2024-08-26 Thread Prathamesh Kulkarni
Hi,
For the following test (adapted from pr96390.c):

__attribute__((noipa)) int foo () { return 42; }
int bar () __attribute__((alias ("foo")));
int baz () __attribute__((alias ("bar")));

int main ()
{
  int n;
  #pragma omp target map(from:n)
    n = baz ();
  return n;
}

Compiling with -fopenmp -foffload=nvptx-none -foffload=-malias 
-foffload=-mptx=6.3 results in:

ptxas fatal   : Internal error: alias to unknown symbol
nvptx-as: ptxas returned 255 exit status
nvptx mkoffload: fatal error: ../../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: /home/prathameshk/gnu-toolchain/gcc/grcogcc-38/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0//accel/nvptx-none/mkoffload returned 1 exit status
compilation terminated.

This happens because ptx code-gen shows:

// BEGIN GLOBAL FUNCTION DEF: foo
.visible .func (.param.u32 %value_out) foo
{
.reg.u32 %value;
mov.u32 %value, 42;
st.param.u32[%value_out], %value;
ret;
}
.visible .func (.param.u32 %value_out) bar;
.alias bar,foo;
.visible .func (.param.u32 %value_out) baz;
.alias baz,bar;

.alias baz, bar is invalid since PTX requires aliasee to be a defined function:
https://sw-docs-dgx-station.nvidia.com/cuda-latest/parallel-thread-execution/latest-internal/#kernel-and-function-directives-alias

The patch uses cgraph_node::get(name)->ultimate_alias_target () instead of the 
provided value in nvptx_asm_output_def_from_decls.
For the above case, it now generates the following ptx:

.alias baz,foo; 
instead of:
.alias baz,bar;

which fixes the issue.
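
As a rough illustration of the idea (not the nvptx back end code itself), the alias chain can be followed until it reaches a symbol that is not an alias, and that symbol is then named in the directive. The names AliasMap, resolve_ultimate_target and emit_alias_directive below are hypothetical and only stand in for the symbol table lookup that ultimate_alias_target performs:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the symbol table: maps an alias name to the
// name it was declared an alias of.  Defined functions have no entry.
using AliasMap = std::unordered_map<std::string, std::string>;

// Follow the alias chain from NAME until reaching a symbol that is not
// itself an alias, and return that symbol's name.
std::string resolve_ultimate_target (const AliasMap &aliases, std::string name)
{
  std::size_t steps = 0;
  for (auto it = aliases.find (name); it != aliases.end ();
       it = aliases.find (name))
    {
      // A well-formed (acyclic) chain visits each alias at most once.
      ++steps;
      assert (steps <= aliases.size ());
      name = it->second;
    }
  return name;
}

// Format a PTX-style .alias directive for ALIAS, always naming the defined
// function at the end of the chain as the aliasee.
std::string emit_alias_directive (const AliasMap &aliases,
                                  const std::string &alias)
{
  return ".alias " + alias + "," + resolve_ultimate_target (aliases, alias)
         + ";";
}
```

With bar aliasing foo and baz aliasing bar, both bar and baz end up pointing directly at foo, so no directive names another alias as its aliasee.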

Does the patch look like it is heading in the right direction ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh

[nvptx] Fix code-gen for alias attribute.

For the following test (adapted from pr96390.c):

__attribute__((noipa)) int foo () { return 42; }
int bar () __attribute__((alias ("foo")));
int baz () __attribute__((alias ("bar")));

int main ()
{
  int n;
  #pragma omp target map(from:n)
    n = baz ();
  return n;
}

gcc emits following ptx for baz:
.visible .func (.param.u32 %value_out) bar;
.alias bar,foo;
.visible .func (.param.u32 %value_out) baz;
.alias baz,bar;

which is incorrect since PTX requires aliasee to be a defined function.
The patch instead uses cgraph_node::get(name)->ultimate_alias_target,
which generates the following PTX:

.visible .func (.param.u32 %value_out) baz;
.alias baz,foo;

gcc/ChangeLog:

* config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls): Use
cgraph_node::get(name)->ultimate_alias_target instead of value.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 2a8f713c680..9688b0e6f2d 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -7583,7 +7583,8 @@ nvptx_mem_local_p (rtx mem)
   while (0)
 
 void
-nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
+nvptx_asm_output_def_from_decls (FILE *stream, tree name,
+tree value ATTRIBUTE_UNUSED)
 {
   if (nvptx_alias == 0 || !TARGET_PTX_6_3)
 {
@@ -7618,7 +7619,8 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, 
tree value)
   return;
 }
 
-  if (!cgraph_node::get (name)->referred_to_p ())
+  cgraph_node *cnode = cgraph_node::get (name);
+  if (!cnode->referred_to_p ())
 /* Prevent "Internal error: reference to deleted section".  */
 return;
 
@@ -7627,8 +7629,10 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
name, tree value)
   fputs (s.str ().c_str (), stream);
 
   tree id = DECL_ASSEMBLER_NAME (name);
+  symtab_node *alias_target_node = cnode->ultimate_alias_target ();
+  tree alias_target_id = DECL_ASSEMBLER_NAME (alias_target_node->decl);
   NVPTX_ASM_OUTPUT_DEF (stream, IDENTIFIER_POINTER (id),
-   IDENTIFIER_POINTER (value));
+   IDENTIFIER_POINTER (alias_target_id));
 }
 
 #undef NVPTX_ASM_OUTPUT_DEF


Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-29 Thread Prathamesh Kulkarni
Hi Richard,
Thanks for your suggestions on the RFC email. The attached patch adds support
for streaming of poly_int when its degree <= accel's NUM_POLY_INT_COEFFS.
The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int:

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;
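
The scheme above can be sketched in standalone C++ roughly as follows. This is only a model under stated assumptions: Poly is a plain array standing in for poly_int, the stream is a vector of integers rather than LTO bytecode, and "degree" is assumed here to mean one past the index of the last non-zero coefficient, which may differ from what the patch's degree () method returns:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simplified stand-in for poly_int: N coefficients, coeffs[0] being the
// constant term.  Not GCC's real poly_int class.
template <std::size_t N>
using Poly = std::array<std::int64_t, N>;

// Assumed meaning of "degree" for this sketch: the number of leading
// coefficients that must be written, i.e. one past the last non-zero one.
template <std::size_t N>
std::size_t degree (const Poly<N> &p)
{
  std::size_t d = N;
  while (d > 0 && p[d - 1] == 0)
    --d;
  return d;
}

// Writer side: emit the degree, then only that many coefficients.
template <std::size_t N>
void write_poly (std::vector<std::int64_t> &stream, const Poly<N> &p)
{
  std::size_t d = degree (p);
  assert (d <= N);  // the writer can always represent its own values
  stream.push_back (static_cast<std::int64_t> (d));
  for (std::size_t i = 0; i < d; ++i)
    stream.push_back (p[i]);
}

// Reader side: reject values that need more coefficients than the reader
// has, and zero-fill the coefficients that were not streamed.
template <std::size_t N>
Poly<N> read_poly (const std::vector<std::int64_t> &stream, std::size_t &pos)
{
  std::size_t d = static_cast<std::size_t> (stream.at (pos++));
  if (d > N)
    throw std::runtime_error ("degree exceeds reader's coefficient count");
  Poly<N> p{};  // value-initialised: every coefficient starts at zero
  for (std::size_t i = 0; i < d; ++i)
    p[i] = stream.at (pos++);
  return p;
}
```

A writer with two coefficients and a reader with one can then exchange constants such as <4, 0>, while a genuinely variable value such as <16, 16> is rejected on the reader side.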

Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu.
LTO bootstrap+test on x86_64-linux-gnu in progress.

I am not quite sure how to test it for offloading since it's currently
(entirely) broken for aarch64->nvptx.
I can give it a try with x86_64->nvptx offloading if required (although I guess
LTO bootstrap should test the streaming changes ?)

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
Partially support streaming of poly_int for offloading.

Support streaming of poly_int for offloading when its degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.

The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int (for accelerator):

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

gcc/ChangeLog:

* data-streamer-in.cc (streamer_read_poly_uint64): Stream in poly_int
degree and call poly_int_read_common. 
(streamer_read_poly_int64): Likewise.
* data-streamer-out.cc (streamer_write_poly_uint64): Stream out poly_int
degree.
(streamer_write_poly_int64): Likewise.
* data-streamer.h (bp_pack_poly_value): Likewise.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Stream in poly_int degree and call
poly_int_read_common.
* poly-int.h (C>::degree): New method.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Stream in
POLY_INT_CST degree, issue a fatal_error if degree is greater than
NUM_POLY_INT_COEFFS, and zero out remaining coeffs. 
* tree-streamer-out.cc (write_ts_poly_tree_pointers): Calculate and
stream out POLY_INT_CST degree.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..91cece39b05 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,11 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
 res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read a poly_int64 from IB.  */
@@ -193,10 +194,11 @@ streamer_read_poly_uint64 (class lto_input_block *ib)
 poly_int64
 streamer_read_poly_int64 (class lto_input_block *ib)
 {
+  unsigned degree = streamer_read_uhwi (ib);
   poly_int64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  for (unsigned int i = 0; i < degree; ++i)
 res.coeffs[i] = streamer_read_hwi (ib);
-  return res;
+  return poly_int_read_common (res, degree);
 }
 
 /* Read gcov_type value from IB.  */
diff --git a/gcc/data-streamer-out.cc b/gcc/data-streamer-out.cc
index c237e30f704..b0fb9dedb24 100644
--- a/gcc/data-streamer-out.cc
+++ b/gcc/data-streamer-out.cc
@@ -227,7 +227,9 @@ streamer_write_hwi (struct output_block *ob, HOST_WIDE_INT 
work)
 void
 streamer_write_poly_uint64 (struct output_block *ob, poly_uint64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
 streamer_write_uhwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
@@ -236,7 +238,9 @@ streamer_write_poly_uint64 (struct output_block *ob, 
poly_uint64 work)
 void
 streamer_write_poly_int64 (struct output_block *ob, poly_int64 work)
 {
-  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+  unsigned degree = work.degree ();
+  streamer_write_uhwi_stream (ob->main_stream, degree);
+  for (unsigned i = 0; i < degree; ++i)
 streamer_write_hwi_stream (ob->main_stream, work.coeffs[i]);
 }
 
diff --git a/gcc/data-streamer.h b/gcc/data-streamer.h
index 6a2596134ce..b154c439b8c 100644
--- a/gcc/data-streamer.h
+++ b/gcc/data-streamer.h
@@ -142,7 +142,9 @@ bp_pack_poly_val

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-29 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, July 29, 2024 9:43 PM
> To: Richard Biener 
> Cc: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> 
> Richard Biener  writes:
> > On Mon, 29 Jul 2024, Prathamesh Kulkarni wrote:
> >
> >> Hi Richard,
> >> Thanks for your suggestions on RFC email, the attached patch adds
> support for streaming of poly_int when it's degree <= accel's
> NUM_POLY_INT_COEFFS.
> >> The patch changes streaming of poly_int as follows:
> >>
> >> Streaming out poly_int:
> >>
> >> degree = poly_int.degree();
> >> stream out degree;
> >> for (i = 0; i < degree; i++)
> >>   stream out poly_int.coeffs[i];
> >>
> >> Streaming in poly_int:
> >>
> >> stream in degree;
> >> if (degree > NUM_POLY_INT_COEFFS)
> >>   fatal_error();
> >> stream in coeffs;
> >> // Set remaining coeffs to zero in case degree < accel's
> >> NUM_POLY_INT_COEFFS for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
> >>   poly_int.coeffs[i] = 0;
> >>
> >> Patch passes bootstrap+test and LTO bootstrap+test on aarch64-
> linux-gnu.
> >> LTO bootstrap+test on x86_64-linux-gnu in progress.
> >>
> >> I am not quite sure how to test it for offloading since currently
> it's (entirely) broken for aarch64->nvptx.
> >> I can give a try with x86_64->nvptx offloading if required (altho I
> >> guess LTO bootstrap should test streaming changes ?)
> >
> > +  unsigned degree
> > += bp_unpack_value (bp, BITS_PER_UNIT * sizeof (unsigned
> > HOST_WIDE_INT));
> >
> > The NUM_POLY_INT_COEFFS target define doesn't seem to be constrained
> > to any type it needs to fit into, using HOST_WIDE_INT is arbitrary.
> > I'd say we should constrain it to a reasonable upper bound, like 2?
> > Maybe even have MAX_NUM_POLY_INT_COEFFS or NUM_POLY_INT_COEFFS_BITS
> in
> > poly-int.h and constrain NUM_POLY_INT_COEFFS.
> >
> > The patch looks reasonable over all, but Richard S. should have a
> say
> > about the abstraction you chose and the poly-int adjustment.
> 
> Sorry if this has been discussed already, but could we instead stream
> NUM_POLY_INT_COEFFS once per file, rather than once per poly_int?
> It's a target invariant, and poly_int has wormed its way into lots of
> things by now :)
Hi Richard,
The patch doesn't stream out NUM_POLY_INT_COEFFS, but the degree of poly_int
(and streams out coeffs only up to degree, ignoring the higher zero coeffs).
During streaming-in, it reads back the degree (and the streamed coeffs up to
degree) and issues an error if degree > accel's NUM_POLY_INT_COEFFS, since we
can't (as-is) represent a degree-N poly_int on an accel with
NUM_POLY_INT_COEFFS < N. If degree < accel's NUM_POLY_INT_COEFFS, the remaining
coeffs are set to 0 (similar to zero-extension). I posted more details in the RFC:
https://gcc.gnu.org/pipermail/gcc/2024-July/244466.html

The attached patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h to
represent number of bits needed for max value of NUM_POLY_INT_COEFFS defined by
any target, and uses that for packing/unpacking degree of poly_int to/from
bitstream, which should make it independent of the type used for representing
NUM_POLY_INT_COEFFS by the target.

Bootstrap+test and LTO bootstrap+test in progress on aarch64-linux-gnu.
Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
> 
> Thanks,
> Richard
Partially support streaming of poly_int for offloading.

Support streaming of poly_int for offloading when its degree doesn't exceed
accel's NUM_POLY_INT_COEFFS.

The patch changes streaming of poly_int as follows:

Streaming out poly_int:

degree = poly_int.degree();
stream out degree;
for (i = 0; i < degree; i++)
  stream out poly_int.coeffs[i];

Streaming in poly_int (for accelerator):

stream in degree;
if (degree > NUM_POLY_INT_COEFFS)
  fatal_error();
stream in coeffs;
// Set remaining coeffs to zero in case degree < accel's NUM_POLY_INT_COEFFS
for (i = degree; i < NUM_POLY_INT_COEFFS; i++)
  poly_int.coeffs[i] = 0;

The patch defines MAX_NUM_POLY_INT_COEFFS_BITS in poly-int.h as the number of
bits needed to represent the maximum value of NUM_POLY_INT_COEFFS for any
target, which should make packing/unpacking the degree of poly_int to/from the
bitstream independent of the type used to represent NUM_POLY_INT_COEFFS by the
target.

gcc/ChangeLog:

 

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-30 Thread Prathamesh Kulkarni



> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, July 30, 2024 3:16 PM
> To: Richard Biener 
> Cc: Richard Sandiford ; Prathamesh Kulkarni
> ; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> 
> On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > Only "relevant" stuff should be streamed - the offload code and all
> > trees refered to.
> 
> Yeah.
> 
> > > > I think all current issues are because of poly-* leaking in for
> > > > cases where a non-poly would have worked fine, but I have not
> had
> > > > a look myself.
> > >
> > > One of the cases that Prathamesh mentions is streaming the mode
> sizes.
> > > Are those modes "offload target modes" or "host modes"?  It seems
> > > like it shouldn't be an error for the host to have VLA modes per
> se.
> > > It's just that those modes can't be used in the host/offload
> interface.
> >
> > There's a requirement that a mode mapping exists from the host to
> > target enum machine_mode.  I don't remember exactly how we compute
> > that mapping and whether streaming of some data (and thus poly-int)
> > are part of this.
> 
> During streaming out, the code records what machine modes are being
> streamed (in streamer_mode_table).
> For those modes (and their inner modes) then lto_write_mode_table
> should stream a table with mode details like class, bits, size, inner
> mode, nunits, real mode format if any, etc.
> That table is then streamed in in the offloading compiler and it
> attempts to find corresponding modes (and emits fatal_error if there
> is no such mode; consider say x86_64 long double with XFmode being
> used in offloading code which doesn't have XFmode support).
> Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> rather than int, this has been changed to use bp_pack_poly_value; but
> that relies on the same number of coefficients for poly_int, which is
> not the case when e.g. offloading aarch64 to gcn or nvptx.
Indeed, for the minimal test:
int main()
{
  int x;
  #pragma omp target map (to: x)
  {
    x = 0;
  }
  return x;
}

Streaming out mode_table from AArch64 shows:
mode = SI, mclass = 2, size = 4, prec = 32
mode = DI, mclass = 2, size = 8, prec = 64

While streaming-in for nvptx shows:
mclass = 2, size = 4, prec = 0

The discrepancy happens because NUM_POLY_INT_COEFFS differs between AArch64
and nvptx.
AArch64 streams out size and prec as <4, 0> and <32, 0> respectively, where
the 0 comes from coeffs[1].
When streaming in for nvptx, since NUM_POLY_INT_COEFFS is 1, it incorrectly
reads size as 4 and prec as 0.
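
The misread can be reproduced with a small standalone model. This is a sketch only: the stream is a plain vector of integers rather than the LTO bitpack, and write_fixed, read_fixed and ModeEntry are hypothetical names, not GCC functions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Writer: always emits exactly N coefficients per value, as a host does
// with its own NUM_POLY_INT_COEFFS.
template <std::size_t N>
void write_fixed (std::vector<std::int64_t> &stream,
                  const std::array<std::int64_t, N> &value)
{
  for (std::int64_t c : value)
    stream.push_back (c);
}

// Reader: always consumes exactly N coefficients per value, where N is
// the reader's own coefficient count.
template <std::size_t N>
std::array<std::int64_t, N>
read_fixed (const std::vector<std::int64_t> &stream, std::size_t &pos)
{
  std::array<std::int64_t, N> value{};
  for (std::size_t i = 0; i < N; ++i)
    value[i] = stream.at (pos++);
  return value;
}

struct ModeEntry
{
  std::int64_t size;
  std::int64_t prec;
};

// Write SImode's size and precision with two coefficients each, then read
// them back while assuming one coefficient each.
ModeEntry misread_simode ()
{
  std::vector<std::int64_t> stream;
  write_fixed<2> (stream, {4, 0});   // size = <4, 0>
  write_fixed<2> (stream, {32, 0});  // prec = <32, 0>
  assert (stream.size () == 4);      // two values, two coefficients each

  std::size_t pos = 0;
  ModeEntry e;
  e.size = read_fixed<1> (stream, pos)[0];  // consumes the 4
  e.prec = read_fixed<1> (stream, pos)[0];  // consumes size's coeffs[1]
  return e;
}
```

In this model the reader gets size 4 and prec 0, the same values as in the nvptx dump above, because the second read picks up the zero that belonged to size.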
> 
> From what I can see, this mode table handling are the only uses of
> bp_pack_poly_value.  So the options are either to stream at the start
> of the mode table the NUM_POLY_INT_COEFFS value and in
> bp_unpack_poly_value pass to it what we've read and fill in any
> remaining coeffs with zeros, or in each bp_pack_poly_value stream the
> number of coefficients and then stream that back in and fill in
> remaining ones (and diagnose if it would try to read non-zero
> coefficient which isn't stored).
This is the approach taken in proposed patch (stream-out degree of poly_int 
followed by coeffs).

> I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> least for non-aarch64/riscv targets).
I will try implementing this, thanks.
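
For comparison, the alternative of streaming the coefficient count once might look roughly like the sketch below. It assumes a plain integer stream with the host's count as its first element; write_table and read_table are hypothetical names and this is not the eventual patch:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Writer: record the host's coefficient count once, then every value with
// that many coefficients.
template <std::size_t HostN>
std::vector<std::int64_t>
write_table (const std::vector<std::array<std::int64_t, HostN>> &values)
{
  std::vector<std::int64_t> stream;
  stream.push_back (static_cast<std::int64_t> (HostN));
  for (const auto &v : values)
    for (std::int64_t c : v)
      stream.push_back (c);
  return stream;
}

// Reader: learn the host's count from the header, keep the coefficients
// the accelerator can hold, zero-fill missing ones, and reject any
// non-zero coefficient that would have to be dropped.
template <std::size_t AccelN>
std::vector<std::array<std::int64_t, AccelN>>
read_table (const std::vector<std::int64_t> &stream)
{
  std::size_t host_n = static_cast<std::size_t> (stream.at (0));
  assert (host_n > 0 && (stream.size () - 1) % host_n == 0);

  std::vector<std::array<std::int64_t, AccelN>> values;
  for (std::size_t pos = 1; pos < stream.size (); pos += host_n)
    {
      std::array<std::int64_t, AccelN> v{};  // starts as all zeros
      for (std::size_t i = 0; i < host_n; ++i)
        {
          std::int64_t c = stream[pos + i];
          if (i < AccelN)
            v[i] = c;
          else if (c != 0)
            throw std::runtime_error ("non-zero coefficient not representable");
        }
      values.push_back (v);
    }
  return values;
}
```

The per-value overhead disappears, at the cost of still writing the zero coefficients for every value on a host with more than one coefficient.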

Thanks,
Prathamesh
> 
> Jakub



RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-31 Thread Prathamesh Kulkarni


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: Tuesday, July 30, 2024 4:44 PM
> To: Jakub Jelinek ; Richard Biener
> 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org
> Subject: RE: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> 
> > -Original Message-
> > From: Jakub Jelinek 
> > Sent: Tuesday, July 30, 2024 3:16 PM
> > To: Richard Biener 
> > Cc: Richard Sandiford ; Prathamesh
> Kulkarni
> > ; gcc-patches@gcc.gnu.org
> > Subject: Re: Support streaming of poly_int for offloading when it's
> > degree <= accel's NUM_POLY_INT_COEFFS
> >
> >
> > On Tue, Jul 30, 2024 at 11:25:42AM +0200, Richard Biener wrote:
> > > Only "relevant" stuff should be streamed - the offload code and
> all
> > > trees refered to.
> >
> > Yeah.
> >
> > > > > I think all current issues are because of poly-* leaking in
> for
> > > > > cases where a non-poly would have worked fine, but I have not
> > had
> > > > > a look myself.
> > > >
> > > > One of the cases that Prathamesh mentions is streaming the mode
> > sizes.
> > > > Are those modes "offload target modes" or "host modes"?  It
> seems
> > > > like it shouldn't be an error for the host to have VLA modes per
> > se.
> > > > It's just that those modes can't be used in the host/offload
> > interface.
> > >
> > > There's a requirement that a mode mapping exists from the host to
> > > target enum machine_mode.  I don't remember exactly how we compute
> > > that mapping and whether streaming of some data (and thus poly-
> int)
> > > are part of this.
> >
> > During streaming out, the code records what machine modes are being
> > streamed (in streamer_mode_table).
> > For those modes (and their inner modes) then lto_write_mode_table
> > should stream a table with mode details like class, bits, size,
> inner
> > mode, nunits, real mode format if any, etc.
> > That table is then streamed in in the offloading compiler and it
> > attempts to find corresponding modes (and emits fatal_error if there
> > is no such mode; consider say x86_64 long double with XFmode being
> > used in offloading code which doesn't have XFmode support).
> > Now, because Richard S. changed GET_MODE_SIZE etc. to give poly_int
> > rather than int, this has been changed to use bp_pack_poly_value;
> but
> > that relies on the same number of coefficients for poly_int, which
> is
> > not the case when e.g. offloading aarch64 to gcn or nvptx.
> Indeed, for the minimal test:
> int main()
> {
>   int x;
>   #pragma omp target map (to: x)
>   {
> x = 0;
>   }
>   return x;
> }
> 
> Streaming out mode_table from AArch64 shows:
> mode = SI, mclass = 2, size = 4, prec = 32 mode = DI, mclass = 2, size
> = 8, prec = 64
> 
> While streaming-in for nvptx shows:
> mclass = 2, size = 4, prec = 0
> 
> The discrepancy happens because of differing value of
> NUM_POLY_INT_COEFFS between AArch64 and nvptx.
> From AArch64 it streams out size and prec as <4, 0> and <32, 0>
> respectively, where 0 comes from coeffs[1].
> While streaming-in from nvptx, since NUM_POLY_INT_COEFFS is 1, it
> incorrectly reads size as 4, and prec as 0.
> >
> > From what I can see, this mode table handling are the only uses of
> > bp_pack_poly_value.  So the options are either to stream at the
> start
> > of the mode table the NUM_POLY_INT_COEFFS value and in
> > bp_unpack_poly_value pass to it what we've read and fill in any
> > remaining coeffs with zeros, or in each bp_pack_poly_value stream
> the
> > number of coefficients and then stream that back in and fill in
> > remaining ones (and diagnose if it would try to read non-zero
> > coefficient which isn't stored).
> This is the approach taken in proposed patch (stream-out degree of
> poly_int followed by coeffs).
> 
> > I think streaming NUM_POLY_INT_COEFFS once would be more compact (at
> > least for non-aarch64/riscv targets).
> I will try implementing this, thanks.
Hi,
The attached patch streams out NUM_POLY_INT_COEFFS only once, at the beginning
of mode_table, which should make LTO bytecode more compact for non-VLA hosts.
It also changes streaming-in of poly_int as follows:

if (host_num_poly_int_coeffs <= N

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-07-31 Thread Prathamesh Kulkarni


> -Original Message-
> From: Tobias Burnus 
> Sent: Tuesday, July 30, 2024 6:08 PM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> 
> Prathamesh Kulkarni wrote:
> > Thanks for your suggestions on RFC email, the attached patch adds
> support for streaming of poly_int when it's degree <= accel's
> NUM_POLY_INT_COEFFS.
> 
> First, thanks a lot for your patch!
> 
> Secondly, it seems as if this patch is indented to fully or partially
> fix the following PRs.
> If so, can you add the PR to the commit log such that both "git log"
> will help finding the problem report and the commit will show up in
> the issue?
Hi Tobias,
Thanks for the pointers to relevant Bugzilla PRs! I have included them in my 
latest patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658866.html

Thanks,
Prathamesh
> 
> 
> https://gcc.gnu.org/PR111937
>PR ipa/111937
>offloading from x86_64-linux-gnu to riscv*-linux-gnu will have
> issues
> 
> https://gcc.gnu.org/PR96265
>PR ipa/96265
>offloading to nvptx-none from aarch64-linux-gnu (and
> riscv*-linux-gnu) does not work
> 
> And - marked as duplicate of the latter:
> 
> https://gcc.gnu.org/PR114174
>PR lto/114174
>[aarch64] Offloading to nvptx-none
> 
> Thanks,
> 
> Tobias


RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-08-02 Thread Prathamesh Kulkarni


> -Original Message-
> From: Jakub Jelinek 
> Sent: Wednesday, July 31, 2024 8:46 PM
> To: Prathamesh Kulkarni 
> Cc: Richard Biener ; Richard Sandiford
> ; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> 
> On Wed, Jul 31, 2024 at 02:58:34PM +, Prathamesh Kulkarni wrote:
> > There are a couple of issues in the patch:
> > (1) The patch streams out NUM_POLY_INT_COEFFS at beginning of
> > mode_table, which should work for bp_unpack_poly_value, (since AFAIK,
> > it's only called by lto_input_mode_table). However, I am not sure if
> we will always call lto_input_mode_table before streaming in poly_int64
> / poly_uint64 ? Or should we stream out host NUM_POLY_INT_COEFFS at a
> different place in LTO bytecode ?
> 
> The poly_ints unpacked in lto_input_mode_table obviously are done after
> that.
> If you use it for streaming in from other sections, you need to check if
> they can't be read before the mode table.
> And, you don't really need to stream out/in the number for non-
> offloading LTO, that should use just NUM_POLY_INT_COEFFS.
> 
> > --- a/gcc/data-streamer-in.cc
> > +++ b/gcc/data-streamer-in.cc
> > @@ -183,9 +183,7 @@ poly_uint64
> >  streamer_read_poly_uint64 (class lto_input_block *ib)
> >  {
> >poly_uint64 res;
> > -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > -res.coeffs[i] = streamer_read_uhwi (ib);
> > -  return res;
> > +  POLY_INT_READ_COMMON(res, streamer_read_uhwi (ib))
> 
> Why is this a macro and not an inline function or inline function template
> or inline function calling a lambda?
> Even if it has to be a macro (I don't see why), it should be defined
> such that you need to add ; at the end, ideally not include the return
> res; in there because it is just too weird if used like that (or make it
> return what will be returned and use return POLY_INT_READ_COMMON...) and
> there needs to be a space in between COMMON and (.
> 
> > @@ -194,9 +192,7 @@ poly_int64
> >  streamer_read_poly_int64 (class lto_input_block *ib)
> >  {
> >poly_int64 res;
> > -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > -res.coeffs[i] = streamer_read_hwi (ib);
> > -  return res;
> > +  POLY_INT_READ_COMMON(res, streamer_read_hwi (ib))
> >  }
> 
> Ditto.
> > +   __typeof(x.coeffs[0]) val = streamer_read_coeff; \
> 
> You certainly can't use a GCC extension like __typeof here.
> Plus missing space.
> 
> > +   if (val != 0) \
> > + fatal_error (input_location, \
> > +  "Degree of %<poly_int%> exceeds " \
> 
> Diagnostics shouldn't start with uppercase letter.
> 
> > +  "%<NUM_POLY_INT_COEFFS%> (%d)", \
> > +  NUM_POLY_INT_COEFFS); \
> > + } \
> > +} \
> > + \
> > +  return x; \
> > +}
> > +
> > --- a/gcc/poly-int.h
> > +++ b/gcc/poly-int.h
> > @@ -354,6 +354,10 @@ struct poly_result
> > ? (void) ((RES).coeffs[I] = VALUE) \
> > : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C
> > (VALUE)))
> >
> > +/* Number of bits needed to represent maximum value of
> > +   NUM_POLY_INT_COEFFS defined by any target.  */
> > +#define MAX_NUM_POLY_INT_COEFFS_BITS (2)
> 
> Why (2) and not just 2?
> There should be some static_assert to make sure it is a maximum for any
> target.
> 
> > +   if (!integer_zerop (val))
> > + fatal_error (input_location,
> > +  "Degree of %<poly_int%> exceeds "
> 
> Again.
Hi Jakub,
Thanks for the suggestions. The attached patch streams out NUM_POLY_INT_COEFFS
only if offloading is enabled, defines poly_int_read_common to be a variadic
template, and asserts that host_num_poly_int_coeffs is streamed in before
reading a poly_int (AFAIU, the only user of streamer_read_poly_int64
currently is the ipa-modref pass).

Patch passes LTO bootstrap+test on aarch64-linux-gnu and in-progress on 
x86_64-linux-gnu. To test this patch with
offloading enabled, I have so far just been testing libgomp (make 
check-target-libgomp). Should I be testing any
additional parts of the testsuite for offloading ?

Does the attached patch look in the right direction ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
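
The review above asks for a static_assert ensuring that MAX_NUM_POLY_INT_COEFFS_BITS is large enough for any target. A minimal sketch of such a check follows. The names carrying a _SKETCH suffix and the helper max_value_for_bits are hypothetical and chosen for illustration; the value 2 for the coefficient count is an assumption, not something taken from the patch.

```cpp
#include <cassert>

// Assumed value for illustration only; the real NUM_POLY_INT_COEFFS is
// defined per target.
constexpr unsigned NUM_POLY_INT_COEFFS_SKETCH = 2;

// Mirrors the macro proposed in the patch, as a constant.
constexpr unsigned MAX_NUM_POLY_INT_COEFFS_BITS_SKETCH = 2;

// Largest unsigned value that fits in the given number of bits
// (hypothetical helper, valid for bits < 32).
constexpr unsigned
max_value_for_bits (unsigned bits)
{
  return (1u << bits) - 1;
}

// Fails to compile if a target's coefficient count cannot be streamed
// in the reserved number of bits.
static_assert (NUM_POLY_INT_COEFFS_SKETCH
	       <= max_value_for_bits (MAX_NUM_POLY_INT_COEFFS_BITS_SKETCH),
	       "coefficient count does not fit in the reserved bits");
```

With two bits the largest streamable count is 3, so a target with more coefficients would be rejected at build time rather than silently truncated.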
> > +  "%");
> > + }
> > +}
> >  }
> 
> Jakub

Partially support streaming of poly_int for offloading.

Th

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-08-05 Thread Prathamesh Kulkarni


> -Original Message-
> From: Jakub Jelinek 
> Sent: Friday, August 2, 2024 5:43 PM
> To: Prathamesh Kulkarni 
> Cc: Richard Biener ; Richard Sandiford
> ; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Aug 02, 2024 at 11:58:19AM +, Prathamesh Kulkarni wrote:
> > diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
> > index 7dce2928ef0..7b9d8cc0129 100644
> > --- a/gcc/data-streamer-in.cc
> > +++ b/gcc/data-streamer-in.cc
> > @@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
> >  poly_uint64
> >  streamer_read_poly_uint64 (class lto_input_block *ib)
> >  {
> > -  poly_uint64 res;
> > -  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
> > -res.coeffs[i] = streamer_read_uhwi (ib);
> > -  return res;
> > +  return poly_int_read_common<poly_int_traits<poly_uint64>::coeff_type>
> > +  (streamer_read_uhwi, ib);
> 
> Can't you use
>   using coeff_type = poly_int_traits <poly_uint64>::coeff_type;
>   return poly_int_read_common <coeff_type> (streamer_read_uhwi, ib); ?
> The call arguments on different line from the actual function name are
> ugly.
> 
> > --- a/gcc/data-streamer.cc
> > +++ b/gcc/data-streamer.cc
> > @@ -28,6 +28,12 @@ along with GCC; see the file COPYING3.  If not see
> > #include "cgraph.h"
> >  #include "data-streamer.h"
> >
> > +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS is
> > +   stored at beginning of mode_table. While streaming-in, the value
> is read in
> > +   host_num_poly_int_coeffs.  */
> > +
> > +unsigned host_num_poly_int_coeffs = 0;
> 
> I think it would be better to guard this with #ifdef ACCEL_COMPILER.
> 
> > +template
> > +poly_int poly_int_read_common (F read_coeff,
> > +Args ...args) {
> > +  poly_int x;
> > +  unsigned i;
> > +
> > +#ifndef ACCEL_COMPILER
> > +  host_num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
> > +#endif
> 
> And instead of modifying a global var again and again do
> #ifdef ACCEL_COMPILER
>   const unsigned num_poly_int_coeffs = host_num_poly_int_coeffs;
>   gcc_assert (num_poly_int_coeffs > 0);
> #else
>   const unsigned num_poly_int_coeffs = NUM_POLY_INT_COEFFS;
> #endif
> and use num_poly_int_coeffs in the functions.
Hi Jakub,
I have done the suggested changes in the attached patch.
Does it look OK ?

Thanks,
Prathamesh
> 
> Jakub

Partially support streaming of poly_int for offloading.

The patch streams out host NUM_POLY_INT_COEFFS, and changes
streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
  {
    for (i = 0; i < host_num_poly_int_coeffs; i++)
      poly_int.coeffs[i] = stream_in coeff;
    for (; i < NUM_POLY_INT_COEFFS; i++)
      poly_int.coeffs[i] = 0;
  }
else
  {
    for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
      poly_int.coeffs[i] = stream_in coeff;

    /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
    for (; i < host_num_poly_int_coeffs; i++)
      {
        val = stream_in coeff;
        if (val != 0)
          error ();
      }
  }
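
The pseudocode above can be sketched as a small standalone C++ template. This is an illustration under stated assumptions, not the code from the patch: InputBlock, read_hwi and kAccelCoeffs are hypothetical stand-ins for lto_input_block, streamer_read_hwi and the accelerator's NUM_POLY_INT_COEFFS, the host coefficient count is passed as a parameter instead of being read from a global, and the error is reported with an exception rather than fatal_error.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the accelerator's NUM_POLY_INT_COEFFS.
constexpr std::size_t kAccelCoeffs = 1;

// Hypothetical stand-in for an LTO input block: a cursor over streamed values.
struct InputBlock
{
  std::vector<std::int64_t> data;
  std::size_t pos;
};

// Reads the next streamed coefficient (throws std::out_of_range past the end).
inline std::int64_t
read_hwi (InputBlock *ib)
{
  return ib->data.at (ib->pos++);
}

// Reads a poly_int that the host wrote with host_coeffs coefficients.
// Coefficients the host did not write stay zero; coefficients the
// accelerator cannot hold must be zero, otherwise reading fails.
template<typename C, typename F, typename... Args>
std::array<C, kAccelCoeffs>
poly_int_read_common (std::size_t host_coeffs, F read_coeff, Args... args)
{
  std::array<C, kAccelCoeffs> x{};  // value-initialized, so all zero
  std::size_t i = 0;

  // Read the coefficients both sides can represent.
  for (; i < host_coeffs && i < kAccelCoeffs; ++i)
    x[i] = read_coeff (args...);

  // Consume any extra host coefficients and insist that they are zero.
  for (; i < host_coeffs; ++i)
    if (read_coeff (args...) != 0)
      throw std::runtime_error ("degree of poly_int exceeds accelerator limit");

  return x;
}
```

For example, with a host that streams two coefficients, the values {7, 0} read back as 7 on a one-coefficient accelerator, while {7, 3} is rejected because the second coefficient is nonzero. Folding both branches of the pseudocode into two loops keeps the stream position consistent in either case, since every coefficient the host wrote is consumed.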

gcc/ChangeLog:
PR ipa/96265
PR ipa/111937
* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
streaming, and call poly_int_read_common instead. 
(streamer_read_poly_int64): Likewise.
* data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
new variable if ACCEL_COMPILER is defined.
* data-streamer.h (host_num_poly_int_coeffs): Declare.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Remove code for streaming and call
poly_int_read_common instead.
* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
is defined.
* lto-streamer-out.cc (lto_write_mode_table): Stream out
NUM_POLY_INT_COEFFS if offloading is enabled.
* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..57955a20091 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
-  poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-res.coeffs[i] = streamer_read_uhwi (ib);
-  return res;
+  using coeff_type = poly_int_traits<poly_uint64>::coeff_type;
+  ret

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-08-07 Thread Prathamesh Kulkarni


> -Original Message-
> From: Jakub Jelinek 
> Sent: Monday, August 5, 2024 8:01 PM
> To: Prathamesh Kulkarni 
> Cc: Richard Biener ; Richard Sandiford
> ; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, Aug 05, 2024 at 02:24:00PM +, Prathamesh Kulkarni wrote:
> > gcc/ChangeLog:
> >   PR ipa/96265
> >   PR ipa/111937
> >   * data-streamer-in.cc (streamer_read_poly_uint64): Remove code
> for
> >   streaming, and call poly_int_read_common instead.
> >   (streamer_read_poly_int64): Likewise.
> >   * data-streamer.cc (host_num_poly_int_coeffs): Conditionally
> define
> >   new variable if ACCEL_COMPILER is defined.
> >   * data-streamer.h (host_num_poly_int_coeffs): Declare.
> >   (poly_int_read_common): New function template.
> >   (bp_unpack_poly_value): Remove code for streaming and call
> >   poly_int_read_common instead.
> >   * lto-streamer-in.cc (lto_input_mode_table): Stream-in host
> >   NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if
> ACCEL_COMPILER
> >   is defined.
> >   * lto-streamer-out.cc (lto_write_mode_table): Stream out
> >   NUM_POLY_INT_COEFFS if offloading is enabled.
> >   * poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
> >   * tree-streamer-in.cc (lto_input_ts_poly_tree_pointers):
> Adjust
> >   streaming-in of poly_int.
> >
> > Signed-off-by: Prathamesh Kulkarni 
> 
> > --- a/gcc/data-streamer.cc
> > +++ b/gcc/data-streamer.cc
> > @@ -28,6 +28,14 @@ along with GCC; see the file COPYING3.  If not
> see
> > #include "cgraph.h"
> >  #include "data-streamer.h"
> >
> > +/* For offloading -- While streaming-out, host NUM_POLY_INT_COEFFS
> is
> > +   stored at beginning of mode_table. While streaming-in, the value
> > +is read in
> 
> Two spaces after . rather than just one, and because of that move in
> on the next line.
> 
> > +   host_num_poly_int_coeffs.  */
> 
> Otherwise LGTM.
Thanks, I have adjusted the formatting of the comment and a typo in 
streamer_read_poly_uint64.
Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-gnu, and LTO
bootstrap+test on x86_64-linux-gnu.
It also doesn't seem to regress libgomp testing for x86_64 -> nvptx offloading
(although there were a few occurrences of flaky tests in the results).
Is the patch OK to commit ?

Thanks,
Prathamesh
> 
> Jakub

Partially support streaming of poly_int for offloading.

When offloading is enabled, the patch streams out host
NUM_POLY_INT_COEFFS, and changes streaming in as follows:

if (host_num_poly_int_coeffs <= NUM_POLY_INT_COEFFS)
  {
    for (i = 0; i < host_num_poly_int_coeffs; i++)
      poly_int.coeffs[i] = stream_in coeff;
    for (; i < NUM_POLY_INT_COEFFS; i++)
      poly_int.coeffs[i] = 0;
  }
else
  {
    for (i = 0; i < NUM_POLY_INT_COEFFS; i++)
      poly_int.coeffs[i] = stream_in coeff;

    /* Ensure that degree of poly_int <= accel NUM_POLY_INT_COEFFS.  */
    for (; i < host_num_poly_int_coeffs; i++)
      {
        val = stream_in coeff;
        if (val != 0)
          error ();
      }
  }

gcc/ChangeLog:
PR ipa/96265
PR ipa/111937
* data-streamer-in.cc (streamer_read_poly_uint64): Remove code for
streaming, and call poly_int_read_common instead. 
(streamer_read_poly_int64): Likewise.
* data-streamer.cc (host_num_poly_int_coeffs): Conditionally define
new variable if ACCEL_COMPILER is defined.
* data-streamer.h (host_num_poly_int_coeffs): Declare.
(poly_int_read_common): New function template.
(bp_unpack_poly_value): Remove code for streaming and call
poly_int_read_common instead.
* lto-streamer-in.cc (lto_input_mode_table): Stream-in host
NUM_POLY_INT_COEFFS into host_num_poly_int_coeffs if ACCEL_COMPILER
    is defined.
* lto-streamer-out.cc (lto_write_mode_table): Stream out
NUM_POLY_INT_COEFFS if offloading is enabled.
* poly-int.h (MAX_NUM_POLY_INT_COEFFS_BITS): New macro.
* tree-streamer-in.cc (lto_input_ts_poly_tree_pointers): Adjust
streaming-in of poly_int.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/data-streamer-in.cc b/gcc/data-streamer-in.cc
index 7dce2928ef0..07dbc5e2bc3 100644
--- a/gcc/data-streamer-in.cc
+++ b/gcc/data-streamer-in.cc
@@ -182,10 +182,8 @@ streamer_read_hwi (class lto_input_block *ib)
 poly_uint64
 streamer_read_poly_uint64 (class lto_input_block *ib)
 {
-  poly_uint64 res;
-  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
-res.c

RE: Support streaming of poly_int for offloading when it's degree <= accel's NUM_POLY_INT_COEFFS

2024-08-07 Thread Prathamesh Kulkarni



> -Original Message-
> From: Jakub Jelinek 
> Sent: Wednesday, August 7, 2024 11:27 PM
> To: Prathamesh Kulkarni 
> Cc: Richard Biener ; Richard Sandiford
> ; gcc-patches@gcc.gnu.org
> Subject: Re: Support streaming of poly_int for offloading when it's
> degree <= accel's NUM_POLY_INT_COEFFS
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, Aug 07, 2024 at 05:53:00PM +, Prathamesh Kulkarni wrote:
> > > Two spaces after . rather than just one, and because of that move
> in
> > > on the next line.
> > >
> > > > +   host_num_poly_int_coeffs.  */
> > >
> > > Otherwise LGTM.
> > Thanks, I have adjusted the formatting of the comment and a typo in
> streamer_read_poly_uint64.
> > Patch passes bootstrap+test and LTO bootstrap+test on aarch64-linux-
> gnu, LTO bootstrap+test on x86_64-linux-gnu.
> > And doesn't seem to regress libgomp testing for x86_64 -> nvptx
> offloading (altho there were a few occurrences of flaky tests in
> results).
> > Is the patch OK to commit ?
> 
> The "Otherwise LGTM." was already covering that.
> So yes.
Thanks, pushed to trunk in:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=38900247f3880d6eca2e364a000e5898f8deae64

Thanks,
Prathamesh
> 
> Jakub



RE: [nvptx] Fix code-gen for alias attribute

2024-09-01 Thread Prathamesh Kulkarni
> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: Monday, August 26, 2024 4:21 PM
> To: Thomas Schwinge ; gcc-patches@gcc.gnu.org
> Subject: [nvptx] Fix code-gen for alias attribute
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi,
> For the following test (adapted from pr96390.c):
> 
> __attribute__((noipa)) int foo () { return 42; }
> int bar () __attribute__((alias ("foo")));
> int baz () __attribute__((alias ("bar")));
> 
> int main ()
> {
>   int n;
>   #pragma omp target map(from:n)
> n = baz ();
>   return n;
> }
> 
> Compiling with -fopenmp -foffload=nvptx-none -foffload=-malias
> -foffload=-mptx=6.3 results in:
> 
> ptxas fatal   : Internal error: alias to unknown symbol
> nvptx-as: ptxas returned 255 exit status
> nvptx mkoffload: fatal error: ../../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> compilation terminated.
> lto-wrapper: fatal error: /home/prathameshk/gnu-toolchain/gcc/grcogcc-38/install/libexec/gcc/aarch64-unknown-linux-gnu/15.0.0//accel/nvptx-none/mkoffload returned 1 exit status
> compilation terminated.
> 
> This happens because ptx code-gen shows:
> 
> // BEGIN GLOBAL FUNCTION DEF: foo
> .visible .func (.param.u32 %value_out) foo {
> .reg.u32 %value;
> mov.u32 %value, 42;
> st.param.u32[%value_out], %value;
> ret;
> }
> .visible .func (.param.u32 %value_out) bar;
> .alias bar,foo;
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,bar;
> 
> .alias baz, bar is invalid since PTX requires aliasee to be a defined
> function:
> https://sw-docs-dgx-station.nvidia.com/cuda-latest/parallel-thread-execution/latest-internal/#kernel-and-function-directives-alias
> 
> The patch uses cgraph_node::get(name)->ultimate_alias_target ()
> instead of the provided value in nvptx_asm_output_def_from_decls.
> For the above case, it now generates the following ptx:
> 
> .alias baz,foo;
> instead of:
> .alias baz,bar;
> 
> which fixes the issue.
> 
> Does the patch look in the right direction ?
> 
> Signed-off-by: Prathamesh Kulkarni 
Hi,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661457.html

Thanks,
Prathamesh
> 
> Thanks,
> Prathamesh
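
The fix described in this message emits each alias against the function at the end of the alias chain instead of against the name written in the source. A minimal standalone sketch of that resolution step follows. AliasMap, ultimate_target and alias_directive are hypothetical names for illustration; the patch itself relies on cgraph_node::ultimate_alias_target, which this sketch only imitates with a plain map.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical symbol table: maps each alias to the symbol it was declared
// against (for example baz -> bar, bar -> foo).
using AliasMap = std::unordered_map<std::string, std::string>;

// Follows the alias chain from name until reaching a symbol that is not an
// alias.  Returns an empty string if the chain is cyclic.
inline std::string
ultimate_target (const AliasMap &aliases, std::string name)
{
  std::unordered_set<std::string> seen;
  for (auto it = aliases.find (name); it != aliases.end ();
       it = aliases.find (name))
    {
      if (!seen.insert (name).second)
	return "";
      name = it->second;
    }
  return name;
}

// Formats a PTX-style .alias directive whose aliasee is always the defined
// function at the end of the chain.
inline std::string
alias_directive (const AliasMap &aliases, const std::string &alias)
{
  return ".alias " + alias + "," + ultimate_target (aliases, alias) + ";";
}
```

With the map {bar -> foo, baz -> bar} from the test case, alias_directive for baz yields ".alias baz,foo;", which matches the corrected output quoted above.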



[gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-08-31 Thread Prathamesh Kulkarni
Hi,
For the following test:
#include <arm_sve.h>

int main()
{
  svint32_t x;
  #pragma omp target map(x)
    x;
  return 0;
}

compiling with -fopenmp -foffload=nvptx-none results in following ICE:

t_sve.c: In function 'main':
t_sve.c:6:11: internal compiler error: Segmentation fault
    6 |   #pragma omp target map(x)
      |           ^~~
0x228ed13 internal_error(char const*, ...)
        ../../gcc/gcc/diagnostic-global-context.cc:491
0xfcf68f crash_signal
        ../../gcc/gcc/toplev.cc:321
0xc17d9c omp_add_variable
        ../../gcc/gcc/gimplify.cc:7811
0xc17d9c omp_add_variable
        ../../gcc/gcc/gimplify.cc:7752
0xc4176b gimplify_scan_omp_clauses
        ../../gcc/gcc/gimplify.cc:12881
0xc46d53 gimplify_omp_workshare
        ../../gcc/gcc/gimplify.cc:17139
0xc23383 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
        ../../gcc/gcc/gimplify.cc:18668
0xc27f53 gimplify_stmt(tree_node**, gimple**)
        ../../gcc/gcc/gimplify.cc:7646
0xc24ef7 gimplify_statement_list
        ../../gcc/gcc/gimplify.cc:2250
0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
        ../../gcc/gcc/gimplify.cc:18565
0xc27f53 gimplify_stmt(tree_node**, gimple**)
        ../../gcc/gcc/gimplify.cc:7646
0xc289d3 gimplify_bind_expr
        ../../gcc/gcc/gimplify.cc:1642
0xc24b9b gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
        ../../gcc/gcc/gimplify.cc:18315
0xc27f53 gimplify_stmt(tree_node**, gimple**)
        ../../gcc/gcc/gimplify.cc:7646
0xc24ef7 gimplify_statement_list
        ../../gcc/gcc/gimplify.cc:2250
0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
        ../../gcc/gcc/gimplify.cc:18565
0xc27f53 gimplify_stmt(tree_node**, gimple**)
        ../../gcc/gcc/gimplify.cc:7646
0xc2aadb gimplify_body(tree_node*, bool)
        ../../gcc/gcc/gimplify.cc:19393
0xc2b05f gimplify_function_tree(tree_node*)
        ../../gcc/gcc/gimplify.cc:19594
0xa0e47f cgraph_node::analyze()
        ../../gcc/gcc/cgraphunit.cc:687 

The attached patch fixes the issue by checking if variable is VLA vector,
and emits an error in that case since no accel currently supports VLA vectors.
Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh

[gimplify.cc] Emit an error if VLA vector is passed to accelerator.

gcc/ChangeLog:

* gimplify.cc (omp_add_variable): Emit an error if VLA vector is passed
to accelerator.

Signed-off-by: Prathamesh Kulkarni 

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 26a216e151d..fb7bd919b54 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7789,6 +7789,11 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
  the parameters of the type.  */
   if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
 {
+  /* For now, bail out if a VLA vector is passed to accelerator.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (decl))
+ && POLY_INT_CST_P (DECL_SIZE (decl)))
+   fatal_error (input_location,
+"passing VLA vector to accelerator not implemented");
   /* Add the pointer replacement variable as PRIVATE if the variable
 replacement is private, else FIRSTPRIVATE since we'll need the
 address of the original variable either for SHARED, or for the


RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for accelerator

2024-09-02 Thread Prathamesh Kulkarni


> -Original Message-
> From: Prathamesh Kulkarni 
> Sent: Thursday, August 22, 2024 7:41 PM
> To: Richard Biener 
> Cc: Richard Sandiford ; Thomas Schwinge
> ; gcc-patches@gcc.gnu.org
> Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in for
> accelerator
> 
> External email: Use caution opening links or attachments
> 
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Wednesday, August 21, 2024 5:09 PM
> > To: Prathamesh Kulkarni 
> > Cc: Richard Sandiford ; Thomas Schwinge
> > ; gcc-patches@gcc.gnu.org
> > Subject: RE: Re-compute TYPE_MODE and DECL_MODE while streaming in
> for
> > accelerator
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Wed, 21 Aug 2024, Prathamesh Kulkarni wrote:
> >
> > >
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, August 20, 2024 10:36 AM
> > > > To: Richard Sandiford 
> > > > Cc: Prathamesh Kulkarni ; Thomas
> Schwinge
> > > > ; gcc-patches@gcc.gnu.org
> > > > Subject: Re: Re-compute TYPE_MODE and DECL_MODE while streaming
> in
> > > > for accelerator
> > > >
> > > > External email: Use caution opening links or attachments
> > > >
> > > >
> > > > > Am 19.08.2024 um 20:56 schrieb Richard Sandiford
> > > > :
> > > > >
> > > > > Prathamesh Kulkarni  writes:
> > > > >> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> > > > >> index
> > > > >> cbf6041fd68..0420183faf8 100644
> > > > >> --- a/gcc/lto-streamer-in.cc
> > > > >> +++ b/gcc/lto-streamer-in.cc
> > > > >> @@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If
> > not
> > > > see
> > > > >> #include "debug.h"
> > > > >> #include "alloc-pool.h"
> > > > >> #include "toplev.h"
> > > > >> +#include "stor-layout.h"
> > > > >>
> > > > >> /* Allocator used to hold string slot entries for line map
> > > > streaming.
> > > > >> */ static struct object_allocator
> > > > >> *string_slot_allocator; @@ -1752,6 +1753,17 @@
> lto_read_tree_1
> > > > (class lto_input_block *ib, class data_in *data_in, tree expr)
> > > > >> with -g1, see for example PR113488.  */
> > > > >>   else if (DECL_P (expr) && DECL_ABSTRACT_ORIGIN (expr)
> ==
> > > > expr)
> > > > >>DECL_ABSTRACT_ORIGIN (expr) = NULL_TREE;
> > > > >> +
> > > > >> +#ifdef ACCEL_COMPILER
> > > > >> +  /* For decl with aggregate type, host streams out
> > VOIDmode.
> > > > >> + Compute the correct DECL_MODE by calling relayout_decl.
> > */
> > > > >> +  if ((VAR_P (expr)
> > > > >> +   || TREE_CODE (expr) == PARM_DECL
> > > > >> +   || TREE_CODE (expr) == FIELD_DECL)
> > > > >> +  && AGGREGATE_TYPE_P (TREE_TYPE (expr))
> > > > >> +  && DECL_MODE (expr) == VOIDmode)
> > > > >> +relayout_decl (expr);
> > > > >> +#endif
> > > > >
> > > > > Genuine question, but: is relayout_decl safe in this context?
> > It
> > > > does
> > > > > a lot more than just reset the mode.  It also applies the
> target
> > > > ABI's
> > > > > preferences wrt alignment, padding, and so on, rather than
> > > > preserving
> > > > > those of the host's.
> > > >
> > > > It would be better to just recompute the mode here.
> > > Hi,
> > > The attached patch sets DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> > (expr)) in lto_read_tree_1 instead of calling relayout_decl (expr).
> > > I checked layout_decl_type does the same thing for setting decl
> > mode,
> > > except for bit fields. Since bit-fields cannot have aggregate
> type,
> > I am assuming setting DECL_MODE (expr) to TYPE_MODE (TREE_TYPE
> (expr))
> > would be OK in this case ?
> >
> > Yep, that should work.
> Thanks, I have committed the patch in:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792adb8d222d0d1d16b18287
> 1e105f47823b8e72
Hi,
This also results in same f

RE: [gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-09-03 Thread Prathamesh Kulkarni
> -Original Message-
> From: Richard Biener 
> Sent: Monday, September 2, 2024 12:47 PM
> To: Prathamesh Kulkarni 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [gimplify.cc] Avoid ICE when passing VLA vector to
> accelerator
> 
> External email: Use caution opening links or attachments
> 
> 
> On Sun, 1 Sep 2024, Prathamesh Kulkarni wrote:
> 
> > Hi,
> > For the following test:
> > #include <arm_sve.h>
> >
> > int main()
> > {
> >   svint32_t x;
> >   #pragma omp target map(x)
> > x;
> >   return 0;
> > }
> >
> > compiling with -fopenmp -foffload=nvptx-none results in following
> ICE:
> >
> > t_sve.c: In function 'main':
> > t_sve.c:6:11: internal compiler error: Segmentation fault
> > 6 |   #pragma omp target map(x)
> >   |   ^~~
> > 0x228ed13 internal_error(char const*, ...)
> > ../../gcc/gcc/diagnostic-global-context.cc:491
> > 0xfcf68f crash_signal
> > ../../gcc/gcc/toplev.cc:321
> > 0xc17d9c omp_add_variable
> > ../../gcc/gcc/gimplify.cc:7811
> 
> that's not on trunk head?  Anyway, I think that instead
> 
>   /* When adding a variable-sized variable, we have to handle all
> sorts
>  of additional bits of data: the pointer replacement variable, and
>  the parameters of the type.  */
>   if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
> 
> should instead be checking for !POLY_INT_CST_P (DECL_SIZE (decl))
Hi Richard,
Thanks for the suggestions. The attached patch adds a !POLY_INT_CST_P check in
omp_add_variable (and in a few more places where it segfaulted), but keeps the
TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST check to avoid the above ICE with
the -msve-vector-bits= option.

The test now fails with:
lto1: fatal error: degree of 'poly_int' exceeds 'NUM_POLY_INT_COEFFS' (1)
compilation terminated.
nvptx mkoffload: fatal error: ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
compilation terminated.

Which looks reasonable IMO, since we don't yet fully support streaming of 
poly_ints
(and compiles OK when length is set with -msve-vector-bits= option).

Bootstrap+test in progress on aarch64-linux-gnu.
Does the patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh
> 
> Richard.
> 
> 
> > 0xc17d9c omp_add_variable
> > ../../gcc/gcc/gimplify.cc:7752 0xc4176b
> > gimplify_scan_omp_clauses
> > ../../gcc/gcc/gimplify.cc:12881
> > 0xc46d53 gimplify_omp_workshare
> > ../../gcc/gcc/gimplify.cc:17139
> > 0xc23383 gimplify_expr(tree_node**, gimple**, gimple**, bool
> (*)(tree_node*), int)
> > ../../gcc/gcc/gimplify.cc:18668
> > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > ../../gcc/gcc/gimplify.cc:7646
> > 0xc24ef7 gimplify_statement_list
> > ../../gcc/gcc/gimplify.cc:2250
> > 0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool
> (*)(tree_node*), int)
> > ../../gcc/gcc/gimplify.cc:18565
> > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > ../../gcc/gcc/gimplify.cc:7646
> > 0xc289d3 gimplify_bind_expr
> > ../../gcc/gcc/gimplify.cc:1642 0xc24b9b
> > gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*),
> int)
> > ../../gcc/gcc/gimplify.cc:18315
> > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > ../../gcc/gcc/gimplify.cc:7646
> > 0xc24ef7 gimplify_statement_list
> > ../../gcc/gcc/gimplify.cc:2250
> > 0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool
> (*)(tree_node*), int)
> > ../../gcc/gcc/gimplify.cc:18565
> > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > ../../gcc/gcc/gimplify.cc:7646 0xc2aadb
> > gimplify_body(tree_node*, bool)
> > ../../gcc/gcc/gimplify.cc:19393 0xc2b05f
> > gimplify_function_tree(tree_node*)
> > ../../gcc/gcc/gimplify.cc:19594 0xa0e47f
> > cgraph_node::analyze()
> > ../../gcc/gcc/cgraphunit.cc:687
> >
> > The attached patch fixes the issue by checking if variable is VLA
> > vector, and emits an error in that case since no accel currently
> supports VLA vectors.
> > Does the patch look OK ?
> >
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh
> >
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
Avoid ICE when passing VLA vector to accelerator.

gcc/ChangeLog:
*

RE: [gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-09-05 Thread Prathamesh Kulkarni


> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, September 3, 2024 2:09 PM
> To: Prathamesh Kulkarni 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [gimplify.cc] Avoid ICE when passing VLA vector to
> accelerator
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, September 2, 2024 12:47 PM
> > > To: Prathamesh Kulkarni 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [gimplify.cc] Avoid ICE when passing VLA vector to
> > > accelerator
> > >
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Sun, 1 Sep 2024, Prathamesh Kulkarni wrote:
> > >
> > > > Hi,
> > > > For the following test:
> > > > #include <arm_sve.h>
> > > >
> > > > int main()
> > > > {
> > > >   svint32_t x;
> > > >   #pragma omp target map(x)
> > > > x;
> > > >   return 0;
> > > > }
> > > >
> > > > compiling with -fopenmp -foffload=nvptx-none results in
> following
> > > ICE:
> > > >
> > > > t_sve.c: In function 'main':
> > > > t_sve.c:6:11: internal compiler error: Segmentation fault
> > > > 6 |   #pragma omp target map(x)
> > > >   |   ^~~
> > > > 0x228ed13 internal_error(char const*, ...)
> > > > ../../gcc/gcc/diagnostic-global-context.cc:491
> > > > 0xfcf68f crash_signal
> > > > ../../gcc/gcc/toplev.cc:321 0xc17d9c omp_add_variable
> > > > ../../gcc/gcc/gimplify.cc:7811
> > >
> > > that's not on trunk head?  Anyway, I think that instead
> > >
> > >   /* When adding a variable-sized variable, we have to handle all
> > > sorts
> > >  of additional bits of data: the pointer replacement variable,
> and
> > >  the parameters of the type.  */
> > >   if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) !=
> > > INTEGER_CST)
> > >
> > > should instead be checking for !POLY_INT_CST_P (DECL_SIZE (decl))
> > Hi Richard,
> > Thanks for the suggestions. The attached patch adds !POLY_INT_CST_P
> > check in omp_add_variable (and few more places where it segfaulted),
> > but keeps TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST check to avoid
> above ICE with -msve-vector-bits= option.
> >
> > The test now fails with:
> > lto1: fatal error: degree of 'poly_int' exceeds
> 'NUM_POLY_INT_COEFFS'
> > (1) compilation terminated.
> > nvptx mkoffload: fatal error:
> > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc
> returned 1 exit status compilation terminated.
> >
> > Which looks reasonable IMO, since we don't yet fully support
> streaming
> > of poly_ints (and compiles OK when length is set with -msve-vector-
> bits= option).
> >
> > Bootstrap+test in progress on aarch64-linux-gnu.
> > Does the patch look OK ?
> 
> Please use !poly_int_tree_p which checks for both INTEGER_CST and
> POLY_INT_CST_P.
> 
> OK with that change.
Thanks, I have committed the attached patch in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ae88e91938af364ef5613e5461b12b484b578bc5

after verifying it passes bootstrap+test on aarch64-linux-gnu and survives
libgomp tests for AArch64/nvptx offloading.

Thanks,
Prathamesh
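
The difference between the original condition and the one requested in review can be shown with a toy model. This is a sketch only: SizeKind and the three predicates below are hypothetical simplifications of DECL_SIZE tree codes and do not use GCC's real tree API. The one property taken from the thread is that poly_int_tree_p accepts both INTEGER_CST and POLY_INT_CST sizes.

```cpp
#include <cassert>

// Hypothetical, simplified model of the kinds of DECL_SIZE in this thread.
enum class SizeKind
{
  IntegerCst,  // fixed size known at compile time
  PolyIntCst,  // size of a VLA vector such as svint32_t
  Variable     // genuinely variable-sized object, for example a C VLA
};

// Models poly_int_tree_p: true for both INTEGER_CST and POLY_INT_CST.
constexpr bool
is_poly_int_size (SizeKind k)
{
  return k == SizeKind::IntegerCst || k == SizeKind::PolyIntCst;
}

// Original condition: anything that is not an INTEGER_CST takes the
// variable-sized path, including VLA vectors.
constexpr bool
old_takes_variable_path (SizeKind k)
{
  return k != SizeKind::IntegerCst;
}

// Revised condition: only sizes that are neither constant nor poly_int
// take the variable-sized path.
constexpr bool
new_takes_variable_path (SizeKind k)
{
  return !is_poly_int_size (k);
}
```

In this model a VLA vector size takes the variable-sized path under the old condition but not under the revised one, while fixed-size and genuinely variable-sized objects are classified the same way by both.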
> 
> Richard.
> 
> >
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > >
> > > > 0xc17d9c omp_add_variable
> > > > ../../gcc/gcc/gimplify.cc:7752 0xc4176b
> > > > gimplify_scan_omp_clauses
> > > > ../../gcc/gcc/gimplify.cc:12881
> > > > 0xc46d53 gimplify_omp_workshare
> > > > ../../gcc/gcc/gimplify.cc:17139
> > > > 0xc23383 gimplify_expr(tree_node**, gimple**, gimple**, bool
> > > (*)(tree_node*), int)
> > > > ../../gcc/gcc/gimplify.cc:18668
> > > > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > > > ../../gcc/gcc/gimplify.cc:7646
> > > > 0xc24ef7 gimplify_statement_list
> > > > ../../gcc/gcc/gimplify.cc:2250
> > > > 0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool
> > > (*)(tree_node*), int)
> > > > ../../gcc/gcc/gimpl

RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-08 Thread Prathamesh Kulkarni
> -Original Message-
> From: Thomas Schwinge 
> Sent: Friday, September 6, 2024 2:31 PM
> To: Prathamesh Kulkarni ; Richard Biener
> 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub
> Jelinek 
> Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi!
Hi Thomas,
Thanks for the review!
> 
> On 2024-08-16T15:36:29+, Prathamesh Kulkarni
>  wrote:
> >> > Am 13.08.2024 um 17:48 schrieb Thomas Schwinge
> >> :
> >> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni
> >>  wrote:
> >> >>> From: Thomas Schwinge 
> >> >>> Sent: Friday, August 9, 2024 12:55 AM
> >> >
> >> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski 
> >> wrote:
> >> >>>> On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
> >> >>>>  wrote:
> >> >>>>> compiled with -fopenmp -foffload=nvptx-none now fails with:
> >> >>>>> gcc: error: unrecognized command-line option '-m64'
> >> >>>>> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1
> >> >>>>> exit
> >> >>> status compilation terminated.
> >> >>>
> >> >>> Heh.  Yeah...
> >> >>>
> >> >>>>> As mentioned in RFC email, this happens because
> >> >>>>> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host
> >> >>>>> compiler
> >> >>> depending on whether offload_abi is OFFLOAD_ABI_LP64 or
> >> >>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
> >> >>> options.
> >> >
> >> >>> So, my idea is: instead of the current strategy that the host
> >> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc.,
> >> >>> which the 'mkoffload's then interpret and re-synthesize '-m64'
> etc.
> >> >>> -- how about we instead directly tell the 'mkoffload's the
> >> >>> relevant ABI options?  That is, 'TARGET_OFFLOAD_OPTIONS'
> instead
> >> >>> synthesizes '-foffload-abi=-m64'
> >> >>> etc., which the 'mkoffload's can then readily use.  Could you
> >> >>> please give that a try, and/or does anyone see any issues with
> that approach?
> >> >>>
> >> >>> And use something like '-foffload-abi=disable' to replace the
> current:
> >> >>>
> >> >>>/* PR libgomp/65099: Currently, we only support offloading
> in 64-bit
> >> >>>   configurations.  */
> >> >>>if (offload_abi == OFFLOAD_ABI_LP64)
> >> >>>  {
> >> >>>
> >> >>> (As discussed before, this should be done differently
> altogether,
> >> >>> but that's for another day.)
> >> >> Sorry, I don't quite follow. Currently we enable offloading if
> >> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from
> >> >> -foffload-abi=lp64. If we change -foffload-abi to instead
> specify
> >> >> host-specific ABI opts, I guess mkoffload will still need to
> >> >> somehow figure out which ABI is used, so it can disable
> offloading
> >> >> for 32-bit ? I suppose we could adjust TARGET_OFFLOAD_OPTIONS
> for
> >> >> each host to
> >> pass -foffload-abi=disable if TARGET_ILP32 is set and offload
> target
> >> is nvptx, but not sure if that'd be correct ?
> >> >
> >> > Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
> >> > implementations return either the correct host flags to be used
> by
> >> > the 'mkoffload's (the case that offloading is supported for the
> >> > current host flags/ABI configuration), or otherwise return
> >> > '-foffload-abi=disable'.
> 
> Oh..., you're right of course: we do need to continue to tell the
> 'mkoffload's which kind of offload code to generate!  My bad...
> 
> >> >> I added another option -foffload-abi-host-opts to specify host
> abi
> >> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit
> which
> >> >> mkoffload can use to enable/disable offloading (as before).
> >> >
> >>

RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-10 Thread Prathamesh Kulkarni
> -Original Message-
> From: Thomas Schwinge 
> Sent: Monday, September 9, 2024 8:50 PM
> To: Prathamesh Kulkarni ; Richard Biener
> 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub
> Jelinek 
> Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> 
> Hi Prathamesh!
Hi Thomas,
> 
> On 2024-09-09T06:31:18+, Prathamesh Kulkarni
>  wrote:
> >> -Original Message-
> >> From: Thomas Schwinge 
> >> Sent: Friday, September 6, 2024 2:31 PM On 2024-08-
> 16T15:36:29+,
> >> Prathamesh Kulkarni  wrote:
> >> >> > Am 13.08.2024 um 17:48 schrieb Thomas Schwinge
> >> >> :
> >> >> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni
> >> >>  wrote:
> >> >> >> I added another option -foffload-abi-host-opts to specify
> host
> >> abi
> >> >> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit
> >> which
> >> >> >> mkoffload can use to enable/disable offloading (as before).
> 
> >> > --- a/gcc/config/aarch64/aarch64.cc
> >> > +++ b/gcc/config/aarch64/aarch64.cc
> >> > @@ -18999,9 +18999,9 @@ static char *  aarch64_offload_options
> >> > (void)  {
> >> >if (TARGET_ILP32)
> >> > -return xstrdup ("-foffload-abi=ilp32");
> >> > +return xstrdup ("-foffload-abi=ilp32
> >> > + -foffload-abi-host-opts=-mabi=ilp32");
> >> >else
> >> > -return xstrdup ("-foffload-abi=lp64");
> >> > +return xstrdup ("-foffload-abi=lp64
> >> > + -foffload-abi-host-opts=-mabi=lp64");
> >> >  }
> >>
> >> As none of the current offload compilers is set up for ILP32, I
> >> suggest we continue to pass '-foffload-abi=ilp32' without
> >> '-foffload-abi-host-opts=[...]' -- the 'mkoffload's in that case
> >> should get to the point where the latter is used.
> 
> Oh...  I was wrong with the latter item: I failed to see that the
> 'mkoffload's still do 'compile_native' even if they don't create an
> actual offload image, sorry!
> 
> > Um, would that still possibly result in arch mismatch for host
> objects and xnvptx-none.o if we don't pass host ABI opts for ILP32 ?
> > For eg, if the host compiler defaults to 64-bit code-gen (and user
> > requests for 32-bit code gen on host), and we avoid passing host ABI
> opts for -foffload-abi=ilp32, it will generate 64-bit xnvptx-none.o
> (corresponding to empty ptx_cfile_name), while rest of the host
> objects will be 32-bit, or am I misunderstanding ?
> 
> You're quite right -- my fault.
> 
> > The attached patch avoids passing -foffload-abi-host-opts if
> > -foffload-abi=ilp32.
> 
> So, sorry for the back and forth.  I think we now agree that we do
> need '-foffload-abi-host-opts=[...]' specified in all cases (as you
> originally had), and then again unconditionally use
> 'offload_abi_host_opts' in the 'mkoffload's' 'compile_native'
> functions.
Done in the attached patch, thanks.
> 
> > Could you please test the patch for gcn backend ?
> 
> I'll do that.
> 
> > [nvptx] Pass host specific ABI opts from mkoffload.
> >
> > The patch adds an option -foffload-abi-host-opts, which is set by
> host
> > in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes it's value
> 
> "its", by the way.  ;-)
Fixed 😊
> 
> > to host_compiler.
> 
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> 
> > +foffload-abi-host-opts=
> > +Common Driver Joined MissingArgError(option missing after %qs)
> > +-foffload-abi-host-opts= Specify host ABI options.
> > +
> 
> Still need TAB between '-foffload-abi-host-opts=' and its
> help text.
Done.
> 
> > --- a/gcc/config/gcn/mkoffload.cc
> > +++ b/gcc/config/gcn/mkoffload.cc
> 
> > @@ -998,6 +996,14 @@ main (int argc, char **argv)
> >"unrecognizable argument of option %<" STR
> "%>");
> >   }
> >  #undef STR
> > +  else if (startswith (argv[i], "-foffload-abi-host-opts="))
> > + {
> > +   if (offload_abi_host_opts)
> > + fatal_error (input_location,
> > +  "-foffload-abi-host-opts specified multiple
> > + times");
> 
> ACK, but again '%<-fof

RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-10 Thread Prathamesh Kulkarni
> -Original Message-
> From: Thomas Schwinge 
> Sent: Tuesday, September 10, 2024 8:19 PM
> To: Prathamesh Kulkarni ; Richard Biener
> 
> Cc: Andrew Pinski ; gcc-patches@gcc.gnu.org; Jakub
> Jelinek 
> Subject: RE: [nvptx] Pass -m32/-m64 to host_compiler if it has
> multilib support
> 
> 
> Hi Prathamesh!
> 
> On 2024-09-10T13:22:10+, Prathamesh Kulkarni
>  wrote:
> >> -Original Message-
> >> From: Thomas Schwinge 
> >> Sent: Monday, September 9, 2024 8:50 PM
> 
> >> > Could you please test the patch for gcn backend ?
> 
> I've successfully tested x86_64 host with GCN as well as nvptx
> offloading, and also ppc64le host with nvptx offloading.
Thanks for the thorough testing!
> 
> I just realized two more minor things:
> 
> > [nvptx] Pass host specific ABI opts from mkoffload.
> >
> > The patch adds an option -foffload-abi-host-opts, which is set by
> host
> > in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes its value to
> > host_compiler.
> >
> 
> Please add here "   PR target/96265".
> 
> > gcc/ChangeLog:
> >   * common.opt (foffload-abi-host-opts): New option.
> >   * config/aarch64/aarch64.cc (aarch64_offload_options): Pass
> >   -foffload-abi-host-opts.
> >   * config/i386/i386-opts.cc (ix86_offload_options): Likewise.
> >   * config/rs6000/rs6000.cc (rs6000_offload_options): Likewise.
> >   * config/nvptx/mkoffload.cc (offload_abi_host_opts): Define.
> >   (compile_native): Append offload_abi_host_opts to
> argv_obstack.
> >   (main): Handle option -foffload-abi-host-opts.
> >   * config/gcn/mkoffload.cc (offload_abi_host_opts): Define.
> >   (compile_native): Append offload_abi_host_opts to
> argv_obstack.
> >   (main): Handle option -foffload-abi-host-opts.
> >   * lto-wrapper.cc (merge_and_complain): Handle
> >   -foffload-abi-host-opts.
> >   (append_compiler_options): Likewise.
> >   * opts.cc (common_handle_option): Likewise.
> >
> > Signed-off-by: Prathamesh Kulkarni 
> 
> Given that we're adding a new option to 'gcc/common.opt', do we need
> to update (regenerate?) 'gcc/common.opt.urls'?  (I've not yet had the
> need myself, and therefore not yet looked up how to do that.)  Or
> maybe not, given that '-foffload-abi-host-opts=[...]' isn't
> documented?
I checked common.opt.urls doesn't seem to have entry for -foffload-abi,
so I guess it's probably not necessary for -foffload-abi-host-opts either ?
Or should we do it for both the options ?
> 
> Otherwise looks good to me; OK to push (with these minor items
> addressed, as necessary), thanks!
Thanks, I have committed the patch to trunk in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e783a4a683762487cb003ae48235f3d44875de1b
Will post a follow up patch to regenerate common.opt.urls for -foffload-abi and 
-foffload-abi-host-opts if required.

Thanks,
Prathamesh
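
The flow agreed on in this thread can be sketched roughly as follows. This is a
minimal, self-contained model and not the committed GCC code: the names
host_offload_options and build_host_compile_args are hypothetical, the "gcc"
and "-c" arguments are placeholders, and treating a missing option as an error
is an assumption of the sketch. Only the option spellings and the
"specified multiple times" check are taken from the patch quoted in the thread.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the aarch64 TARGET_OFFLOAD_OPTIONS hook: the host
// reports both the generic ABI (used by mkoffload to enable or disable
// offloading) and the host-specific ABI flag to forward verbatim.
std::string host_offload_options(bool ilp32) {
  return ilp32 ? "-foffload-abi=ilp32 -foffload-abi-host-opts=-mabi=ilp32"
               : "-foffload-abi=lp64 -foffload-abi-host-opts=-mabi=lp64";
}

// Hypothetical model of the mkoffload side: find the single
// -foffload-abi-host-opts= argument and pass its value to the host compiler
// unconditionally, instead of re-synthesizing -m32/-m64.
std::vector<std::string> build_host_compile_args(
    const std::vector<std::string>& mkoffload_args) {
  const std::string prefix = "-foffload-abi-host-opts=";
  std::optional<std::string> host_opts;
  for (const std::string& arg : mkoffload_args) {
    if (arg.compare(0, prefix.size(), prefix) == 0) {
      if (host_opts)
        throw std::runtime_error(
            "-foffload-abi-host-opts specified multiple times");
      host_opts = arg.substr(prefix.size());
    }
  }
  // Assumption of this sketch: a missing option is reported as an error.
  if (!host_opts)
    throw std::runtime_error("-foffload-abi-host-opts not specified");
  // Placeholder command line; the real tool adds many more arguments.
  return {"gcc", *host_opts, "-c"};
}
```

With this shape, mkoffload no longer needs to know which flag spelling a given
host back end understands; it only forwards what the host hook supplied.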
> 
> 
> Grüße
>  Thomas
> 
> 
> > diff --git a/gcc/common.opt b/gcc/common.opt index
> > ea39f87ae71..d270e524ff4 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -2361,6 +2361,10 @@ Enum(offload_abi) String(ilp32)
> > Value(OFFLOAD_ABI_ILP32)  EnumValue
> >  Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
> >
> > +foffload-abi-host-opts=
> > +Common Joined MissingArgError(option missing after %qs)
> > +-foffload-abi-host-opts=Specify host ABI options.
> > +
> >  fomit-frame-pointer
> >  Common Var(flag_omit_frame_pointer) Optimization  When possible do
> > not generate stack frames.
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index 6a3f1a23a9f..6ccf08d1cc0
> 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -19000,9 +19000,9 @@ static char *
> >  aarch64_offload_options (void)
> >  {
> >if (TARGET_ILP32)
> > -return xstrdup ("-foffload-abi=ilp32");
> > +return xstrdup ("-foffload-abi=ilp32
> > + -foffload-abi-host-opts=-mabi=ilp32");
> >else
> > -return xstrdup ("-foffload-abi=lp64");
> > +return xstrdup ("-foffload-abi=lp64
> > + -foffload-abi-host-opts=-mabi=lp64");
> >  }
> >
> >  static struct machine_function *
> > diff --git a/gcc/config/gcn/mkoffload.cc
> b/gcc/config/gcn/mkoffload.cc
> > index b8d981878ed..345bbf7709c 100644
> > --- a/gcc/c

Re: reject decl with incomplete struct/union type in check_global_declaration()

2016-01-20 Thread Prathamesh Kulkarni
On 19 January 2016 at 16:49, Marek Polacek  wrote:
> Sorry for speaking up late, but I think we could do better with formatting
> in this patch:
>
> On Sat, Jan 16, 2016 at 03:45:22PM +0530, Prathamesh Kulkarni wrote:
>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>> index 915376d..d36fc67 100644
>> --- a/gcc/c/c-decl.c
>> +++ b/gcc/c/c-decl.c
>> @@ -4791,6 +4791,13 @@ finish_decl (tree decl, location_t init_loc, tree 
>> init,
>>  TREE_TYPE (decl) = error_mark_node;
>>}
>>
>> +  if ((RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
>> +   || TREE_CODE (TREE_TYPE (decl)) == ENUMERAL_TYPE)
>> +   && DECL_SIZE (decl) == 0 && TREE_STATIC (decl))
>
> DECL_SIZE yields a tree, so I'd rather see NULL_TREE instead of 0 here (yeah,
> the enclosing code uses 0s :().  The "&& TREE_STATIC..." should be on its own
> line.
>
>> + {
>> +   incomplete_record_decls.safe_push (decl);
>> + }
>> +
>
> Redundant braces.
>
>> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
>> index a0e0052..3c8a496 100644
>> --- a/gcc/c/c-parser.c
>> +++ b/gcc/c/c-parser.c
>> @@ -59,6 +59,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "gimple-expr.h"
>>  #include "context.h"
>>
>> +vec<tree> incomplete_record_decls = vNULL;
>
> This could use a comment.
>
>> +
>> +  for (unsigned i = 0; i < incomplete_record_decls.length (); ++i)
>> +{
>> +  tree decl = incomplete_record_decls[i];
>> +  if (DECL_SIZE (decl) == 0 && TREE_TYPE (decl) != error_mark_node)
>
> I'd s/0/NULL_TREE/.
Thanks for the review, I have done the suggested changes in this
version of the patch.
Ok for trunk ?

Thanks,
Prathamesh
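
The deferred check implemented by the patch below can be modelled outside of
GCC roughly like this. It is only a sketch under stated assumptions: Type,
Decl, finish_decl_sketch and end_of_translation_unit_sketch are hypothetical
stand-ins (a boolean "complete" flag plays the role of DECL_SIZE being
non-null), and only the diagnostic wording is taken from the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical model of a struct/union/enum type that may be completed later
// in the translation unit.
struct Type {
  bool complete;
};

// Hypothetical model of a declaration; 'errored' plays the role of
// TREE_TYPE (decl) == error_mark_node.
struct Decl {
  std::string name;
  Type* type;
  bool is_static;
  bool errored;
};

// Declarations whose type was still incomplete when they were finished.
std::vector<Decl*> incomplete_record_decls;

// Called once per declaration: remember static decls of incomplete type
// instead of diagnosing them immediately.
void finish_decl_sketch(Decl& decl) {
  if (decl.is_static && !decl.type->complete)
    incomplete_record_decls.push_back(&decl);
}

// Called after the whole translation unit has been parsed: diagnose only the
// decls whose type never became complete.
std::vector<std::string> end_of_translation_unit_sketch() {
  std::vector<std::string> errors;
  for (Decl* decl : incomplete_record_decls) {
    if (!decl->type->complete && !decl->errored) {
      errors.push_back("storage size of '" + decl->name + "' isn't known");
      decl->errored = true;
    }
  }
  return errors;
}
```

Deferring the check means a type that is completed later in the file does not
trigger a diagnostic, while a type that is never completed is reported once.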
>
> Marek
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 5830e22..1ec6042 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -4791,6 +4791,12 @@ finish_decl (tree decl, location_t init_loc, tree init,
   TREE_TYPE (decl) = error_mark_node;
 }
 
+  if ((RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
+ || TREE_CODE (TREE_TYPE (decl)) == ENUMERAL_TYPE)
+ && DECL_SIZE (decl) == NULL_TREE
+ && TREE_STATIC (decl))
+   incomplete_record_decls.safe_push (decl);
+
   if (is_global_var (decl) && DECL_SIZE (decl) != 0)
{
  if (TREE_CODE (DECL_SIZE (decl)) == INTEGER_CST)
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 919680a..1d3b9e1 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -59,6 +59,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-expr.h"
 #include "context.h"
 
+/* We need to walk over decls with incomplete struct/union/enum types
+   after parsing the whole translation unit.
+   In finish_decl(), if the decl is static, has incomplete
+   struct/union/enum type, it is appended to incomplete_record_decls.
+   In c_parser_translation_unit(), we iterate over incomplete_record_decls
+   and report error if any of the decls are still incomplete.  */ 
+
+vec<tree> incomplete_record_decls = vNULL;
+
 void
 set_c_expr_source_range (c_expr *expr,
 location_t start, location_t finish)
@@ -1421,6 +1430,16 @@ c_parser_translation_unit (c_parser *parser)
}
   while (c_parser_next_token_is_not (parser, CPP_EOF));
 }
+
+  for (unsigned i = 0; i < incomplete_record_decls.length (); ++i)
+{
+  tree decl = incomplete_record_decls[i];
+  if (DECL_SIZE (decl) == NULL_TREE && TREE_TYPE (decl) != error_mark_node)
+   {
+ error ("storage size of %q+D isn%'t known", decl);
+ TREE_TYPE (decl) = error_mark_node;
+   }
+}
 }
 
 /* Parse an external declaration (C90 6.7, C99 6.9).
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 81a3d58..cf79ba7 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -731,4 +731,6 @@ set_c_expr_source_range (c_expr *expr,
 /* In c-fold.c */
 extern tree decl_constant_value_for_optimization (tree);
 
+extern vec<tree> incomplete_record_decls;
+
 #endif /* ! GCC_C_TREE_H */
diff --git a/gcc/testsuite/gcc.dg/Wcxx-compat-8.c 
b/gcc/testsuite/gcc.dg/Wcxx-compat-8.c
index f7e8c55..4e9ddc1 100644
--- a/gcc/testsuite/gcc.dg/Wcxx-compat-8.c
+++ b/gcc/testsuite/gcc.dg/Wcxx-compat-8.c
@@ -33,6 +33,7 @@ enum e3
 
 __typeof__ (struct s5 { int i; }) v5; /* { dg-warning "invalid in C\[+\]\[+\]" 
} */
 __typeof__ (struct t5) w5; /* { dg-bogus "invalid in C\[+\]\[+\]" } */
+  /* { dg-error "storage size of 'w5' isn't known" "" { target *-*-* } 35 } */
 
 int
 f1 (struct s1 *p)
@@ -64,4 +65,4 @@ f5 ()
   return &((struct t8) { });  /* { dg-warning "invalid in C\[+\]\[+\]"

[C++ patch] report better diagnostic for static following '[' in parameter declaration

2016-01-28 Thread Prathamesh Kulkarni
Hi,
For the test-case,
void f(int a[static 10]);

g++ gives the following errors:
test-foo.cpp:1:14: error: expected primary-expression before ‘static’
 void f(int a[static 10]);
  ^
test-foo.cpp:1:14: error: expected ‘]’ before ‘static’
test-foo.cpp:1:14: error: expected ‘)’ before ‘static’
test-foo.cpp:1:14: error: expected initializer before ‘static’

and clang++ gives:
test-foo.cpp:1:13: error: static array size is a C99 feature, not
permitted in C++
void f(int a[static 10]);
^
I have attached a patch that attempts to report the same diagnostic.
With the patch, g++ reports:

test-foo.cpp:1:14: error: static array size is a C99 feature,not
permitted in C++
 void f(int a[static 10])
  ^~
test-foo.cpp:1:14: error: expected ‘]’ before ‘static’
test-foo.cpp:1:14: error: expected ‘)’ before ‘static’
test-foo.cpp:1:14: error: expected initializer before ‘static’

I tried to remove the 3 errors that follow it (expected X before static)
but without luck :/

Bootstrapped and tested on x86_64-unknown-linux-gnu.
OK for trunk ?

Thanks,
Prathamesh
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d03b0c9..4d3e38a 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19016,10 +19017,22 @@ cp_parser_direct_declarator (cp_parser* parser,
  cp_lexer_consume_token (parser->lexer);
  /* Peek at the next token.  */
  token = cp_lexer_peek_token (parser->lexer);
+
+ /* If static keyword immediately follows [, report error.  */
+ if (cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC)
+ && current_binding_level->kind == sk_function_parms)
+   {
+ error_at (token->location,
+   "static array size is a C99 feature,"
+   "not permitted in C++");
+ bounds = error_mark_node;
+   }
+
  /* If the next token is `]', then there is no
 constant-expression.  */
- if (token->type != CPP_CLOSE_SQUARE)
+ else if (token->type != CPP_CLOSE_SQUARE)
{
+
  bool non_constant_p;
  bounds
= cp_parser_constant_expression (parser,
diff --git a/gcc/testsuite/g++.dg/parse/static-array-error.C 
b/gcc/testsuite/g++.dg/parse/static-array-error.C
new file mode 100644
index 000..8b58588
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/static-array-error.C
@@ -0,0 +1,6 @@
+// { dg-do compile }
+
+void f(int a[static 10]);  /* { dg-error "static array size is a C99 feature" 
} */
+/* { dg-error "expected ']' before 'static'" "" { target *-*-* } 3 } */
+/* { dg-error "expected ')' before 'static'" "" { target *-*-* } 3 } */
+/* { dg-error "expected initializer before 'static'" "" { target *-*-* } 3 } */




Re: [C++ patch] report better diagnostic for static following '[' in parameter declaration

2016-01-29 Thread Prathamesh Kulkarni
On 29 January 2016 at 05:03, Marek Polacek  wrote:
> On Fri, Jan 29, 2016 at 04:46:56AM +0530, Prathamesh Kulkarni wrote:
>> @@ -19016,10 +19017,22 @@ cp_parser_direct_declarator (cp_parser* parser,
>> cp_lexer_consume_token (parser->lexer);
>> /* Peek at the next token.  */
>> token = cp_lexer_peek_token (parser->lexer);
>> +
>> +   /* If static keyword immediately follows [, report error.  */
>> +   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC)
>> +   && current_binding_level->kind == sk_function_parms)
>> + {
>> +   error_at (token->location,
>> + "static array size is a C99 feature,"
>> + "not permitted in C++");
>> +   bounds = error_mark_node;
>> + }
>> +
>
> I think this isn't sufficient as-is; if we're changing the diagnostics here,
> we should also handle e.g. void f(int a[const 10]); where clang++ says
> g.C:1:13: error: qualifier in array size is a C99 feature, not permitted in 
> C++
>
> And also e.g.
> void f(int a[const static 10]);
> void f(int a[static const 10]);
> and similar.
Thanks for the review. AFAIK the type-qualifiers would be const,
restrict, volatile and _Atomic (n1570, 6.7.3)?
I added a check for those and for variable-length arrays.
I am having trouble writing the test-case:
some cases pass with -std=c++11 but fail with -std=c++98.
Could you please have a look?

Thanks,
Prathamesh
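
The decision the patch below makes after consuming '[' can be summarised by
the following stand-alone sketch. It works on token spellings rather than on
the real cp_lexer, and the function name c99_array_declarator_diagnostic is
hypothetical; only the three diagnostic strings and the list of qualifiers
(const, volatile, restrict, _Atomic) come from the patch and the discussion
above.

```cpp
#include <string>

// Returns the diagnostic to emit for the token that immediately follows '['
// in an array declarator, or an empty string if no C99-specific diagnostic
// applies. Like the patch, it only diagnoses inside a function parameter list.
std::string c99_array_declarator_diagnostic(const std::string& token,
                                            bool in_function_parms) {
  if (!in_function_parms)
    return "";
  if (token == "static")
    return "static array size is a C99 feature, not permitted in C++";
  if (token == "const" || token == "volatile" || token == "restrict"
      || token == "_Atomic")
    return "qualifier in array size is a C99 feature, not permitted in C++";
  if (token == "*")
    return "variable-length array size is a C99 feature, "
           "not permitted in C++";
  return "";
}
```

For example, the parameter declaration int a[const static 10] would be
reported through the qualifier branch, since const is the first token after
the bracket.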
>
> Marek
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 33f1df3..04137b3 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -982,6 +982,24 @@ cp_lexer_next_token_is_decl_specifier_keyword (cp_lexer 
*lexer)
 }
 }
 
+static bool
+cp_lexer_next_token_is_c_type_qual (cp_lexer *lexer)
+{
+  if (cp_lexer_next_token_is_keyword (lexer, RID_CONST) 
+  || cp_lexer_next_token_is_keyword (lexer, RID_VOLATILE))
+return true;
+
+  cp_token *token = cp_lexer_peek_token (lexer);
+  if (token->type == CPP_NAME)
+{
+  tree name = token->u.value;
+  const char *p = IDENTIFIER_POINTER (name);
+  return !strcmp (p, "restrict") || !strcmp (p, "_Atomic");
+}
+
+  return false;
+}
+
 /* Returns TRUE iff the token T begins a decltype type.  */
 
 static bool
@@ -18998,10 +19016,40 @@ cp_parser_direct_declarator (cp_parser* parser,
  cp_lexer_consume_token (parser->lexer);
  /* Peek at the next token.  */
  token = cp_lexer_peek_token (parser->lexer);
+
+ /* If static or type-qualifier or * immediately follows [,
+report error.  */
+ if (current_binding_level->kind == sk_function_parms)
+   {
+ if (cp_lexer_next_token_is_keyword (parser->lexer, RID_STATIC))
+   {
+   error_at (token->location,
+ "static array size is a C99 feature, "
+ "not permitted in C++");
+   bounds = error_mark_node;
+   }
+ else if (cp_lexer_next_token_is_c_type_qual (parser->lexer))
+   {
+   error_at (token->location,
+ "qualifier in array size is a C99 feature, "
+ "not permitted in C++");
+   bounds = error_mark_node;
+   }
+ 
+ else if (token->type == CPP_MULT)
+   {
+   error_at (token->location,
+ "variable-length array size is a C99 feature, "
+ "not permitted in C++");
+   bounds = error_mark_node;
+   }
+   }
+
  /* If the next token is `]', then there is no
 constant-expression.  */
- if (token->type != CPP_CLOSE_SQUARE)
+ if (token->type != CPP_CLOSE_SQUARE && bounds != error_mark_node)
{
+
  bool non_constant_p;
  bounds
= cp_parser_constant_expression (parser,
diff --git a/gcc/testsuite/g++.dg/parse/static-array-error.C 
b/gcc/testsuite/g++.dg/parse/static-array-error.C
new file mode 100644
index 000..028320d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/static-array-error.C
@@ -0,0 +1,33 @@
+// { dg-do compile }
+
+void f1(int a[static 10]);  /* { dg-error "static array size is a C99 feature" 
} */
+/* { dg-error "expected '\\]' before 'static'" "" { target *-*-* } 3 } */
+/* { dg-error "expected '\\)' before 'static'" "" { target *-*-* } 3 } */
+/* { dg-error "expected initializer before 'static'" "

Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2016-02-05 Thread Prathamesh Kulkarni
On 4 February 2016 at 16:31, Ramana Radhakrishnan
 wrote:
> On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni
>  wrote:
>> On 31 July 2015 at 15:04, Ramana Radhakrishnan
>>  wrote:
>>>
>>>
>>> On 29/07/15 11:09, Prathamesh Kulkarni wrote:
>>>> Hi,
>>>> This patch tries to implement division with multiplication by
>>>> reciprocal using vrecpe/vrecps
>>>> with -funsafe-math-optimizations and -freciprocal-math enabled.
>>>> Tested on arm-none-linux-gnueabihf using qemu.
>>>> OK for trunk ?
>>>>
>>>> Thank you,
>>>> Prathamesh
>>>>
>>>
>>> I've tried this in the past and never been convinced that 2 iterations are 
>>> enough to get to stability with this given that the results are only 
>>> precise for 8 bits / iteration. Thus I've always believed you need 3 
>>> iterations rather than 2 at which point I've never been sure that it's 
>>> worth it. So the testing that you've done with this currently is not enough 
>>> for this to go into the tree.
>>>
>>> I'd like this to be tested on a couple of different AArch32 implementations 
>>> with a wider range of inputs to verify that the results are acceptable as 
>>> well as running something like SPEC2k(6) with at least one iteration to 
>>> ensure correctness.
>> Hi,
>> I got results of SPEC2k6 fp benchmarks:
>> a15: +0.64% overall, 481.wrf: +6.46%
>> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
>> a57: +0.35% overall, 481.wrf: +3.84%
>> The other benchmarks had (almost) identical results.
>
> Thanks for the benchmarking results -  Please repost the patch with
> the changes that I had requested in my previous review - given it is
> now stage4 , I would rather queue changes like this for stage1 now.
Hi,
Please find the updated patch attached.
It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and
arm-none-eabi.
However, the test-case added in the patch (neon-vect-div-1.c) fails to
get vectorized at -O2 for armeb-none-linux-gnueabihf.
Charles suggested that I try -O3, which worked.
It appears the test-case fails to get vectorized with
-fvect-cost-model=cheap (which is enabled by default at -O2)
and passes with -fno-vect-cost-model / -fvect-cost-model=dynamic.

I can't figure out why it fails with -fvect-cost-model=cheap.
From the vect dump (attached):
neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9

Thanks,
Prathamesh
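
For readers unfamiliar with the scheme, the expansion discussed in this thread
(a reciprocal estimate followed by Newton-Raphson refinement steps) can be
modelled in plain C++ as below. This is a sketch and not the NEON
implementation: it uses double arithmetic, the function names are
hypothetical, and the 8-fractional-bit truncation in recip_estimate is an
assumption standing in for the limited precision of the hardware estimate.
The refinement step mirrors VRECPS, which computes 2 - a * b.

```cpp
#include <cassert>
#include <cmath>

// Coarse reciprocal estimate: keep only 8 fractional bits of the mantissa of
// 1/d (an assumed model of a low-precision hardware estimate).
double recip_estimate(double d) {
  int exponent = 0;
  double mantissa = std::frexp(1.0 / d, &exponent);  // 0.5 <= |mantissa| < 1
  double coarse = std::floor(mantissa * 256.0) / 256.0;
  return std::ldexp(coarse, exponent);
}

// One refinement factor, as computed by VRECPS: 2 - a * b.
double recip_step(double a, double b) {
  return 2.0 - a * b;
}

// n / d computed as n * (1 / d), with 'iterations' Newton-Raphson steps
// x' = x * (2 - d * x) applied to the initial estimate.
double approx_div(double n, double d, int iterations) {
  assert(d != 0.0 && std::isfinite(d));
  double rec = recip_estimate(d);
  for (int i = 0; i < iterations; ++i)
    rec = rec * recip_step(rec, d);
  return n * rec;
}
```

If the estimate is x = (1 - e) / d, one step yields (1 - e * e) / d, so in
this idealised model the relative error is squared by every iteration; how
that translates to single-precision NEON results is exactly what the requested
testing is meant to establish.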
>
> Thanks,
> Ramana
>
>>
>> Thanks,
>> Prathamesh
>>>
>>>
>>> moving on to the patches.
>>>
>>>> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
>>>> index 654d9d5..28c2e2a 100644
>>>> --- a/gcc/config/arm/neon.md
>>>> +++ b/gcc/config/arm/neon.md
>>>> @@ -548,6 +548,32 @@
>>>>  (const_string "neon_mul_")))]
>>>>  )
>>>>
>>>
>>> Please add a comment here.
>>>
>>>> +(define_expand "div<mode>3"
>>>> +  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
>>>> +(div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
>>>> +   (match_operand:VCVTF 2 "s_register_operand" "w")))]
>>>
>>> I want to double check that this doesn't collide with Alan's patches for 
>>> FP16 especially if he reuses the VCVTF iterator for all the vcvt f16 cases.
>>>
>>>> +  "TARGET_NEON && flag_unsafe_math_optimizations && flag_reciprocal_math"
>>>> +  {
>>>> +rtx rec = gen_reg_rtx (<MODE>mode);
>>>> +rtx vrecps_temp = gen_reg_rtx (<MODE>mode);
>>>> +
>>>> +/* Reciprocal estimate */
>>>> +emit_insn (gen_neon_vrecpe<mode> (rec, operands[2]));
>>>> +
>>>> +/* Perform 2 iterations of Newton-Raphson method for better accuracy */
>>>> +for (int i = 0; i < 2; i++)
>>>> +  {
>>>> + emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2]));
>>>> + emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp));
>>>> +  }
>>>> +
>>>> +/* We now have reciprocal in rec, perform operands[0] = operands[1] * rec */
>>>> +emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec));
>>>> +DONE;
>>>> +  }
>>>> +)
&g

add check for aarch64 in check_effective_target_section_anchors()

2016-02-11 Thread Prathamesh Kulkarni
Hi,
aarch64 supports section anchors, but it appears that
check_effective_target_section_anchors() doesn't contain an entry for it.
This patch adds an entry for aarch64.
OK for trunk ?

Thanks,
Prathamesh
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 645981a..66fb1ea 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5467,7 +5467,8 @@ proc check_effective_target_section_anchors { } {
 } else {
 set et_section_anchors_saved 0
 if { [istarget powerpc*-*-*]
- || [istarget arm*-*-*] } {
+ || [istarget arm*-*-*] 
+ || [istarget aarch64*-*-*] } {
set et_section_anchors_saved 1
 }
 }




Re: add check for aarch64 in check_effective_target_section_anchors()

2016-02-15 Thread Prathamesh Kulkarni
On 15 February 2016 at 19:24, James Greenhalgh  wrote:
> On Thu, Feb 11, 2016 at 11:03:23PM +0530, Prathamesh Kulkarni wrote:
>> Hi,
>> aarch64 supports section anchors, but it appears that
>> check_effective_target_section_anchors() doesn't contain an entry for it.
>> This patch adds an entry for aarch64.
>> OK for trunk ?
>
> OK. I presume you tested this, and the testcases this enables PASS without
> issue?
Yes, the previously unsupported test-cases for section anchors now pass.
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/233425-target-supports/aarch64-none-linux-gnu/diff-gcc-rh60-aarch64-none-linux-gnu-default-default-default.txt
Tested with aarch64-none-linux-gnu, aarch64-none-elf, and aarch64_be-none-elf.
Committed as r233426.

Thanks,
Prathamesh
>
> Thanks,
> James
>
>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>> b/gcc/testsuite/lib/target-supports.exp
>> index 645981a..66fb1ea 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -5467,7 +5467,8 @@ proc check_effective_target_section_anchors { } {
>>  } else {
>>  set et_section_anchors_saved 0
>>  if { [istarget powerpc*-*-*]
>> -   || [istarget arm*-*-*] } {
>> +   || [istarget arm*-*-*]
>> +   || [istarget aarch64*-*-*] } {
>> set et_section_anchors_saved 1
>>  }
>>  }
>
>


Re: [PATCH] Fix PR69951

2016-03-01 Thread Prathamesh Kulkarni
On 1 March 2016 at 16:19, Richard Biener  wrote:
> On Tue, 1 Mar 2016, Ramana Radhakrishnan wrote:
>
>>
>>
>> On 01/03/16 09:54, Richard Biener wrote:
>> > On Tue, 1 Mar 2016, James Greenhalgh wrote:
>> >
>> >> On Tue, Mar 01, 2016 at 10:21:27AM +0100, Richard Biener wrote:
>> >>> On Mon, 29 Feb 2016, James Greenhalgh wrote:
>> >>>
>>  On Fri, Feb 26, 2016 at 09:32:53AM +0100, Richard Biener wrote:
>> >
>> > The following fixes PR69951, hopefully the last case of decl alias
>> > issues with alias analysis.  This time it's points-to and the DECL_UIDs
>> > used in points-to sets not being canonicalized.
>> >
>> > The simplest (and cheapest) fix is to make aliases refer to the
>> > ultimate alias target via their DECL_PT_UID which we conveniently
>> > have available.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>> >
>> > Richard.
>> >
>> > 2016-02-26  Richard Biener  
>> >
>> > PR tree-optimization/69551
>> > * tree-ssa-structalias.c (get_constraint_for_ssa_var): When
>> > looking through aliases adjust DECL_PT_UID to refer to the
>> > ultimate alias target.
>> >
>> > * gcc.dg/torture/pr69951.c: New testcase.
>> 
>>  I see this new testcase failing on an ARM target as so:
>> 
>>  /tmp/ccChjoFc.s: Assembler messages:
>>  /tmp/ccChjoFc.s:21: Warning: [-mwarn-syms]: Assignment makes a 
>>  symbol match an ARM instruction: b
>> 
>>  FAIL: gcc.dg/torture/pr69951.c   -O0  (test for excess errors)
>> 
>>  But I haven't managed to reproduce it outside of the test environment.
>> 
>>  The fix looks trivial, rename b to anything else you fancy (well... stay
>>  clear of add and ldr). I'll put a fix in myself if I can manage to get
>>  this to reproduce - though if anyone else wants to do it I won't be
>>  offended :-).
>> >>>
>> >>> Huh, I wonder what's the use of such warning.  After all 'ldr' is a valid
>> >>> C symbol name, too.  In fact my cross arm as doesn't report this
>> >>> warning (binutils 2.25.0)
>> >>>
>>  arm-suse-linux-gnueabi-as t.s -mwarn-syms
>> >>> Assembler messages:
>> >>> Error: unrecognized option -mwarn-syms
>> >>
>> >> Right, I've figured out the set of conditions... You need to be targeting
>> >> an arm-*-linux-* system to make sure that the ASM_OUTPUT_DEF definition
>> >> from config/arm/linux-elf.h is pulled in. This causes us to emit:
>> >>
>> >> b = a
>> >>
>> >> Rather than
>> >>
>> >>    .set b,a
>> >>
>> >> Writing it as "b = a" causes the warning added to resolve binutils
>> >> PR18347 [1] to kick in (so you need binutils > 2.26 or to have backported
>> >> that patch).
>> >>
>> >> Resolving it by hacking the testcase would be one fix, but I wonder why 
>> >> the
>> >> ARM port prefers to emit "b = a" in a linux environment if .set does the
>> >> same thing and always avoids the warning? Maybe Ramana/Richard/Kyrill/Nick
>> >> remember?
>> >> (AArch64 does the same thing, but the AArch64 gas port doesn't
>> >> have the PR18347 fix).
>> >
>> > So does b = a define a macro then and the warning is to avoid you
>> > doing
>>
>> I don't think this is a macro; b = a seems to be a way of setting b to
>> the value of a in the assembler. If a is an expression, then I
>> believe the expression is resolved at assemble time (b ends up being a
>> symbol in the symbol table produced with the value of a), in this case
>> the address of a. .set b, a achieves the same thing from my experiments
>> and reading of the sources. The reason ports appear to choose not to use
>> the .set b, a idiom is if the assembler syntax has hijacked the .set
>> directive for something else. Thus I don't see why we use the
>> ASM_OUTPUT_DEF form in the GNU/Linux port TBH rather than the .set form,
>> especially as we don't reuse .set for anything else in the ARM assembler
>> port and SET_ASM_OP is defined in config/arm/aout.h.
>>
>> The use of .set in the arm port of glibc for assembler code for the same
>> purpose seems to also vindicate that kind of thought.
>>  No reasons were given here[1], maybe Nick or Richard remember from
>> nearly 18 years ago ;)
>>
>>
>> Therefore this seems to be an assembler bug to me in that it doesn't
>> allow such an assignment of values, and a backend wart to me that we
>> have ASM_OUTPUT_DEF defined for no good reason. So, a patch that removes
>> ASM_OUTPUT_DEF from linux-elf.h seems obvious to me pending testing.
>>
>>
>> Nick , Richard - any thoughts ?
>
> So - why does it warn at all for this?  And why does it only warn
> for b = a and not .set b, a?
The rationale for that appears to be in comment 3 for binutils PR18347:
https://sourceware.org/bugzilla/show_bug.cgi?id=18347

"As far as adding some way to suppress the warning... Instruction set
extensions mean
that an acceptable symbol one day will cause a warning tomorrow.
Having some way 

[genmatch] reject empty c_expr

2015-07-15 Thread Prathamesh Kulkarni
Hi,
We allow c_expr to be empty which accepts cases like the following:

(simplify
  match-operand
  (if ()
result-operand))

(simplify
  match-operand
  {})

The attached patch rejects empty c_expr.
Ok for trunk after bootstrap + test ?

Thank you,
Prathamesh
2015-07-15  Prathamesh Kulkarni  

* genmatch.c (parse_c_expr): Reject empty c_expr.
Index: genmatch.c
===
--- genmatch.c  (revision 225834)
+++ genmatch.c  (working copy)
@@ -3375,6 +3375,7 @@
   unsigned opencnt;
   vec<cpp_token> code = vNULL;
   unsigned nr_stmts = 0;
+  bool empty = true;
   eat_token (start);
   if (start == CPP_OPEN_PAREN)
     end = CPP_CLOSE_PAREN;
@@ -3394,6 +3395,7 @@
                && --opencnt == 0)
         break;
 
+      empty = false;
       /* This is a lame way of counting the number of statements.  */
       if (token->type == CPP_SEMICOLON)
         nr_stmts++;
@@ -3412,6 +3414,10 @@
       code.safe_push (*token);
     }
   while (1);
+
+  if (empty)
+    fatal_at (token, "c_expr cannot be empty");
+
   return new c_expr (r, code, nr_stmts, vNULL, capture_ids);
 }
 


Re: [genmatch] reject empty c_expr

2015-07-16 Thread Prathamesh Kulkarni
On 16 July 2015 at 12:39, Richard Biener  wrote:
> On Wed, 15 Jul 2015, Prathamesh Kulkarni wrote:
>
>> Hi,
>> We allow c_expr to be empty which accepts cases like the following:
>>
>> (simplify
>>   match-operand
>>   (if ()
>> result-operand))
>>
>> (simplify
>>   match-operand
>>   {})
>
> Yes we do.  We also do not reject various other "bad" forms like
>
>  { ( blah! }
>
> so I am not sure treating empty ones specially makes sense.  After
> all a c-expr is just a list of preprocessing tokens we re-inject
> into the generated C code.
Well the empty () causes genmatch to segfault.
(simplify
  (plus @x @y)
  (if ()
    @x))

segfaults here at genmatch.c:2583
output_line_directive (f, ife->cond->code[0].src_loc);

IIUC, since we bail out early from parse_c_expr on "()", code remains
vNULL and accessing code[0] leads to segfault.
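
To make the failure mode concrete, here is a minimal standalone sketch of
the same situation. It does not use genmatch's real types: Token,
parse_body and first_line are hypothetical stand-ins for cpp_token,
parse_c_expr and the code[0].src_loc access, and std::vector stands in
for vec. It only illustrates the idea that an empty body should be
rejected in the parser, so that a later consumer can rely on code[0]
existing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a preprocessing token; not genmatch's cpp_token.
struct Token {
  std::string text;
  int line;
};

// Collect the tokens strictly between a matching pair of delimiters.
// toks[pos] must be the opening delimiter.  Following the idea of the
// patch, a body with no tokens at all is rejected here instead of being
// handed on as an empty vector.
std::vector<Token> parse_body(const std::vector<Token>& toks, std::size_t pos,
                              const std::string& open,
                              const std::string& close) {
  if (pos >= toks.size() || toks[pos].text != open)
    throw std::runtime_error("expected opening delimiter");

  std::vector<Token> code;
  unsigned depth = 1;
  for (std::size_t i = pos + 1; i < toks.size(); ++i) {
    if (toks[i].text == open) {
      ++depth;
    } else if (toks[i].text == close && --depth == 0) {
      if (code.empty())
        throw std::runtime_error("c_expr cannot be empty");
      return code;
    }
    code.push_back(toks[i]);  // nested delimiters are kept as tokens
  }
  throw std::runtime_error("unterminated c_expr");
}

// A consumer that, like the code generator, reads the first token's
// location.  at() throws on an empty vector rather than reading out of
// bounds, but with the check above it should never see one.
int first_line(const std::vector<Token>& code) {
  return code.at(0).line;
}
```

With this shape, the tokens "(" ")" are diagnosed at parse time with
"c_expr cannot be empty", while "(" "x" ")" yields a one-token body whose
first location can be read safely.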

backtrace with gdb:
#0  operator[] (ix=0, this=0x6bb628) at ../../src/gcc/vec.h:1180
#1  dt_simplify::gen_1 (this=this@entry=0x6bac30,
f=f@entry=0x77dd4400 <_IO_2_1_stdout_>, indent=indent@entry=10,
gimple=gimple@entry=true, result=0x6bb5e0) at
../../src/gcc/genmatch.c:2583
#2  0x00413430 in dt_simplify::gen (this=0x6bac30,
f=0x77dd4400 <_IO_2_1_stdout_>, indent=10, gimple=<optimized out>)
at ../../src/gcc/genmatch.c:2873
#3  0x0040de6f in dt_node::gen_kids_1
(this=this@entry=0x6babc0, f=f@entry=0x77dd4400 <_IO_2_1_stdout_>,
indent=indent@entry=8, gimple=gimple@entry=true,
gimple_exprs=gimple_exprs@entry=...,
generic_exprs=generic_exprs@entry=..., fns=fns@entry=...,
generic_fns=generic_fns@entry=..., preds=preds@entry=...,
others=others@entry=...) at ../../src/gcc/genmatch.c:2510
#4  0x0040ecf3 in dt_node::gen_kids (this=0x6babc0,
f=0x77dd4400 <_IO_2_1_stdout_>, indent=8, gimple=true) at
../../src/gcc/genmatch.c:2312
#5  0x0040f491 in dt_operand::gen (this=0x6babc0,
f=0x77dd4400 <_IO_2_1_stdout_>, indent=8, gimple=<optimized out>)
at ../../src/gcc/genmatch.c:2548
#6  0x0040e3a2 in dt_node::gen_kids (this=0x6bab80,
f=0x77dd4400 <_IO_2_1_stdout_>, indent=8, gimple=true) at
../../src/gcc/genmatch.c:2298
#7  0x0040f491 in dt_operand::gen (this=0x6bab80,
f=0x77dd4400 <_IO_2_1_stdout_>, indent=8, gimple=<optimized out>)
at ../../src/gcc/genmatch.c:2548
#8  0x0040e3a2 in dt_node::gen_kids (this=0x6bab40,
f=0x77dd4400 <_IO_2_1_stdout_>, indent=8, gimple=true) at
../../src/gcc/genmatch.c:2298
#9  0x0040f667 in decision_tree::gen_gimple
(this=0x7fffde00, f=0x77dd4400 <_IO_2_1_stdout_>) at
../../src/gcc/genmatch.c:2913
#10 0x0040835e in main (argc=7056432, argv=0x0) at
../../src/gcc/genmatch.c:4135

Thanks,
Prathamesh
>
>> The attached patch rejects empty c_expr.
>> Ok for trunk after bootstrap + test ?
>>
>> Thank you,
>> Prathamesh
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
> Norton, HRB 21284 (AG Nuernberg)

