Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-12 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Thu, Oct 12, 2023 at 11:54:14AM +0100, Richard Sandiford wrote:
>> Jakub Jelinek  writes:
>> > @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c
>> >   unsigned int xlen, unsigned int xprecision,
>> >   unsigned int precision, unsigned int shift)
>> >  {
>> > -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, shift);
>> > +  /* Work out how many blocks are needed to store the significant bits
>> > + (excluding the upper zeros or signs).  */
>> > +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
>> > +  unsigned int len = blocks_needed;
>> > +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
>> > +  && len > xlen
>> > +  && xval[xlen - 1] >= 0)
>> > +len = xlen;
>> 
>> I think here too it would be worth dropping the:
>> 
>>   UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
>> 
>> part of the condition, since presumably the change should be safe
>> regardless of that.
>
> If so, there is also one spot in lshift_large as well.  So incrementally:
>
> --- gcc/wide-int.cc   2023-10-11 14:41:23.719132402 +0200
> +++ gcc/wide-int.cc   2023-10-11 14:41:23.719132402 +0200
> @@ -2013,8 +2013,7 @@
>  
>/* The whole-block shift fills with zeros.  */
>unsigned int len = BLOCKS_NEEDED (precision);
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> -len = xlen + skip + 1;
> +  len = MIN (xlen + skip + 1, len);
>for (unsigned int i = 0; i < skip; ++i)
>  val[i] = 0;
>  
> @@ -2079,9 +2078,7 @@
>   (excluding the upper zeros or signs).  */
>unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
>unsigned int len = blocks_needed;
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> -  && len > xlen
> -  && xval[xlen - 1] >= 0)
> +  if (len > xlen && xval[xlen - 1] >= 0)
>  len = xlen;
>  
>rshift_large_common (val, xval, xlen, shift, len);
> @@ -2114,9 +2111,7 @@
>/* Work out how many blocks are needed to store the significant bits
>   (excluding the upper zeros or signs).  */
>unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> -  unsigned int len = blocks_needed;
> -  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen)
> -len = xlen;
> +  unsigned int len = MIN (xlen, blocks_needed);
>  
>rshift_large_common (val, xval, xlen, shift, len);
>  
> which I'll test soon.

LGTM.

>> OK for the wide-int parts with those changes.
>
> Thanks.  What do you think about that
> --- gcc/wide-int.h.jj 2023-10-11 12:05:47.718059477 +0200
> +++ gcc/wide-int.h  2023-10-11 13:51:56.081552500 +0200
> @@ -1635,6 +1635,8 @@ widest_int_storage <N>::write_val (unsig
>u.valp = XNEWVEC (HOST_WIDE_INT, l);
>return u.valp;
>  }
> +  else if (CHECKING_P && l < WIDE_INT_MAX_INL_ELTS)
> +u.val[l] = HOST_WIDE_INT_UC (0xbaaddeadbeef);
>return u.val;
>  }
>  
> @@ -1650,6 +1652,9 @@ widest_int_storage <N>::set_len (unsigne
>memcpy (u.val, valp, l * sizeof (u.val[0]));
>XDELETEVEC (valp);
>  }
> +  else if (len && len < WIDE_INT_MAX_INL_ELTS)
> +gcc_checking_assert ((unsigned HOST_WIDE_INT) u.val[len]
> +  == HOST_WIDE_INT_UC (0xbaaddeadbeef));
>len = l;
>/* There are no excess bits in val[len - 1].  */
>STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0);
>
> part, shall that go into trunk as well or is that too much slowdown
> for checking builds?

I don't have a good intuition about how big the slowdown will be,
but FWIW I agree with Richi that it'd be better to include the change.
We can always take it out again if it proves to be unexpectedly expensive.
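
For reference, a minimal standalone sketch of the poison-and-check idea in
the quoted hunks (the names and the separate estimate bookkeeping are
invented for this toy; the real widest_int_storage is more involved):

#include <cassert>
#include <cstdint>

struct toy_storage
{
  static const unsigned int max_inl = 9; // stands in for WIDE_INT_MAX_INL_ELTS
  int64_t val[max_inl];
  unsigned int estimate = 0, len = 0;

  // The caller promises to write at most E elements.  Poison the slot
  // just past the estimate so an under-estimate is caught later.
  int64_t *write_val (unsigned int e)
  {
    estimate = e;
    if (e < max_inl)
      val[e] = 0xbaaddeadbeefLL;   // the sentinel from the patch
    return val;
  }

  // The real length is only known now.  If some operation wrote past its
  // estimated upper bound, the sentinel was clobbered and this fires.
  void set_len (unsigned int l)
  {
    if (estimate && estimate < max_inl)
      assert ((uint64_t) val[estimate] == 0xbaaddeadbeefULL);
    len = l;
  }
};

int main ()
{
  toy_storage s;
  int64_t *p = s.write_val (2);  // promise: at most 2 limbs
  p[0] = 42;
  s.set_len (1);                 // fine: wrote fewer limbs than promised
  // p[2] = 7; s.set_len (3);    // would trip the assert: estimate too low
}

The checking cost is one extra store per write_val and one compare per
set_len, which is what the slowdown question above is about.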

Thanks,
Richard


Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-12 Thread Jakub Jelinek
On Thu, Oct 12, 2023 at 11:54:14AM +0100, Richard Sandiford wrote:
> Jakub Jelinek  writes:
> > @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c
> >unsigned int xlen, unsigned int xprecision,
> >unsigned int precision, unsigned int shift)
> >  {
> > -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, shift);
> > +  /* Work out how many blocks are needed to store the significant bits
> > + (excluding the upper zeros or signs).  */
> > +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> > +  unsigned int len = blocks_needed;
> > +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> > +  && len > xlen
> > +  && xval[xlen - 1] >= 0)
> > +len = xlen;
> 
> I think here too it would be worth dropping the:
> 
>   UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> 
> part of the condition, since presumably the change should be safe
> regardless of that.

If so, there is also one spot in lshift_large as well.  So incrementally:

--- gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200
+++ gcc/wide-int.cc 2023-10-11 14:41:23.719132402 +0200
@@ -2013,8 +2013,7 @@
 
   /* The whole-block shift fills with zeros.  */
   unsigned int len = BLOCKS_NEEDED (precision);
-  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
-len = xlen + skip + 1;
+  len = MIN (xlen + skip + 1, len);
   for (unsigned int i = 0; i < skip; ++i)
 val[i] = 0;
 
@@ -2079,9 +2078,7 @@
  (excluding the upper zeros or signs).  */
   unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
   unsigned int len = blocks_needed;
-  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
-  && len > xlen
-  && xval[xlen - 1] >= 0)
+  if (len > xlen && xval[xlen - 1] >= 0)
 len = xlen;
 
   rshift_large_common (val, xval, xlen, shift, len);
@@ -2114,9 +2111,7 @@
   /* Work out how many blocks are needed to store the significant bits
  (excluding the upper zeros or signs).  */
   unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
-  unsigned int len = blocks_needed;
-  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen)
-len = xlen;
+  unsigned int len = MIN (xlen, blocks_needed);
 
   rshift_large_common (val, xval, xlen, shift, len);
 
which I'll test soon.
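
As a toy illustration of why the clamp is safe for the inline-buffer case
too (standalone code with made-up names, not the GCC sources): in the
canonical representation, blocks at or above len are implied sign copies of
the top stored block, so when xval[xlen - 1] >= 0 the dropped high blocks
were all zeros anyway.

#include <cassert>
#include <cstdint>

// Block I of a canonical sign-extended number {VAL, LEN}.
static int64_t
block (const int64_t *val, unsigned int len, unsigned int i)
{
  return i < len ? val[i] : (val[len - 1] < 0 ? -1 : 0);
}

int main ()
{
  // x = 5 stored with xlen == 1.  Reading it back as if it had
  // blocks_needed == 9 blocks yields the same value block by block,
  // so clamping len to xlen loses nothing when the top block is >= 0.
  int64_t x[1] = { 5 };
  for (unsigned int i = 0; i < 9; ++i)
    assert (block (x, 1, i) == (i == 0 ? 5 : 0));
}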

> OK for the wide-int parts with those changes.

Thanks.  What do you think about that
--- gcc/wide-int.h.jj   2023-10-11 12:05:47.718059477 +0200
+++ gcc/wide-int.h  2023-10-11 13:51:56.081552500 +0200
@@ -1635,6 +1635,8 @@ widest_int_storage <N>::write_val (unsig
   u.valp = XNEWVEC (HOST_WIDE_INT, l);
   return u.valp;
 }
+  else if (CHECKING_P && l < WIDE_INT_MAX_INL_ELTS)
+u.val[l] = HOST_WIDE_INT_UC (0xbaaddeadbeef);
   return u.val;
 }
 
@@ -1650,6 +1652,9 @@ widest_int_storage <N>::set_len (unsigne
   memcpy (u.val, valp, l * sizeof (u.val[0]));
   XDELETEVEC (valp);
 }
+  else if (len && len < WIDE_INT_MAX_INL_ELTS)
+gcc_checking_assert ((unsigned HOST_WIDE_INT) u.val[len]
+== HOST_WIDE_INT_UC (0xbaaddeadbeef));
   len = l;
   /* There are no excess bits in val[len - 1].  */
   STATIC_ASSERT (N % HOST_BITS_PER_WIDE_INT == 0);

part, shall that go into trunk as well or is that too much slowdown
for checking builds?

Jakub



Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-12 Thread Richard Sandiford
Jakub Jelinek  writes:
> @@ -2036,11 +2075,20 @@ wi::lrshift_large (HOST_WIDE_INT *val, c
>  unsigned int xlen, unsigned int xprecision,
>  unsigned int precision, unsigned int shift)
>  {
> -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, shift);
> +  /* Work out how many blocks are needed to store the significant bits
> + (excluding the upper zeros or signs).  */
> +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> +  unsigned int len = blocks_needed;
> +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)
> +  && len > xlen
> +  && xval[xlen - 1] >= 0)
> +len = xlen;

I think here too it would be worth dropping the:

  UNLIKELY (len > WIDE_INT_MAX_INL_ELTS)

part of the condition, since presumably the change should be safe
regardless of that.

> +
> +  rshift_large_common (val, xval, xlen, shift, len);
>  
>/* The value we just created has precision XPRECISION - SHIFT.
>   Zero-extend it to wider precisions.  */
> -  if (precision > xprecision - shift)
> +  if (precision > xprecision - shift && len == blocks_needed)
>  {
>unsigned int small_prec = (xprecision - shift) % HOST_BITS_PER_WIDE_INT;
>if (small_prec)
> @@ -2063,11 +2111,18 @@ wi::arshift_large (HOST_WIDE_INT *val, c
>  unsigned int xlen, unsigned int xprecision,
>  unsigned int precision, unsigned int shift)
>  {
> -  unsigned int len = rshift_large_common (val, xval, xlen, xprecision, shift);
> +  /* Work out how many blocks are needed to store the significant bits
> + (excluding the upper zeros or signs).  */
> +  unsigned int blocks_needed = BLOCKS_NEEDED (xprecision - shift);
> +  unsigned int len = blocks_needed;
> +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS) && len > xlen)
> +len = xlen;
> +

Same here.

OK for the wide-int parts with those changes.

Thanks,
Richard


Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-10 Thread Jakub Jelinek
On Tue, Oct 10, 2023 at 06:41:50PM +0100, Richard Sandiford wrote:
> > On the wide_int side, I see
> >  155291 576
> > (supposedly because of bound_wide_int, where we create wide_int_ref from
> > the 576-bit precision bound_wide_int and then create 576-bit wide_int when
> > using unary or binary operation on that).
> 
> 576 bits seems quite a lot for a loop bound.  We're entering near-infinite
> territory with 128 bits. :)  But I don't want to rehash old discussions.
> If we take this size for wide_int as given, then...

We could go for 128 bits bounds_wide_int and redo the stats given that.
Though maybe such tweaks can be done incrementally.

> > So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6
> > instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed
> > from MAX_BITSIZE_MODE_ANY_INT?).  Or keep it at 9 for both (i.e. without
> > the third patch).
> 
> ...I agree we might as well keep the widest_int size the same for
> simplicity.  It'd only be worth distinguishing them if we have positive
> proof that it's worthwhile.
> 
> So going with patches 1 + 2 sounds good to me, but I don't have a strong
> preference.

Ok.

> > +  if (prec > WIDE_INT_MAX_INL_PRECISION && !high)
> > +prec = (op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT;
> 
> Changing the precision looked a bit dangerous at first, but I agree it
> looks correct in context, in that nothing later on seems to require prec
> to be the real precision of the number.  But I wonder whether we should
> instead do:
> 
>   if (!high)
> prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec);

Ok, will try that (and in other spots too).

> so that the assumption gets a bit more testing.  Same idea for the others.
> I.e. in any case where we think it's safe to reduce a precision or
> length for out-of-line buffers, I think we should try to do the same
> for inline ones.
> 
> I'm not sure off-hand why + 1 is safe here but + 2 is needed for the
> write_val estimate.  Might be worth a comment in one place or the other.

As I wrote earlier, I'm not sure about the need for the + 1 above in
prec computation, perhaps just
  if (!high)
prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec);
could work too (especially if the multiplication is always signed vs. signed
or unsigned vs. unsigned).  Even say
  HOST_WIDE_INT_MIN * HOST_WIDE_INT_MIN (i.e. op1len == op2len == 1) result
of 0x4000'0000'0000'0000'0000'0000'0000'0000 (for smallest signed^2) or
~(unsigned HOST_WIDE_INT) 0 * ~(unsigned HOST_WIDE_INT) 0 (largest
unsigned^2) of
0xffff'ffff'ffff'fffe'0000'0000'0000'0001
fits into prec 2; and for widest_int-like representation,
0xffff'ffff'ffff'ffff wouldn't have op1len 1, but 2; but the + 1 on top of
that is needed because of the way wi_pack then works.  But guess will try to
play with it some more tomorrow.
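
A quick standalone check (relying on GCC's unsigned __int128; the two
numbers are the extremes mentioned above) that both one-limb products fit
into exactly op1len + op2len = 2 limbs, so the + 1 slack is only about the
sign-extended widest_int form and the way wi_pack writes its output:

#include <cstdint>
#include <cstdio>

int main ()
{
  // Smallest signed^2: (-2^63) * (-2^63) = 2^126.
  unsigned __int128 s = (unsigned __int128) 1 << 126;
  // Largest unsigned^2: (2^64 - 1)^2 = 2^128 - 2^65 + 1.
  unsigned __int128 u = (unsigned __int128) UINT64_MAX * UINT64_MAX;
  printf ("0x%016llx'%016llx\n", (unsigned long long) (s >> 64),
          (unsigned long long) s);
  printf ("0x%016llx'%016llx\n", (unsigned long long) (u >> 64),
          (unsigned long long) u);
}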

> > @@ -2399,9 +2453,12 @@ from_int (int i)
> >  static void
> >  assert_deceq (const char *expected, const wide_int_ref &wi, signop sgn)
> >  {
> > -  char buf[WIDE_INT_PRINT_BUFFER_SIZE];
> > -  print_dec (wi, buf, sgn);
> > -  ASSERT_STREQ (expected, buf);
> > +  char buf[WIDE_INT_PRINT_BUFFER_SIZE], *p = buf;
> > +  unsigned len = wi.get_len ();
> > +  if (UNLIKELY (len > WIDE_INT_MAX_INL_ELTS))
> > +p = XALLOCAVEC (char, len * HOST_BITS_PER_WIDE_INT / 4 + 4);
> 
> Is this size enough for decimal?  E.g. to go back to the 576-bit example,
> 2^575 is O(10^173), but 576/4+4 == 148.

generic_wide_ints with precision larger than HOST_BITS_PER_WIDE_INT and
length larger than 1 are always printed hexadecimally; we don't have code to
print them decimally (it wouldn't be terribly hard, we have divmod, but could
be expensive).  Though, I've always wondered why we at least for print_decs
don't print negative values as -0xwhatever, rather than 0xsomethinglarge.
Any change in this area could affect a lot of dumps though.
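
For the buffer sizing question, a back-of-the-envelope check of the
XALLOCAVEC size used above (hex_buf_size is a name made up here; this
assumes the hex-only printing just described and HOST_BITS_PER_WIDE_INT
== 64):

// Each 64-bit limb prints as at most 16 hex digits; the + 4 covers the
// "0x" prefix, an optional sign and the terminating NUL.
constexpr unsigned int
hex_buf_size (unsigned int len)
{
  return len * 64 / 4 + 4;
}

// Matches the 576 / 4 + 4 == 148 arithmetic from the question above.
static_assert (hex_buf_size (9) == 148, "nine limbs, 576 bits");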

Jakub



Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-10 Thread Richard Sandiford
Jakub Jelinek  writes:
> On Mon, Oct 09, 2023 at 03:44:10PM +0200, Jakub Jelinek wrote:
>> Thanks, just quick answers, will work on patch adjustments after trying to
>> get rid of rwide_int (seems dwarf2out has very limited needs from it, just
>> some routine to construct it in GCed memory (and never change afterwards)
>> from const wide_int_ref & or so, and then working operator ==,
>> get_precision, elt, get_len and get_val methods, so I think we could just
>> have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; };
>> and perform the methods on it after converting to a storage ref.
>
> Now in patch form (again, incremental).
>
>> > Does the variable-length memcpy pay for itself?  If so, perhaps that's a
>> > sign that we should have a smaller inline buffer for this class (say 2 
>> > HWIs).
>> 
>> Guess I'll try to see what results in smaller .text size.
>
> I've left the memcpy changes in a separate patch (incremental, attached).
> Seems that second patch results in .text growth by 16256 bytes (0.04%),
> though I'd bet it probably makes compile time a tiny bit faster because it
> replaces an out-of-line memcpy (caused by the variable length) with an
> inlined one.
>
> With even the third one it shrinks by 84544 bytes (0.21% down), but the
> extra statistics patch then shows massive number of allocations after
> running make check-gcc check-g++ check-gfortran for just a minute or two.
> On the widest_int side, I see (first number from sort | uniq -c | sort -nr,
> second the estimated or final len)
> 7289034 4
>  173586 5
>   21819 6
> i.e. there are tons of widest_ints which need len 4 (or perhaps just
> have it as upper estimation), maybe even 5 would be nice.

Thanks for running the stats.  That's definitely a lot more than I expected.

> On the wide_int side, I see
>  155291 576
> (supposedly because of bound_wide_int, where we create wide_int_ref from
> the 576-bit precision bound_wide_int and then create 576-bit wide_int when
> using unary or binary operation on that).

576 bits seems quite a lot for a loop bound.  We're entering near-infinite
territory with 128 bits. :)  But I don't want to rehash old discussions.
If we take this size for wide_int as given, then...

> So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6
> instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed
> from MAX_BITSIZE_MODE_ANY_INT?).  Or keep it at 9 for both (i.e. without
> the third patch).

...I agree we might as well keep the widest_int size the same for
simplicity.  It'd only be worth distinguishing them if we have positive
proof that it's worthwhile.

So going with patches 1 + 2 sounds good to me, but I don't have a strong
preference.

On the wide-int.cc changes:

> @@ -1469,6 +1452,36 @@ wi::mul_internal (HOST_WIDE_INT *val, co
>return 1;
>  }
> 
> +  /* The sizes here are scaled to support a 2x WIDE_INT_MAX_INL_PRECISION by 2x
> + WIDE_INT_MAX_INL_PRECISION yielding a 4x WIDE_INT_MAX_INL_PRECISION
> + result.  */
> +
> +  unsigned HOST_HALF_WIDE_INT
> +ubuf[4 * WIDE_INT_MAX_INL_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
> +  unsigned HOST_HALF_WIDE_INT
> +vbuf[4 * WIDE_INT_MAX_INL_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
> +  /* The '2' in 'R' is because we are internally doing a full
> + multiply.  */
> +  unsigned HOST_HALF_WIDE_INT
> +rbuf[2 * 4 * WIDE_INT_MAX_INL_PRECISION / HOST_BITS_PER_HALF_WIDE_INT];
> +  const HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << HOST_BITS_PER_HALF_WIDE_INT) - 1;
> +  unsigned HOST_HALF_WIDE_INT *u = ubuf;
> +  unsigned HOST_HALF_WIDE_INT *v = vbuf;
> +  unsigned HOST_HALF_WIDE_INT *r = rbuf;
> +
> +  if (prec > WIDE_INT_MAX_INL_PRECISION && !high)
> +prec = (op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT;

Changing the precision looked a bit dangerous at first, but I agree it
looks correct in context, in that nothing later on seems to require prec
to be the real precision of the number.  But I wonder whether we should
instead do:

  if (!high)
prec = MIN ((op1len + op2len + 1) * HOST_BITS_PER_WIDE_INT, prec);

so that the assumption gets a bit more testing.  Same idea for the others.
I.e. in any case where we think it's safe to reduce a precision or
length for out-of-line buffers, I think we should try to do the same
for inline ones.

I'm not sure off-hand why + 1 is safe here but + 2 is needed for the
write_val estimate.  Might be worth a comment in one place or the other.

> +  unsigned int blocks_needed = BLOCKS_NEEDED (prec);
> +  unsigned int half_blocks_needed = blocks_needed * 2;
> +  if (UNLIKELY (prec > WIDE_INT_MAX_INL_PRECISION))
> +{
> +  unsigned HOST_HALF_WIDE_INT *buf
> + = XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * 4 * blocks_needed);
> +  u = buf;
> +  v = u + 4 * blocks_needed;
> +  r = v + 4 * blocks_needed;
> +}
> +
>/* We do unsigned mul and then correct it.  */
>wi_unpack (u, op1val, op1len, half_blocks_needed, prec, SIGNED)

Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-09 Thread Jakub Jelinek
On Mon, Oct 09, 2023 at 03:44:10PM +0200, Jakub Jelinek wrote:
> Thanks, just quick answers, will work on patch adjustments after trying to
> get rid of rwide_int (seems dwarf2out has very limited needs from it, just
> some routine to construct it in GCed memory (and never change afterwards)
> from const wide_int_ref & or so, and then working operator ==,
> get_precision, elt, get_len and get_val methods, so I think we could just
> have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; };
> and perform the methods on it after converting to a storage ref.

Now in patch form (again, incremental).

> > Does the variable-length memcpy pay for itself?  If so, perhaps that's a
> > sign that we should have a smaller inline buffer for this class (say 2 
> > HWIs).
> 
> Guess I'll try to see what results in smaller .text size.

I've left the memcpy changes in a separate patch (incremental, attached).
Seems that second patch results in .text growth by 16256 bytes (0.04%),
though I'd bet it probably makes compile time a tiny bit faster because it
replaces an out-of-line memcpy (caused by the variable length) with an
inlined one.

With even the third one it shrinks by 84544 bytes (0.21% down), but the
extra statistics patch then shows massive number of allocations after
running make check-gcc check-g++ check-gfortran for just a minute or two.
On the widest_int side, I see (first number from sort | uniq -c | sort -nr,
second the estimated or final len)
7289034 4
 173586 5
  21819 6
i.e. there are tons of widest_ints which need len 4 (or perhaps just
have it as upper estimation), maybe even 5 would be nice.
On the wide_int side, I see
 155291 576
(supposedly because of bound_wide_int, where we create wide_int_ref from
the 576-bit precision bound_wide_int and then create 576-bit wide_int when
using unary or binary operation on that).

So, perhaps we could get away with say WIDEST_INT_MAX_INL_ELTS of 5 or 6
instead of 9 but keep WIDE_INT_MAX_INL_ELTS at 9 (or whatever is computed
from MAX_BITSIZE_MODE_ANY_INT?).  Or keep it at 9 for both (i.e. without
the third patch).

--- gcc/poly-int.h.jj   2023-10-09 14:37:45.883940062 +0200
+++ gcc/poly-int.h  2023-10-09 17:05:26.629828329 +0200
@@ -96,7 +96,7 @@ struct poly_coeff_traits
-struct poly_coeff_traits<T, wi::CONST_PRECISION>
+struct poly_coeff_traits<T, wi::INL_CONST_PRECISION>
 {
   typedef WI_UNARY_RESULT (T) result;
   typedef int int_type;
@@ -110,14 +110,13 @@ struct poly_coeff_traits
-struct poly_coeff_traits<T, wi::WIDEST_CONST_PRECISION>
+struct poly_coeff_traits<T, wi::CONST_PRECISION>
 {
   typedef WI_UNARY_RESULT (T) result;
   typedef int int_type;
   /* These types are always signed.  */
   static const int signedness = 1;
  static const int precision = wi::int_traits<T>::precision;
-  static const int inl_precision = wi::int_traits<T>::inl_precision;
   static const int rank = precision * 2 / CHAR_BIT;
 
  template<typename Arg>
--- gcc/double-int.h.jj 2023-01-02 09:32:22.747280053 +0100
+++ gcc/double-int.h  2023-10-09 17:06:03.446317336 +0200
@@ -440,7 +440,7 @@ namespace wi
   template <>
  struct int_traits <double_int>
   {
-static const enum precision_type precision_type = CONST_PRECISION;
+static const enum precision_type precision_type = INL_CONST_PRECISION;
 static const bool host_dependent_precision = true;
 static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT;
 static unsigned int get_precision (const double_int &);
--- gcc/wide-int.h.jj   2023-10-09 16:06:39.326805176 +0200
+++ gcc/wide-int.h  2023-10-09 17:29:20.016951691 +0200
@@ -343,8 +343,8 @@ template <int N> class widest_int_storag

typedef generic_wide_int <wide_int_storage> wide_int;
typedef FIXED_WIDE_INT (ADDR_MAX_PRECISION) offset_int;
-typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION> > widest_int;
-typedef generic_wide_int <widest_int_storage <WIDE_INT_MAX_INL_PRECISION * 2> > widest2_int;
+typedef generic_wide_int <widest_int_storage <WIDEST_INT_MAX_PRECISION> > widest_int;
+typedef generic_wide_int <widest_int_storage <WIDEST_INT_MAX_PRECISION * 2> > widest2_int;
 
 /* wi::storage_ref can be a reference to a primitive type,
so this is the conservatively-correct setting.  */
@@ -394,13 +394,13 @@ namespace wi
 /* The integer has a variable precision but no defined signedness.  */
 VAR_PRECISION,
 
-/* The integer has a constant precision (known at GCC compile time)
-   and is signed.  */
-CONST_PRECISION,
-
-/* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger
-   precision where not all elements of arrays are always present.  */
-WIDEST_CONST_PRECISION
+/* The integer has a constant precision (known at GCC compile time),
+   is signed and all elements are in inline buffer.  */
+INL_CONST_PRECISION,
+
+/* Like INL_CONST_PRECISION, but elements can be heap allocated for
+   larger lengths.  */
+CONST_PRECISION
   };
 
   /* This class, which has no default implementation, is expected to
@@ -410,15 +410,10 @@ namespace wi
Classifies the type of T.
 
  static const unsigned int precision;
-   Only defined if precision_type == CONST_PRECISION or
-   precision_type == WIDEST_CONST_PRECISION.  Specifies the
+   Only defined if precision_type == INL_CONST_PRE

Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-09 Thread Jakub Jelinek
On Mon, Oct 09, 2023 at 01:54:19PM +0100, Richard Sandiford wrote:
> > I've additionally built it with the incremental attached patch and
> > on make -C gcc check-gcc check-g++ -j32 -k it didn't show any
> > wide_int/widest_int heap allocations unless a > 128-bit _BitInt or wb/uwb
> > constant needing > 128-bit _BitInt was used in a testcase.
> 
> Overall it looks really good to me FWIW.  Some comments about the
> wide-int.h changes below.  Will send a separate message about wide-int.cc.

Thanks, just quick answers, will work on patch adjustments after trying to
get rid of rwide_int (seems dwarf2out has very limited needs from it, just
some routine to construct it in GCed memory (and never change afterwards)
from const wide_int_ref & or so, and then working operator ==,
get_precision, elt, get_len and get_val methods, so I think we could just
have a struct dw_wide_int { unsigned int prec, len; HOST_WIDE_INT val[1]; };
and perform the methods on it after converting to a storage ref.
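
As a sketch, that struct could look like the following (illustrative only
and simplified; the real dwarf2out version may differ in details such as
GTY markings and exactly how elt treats blocks above len):

#include <cstring>

typedef long long HOST_WIDE_INT;   // from GCC's hwint.h in reality

/* GC-allocated, never modified after construction; VAL is a trailing
   array of LEN limbs.  */
struct dw_wide_int
{
  unsigned int prec;
  unsigned int len;
  HOST_WIDE_INT val[1];

  unsigned int get_precision () const { return prec; }
  unsigned int get_len () const { return len; }
  const HOST_WIDE_INT *get_val () const { return val; }

  /* Blocks at or above LEN are implied sign copies of the top block.  */
  HOST_WIDE_INT elt (unsigned int i) const
  {
    if (i < len)
      return val[i];
    return val[len - 1] < 0 ? (HOST_WIDE_INT) -1 : 0;
  }

  bool operator == (const dw_wide_int &o) const
  {
    return prec == o.prec && len == o.len
           && memcmp (val, o.val, len * sizeof (HOST_WIDE_INT)) == 0;
  }
};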

> > @@ -380,7 +406,11 @@ namespace wi
> >  
> >  /* The integer has a constant precision (known at GCC compile time)
> > and is signed.  */
> > -CONST_PRECISION
> > +CONST_PRECISION,
> > +
> > +/* Like CONST_PRECISION, but with WIDEST_INT_MAX_PRECISION or larger
> > +   precision where not all elements of arrays are always present.  */
> > +WIDEST_CONST_PRECISION
> >};
> 
> Sorry to bring this up so late, but how about using INL_CONST_PRECISION
> for the fully inline case and CONST_PRECISION for the general case?
> That seems more consistent with the other naming in the patch.

Ok.

> > @@ -482,6 +541,18 @@ namespace wi
> >};
> >  
> > +  template <typename T1, typename T2>
> > +  struct binary_traits <T1, T2, WIDEST_CONST_PRECISION, WIDEST_CONST_PRECISION>
> > +  {
> > +STATIC_ASSERT (int_traits <T1>::precision == int_traits <T2>::precision);
> 
> Should this assert for equal inl_precision too?  Although it probably
> isn't necessary computationally, it seems a bit arbitrary to pick the
> first inl_precision...

inl_precision is only used for widest_int/widest2_int, so if precision is
equal, inl_precision is as well.

> > +inline wide_int_storage::wide_int_storage (const wide_int_storage &x)
> > +{
> > +  len = x.len;
> > +  precision = x.precision;
> > +  if (UNLIKELY (precision > WIDE_INT_MAX_INL_PRECISION))
> > +{
> > +  u.valp = XNEWVEC (HOST_WIDE_INT, CEIL (precision, HOST_BITS_PER_WIDE_INT));
> > +  memcpy (u.valp, x.u.valp, len * sizeof (HOST_WIDE_INT));
> > +}
> > +  else if (LIKELY (precision))
> > +memcpy (u.val, x.u.val, len * sizeof (HOST_WIDE_INT));
> > +}
> 
> Does the variable-length memcpy pay for itself?  If so, perhaps that's a
> sign that we should have a smaller inline buffer for this class (say 2 HWIs).

Guess I'll try to see what results in smaller .text size.

> > +namespace wi
> > +{
> > +  template <int N>
> > +  struct int_traits < widest_int_storage <N> >
> > +  {
> > +static const enum precision_type precision_type = WIDEST_CONST_PRECISION;
> > +static const bool host_dependent_precision = false;
> > +static const bool is_sign_extended = true;
> > +static const bool needs_write_val_arg = true;
> > +static const unsigned int precision
> > +  = N / WIDE_INT_MAX_INL_PRECISION * WIDEST_INT_MAX_PRECISION;
> 
> What's the reasoning behind this calculation?  It would give 0 for
> N < WIDE_INT_MAX_INL_PRECISION, and the "MAX" suggests that N
> shouldn't be > WIDE_INT_MAX_INL_PRECISION either.
> 
> I wonder whether this should be a second template parameter, with an
> assert that precision > inl_precision.

Maybe.  Yet another option would be to always use WIDE_INT_MAX_INL_PRECISION
as the inline precision (and use N template parameter just to decide about
the overall precision), regardless of whether it is widest_int or
widest2_int.  The latter is very rare, and it is even rarer that something
wouldn't fit into WIDE_INT_MAX_INL_PRECISION when not using _BitInt.
The reason for introducing inl_precision was to avoid the heap allocation
for widest2_int unless _BitInt is in use, but maybe that isn't worth it.

> Nit: might format more naturally with:
> 
>   using res_traits = wi::int_traits :
>   ...

Ok.

> > @@ -2203,6 +2781,9 @@ wi::sext (const T &x, unsigned int offse
> >unsigned int precision = get_precision (result);
> >WIDE_INT_REF_FOR (T) xi (x, precision);
> >  
> > +  if (result.needs_write_val_arg)
> > +val = result.write_val (MAX (xi.len,
> > +CEIL (offset, HOST_BITS_PER_WIDE_INT)));
> 
> Why MAX rather than MIN?

Because it needs to be an upper bound.
In this case, sext_large has
  unsigned int len = offset / HOST_BITS_PER_WIDE_INT;
  /* Extending beyond the precision is a no-op.  If we have only stored
 OFFSET bits or fewer, the rest are already signs.  */
  if (offset >= precision || len >= xlen)
{
  for (unsigned i = 0; i < xlen; ++i)
val[i] = xval[i];
  return xlen;
}
  unsigned int suboffset = of

Re: [PATCH] wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]

2023-10-09 Thread Richard Sandiford
Jakub Jelinek  writes:
> Hi!
>
> As mentioned in the _BitInt support thread, _BitInt(N) is currently limited
> by the wide_int/widest_int maximum precision limitation, which is depending
> on target 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION).
> That is fairly low limit for _BitInt, especially on the targets with the 191
> bit limitation.
>
> The following patch bumps that limit to 16319 bits on all arches, which is
> the limit imposed by INTEGER_CST representation (unsigned char members
> holding number of HOST_WIDE_INT limbs).
>
> In order to achieve that, wide_int is changed from a trivially copyable type
> which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or
> 11 limbs depending on target) limbs into a non-trivially copy constructible,
> copy assignable and destructible type which for the usual small cases (up
> to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses
> an inline array of limbs, but for larger precisions uses heap allocated
> limb array.  This makes wide_int unusable in GC structures, so for dwarf2out
> which was the only place which needed it there is a new rwide_int type
> (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs
> inline and is trivially copyable (dwarf2out should never deal with large
> _BitInt constants, those should have been lowered earlier).
>
> Similarly, widest_int has been changed from a trivially copyable type which
> contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike
> wide_int didn't contain precision and assumed that to be
> WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy
> assignable and destructible type which has always WIDEST_INT_MAX_PRECISION
> precision (32640 bits currently, twice as much as INTEGER_CST limitation
> allows) and unlike wide_int decides depending on get_len () value whether
> it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap
> allocated one.  In wide-int.h this means we need to estimate an upper
> bound on how many limbs will wide-int.cc (usually, sometimes wide-int.h)
> need to write, heap allocate if needed based on that estimation, and upon
> set_len (which is done at the end), if we guessed over WIDE_INT_MAX_INL_ELTS
> and allocated dynamically while we actually need less than that,
> copy back and deallocate.  The inexact guesses are needed because the exact
> computation of the length in wide-int.cc is sometimes quite complex and
> especially canonicalize at the end can decrease it.  widest_int is again
> because of this not usable in GC structures, so cfgloop.h has been changed
> to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt if
> we'd have larger _BitInt based iterators, programs having more than 128-bit
> iterators will be hopefully rare and I think it is fine to treat loops with
> more than 2^127 iterations as effectively possibly infinite, omp-general.cc
> is changed to use fixed_wide_int_storage <1024>, as it better should support
> scores with the same precision on all arches.
>
> Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing
> wide_int/widest_int into a buffer had to be changed to use XALLOCAVEC for
> larger lengths.
>
> On x86_64, the patch in --enable-checking=yes,rtl,extra configured
> bootstrapped cc1plus enlarges the .text section by 1.01% - from
> 0x25725a5 to 0x25e and similarly, at least when compiling insn-recog.cc
> with the usual bootstrap options, slows compilation down by 1.01%,
> user 4m22.046s and 4m22.384s on vanilla trunk vs.
> 4m25.947s and 4m25.581s on patched trunk.  I'm afraid some code size growth
> and compile time slowdown is unavoidable in this case; we use wide_int and
> widest_int everywhere, and while the rare cases are marked with UNLIKELY
> macros, it still means extra checks for it.

Yeah, it's unfortunate, but like you say, it's probably unavoidable.
Having effectively arbitrary-size integers breaks most of the simplifying
assumptions.

> The patch also regresses
> +FAIL: gm2/pim/fail/largeconst.mod,  -O  
> +FAIL: gm2/pim/fail/largeconst.mod,  -O -g  
> +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer  
> +FAIL: gm2/pim/fail/largeconst.mod,  -O3 -fomit-frame-pointer 
> -finline-functions  
> +FAIL: gm2/pim/fail/largeconst.mod,  -Os  
> +FAIL: gm2/pim/fail/largeconst.mod,  -g  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O -g  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -O3 -fomit-frame-pointer 
> -finline-functions  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -Os  
> +FAIL: gm2/pim/fail/largeconst2.mod,  -g  
> tests, which previously were rejected with
> error: constant literal 
> ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’
>  exceeds internal ZTYPE range
> kind of errors, but now are accepted.  Seems the F