Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-24 Thread Cassio Neri via Gcc-patches
Thanks Jonathan.

Here are some benchmarks (assembly in [1]):
https://quick-bench.com/q/jclBXmi4QLDcRMLuuVpxTUsFmQw

Unfortunately, quick-bench times out unless some implementations are
commented out. You can copy the code and run it locally (needs google
benchmark) to get the full picture.

I really like Ulrich Drepper's implementation (version 8). GCC 11 emmits
really good code and looking at previous versions this has been very
consistent. This implementation also highlights very clearly that my hack
to calculate __is_multiple_of_100 (as opposed to % 100 used in version 7)
saves a ror instruction with remarkable performance effects.

In my AMD Ryzen 7 box, his implementation doesn't seem to be the fastest.
Nevertheless, compared to other fast alternatives, performance differences
are small and results seem very much platform dependent. In my opinion, his
implementation is the best, especially, given recent compiler changes. My
reasoning follows.

My original motivation was contrasting 'year % 400 == 0' with 'year % 16'
as in versions 1 and 2:

  __is_multiple_of_100 ? year % 400 == 0 : year % 4 == 0; // version 1
  __is_multiple_of_100 ? year % 16 == 0 : year % 4 == 0; // version 2

It's fair to expect that version 2 won't be slower than version 1. However,
this is the case and by quite a big margin. The emitted instructions are
very reasonable and, I guess, the issue is the choice of registers. I've
reimplemented half of version 2 in inline asm using similar register
choices as version 1 and got the performance boost that I was expecting.
(By no means I'm suggesting a non portable implementation here. It was just
an exercise to make my point. Also, it performs very poorly when compiled
by clang.)

The point here is that code emitted for sources like versions 1 and 2
(involving %) seems sensitive to very small variations and has been
changing across compiler releases in the last 3 years or so. (This can be
checked on godbolt.) Clang also seems very confused [2]. Even if the
emitted code for version 2 and others were good today, I fear a new
compiler release could regress. The stability of the code emitted for
Ulrich Drepper's implementation is much more reliable, its performance is
very good and its clarity is only compromised by my own hack for
__is_multiple_of_100. (Recall, however, that this hack is crucial for his
implementation's performance.)

Best wishes,
Cassio.

[1] https://godbolt.org/z/TbGYEqx33
[2] https://bugs.llvm.org/show_bug.cgi?id=50334

On Wed, Jun 23, 2021 at 6:52 PM Jonathan Wakely  wrote:

> On 23/06/21 18:51 +0100, Jonathan Wakely wrote:
> >Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
> >Pushed to trunk.
> >
> >
> >
>
> >commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
> >Author: Cassio Neri 
> >Date:   Wed Jun 23 15:32:16 2021
> >
> >libstdc++: More efficient std::chrono::year::leap
> >
> >Simple change to std::chrono::year::is_leap. If a year is multiple of
> 100,
> >then it's divisible by 400 if and only if it's divisible by 16. The
> latter
> >allows for better code generation.
> >
> >The expression is then either y%16 or y%4 which are both powers of two
> >and so it can be rearranged to use simple bitmask operations.
> >
> >Co-authored-by: Jonathan Wakely 
> >Co-authored-by: Ulrich Drepper 
> >
> >libstdc++-v3/ChangeLog:
> >
> >* include/std/chrono (chrono::year::is_leap()): Optimize.
> >
> >diff --git a/libstdc++-v3/include/std/chrono
> b/libstdc++-v3/include/std/chrono
> >index 4631a727d73..863b6a27bdf 100644
> >--- a/libstdc++-v3/include/std/chrono
> >+++ b/libstdc++-v3/include/std/chrono
> >@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >   // [1] https://github.com/cassioneri/calendar
> >   // [2]
> https://accu.org/journals/overload/28/155/overload155.pdf#page=16
> >
> >+  // Furthermore, if y%100 != 0, then y%400==0 is equivalent to
> y%16==0,
> >+  // so we can rearrange the expression to (mult_100 ? y % 4 : y %
> 16)==0
>
> But Ulrich pointed out I got my boolean logic all muddled up in the
> comment. Fixed with the attached patch!
>
>


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 23/06/21 18:51 +0100, Jonathan Wakely wrote:

Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
Pushed to trunk.






commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
Author: Cassio Neri 
Date:   Wed Jun 23 15:32:16 2021

   libstdc++: More efficient std::chrono::year::leap
   
   Simple change to std::chrono::year::is_leap. If a year is multiple of 100,

   then it's divisible by 400 if and only if it's divisible by 16. The latter
   allows for better code generation.
   
   The expression is then either y%16 or y%4 which are both powers of two

   and so it can be rearranged to use simple bitmask operations.
   
   Co-authored-by: Jonathan Wakely 

   Co-authored-by: Ulrich Drepper 
   
   libstdc++-v3/ChangeLog:
   
   * include/std/chrono (chrono::year::is_leap()): Optimize.


diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..863b6a27bdf 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// [1] https://github.com/cassioneri/calendar
// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16

+   // Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
+   // so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0


But Ulrich pointed out I got my boolean logic all muddled up in the
comment. Fixed with the attached patch!

commit 4a404f66b09d661bdc60e7e0d5d08f00c4e194db
Author: Jonathan Wakely 
Date:   Wed Jun 23 18:50:03 2021

libstdc++: Fix comment in chrono::year::is_leap()

libstdc++-v3/ChangeLog:

* include/std/chrono (chrono::year::is_leap()): Fix incorrect
logic in comment.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 863b6a27bdf..0b597be1944 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,8 +1606,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// [1] https://github.com/cassioneri/calendar
 	// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
 
-	// Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
-	// so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0
+	// Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
+	// so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
 	// which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
 	// See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
 


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 23/06/21 14:16 +0100, Jonathan Wakely wrote:

On 23/06/21 12:45 +0100, Jonathan Wakely wrote:

On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:

I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

 * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
  const bool __is_multiple_of_100
= __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.

^^
N.B. this comment should say !=


+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;


If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with 
GCC (see https://gcc.gnu.org/PR101179 regarding that). But neither 
of

these seems to improve compared to my first rearrangement above.


This version from Ulrich Drepper produces the smallest code of all
(and also performs well, if not the absolute fastest) in my
benchmarks:

 return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;


Here's what I've committed. Tested x86_64-linux and powerpc64le-linux.
Pushed to trunk.



commit b92d12d3fe3f1aa56d190d960e40c62869a6cfbb
Author: Cassio Neri 
Date:   Wed Jun 23 15:32:16 2021

libstdc++: More efficient std::chrono::year::leap

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

The expression is then either y%16 or y%4 which are both powers of two
and so it can be rearranged to use simple bitmask operations.

Co-authored-by: Jonathan Wakely 
Co-authored-by: Ulrich Drepper 

libstdc++-v3/ChangeLog:

* include/std/chrono (chrono::year::is_leap()): Optimize.

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..863b6a27bdf 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1606,13 +1606,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	// [1] https://github.com/cassioneri/calendar
 	// [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
 
+	// Furthermore, if y%100 != 0, then y%400==0 is equivalent to y%16==0,
+	// so we can rearrange the expression to (mult_100 ? y % 4 : y % 16)==0
+	// which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
+	// See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
 	constexpr uint32_t __multiplier   = 42949673;
 	constexpr uint32_t __bound= 42949669;
 	constexpr uint32_t __max_dividend = 1073741799;
 	constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 	const bool __is_multiple_of_100
 	  = __multiplier * (_M_y + __offset) < __bound;
-	return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+	return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
   }
 
   explicit constexpr


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 23/06/21 12:45 +0100, Jonathan Wakely wrote:

On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:

I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

  * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
   const bool __is_multiple_of_100
 = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.

 ^^
 N.B. this comment should say !=


+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;


If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

 return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

 return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

 return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC 
(see https://gcc.gnu.org/PR101179 regarding that). But neither of

these seems to improve compared to my first rearrangement above.


This version from Ulrich Drepper produces the smallest code of all
(and also performs well, if not the absolute fastest) in my
benchmarks:

  return (y & (__is_multiple_of_100 ? 15 : 3)) == 0;




Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-06-23 Thread Jonathan Wakely via Gcc-patches

On 21/05/21 19:44 +0100, Cassio Neri via Libstdc++ wrote:

I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

   * include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
const bool __is_multiple_of_100
  = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.

  ^^
  N.B. this comment should say !=


+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;


If y % 16 == 0 then y % 4 == 0 too. So we could write that as:

  return (!__is_multiple_of_100 && _M_y % 4 == 0) || _M_y % 16 == 0;

This seems to perform even better over a wide range of inputs, can you
confirm that result with your own tests?

However, my microbenchmark also shows that the simplistic code using
y%100 often performs even better than the current code calculating
__is_multiple_of_100 to avoid the modulus operation. So maybe my tests
are bad.

My rearranged expression above is equivalent to:

  return _M_y % (__is_multiple_of_100 ? 16 : 4) == 0;

which can be written without branches:

  return _M_y % (4 << (2 * __is_multiple_of_100)) == 0;

However, both Clang and GCC already remove the branch for (x ? 16 : 4)
and the conditional expression produces slightly smaller code with GCC 
(see https://gcc.gnu.org/PR101179 regarding that). But neither of

these seems to improve compared to my first rearrangement above.




Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Cassio Neri via Gcc-patches
I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 const bool __is_multiple_of_100
   = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr

On Fri, May 21, 2021 at 7:05 PM Koning, Paul  wrote:
>
>
>
> > On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches 
> >  wrote:
> >
> > Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
> > then it's divisible by 400 if and only if it's divisible by 16. The latter
> > allows for better code generation.
>
> I wonder if the optimizer could be taught to do that.
>
> The change seems correct but it is very confusing; at the very least the 
> reasoning you gave should be stated in a comment on that check.
>
> paul
>
>
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

	* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 	const bool __is_multiple_of_100
 	  = __multiplier * (_M_y + __offset) < __bound;
-	return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+	// Usually we test _M_y % 400 == 0 but, when it's already known that
+	// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
+	return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Koning, Paul via Gcc-patches



> On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches 
>  wrote:
> 
> Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
> then it's divisible by 400 if and only if it's divisible by 16. The latter
> allows for better code generation.

I wonder if the optimizer could be taught to do that.

The change seems correct but it is very confusing; at the very least the 
reasoning you gave should be stated in a comment on that check.

paul




[PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Cassio Neri via Gcc-patches
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..6221d02c1b3 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
  const bool __is_multiple_of_100
   = __multiplier * (_M_y + __offset) < __bound;
- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr


Re: [PATCH, libstdc++] More

2019-11-25 Thread François Dumont
I have a patch for the same location so here is my remark that might 
make my patch useless.


Maybe you can even merge it with yours Ed, here it is:

https://gcc.gnu.org/ml/libstdc++/2019-10/msg00072.html

On 11/25/19 10:15 PM, Jonathan Wakely wrote:

On 15/11/19 22:17 -0500, Ed Smith-Rowland via libstdc++ wrote:

Index: include/bits/stl_algobase.h
===
--- include/bits/stl_algobase.h    (revision 278318)
+++ include/bits/stl_algobase.h    (working copy)
@@ -107,6 +107,50 @@
    }

  /*
+   * A constexpr wrapper for __builtin_memmove.


This should say __builtin_memcpy.


+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template
+    _GLIBCXX14_CONSTEXPR
+    inline void*
+    __memcpy(_Tp* __dst, const _Tp* __src, size_t __num)
+    {
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+    {
+  for(; __num > 0; --__num)


for (; __num != 0; --__num)

and make __num prtdiff_t.

would be better here as this way the compiler is able to report bad 
usages rather than silently do nothing.




+    *__dst++ = *__src++;
+  return __dst;
+    }
+  else
+#endif
+    return __builtin_memcpy(__dst, __src, sizeof(_Tp) * __num);
+  return __dst;
+    }
+
+  /*
+   * A constexpr wrapper for __builtin_memmove.


And this should say __builtin_memset.


+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template
+    _GLIBCXX14_CONSTEXPR
+    inline void*
+    __memset(_Tp* __dst, _Tp __a, size_t __num)
+    {
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+    {
+  for(; __num > 0; --__num)
+    *__dst = __a;
+  return __dst;
+    }
+  else
+#endif
+    return __builtin_memset(__dst, __a, sizeof(_Tp) * __num);
+  return __dst;
+    }


OK for trunk with those two comment fixes. Thanks.







Re: [PATCH, libstdc++] More

2019-11-25 Thread Jonathan Wakely

On 15/11/19 22:17 -0500, Ed Smith-Rowland via libstdc++ wrote:

Index: include/bits/stl_algobase.h
===
--- include/bits/stl_algobase.h (revision 278318)
+++ include/bits/stl_algobase.h (working copy)
@@ -107,6 +107,50 @@
}

  /*
+   * A constexpr wrapper for __builtin_memmove.


This should say __builtin_memcpy.


+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template
+_GLIBCXX14_CONSTEXPR
+inline void*
+__memcpy(_Tp* __dst, const _Tp* __src, size_t __num)
+{
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+   {
+ for(; __num > 0; --__num)
+   *__dst++ = *__src++;
+ return __dst;
+   }
+  else
+#endif
+   return __builtin_memcpy(__dst, __src, sizeof(_Tp) * __num);
+  return __dst;
+}
+
+  /*
+   * A constexpr wrapper for __builtin_memmove.


And this should say __builtin_memset.


+   * @param __num The number of elements of type _Tp (not bytes).
+   */
+  template
+_GLIBCXX14_CONSTEXPR
+inline void*
+__memset(_Tp* __dst, _Tp __a, size_t __num)
+{
+#ifdef __cpp_lib_is_constant_evaluated
+  if (std::is_constant_evaluated())
+   {
+ for(; __num > 0; --__num)
+   *__dst = __a;
+ return __dst;
+   }
+  else
+#endif
+   return __builtin_memset(__dst, __a, sizeof(_Tp) * __num);
+  return __dst;
+}


OK for trunk with those two comment fixes. Thanks.




[PATCH, libstdc++] More

2019-11-15 Thread Ed Smith-Rowland via gcc-patches


First of all, the redo of the iterator portion in response to 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01467.html passed testing.


This patch contains constexpr char_traits. I also found an old extension 
that had been left out of the constexpr bandwagon that I caught up.


This patch also includes constexpr string_view::copy which depends on 
teh type_traits.


This gets all of p1032 except string::copy (although constexpr 
char_traits should almost enable this too once constexpr string is in).  
I'm done.


This passes on x86_64-linux.

OK?


2019-11-16  Edward Smith-Rowland  <3dw...@verizon.net>

	Implement the char_traits and string_view part of C++20 p1032 Misc.
	constexpr bits.
	* include/bits/char_traits.h (move, copy, assign): Constexpr.
	* include/bits/stl_algobase.h (__memcpy, __memset): New.
	* include/ext/pod_char_traits.h (from, to, operator==, operator<)
	(assign, eq, lt, compare, length, find, move, copy, assign)
	(to_char_type, to_int_type, eq_int_type, eof, not_eof):
	Make these constespr for appropriate standards.
	* testsuite/21_strings/char_traits/requirements/char/constexpr.cc: New test.
	* testsuite/21_strings/char_traits/requirements/wchar_t/constexpr.cc: New test.
	* include/std/string_view (copy): Constexpr.
	* testsuite/21_strings/basic_string_view/operations/copy/char/constexpr.cc: New test.
	* testsuite/21_strings/basic_string_view/operations/copy/wchar_t/constexpr.cc: New test.

Index: include/bits/char_traits.h
===
--- include/bits/char_traits.h	(revision 278318)
+++ include/bits/char_traits.h	(working copy)
@@ -113,13 +113,13 @@
   static _GLIBCXX14_CONSTEXPR const char_type*
   find(const char_type* __s, std::size_t __n, const char_type& __a);
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   move(char_type* __s1, const char_type* __s2, std::size_t __n);
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   copy(char_type* __s1, const char_type* __s2, std::size_t __n);
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   assign(char_type* __s, std::size_t __n, char_type __a);
 
   static _GLIBCXX_CONSTEXPR char_type
@@ -179,18 +179,17 @@
 }
 
   template
-typename char_traits<_CharT>::char_type*
+_GLIBCXX20_CONSTEXPR typename char_traits<_CharT>::char_type*
 char_traits<_CharT>::
 move(char_type* __s1, const char_type* __s2, std::size_t __n)
 {
   if (__n == 0)
 	return __s1;
-  return static_cast<_CharT*>(__builtin_memmove(__s1, __s2,
-		__n * sizeof(char_type)));
+  return static_cast<_CharT*>(std::__memmove(__s1, __s2, __n));
 }
 
   template
-typename char_traits<_CharT>::char_type*
+_GLIBCXX20_CONSTEXPR typename char_traits<_CharT>::char_type*
 char_traits<_CharT>::
 copy(char_type* __s1, const char_type* __s2, std::size_t __n)
 {
@@ -200,7 +199,7 @@
 }
 
   template
-typename char_traits<_CharT>::char_type*
+_GLIBCXX20_CONSTEXPR typename char_traits<_CharT>::char_type*
 char_traits<_CharT>::
 assign(char_type* __s, std::size_t __n, char_type __a)
 {
@@ -349,28 +348,28 @@
 	return static_cast(__builtin_memchr(__s, __a, __n));
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   move(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
-	return static_cast(__builtin_memmove(__s1, __s2, __n));
+	return static_cast(std::__memmove(__s1, __s2, __n));
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   copy(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
-	return static_cast(__builtin_memcpy(__s1, __s2, __n));
+	return static_cast(std::__memcpy(__s1, __s2, __n));
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   assign(char_type* __s, size_t __n, char_type __a)
   {
 	if (__n == 0)
 	  return __s;
-	return static_cast(__builtin_memset(__s, __a, __n));
+	return static_cast(std::__memset(__s, __a, __n));
   }
 
   static _GLIBCXX_CONSTEXPR char_type
@@ -458,27 +457,42 @@
 	return wmemchr(__s, __a, __n);
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   move(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
+#ifdef __cpp_lib_is_constant_evaluated
+	if (std::is_constant_evaluated())
+	  return static_cast(std::__memmove(__s1, __s2, __n));
+	else
+#endif
 	return wmemmove(__s1, __s2, __n);
   }
 
-  static char_type*
+  static _GLIBCXX20_CONSTEXPR char_type*
   copy(char_type* __s1, const char_type* __s2, size_t __n)
   {
 	if (__n == 0)
 	  return __s1;
+#ifdef __cpp_lib_is_constant_evaluated
+	if (std::is_constant_evaluated())
+	  return static_cast(std::__memcpy(__s1, __s2, __n));
+	else
+#endif
 	return