[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2021-12-22 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |10.0
  Known to fail||9.4.0
  Known to work||10.1.0

--- Comment #27 from Andrew Pinski  ---
Fixed in GCC 10 by r10-5360 which turns on loop distrubtion at -O2 instead of
-O3.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-24 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #26 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #25)

> What warning? Why can't you just pass 0 to __builtin_memset? It's a null
> pointer constant. I don't see any warning from clang when using -Weverything.

As corrected above, clang only issues the warning when comparing NULL to 0
without static_cast or reinterpret_cast.


> reinterpet_cast is forbidden in constexpr functions because it's purpose is
> to break the type system and say "trust me, I know what I'm doing", and such
> tricks are not allowed in constant expressions.

Yeah, but... why


> Using reinterpert_cast to convert 0 (a null pointer constant) into a pointer
> type is just silly and poor style. That conversion can be done implicitly,
> it doesn't need a sledgehammer to be used.

That's a bit of silly hyperbole I can do without.


> > But __builtin_constant_p allows it, so it's use is a matter of programmer
> > choice, at least in this context.
> 
> It really isn't if the standard requires the algorithm to be 'constexpr'
> (which we don't implement yet, but there's no point adding constructs which
> will just make life harder in the future).

Fair enough-
M@

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-20 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #25 from Jonathan Wakely  ---
(In reply to Matt Bentley from comment #23)
> > Actually, don't quote me on that - I may be thinking of the
> > 'reinterpret_cast<_Tp>(0)' - one of the two.
> 
> Just to confirm, "reinterpret_cast(__first)" not required in this
> context,  either "reinterpret_cast<_Tp>(0)" or "static_cast<_Tp>(0)" *are*
> required to avoid warnings in clang when _Tp is a pointer. Either works fine.

What warning? Why can't you just pass 0 to __builtin_memset? It's a null
pointer constant. I don't see any warning from clang when using -Weverything.

In C++11 we'd just use nullptr of course, but that can't be used here as the
code must compile as C++98.

> I understand that reinterpret_cast isn't allowed inside constexpr, but not
> why, and can't find any resources explicitly stating the reasoning.

reinterpet_cast is forbidden in constexpr functions because it's purpose is to
break the type system and say "trust me, I know what I'm doing", and such
tricks are not allowed in constant expressions.

Using reinterpert_cast to convert 0 (a null pointer constant) into a pointer
type is just silly and poor style. That conversion can be done implicitly, it
doesn't need a sledgehammer to be used.

> But __builtin_constant_p allows it, so it's use is a matter of programmer
> choice, at least in this context.

It really isn't if the standard requires the algorithm to be 'constexpr' (which
we don't implement yet, but there's no point adding constructs which will just
make life harder in the future).

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-19 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #24 from Matt Bentley  ---
Ugh. I made a mistake. Clang only throws the warning when comparing NULL with 0
without reinterpret_cast/static_cast, not when comparing pointers in general.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-19 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #23 from Matt Bentley  ---

> Actually, don't quote me on that - I may be thinking of the
> 'reinterpret_cast<_Tp>(0)' - one of the two.

Just to confirm, "reinterpret_cast(__first)" not required in this
context,  either "reinterpret_cast<_Tp>(0)" or "static_cast<_Tp>(0)" *are*
required to avoid warnings in clang when _Tp is a pointer. Either works fine.
I understand that reinterpret_cast isn't allowed inside constexpr, but not why,
and can't find any resources explicitly stating the reasoning.
But __builtin_constant_p allows it, so it's use is a matter of programmer
choice, at least in this context.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-19 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #22 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #21)
> Surely static_cast is good enough, and doesn't rule out making the function
> constexpr?

It may well be, how does reinterpret_cast cause constexpr to fail, given that
it's also compile-time?

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-18 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #21 from Jonathan Wakely  ---
Surely static_cast is good enough, and doesn't rule out making the function
constexpr?

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-17 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #20 from Matt Bentley  ---
(In reply to Matt Bentley from comment #19)
> (In reply to Jonathan Wakely from comment #18)
> > (In reply to Matt Bentley from comment #13)
> > > Well it's more that you're doing- at any rate, the issue you've noted is
> > > easily bypassed by changing the "reinterpret_cast(__first)" to
> > > "reinterpret_cast(&*(__first))".
> > 
> > Also, independent of the non-contiguous problem, using reinterpret_cast here
> > is unnecessary (any non-const pointer can be implicitly converted to void*)
> > and would prevent adding constexpr to the algorithm (as required for C++2a).
> 
> It is to prevent compiler warnings under clang.

Actually, don't quote me on that - I may be thinking of the
'reinterpret_cast<_Tp>(0)' - one of the two.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-17 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #19 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #18)
> (In reply to Matt Bentley from comment #13)
> > Well it's more that you're doing- at any rate, the issue you've noted is
> > easily bypassed by changing the "reinterpret_cast(__first)" to
> > "reinterpret_cast(&*(__first))".
> 
> Also, independent of the non-contiguous problem, using reinterpret_cast here
> is unnecessary (any non-const pointer can be implicitly converted to void*)
> and would prevent adding constexpr to the algorithm (as required for C++2a).

It is to prevent compiler warnings under clang.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #18 from Jonathan Wakely  ---
(In reply to Matt Bentley from comment #13)
> Well it's more that you're doing- at any rate, the issue you've noted is
> easily bypassed by changing the "reinterpret_cast(__first)" to
> "reinterpret_cast(&*(__first))".

Also, independent of the non-contiguous problem, using reinterpret_cast here is
unnecessary (any non-const pointer can be implicitly converted to void*) and
would prevent adding constexpr to the algorithm (as required for C++2a).

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #17 from Jonathan Wakely  ---
(In reply to Matt Bentley from comment #16)
> Yup - arguably it would be nice to have an overload for vector iterators
> that does the same thing.

We already handle that, see the __niter_base functions which unwrap std::vector
and std::basic_string iterators into pointers.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #16 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #15)

> Look at how this exact problem is already solved elsewhere in the same file.
> All the algorithms that dispatch to a __builtin_memxxx function use similar
> techniques, detecting when the iterators are pointers for a start.

Yup - arguably it would be nice to have an overload for vector iterators that
does the same thing.


> And see https://gcc.gnu.org/contribute.html#legal for why code pasted into
> bugzilla without the necessary paperwork is unhelpful (and possibly even
> counter-productive if we end up having to re-invent the same wheel without
> using your code).

I've read it, it says copyright assignments are unneeded for small
contributions. Also, this wasn't patch code, just spitballing code.


> Indeed. It's a forward iterator. Even a random access iterator doesn't
> guarantee you can do that (e.g. std::deque::iterator is random access but
> not contiguous).

Yep, got it.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #15 from Jonathan Wakely  ---
(In reply to Matt Bentley from comment #14)
> (In reply to Matt Bentley from comment #13)
> > (In reply to Jonathan Wakely from comment #12)
> > > Also you're doing a reinterpret_cast from an arbitrary iterator type, 
> > > which
> > > is not necessarily a pointer, or even a random access iterator.
> > > 
> > > Since you don't have a copyright assignment in place please leave the 
> > > patch
> > > to us, this is less than helpful :-)
> > 
> > Well it's more that you're doing-

I'm thinking about how to actually fix it, with a patch that works and could be
accepted into libstdc++.

Look at how this exact problem is already solved elsewhere in the same file.
All the algorithms that dispatch to a __builtin_memxxx function use similar
techniques, detecting when the iterators are pointers for a start.

And see https://gcc.gnu.org/contribute.html#legal for why code pasted into
bugzilla without the necessary paperwork is unhelpful (and possibly even
counter-productive if we end up having to re-invent the same wheel without
using your code).

> at any rate, the issue you've noted is
> > easily bypassed by changing the "reinterpret_cast(__first)" to
> > "reinterpret_cast(&*(__first))".
> > Cheers.
> 
> My bad, I missed the point about the memory not necessarily being contiguous.

Indeed. It's a forward iterator. Even a random access iterator doesn't
guarantee you can do that (e.g. std::deque::iterator is random access but not
contiguous).

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #14 from Matt Bentley  ---
(In reply to Matt Bentley from comment #13)
> (In reply to Jonathan Wakely from comment #12)
> > Also you're doing a reinterpret_cast from an arbitrary iterator type, which
> > is not necessarily a pointer, or even a random access iterator.
> > 
> > Since you don't have a copyright assignment in place please leave the patch
> > to us, this is less than helpful :-)
> 
> Well it's more that you're doing- at any rate, the issue you've noted is
> easily bypassed by changing the "reinterpret_cast(__first)" to
> "reinterpret_cast(&*(__first))".
> Cheers.

My bad, I missed the point about the memory not necessarily being contiguous.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #13 from Matt Bentley  ---
(In reply to Jonathan Wakely from comment #12)
> Also you're doing a reinterpret_cast from an arbitrary iterator type, which
> is not necessarily a pointer, or even a random access iterator.
> 
> Since you don't have a copyright assignment in place please leave the patch
> to us, this is less than helpful :-)

Well it's more that you're doing- at any rate, the issue you've noted is easily
bypassed by changing the "reinterpret_cast(__first)" to
"reinterpret_cast(&*(__first))".
Cheers.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #12 from Jonathan Wakely  ---
Also you're doing a reinterpret_cast from an arbitrary iterator type, which is
not necessarily a pointer, or even a random access iterator.

Since you don't have a copyright assignment in place please leave the patch to
us, this is less than helpful :-)

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-16 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #11 from Jonathan Wakely  ---
__is_pointer_helper and __is_integral_helper are not available in C++98 and are
not supposed to be used directly anyway, they're just implementation details
for is_pointer and is_integral.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-15 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #10 from Matt Bentley  ---
(In reply to Marc Glisse from comment #9)
> (In reply to Andrew Pinski from comment #7)
> We could also use __builtin_constant_p, if the function is inlined often
> enough (don't know if it is).

Right - so then the proposed function becomes:

template
inline typename
__gnu_cxx::__enable_if<__is_pointer_helper<_Tp>::__value ||
__is_integral_helper<_Tp>::__value, void>::__type
__fill_a(_ForwardIterator __first, _ForwardIterator __last,
 const _Tp& __value)
{
  if (__builtin_constant_p(__value) == 1)
  {
if (__value != reinterpret_cast<_Tp>(0))
  {
  const _Tp __tmp = __value;
  for (; __first != __last; ++__first)
*__first = __tmp;
}
else
{
  if (const size_t __len = __last - __first)
__builtin_memset(reinterpret_cast(__first), 0,
__len *
sizeof(_Tp));
}
}
else
{
  const _Tp __tmp = __value;
  for (; __first != __last; ++__first)
*__first = __tmp;
}
}

, if I'm getting the enable_if syntax correct?

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-13 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #9 from Marc Glisse  ---
(In reply to Andrew Pinski from comment #7)
> This is incorrect for floating point types

Because of negative 0 I assume.

> And it introduces an extra check at runtime if value is not known to compile 
> time.

We could also use __builtin_constant_p, if the function is inlined often enough
(don't know if it is).

Best would still be to see if we can enable parts of ldist at -O2.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #8 from Matt Bentley  ---
> This is incorrect for floating point types and non scalars.  And it
> introduces an extra check at runtime if value is not known to compile time.

This is the overload for scalar types, read the function description line.
The extra check is negligible compared to the overhead caused by the
alternative looping code vs memset, as is benchmarked above.

I had to read up on how floating-point is implementation-defined, so yes you're
right, the specialization would have to be further constricted to integral and
scalar pointer types using __is_integral_helper & __is_scalar.
Whether the commonality of zero-wiping newly allocated arrays outweights the
overhead of an additional check for non-zero fills, is a good question.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #7 from Andrew Pinski  ---
(In reply to Matt Bentley from comment #6)
> Suggested patch for libstdc++, std_algobase.h, line 688:
>   template
> inline typename
> __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
> __fill_a(_ForwardIterator __first, _ForwardIterator __last,
>const _Tp& __value)
> {
>   if (__value != reinterpret_cast<_Tp>(0))
>   {
> const _Tp __tmp = __value;
> for (; __first != __last; ++__first)
>   *__first = __tmp;
>   }
>   else
>   {
> if (const size_t __len = __last - __first)
>   __builtin_memset(reinterpret_cast(__first), 0, __len *
> sizeof(_Tp));
>   }
> }
> 
> Comments?
This is incorrect for floating point types and non scalars.  And it introduces
an extra check at runtime if value is not known to compile time.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #6 from Matt Bentley  ---
Suggested patch for libstdc++, std_algobase.h, line 688:
  template
inline typename
__gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
__fill_a(_ForwardIterator __first, _ForwardIterator __last,
 const _Tp& __value)
{
  if (__value != reinterpret_cast<_Tp>(0))
  {
  const _Tp __tmp = __value;
  for (; __first != __last; ++__first)
*__first = __tmp;
}
else
{
  if (const size_t __len = __last - __first)
__builtin_memset(reinterpret_cast(__first), 0, __len *
sizeof(_Tp));
}
}

Comments?

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-12 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #5 from Matt Bentley  ---
What would be even more useful is a warning: for unused data.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-07-11
 Ever confirmed|0   |1

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-10 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #4 from Marc Glisse  ---
There have been questions before about enabling (parts of) ldist at -O2.

(In reply to Matt Bentley from comment #3)
> I thought I should note that there is also a missing optimization
> opportunity in the code. Clang optimizes the code I've listed to remove the
> benchmark loops entirely since it detects that the arrays aren't actually
> being used for anything. 

Indeed, I am surprised gcc didn't optimize out all those useless writes (ints
is local and doesn't escape). Seems worth investigating a bit.

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-10 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #3 from Matt Bentley  ---
I thought I should note that there is also a missing optimization opportunity
in the code. Clang optimizes the code I've listed to remove the benchmark loops
entirely since it detects that the arrays aren't actually being used for
anything. 

In order to get clang to benchmark it properly, I had to add a loop which adds
the array contents to a total post-benchmark, as follows:

#include 
#include 
#include 

double total = 0;


static void memory_memset(benchmark::State& state)
{
int ints[5];

for (auto _ : state)
{
std::memset(ints, 0, sizeof(int) * 5);
}

for (int counter = 0; counter != 5; ++counter)
{
total += ints[counter];
}
}


static void memory_filln(benchmark::State& state)
{
int ints[5];

for (auto _ : state)
{
std::fill_n(ints, 5, 0);
}

for (int counter = 0; counter != 5; ++counter)
{
total += ints[counter];
}
}


static void memory_fill(benchmark::State& state)
{
int ints[5];

for (auto _ : state)
{
std::fill(std::begin(ints), std::end(ints), 0);
}

for (int counter = 0; counter != 5; ++counter)
{
total += ints[counter];
}
}


// Register the function as a benchmark
BENCHMARK(memory_filln);
BENCHMARK(memory_fill);
BENCHMARK(memory_memset);



int main (int argc, char ** argv)
{
benchmark::Initialize (, argv);
benchmark::RunSpecifiedBenchmarks ();
printf("Total = %f\n", total);
getchar();
return 0;
}

[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays

2018-07-10 Thread mattreecebentley at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471

--- Comment #2 from Matt Bentley  ---
(In reply to Andrew Pinski from comment #1)
> Does -O3 bring the performance back?  -O3 enables loop distrubtion which
> should be able to detect these loops.

Just tested - yes.
However this is a fairly trivial testcase and not all projects compile with O3
(most don't).
Clang by comparison optimizes this out at -O2.