[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED Target Milestone|--- |10.0 Known to fail||9.4.0 Known to work||10.1.0 --- Comment #27 from Andrew Pinski --- Fixed in GCC 10 by r10-5360 which turns on loop distrubtion at -O2 instead of -O3.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #26 from Matt Bentley --- (In reply to Jonathan Wakely from comment #25) > What warning? Why can't you just pass 0 to __builtin_memset? It's a null > pointer constant. I don't see any warning from clang when using -Weverything. As corrected above, clang only issues the warning when comparing NULL to 0 without static_cast or reinterpret_cast. > reinterpet_cast is forbidden in constexpr functions because it's purpose is > to break the type system and say "trust me, I know what I'm doing", and such > tricks are not allowed in constant expressions. Yeah, but... why > Using reinterpert_cast to convert 0 (a null pointer constant) into a pointer > type is just silly and poor style. That conversion can be done implicitly, > it doesn't need a sledgehammer to be used. That's a bit of silly hyperbole I can do without. > > But __builtin_constant_p allows it, so it's use is a matter of programmer > > choice, at least in this context. > > It really isn't if the standard requires the algorithm to be 'constexpr' > (which we don't implement yet, but there's no point adding constructs which > will just make life harder in the future). Fair enough- M@
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #25 from Jonathan Wakely --- (In reply to Matt Bentley from comment #23) > > Actually, don't quote me on that - I may be thinking of the > > 'reinterpret_cast<_Tp>(0)' - one of the two. > > Just to confirm, "reinterpret_cast(__first)" not required in this > context, either "reinterpret_cast<_Tp>(0)" or "static_cast<_Tp>(0)" *are* > required to avoid warnings in clang when _Tp is a pointer. Either works fine. What warning? Why can't you just pass 0 to __builtin_memset? It's a null pointer constant. I don't see any warning from clang when using -Weverything. In C++11 we'd just use nullptr of course, but that can't be used here as the code must compile as C++98. > I understand that reinterpret_cast isn't allowed inside constexpr, but not > why, and can't find any resources explicitly stating the reasoning. reinterpet_cast is forbidden in constexpr functions because it's purpose is to break the type system and say "trust me, I know what I'm doing", and such tricks are not allowed in constant expressions. Using reinterpert_cast to convert 0 (a null pointer constant) into a pointer type is just silly and poor style. That conversion can be done implicitly, it doesn't need a sledgehammer to be used. > But __builtin_constant_p allows it, so it's use is a matter of programmer > choice, at least in this context. It really isn't if the standard requires the algorithm to be 'constexpr' (which we don't implement yet, but there's no point adding constructs which will just make life harder in the future).
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #24 from Matt Bentley --- Ugh. I made a mistake. Clang only throws the warning when comparing NULL with 0 without reinterpret_cast/static_cast, not when comparing pointers in general.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #23 from Matt Bentley --- > Actually, don't quote me on that - I may be thinking of the > 'reinterpret_cast<_Tp>(0)' - one of the two. Just to confirm, "reinterpret_cast(__first)" not required in this context, either "reinterpret_cast<_Tp>(0)" or "static_cast<_Tp>(0)" *are* required to avoid warnings in clang when _Tp is a pointer. Either works fine. I understand that reinterpret_cast isn't allowed inside constexpr, but not why, and can't find any resources explicitly stating the reasoning. But __builtin_constant_p allows it, so it's use is a matter of programmer choice, at least in this context.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #22 from Matt Bentley --- (In reply to Jonathan Wakely from comment #21) > Surely static_cast is good enough, and doesn't rule out making the function > constexpr? It may well be, how does reinterpret_cast cause constexpr to fail, given that it's also compile-time?
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #21 from Jonathan Wakely --- Surely static_cast is good enough, and doesn't rule out making the function constexpr?
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #20 from Matt Bentley --- (In reply to Matt Bentley from comment #19) > (In reply to Jonathan Wakely from comment #18) > > (In reply to Matt Bentley from comment #13) > > > Well it's more that you're doing- at any rate, the issue you've noted is > > > easily bypassed by changing the "reinterpret_cast(__first)" to > > > "reinterpret_cast(&*(__first))". > > > > Also, independent of the non-contiguous problem, using reinterpret_cast here > > is unnecessary (any non-const pointer can be implicitly converted to void*) > > and would prevent adding constexpr to the algorithm (as required for C++2a). > > It is to prevent compiler warnings under clang. Actually, don't quote me on that - I may be thinking of the 'reinterpret_cast<_Tp>(0)' - one of the two.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #19 from Matt Bentley --- (In reply to Jonathan Wakely from comment #18) > (In reply to Matt Bentley from comment #13) > > Well it's more that you're doing- at any rate, the issue you've noted is > > easily bypassed by changing the "reinterpret_cast(__first)" to > > "reinterpret_cast(&*(__first))". > > Also, independent of the non-contiguous problem, using reinterpret_cast here > is unnecessary (any non-const pointer can be implicitly converted to void*) > and would prevent adding constexpr to the algorithm (as required for C++2a). It is to prevent compiler warnings under clang.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #18 from Jonathan Wakely --- (In reply to Matt Bentley from comment #13) > Well it's more that you're doing- at any rate, the issue you've noted is > easily bypassed by changing the "reinterpret_cast(__first)" to > "reinterpret_cast(&*(__first))". Also, independent of the non-contiguous problem, using reinterpret_cast here is unnecessary (any non-const pointer can be implicitly converted to void*) and would prevent adding constexpr to the algorithm (as required for C++2a).
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #17 from Jonathan Wakely --- (In reply to Matt Bentley from comment #16) > Yup - arguably it would be nice to have an overload for vector iterators > that does the same thing. We already handle that, see the __niter_base functions which unwrap std::vector and std::basic_string iterators into pointers.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #16 from Matt Bentley --- (In reply to Jonathan Wakely from comment #15) > Look at how this exact problem is already solved elsewhere in the same file. > All the algorithms that dispatch to a __builtin_memxxx function use similar > techniques, detecting when the iterators are pointers for a start. Yup - arguably it would be nice to have an overload for vector iterators that does the same thing. > And see https://gcc.gnu.org/contribute.html#legal for why code pasted into > bugzilla without the necessary paperwork is unhelpful (and possibly even > counter-productive if we end up having to re-invent the same wheel without > using your code). I've read it, it says copyright assignments are unneeded for small contributions. Also, this wasn't patch code, just spitballing code. > Indeed. It's a forward iterator. Even a random access iterator doesn't > guarantee you can do that (e.g. std::deque::iterator is random access but > not contiguous). Yep, got it.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #15 from Jonathan Wakely --- (In reply to Matt Bentley from comment #14) > (In reply to Matt Bentley from comment #13) > > (In reply to Jonathan Wakely from comment #12) > > > Also you're doing a reinterpret_cast from an arbitrary iterator type, > > > which > > > is not necessarily a pointer, or even a random access iterator. > > > > > > Since you don't have a copyright assignment in place please leave the > > > patch > > > to us, this is less than helpful :-) > > > > Well it's more that you're doing- I'm thinking about how to actually fix it, with a patch that works and could be accepted into libstdc++. Look at how this exact problem is already solved elsewhere in the same file. All the algorithms that dispatch to a __builtin_memxxx function use similar techniques, detecting when the iterators are pointers for a start. And see https://gcc.gnu.org/contribute.html#legal for why code pasted into bugzilla without the necessary paperwork is unhelpful (and possibly even counter-productive if we end up having to re-invent the same wheel without using your code). > at any rate, the issue you've noted is > > easily bypassed by changing the "reinterpret_cast(__first)" to > > "reinterpret_cast(&*(__first))". > > Cheers. > > My bad, I missed the point about the memory not necessarily being contiguous. Indeed. It's a forward iterator. Even a random access iterator doesn't guarantee you can do that (e.g. std::deque::iterator is random access but not contiguous).
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #14 from Matt Bentley --- (In reply to Matt Bentley from comment #13) > (In reply to Jonathan Wakely from comment #12) > > Also you're doing a reinterpret_cast from an arbitrary iterator type, which > > is not necessarily a pointer, or even a random access iterator. > > > > Since you don't have a copyright assignment in place please leave the patch > > to us, this is less than helpful :-) > > Well it's more that you're doing- at any rate, the issue you've noted is > easily bypassed by changing the "reinterpret_cast(__first)" to > "reinterpret_cast(&*(__first))". > Cheers. My bad, I missed the point about the memory not necessarily being contiguous.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #13 from Matt Bentley --- (In reply to Jonathan Wakely from comment #12) > Also you're doing a reinterpret_cast from an arbitrary iterator type, which > is not necessarily a pointer, or even a random access iterator. > > Since you don't have a copyright assignment in place please leave the patch > to us, this is less than helpful :-) Well it's more that you're doing- at any rate, the issue you've noted is easily bypassed by changing the "reinterpret_cast(__first)" to "reinterpret_cast(&*(__first))". Cheers.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #12 from Jonathan Wakely --- Also you're doing a reinterpret_cast from an arbitrary iterator type, which is not necessarily a pointer, or even a random access iterator. Since you don't have a copyright assignment in place please leave the patch to us, this is less than helpful :-)
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #11 from Jonathan Wakely --- __is_pointer_helper and __is_integral_helper are not available in C++98 and are not supposed to be used directly anyway, they're just implementation details for is_pointer and is_integral.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #10 from Matt Bentley --- (In reply to Marc Glisse from comment #9) > (In reply to Andrew Pinski from comment #7) > We could also use __builtin_constant_p, if the function is inlined often > enough (don't know if it is). Right - so then the proposed function becomes: template inline typename __gnu_cxx::__enable_if<__is_pointer_helper<_Tp>::__value || __is_integral_helper<_Tp>::__value, void>::__type __fill_a(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) { if (__builtin_constant_p(__value) == 1) { if (__value != reinterpret_cast<_Tp>(0)) { const _Tp __tmp = __value; for (; __first != __last; ++__first) *__first = __tmp; } else { if (const size_t __len = __last - __first) __builtin_memset(reinterpret_cast(__first), 0, __len * sizeof(_Tp)); } } else { const _Tp __tmp = __value; for (; __first != __last; ++__first) *__first = __tmp; } } , if I'm getting the enable_if syntax correct?
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #9 from Marc Glisse --- (In reply to Andrew Pinski from comment #7) > This is incorrect for floating point types Because of negative 0 I assume. > And it introduces an extra check at runtime if value is not known to compile > time. We could also use __builtin_constant_p, if the function is inlined often enough (don't know if it is). Best would still be to see if we can enable parts of ldist at -O2.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #8 from Matt Bentley --- > This is incorrect for floating point types and non scalars. And it > introduces an extra check at runtime if value is not known to compile time. This is the overload for scalar types, read the function description line. The extra check is negligible compared to the overhead caused by the alternative looping code vs memset, as is benchmarked above. I had to read up on how floating-point is implementation-defined, so yes you're right, the specialization would have to be further constricted to integral and scalar pointer types using __is_integral_helper & __is_scalar. Whether the commonality of zero-wiping newly allocated arrays outweights the overhead of an additional check for non-zero fills, is a good question.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #7 from Andrew Pinski --- (In reply to Matt Bentley from comment #6) > Suggested patch for libstdc++, std_algobase.h, line 688: > template > inline typename > __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type > __fill_a(_ForwardIterator __first, _ForwardIterator __last, >const _Tp& __value) > { > if (__value != reinterpret_cast<_Tp>(0)) > { > const _Tp __tmp = __value; > for (; __first != __last; ++__first) > *__first = __tmp; > } > else > { > if (const size_t __len = __last - __first) > __builtin_memset(reinterpret_cast(__first), 0, __len * > sizeof(_Tp)); > } > } > > Comments? This is incorrect for floating point types and non scalars. And it introduces an extra check at runtime if value is not known to compile time.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #6 from Matt Bentley --- Suggested patch for libstdc++, std_algobase.h, line 688: template inline typename __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type __fill_a(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) { if (__value != reinterpret_cast<_Tp>(0)) { const _Tp __tmp = __value; for (; __first != __last; ++__first) *__first = __tmp; } else { if (const size_t __len = __last - __first) __builtin_memset(reinterpret_cast(__first), 0, __len * sizeof(_Tp)); } } Comments?
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #5 from Matt Bentley --- What would be even more useful is a warning: for unused data.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2018-07-11 Ever confirmed|0 |1
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #4 from Marc Glisse --- There have been questions before about enabling (parts of) ldist at -O2. (In reply to Matt Bentley from comment #3) > I thought I should note that there is also a missing optimization > opportunity in the code. Clang optimizes the code I've listed to remove the > benchmark loops entirely since it detects that the arrays aren't actually > being used for anything. Indeed, I am surprised gcc didn't optimize out all those useless writes (ints is local and doesn't escape). Seems worth investigating a bit.
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #3 from Matt Bentley --- I thought I should note that there is also a missing optimization opportunity in the code. Clang optimizes the code I've listed to remove the benchmark loops entirely since it detects that the arrays aren't actually being used for anything. In order to get clang to benchmark it properly, I had to add a loop which adds the array contents to a total post-benchmark, as follows: #include #include #include double total = 0; static void memory_memset(benchmark::State& state) { int ints[5]; for (auto _ : state) { std::memset(ints, 0, sizeof(int) * 5); } for (int counter = 0; counter != 5; ++counter) { total += ints[counter]; } } static void memory_filln(benchmark::State& state) { int ints[5]; for (auto _ : state) { std::fill_n(ints, 5, 0); } for (int counter = 0; counter != 5; ++counter) { total += ints[counter]; } } static void memory_fill(benchmark::State& state) { int ints[5]; for (auto _ : state) { std::fill(std::begin(ints), std::end(ints), 0); } for (int counter = 0; counter != 5; ++counter) { total += ints[counter]; } } // Register the function as a benchmark BENCHMARK(memory_filln); BENCHMARK(memory_fill); BENCHMARK(memory_memset); int main (int argc, char ** argv) { benchmark::Initialize (, argv); benchmark::RunSpecifiedBenchmarks (); printf("Total = %f\n", total); getchar(); return 0; }
[Bug middle-end/86471] GCC/libstdc++ outputs inferior code for std::fill and std::fill_n vs std::memset on c-style arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86471 --- Comment #2 from Matt Bentley --- (In reply to Andrew Pinski from comment #1) > Does -O3 bring the performance back? -O3 enables loop distrubtion which > should be able to detect these loops. Just tested - yes. However this is a fairly trivial testcase and not all projects compile with O3 (most don't). Clang by comparison optimizes this out at -O2.