https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67153
Nathan Kurz changed:
What|Removed |Added
CC||nate at verse dot com
--- Comment #21 from Nathan Kurz ---
--- Comment #23 from Nathan Kurz ---
> 1. As a correction: *without* the count takes twice as long to run as with,
> or when using bitset<>.
Oops, I did say that backwards. My tests agree with what you say.
> 2. As a heuristic, favoring
--- Comment #22 from ncm at cantrip dot org ---
(In reply to Nathan Kurz from comment #21)
> My current belief is
> that everything here is expected behavior, and there is no bug with either
> the compiler or processor.
>
> The code spends
--- Comment #19 from Andrew Pinski ---
bitset<> version
real 0m1.073s
user 0m1.052s
sys 0m0.021s
unsigned int
real 0m0.903s
user 0m0.883s
sys 0m0.019s
bitset with container adapter:
real 0m1.519s
user 0m1.499s
sys
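The timings above compare the same letter-set logic stored in a `std::bitset<>` and in a plain `unsigned int`. The following is a minimal sketch, assuming one bit per letter 'a'..'z'. It shows that the subset test these programs depend on is logically the same in both representations. The names `LetterBits`, `subset_uint`, `subset_bitset` and `to_bitset` are hypothetical and do not come from the attached test cases.

```cpp
#include <bitset>

// Assumption: one bit per letter 'a'..'z', bit i set for letter 'a' + i.
using LetterBits = std::bitset<26>;

// Subset test on a plain unsigned mask: no bit of `word` lies outside `seven`.
// (~seven also sets bits 26..31, but `word` never has those bits set.)
bool subset_uint(unsigned word, unsigned seven) {
    return (word & ~seven) == 0;
}

// The same test with std::bitset<26>, using the standard operator~,
// operator& and none().
bool subset_bitset(const LetterBits& word, const LetterBits& seven) {
    return (word & ~seven).none();
}

// Convert an unsigned mask to a bitset; only the low 26 bits are kept.
LetterBits to_bitset(unsigned mask) {
    return LetterBits(mask);
}
```

The sketch says nothing about speed. The measured differences above come from the code GCC emits for each version, and that is what this report investigates.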
--- Comment #20 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #19)
> bitset<> version
> real 0m1.073s
> user 0m1.052s
> sys 0m0.021s
>
> unsigned int
> real 0m0.903s
> user 0m0.883s
> sys 0m0.019s
>
>
Richard Biener changed:
What|Removed |Added
Target Milestone|5.3 |5.4
--- Comment #17 from Richard Biener ---
--- Comment #18 from ncm at cantrip dot org ---
It is far from clear to me that gcc-5's choice to put the increment value in a
register, and use just one loop body, is wrong. Rather, it appears that an
incidental choice in the placement order of
Richard Biener changed:
What|Removed |Added
Priority|P3 |P2
Status|UNCONFIRMED
--- Comment #14 from ncm at cantrip dot org ---
A notable difference between g++-4.9 output and g++-5 output is that,
while both hoist the (word == seven) comparison out of the innermost
loop, gcc-4.9 splits the inner loop into two versions, one that
Mikhail Maltsev miyuki at gcc dot gnu.org changed:
What|Removed |Added
CC||miyuki at gcc
--- Comment #13 from ncm at cantrip dot org ---
This is essentially the entire difference between the versions of
puzzlegen-int.cc without, and with, the added ++count; line
referenced above (modulo register assignments and branch labels)
that
--- Comment #12 from ncm at cantrip dot org ---
As regards hot spots, the program has two:
int score[7] = { 0, };
for (Letters word : words)
/**/if (!(word & ~seven))
        for_each_in_seven([&](Letters letter,
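The hot loop above skips any word that uses a letter outside the seven puzzle letters: `word & ~seven` is zero exactly when the word's letter set is a subset of `seven`. The following is a minimal self-contained sketch of that pattern, assuming `Letters` is a plain `unsigned` bitmask with bit `i` set for letter `'a' + i`. The helpers `letters_of` and `score_letters` are hypothetical. A plain `for` over `place` stands in for `for_each_in_seven`, whose body is cut off in the comment. The weight of 3 for a word equal to `seven` (the `word == seven` comparison discussed in comment #14), versus 1 otherwise, is an illustrative assumption and is not taken from the attached program.

```cpp
#include <array>
#include <string>
#include <vector>

// Assumption: a letter set is a 26-bit mask, bit i set for letter 'a' + i.
using Letters = unsigned;

// Hypothetical helper: build the letter-set mask of a lowercase word.
Letters letters_of(const std::string& word) {
    Letters mask = 0;
    for (char c : word)
        if (c >= 'a' && c <= 'z')
            mask |= 1u << (c - 'a');
    return mask;
}

// Hypothetical scoring sketch. For each of the seven puzzle letters (in the
// order given in seven_bits), accumulate a score from every candidate word
// that contains that letter. A word is a candidate only if its letters are a
// subset of `seven`, i.e. (word & ~seven) == 0.
std::array<int, 7> score_letters(const std::vector<Letters>& words,
                                 const std::array<Letters, 7>& seven_bits) {
    Letters seven = 0;
    for (Letters bit : seven_bits) seven |= bit;

    std::array<int, 7> score{};  // zero-initialized, like `int score[7] = { 0, };`
    for (Letters word : words) {
        if (word & ~seven) continue;           // uses a letter outside the seven
        int weight = (word == seven) ? 3 : 1;  // assumed weighting
        for (int place = 0; place < 7; ++place)
            if (word & seven_bits[place])
                score[place] += weight;
    }
    return score;
}
```

With the seven letters `a`..`g` and the words "abcdefg", "bad", "hat" and "cab", the word "hat" is skipped because `h` and `t` fall outside `seven`. Letter `a` then scores 3 + 1 + 1 = 5.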
--- Comment #9 from ncm at cantrip dot org ---
I did experiment with -m[no-]bmi[2] a fair bit. It all made a significant
difference in the instructions emitted, but exactly zero difference in
runtime. That's actually not surprising at all;
--- Comment #11 from ncm at cantrip dot org ---
Aha, Uroš, I see your name in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011
Please forgive me for teaching you about micro-ops.
The code being generated for all versions does use (e.g.)
--- Comment #10 from ncm at cantrip dot org ---
I found this, which at first blush seems like it might be relevant.
The hardware complained about here is the same Haswell i7-4770.
--- Comment #7 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to ncm from comment #6)
> It seems worth adding that the same failure occurs without -march=native.
Can you experiment a bit with -mno-bmi and/or -mno-bmi2 compile options?
--- Comment #6 from ncm at cantrip dot org ---
It seems worth adding that the same failure occurs without -march=native.
--- Comment #8 from Uroš Bizjak ubizjak at gmail dot com ---
(In reply to Uroš Bizjak from comment #7)
> Can you experiment a bit with -mno-bmi and/or -mno-bmi2 compile options?
Also, perf is able to record execution profiles, it will help you
--- Comment #4 from ncm at cantrip dot org ---
Also fails 5.2.1 (Debian 5.2.1-15) 5.2.1 20150808
As noted, the third version of the program, using bitset but not using
lambdas, is as slow as the version using unsigned int -- even when built
--- Comment #5 from ncm at cantrip dot org ---
My preliminary conclusion is that a hardware optimization provided in Haswell
but not in Westmere is not recognizing the opportunity in the unsigned int
test case, that it finds in the original
--- Comment #3 from ncm at cantrip dot org ---
Created attachment 36159
-- https://gcc.gnu.org/bugzilla/attachment.cgi?id=36159&action=edit
bitset, but using an inlined container adapter, not lambdas, and slow
This version compiles just as
Richard Biener rguenth at gcc dot gnu.org changed:
What|Removed |Added
Keywords|