[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #30 from Aldy Hernandez --- On 5/26/21 3:23 PM, rguenther at suse dot de wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 > > --- Comment #29 from rguenther at suse dot de --- > On Wed, 26 May 2021, amacleod at redhat dot com wrote: > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 >> >> --- Comment #28 from Andrew Macleod --- >> (In reply to [email protected] from comment #27) >>> On Wed, 26 May 2021, aldyh at redhat dot com wrote: >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #26 from Aldy Hernandez --- On Wed, May 26, 2021 at 10:34 AM rguenther at suse dot de wrote: > It's probably too strict for multiple_of_p which is fine with > overflows that preserve modulo behavior. Could you provide an example? >>> >>> Like with DECL_SIZE being D.1234 * 8 as unsigned multiplication >>> and the query whether it's a multiple of 8. Once you have no >>> range for 'D.1234' you will signal overflow (correctly) but >>> even then it's still a mutliple of 8. >> >> Determining whether an arbitrary expression is a multiple of some number is >> not >> really something we can figure out via ranges. Well, that's not quite true. >> If >> we fully fleshed out the operations you care about, things like multiply or >> shift you could get some results. presumably things like multiply by 2,4,8 >> and >> 16.. if we created correct multi-ranges for those, a cast of the high >> precision >> range back to the original precision would yield an identical non-varying >> range. and for non-multiples/unsupported values we'd get varying or something >> not the same as the original value?. This would only work if the original >> value doesn't come out varying.Although if its varying, maybe you dont >> care >> and a match is ok anyway?We could have may_overflow_p also return the >> higher precision range for inspection if its true... > > I guess optimally multiple_of_p would check for each individual > operation it looks at whether the op is transparent for the > modulo query and only if not checks whether there's possible overflow > (where ranges could help). My POC gimple_ranger_wider class can just be overloaded and in range_of_expr, you could just hijack the operations you care about (PLUS_EXPR, MULTIPLY_EXPR, etc) and treat them specially. Just a thought... Aldy
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #29 from rguenther at suse dot de --- On Wed, 26 May 2021, amacleod at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 > > --- Comment #28 from Andrew Macleod --- > (In reply to [email protected] from comment #27) > > On Wed, 26 May 2021, aldyh at redhat dot com wrote: > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 > > > > > > --- Comment #26 from Aldy Hernandez --- > > > On Wed, May 26, 2021 at 10:34 AM rguenther at suse dot de > > > wrote: > > > > > > > It's probably too strict for multiple_of_p which is fine with > > > > overflows that preserve modulo behavior. > > > > > > Could you provide an example? > > > > Like with DECL_SIZE being D.1234 * 8 as unsigned multiplication > > and the query whether it's a multiple of 8. Once you have no > > range for 'D.1234' you will signal overflow (correctly) but > > even then it's still a mutliple of 8. > > Determining whether an arbitrary expression is a multiple of some number is > not > really something we can figure out via ranges. Well, that's not quite true. If > we fully fleshed out the operations you care about, things like multiply or > shift you could get some results. presumably things like multiply by 2,4,8 and > 16.. if we created correct multi-ranges for those, a cast of the high > precision > range back to the original precision would yield an identical non-varying > range. and for non-multiples/unsupported values we'd get varying or something > not the same as the original value?. This would only work if the original > value doesn't come out varying.Although if its varying, maybe you dont > care > and a match is ok anyway?We could have may_overflow_p also return the > higher precision range for inspection if its true... I guess optimally multiple_of_p would check for each individual operation it looks at whether the op is transparent for the modulo query and only if not checks whether there's possible overflow (where ranges could help).
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #28 from Andrew Macleod --- (In reply to [email protected] from comment #27) > On Wed, 26 May 2021, aldyh at redhat dot com wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 > > > > --- Comment #26 from Aldy Hernandez --- > > On Wed, May 26, 2021 at 10:34 AM rguenther at suse dot de > > wrote: > > > > > It's probably too strict for multiple_of_p which is fine with > > > overflows that preserve modulo behavior. > > > > Could you provide an example? > > Like with DECL_SIZE being D.1234 * 8 as unsigned multiplication > and the query whether it's a multiple of 8. Once you have no > range for 'D.1234' you will signal overflow (correctly) but > even then it's still a mutliple of 8. Determining whether an arbitrary expression is a multiple of some number is not really something we can figure out via ranges. Well, that's not quite true. If we fully fleshed out the operations you care about, things like multiply or shift you could get some results. presumably things like multiply by 2,4,8 and 16.. if we created correct multi-ranges for those, a cast of the high precision range back to the original precision would yield an identical non-varying range. and for non-multiples/unsupported values we'd get varying or something not the same as the original value?. This would only work if the original value doesn't come out varying.Although if its varying, maybe you dont care and a match is ok anyway?We could have may_overflow_p also return the higher precision range for inspection if its true...
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #27 from rguenther at suse dot de --- On Wed, 26 May 2021, aldyh at redhat dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 > > --- Comment #26 from Aldy Hernandez --- > On Wed, May 26, 2021 at 10:34 AM rguenther at suse dot de > wrote: > > > It's probably too strict for multiple_of_p which is fine with > > overflows that preserve modulo behavior. > > Could you provide an example? Like with DECL_SIZE being D.1234 * 8 as unsigned multiplication and the query whether it's a multiple of 8. Once you have no range for 'D.1234' you will signal overflow (correctly) but even then it's still a mutliple of 8.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #26 from Aldy Hernandez --- On Wed, May 26, 2021 at 10:34 AM rguenther at suse dot de wrote: > It's probably too strict for multiple_of_p which is fine with > overflows that preserve modulo behavior. Could you provide an example?
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499
--- Comment #25 from rguenther at suse dot de ---
On Wed, 26 May 2021, aldyh at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499
>
> --- Comment #23 from Aldy Hernandez ---
> I have an upcoming patchset that implements a range evaluator for tree
> expressions (similar to determine_value_range), as well as a gimple_ranger
> that
> evaluates expressions in a higher precision. This combination looks like it
> could provide a way for determining overflow:
>
> // Return TRUE if evaluating EXPR in its type can produce an overflow.
> // Overflow is determined by calculating the possible range for EXPR
> // in its natural type and in a wider type. If the results differ,
> // evaluating EXPR may have overflowed.
>
> bool
> may_overflow_p (const_tree expr)
> {
> gimple_ranger ranger;
> gimple_ranger_wider wranger (TREE_TYPE (expr));
> int_range_max r, w;
>
> if (!ranger.range_of_expr (r, const_cast (expr))
> || !wranger.range_of_expr (w, const_cast (expr)))
> return true;
>
> return r != w;
> }
>
> Of course, we'd need to adapt the above to re-use any active ranger instead of
> instantiating a new one every time.
>
> The above yields overflow for the 16-bit expression in question:
>
> (gdb) p debug(top)
> g_2823_lsm.5_6 * 7854 + 57682
>
> (gdb) p may_overflow_p (top)
> $6 = true
>
> This is because the range of the above expression in unsigned 16-bit yields
> VARYING (R), whereas in 32-bit unsigned yields [57682, 514769572] (W).
>
> Plugging in the above into multiple_of_p:
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 4a4358362e1..ea8ec3bbeec 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -13987,6 +13987,9 @@ multiple_of_p (tree type, const_tree top, const_tree
> bottom)
>if (TREE_CODE (type) != INTEGER_TYPE)
> return 0;
>
> + if (may_overflow_p (top))
> +return false;
> +
>switch (TREE_CODE (top))
> {
> case BIT_AND_EXPR:
>
> ...yields what I think is the correct result:
>
> (base) abulafia:~/bld/t/gcc$ ./xgcc -B./ a.c -O1 -w
> (base) abulafia:~/bld/t/gcc$ ./a.out
> start g_2823 = 60533
> end g_2823 = 32768
> (base) abulafia:~/bld/t/gcc$ ./xgcc -B./ a.c -O1 -fpeel-loops
> -ftree-loop-vectorize -w
> (base) abulafia:~/bld/t/gcc$ ./a.out
> start g_2823 = 60533
> end g_2823 = 32768
>
> Does this seem right?
It's probably too strict for multiple_of_p which is fine with
overflows that preserve modulo behavior.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #24 from Aldy Hernandez --- (In reply to Aldy Hernandez from comment #23) > The above yields overflow for the 16-bit expression in question: > > (gdb) p debug(top) > g_2823_lsm.5_6 * 7854 + 57682 > > (gdb) p may_overflow_p (top) > $6 = true > > This is because the range of the above expression in unsigned 16-bit yields > VARYING (R), whereas in 32-bit unsigned yields [57682, 514769572] (W). Which, I will note, matches Andrew's hand calculation of: >so the original expression is in 16 bit math, and if it was evaluated as > [0, 65535] * [7854, 7854] + [57682, 57682] > in 32 bit precision, it would come back with the answer [57682, 514769572].
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499
--- Comment #23 from Aldy Hernandez ---
I have an upcoming patchset that implements a range evaluator for tree
expressions (similar to determine_value_range), as well as a gimple_ranger that
evaluates expressions in a higher precision. This combination looks like it
could provide a way for determining overflow:
// Return TRUE if evaluating EXPR in its type can produce an overflow.
// Overflow is determined by calculating the possible range for EXPR
// in its natural type and in a wider type. If the results differ,
// evaluating EXPR may have overflowed.
bool
may_overflow_p (const_tree expr)
{
gimple_ranger ranger;
gimple_ranger_wider wranger (TREE_TYPE (expr));
int_range_max r, w;
if (!ranger.range_of_expr (r, const_cast (expr))
|| !wranger.range_of_expr (w, const_cast (expr)))
return true;
return r != w;
}
Of course, we'd need to adapt the above to re-use any active ranger instead of
instantiating a new one every time.
The above yields overflow for the 16-bit expression in question:
(gdb) p debug(top)
g_2823_lsm.5_6 * 7854 + 57682
(gdb) p may_overflow_p (top)
$6 = true
This is because the range of the above expression in unsigned 16-bit yields
VARYING (R), whereas in 32-bit unsigned yields [57682, 514769572] (W).
Plugging in the above into multiple_of_p:
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 4a4358362e1..ea8ec3bbeec 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -13987,6 +13987,9 @@ multiple_of_p (tree type, const_tree top, const_tree
bottom)
if (TREE_CODE (type) != INTEGER_TYPE)
return 0;
+ if (may_overflow_p (top))
+return false;
+
switch (TREE_CODE (top))
{
case BIT_AND_EXPR:
...yields what I think is the correct result:
(base) abulafia:~/bld/t/gcc$ ./xgcc -B./ a.c -O1 -w
(base) abulafia:~/bld/t/gcc$ ./a.out
start g_2823 = 60533
end g_2823 = 32768
(base) abulafia:~/bld/t/gcc$ ./xgcc -B./ a.c -O1 -fpeel-loops
-ftree-loop-vectorize -w
(base) abulafia:~/bld/t/gcc$ ./a.out
start g_2823 = 60533
end g_2823 = 32768
Does this seem right?
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #22 from Jeffrey A. Law --- I have vague memories of it, but it wasn't my code. It was actually Craig Burley. It's original purpose was merely to allow converting *_DIV_EXPR into EXACT_DIV_EXPR which presumably was important for some g77 cases way back then.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #21 from Andrew Macleod --- > > > > Would this be useful? and would it solve this problem? I'm sure there are > > other details to work out related to the increased precision, but it seems > > like it might work? > > Hmm, so the issue we're facing is asking whether x * 3 is a multiple of 3 > but we have to consider that x * 3 might overflow. For '3' being > power-of-two > overflow doesn't change anything but for some C ((X * C) % S % C) != 0 > (where S is the modulo reduction applied on overflow). > > So it seems that, to be useful, this ranger expression walk would need to be > integrated with multiple_of_p itself. Or we need to restrict calls to > multiple_of_p to those C where S is a multiple of C. (but in principle > multiple_of_p is happy with a symbolic C - not sure if it is ever called with > one though) > We'll add the general expression evaluator, since I think its pretty straightforward and useful in general. And if we then add the ability to indicate an expression may overflow, then multiple_of_p can decide whether that is a check it wants to make or not, and if it is, then it can simply ask. multiple_of_p would simply be a client calling "may_overflow_p (expr)"
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #20 from Aldy Hernandez --- On Wed, May 19, 2021 at 8:31 AM rguenth at gcc dot gnu.org wrote: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 > > --- Comment #17 from Richard Biener --- > (In reply to Andrew Macleod from comment #16) > > We could add an expression evaluator that can walk that expression, invoking > > range-ops on each expression, and calling a ranger instance to evaluate a > > range for any ssa_name it finds. > > > > It would bail if there are unknown tree-codes to range-ops. > > Yeah, it would be similar to the existing determine_value_range () function > which does exactly do this (but not using ranger). determine_value_range() has been calling range-ops under the covers for quite a while, so it's half-way there. It would require some minor tweaks: a) Use irange instead of value_range so as to not throw away the higher precision range-ops calculates. b) If we want context-aware ranges, pass it a gimple statement / edge / etc, and a range_query/ranger. Oh yeah, and return a proper range, not this value_range_kind + wide_int + wide_int business (determine_value_range_1 does this already).
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #19 from bin cheng --- (In reply to bin cheng from comment #18) > Did some experiments, there are two fallouts after explicitly returning > false for unsigned/wrapping types in MULT_EXPR/MINUS_EXPR/PLUS_EXPR. One is > the mentioned use of multiple_of_p in number_of_iterations_ne, the other is > for alignment warning in stor-layout.c. As pointed out, the latter case is > known not overflow/wrap. > > So I am thinking to introduce an additional parameter indicating that caller > knows "top" doesn't overfow/wrap, otherwise, try to get rid of the > undocumented assumption. we can always improve the accuracy using ranger or > other tools. Not sure if this is the right way to do. > > As for MULT_NO_OVERFLOW/PLUS_NO_OVERFLOW, IMHO, it's not that simple? For > example, unsigned_num(multiple of 4, and larger than 0) + 0xfffc is > multiple of 4, but it's overflow behavior on which we rely here. Hmm, 4 is special and not a correct example. Considering: n (unsigned, multiple of 3, and > 0) + 0xfffd It's multiple of 3, but we need to rely on wrapping to get answer.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #18 from bin cheng --- Did some experiments, there are two fallouts after explicitly returning false for unsigned/wrapping types in MULT_EXPR/MINUS_EXPR/PLUS_EXPR. One is the mentioned use of multiple_of_p in number_of_iterations_ne, the other is for alignment warning in stor-layout.c. As pointed out, the latter case is known not overflow/wrap. So I am thinking to introduce an additional parameter indicating that caller knows "top" doesn't overfow/wrap, otherwise, try to get rid of the undocumented assumption. we can always improve the accuracy using ranger or other tools. Not sure if this is the right way to do. As for MULT_NO_OVERFLOW/PLUS_NO_OVERFLOW, IMHO, it's not that simple? For example, unsigned_num(multiple of 4, and larger than 0) + 0xfffc is multiple of 4, but it's overflow behavior on which we rely here.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #17 from Richard Biener --- (In reply to Andrew Macleod from comment #16) > Aldy and I are discussing this. > > Ranger itself can't do anything outside of the gimple IL, its effectively > just a GIMPLE interface to range-ops. ... but I don't think it would be > hard to add a range-ops expression evaluator. I notice the multiple_of_p's > "top" argument is an expression tree ie: > > g_2823_lsm.5_13 * 7854 + 57682 > > We could add an expression evaluator that can walk that expression, invoking > range-ops on each expression, and calling a ranger instance to evaluate a > range for any ssa_name it finds. > > It would bail if there are unknown tree-codes to range-ops. Yeah, it would be similar to the existing determine_value_range () function which does exactly do this (but not using ranger). > I don't think this is a particularly difficult thing to do, but by itself > doesn't really tell you if an overflow is possible Yes, so in itself it wouldn't be enough. > Once we can evaluate an expression outside of the IL like this, it would > also not be difficult to then evaluate the expression in an increased > precision. > We could cast the range at each point of the evaluation to the increased > precision and invoke range-ops on that range. We could tell at any point if > an overflow is possible because the increased precision calculation would be > different than the original. > > so the original expression is in 16 bit math, and if it was evaluated as > [0, 65535] * [7854, 7854] + [57682, 57682] > in 32 bit precision, it would come back with the answer [57682, 514769572]. > Casting the final original value to 32 bit would yield a different result, > and we could conclude than an overflow is possible. > > Would this be useful? and would it solve this problem? I'm sure there are > other details to work out related to the increased precision, but it seems > like it might work? Hmm, so the issue we're facing is asking whether x * 3 is a multiple of 3 but we have to consider that x * 3 might overflow. For '3' being power-of-two overflow doesn't change anything but for some C ((X * C) % S % C) != 0 (where S is the modulo reduction applied on overflow). So it seems that, to be useful, this ranger expression walk would need to be integrated with multiple_of_p itself. Or we need to restrict calls to multiple_of_p to those C where S is a multiple of C. (but in principle multiple_of_p is happy with a symbolic C - not sure if it is ever called with one though) > I think Martin Sebor was also interested in something along these lines so > I'm CC ing him. I think he wanted to do this within the IL for some of his > warnings.. but I think something similar is feasible with an IL walk rather > than expression walk.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 Andrew Macleod changed: What|Removed |Added CC||msebor at gcc dot gnu.org --- Comment #16 from Andrew Macleod --- Aldy and I are discussing this. Ranger itself can't do anything outside of the gimple IL, its effectively just a GIMPLE interface to range-ops. ... but I don't think it would be hard to add a range-ops expression evaluator. I notice the multiple_of_p's "top" argument is an expression tree ie: g_2823_lsm.5_13 * 7854 + 57682 We could add an expression evaluator that can walk that expression, invoking range-ops on each expression, and calling a ranger instance to evaluate a range for any ssa_name it finds. It would bail if there are unknown tree-codes to range-ops. I don't think this is a particularly difficult thing to do, but by itself doesn't really tell you if an overflow is possible Once we can evaluate an expression outside of the IL like this, it would also not be difficult to then evaluate the expression in an increased precision. We could cast the range at each point of the evaluation to the increased precision and invoke range-ops on that range. We could tell at any point if an overflow is possible because the increased precision calculation would be different than the original. so the original expression is in 16 bit math, and if it was evaluated as [0, 65535] * [7854, 7854] + [57682, 57682] in 32 bit precision, it would come back with the answer [57682, 514769572]. Casting the final original value to 32 bit would yield a different result, and we could conclude than an overflow is possible. Would this be useful? and would it solve this problem? I'm sure there are other details to work out related to the increased precision, but it seems like it might work? I think Martin Sebor was also interested in something along these lines so I'm CC ing him. I think he wanted to do this within the IL for some of his warnings.. but I think something similar is feasible with an IL walk rather than expression walk.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499
--- Comment #15 from rguenther at suse dot de ---
On Tue, 18 May 2021, amker at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499
>
> --- Comment #14 from bin cheng ---
> (In reply to Richard Biener from comment #12)
> > So in number_of_iterations_ne it looks like the step 's' is always constant
> > which makes me wonder if we can somehow use ranger to tell multiple_of_p
> > (type, c, s)
> > or at least whether, if c is x * s, the multiplication could have
> > overflowed?
>
> Yeah, I am looking if "multiple of" can be feasibly checked in niter analysis,
> with help of some basic information from multiple_of_p.
>
> BTW, I am not following changes in "ranger", how should I used in analysis? or
> similar to value range info?
I'm not sure - let's see if the ranger folks have any good idea here.
Btw, there's tree_ctz which looks more conservative and could be used
for power-of-two 's'. split_constant_offset also recently got
refactoring to avoid a plethora of overflow issues it ran into,
so we can eventually improve multiple_of_p to be correct without
pre-conditions. But I fear that for DECL_SIZE & friends where
we "know" that multiplications by 8 to get from byte to bit size
do not overflow we cannot be too conservative here. Maybe in the
end we need to distinguish those with new MULT_NO_OVERFLOW,
PLUS_NO_OVERFLOW, etc. When creating those expressions we should
already be using size_{bin,un}op. The conversion handling of course
still looks bogus to me even in this context.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #14 from bin cheng --- (In reply to Richard Biener from comment #12) > So in number_of_iterations_ne it looks like the step 's' is always constant > which makes me wonder if we can somehow use ranger to tell multiple_of_p > (type, c, s) > or at least whether, if c is x * s, the multiplication could have overflowed? Yeah, I am looking if "multiple of" can be feasibly checked in niter analysis, with help of some basic information from multiple_of_p. BTW, I am not following changes in "ranger", how should I used in analysis? or similar to value range info? Thanks
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #13 from bin cheng --- (In reply to Richard Biener from comment #10) > (In reply to bin cheng from comment #9) > > Seems we have a long standing bug in fold-const.c:multiple_of_p in case of > > wrapping types. Take unsigned int as an example: > > (0xfffc * 0x3) % 0x3 = 0x1 > > But multiple_of_p returns true here. > > > > The same issue also stands for MINUS_EXPR and PLUS_EXPR. Given > > multiple_of_p is used elsewhere, the fix might break existing optimizations. > > Especially, number of loop iterations is computed in unsigned types > > multiple_of_p is mostly used in contexts where overflow "cannot happen" > (in TYPE/DECL_SIZE computation context), and in niter analysis it seems to > be guarded similarly. This restriction of multiple_of_p seems undocumented, Oh, I am not aware of this. Actually my previous change to it seems broke this assumption already. Will see how to fix or revert the change. > so fixing that might be good. > > Now, you don't say what's the chain of events that lead to a multiple_of_p > call > eventually leading to the wrong answer, but I guess it's the code added > under the > > + if (!niter->control.no_overflow > + && (integer_onep (s) || multiple_of_p (type, c, s))) > > check as !niter->control.no_overflow seems to suggest that the multiple_of_p > check is not properly guarded?
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 Richard Biener changed: What|Removed |Added CC||aldyh at gcc dot gnu.org --- Comment #12 from Richard Biener --- So in number_of_iterations_ne it looks like the step 's' is always constant which makes me wonder if we can somehow use ranger to tell multiple_of_p (type, c, s) or at least whether, if c is x * s, the multiplication could have overflowed?
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499
Richard Biener changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org,
||law at gcc dot gnu.org
--- Comment #11 from Richard Biener ---
But there's still
int
multiple_of_p (tree type, const_tree top, const_tree bottom)
{
...
case NOP_EXPR:
/* Can't handle conversions from non-integral or wider integral type. */
if ((TREE_CODE (TREE_TYPE (TREE_OPERAND (top, 0))) != INTEGER_TYPE)
|| (TYPE_PRECISION (type)
< TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (top, 0)
return 0;
/* fall through */
case SAVE_EXPR:
return multiple_of_p (type, TREE_OPERAND (top, 0), bottom);
which makes guards like
|| (TYPE_OVERFLOW_UNDEFINED (type)
&& multiple_of_p (type, c, s)))
break for say multiple_of_p (int, (int)(unsigned)(x*s), s). So it's a bit
fishy in the end ... :/
Jeff originally introduced multiple_of_p, maybe he remembers something.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #10 from Richard Biener --- (In reply to bin cheng from comment #9) > Seems we have a long standing bug in fold-const.c:multiple_of_p in case of > wrapping types. Take unsigned int as an example: > (0xfffc * 0x3) % 0x3 = 0x1 > But multiple_of_p returns true here. > > The same issue also stands for MINUS_EXPR and PLUS_EXPR. Given > multiple_of_p is used elsewhere, the fix might break existing optimizations. > Especially, number of loop iterations is computed in unsigned types multiple_of_p is mostly used in contexts where overflow "cannot happen" (in TYPE/DECL_SIZE computation context), and in niter analysis it seems to be guarded similarly. This restriction of multiple_of_p seems undocumented, so fixing that might be good. Now, you don't say what's the chain of events that lead to a multiple_of_p call eventually leading to the wrong answer, but I guess it's the code added under the + if (!niter->control.no_overflow + && (integer_onep (s) || multiple_of_p (type, c, s))) check as !niter->control.no_overflow seems to suggest that the multiple_of_p check is not properly guarded?
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #9 from bin cheng --- Seems we have a long standing bug in fold-const.c:multiple_of_p in case of wrapping types. Take unsigned int as an example: (0xfffc * 0x3) % 0x3 = 0x1 But multiple_of_p returns true here. The same issue also stands for MINUS_EXPR and PLUS_EXPR. Given multiple_of_p is used elsewhere, the fix might break existing optimizations. Especially, number of loop iterations is computed in unsigned types
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #8 from Martin Liška --- (In reply to bin cheng from comment #7) > (In reply to Martin Liška from comment #4) > > (In reply to Martin Liška from comment #3) > > > But expected result is end g_2823 = 32768, right? > > > Clang returns the same result 32768. > > > > Which regresses since r7-2373-g69b806f6a60efcf1. > > Hmm, that was a fix long ago. Will investigate this. Sorry for the > breakage. gcc pr100499.c -O1 -fpeel-loops -ftree-loop-vectorize -fno-vect-cost-model && ./a.out | grep 32768 also breaks with r7-2373-g69b806f6a60efcf1.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 bin cheng changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |amker at gcc dot gnu.org --- Comment #7 from bin cheng --- (In reply to Martin Liška from comment #4) > (In reply to Martin Liška from comment #3) > > But expected result is end g_2823 = 32768, right? > > Clang returns the same result 32768. > > Which regresses since r7-2373-g69b806f6a60efcf1. Hmm, that was a fix long ago. Will investigate this. Sorry for the breakage.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 Richard Biener changed: What|Removed |Added Known to work|11.1.0 | Known to fail||11.1.0 --- Comment #6 from Richard Biener --- OK, so GCC 11 is broken as well. Might want to bisect with -fno-vect-cost-model
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #5 from John Dong --- (In reply to Martin Liška from comment #3) > But expected result is end g_2823 = 32768, right? > Clang returns the same result 32768. Yes, I think so.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 Martin Liška changed: What|Removed |Added CC||amker at gcc dot gnu.org --- Comment #4 from Martin Liška --- (In reply to Martin Liška from comment #3) > But expected result is end g_2823 = 32768, right? > Clang returns the same result 32768. Which regresses since r7-2373-g69b806f6a60efcf1.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 --- Comment #3 from Martin Liška --- But expected result is end g_2823 = 32768, right? Clang returns the same result 32768.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 Martin Liška changed: What|Removed |Added Keywords|needs-bisection | CC||marxin at gcc dot gnu.org --- Comment #2 from Martin Liška --- gcc pr100499.c -O1 -fpeel-loops -ftree-loop-vectorize && ./a.out | grep 60526 succeeds since r11-2453-gc89366b12ff4f362.
[Bug tree-optimization/100499] Different results with -fpeel-loops -ftree-loop-vectorize options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499 Richard Biener changed: What|Removed |Added Known to work||11.1.0 Ever confirmed|0 |1 Keywords||needs-bisection, wrong-code Status|UNCONFIRMED |NEW CC||rguenth at gcc dot gnu.org Last reconfirmed||2021-05-10 Known to fail||10.3.1, 8.4.1, 9.3.1 --- Comment #1 from Richard Biener --- Confirmed. It looks like GCC 11 works (it's not vectorizing). I wonder what fixed it.
