[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2019-06-03 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #36 from Wilco  ---
(In reply to Orr Shalom Dvory from comment #35)
> Hi, thanks for your respond. can someone mark this bug as need to be
> improved?
> Does anyone agree/disagree with my new proposed method?

It's best to create a new bug if there are still easy cases that could be
optimized. Also it seems the constants it uses are quite complex - it may be
feasible to simplify them. Eg. long f(long x) { return x % 35 == 0; }

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2019-06-02 Thread dvoreader at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #35 from Orr Shalom Dvory  ---
Hi, thanks for your respond. can someone mark this bug as need to be improved?
Does anyone agree/disagree with my new proposed method?

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2019-05-30 Thread hoganmeier at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

krux  changed:

   What|Removed |Added

 CC||hoganmeier at gmail dot com

--- Comment #34 from krux  ---
Also fixes the duplicate https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12849.
Can't close it though.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2019-02-26 Thread dvoreader at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Orr Shalom Dvory  changed:

   What|Removed |Added

 CC||dvoreader at gmail dot com

--- Comment #33 from Orr Shalom Dvory  ---
(In reply to Jakub Jelinek from comment #25)
> Created attachment 44657 [details]
> gcc9-pr82853.patch
> 
> Full untested patch.  For x % C1 == C2 it handles all unsigned cases where
> C1 is odd, if C1 is even, just cases where C2 <= -1U % C1, if signed modulo,
> just x % C1 == 0 cases (where C1 is not INT_MIN).

Hi, I will try to help. I've got a more accurate formula. look at my comment
over here https://reviews.llvm.org/D50222
I'm copying it here for the sake of all. 
Hi guys, I found the magical formula for unsigned integers that works also with
even numbers without the need to check for overflows with any remainder:
from divisor d and reminder r, I calculate 4 constants.
any d!=0 should fit.

void calculate_constants64(uint64_t d, uint64_t r, uint64_t ,uint64_t ,
uint64_t ,uint64_t& u)
{
k=__builtin_ctzll(d);/* the power of 2 */
uint64_t d_odd=d>>k;
mmi=find_mod_mul_inverse(d_odd,64);
/* 64 is word size*/
s=(r*mmi);
u=(ULLONG_MAX-r)/d;/* note that I divide by d, not d_odd */
}
A little bit background: the constant (u +1) is the count of the possible
values in the range of 64 bit number that will yield the correct modulo.
The constant s should zero the first k bits if the given value have modulo of
r. it will also zero the modulo of d_odd.

then the compiler should generate the following code with the given constants:

int checkrem64(uint64_t k,uint64_t mmi, uint64_t s,uint64_t u,uint64_t x)
{
uint64_t o=((x*mmi)-s);
o= (o>>k)|(o<<(64-k));/*ROTR64(o,k)*/
return o<=u;
}
this replace the following:

/* d is the divisor, r is the remainder */
int checkrem64(uint64_t x)
{
  return x%d==r;
}
this is the code to find modular inverse..

uint64_t find_mod_mul_inverse(uint64_t x, uint64_t bits)
{
  if (bits > 64 || ((x&1)==0))
  return 0;// invalid parameters
  uint64_t mask;
  if (bits == 64)
  mask = -1;
  else
  {
  mask = 1;
  mask<<=bits;
  mask--;
  }
  x&=mask;
  uint64_t result=1, state=x, ctz=0;
  while(state!=1ULL)
  {
  ctz=__builtin_ctzll(state^1);
  result|=1ULL<>k;
mmi=find_mod_mul_inverse(d_odd,64);
/* 64 is word size*/
 //this is the added line to make it work with signed integers
  r+=0x8000   % d;
s=(r*mmi);
u=(ULLONG_MAX-r)/d;
}

int checkrem64(uint64_t k,uint64_t mmi, uint64_t s,uint64_t u,uint64_t x)
{
 //this is the added line to make it work with signed integers
//x came as signed number but was casted to unsigned
x^=0x8000   ;// this is addition simplified to xor, spaces for
clarity.
uint64_t o=((x*mmi)-s);
o= (o>>k)|(o<<(64-k));/*ROTR64(o,k)*/
return o<=u;
}
There is a way to tweak u and s to make it work on negative only numbers or
positive only numbers, when r is negative or positive... but for r=0, this
should work. please tell me the exact requirements, and I will do the math.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2019-02-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Alexander Monakov  changed:

   What|Removed |Added

 CC||vermaelen.wouter at gmail dot 
com

--- Comment #32 from Alexander Monakov  ---
*** Bug 49552 has been marked as a duplicate of this bug. ***

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-13 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #31 from Jakub Jelinek  ---
Fixed on the trunk.  We might still try to improve some == non-zero cases with
conditional code, and certainly should repeat this in the vectorizer's pattern
recognizer.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-12 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #30 from Jakub Jelinek  ---
Author: jakub
Date: Wed Sep 12 18:28:20 2018
New Revision: 264248

URL: https://gcc.gnu.org/viewcvs?rev=264248=gcc=rev
Log:
PR middle-end/82853
* expr.h (maybe_optimize_mod_cmp): Declare.
* expr.c (mod_inv): New function.
(maybe_optimize_mod_cmp): New function.
(do_store_flag): Use it.
* cfgexpand.c (expand_gimple_cond): Likewise.

* gcc.target/i386/pr82853-1.c: New test.
* gcc.target/i386/pr82853-2.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr82853-1.c
trunk/gcc/testsuite/gcc.target/i386/pr82853-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgexpand.c
trunk/gcc/expr.c
trunk/gcc/expr.h
trunk/gcc/testsuite/ChangeLog

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-05 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Jakub Jelinek  changed:

   What|Removed |Added

 CC||antoshkka at gmail dot com

--- Comment #29 from Jakub Jelinek  ---
*** Bug 87232 has been marked as a duplicate of this bug. ***

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #28 from Jakub Jelinek  ---
On i9-7960X I get (cc1 is -O0 checking build, so bootstrapped compiler might be
much faster), will repeat that with bootstrapped compiler if it succeeds.  The
__int128 and unsigned __int128 tests are clearly too expensive.

for i in 1 2 3 4 5 6; do time ./cc1 -quiet -O2 -o pr82853-$i.{s,c}; gcc -o
pr82853-$i{,.s}; time ./pr82853-$i; echo $?; done
real0m11.273s
user0m11.182s
sys 0m0.039s

real0m2.997s
user0m2.993s
sys 0m0.001s
0

real0m8.145s
user0m8.082s
sys 0m0.026s

real0m2.166s
user0m2.165s
sys 0m0.000s
0

real0m11.683s
user0m11.597s
sys 0m0.033s

real0m5.315s
user0m5.312s
sys 0m0.000s
0

real0m7.972s
user0m7.903s
sys 0m0.032s

real0m3.801s
user0m3.798s
sys 0m0.001s
0

real0m12.846s
user0m12.762s
sys 0m0.028s

real0m17.471s
user0m17.458s
sys 0m0.001s
0

real0m8.546s
user0m8.486s
sys 0m0.022s

real0m13.738s
user0m13.728s
sys 0m0.000s

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-04 Thread rearnsha at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #27 from Richard Earnshaw  ---
(In reply to Jakub Jelinek from comment #26)

> A test generator for x % c1 == c2 expansion for unsigned, int, unsigned long
> long, long long, unsigned int128 and int128 types (assuming ilp32 or lp64)
> plus tests for those types.  Takes about 2 minutes to compile + run on a fast
> box and uses random (), so not really sure the tests should go into the
> testsuite.  Thoughts on that?  After all, the generator isn't extra smart and
> doesn't try to find problematic corner cases.

If you work on the 'guess' of a 10x slowdown when using simulators, that's
probably too long particularly if it's not delivering high value. 

Ultimately, it will depend on how much of that 2 minutes is during
generate/compile and how much during run time.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #26 from Jakub Jelinek  ---
Created attachment 44658
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44658=edit
pr82853-tests.tar.xz

A test generator for x % c1 == c2 expansion for unsigned, int, unsigned long
long, long long, unsigned int128 and int128 types (assuming ilp32 or lp64)
plus tests for those types.  Takes about 2 minutes to compile + run on a fast
box and uses random (), so not really sure the tests should go into the
testsuite.  Thoughts on that?  After all, the generator isn't extra smart and
doesn't try to find problematic corner cases.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-04 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #25 from Jakub Jelinek  ---
Created attachment 44657
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44657=edit
gcc9-pr82853.patch

Full untested patch.  For x % C1 == C2 it handles all unsigned cases where C1
is odd, if C1 is even, just cases where C2 <= -1U % C1, if signed modulo, just
x % C1 == 0 cases (where C1 is not INT_MIN).

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #24 from Jakub Jelinek  ---
(In reply to Wilco from comment #23)
> (In reply to Jakub Jelinek from comment #22)
> > So, while it isn't correct to replace x % 3U == 1 by (x - 1) % 3U == 0,
> > because
> > for x == 0 the test will yield a different value, as 0xU % 3U is 0
> > and
> > 0 % 3U is also 0, x % 3U == 1 is equivalent to (x - 1) * 0xaaabU <=
> > 0x5554U, but x % 3U == 0 is equivalent to x * 0xaaabU <= 
> > 0xU.
> > Now to see if something useful can be used also for the even divisors.
> 
> Yes for this case it is safe to do (x - 1) * 0xaaab < 0x, but
> you can also do x * 0xaaab >= 0xaaab which is even simpler.

Yeah, a special case, todo++.

> Basically (x % C) == N is simpler for 2 values of N.

I meant that for C odd it works for any value, x % C == D for C odd and D <= -1
% C then (x - D) * mod_inv (C) <= -1 / C, otherwise if C is odd and D > -1 % C
then
(x - D) * mod_inv (C) < -1 / C.
Just if C is even it is more complicated.  
For C positive odd and signed modulo with D == 0 it can be done easily too
(will implement tomorrow).

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread wdijkstr at arm dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Wilco  changed:

   What|Removed |Added

 CC||wdijkstr at arm dot com

--- Comment #23 from Wilco  ---
(In reply to Jakub Jelinek from comment #22)
> So, while it isn't correct to replace x % 3U == 1 by (x - 1) % 3U == 0,
> because
> for x == 0 the test will yield a different value, as 0xU % 3U is 0
> and
> 0 % 3U is also 0, x % 3U == 1 is equivalent to (x - 1) * 0xaaabU <=
> 0x5554U, but x % 3U == 0 is equivalent to x * 0xaaabU <= 0xU.
> Now to see if something useful can be used also for the even divisors.

Yes for this case it is safe to do (x - 1) * 0xaaab < 0x, but you
can also do x * 0xaaab >= 0xaaab which is even simpler. Basically (x %
C) == N is simpler for 2 values of N.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #22 from Jakub Jelinek  ---
So, while it isn't correct to replace x % 3U == 1 by (x - 1) % 3U == 0, because
for x == 0 the test will yield a different value, as 0xU % 3U is 0 and
0 % 3U is also 0, x % 3U == 1 is equivalent to (x - 1) * 0xaaabU <=
0x5554U, but x % 3U == 0 is equivalent to x * 0xaaabU <= 0xU.
Now to see if something useful can be used also for the even divisors.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #21 from Jakub Jelinek  ---
Probably should punt early if integer_onep (treeop1), that case should have
been optimized earlier, but if it isn't, we shouldn't miscompile.

Another thing is if *arg1 is < 0 or >= treeop1, again, I'd hope we have
optimized that already to false (or true for NE_EXPR), but if not, we shouldn't
miscompile.

Another thing is that C4 needs to be adjusted, if *arg1 is bigger than
all_ones%treeop1, then we need to subtract 1 from C4 - e.g. for x % 3 == 0
we want to expand it as x * 0xaaab <= 0x, while for x % 3 == 1
we should expand as x * 0xaaab - 1 * 0xaaab <= 0x - 1, because
0x % 3 == 0 and thus there is one more case where x % 3 == 0 compared
to x % 3 == 1 and x % 3 == 2 cases.

Hacker's Delight 10-17 mentions <= (2^prec - 1) / c1 rather than <= (2^prec) /
c1, probably should do that too.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #20 from Jakub Jelinek  ---
Testcase I've been eyeballing so far:
unsigned f1 (unsigned x) { return (x % 679U) == 0; }
unsigned f2 (unsigned x, unsigned *y) { *y = x / 679U; return (x % 679U) == 0;
}
unsigned f3 (unsigned x) { return (x % 1738U) == 0; }
void bar (void);
void f4 (unsigned x) { if (x % 3 == 0) bar (); }
void f5 (unsigned x) { if (x % 3 == 1) bar (); }
void f6 (unsigned x) { if (x % 3 == 2) bar (); }

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #19 from Jakub Jelinek  ---
Created attachment 44656
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44656=edit
gcc9-pr82853-wip.patch

Untested WIP patch which does this during expansion if it is cheaper according
to target costs.
As the FIXMEs say, I'm not yet trying Alex' proposal with doing the subtraction
on X (can't be used unconditionally, need to prove that values from -C2 to -1
u% C1 aren't 0), and am not trying even C1 when rotates aren't available
(either could just expand the rotate anyway using 2 shifts + or, or comparison
+ mask + comparison + bitwise and (though that would complicate the callers
equite a bit).

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #18 from Wilco  ---
(In reply to Alexander Monakov from comment #17)
> (In reply to Jakub Jelinek from comment #16)
> > For unsigned x % y == z if y is odd constant we can handle it for any
> > constant z, by computing m = mul_inv (y, 2^prec) and d = (2^prec / y) and
> > using x * m - (z * m) < d .
> 
> Is that preferable to testing (x - z) % y == 0?  Why?

That would require checking that it when x - z underflows if returns the
correct answer (or adding a && x >= z).

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #17 from Alexander Monakov  ---
(In reply to Jakub Jelinek from comment #16)
> For unsigned x % y == z if y is odd constant we can handle it for any
> constant z, by computing m = mul_inv (y, 2^prec) and d = (2^prec / y) and
> using x * m - (z * m) < d .

Is that preferable to testing (x - z) % y == 0?  Why?

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #16 from Jakub Jelinek  ---
For unsigned x % y == z if y is odd constant we can handle it for any constant
z, by computing m = mul_inv (y, 2^prec) and d = (2^prec / y) and using x * m -
(z * m) < d .
For even y, not sure if it can work for anything but z == 0; for z == 0 we can
do
s = ctz (y); y_ = y >> s; m = mul_inv (y_, 2^prec); d = (2^prec / y_);
and use ((x * m) r>> s) < d .

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread ktkachov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #15 from ktkachov at gcc dot gnu.org ---
I tried to do something similar at
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02005.html and the feedback was
that we should do this at expand time

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #14 from Jakub Jelinek  ---
http://duriansoftware.com/joe/Optimizing-is-multiple-checks-with-modular-arithmetic.html

Do we want to do this at GIMPLE time ignoring costs, or during expansion time?
Doing it later has the benefit that we can compare costs and could avoid
breaking say divmod recognition, or finding out multiple uses of the modulo,
etc.
Doing it earlier has the benefit for vectorization I guess, otherwise we need a
pattern recognizer that will work like the expansion.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2018-09-03 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #13 from Jakub Jelinek  ---
*** Bug 87203 has been marked as a duplicate of this bug. ***

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-06 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #12 from Wilco  ---
(In reply to Marc Glisse from comment #11)
> (In reply to Wilco from comment #9)
> > It works for any C where (divisor*C) MOD 2^32 == 1 (or -1).
> 
> For x%3==0, i.e. z==0 for x==3*y+z with 0<=y<5556 and 0<=z<3. 
> Indeed, x*0xaaab==y+z*0xaaab is in the right range precisely for
> z==0 and the same can be done for any odd number.

You can make it work for even numbers as well, eg. (x % 10) == 0 -> (x % 5) ==
0 && (x & 1) == 0. If you can avoid branches then it may still be faster.

> > You can support any kind of comparison, it doesn't need to be with 0 (but 
> > zero is the easiest).
> 
> Any ==cst will yield a range test. It is less obvious that inequalities are
> transformed to a contiguous range... (try x%7<3 maybe)

Indeed, expanding into x % 7 == 0 || x % 7 == 1 || x % 7 == 2 is unlikely to be
a good idea. However you can change x % 7 >= 1 into x % 7 != 0 or x % 7 < 6
into x % 7 != 6 which are single ranges.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-06 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #11 from Marc Glisse  ---
(In reply to Wilco from comment #9)
> It works for any C where (divisor*C) MOD 2^32 == 1 (or -1).

For x%3==0, i.e. z==0 for x==3*y+z with 0<=y<5556 and 0<=z<3. 
Indeed, x*0xaaab==y+z*0xaaab is in the right range precisely for z==0
and the same can be done for any odd number.

> You can support any kind of comparison, it doesn't need to be with 0 (but 
> zero is the easiest).

Any ==cst will yield a range test. It is less obvious that inequalities are
transformed to a contiguous range... (try x%7<3 maybe)

> I forgot whether I made it work for signed too, but it's certainly
> possible to skip the sign handling in x % 4 == 0 even if x is signed.

4 is a completely different story, as a power of 2.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #10 from Richard Biener  ---
Might be even more special-cased with x % 3 == 0 ? C1 : C2 aka in if-conversion
context.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-06 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Wilco  changed:

   What|Removed |Added

 CC||wilco at gcc dot gnu.org

--- Comment #9 from Wilco  ---
(In reply to Andi Kleen from comment #8)
> I'm not sure if it works with other numbers too.
> 
> (need to dig through Hacker's delight & Matters Computational to see if they
> have anything on it)
> 
> But it could be extended for other word lengths at least
> 
> BTW there are some other cases, will file a bug shortly on those too.

It works for any C where (divisor*C) MOD 2^32 == 1 (or -1). You can support any
kind of comparison, it doesn't need to be with 0 (but zero is the easiest). I
forgot whether I made it work for signed too, but it's certainly possible to
skip the sign handling in x % 4 == 0 even if x is signed.

While this is faster in general than computing the full modulo result, it
wouldn't be if you need the division result as well. So this works best on
single-use modulos used in comparisons, at the same time as general divisions
are expanded.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #8 from Andi Kleen  ---
I'm not sure if it works with other numbers too.

(need to dig through Hacker's delight & Matters Computational to see if they
have anything on it)

But it could be extended for other word lengths at least

BTW there are some other cases, will file a bug shortly on those too.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #7 from Marc Glisse  ---
Is that a special case of a more generic transformation, which might apply for
other values of 3, 0, == etc, or is this meant only literally for x%3==0?

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-11-05
 Ever confirmed|0   |1

--- Comment #6 from Andrew Pinski  ---
(In reply to Andi Kleen from comment #5)
> Also I'm not sure why you would want it in the middle end. It should all
> work at the tree level

Because we lose meaning and there is already code to handle the %N (where N is
a constant) to a multiplication in the middle-end so reusing that code would be
easy.  Plus 0 is simpler than 0x5556.

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #5 from Andi Kleen  ---
Also I'm not sure why you would want it in the middle end. It should all work
at the tree level

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread andi-gcc at firstfloor dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

--- Comment #4 from Andi Kleen  ---
Right it's about special casing the complete expression

[Bug middle-end/82853] Optimize x % 3 == 0 without modulo

2017-11-05 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82853

Andrew Pinski  changed:

   What|Removed |Added

  Component|target  |middle-end

--- Comment #3 from Andrew Pinski  ---
For your code aarch64 produces:
mod3:
mov w1, 43691
movkw1, 0x, lsl 16
umull   x1, w0, w1
lsr x1, x1, 32
lsr w1, w1, 1
add w1, w1, w1, lsl 1
cmp w0, w1
csetw0, eq
ret


Oh you mean not to produce a%3 followed by == part :).  Anyways this code
should go into the middle-end expanders and not be part of match.pd.