New submission from Carl Friedrich Bolz <[email protected]>: a / b with integers is currently doing something like:
r = a / b p = r * y if y < 0: u = p - x else: u = x - p return r + (u >> INT_BITS_1) the first two operations could be done with just one IDIV instruction on X86. For the rest it's maybe better to write it branch-free. discussion: 16:08:06 <cfbolz> indeed, before rtyping it's just "floordiv" 16:08:22 <arigato> and then "floordiv" is turned into... not "int_floordiv" 16:09:11 <cfbolz> int_truncdiv + a call 16:09:19 <arigato> yes 16:10:33 <cfbolz> arigato: might not get around to it now, but will keep it in mind 16:10:37 <arigato> ok 16:10:50 <arigato> I *think* the current slightly strange situation gives good results for the jit 16:11:45 <cfbolz> no clue 16:12:13 <arigato> at least I remember when int_floordiv was meant to have Python semantics too 16:12:54 <arigato> it was more messy in the jit, and then sometimes for nothing if you really wanted a C-like behavior in the end 16:13:14 <cfbolz> right 16:14:11 <cfbolz> arigato: still, you get thiskind of code: 16:14:17 <cfbolz> https://www.irccloud.com/pastebin/x2jo6KoG 16:15:10 <arigato> right, but one way or another, the RPython "/" operator needs to translate to this 16:15:35 <cfbolz> arigato: I am worried by the last guard a bit 16:15:45 <arigato> (probably someone with the right mindset can come up with a different set of instructions that is somehow better) 16:15:57 <arigato> cfbolz: ah, I see 16:15:59 <cfbolz> arigato: because it can really make bridges 16:16:08 <arigato> ok 16:16:48 <arigato> it comes out of ll_correct_int_floordiv() 16:17:00 <cfbolz> can we write this branch free 16:17:19 <arigato> first step is to remember why this works at all :-) 16:17:44 <cfbolz> :-) 16:17:59 <arigato> right, I think the branch was put there specifically because it's the less likely to have different values at run-time 16:18:18 <arigato> "x/y" where y is sometimes positive and sometimes negative... is probably a bit rare 16:18:28 <cfbolz> ah, no 16:18:42 <cfbolz> arigato: it's maybe even because then if you know that y is positive you can constant fold 16:19:00 <arigato> yes, that too 16:19:14 <cfbolz> arigato: because division by a constant factor 16:20:25 <cfbolz> still confused how it works 16:21:09 <arigato> (u >> INT_BITS_1) is -1 if u < 0, or 0 otherwise 16:21:37 <cfbolz> yes 16:21:41 <cfbolz> but what is u? 16:22:22 <arigato> u is computed to be 0 if the division was exact, or < 0 if not 16:22:37 <cfbolz> right 16:23:05 <arigato> a branch-free variant would be: return r + (((x - p) ^ y) >> INT_BITS_1) 16:23:34 → willingc joined ([email protected]) 16:23:43 <cfbolz> arigato: which means we would replace a guard by an xor 16:23:44 <arigato> the trade-off is if we already know that y > 0 16:23:49 <arigato> yes 16:24:20 <arigato> this kind of code would be all constant-folded away, right? if jit.isconstant(y > 0):... 16:24:21 <cfbolz> no clue :-( 16:24:41 <cfbolz> why all? 16:24:49 <cfbolz> only the guard would go away, no? 16:24:54 → Wessie joined ([email protected]) 16:25:05 <arigato> yes yes, I mean the "jit.isconstant()" part 16:25:24 <cfbolz> yes 16:25:28 <arigato> but it's probably not going to ever say True 16:25:38 <arigato> y is known to be positive later 16:25:42 <arigato> not during tracing 16:25:48 <cfbolz> yes, e.g. if it's loop invariant 16:26:01 <arigato> hum 16:26:16 <arigato> the alternative is to add a correct (but very special-case) arithmetic optmization: 16:26:33 <arigato> i3 = i1 ^ i2; i4 = i3 >> INT_BITS_1 16:26:55 <arigato> if i2 is known to be positive, then we can substitute i3 for i1 in the computation of i4 16:27:07 <cfbolz> arigato: very special case :-) 16:27:15 <cfbolz> arigato: hm, in x86 assembler you could indeed do better, because IDIV computes both x / y and the remainder in one instruction, no? 16:27:25 <arigato> ah, yes 16:27:27 <cfbolz> so you wouldn't need the multiply 16:27:38 ⇐ zer0def quit ([email protected]) Ping timeout: 240 seconds 16:27:39 <arigato> indeed 16:28:13 <cfbolz> so we would need an int_divrem (which we can't have, because it returns two things) 16:28:34 <arigato> it's probably not a problem: 16:29:10 <arigato> we can arrange things such that in the end the jit backend sees a INT_TRUNCDIV immediately followed by a INT_REM 16:29:35 <cfbolz> I see 16:30:56 <arigato> (btw, same about 'mod' vs 'int_mod') 16:31:03 <cfbolz> yes ---------- messages: 6577 nosy: cfbolz, pypy-issue priority: performance bug status: unread title: a / b with integers could be faster in the JIT ________________________________________ PyPy bug tracker <[email protected]> <https://bugs.pypy.org/issue1701> ________________________________________ _______________________________________________ pypy-issue mailing list [email protected] https://mail.python.org/mailman/listinfo/pypy-issue
