Your instruction timings are right, but I question whether two branches
would be better if the each input is zero, say, half the time randomly.
a+b == (a>b?a:b) is a good trick!
The fact that or doesn't fuse and cmp does is important. It means that
a+b == (a>b?a:b) takes only 4 microops, same as the mul version (without
mov), but with 2 cycles less time to back out of a misprediction.
So: two branches would be good if the branches never mispredict.
Otherwise a+b == (a>b?a:b) is better. If in the future or fuses, there
will be a very narrow set of cases where mul would win.
Henry Rich
On 7/29/2019 10:10 AM, Marshall Lochbaum wrote:
Two instructions but one of them's slow. I found the test
a+b == (a>b?a:b) on unsigned integers to be slightly faster in a loop
with independent iterations and significantly faster when a loop
dependency is introduced. However, a solution with two branches was
usually faster than either.
I loaded integers to test from an array. Compiling a+b == (a>b?a:b) with
gcc -O3 produces:
mov rax,QWORD PTR [rdx]
mov rcx,QWORD PTR [rdx+0x8]
cmp rax,rcx
lea r8,[rax+rcx*1]
cmovb rax,rcx
cmp r8,rax
jne ...
And !((__int128)a * (__int128)b) gives:
mov rax,QWORD PTR [rcx+0x8]
mul QWORD PTR [rcx]
mov r9,rdx
or r9,rax
jne ...
To my knowledge cmp can be fused with jne but or can't. Since in the
first assembly listing lea is independent from cmovb, the critical path
is cmp-cmovb-cmp/jne, which is three cycles latency on almost anything.
In the second, the critical path is mul-mov-or-jne, which is six cycles
(multiplication takes three), or five if the move gets eliminated.
Marshall
On Mon, Jul 29, 2019 at 07:50:48AM -0400, Henry Rich wrote:
I'm glad I posted this to the Forum, because I had been thinking in terms of
standard C, and not about extensions.
The best so far is Gilles's idea, assuming you can get to the entire
product:
take a*b; if((high_product|low_product)==0)stmt;
2 instructions+1 branch. Gonna be hard to beat that.
Once you allow other instructions, there are many solutions. Just simply
writing
if(!((a==0) & (b==0)))stmt
will generate 5 instructions+1 branch, using the sete instruction.
For the record, my solution was ((a&b)|(-a&-b))==0
Henry Rich
On 7/29/2019 1:21 AM, Louis de Forcrand wrote:
I am missing something:
a == 0 iff popcnt(a) == 0, so indeed
a == 0 || b == 0
is equivalent to
popcnt(a) == 0 || popcnt(b) == 0
which can be written as
!(popcnt(a) && popcnt(b)),
but if this is translated into a single-branch sequence of machine
instructions, then wouldn’t
!(a && b)
be as well?
The solution I was thinking of by counting leading zero bits is (with LZCNT said
instruction, | bitwise or, ~ bitwise not, >> logical shift right, and 32 = 2^5
bit integers):
LZCNT(a) >> 5 | LZCNT(b) >> 5
Using POPCNT in the same way is also possible:
POPCNT(~a) >> 5 | POPCNT(~b) >> 5
Cheers,
Louis
On 28 Jul 2019, at 23:49, Roger Hui <[email protected]> wrote:
If you have thought about this for a year then the following can't be the
answer, but here it is:
There is either a popcnt instruction, or you can implement it with
straight-line code without branching. Whence:
if !(popcnt(a)&&popcnt(b)) stmt;
On Sun, Jul 28, 2019 at 5:51 PM Henry Rich <[email protected]> wrote:
Many of the members of this Forum will remember the days of assembler
language
...and who could less than merry be
when writing out BXLE?
AND, OR, XOR were our meat. Those days are returning.
You have two integer variables a and b and you want to do something if
one or both of them are 0. In C, you might write
if(a==0 || b==0)stmt;
but that will generate the code
cmp a,0
bz stmt
cmp b,0
bnz notstmt
stmt:
...
notstmt:
Here's the problem: your machine is very slow at branch instructions.
Each branch takes 30 times as long as a regular instruction. (I am
describing a state-of-the-art Intel CPU when it cannot predict the
branches effectively, perhaps because the data is... unpredictable).
Obviously, you want to use only one branch instruction. You may use as
many arithmetic and logic instructions as you like, but only one
branch. The usual condition codes, ZNCV, are available. How tight can
you make the code?
Example: suppose the problem were to execute stmt if one or both of a
and b is NOT zero. Then you would simply write
or a,b
bnz notstmt
...
Checking for zero seems to be harder.
No hurry. I have pondered this problem for over a year, and just today
I found a solution I consider acceptable.
Henry Rich
---
This email has been checked for viruses by AVG.
https://www.avg.com
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm