Two instructions but one of them's slow. I found the test
a+b == (a>b?a:b) on unsigned integers to be slightly faster in a loop
with independent iterations and significantly faster when a loop
dependency is introduced. However, a solution with two branches was
usually faster than either.
I loaded integers to test from an array. Compiling a+b == (a>b?a:b) with
gcc -O3 produces:
mov rax,QWORD PTR [rdx]
mov rcx,QWORD PTR [rdx+0x8]
cmp rax,rcx
lea r8,[rax+rcx*1]
cmovb rax,rcx
cmp r8,rax
jne ...
And !((__int128)a * (__int128)b) gives:
mov rax,QWORD PTR [rcx+0x8]
mul QWORD PTR [rcx]
mov r9,rdx
or r9,rax
jne ...
To my knowledge cmp can be fused with jne but or can't. Since in the
first assembly listing lea is independent from cmovb, the critical path
is cmp-cmovb-cmp/jne, which is three cycles latency on almost anything.
In the second, the critical path is mul-mov-or-jne, which is six cycles
(multiplication takes three), or five if the move gets eliminated.
Marshall
On Mon, Jul 29, 2019 at 07:50:48AM -0400, Henry Rich wrote:
> I'm glad I posted this to the Forum, because I had been thinking in terms of
> standard C, and not about extensions.
>
> The best so far is Gilles's idea, assuming you can get to the entire
> product:
>
> take a*b; if((high_product|low_product)==0)stmt;
>
> 2 instructions+1 branch. Gonna be hard to beat that.
>
> Once you allow other instructions, there are many solutions. Just simply
> writing
>
> if(!((a==0) & (b==0)))stmt
>
> will generate 5 instructions+1 branch, using the sete instruction.
>
> For the record, my solution was ((a&b)|(-a&-b))==0
>
> Henry Rich
>
> On 7/29/2019 1:21 AM, Louis de Forcrand wrote:
> > I am missing something:
> > a == 0 iff popcnt(a) == 0, so indeed
> > a == 0 || b == 0
> > is equivalent to
> > popcnt(a) == 0 || popcnt(b) == 0
> > which can be written as
> > !(popcnt(a) && popcnt(b)),
> > but if this is translated into a single-branch sequence of machine
> > instructions, then wouldn’t
> > !(a && b)
> > be as well?
> >
> > The solution I was thinking of by counting leading zero bits is (with LZCNT
> > said instruction, | bitwise or, ~ bitwise not, >> logical shift right, and
> > 32 = 2^5 bit integers):
> >
> > LZCNT(a) >> 5 | LZCNT(b) >> 5
> >
> > Using POPCNT in the same way is also possible:
> >
> > POPCNT(~a) >> 5 | POPCNT(~b) >> 5
> >
> > Cheers,
> > Louis
> >
> > > On 28 Jul 2019, at 23:49, Roger Hui <[email protected]> wrote:
> > >
> > > If you have thought about this for a year then the following can't be the
> > > answer, but here it is:
> > >
> > > There is either a popcnt instruction, or you can implement it with
> > > straight-line code without branching. Whence:
> > > if !(popcnt(a)&&popcnt(b)) stmt;
> > >
> > >
> > >
> > > > On Sun, Jul 28, 2019 at 5:51 PM Henry Rich <[email protected]> wrote:
> > > >
> > > > Many of the members of this Forum will remember the days of assembler
> > > > language
> > > >
> > > > ...and who could less than merry be
> > > > when writing out BXLE?
> > > >
> > > > AND, OR, XOR were our meat. Those days are returning.
> > > >
> > > > You have two integer variables a and b and you want to do something if
> > > > one or both of them are 0. In C, you might write
> > > >
> > > > if(a==0 || b==0)stmt;
> > > >
> > > > but that will generate the code
> > > >
> > > > cmp a,0
> > > > bz stmt
> > > > cmp b,0
> > > > bnz notstmt
> > > > stmt:
> > > > ...
> > > > notstmt:
> > > >
> > > > Here's the problem: your machine is very slow at branch instructions.
> > > > Each branch takes 30 times as long as a regular instruction. (I am
> > > > describing a state-of-the-art Intel CPU when it cannot predict the
> > > > branches effectively, perhaps because the data is... unpredictable).
> > > >
> > > > Obviously, you want to use only one branch instruction. You may use as
> > > > many arithmetic and logic instructions as you like, but only one
> > > > branch. The usual condition codes, ZNCV, are available. How tight can
> > > > you make the code?
> > > >
> > > > Example: suppose the problem were to execute stmt if one or both of a
> > > > and b is NOT zero. Then you would simply write
> > > >
> > > > or a,b
> > > > bnz notstmt
> > > > ...
> > > >
> > > > Checking for zero seems to be harder.
> > > >
> > > > No hurry. I have pondered this problem for over a year, and just today
> > > > I found a solution I consider acceptable.
> > > >
> > > > Henry Rich
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ---
> > > > This email has been checked for viruses by AVG.
> > > > https://www.avg.com
> > > >
> > > > ----------------------------------------------------------------------
> > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > > >
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm