On Sun, Jun 24, 2012 at 4:56 PM, <[email protected]> wrote:
> Hi !
>
> On Sun, 24 Jun 2012 15:14:52 -0400, Timothy Normand Miller wrote:
>>
>> On Sun, Jun 24, 2012 at 2:50 PM, <[email protected]> wrote:
>>>
>>> The real question is: how often have you used or needed to detect
>>> overflow?
>>
>> No idea. :) I'm taking a conservative approach to things I don't
>> know enough about. Probably 20 years ago, I worked out the use of the
>> overflow bit in comparisons, so my memory isn't fresh.
>
> then take a few steps back, look at actual code and other recent
> architectures?
Yeah, I should do that. Actually, taking a look at the architectures isn't so hard. What would be more difficult is figuring out what the compilers emit to get certain effects, unless I went through the trouble of building a gcc cross-compiler. I know that MIPS has several specialized compare instructions for this, on top of subtract, but MIPS doesn't have condition codes. All of my recent research in computer architecture has been in energy efficiency and reliability.

>>> I suppose you need it because of conditional instructions that test
>>> less than or equal, and that sort of thing.
>>> You can get rid of the overflow bit by creating "compare" instructions
>>> that specify whether the operands are signed or not.
>>> Hint: the only difference is to XOR the MSB of the operands, and pass them
>>> through a subtract unit. The comparison result will end up in the carry
>>> bit.
>>
>> Yes, and I have considered that option. Moreover, I specifically want
>> to test that option.
>
> then just do it :-)
>
>> What I'm doing right NOW is working on the
>> option where there are complex predicates and compare instructions
>> that don't know the condition.
>
> I'm not sure I understand.
> Can you provide an example?

Option A: Simple compare instructions, multiple condition flags, and a complex condition code on every instruction.
Option B: Multiple specialized compare instructions, few condition flags, and a simple condition code on every instruction.
Option C: No predication.

>> Next, I'll include an alternative
>> option where the compare instructions know what the condition is and
>> therefore the predicates are simpler.
>
> That's cool if it reduces the complexity of the predication
> logic, hence the predication size and object code size :-)
>
>> Note that when I say predicates, I mean like ARM, not Itanium.
>
> Whatever you choose.
> As long as it is self-coherent :-)
>
>>> About the optimised add/sub macro that is faster:
>>> - it is pointless since you have designed a pipeline where the
>>> integer part has enormous slack.
>>
>> Pointless for correctness, extremely indirect for speed, not pointless
>> for area.
>
> area matters, I agree.
>
>> P = area of add(A,B) or sub(A,B)
>> Q = area of sub(0,B)
>> R = area of addsub(A,B,selector)
>>
>> If (R > P+Q), then we change the sign separately. Otherwise we use
>> the addsub.
>
> I'm not sure I understand, but I'm convinced that you can do everything
> with only one add/sub unit.

Yes.

>> That accounts not just for area but also leakage power. There are
>> also separate dynamic power considerations. Indeed, for dynamic power
>> considerations, we may argue for separate adder and subtractor paths,
>
> yet there is almost no difference, structurally, between add and sub,
> so I'm still puzzled.

Yes, that's true, except that to use an adder to do a subtraction, you have to change a sign, whereas you can instead build a specialized subtractor. Depending on how much more complex an addsub is than a subtractor or an adder, there may be an interesting dynamic power versus area tradeoff.

>> because now we can clock-gate them separately. If they're combined,
>> then a larger circuit is on for every add or subtract.
>
> it is only marginally larger.

I haven't actually examined one, so I decided to work out the circuit for a single-bit addsub. There are four inputs: A, B, C-in, and Sub, and two outputs: P (the sum/difference bit) and Q (the carry out). When subtracting, C-in==0 means borrow.
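In Verilog terms (untested, and the names are just placeholders), I would expect the cell to reduce to a full adder with B conditionally inverted, something like:

  // One bit of an add/sub unit: a full adder with B conditionally inverted.
  // sub=1 selects subtraction; as above, cin==0 means "borrow in" and
  // q==0 means "borrow out" when subtracting.
  module addsub_bit (
      input  wire a,
      input  wire b,
      input  wire cin,
      input  wire sub,
      output wire p,   // sum / difference bit
      output wire q    // carry out
  );
      wire bx = b ^ sub;                       // invert B when subtracting: A - B = A + ~B + 1
      assign p = a ^ bx ^ cin;                 // same sum logic as a plain full adder
      assign q = (a & bx) | (cin & (a ^ bx));  // same carry logic as a plain full adder
  endmodule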
Anyhow, I worked out the Karnaugh map, and it seems to me to be maybe twice as complex as a regular full adder. But I didn't work out the full adder again, so I can't be absolutely sure. Also, I probably made a mistake because I was in a rush anyway. In an FPGA, the full addsub will have nearly the same complexity as an adder, but in an ASIC, it will be relatively more complex. (This is because a plain adder doesn't make maximal use of the CLBs in an FPGA, so the extra logic fits in the same cells.)

>> If they're
>> separate, then there are two smaller circuits that are mutually
>> exclusive, albeit with larger total area and leakage power.
>
> I'm worried by the MUX at the end, too.

Yes, but I would combine that with an already-existing MUX.

>>> - The row of XORs will be optimised away.
>>
>> It's not a row of XORs. A 1's complement is a row of XORs, while a
>> 2's complement has its own carry chain.
>
> you're not doing it smart enough then?
> I remember that it's quite simple actually.
>
>> Also, (A-B) is the same as (A+(~B)+1), so if you can have a carry-in,
>> you can do this with just a row of XORs and some multiplexing.
>
> multiplexing what? Just XOR the right bits :-)
>
>> But now we're running into tool limitations. The HDL synthesizers
>> I've encountered do a crap job of optimizing (A+B+1), where they end
>> up with two adders, rather than one adder with carry-in. As a result,
>> I end up doing (A+B+1) as (A-(~B)).
>
> ok, here is the key trick, but it's in VHDL.
>
> Don't write two adds if you don't want them.
> Instead, extend the width of the source vectors by one LSB,
> set one LSB to 1 and make the other the conditional carry-in.
> Then, for the output, just drop the added LSB.
> The synthesizer will quietly optimise this pseudo-carry-in.

*smacks forehead* After all these years, I never thought of that, although it seems obvious now. Sheesh. Is that a consequence of being daft, or just of being self-taught? :) Well, anyhow, I FEEL dumb now. :)

> Similarly, you can get the carry out by appending a cleared MSB to both
> operands.

That one I knew. :) Although, interestingly, the Verilog tools are happy to take two 8-bit addends and produce a 9-bit result with carry.

> Here is some code extracted from http://yasep.org/VHDL/microYASEP.vhd
>
>  Addsub <= '1' when   -- this one controls the "carry in"
>        opcode_defined(Op_SUB ,opcode,int_opcode) or
>        opcode_defined(Op_CMPS,opcode,int_opcode) or
>        opcode_defined(Op_CMPU,opcode,int_opcode)
>     else '0';
>
> --- some more code that XORs ActualA and ActualB
>
>  -- perform ADD and SUB with just one adder and carry in :
>  sumAux <= unsigned('0' & ActualA & '1')
>          + unsigned('0' & ActualB & Addsub);
>
>  -- extract the actual result and the carry out
>  Carry_out <= std_logic(sumAux(YASEP_SIZE+1));
>  ASU <= std_logic_vector(sumAux(YASEP_SIZE downto 1));
>
> I see no need to multiplex adder and subtractor results :-)

So what you're saying is that with a trick in the low bit and a 1's complement, I can do an addsub? I'll have to tinker with that.

> For the signed/unsigned comparison, have a deeper look at the code
> that computes compare_signed, ActualB and ActualA. It's just XORs,
> as promised.

I'll have to spend some time reading that code. I'm not so good at reading VHDL, though.
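Let me at least try to transcribe the adder part into Verilog, to make sure I understand the trick. This is untested, and the parameter and port names are mine, not the ones from microYASEP.vhd:

  // One adder covering ADD and SUB (and the compares), using the
  // width-extension trick: a dummy '1' LSB on one operand and the
  // conditional carry-in as the LSB of the other, plus a '0' MSB on
  // both so the same adder also yields the carry out.
  module addsub #(parameter WIDTH = 32) (
      input  wire [WIDTH-1:0] a,
      input  wire [WIDTH-1:0] b_xored,   // B, already conditionally inverted for SUB/CMP
      input  wire             addsub_c,  // conditional carry-in: 1 for SUB/CMP, 0 for ADD
      output wire [WIDTH-1:0] result,
      output wire             carry_out
  );
      wire [WIDTH+1:0] sum_aux = {1'b0, a, 1'b1} + {1'b0, b_xored, addsub_c};

      assign result    = sum_aux[WIDTH:1];   // drop the dummy LSB
      assign carry_out = sum_aux[WIDTH+1];
  endmodule

If I have that right, the synthesizer sees a single (WIDTH+2)-bit addition and nothing else, which is exactly the carry-in I was fighting the tools over, and there is no result MUX anywhere.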
>>> - with no "overflow" to compute, no need to worry about it, anyway :-)
>>
>> I have to put some more thought into the implications of this. I'm
>> not a compilers person, so I don't know what the tradeoffs are with
>> regard to signed vs. unsigned comparisons. I just know what lots of
>> different CPUs provide. Obviously, we should evaluate this issue, but
>> I need input from someone who is an expert on compilers.
>
> then it depends on the compiler you want to use.
>
>>> "Overflow" is a legacy of an era that is long gone, so don't bother with
>>> it.
>>
>> I would like to know more about this. Can you provide a longer
>> explanation? With an overflow bit, (A<B) is a single instruction
>> change, depending on whether A and B are signed or not. Without an
>> overflow bit, how is this handled?
>
> Without overflow, the idea is
> (for unsigned operands):
>
>   CMPU operand1, operand2
>   if carry, DO something
>
> for signed operands, just use CMPS instead
> (it XORs the MSB of both operands)

XORs them with what? Anyhow, I assume I'll discover that in the code for that CPU.

> for less than or equal, instead of less than,
> just swap operands and check for "not carry".
>
> At least, that's how I remember these methods.
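Actually, writing it out for myself, I think the answer to my own "XORs them with what?" question is "with the signed/unsigned select": flip the MSB of both operands for a signed compare, which maps the signed range onto the unsigned range without changing the ordering. A rough, untested sketch (names are mine, not from the yasep code):

  // Unsigned or signed less-than from one subtraction, no overflow flag.
  // signed_cmp = 1 behaves like CMPS, 0 like CMPU.
  module compare_no_overflow (
      input  wire [31:0] a,
      input  wire [31:0] b,
      input  wire        signed_cmp,
      output wire        less_than      // a < b under the selected interpretation
  );
      // Inverting the MSBs biases signed values into unsigned order.
      wire [31:0] a_adj = {a[31] ^ signed_cmp, a[30:0]};
      wire [31:0] b_adj = {b[31] ^ signed_cmp, b[30:0]};

      // a_adj - b_adj computed as a_adj + ~b_adj + 1, reusing the LSB trick
      // above so there is only one adder and a real carry out.
      wire [33:0] diff = {1'b0, a_adj, 1'b1} + {1'b0, ~b_adj, 1'b1};

      // No carry out means a borrow happened, i.e. a < b.
      assign less_than = ~diff[33];
  endmodule

And if I follow the last part, less-than-or-equal is just the complement with the operands swapped: (a <= b) == !(b < a).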
>>> Hope it helped you,
>>
>> Thanks!
>
> You're welcome :-)

--
Timothy Normand Miller
http://www.cse.ohio-state.edu/~millerti
Open Graphics Project