Hi !
On Sun, 24 Jun 2012 15:14:52 -0400, Timothy Normand Miller wrote:
On Sun, Jun 24, 2012 at 2:50 PM, <[email protected]> wrote:
The real question is : how often have you used or needed to detect
overflow ?
No idea. :) I'm taking a conservative approach to things I don't
know enough about. Probably 20 years ago, I worked out the use of
the overflow bit in comparisons, so my memory isn't fresh.
then take a few steps back, look at actual code and other recent
architectures ?
I suppose you need it because of conditional instructions that test
"less than or equal" and that kind of thing.
You can get rid of the overflow bit by creating "compare" instructions
that specify if the operands are signed or not.
Hint: the only difference is to XOR the MSB of the operands, and pass
them through a subtract unit. The comparison result will end up in the
carry bit.
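To make the hint concrete, here is a small bit-level model in Python (not HDL), as a sketch only. The 16-bit width and the function names are assumptions for illustration, and whether the flag reads as "borrow" or as an inverted carry depends on the convention of the actual hardware.

```python
WIDTH = 16                    # assumed word size, for illustration only
MASK = (1 << WIDTH) - 1
MSB = 1 << (WIDTH - 1)

def borrow_of_sub(a, b):
    """Return 1 when the unsigned subtraction a - b borrows, i.e. a < b."""
    return 1 if (a & MASK) - (b & MASK) < 0 else 0

def less_than(a, b, signed):
    """Compare two WIDTH-bit patterns; for signed, flip both MSBs first."""
    if signed:
        a ^= MSB
        b ^= MSB
    return borrow_of_sub(a, b) == 1

def to_signed(x):
    """Interpret a WIDTH-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << WIDTH) if x & MSB else x
```

Flipping the MSB maps the signed range onto the unsigned range in the same order, which is why a single unsigned subtract is enough for both kinds of comparison.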
Yes, and I have considered that option. Moreover, I specifically want
to test that option.
then just do it :-)
What I'm doing right NOW is working on the
option where there are complex predicates and compare instructions
that don't know the condition.
I'm not sure I understand.
Can you provide an example ?
Next, I'll include an alternative
option where the compare instructions know what the condition is and
therefore the predicates are simpler.
That's cool if it reduces the complexity of the predication
logic, hence the predication size and object code size :-)
Note that when I say predicates, I mean like ARM, not Itanium.
Whatever you choose.
As long as it is self-coherent :-)
About the optimised add/sub macro that is faster :
- it is pointless since you have designed a pipeline where the
integer part has enormous slack.
Pointless for correctness, extremely indirect for speed, not pointless
for area.
area matters, I agree.
P = area of add(A,B) or sub(A,B)
Q = area of sub(0,B)
R = area of addsub(A,B,selector)
If (R > P+Q), then we change the sign separately. Otherwise we use
the addsub.
I'm not sure I understand, but I'm convinced that you can do everything
with only one add/sub unit.
That accounts not just for area but also leakage power. There are
also separate dynamic power considerations. Indeed, for dynamic power
considerations, we may argue for separate adder and subtractor paths,
yet there is almost no difference, structurally, between add and sub,
so I'm still puzzled.
because now we can clock-gate them separately. If they're combined,
then a larger circuit is on for every add or subtract.
it is only marginally larger.
If they're
separate, then there are two smaller circuits that are mutually
exclusive, albeit with larger total area and leakage power.
I'm worried by the MUX at the end, too.
- The row of XORs will be optimised away.
It's not a row of XORs. A 1's complement is a row of XORs, while a
2's complement has its own carry chain.
you're not doing it smart enough then ?
I remember that it's quite simple actually.
Also, (A-B) is the same as (A+(~B)+1), so if you can have a carry-in,
you can do this with just a row of XORs and some multiplexing.
multiplexing what ? Just XOR the right bits :-)
But now we're running into tools limitations. The HDL synthesizers
I've encountered do a crap job of optimizing (A+B+1), where they end
up with two adders, rather than one adder with carry-in. As a result,
I end up doing (A+B+1) as (A-(~B)).
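Both identities mentioned here are easy to check exhaustively on a small word. The following is only a sanity check in Python, assuming an 8-bit width; the helper names are made up for the example.

```python
WIDTH = 8                     # assumed width, small enough to test every pair
MASK = (1 << WIDTH) - 1

def inv(x):
    """Bitwise NOT restricted to WIDTH bits (the 'row of XORs')."""
    return ~x & MASK

def sub_via_add(a, b):
    """A - B computed as A + (~B) + 1, truncated to WIDTH bits."""
    return (a + inv(b) + 1) & MASK

def add_plus_one_via_sub(a, b):
    """A + B + 1 computed as A - (~B), truncated to WIDTH bits."""
    return (a - inv(b)) & MASK
```

The second form is the same identity read backwards: subtracting ~B adds B plus one.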
ok, here is the key trick, but it's in VHDL.
Don't write two adds if you don't want them.
Instead, extend the width of the source vectors by one LSB,
set one LSB to 1 and the other is the conditional carry-in.
Then, for the output, just drop the added LSB.
The synthesizer will quietly optimise this pseudo-carry-in.
Similarly, you can get the carry out by appending a cleared MSB to both
operands.
Here is some code extracted from http://yasep.org/VHDL/microYASEP.vhd
Addsub <= '1' when  -- this one controls the "carry in"
       opcode_defined(Op_SUB ,opcode,int_opcode) or
       opcode_defined(Op_CMPS,opcode,int_opcode) or
       opcode_defined(Op_CMPU,opcode,int_opcode)
   else '0';

--- some more code that XORs ActualA and ActualB

-- perform ADD and SUB with just one adder and carry in :
sumAux <= unsigned('0' & ActualA & '1')
        + unsigned('0' & ActualB & Addsub);

-- extract the actual result and the carry out
Carry_out <= std_logic(sumAux(YASEP_SIZE+1));
ASU <= std_logic_vector(sumAux(YASEP_SIZE downto 1));
I see no need to multiplex the results of an adder and a subtractor :-)
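For readers who do not have a VHDL simulator at hand, the same trick can be modelled in Python. This is a sketch under assumptions: SIZE stands in for YASEP_SIZE, the function name is invented, and the inversion of B when Addsub is set is my reading of the elided XOR code, not a copy of it.

```python
SIZE = 16                     # stands in for YASEP_SIZE (assumed value)
MASK = (1 << SIZE) - 1

def add_sub_unit(a, b, addsub):
    """Model sumAux: one (SIZE+2)-bit addition with a pseudo carry-in LSB."""
    # Assumption: the elided XOR code inverts B when subtracting.
    actual_b = (b ^ MASK) if addsub else (b & MASK)
    left = ((a & MASK) << 1) | 1           # '0' & ActualA & '1'
    right = (actual_b << 1) | addsub       # '0' & ActualB & Addsub
    sum_aux = left + right                 # the only adder
    carry_out = (sum_aux >> (SIZE + 1)) & 1
    result = (sum_aux >> 1) & MASK         # drop the added LSB
    return result, carry_out
```

When addsub is 1 the two appended LSBs add up to a carry into bit 1, which is the "+1" of the two's complement. For example, add_sub_unit(5, 3, 1) gives (2, 1) and add_sub_unit(5, 3, 0) gives (8, 0).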
For the signed/unsigned comparison, have a deeper look at the code
that computes compare_signed, ActualB and ActualA. It's just XORs,
as promised.
- with no "overflow" to compute, no need to worry about it, anyway :-)
I have to put some more thought into the implications of this. I'm
not a compilers person, so I don't know what the tradeoffs are with
regard to signed vs. unsigned comparisons. I just know what lots of
different CPUs provide. Obviously, we should evaluate this issue, but
I need input from someone who is an expert on compilers.
then it depends on the compiler you want to use.
"Overflow" is a legacy of an era that is long gone, so don't bother
with it.
I would like to know more about this. Can you provide a longer
explanation? With an overflow bit, (A<B) is a single instruction
change, depending on whether A and B are signed or not. Without an
overflow bit, how is this handled?
Without overflow, the idea is
(for unsigned operands) :
CMPU operand1, operand2
if carry, DO something
for signed operands, just use CMPS instead
(it xors the MSB of both operands)
for less than or equal, instead of less than,
just swap operands and check for "not carry".
At least, that's how I remember these methods.
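As a rough illustration of how a compiler could derive the four ordering conditions from a single flag, here is a Python model. It is a sketch only: the cmp_carry helper is hypothetical, the 16-bit width is assumed, and "carry set means operand1 < operand2" is a polarity chosen for the example, which real hardware may invert.

```python
WIDTH = 16                    # assumed word size
MASK = (1 << WIDTH) - 1
MSB = 1 << (WIDTH - 1)

def cmp_carry(a, b, signed=False):
    """Hypothetical CMPU/CMPS model: flag is 1 when a < b."""
    if signed:                # CMPS: XOR the MSB of both operands
        a ^= MSB
        b ^= MSB
    # The borrow of the subtraction shows up as bit WIDTH of the difference.
    return (((a & MASK) - (b & MASK)) >> WIDTH) & 1

def lt(a, b, signed=False):
    return cmp_carry(a, b, signed) == 1    # if carry

def le(a, b, signed=False):
    return cmp_carry(b, a, signed) == 0    # swap operands, if not carry

def gt(a, b, signed=False):
    return cmp_carry(b, a, signed) == 1    # swap operands, if carry

def ge(a, b, signed=False):
    return cmp_carry(a, b, signed) == 0    # if not carry
```

So switching between signed and unsigned is still a single instruction change (CMPS instead of CMPU), and the choice of condition only affects operand order and flag polarity.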
Hope it helped you,
Thanks!
You're welcome :-)
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)