[Open-graphics] Re: float25 multiplier

André Pouliot Sun, 29 Jul 2007 18:06:19 -0700

Timothy Normand Miller wrote:
> On 7/25/07, André Pouliot <[EMAIL PROTECTED]> wrote:
>   
>> Hello,
>>
>> Here is the first version of the float25 multiplier. The floats are
>> based on the IEEE-754 specification but with a reduced mantissa to fit
>> the hardware on the Spartan-3. It still doesn't have a test bench; I
>> still need to learn how to write one in Verilog and install a simulator.
>> But it does pass synthesis for a Spartan-3. The resources used are 1
>> multiplier, 110 flip-flops, and 47 LUTs. The post-synthesis results are
>> what I expected for the logic.
>>     
>
> As for simulators, I suggest Icarus because it's free.
>
> Now, do you need help with writing a test bench in terms of knowing
> how to write behavioral Verilog?  Or do you need suggestions on how to
> come up with test numbers to input?
>
> For the former, I suspect we have a few examples checked into our SVN.
>  Otherwise, I can give you something to get started with.  I think the
> test environments for the PCI controller and the memory controller are
> in SVN, and you should be able to use those to figure out how to set up
> clocks and stuff.
>
> If it's the latter, I would suggest writing a C program to output
> Verilog code.  To start with, I'd write a task in Verilog that took
> the inputs and output (coded in hex or whatever).  Have the task set
> the inputs to the multiplier and then wait the pipeline length and
> then test the output of the multiplier against what you gave it.  So
> your test code would look vaguely like this:
>   
Where I need more help with the test bench is the number generator.
My C is not really functional except for reading code. As for the Verilog,
I am still learning how to write it, and I expected to write a TB for it
soon. What you wrote as a TB looks good; if it's OK with you, I will use it.
Thanks for the comments on my code. I still need to learn a lot about
Verilog.


As for how IEEE floats work, as I understand it, unnormalized numbers
occur when the exponent reaches its smallest possible value. All values
with that exponent are represented as 0.mantissa. The mantissa can be
any number: it could even have a value of 2^-23, with all the higher
mantissa bits zero, and the value would still be represented as
0.mantissa. The reason, as I see it, must be that with the smallest
exponent you want to represent values near zero, so the 1.mantissa
notation doesn't make sense in that case.
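
A minimal Python sketch of that decoding rule may help. It assumes the float25 layout used in this thread (1 sign bit, 8 exponent bits, 16 mantissa bits) and a bias of 127, and it follows the IEEE-754 convention that an unnormalized number keeps the smallest exponent, 1 - bias, with a leading 0 instead of a leading 1. The name decode_float25 is made up for illustration; with a 16-bit mantissa the smallest step is 2^-16 rather than 2^-23.

```python
BIAS = 127       # assumed exponent bias, same as IEEE-754 single precision
MANT_BITS = 16   # float25 mantissa width


def decode_float25(bits):
    """Return the value of a 25-bit pattern under IEEE-754-style rules."""
    sign = -1.0 if (bits >> 24) & 1 else 1.0
    exp = (bits >> 16) & 0xFF
    mant = bits & 0xFFFF
    if exp == 0xFF:
        # all-ones exponent: infinity (mantissa 0) or NaN (mantissa nonzero)
        return sign * float("inf") if mant == 0 else float("nan")
    if exp == 0:
        # unnormalized: 0.mantissa, exponent pinned at the minimum (1 - BIAS)
        return sign * (mant / 2**MANT_BITS) * 2.0 ** (1 - BIAS)
    # normalized: 1.mantissa
    return sign * (1 + mant / 2**MANT_BITS) * 2.0 ** (exp - BIAS)


print(decode_float25(0x7F8000))                # 1.5 (normalized, 1.mantissa)
print(decode_float25(0x000001) == 2.0**-142)   # True (smallest unnormalized value)
```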

If we want to save resources, we could assume all values are normalized,
even numbers that should be unnormalized. We could also assume that when
the exponent is zero the value is always zero. That would reduce the
dynamic range, but it would simplify a lot of the code and make corner
cases easier to handle. Or we could stick to the IEEE spec and add the
logic to do so. That would take at least 1 or 2 more stages and at least
280 more LUTs (a big mux for selecting the value to output as the
mantissa; in an ASIC the same function would be a barrel shifter, which,
custom made, would be rather small and fast).
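
As a rough illustration of the "exponent zero means zero" option, the input side could behave like the Python sketch below. It assumes the float25 layout from this thread (sign in bit 24, exponent in bits 23:16), and flush_denormal_input is a hypothetical name, not something in the posted code.

```python
def flush_denormal_input(bits):
    """Treat any float25 pattern with a zero exponent field as a signed zero."""
    if ((bits >> 16) & 0xFF) == 0:
        return bits & (1 << 24)   # keep only the sign bit, drop the mantissa
    return bits                   # normalized inputs pass through unchanged


print(hex(flush_denormal_input(0x008000)))   # 0x0
print(hex(flush_denormal_input(0x7F8000)))   # 0x7f8000
```

In hardware this amounts to gating the mantissa with the OR of the exponent bits, which is far cheaper than the shifter needed for full unnormalized support.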

Actually, with the logic that I wrote, if we multiply an unnormalized
number by a normalized number, we could end up with an exponent in the
normalized range but a mantissa in the 0.mantissa format. The faulty
result wouldn't be detected, and it would introduce an error in the
value, since the leading "1." or "0." is inferred from the exponent.
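
The problem can be reproduced with a bit-level software model. The Python sketch below mirrors the four stages of the posted module as I read them; float25_mult_model is a hypothetical name, and the model is only an approximation of the RTL for experimenting, not a replacement for simulation.

```python
def float25_mult_model(a, b):
    """Bit-level mirror of the four posted stages (a and b are 25-bit ints)."""
    # Stage 1: split the fields; the hidden bit is 1 only if the exponent is nonzero
    sign = ((a >> 24) ^ (b >> 24)) & 1
    exp_a, exp_b = (a >> 16) & 0xFF, (b >> 16) & 0xFF
    man_a = (int(exp_a != 0) << 16) | (a & 0xFFFF)
    man_b = (int(exp_b != 0) << 16) | (b & 0xFFFF)

    # Stage 2: add the exponents, multiply the 17-bit mantissas
    exp_sum = exp_a + exp_b
    product = man_a * man_b

    # Stage 3: pick the mantissa slice and remove the bias
    if (product >> 33) & 1:
        exp, man = exp_sum - 126, (product >> 17) & 0xFFFF
    else:
        exp, man = exp_sum - 127, (product >> 16) & 0xFFFF

    # Stage 4: flush to zero or saturate to infinity
    if exp <= 0:
        exp, man = 0, 0
    elif exp >= 255:
        exp, man = 0xFF, 0
    return (sign << 24) | (exp << 16) | man


# 1.5 * 1.5 with both operands normalized
print(hex(float25_mult_model(0x7F8000, 0x7F8000)))  # 0x802000, i.e. 2.25

# 2^-127 (unnormalized, mantissa 0x8000) times 2^127 should give 1.0 (0x7f0000)
print(hex(float25_mult_model(0x008000, 0xFE0000)))  # 0x7f8000, which reads as 1.5
```

In the second case the exponent lands in the normalized range (127), so the output is read as 1.mantissa even though the mantissa bits were computed as 0.mantissa.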
> initial begin
>    // do reset or whatever
>    // ...
>
>    test_mult('h42987, 'hab76346, 'h3697863);
>    test_mult('hbfe63547, 'h48957348, 'h23476248);
>    // ... more generated code...
> end
>
> task test_mult;
> input [24:0] ina, inb, outc;
> begin
>     mult_input_a = ina;
>     mult_input_b = inb;
>     pe; pe; pe; pe;
>     if (mult_output != outc) begin
>         $display("... something about a mismatch ...");
>     end
> end
> endtask
>
> task pe;
> begin
>     @(posedge clock);
> end
> endtask
>
>
> (Note that my numbers are bogus.)
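
A generator for real test numbers could look roughly like the sketch below. It is written in Python rather than C purely for brevity, and everything in it is an assumption for illustration: the function names, the bias of 127, the restriction to normalized operands, and the expected behaviour (truncate the mantissa, flush underflow to zero, saturate overflow to infinity). The expected value is computed with exact fractions instead of copying the RTL structure, so a mistake in the bit slicing would show up as a mismatch.

```python
import random
from fractions import Fraction

BIAS = 127       # assumed exponent bias
MANT_BITS = 16   # float25 mantissa width


def to_fraction(bits):
    """Exact magnitude of a normalized float25 pattern (exponent field 1..254)."""
    exp = (bits >> 16) & 0xFF
    mant = bits & 0xFFFF
    significand = Fraction((1 << MANT_BITS) | mant, 1 << MANT_BITS)
    return significand * Fraction(2) ** (exp - BIAS)


def expected_product(a, b):
    """Expected result: truncated mantissa, underflow to zero, overflow to infinity."""
    sign = ((a >> 24) ^ (b >> 24)) & 1
    mag = to_fraction(a) * to_fraction(b)
    # floor(log2(mag)), derived from the bit lengths of numerator and denominator
    e = mag.numerator.bit_length() - mag.denominator.bit_length()
    if Fraction(2) ** e > mag:
        e -= 1
    biased = e + BIAS
    if biased <= 0:
        return sign << 24
    if biased >= 255:
        return (sign << 24) | (0xFF << 16)
    frac = mag / Fraction(2) ** e - 1        # fractional part, in [0, 1)
    mant = int(frac * (1 << MANT_BITS))      # int() truncates toward zero
    return (sign << 24) | (biased << 16) | mant


def random_normalized(rng):
    """Random float25 pattern with an exponent field between 1 and 254."""
    return (rng.getrandbits(1) << 24) | (rng.randint(1, 254) << 16) | rng.getrandbits(16)


def generate_calls(count, seed=0):
    """Return `count` lines of Verilog calling the test_mult task."""
    rng = random.Random(seed)
    lines = []
    for _ in range(count):
        a, b = random_normalized(rng), random_normalized(rng)
        lines.append("test_mult(25'h%07x, 25'h%07x, 25'h%07x);"
                     % (a, b, expected_product(a, b)))
    return lines


if __name__ == "__main__":
    print("\n".join(generate_calls(5)))
```

The printed lines can be pasted into the initial block in place of the bogus numbers. Hand-picked corner cases (largest and smallest exponents, all-ones mantissas) are worth adding next to the random ones.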
>
>
>
>   
>> It's a 4-stage multiplier. The inputs aren't latched before the bit
>> manipulation begins; the assumption is that the previous module will
>> latch its output data.
>>     
>
> This is common practice.
>
>   
>> The first stage checks whether the mantissa value is normalized or
>> not by testing the exponent. The sign bit is also calculated in that
>> stage, and the incoming signals are split into their component
>> fields.
>>
>> The second stage is where the actual calculation takes place: addition
>> of the exponents and multiplication of the mantissas.
>>
>> The third stage normalizes the result depending on the mantissa
>> product. It selects which part of the mantissa to keep and corrects
>> the exponent field, since there is an offset in the exponent to
>> compensate for.
>>
>> In the fourth stage the value is rounded to 0 or infinity if the
>> exponent falls below 1 or is bigger than 254.
>>
>> The part that could be improved is the fourth stage, with the rounding.
>> There is no support for handling unnormalized numbers except by
>> rounding them to zero. The result of the multiplication can't produce
>> NaN or unnormalized numbers.
>>     
>
> This is what we need for the GPU!
>
> More comments below.
>
>   
>>
>> /*-----------------------------------------------------------------------------
>> File name : float25Mult.v
>> Description : A floating point multiplier based on the IEEE-754 float format:
>> the mantissa is a 16-bit field, the exponent an 8-bit field, plus 1 sign bit.
>> The multiplier produces correct results for normalised values. Denormalised
>> values are also calculated correctly, but the output is not well handled. If
>> the exponent goes below zero, the value is rounded to zero. If the exponent
>> has a value of 255 or more, the result is rounded to infinity.
>>
>> Author : André Pouliot
>> Created : 2007/05/25
>> Modified : 2007/05/25
>> -----------------------------------------------------------------------------*/
>>
>> //module float25 multiplication
>> module floatmult25 (
>> clk,
>> floatA,
>> floatB,
>> floatResult
>> );
>>
>> //Port definition
>> input           clk;
>> input[24:0]     floatA;
>> input[24:0]     floatB;
>> output[24:0]    floatResult;
>>
>> wire            clk;
>> wire[24:0]      floatA;
>> wire[24:0]      floatB;
>> wire[24:0]      floatResult;
>>     
>
> These wires are redundant to the input/output above.
>
>   
>> //internal signal
>> reg             signStg1;
>> reg             normaliseBitA;
>> reg             normaliseBitB;
>> reg[7:0]        exponentAStg1;
>> reg[7:0]        exponentBStg1;
>> reg[15:0]       mantissaAStg1;
>> reg[15:0]       mantissaBStg1;
>>
>> reg             signStg2;
>> reg[8:0]        exponentStg2;
>> reg[33:0]       mantissaStg2;
>>
>> reg             signStg3;
>> reg[9:0]        exponentStg3;
>> reg[15:0]       mantissaStg3;
>>
>> reg             signStg4;
>> reg[7:0]        exponentStg4;
>> reg[15:0]       mantissaStg4;
>>
>> //---------------------
>> //Begin logic
>> //---------------------
>>
>> //First stage: evaluate whether each value is normalised and split
>> //the inputs into independent fields
>> always @(posedge clk)
>> begin : Stage1
>>   signStg1 <= floatA[24]^floatB[24];
>>   exponentAStg1 <= floatA[23:16];
>>   exponentBStg1 <= floatB[23:16];
>>   mantissaAStg1 <= floatA[15:0];
>>   mantissaBStg1 <= floatB[15:0];
>>   normaliseBitA <= |floatA[23:16];
>>   normaliseBitB <= |floatB[23:16];
>> end
>>
>> //second stage multiplication and addition of the mantissa and exponent
>>
>> always @(posedge clk)
>> begin : Stage2
>>   signStg2 <= signStg1;
>>   exponentStg2 <= exponentAStg1 + exponentBStg1;
>>   mantissaStg2 <= {normaliseBitA,mantissaAStg1}*{normaliseBitB,mantissaBStg1};
>> end
>>     
>
> At some point, I had gotten confused by the IEEE spec.  I know that
> normalized represents 1.mantissa, but is unnormalized 0.mantissa or
> (0.mantissa<<1) ?
>
>   
>> //Stage 3 mantissa select for reforming the data for next stage
>> //and exponent adjust depending on mantissa result.
>> always @(posedge clk)
>> begin : Stage3
>>   signStg3 <= signStg2;
>>   if (mantissaStg2[33]) begin
>>     exponentStg3 <= exponentStg2 - 126;
>>     mantissaStg3 <= mantissaStg2[32:17];
>>   end else begin
>>     exponentStg3 <= exponentStg2 - 127;
>>     mantissaStg3 <= mantissaStg2[31:16];
>>   end
>> end
>>     
>
> I decided to work this out for myself, forgetting unnormalized.  Once
> you add the 1 to the number, the largest operand you can get is 1FFFF.
>  The smallest is 10000.  So, the largest product is 0x3FFFC0001, which
> is 34 bits, and the smallest is 0x100000000, which is 33 bits.  So it
> looks like you have it right!
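
The range argument is easy to check numerically. This small Python snippet only redoes the arithmetic quoted above; nothing in it comes from the posted Verilog.

```python
smallest = 0x10000 * 0x10000   # both operands 1.0000 (hidden bit set, mantissa 0)
largest = 0x1FFFF * 0x1FFFF    # both operands 1.FFFF (hidden bit set, mantissa all ones)

print(hex(smallest), smallest.bit_length())   # 0x100000000 33
print(hex(largest), largest.bit_length())     # 0x3fffc0001 34
```

So the product always has its leading 1 in bit 32 or bit 33, which is why testing bit 33 is enough to choose between the two mantissa slices when both inputs are normalized.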
>
> I think perhaps you do more with unnormalized numbers than you need
> to.  Have you considered treating them all as zero in input?  You
> might eliminate some logic.
>
>   
>> //Stage 4 Rounding to zero or infinite before output.
>> always @(posedge clk)
>> begin : Stage4
>>   signStg4 <= signStg3;
>>   if (exponentStg3[9] || exponentStg3 == 0) begin // if negative or zero, round to zero
>>     exponentStg4 <= 8'h00;
>>     mantissaStg4 <= 16'h0000;
>>   end else if(exponentStg3[8] || exponentStg3 == 255) begin
>>     exponentStg4 <= 8'hFF;
>>     mantissaStg4 <= 16'h0000;
>>   end else begin
>>     exponentStg4 <= exponentStg3;
>>     mantissaStg4 <= mantissaStg3;
>>   end
>> end
>>
>> assign floatResult[24] = signStg4;
>> assign floatResult[23:16] = exponentStg4;
>> assign floatResult[15:0] = mantissaStg4;
>>
>> endmodule
>>
>>     
>
>
>   

