In response to the lack of multiplier logic on the XP10, I've considered
a completely serial multiplier.  We already have some good ideas in this
thread, and the radix-4 approach in particular sounds promising, but in
the spirit of investigating alternatives:

The attached multiplier takes two 16-bit operands, one of which is
processed serially.  It can be used in two ways:  Either we run it for
32 cycles and extract the final result serially, or, better, we run it
for 16 cycles and post-process the output with the ALU adder, as shown
in the attached test module.
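
As a rough sketch of the second mode, a small wrapper could sequence the
helper and do the final add.  Everything here except mul16x16ser_helper
is a hypothetical name of mine (mul16x16_seq, go, done, busy, count), and
the plain "+" only stands in for the ALU adder:

```verilog
// Hypothetical wrapper, not part of the attachment: mul16x16_seq, go,
// done, busy and count are illustrative names.
module mul16x16_seq(clock, go, x, y, z, done);

input clock, go;           // go: single-cycle pulse that starts a multiply
input[15:0] x, y;
output[31:0] z;
output done;

reg[31:0] z;
reg done;
reg busy;
reg[3:0] count;            // shift steps still to run

wire[31:0] za;
wire[15:0] zb;

mul16x16ser_helper mul(clock, go, x, y, za, zb);

always @(posedge clock) begin
    if (go) begin
        // The helper loads its operands on this same edge.
        count <= 15;
        busy <= 1;
        done <= 0;
    end else if (busy) begin
        if (count != 0)
            count <= count - 1;       // helper performs one shift step
        else begin
            // All 15 steps are done.  za and zb hold the final values only
            // until this edge, because the helper keeps shifting.
            z <= za + {zb, 16'b0};
            busy <= 0;
            done <= 1;
        end
    end
end

endmodule
```

In this sketch the result is registered one edge after the helper's 16th
cycle, so done rises 17 clock edges after go.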

I think the serial multiplier has some nice properties:  All internal
connections are very local.  Further, it has only one level of LUTs if
we can eliminate the "start" condition, either by using reset logic if
that's allowed, or by making sure the internal state has gone to zero
before issuing a new multiply.  This is also my main question to the
experts:  Given such short and local combinational paths, could it be
possible to run it at twice the clock speed of the CPU?  That could give
us an 8-cycle 16x16->32 multiplier quite cheaply in terms of gate count.

If we choose to post-process the result with the ALU adder, we may as
well make the parallel-processed operand 32 bits wide.  I haven't looked
too closely at this, but I think it's possible to reuse the same logic
for 8x32, 16x32, and 32x32 by sampling the result after 8, 16, or 32
cycles, respectively, and doing simple bit shifts of the partial result
before we feed it to the adder.
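
To make the idea concrete, here is an untested sketch of how the helper
might be widened.  The module name mulser_helper_n, the parameters XW and
YW, the input n and the combined output z_o are all assumptions of mine;
the structure simply follows the 16x16 helper with the widths generalised:

```verilog
// Hypothetical generalisation of the attached helper; names and parameters
// are assumptions, not part of the original attachment.
module mulser_helper_n(clock, start, n, x, y, z_o);

parameter XW = 32;       // width of the parallel operand x
parameter YW = 32;       // maximum width of the serial operand y

input clock, start;
input[5:0] n;            // cycles run when z_o is sampled: 8, 16 or 32
input[XW-1:0] x;
input[YW-1:0] y;
output[XW+YW-1:0] z_o;

reg[XW-1:0] x_r;
reg[YW-2:0] y_r;
reg[XW-2:0] s;
reg[XW-1:0] c;
reg[YW-1:0] v;

integer i;
always @(posedge clock) begin
    if (start) begin
        // Load the operands and the first partial product x * y[0].
        x_r <= x;
        y_r <= y[YW-1:1];
        s <= x[XW-1:1] & {(XW-1){y[0]}};
        c <= 0;
        v <= {x[0] & y[0], {(YW-1){1'b0}}};
    end else begin
        // One carry-save step: add x * y_r[0], shift the sum right by one.
        {c[0], v[YW-1]} <= (x_r[0] & y_r[0]) + c[0] + s[0];
        v[YW-2:0] <= v[YW-1:1];
        for (i = 1; i < XW - 1; i = i + 1)
            {c[i], s[i - 1]} <= (x_r[i] & y_r[0]) + c[i] + s[i];
        {c[XW-1], s[XW-2]} <= (x_r[XW-1] & y_r[0]) + c[XW-1];
        y_r <= y_r >> 1;
    end
end

// After n cycles (the start cycle plus n - 1 steps) the low n product bits
// sit in the top n bits of v and the rest is s + c in carry-save form.
wire[XW+YW-1:0] hi = s + c;
wire[XW+YW-1:0] lo = v >> (YW - n);
assign z_o = (hi << n) | lo;

endmodule
```

With XW = YW = 32, sampling z_o after n = 8, 16 or 32 cycles should give
x * y[n-1:0], since only the low n bits of y have been consumed by then.
The "+" in hi again stands in for the ALU adder.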

module mul16x16ser_helper(clock, start, x, y, za_o, zb_o);

input clock, start;
input[15:0] x, y;
output[31:0] za_o;
output[15:0] zb_o;

reg[15:0] x_r;
reg[14:0] y_r;
reg[14:0] s;
reg[15:0] c;
reg[15:0] v;

integer i;
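always @(posedge clock) begin
    if (start) begin
        // Load the operands and the first partial product x * y[0].
        x_r <= x;
        y_r <= y[15:1];
        s <= x[15:1] & {15{y[0]}};
        c <= 0;
        // Bit 0 of the first partial product is x[0] & y[0], not x[0] alone.
        v <= {x[0] & y[0], 15'b0};
    end else begin
        // Column 0: the sum bit is shifted into v, the carry stays in c[0].
        // The sum is assigned to a 2-bit target so the carry is kept; inside
        // a concatenation it would be sized to 1 bit and the carry dropped.
        {c[0], v[15]} <= (x_r[0] & y_r[0]) + c[0] + s[0];
        v[14:0] <= v[15:1];
        for (i = 1; i < 15; i = i + 1)
            {c[i], s[i - 1]} <= (x_r[i] & y_r[0]) + c[i] + s[i];
        {c[15], s[14]} <= (x_r[15] & y_r[0]) + c[15];
        y_r <= {1'b0, y_r[14:1]};
    end
end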

assign za_o = {s[14:0], v};
assign zb_o = c;

endmodule

module test();

reg clock, start;
reg[15:0] x, y;
wire[31:0] za;
wire[15:0] zb;

always #5 clock <= !clock;

mul16x16ser_helper mul(clock, start, x, y, za, zb);

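// Final add that the ALU adder would perform: c is weighted by 2^16.
wire[31:0] z = za + {zb, 16'b0};

initial begin
    $monitor("%d za = %d, zb = %d, z = %d", $time, za, zb, z);
    clock <= 1;
    // Hold start high so the operands are loaded, then release it.
    start <= 1;
    x <= 60000;
    y <= 60002;
    #1000;
    start <= 0;
    // 15 clock periods: one shift-and-add step for each of y[15:1].
    #150;
    $finish;
end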

endmodule
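
The stimulus above uses an even x, so x[0] is zero and column 0 of the
multiplier never has to produce a carry.  A self-checking variant along
the following lines would also cover odd operands.  The module name
test_check, the task check and the operand values are my own
illustrative choices, and I have not run it:

```verilog
// Hypothetical self-checking testbench; the module name, task name and
// operand values are illustrative assumptions.
module test_check();

reg clock, start;
reg[15:0] x, y;
wire[31:0] za;
wire[15:0] zb;

mul16x16ser_helper mul(clock, start, x, y, za, zb);

wire[31:0] z = za + {zb, 16'b0};

initial clock = 0;
always #5 clock = !clock;

// Run one multiply and compare it with the simulator's own product.
task check;
    input[15:0] a, b;
    reg[31:0] expected;
    begin
        expected = a * b;               // evaluated at 32 bits
        @(negedge clock);
        x = a;
        y = b;
        start = 1;
        @(negedge clock);               // one rising edge with start high
        start = 0;
        repeat (15) @(negedge clock);   // 15 rising edges: the shift steps
        if (z !== expected)
            $display("MISMATCH %d * %d: got %d, expected %d", a, b, z, expected);
        else
            $display("ok       %d * %d = %d", a, b, z);
    end
endtask

initial begin
    start = 0;
    check(16'd60000, 16'd60002);    // the original stimulus
    check(16'd3, 16'd2);            // odd x, even y
    check(16'd65535, 16'd65535);    // largest operands
    check(16'd12345, 16'd54321);
    $finish;
end

endmodule
```

Inputs change on the falling edge so that they never race with the
rising edge on which the multiplier samples them.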