Re: Best way to carry on 2-input architecture?

2014-08-22 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > For concreteness, say we attempt to use a size 16 FFT, over the > cofficient ring 2^{12}+1 (it's some years since I last looked carefully > at FFT-based multiply, but I hope those parameters make sense. The FFT > inputs are 4 bits, but grow to 12 in th

Re: Best way to carry on 2-input architecture?

2014-08-22 Thread Niels Möller
"Wesley W. Terpstra" writes: > I was not proposing it for 64x64!!! > I have no idea when NTT would be more efficient. > Maybe like 2048x2048 or something. At least in software, the threshold where FFT based multiply becomes useful is a lot lower for wraparound mul than for regular mul. > The ga

Re: Best way to carry on 2-input architecture?

2014-08-22 Thread Wesley W. Terpstra
On Fri, Aug 22, 2014 at 2:06 PM, Niels Möller wrote: > But a 64-bit mul is usually not single cycle? Of course not. I'm not sure what you're driving at here. If mullo and mulhi both have the same clock and are both 3 cycles deep, it does not matter if one has slightly lower latency between two r

Re: Best way to carry on 2-input architecture?

2014-08-22 Thread Niels Möller
"Wesley W. Terpstra" writes: > Nah. You use the same clock for the whole ALU, so your clock rate is > limited by the same critical path for both circuits. Thus, mullo is > actually no faster, despite being potentially two gates shorter. But a 64-bit mul is usually not single cycle? On typical x8

Re: Best way to carry on 2-input architecture?

2014-08-22 Thread Wesley W. Terpstra
On Fri, Aug 22, 2014 at 11:46 AM, Wesley W. Terpstra wrote: >> H = (x - L) mod (B-1) > > So let me get this straight. Your plan is to compute x=a*b mod (B-1). > In other words, you sum 64 columns of 64-rows in a square shape > instead of the usual parallelogram. Then you throw the twos complemen

Re: Best way to carry on 2-input architecture?

2014-08-22 Thread Wesley W. Terpstra
On Thu, Aug 21, 2014 at 11:12 PM, Niels Möller wrote: > One *can* reduce a piece at the right end to a single row ... yeah, but Wallace trees are essentially ripple adders at the right-end = slow. So you don't use them for that. Maybe after you get it down to 3 rows for all columns (ie: the middl