The code I posted on the Wiki is for understanding, not for performance measurement. Several improvements are possible:
0. The code recomputes roots of _1 (4 times!) every time it runs; a serious implementation would reuse the roots 1. The final pass, which produces an imaginary-only result, would be replaced by a step to halve the size of the final FFT, which would save 25% overall 2. You would just use the fftw library anyway, which is optimized already and is a few times faster The important point, to me, is that this in an O(*^.) algorithm. It can feasibly multiply polynomials of a million terms, or numbers with millions of digits. I wanted to see whether it would work as well as the literature suggests, and I think the answer is Yes. I also wondered whether Roger should use something like that for multiplication of extended integers, and I think the answer is Yes to that too, though for all I know he's already doing it. I haven't looked at your obta. I question your use of tm*sz as a figure of merit. If I am willing to trade space for time I am happy to take 3x the space in 0.5x the time. That is especially true here, where any operands that are big enough to cause a memory problem would take days for the computation. Henry Rich On 12/30/2010 1:58 AM, R.E. Boss wrote: >> -----Oorspronkelijk bericht----- >> Van: [email protected] [mailto:programming- >> [email protected]] Namens Henry Rich >> Verzonden: dinsdag 28 december 2010 18:42 >> Aan: Programming forum >> Onderwerp: Re: [Jprogramming] Convolution using FFT WAS: Proposal for >> special code voor dyadic f/.@:(g/) >> >> OK, all these suggestions are now in >> >> http://www.jsoftware.com/jwiki/Essays/FFT >> >> Henry Rich >> > > Some observations. > > 'X Y'=. 8192 8193<@(?...@$)"(0) 1000 > rank'iconvolve~ X';'iconvolve~ Y' > +----+-----+----+----+ > |rank|tm*sz|time|size| > +----+-----+----+----+ > | 0 |1.00 |1.00|1.00| > +----+-----+----+----+ > | 1 |5.22 |2.61|2.00| > +----+-----+----+----+ > NB. jump in performance at 2^n > > 'X Y' =. 2 400 ?...@$ 1000x > datatype&> X;Y;(X +/ obta * Y);X iconvolve Y > extended > extended > extended > integer > NB. no extended output > > 'X Y' =. 2 8192 ?...@$ 1000 > rank'X +/ obta * Y';'X iconvolve Y' > +----+-----+----+----+ > |rank|tm*sz|time|size| > +----+-----+----+----+ > | 0 |1.00 |6.49|1.00| > +----+-----+----+----+ > | 1 |1.36 |1.00|8.83| > +----+-----+----+----+ > NB. fast, not lean > > 'X Y' =. 2 8193 ?...@$ 1000 > rank'X +/ obta * Y';'X iconvolve Y' > +----+-----+----+-----+ > |rank|tm*sz|time|size | > +----+-----+----+-----+ > | 0 |1.00 |2.69| 1.00| > +----+-----+----+-----+ > | 1 |6.56 |1.00|17.65| > +----+-----+----+-----+ > NB. overall less efficient > > 'X Y' =. 8000 9000<@(?...@$)"(0) 1000 > rank'X +/ obta * Y';'X iconvolve Y' > +----+-----+----+-----+ > |rank|tm*sz|time|size | > +----+-----+----+-----+ > | 0 | 1.00|2.58| 1.00| > +----+-----+----+-----+ > | 1 |13.26|1.00|34.15| > +----+-----+----+-----+ > NB. especially for arbitrary input > > 'X Y' =. 2 16384 ?...@$ 1000 > rank'X +/ obta * Y';'X iconvolve Y' > +----+-----+----+----+ > |rank|tm*sz|time|size| > +----+-----+----+----+ > | 1 |1.41 |9.40|1.00| > +----+-----+----+----+ > | 0 |1.00 |1.00|6.65| > +----+-----+----+----+ > NB. finally more efficient > > 'X Y' =. 16000 17000<@(?...@$)"(0) 1000 > rank'X +/ obta * Y';'X iconvolve Y' > +----+-----+----+-----+ > |rank|tm*sz|time|size | > +----+-----+----+-----+ > | 0 |1.00 |4.13| 1.00| > +----+-----+----+-----+ > | 1 |5.14 |1.00|21.21| > +----+-----+----+-----+ > NB. However ... > > If f/.@:(g/) would be replaced by special code like obta, figures would be > even (slightly?) more in favor of obta. > > > R.E. Boss > > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
