I'll defer answer to this to others, as I haven't used FFT for a long time.
I do remember, however, that the discrete cosine transform was actually
more popular in the circles I frequented. Would it be difficult to adapt
your implementation to offer dct?
Andrei
David Simcha wrote:
BTW, I've started thinking a little more about big picture issues here,
and I'm debating whether it's a higher priority to improve performance
on power of 2 sizes, or to try to support other radix values.
There are two use cases for an FFT that I'm familiar with. The
power-of-two limitation isn't severe in either of them.
1. Getting an idea of what the spectrum of a signal looks like. Here,
it's common to pad with zeros because the plots become clearer looking,
even if your FFT lib doesn't require it.
2. Computing a convolution. Here, padding with zeros is necessary
anyhow to prevent the signal from being "interpreted" as periodic.
Are there any major use cases where the power of two limitation is
severe, or should I just focus on optimizing powers of 2 and call it a day?
On Mon, Aug 2, 2010 at 10:23 AM, Don Clugston <[email protected]
<mailto:[email protected]>> wrote:
On 2 August 2010 15:41, David Simcha <[email protected]
<mailto:[email protected]>> wrote:
> Unfortunately I just downloaded the benchmark program for FFTW
and my FFT is
> a ton slower, depending on how you look at it. Using size 2^20 as my
> benchmark, FFTW takes about 131 seconds to create its plan, even
using
> -oestimate, the fastest planner. However, the plan can be reused if
> performing multiple FFTs of the same size, and once the plan is
created, it
> can do an FFT of size 2^20 in about 53 milliseconds (which I find
almost
> unbelievable because even sorting an array of size 2^20 using a
> well-optimized quick sort takes almost that long, and FFT seems
like it
> should should have a much larger constant than quick sort),
compared to my
> latest improvements to around 730 on the hardware I'm
benchmarking on.
>
> On the other hand, for one-off use cases, the lack of needing to
create a
> plan is a big win, both from a speed and a simplicity of API
point of view.
> Benchmarking against R, which doesn't appear to use plans,
making the
> comparison somewhat more relevant, things look better for my FFT:
R takes
> about 610 milliseconds for a size 2^20 pure real FFT.
All you're seeing is the L2 cache. Did you see the link I posted to
the NG about the internals of FFTW? The graph at the top shows FFTW is
40 times faster than the 'numerical recipes' code for 2^^20. So your
factor of 13 isn't so bad. Based on that graph, if you reduce it to
say 2^^15, the factor should drop significantly. Adding a little bit
of cache awareness (using core.cpuid) should be able to avoid the
performance cliff.
Also, DMD's floating point optimiser is so primitive, you lose up to a
factor of two immediately.
_______________________________________________
phobos mailing list
[email protected] <mailto:[email protected]>
http://lists.puremagic.com/mailman/listinfo/phobos
------------------------------------------------------------------------
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos
_______________________________________________
phobos mailing list
[email protected]
http://lists.puremagic.com/mailman/listinfo/phobos