This is what you have long been waiting for! Have you written a lot of
signal processing code based on lists? Were you planning to rewrite it on
top of some efficient chunky list structure? Better not do that! The
chunky structure is usually not what you want; you only use it for
efficiency, and it makes it almost impossible to write fusion rules that
are both correct and applicable. So I propose to do most things on plain
lists, using the functions from the stream-fusion package, and to convert
to a chunky structure only once, namely at the end of a series of
transformations. If the operations process the list in one pass from start
to end, which is the case for most signal processing operations, then
fusion kicks in and turns everything into a single loop that creates the
chunky sequence. Sometimes you need INLINE pragmas in order to show GHC
that there is something inside your definitions that can be fused across
function boundaries. However, for cutting, arranging, and sharing signals,
you will certainly prefer a chunky sequence as the interim data structure.
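
Roughly, the idea looks like this. A minimal sketch (the sine/quantize
helpers and all constants are made up for illustration; Data.List.Stream
is stream-fusion's drop-in replacement for Data.List, Data.StorableVector
comes from storablevector, and the dedicated fusion-aware conversion in
storablevector-streamfusion may look different from the plain SV.pack
used here):

module Main where

import qualified Data.List.Stream as L
import qualified Data.StorableVector as SV
import Data.Int (Int16)

-- an infinite sine wave, built entirely from fusible list functions
sine :: Double -> Double -> [Double]
sine rate freq =
   L.map (\t -> sin (2 * pi * freq * t / rate)) (L.iterate (+1) 0)
{-# INLINE sine #-}

-- quantize a sample to 16 bit; INLINE lets GHC fuse across the boundary
quantize :: Double -> Int16
quantize x = round (0.3 * 32767 * x)
{-# INLINE quantize #-}

main :: IO ()
main =
   let samples = L.take 44100 (L.map quantize (sine 44100 440))
   in  print (SV.length (SV.pack samples))  -- convert only at the very end

With the fusion-aware conversion from storablevector-streamfusion in place
of the plain SV.pack above, the intermediate list should disappear
entirely and the loop should write straight into the storable vector.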

That's the library you need:
  
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/storablevector-streamfusion

For an example and performance numbers, compile with
$ cabal install -fbuildTests
...
1607 PreInlineUnconditionally
2343 PostInlineUnconditionally
759 UnfoldingDone
96 RuleFired
     12 *#
     2 ++
     8 /##
     2 <#
     1 <=#
     2 ==#->case
     3 SPEC Main.hPutArray
     12 STREAM stream/unstream fusion
...

"stream/unstream" is the fusion rule that makes your code fast! The higher 
the count, the better.
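
(These statistics are nothing package-specific, by the way: GHC prints
them for any program compiled with -ddump-simpl-stats, e.g.

$ ghc -O2 -ddump-simpl-stats YourSignalCode.hs

where YourSignalCode.hs is of course a placeholder for your own module.)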

$ ./dist/build/speedtest/speedtest
storablevector-from-stream: 0.131797
storablevector-fused: 8.753800000000002e-2
storablevector: 0.23966099999999999
storablevector-lazy-from-stream: 0.197343
storablevector-lazy-fused: 0.104651
storablevector-lazy: 0.229007
loop-poke: 0.22254400000000002
list-binary-put: 0.111832
list-poke-buffer: 0.146816
list-poke: 0.14077699999999999


  Interestingly, the tests 'storablevector' and 'storablevector-lazy' are
much slower than their fusion counterparts, although fusion should just
turn the modular code into the same code as in 'storablevector' and
'storablevector-lazy'. Maybe stream-fusion plays some strictness tricks
that I do not understand.
  Since my machine has a CPU clock rate of around 1 GHz, the numbers mean
that the CPU spends about 100 cycles per sample value, which is quite a
lot for my taste. Per sample it only has to perform a Double
multiplication, a pointer increment, a conversion from Double to Int16,
and finally a write to disk. I thought that today's pipelined,
superscalar, what-have-you processors could do multiple operations per
cycle instead of spending multiple cycles per operation as in the MC68000
days.

Nonetheless, have fun playing with it!
Henning