Re: [Haskell-cafe] Benchmark of DFT libraries in Haskell
Scott Michel scooter@gmail.com wrote: I might be missing something in translation, but if I understand Takayuki's message's intent, everything needs to be calculated because the C-based FFTW library is called (eventually). Laziness doesn't really have an impact. The choice of underlying data structure and whether FFTW wisdom is used clearly has a significant impact. If the Haskell wrapper library is a thick enough, lazy layer around FFTW, the size of the result vector may not at all depend on any FFTW computation. Again, I have no experience at all with FFTW or any Haskell bindings to it. This is just a general remark that is worth keeping in mind. Greets, Ertugrul -- Key-ID: E5DD8D11 Ertugrul Soeylemez e...@ertes.de FPrint: BD28 3E3F BE63 BADD 4157 9134 D56A 37FA E5DD 8D11 Keysrv: hkp://subkeys.pgp.net/ signature.asc Description: PGP signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Benchmark of DFT libraries in Haskell
Takayuki Muranushi wrote: * vector-fftw with wisdom was more than 1/2 times faster than fftw in C with wisdom (and with communication overhead.) I would be suspicious of that result. Calling a C function from a library should be slower from Haskell than from C. Erik -- -- Erik de Castro Lopo http://www.mega-nerd.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] CRIP: the Curiously Reoccuring Instance Pattern
oleg at okmij.org writes: I think instead you should have: - abandoned FunDeps - embraced Overlapping more! Well, using TypeCast to emulate all FunDeps was demonstrated three years later after HList (or even sooner -- I don't remember when exactly the code was written): http://okmij.org/ftp/Haskell/TypeClass.html#Haskell1 Yes, I see the same idiom. I'll use it below in the definition of the TypeEq predicate. So here's my conjecture: 1. We don't need FunDeps, except to define TypeCast. (And GHC now has equality constraints, which are prettier.) 2. Without FunDeps, overlapping works fine. ... I agree on point 1 but not on point 2. The example of incoherence described in Sec `7.6.3.4. Overlapping instances' of the GHC User Guide has nothing to do with functional dependencies. The question a couple of months ago was: do we need Type-level TypeRep? And you had made a case before that for the TTypeable approach. And TTypeable started life as 'A representation-based equality predicate', Section 9 of the HList paper. And the justification for it was 'Overlapping banned', Section 6. So three questions in light of the approach of abandoning FunDeps and therefore not getting interference with overlapping: A. Does TTypeable need to be so complicated? B. Is TTypeable needed at all? C. Does the 'simplistic' version of type equality testing suffer possible IncoherentInstances? A. I conjecture that TTypeable does not need the TC_code family, nor the phantom type to drive it, nor the NAT encoding. Instead let each type stand for itself, then we just need TYPEOF: type instance TYPEOF () = ( (), NIL ) type instance TYPEOF [a]= ( [()], (TYPEOF a) :/ NIL ) type instance TYPEOF (IO a) = ( IO (), (TYPEOF a) :/ NIL ) -- etc -- I guess TH could generate those instances? Then for testing type equality, we can use a simple overlapping test: type family TypeEq a b :: Bool where { TypeEq a a = True ; TypeEq _ _ = False } This uses the proposed NewAxiom style of type function. Equivalently with a class: instance (TypeCast p TTrue) = TypeEq a a p-- works in Hugs 2002! instance (TypeCast p TFalse) = TypeEq a b p No risk of incoherent instances because the arguments have to be instantiated via TYPEOF. The approach of putting unit as dummy argument to the type constructors means we can also test the 'shape' of the types. 2. But (apart from testing the shape) have we gained anything here over testing the types directly, rather than going via TYPEOF? If the type isn't grounded enough to test for equality (as observed in Section 9 'By chance or by design?'), then neither is it grounded enough to instantiate for TYPEOF. 3. The incoherent instances example in Sec 7.6.3.4 is because of instances defined in separate modules. The simplistic test for type equality declares its two instances together. Is there some way an incoherent instance could slip through? If not, what is the TTypeable mechanism gaining? And by the way, I couldn't help trying a bit of 'compiler archaeology'. I dug out Hugs version November 2002. My revised hDeleteMany works fine, as does my 'repair' to the example in 'Ended up in murky water'. So I can't see a need for TTypeable even back in 2004. AntC ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
[Haskell-cafe] ANNOUNCE: system-time-monotonic-0.1
system-time-monotonic [1] provides access to the system's monotonic clock. Usage looks like this: * Use 'newClock' to create a new monotonic clock * Use 'clockGetTime' to see how much time has elapsed since the clock was created. This package currently supports Linux and Windows. It might also work on BSD, but I haven't tested it there. Mac OS X support is currently not implemented, but patches are welcome. GHC uses mach_absolute_time and mach_timebase_info; see ticket #5865 [2]. I also added a handy utility function 'delay', a variant of threadDelay taking a DiffTime instead of Int microseconds. Thus: delay 1.5 Waits 1.5 seconds. Note that since 'delay' simply calls 'threadDelay' in a loop, it is disrupted by system clock changes (again, see ticket #5865). [1]: http://hackage.haskell.org/package/system-time-monotonic [2]: http://hackage.haskell.org/trac/ghc/ticket/5865 --- The rest of this email describes various hurdles involved in implementing this package, and how they were addressed. ## GetTickCount The most obvious one is that GetTickCount has a short wraparound (49.7 days). Two ways to address this: * Don't use GetTickCount; use QueryPerformanceCounter (or similar) instead. This is currently how it's done in GHC HEAD. * Use GetTickCount, but avoid comparing times that are far apart by tracking the total difference each time we look at the clock. I took the second approach, because I found out that QueryPerformanceCounter is actually less accurate in the long run than GetTickCount. In particular, QueryPerformanceCounter stops ticking (or maybe ticks slower, I forget) when the computer is asleep. Here's the trick I use to work around GetTickCount's wraparound (pseudocode): st1 :: Word32 t1 :: DiffTime newClock: st1 := GetTickCount() t1 := 0 clockGetTime: st2 := GetTickCount() t2 := t1 + (Int32)(st2 - st1 modulo 2^32) st1 := st2 t1 := t2 return t2 This workaround only works if clockGetTime is called at least once every 24.8 days. It's important that st2 - st1 be done modulo 2^32, to compensate for wraparound. However, the result should be converted to a signed 32-bit integer, in case st2 was recorded earlier than st1 (which can easily happen in a concurrent context). In particular, here's what you should *not* do (which GHC currently does, when QueryPerformanceCounter is not available): st1 = (bigger_type) GetTickCount(); ... st2 = (bigger_type) GetTickCount(); return (st2 - st1) This will return a bogus value if a wraparound occurred between st1 and st2. system-time-monotonic tests, at runtime, if GetTickCount64 is available. If so, it uses it. Otherwise, it falls back to GetTickCount. Here's the code I used to do the run-time system call lookup: /* cbits/dll.c */ typedef ULONGLONG (WINAPI *GetTickCount64_t)(void); GetTickCount64_t system_time_monotonic_load_GetTickCount64(void) { return (GetTickCount64_t) GetProcAddress(GetModuleHandle(TEXT(kernel32.dll)), GetTickCount64); } -- System/Time/Monotonic/Direct.hsc type C_GetTickCount64 = IO #{type ULONGLONG} foreign import ccall system_time_monotonic_load_GetTickCount64 :: IO (FunPtr C_GetTickCount64) foreign import stdcall dynamic mkGetTickCount64 :: FunPtr C_GetTickCount64 - C_GetTickCount64 Did I do this right? In particular: * Can I assume the kernel32.dll HMODULE won't be pulled out from under me? * Is `foreign import stdcall dynamic` the right incantation for using a pointer to a WINAPI function? ## CLOCK_MONOTONIC is not actually monotonic on Linux CLOCK_MONOTONIC is subject to NTP adjustments. Worse, CLOCK_MONOTONIC stops when the computer is put to sleep (unlike GetTickCount, which does the right thing). Two clocks were introduced very recently in Linux: * CLOCK_MONOTONIC_RAW * CLOCK_BOOTTIME I'd like to support CLOCK_BOOTTIME at some point. I'm not sure if it's subject to NTP adjustments or not, since the announcement [3] says: CLOCK_BOOTTIME is identical to CLOCK_MONOTONIC, except it also includes any time spent in suspend (as currently measured by read_persistent_clock()). This allows applications to get a suspend aware monotonic clock. One idea would be to hard-code the clockid_t of CLOCK_BOOTTIME and CLOCK_MONOTONIC_RAW, and try calling clock_gettime with each of these. If one of these succeeds, use it. Otherwise, fall back to CLOCK_MONOTONIC. Thus, the compiled binary can test, at runtime, if a new enough kernel is available. For now, system-time-monotonic simply uses CLOCK_MONOTONIC. [3]: http://lwn.net/Articles/428176/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Benchmark of DFT libraries in Haskell
Dear Ertugrul, Scott and Erik, thank you for your comments. w.r.t the lazyness, I make the solvers to calculate the amplitude of final FFT results (i.e. to calculate the square magnitude of array elements and sum over them,) compare the response with the expected results and cause side effects depending on the test result. This should cause the FFT chain to be fully evaluated. * vector-fftw with wisdom was more than 1/2 times faster than fftw in C with wisdom (and with communication overhead.) I would be suspicious of that result. Calling a C function from a library should be slower from Haskell than from C. Sorry for the confusion, What I meant is that vector-fftw version takes more time than C version, but less than twice. Please compare the two lines * fft/cpp 1 1048576 102 * fft/vector-fftw 0 1048576 102 in http://paraiso-lang.org/html/bench-dft-in-haskell.html . P.S. including GPU contestants would be interesting! 2012/8/6 Erik de Castro Lopo mle...@mega-nerd.com: Takayuki Muranushi wrote: * vector-fftw with wisdom was more than 1/2 times faster than fftw in C with wisdom (and with communication overhead.) I would be suspicious of that result. Calling a C function from a library should be slower from Haskell than from C. Erik -- -- Erik de Castro Lopo http://www.mega-nerd.com/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe Best, -- Takayuki MURANUSHI The Hakubi Center for Advanced Research, Kyoto University http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe