Re: [Haskell-cafe] Benchmark of DFT libraries in Haskell

2012-08-06 Thread Ertugrul Söylemez
Scott Michel scooter@gmail.com wrote:

 I might be missing something in translation, but if I understand
 Takayuki's message's intent, everything needs to be calculated because
 the C-based FFTW library is called (eventually). Laziness doesn't
 really have an impact.

 The choice of underlying data structure and whether FFTW wisdom is
 used clearly has a significant impact.

If the Haskell wrapper library is a thick enough, lazy layer around
FFTW, the size of the result vector may not at all depend on any FFTW
computation.

Again, I have no experience at all with FFTW or any Haskell bindings to
it.  This is just a general remark that is worth keeping in mind.


Greets,
Ertugrul

-- 
Key-ID: E5DD8D11 Ertugrul Soeylemez e...@ertes.de
FPrint: BD28 3E3F BE63 BADD 4157  9134 D56A 37FA E5DD 8D11
Keysrv: hkp://subkeys.pgp.net/


signature.asc
Description: PGP signature
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Benchmark of DFT libraries in Haskell

2012-08-06 Thread Erik de Castro Lopo
Takayuki Muranushi wrote:

 * vector-fftw with wisdom was more than 1/2 times faster than fftw in
 C with wisdom (and with communication overhead.)

I would be suspicious of that result. Calling a C function from a library
should be slower from Haskell than from C.

Erik
-- 
--
Erik de Castro Lopo
http://www.mega-nerd.com/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] CRIP: the Curiously Reoccuring Instance Pattern

2012-08-06 Thread AntC
 oleg at okmij.org writes:

 
  I think instead you should have:
  - abandoned FunDeps
  - embraced Overlapping more!
 
 Well, using TypeCast to emulate all FunDeps was demonstrated three
 years later after HList (or even sooner -- I don't remember when
 exactly the code was written):
 
 http://okmij.org/ftp/Haskell/TypeClass.html#Haskell1
 

Yes, I see the same idiom. I'll use it below in the definition of the TypeEq 
predicate.

 
  So here's my conjecture:
  1. We don't need FunDeps, except to define TypeCast.
 (And GHC now has equality constraints, which are prettier.)
  2. Without FunDeps, overlapping works fine.
  ...
 
 I agree on point 1 but not on point 2. The example of incoherence
 described in Sec `7.6.3.4. Overlapping instances' of the GHC User
 Guide has nothing to do with functional dependencies.
 

The question a couple of months ago was: do we need Type-level TypeRep? And 
you had made a case before that for the TTypeable approach. And TTypeable 
started life as 'A representation-based equality predicate', Section 9 of the 
HList paper. And the justification for it was 'Overlapping banned', Section 6.

So three questions in light of the approach of abandoning FunDeps and 
therefore not getting interference with overlapping:
A. Does TTypeable need to be so complicated?
B. Is TTypeable needed at all?
C. Does the 'simplistic' version of type equality testing suffer possible 
IncoherentInstances?

A. I conjecture that TTypeable does not need the TC_code family,
   nor the phantom type to drive it, nor the NAT encoding.
   Instead let each type stand for itself, then we just need TYPEOF:

type instance TYPEOF () = ( (), NIL )
type instance TYPEOF [a]= ( [()], (TYPEOF a) :/ NIL )
type instance TYPEOF (IO a) = ( IO (), (TYPEOF a) :/ NIL )
-- etc
-- I guess TH could generate those instances?

Then for testing type equality, we can use a simple overlapping test:

type family TypeEq a b :: Bool where
{ TypeEq a a = True
; TypeEq _ _ = False
}

   This uses the proposed NewAxiom style of type function.
   Equivalently with a class:

instance (TypeCast p TTrue)  = TypeEq a a p-- works in Hugs 2002!
instance (TypeCast p TFalse) = TypeEq a b p

   No risk of incoherent instances because the arguments have to be
   instantiated via TYPEOF.

   The approach of putting unit as dummy argument to the type constructors
   means we can also test the 'shape' of the types.

2. But (apart from testing the shape) have we gained anything here
   over testing the types directly, rather than going via TYPEOF?
   If the type isn't grounded enough to test for equality
   (as observed in Section 9 'By chance or by design?'),
   then neither is it grounded enough to instantiate for TYPEOF.

3. The incoherent instances example in Sec 7.6.3.4 is because of instances
   defined in separate modules.
   The simplistic test for type equality declares its two instances together.
   Is there some way an incoherent instance could slip through?

   If not, what is the TTypeable mechanism gaining?

And by the way, I couldn't help trying a bit of 'compiler archaeology'. I dug 
out Hugs version November 2002. My revised hDeleteMany works fine, as does 
my 'repair' to the example in 'Ended up in murky water'.

So I can't see a need for TTypeable even back in 2004.

AntC


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] ANNOUNCE: system-time-monotonic-0.1

2012-08-06 Thread Joey Adams
system-time-monotonic [1] provides access to the system's monotonic
clock.  Usage looks like this:

 * Use 'newClock' to create a new monotonic clock

 * Use 'clockGetTime' to see how much time has elapsed since the clock
was created.

This package currently supports Linux and Windows.  It might also work
on BSD, but I haven't tested it there.

Mac OS X support is currently not implemented, but patches are
welcome.  GHC uses mach_absolute_time and mach_timebase_info; see
ticket #5865 [2].

I also added a handy utility function 'delay', a variant of
threadDelay taking a DiffTime instead of Int microseconds.  Thus:

delay 1.5

Waits 1.5 seconds.  Note that since 'delay' simply calls 'threadDelay'
in a loop, it is disrupted by system clock changes (again, see ticket
#5865).

 [1]: http://hackage.haskell.org/package/system-time-monotonic
 [2]: http://hackage.haskell.org/trac/ghc/ticket/5865

---

The rest of this email describes various hurdles involved in
implementing this package, and how they were addressed.

## GetTickCount

The most obvious one is that GetTickCount has a short wraparound (49.7
days).  Two ways to address this:

 * Don't use GetTickCount; use QueryPerformanceCounter (or similar)
instead.  This is currently how it's done in GHC HEAD.

 * Use GetTickCount, but avoid comparing times that are far apart by
tracking the total difference each time we look at the clock.

I took the second approach, because I found out that
QueryPerformanceCounter is actually less accurate in the long run than
GetTickCount.  In particular, QueryPerformanceCounter stops ticking
(or maybe ticks slower, I forget) when the computer is asleep.

Here's the trick I use to work around GetTickCount's wraparound (pseudocode):

st1 :: Word32
t1  :: DiffTime

newClock:
st1 := GetTickCount()
t1  := 0

clockGetTime:
st2 := GetTickCount()
t2  := t1 + (Int32)(st2 - st1 modulo 2^32)

st1 := st2
t1  := t2

return t2

This workaround only works if clockGetTime is called at least once
every 24.8 days.

It's important that st2 - st1 be done modulo 2^32, to compensate for
wraparound.  However, the result should be converted to a signed
32-bit integer, in case st2 was recorded earlier than st1 (which can
easily happen in a concurrent context).

In particular, here's what you should *not* do (which GHC currently
does, when QueryPerformanceCounter is not available):

st1 = (bigger_type) GetTickCount();
...
st2 = (bigger_type) GetTickCount();
return (st2 - st1)

This will return a bogus value if a wraparound occurred between st1 and st2.

system-time-monotonic tests, at runtime, if GetTickCount64 is
available.  If so, it uses it.  Otherwise, it falls back to
GetTickCount.  Here's the code I used to do the run-time system call
lookup:

/* cbits/dll.c */
typedef ULONGLONG (WINAPI *GetTickCount64_t)(void);

GetTickCount64_t system_time_monotonic_load_GetTickCount64(void)
{
return (GetTickCount64_t)
GetProcAddress(GetModuleHandle(TEXT(kernel32.dll)),
   GetTickCount64);
}


-- System/Time/Monotonic/Direct.hsc
type C_GetTickCount64 = IO #{type ULONGLONG}

foreign import ccall
system_time_monotonic_load_GetTickCount64 :: IO (FunPtr
C_GetTickCount64)

foreign import stdcall dynamic
mkGetTickCount64 :: FunPtr C_GetTickCount64 - C_GetTickCount64

Did I do this right?  In particular:

 * Can I assume the kernel32.dll HMODULE won't be pulled out from under me?

 * Is `foreign import stdcall dynamic` the right incantation for
using a pointer to a WINAPI function?

## CLOCK_MONOTONIC is not actually monotonic on Linux

CLOCK_MONOTONIC is subject to NTP adjustments.  Worse, CLOCK_MONOTONIC
stops when the computer is put to sleep (unlike GetTickCount, which
does the right thing).

Two clocks were introduced very recently in Linux:

 * CLOCK_MONOTONIC_RAW

 * CLOCK_BOOTTIME

I'd like to support CLOCK_BOOTTIME at some point.  I'm not sure if
it's subject to NTP adjustments or not, since the announcement [3]
says:

CLOCK_BOOTTIME is identical to CLOCK_MONOTONIC, except it also
includes any time spent in suspend (as currently measured by
read_persistent_clock()). This allows applications to get a
suspend aware monotonic clock.

One idea would be to hard-code the clockid_t of CLOCK_BOOTTIME and
CLOCK_MONOTONIC_RAW, and try calling clock_gettime with each of these.
 If one of these succeeds, use it.  Otherwise, fall back to
CLOCK_MONOTONIC.  Thus, the compiled binary can test, at runtime, if a
new enough kernel is available.

For now, system-time-monotonic simply uses CLOCK_MONOTONIC.

 [3]: http://lwn.net/Articles/428176/

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Benchmark of DFT libraries in Haskell

2012-08-06 Thread Takayuki Muranushi
Dear Ertugrul, Scott and Erik, thank you for your comments.

w.r.t the lazyness, I make the solvers to calculate the amplitude of
final FFT results (i.e. to calculate the square magnitude of array
elements and sum over them,) compare the response with the expected
results and cause side effects depending on the test result. This
should cause the FFT chain to be fully evaluated.

 * vector-fftw with wisdom was more than 1/2 times faster than fftw in
 C with wisdom (and with communication overhead.)

 I would be suspicious of that result. Calling a C function from a library
 should be slower from Haskell than from C.

Sorry for the confusion, What I meant is that vector-fftw version takes
more time than C version, but less than twice. Please compare the two lines

* fft/cpp 1 1048576 102
* fft/vector-fftw 0 1048576 102

in http://paraiso-lang.org/html/bench-dft-in-haskell.html .

P.S. including GPU contestants would be interesting!

2012/8/6 Erik de Castro Lopo mle...@mega-nerd.com:
 Takayuki Muranushi wrote:

 * vector-fftw with wisdom was more than 1/2 times faster than fftw in
 C with wisdom (and with communication overhead.)

 I would be suspicious of that result. Calling a C function from a library
 should be slower from Haskell than from C.

 Erik
 --
 --
 Erik de Castro Lopo
 http://www.mega-nerd.com/

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe


Best,

-- 
Takayuki MURANUSHI
The Hakubi Center for Advanced Research, Kyoto University
http://www.hakubi.kyoto-u.ac.jp/02_mem/h22/muranushi.html

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe