[Haskell-cafe] vector, alignment and SIMD through FFI

2012-07-06 Thread Nicolas Trangez
Hello Cafe,

Recently I've been playing with the implementation of an algorithm, for
which we already have highly-optimized implementations available (in
plain C/C++ as well as OCaml with calls to C through FFI).

The algorithm works on buffers/arrays/vectors/whatever you want to call
it, which needs to be combined in certain ways. This can be highly
optimized by using SIMD instructions (like the ones provides by several
SSE versions).

I'd like to get a to Haskell version which is comparable in efficiency
as the existing versions, whilst remaining as 'functional' as possible.
I don't mind jumping into some low-level C glue and FFI (using ccall or
custom primops), but this should be limited.

Currently I have something working (still highly unoptimized) using
(unboxed) vectors from the vector package, using mutable versions within
a well-contained ST environment in some places.

One hot zone of the current version is combining several vectors, and
the performance of this operation could be greatly improved by using
SIMD instructions. There's one catch though: when using these, memory
should be aligned on certain boundaries (16 byte in this case).

First and foremost, to be able to pass my vectors to some C functions, I
should change my code into using Storable vectors (which should be fine,
I guess I can expect similar performance characteristics?). I couldn't
find any information on alignment guarantees of these vectors though...

Which is how I get to my question: are there any such guarantees? If
not, are there any pointers to how to proceed with this? I guess
tracking alignment at the type level should be the goal, you can find
some code trying to explain my reasoning up to now at the end of this
email.

I have some issues with this:

- I'd need to re-implement almost all vector operations, which seems
stupid.
- It doesn't actually work right now ;-)
- It'd be nice to be able to encode 'compatible' alignment: as an
example, a 16 byte aligned buffer is also 8 byte aligned...

I hope the above explains somewhat my goal. Any thoughts  help on this
would be very welcome!

Thanks,

Nicolas


module Data.Vector.SIMD (
-- ...
) where

import qualified Data.Vector.Storable as SV

import Foreign.Storable (Storable, sizeOf)
import Foreign.Ptr (Ptr, FunPtr)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr)
import System.IO.Unsafe (unsafePerformIO)

class Alignment a where
alignment :: a - Int

data A8Byte
instance Alignment A8Byte where
alignment _ = 8

data A16Byte
instance Alignment A16Byte where
alignment _ = 16

newtype Alignment a = SIMDVector a b = V (SV.Vector b)

replicate :: (Alignment a, Storable b) = a - Int - b - SIMDVector a
b
replicate a n b = V v
  where
ptr = unsafePerformIO $ do
v - _mm_malloc n (alignment a)
-- memset etc
return v

v = SV.unsafeFromForeignPtr0 ptr n

-- These are 2 _stub versions of the procedures since xmminstr.h (or
mm_malloc.h
-- when using GCC) contains them as inline procedures which are not
available
-- as-is in a library. There should be some C module which exports
-- _mm_malloc_stub and _mm_free_stub, which simply includes xmminstr.h
and calls
-- the underlying procedures.
foreign import ccall _mm_malloc_stub _mm_malloc_stub :: Int - Int -
IO (Ptr a)
foreign import ccall _mm_free_stub _mm_free_stub :: FunPtr (Ptr a -
IO ())


_mm_malloc :: Storable a = Int - Int - IO (ForeignPtr a)
_mm_malloc s l = do
-- This fails:
-- Ambiguous type variable `a0' in the constraint:
--   (Storable a0) arising from a use of `sizeOf'
-- v - c_mm_malloc (s * sizeOf (undefined :: a)) l
newForeignPtr _mm_free_stub undefined

-- This allocates a 16 byte aligned output buffer, takes 2 existing ones
and
-- calls some FFI function to perform some magic.
-- The implementation could run inside ST, if the FFI import (which e.g.
works
-- on a mutable buffer and returns IO ()) is lifted into ST using
unsafeIOToST
mySIMDFun :: SIMDVector A16Byte a - SIMDVector A16Byte a - SIMDVector
A16Byte a
mySIMDFun a b = undefined


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector, alignment and SIMD through FFI

2012-07-06 Thread Thomas DuBuisson
On Fri, Jul 6, 2012 at 1:06 PM, Nicolas Trangez nico...@incubaid.com wrote:
 -- This fails:
 -- Ambiguous type variable `a0' in the constraint:
 --   (Storable a0) arising from a use of `sizeOf'

Here you can either tie a type knot using proxy types or you can use
the scoped type variable language extension.

Perhaps I'm missing something specific to your use, but for the
alignment issue you should be OK just calling allocBytes or one of its
variants.  I made some noise about this a bit ago and it resulted in
some extra words in the report under mallocBytes:


The block of memory is sufficiently aligned for any of the basic
foreign types that fits into a memory block of the allocated size.


Which I'm pretty sure GHC did, and still does, follow.

Cheers,
Thomas

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector, alignment and SIMD through FFI

2012-07-06 Thread Bryan O'Sullivan
On Fri, Jul 6, 2012 at 1:43 PM, Thomas DuBuisson thomas.dubuis...@gmail.com
 wrote:

 The block of memory is sufficiently aligned for any of the basic
 foreign types that fits into a memory block of the allocated size.


That's not the same thing as a guarantee of 16-byte alignment, note, as
none of the standard foreign types have that requirement.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] vector, alignment and SIMD through FFI

2012-07-06 Thread Nicolas Trangez
On Fri, 2012-07-06 at 13:43 -0700, Thomas DuBuisson wrote:
 On Fri, Jul 6, 2012 at 1:06 PM, Nicolas Trangez nico...@incubaid.com wrote:
  -- This fails:
  -- Ambiguous type variable `a0' in the constraint:
  --   (Storable a0) arising from a use of `sizeOf'
 
 Here you can either tie a type knot using proxy types or you can use
 the scoped type variable language extension.

Guess I'll have to do some reading ;-) Thanks.

 Perhaps I'm missing something specific to your use, but for the
 alignment issue you should be OK just calling allocBytes or one of its
 variants.  I made some noise about this a bit ago and it resulted in
 some extra words in the report under mallocBytes:
 
 
 The block of memory is sufficiently aligned for any of the basic
 foreign types that fits into a memory block of the allocated size.
 
 
 Which I'm pretty sure GHC did, and still does, follow.

Hmh... as far as I could find, mallocBytes basically does what malloc(3)
does, which is 8-byte alignment if I'm not mistaken on my x86-64 Linux
system. I could use those and the over-allocate-and-offset tricks,
but... that's ugly unless strictly necessary ;-)

Normally posix_memalign or memalign or valloc or _mm_malloc should
provide what I need as-is.

Except, when using those and vector's unsafeFromForeignPtr0, all I get
is a Vector a, which no longer has any alignment information in the
type, so I can't write a function which only accepts N-aligned vectors.
As a result, I'd need to be very careful only to pass aligned vectors to
it (checking manually), add code to handle pre/post-alignment bytes in
my SIMD functions (slow and stupid), or live with it and let my
application crash at random.

I found some work by Oleg Kiselyov and Chung-chieh Shan at [1] which
might be related, yet as of now I feel like that's too general for my
purpose (e.g. I don't see how to integrate it with vector).

Thanks,

Nicolas

[1] http://okmij.org/ftp/Haskell/types.html#ls-resources


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe