Re: Use of H98 FFI

2003-08-03 Thread Manuel M T Chakravarty
Peter Thiemann <[EMAIL PROTECTED]> wrote,

> I recently had my first exposure to Haskell's FFI when I was trying to
> compute MD5 and SHA1 hashes using the existing C implementations. In
> each case, the idea is to make the hash function available as function
> 
> > md5 :: String -> String
> 
> However, the naive implementation
> 
> > md5_init md5_state
> > n <- newCString str
> > md5_append md5_state n (fromIntegral (length str))
> > md5_finish md5_state md5_digest
> 
> does not scale to computing hashes of really long strings (50 MB, say,
> as arising from reading a moderately big file), since it tries to
> create a CString of that size, first! 
> 
> Trying to avoid the allocation of this giant CString requires to split
> up the original string into smaller parts and convert each part to a
> CString separately. Clearly, this task involves a lot of allocation,
> essentially the input string needs to be copied part by part.
> 
> Hence, I was wondering why the FFI only provides functionality to
> convert an *entire* list of Char into a CString. For applications like
> this hash computation, it would be advantageous to be able to specify
> *how much* of the input string to marshall to the CString and have the
> conversion function return the rest of the input string and the
> CString. That is, in addition to 
> 
> > newCString :: String -> IO CString
> 
> there should be
> 
> > newCStringPart :: String -> Int -> IO (CStringLen, String)
> 
> or even
> 
> > toCStringPart :: String -> CStringLen -> IO (Int, String)
> 
> where CStringLen describes a target buffer into which the String
> argument is to be marshalled.  (and similarly for other list types)

The idea of the FFI Addendum is to provide the *basic*
functionality needed for using foreign code from Haskell.
However, it was explicitly never the aim to cover all
possibly needed idioms.  Instead, we expect that FFI tools
(such as GreenCard or C->Haskell) and additional support
libraries provide this coverage by mapping more complex
marshaling requirements down to the basics described in the
FFI Addendum.

> Clearly, I can program this functionality by hand. But I have to
> revert to byte-wise processing using pokeByteOff, castCharToCChar, and
> so on. In addition, the optimizer does not seem to be very effective on
> such code, so it seems advantageous to provide it in the library
> already.

As Sven already pointed out, it seems like you should use
Word8 instead of Char (which is unicode-based).  The FFI
libraries don't do anything but use pokeElemOff and friends
themselves.  So, performance-wise there will be no
difference.  For `pokeArray', the libraries use

pokeArray :: Storable a => Ptr a -> [a] -> IO ()
#ifndef __GLASGOW_HASKELL__
pokeArray ptr vals =  zipWithM_ (pokeElemOff ptr) [0..] vals
#else
pokeArray ptr vals = go vals 0#
  where go [] n# = return ()
go (val:vals) n# = do pokeElemOff ptr (I# n#) val; go vals (n# +# 1#)
#endif

ie the GHC code makes use of unboxed values, though.

Cheers,
Manuel

PS: It might be useful to provide the functions you describe
as part of the hierachical libraries, though.
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Use of H98 FFI

2003-08-01 Thread Peter Thiemann
> "Derek" == Derek Elkins <[EMAIL PROTECTED]> writes:

Derek> 

Derek> Except that I would probably mapM_ over a list of chunks, I don't
Derek> see what the problem is with your second version of the code is.

The second version allocates memory like crazy, so much that the
pokeByteOff and castCharToCChar version becomes twice roughly twice as
fast. Here are some figures [all compiled with -O2, otherwise version
c looses badly]: 

s - hash in one go
a - chunks of 512 characters
c - byte-wise

[on 51MB String]
Main a : 58.730u 1.440s 1:00.13 100.0%   0+0k 0+0io 182pf+0w
Main c : 29.370u 0.740s 0:30.07 100.1%   0+0k 0+0io 182pf+0w
[on 32MB String]
Main a : 37.500u 1.100s 0:38.58 100.0%   0+0k 0+0io 182pf+0w
Main c : 18.790u 0.510s 0:19.43 99.3%0+0k 0+0io 182pf+0w
[on 500k String]
Main s : 0.560u 0.120s 0:00.66 103.0%0+0k 0+0io 182pf+0w
Main a : 0.610u 0.010s 0:00.59 105.0%0+0k 0+0io 182pf+0w
Main c : 0.290u 0.000s 0:00.27 107.4%0+0k 0+0io 182pf+0w

So the generally most efficient seems to be to use c, but I suspect
that further gains would be possible if this variant were implemented
at a lower level. Hence the proposal for newCStringPart.

-Peter
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Use of H98 FFI

2003-08-01 Thread Derek Elkins
On 01 Aug 2003 09:44:14 +0200
Peter Thiemann <[EMAIL PROTECTED]> wrote:

> I recently had my first exposure to Haskell's FFI when I was trying to
> compute MD5 and SHA1 hashes using the existing C implementations. In
> each case, the idea is to make the hash function available as function
> 
> > md5 :: String -> String
> 
> However, the naive implementation
> 
> > md5_init md5_state
> > n <- newCString str
> > md5_append md5_state n (fromIntegral (length str))
> > md5_finish md5_state md5_digest
> 
> does not scale to computing hashes of really long strings (50 MB, say,
> as arising from reading a moderately big file), since it tries to
> create a CString of that size, first! 
> 
> Trying to avoid the allocation of this giant CString requires to split
> up the original string into smaller parts and convert each part to a
> CString separately. Clearly, this task involves a lot of allocation,
> essentially the input string needs to be copied part by part.
> 
> Hence, I was wondering why the FFI only provides functionality to
> convert an *entire* list of Char into a CString. For applications like
> this hash computation, it would be advantageous to be able to specify
> *how much* of the input string to marshall to the CString and have the
> conversion function return the rest of the input string and the
> CString. That is, in addition to 
> 
> > newCString :: String -> IO CString
> 
> there should be
> 
> > newCStringPart :: String -> Int -> IO (CStringLen, String)
> 
> or even
> 
> > toCStringPart :: String -> CStringLen -> IO (Int, String)
> 
> where CStringLen describes a target buffer into which the String
> argument is to be marshalled.  (and similarly for other list types)
> 
> Clearly, I can program this functionality by hand. But I have to
> revert to byte-wise processing using pokeByteOff, castCharToCChar, and
> so on. In addition, the optimizer does not seem to be very effective
> on such code, so it seems advantageous to provide it in the library
> already.
> 
> But perhaps I'm overlooking something, so I'm appending the code I was
> using below.
> 
> -Peter

Except that I would probably mapM_ over a list of chunks, I don't
see what the problem is with your second version of the code is.

___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Re: Use of H98 FFI

2003-08-01 Thread Sven Panne
Peter Thiemann wrote:
md5 :: String -> String
Hmmm, this should probably be:

> md5 :: [Word8] -> [Word8]

unless you really want the MD5 of the Unicode characters...

Cheers,
   S.
___
Haskell mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell


Use of H98 FFI

2003-08-01 Thread Peter Thiemann
I recently had my first exposure to Haskell's FFI when I was trying to
compute MD5 and SHA1 hashes using the existing C implementations. In
each case, the idea is to make the hash function available as function

> md5 :: String -> String

However, the naive implementation

> md5_init md5_state
> n <- newCString str
> md5_append md5_state n (fromIntegral (length str))
> md5_finish md5_state md5_digest

does not scale to computing hashes of really long strings (50 MB, say,
as arising from reading a moderately big file), since it tries to
create a CString of that size, first! 

Trying to avoid the allocation of this giant CString requires to split
up the original string into smaller parts and convert each part to a
CString separately. Clearly, this task involves a lot of allocation,
essentially the input string needs to be copied part by part.

Hence, I was wondering why the FFI only provides functionality to
convert an *entire* list of Char into a CString. For applications like
this hash computation, it would be advantageous to be able to specify
*how much* of the input string to marshall to the CString and have the
conversion function return the rest of the input string and the
CString. That is, in addition to 

> newCString :: String -> IO CString

there should be

> newCStringPart :: String -> Int -> IO (CStringLen, String)

or even

> toCStringPart :: String -> CStringLen -> IO (Int, String)

where CStringLen describes a target buffer into which the String
argument is to be marshalled.  (and similarly for other list types)

Clearly, I can program this functionality by hand. But I have to
revert to byte-wise processing using pokeByteOff, castCharToCChar, and
so on. In addition, the optimizer does not seem to be very effective on
such code, so it seems advantageous to provide it in the library
already.

But perhaps I'm overlooking something, so I'm appending the code I was
using below.

-Peter



MD5.hs
Description: three interfaces to MD5