RE: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
Weird... I sent this over a month ago, and was a bit puzzled as to why it didn't appear on the list. Has it been waiting for a moderator to release? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alistair Bayley Sent: 04 June 2007 09:44 To: haskell-cafe Cc: Duncan Coutts Subject: Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling,plus comments on withCStringLen/peekCStringLen Hello cafe, (Following up on my own optimisation question, and Duncan's advice to look at http://darcs.haskell.org/ghc/compiler/utils/Encoding.hs) If you want to look at some existing optimised UTF8 encoding/decoding code then take a look at the code used in GHC: http://darcs.haskell.org/ghc/compiler/utils/Encoding.hs Duncan I took a look at the UTF8 decoder in GHC. This inspired me to write one that also used unboxed types directly. Pleasingly, it goes like a cut cat, and uses far less space than the naive version, but it's not portable, which is a bummer. ... * Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. * ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stefan O'Rear fromUTF8Ptr unboxes fine for me with HEAD and 6.6.1. - the chr function tests that its Int argument is less than 1114111, before constructing the Char. It'd be nice to avoid this test. You want unsafeChr from the (undocumented) GHC.Base module. http://darcs.haskell.org/ghc-6.6/packages/base/GHC/Base.lhs for reference (but don't copy the file, it's already an importable module). odd duplicated simplifier output) ISTR seeing a bug report about this a while back, we know it's dumb. You could probably use x 0xF8 instead. FWIW, I've optimised this to a point where I'm happy with it, and you can see the results here: http://darcs.haskell.org/takusen/Foreign/C/UTF8.hs I was using ghc-6.6 back in June, and an upgrade to 6.6.1 fixed some of the issues for me (e.g. unboxing Ptrs, bang-patterns differ from seq). I sent a test case to Simon PJ about the duplicated code in the simplifier output, but I can't tell if it's been added as a trac ticket. Alistair * Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. * ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
Hello cafe, (Following up on my own optimisation question, and Duncan's advice to look at http://darcs.haskell.org/ghc/compiler/utils/Encoding.hs) If you want to look at some existing optimised UTF8 encoding/decoding code then take a look at the code used in GHC: http://darcs.haskell.org/ghc/compiler/utils/Encoding.hs Duncan I took a look at the UTF8 decoder in GHC. This inspired me to write one that also used unboxed types directly. Pleasingly, it goes like a cut cat, and uses far less space than the naive version, but it's not portable, which is a bummer. (The docs tell me that using GHC.Exts is the approved way of accessing GHC-specific extensions, but all of the useful stuff seems to be in GHC.Prim.) After some expriments with the simplifier, I think I have a portable version of a direct-from-buffer decoder which seems to perform nearly as well as one written directly against GHC primitive unboxed functions. I'm wondering if there's anything further I can do to improve performance. The portable unboxed version is within about 15% of the unboxed version in terms of time and allocation. Changes I made: - added strictness annotations in the form of a strictness guards that are always False - unrolled loops (they were always short loops anyway, with a maximum of 3 or 4 iterations) - replaced shiftL with multiplication, because multiplication unboxes, while shiftL doesn't. Some things I've noticed in the simplifier output: - the shiftL call hasn't unboxed or inlined into a call to uncheckedShiftL#, which I would prefer. Would this be possible if we added unchecked versions of the shiftL/R functions to Data.Bits? - Ptrs don't get unboxed. Why is this? Some IO monad thing? - the chr function tests that its Int argument is less than 1114111, before constructing the Char. It'd be nice to avoid this test. - why does this code: | x = 0xF7 = remaining 3 (bAND x 0x07) xs | otherwise = err x turn into this i.e. the = turns into two identical case-branches, using eqword# and ltword#, rather than one case-branch using leword# ? case GHC.Prim.eqWord# a11_a2PJ __word 247 of wild25_X2SU { GHC.Base.False - case GHC.Prim.ltWord# a11_a2PJ __word 247 of wild6_Xcw { GHC.Base.False - error call GHC.Base.True - $wremaining_r3dD 3 (__scc {fromUTF8 main:Foreign.C.UTF8 !} GHC.Base.I# (GHC.Prim.word2Int# (GHC.Prim.and# a11_a2PJ __word 7))) xs_aVm }; GHC.Base.True - $wremaining_r3dD 3 (__scc {fromUTF8 main:Foreign.C.UTF8 !} GHC.Base.I# (GHC.Prim.word2Int# (GHC.Prim.and# a11_a2PJ __word 7))) xs_aVm }; BTW, what's the difference between the indexXxxxOffAddr# and readXxxxOffAddr# functions in GHC.Prim? AFAICT they are equivalent, except that the read* functions take an extra State# s parameter. Presumably this is to thread the IO monad's RealWorld value through, to create some sort of data dependency between the functions (and so to ensure ordered evaluation?) Alistair {-# OPTIONS_GHC -fglasgow-exts #-} -- | -- Module : Foreign.C.UTF8 -- Copyright : (c) 2004 John Meacham, Alistair Bayley -- License : BSD-style -- Maintainer : [EMAIL PROTECTED] -- Stability : experimental -- Portability : portable -- -- Marshall Haskell Strings to and from UTF8-encoded CStrings. -- This module's code is inspired by John Meacham's UTF8 en- de-coders, -- and also those found in the HXT library (module Text.XML.HXT.DOM.Unicode). -- -- Note that the -Len functions all return the length in bytes, -- not Chars (this is more useful, as you are most likely to want -- to pass the length to an FFI function, which is most likely -- expecting the length in bytes). If you want the length in Chars, -- well, you have the original String, so... module Foreign.C.UTF8 ( peekUTF8String, peekUTF8StringLen , peekUTF8StringB, peekUTF8StringLenB , newUTF8String, withUTF8String, withUTF8StringLen , toUTF8String, fromUTF8String , lengthUTF8, fromUTF8, toUTF8 ) where import Control.Monad (when, liftM) import Data.Bits import Data.Char import Data.Word (Word8) import Foreign.C.Types import Foreign.C.String import Foreign.Ptr import Foreign.Marshal.Array import Foreign.Storable import GHC.Base import GHC.Ptr (Ptr(..)) nullCChar :: CChar nullCChar = 0 nullByte :: Word8 nullByte = 0 -- | Analogous to peekCString. Converts UTF8 CString to String. peekUTF8String :: CString - IO String peekUTF8String cs = peekArray0 nullByte (castPtr cs) = return . fromUTF8 peekUTF8StringB :: CString - IO String peekUTF8StringB cs = fromUTF8Ptr0 (castPtr cs) -- | Analogous to peekCStringLen. Converts UTF8 CString to String. -- The resulting String will end either when @len@ bytes -- have been converted, or when a NULL is found. peekUTF8StringLen :: CStringLen - IO String peekUTF8StringLen (cs, len) = peekArray len (castPtr cs) = return . fromUTF8 peekUTF8StringLenB :: CStringLen - IO String
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
On Mon, Jun 04, 2007 at 09:43:32AM +0100, Alistair Bayley wrote: (The docs tell me that using GHC.Exts is the approved way of accessing GHC-specific extensions, but all of the useful stuff seems to be in GHC.Prim.) All of the useful stuff *is* exported from GHC.Exts, it even says so in the haddock: Synopsis ... module GHC.Prim That is, GHC.Exts exports everything GHC.Prim does. Standard H98 re-export syntax. Besides, user code can't import GHC.Prim at all in GHC HEADs newer than a couple months (arguably a bug, but it only breaks bad code, so...) Some things I've noticed in the simplifier output: - the shiftL call hasn't unboxed or inlined into a call to uncheckedShiftL#, which I would prefer. Would this be possible if we added unchecked versions of the shiftL/R functions to Data.Bits? - Ptrs don't get unboxed. Why is this? Some IO monad thing? fromUTF8Ptr unboxes fine for me with HEAD and 6.6.1. - the chr function tests that its Int argument is less than 1114111, before constructing the Char. It'd be nice to avoid this test. You want unsafeChr from the (undocumented) GHC.Base module. http://darcs.haskell.org/ghc-6.6/packages/base/GHC/Base.lhs for reference (but don't copy the file, it's already an importable module). - why does this code: | x = 0xF7 = remaining 3 (bAND x 0x07) xs | otherwise = err x turn into this i.e. the = turns into two identical case-branches, using eqword# and ltword#, rather than one case-branch using leword# ? case GHC.Prim.eqWord# a11_a2PJ __word 247 of wild25_X2SU { GHC.Base.False - case GHC.Prim.ltWord# a11_a2PJ __word 247 of wild6_Xcw { GHC.Base.False - error call GHC.Base.True - $wremaining_r3dD 3 (__scc {fromUTF8 main:Foreign.C.UTF8 !} GHC.Base.I# (GHC.Prim.word2Int# (GHC.Prim.and# a11_a2PJ __word 7))) xs_aVm }; GHC.Base.True - $wremaining_r3dD 3 (__scc {fromUTF8 main:Foreign.C.UTF8 !} GHC.Base.I# (GHC.Prim.word2Int# (GHC.Prim.and# a11_a2PJ __word 7))) xs_aVm }; ISTR seeing a bug report about this a while back, we know it's dumb. You could probably use x 0xF8 instead. BTW, what's the difference between the indexXxxxOffAddr# and readXxxxOffAddr# functions in GHC.Prim? AFAICT they are equivalent, except that the read* functions take an extra State# s parameter. Presumably this is to thread the IO monad's RealWorld value through, to create some sort of data dependency between the functions (and so to ensure ordered evaluation?) Exactly. readFoo won't be reordered, indexFoo will - which matters when doing reads and writes at addresses that might alias. Stefan signature.asc Description: Digital signature ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
Simon, Sorry for the delay on responding. I'm using 6.6, so I'll upgrade to 6.6.1 and retest. Preusmably you're only interested if this behaviour persists in 6.6.1. I'll check both cases and make a test cases for them if necessary. I've upgraded to 6.6.1 and am pleased to report that there appears to be no difference between seq and bang-patterns in the simplifier output. As for the other case (where a function results in two functions in the simplifier output, which appear to be some sort of worker-wrapper pair), a module is attached which gives this behaviour. I compiled it with: ghc -c UTF8.hs -O2 -prof -auto -ddump-simpl Look for $s$wfromUTF8Ptr and $wfromUTF8Ptr. I also notice that with -auto-all, the function readUTF8Char appears in the simplifier output, but with -auto it is inlined into fromUTF8Ptr, and so vanishes from the simplifier output. Running my test case compiled with -auto seems to gives run times at about 70-80% of -auto-all, and allocation is 40% (!) of -auto-all; that's a significant difference. Alistair module UTF8 ( peekUTF8String, peekUTF8StringLen ) where import Control.Monad (when, liftM) import Data.Char import Data.Word (Word8) import Foreign.C.String import Foreign.Ptr import Foreign.Marshal.Array import Foreign.Storable import GHC.Base (unsafeChr) nullByte :: Word8 nullByte = 0 peekUTF8String :: CString - IO String peekUTF8String cs = fromUTF8Ptr0 (castPtr cs) peekUTF8StringLen :: CStringLen - IO String peekUTF8StringLen (cs, len) = fromUTF8Ptr (len-1) (castPtr cs) fromUTF8Ptr0 :: Ptr Word8 - IO String fromUTF8Ptr0 p = do len - lengthArray0 nullByte p fromUTF8Ptr (len-1) p fromUTF8Ptr :: Int - Ptr Word8 - String - IO String fromUTF8Ptr bytes p acc | () `seq` bytes `seq` p `seq` acc `seq` False = undefined | bytes 0 = return acc | otherwise = do x - liftM fromIntegral (peekElemOff p bytes) case () of _ | x == 0 - error (fromUTF8Ptr: zero byte found in string as position ++ show bytes) | x = 0x7F - fromUTF8Ptr (bytes-1) p (unsafeChr x:acc) | x = 0xBF bytes == 0 - error fromUTF8Ptr: surrogate at start of string | x = 0xBF - fromUTF8Ptr (bytes-1) p acc | otherwise - do c - readUTF8Char x bytes p fromUTF8Ptr (bytes-1) p (c:acc) readUTF8Char :: Int - Int - Ptr Word8 - IO Char readUTF8Char x offset p | () `seq` x `seq` offset `seq` p `seq` False = undefined | otherwise = case () of _ | x == 0 - err x | x = 0x7F - return (unsafeChr x) | x = 0xBF - err x | x = 0xDF - do x1 - liftM fromIntegral (peekElemOff p (offset + 1)) return (unsafeChr ( ((x - 0xC0) * 64) + (x1 - 0x80) )) | x = 0xEF - do x1 - liftM fromIntegral (peekElemOff p (offset + 1)) x2 - liftM fromIntegral (peekElemOff p (offset + 2)) return (unsafeChr ( ((x - 0xE0) * 4096) + ((x1 - 0x80) * 64) + (x2 - 0x80) )) | x = 0xF7 - do x1 - liftM fromIntegral (peekElemOff p (offset + 1)) x2 - liftM fromIntegral (peekElemOff p (offset + 2)) x3 - liftM fromIntegral (peekElemOff p (offset + 3)) return (unsafeChr ( ((x - 0xF0) * 262144) + ((x1 - 0x80) * 4096) + ((x2 - 0x80) * 64) + (x3 - 0x80) )) | otherwise - err x where err x = error (readUTF8Char: illegal UTF-8 character ++ show x) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
Simon, You're right, both versions should give the same code. Which version of GHC are you using? Both with the HEAD and with 6.6.1 I get the nice unboxed code with the `seq` version too. My test program is below. I'm using 6.6, so I'll upgrade to 6.6.1 and retest. Preusmably you're only interested if this behaviour persists in 6.6.1. I'll check both cases and make a test cases for them if necessary. Thanks, Alistair ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
RE: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
Alistair You're right, both versions should give the same code. Which version of GHC are you using? Both with the HEAD and with 6.6.1 I get the nice unboxed code with the `seq` version too. My test program is below. If you can make a reproducible test case of the unexpected behaviour please file it as a Trac bug and I will take a look. Pls include the actual code and command line you used to compile it. I have not looked at your second point; again a reproducible example would be helpful. Thanks Simon import GHC.Ptr import GHC.Word readUTF8Char :: Int - Int - Ptr Word8 - Int readUTF8Char x offset p | () !x !offset !p !False = 10 | x7 = 3 | otherwise = readUTF8Char x offset p where x ! y = seq x y -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alistair Bayley Sent: 05 June 2007 09:37 To: Duncan Coutts Cc: haskell-cafe Subject: Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen {- Arity: 4 Strictness: LSSL -} Right. Unboxed args are always given the annotation L. So that function is strict in that pointer arg, but GHC is choosing not to unbox it. I'm not sure why that's the case. I thought maybe it was because I hadn't said -funbox-strict-fields, but it didn't change when I did. Is there some semantic advantage to bang-patterns, or is it just a syntactic convenience? It's syntactic convenience. I've noticed differences between strictness guards and bang-patterns with GHC. This is a problem, I think, because bang-patterns are GHC only, and I wanted to keep the code portable. This strictness guard: readUTF8Char :: Int - Int - Ptr Word8 - IO Char readUTF8Char x offset p | () !x !offset !p !False = undefined | otherwise = ... where x ! y = seq x y results in this simplifier output: [Arity 3 Str: DmdType LSS] $wreadUTF8Char_r38I = \ (ww_s30B :: GHC.Prim.Int#) (w_s30D :: GHC.Base.Int) (w1_s30E :: GHC.Ptr.Ptr GHC.Word.Word8) - However, if I change this to: {-# OPTIONS_GHC -fbang-patterns #-} ... readUTF8Char :: Int - Int - Ptr Word8 - IO Char readUTF8Char !x !offset !p | otherwise = then I get this simpifier output: [Arity 3 Str: DmdType LLL] $wreadUTF8Char_r38n = \ (ww_s2Zk :: GHC.Prim.Int#) (ww1_s2Zo :: GHC.Prim.Int#) (ww2_s2Zs :: GHC.Prim.Addr#) - Also, with bang-patterns I've noticed that fromUTF8Ptr transforms into two functions, which contain very similar code: [Arity 5] Foreign.C.UTF8.$s$wfromUTF8Ptr = \ (acc_X151 :: GHC.Base.String) (new_s_a2GQ :: GHC.Prim.State# GHC.Prim.RealWorld) (a87_a2GR :: GHC.Base.Char) (ww_s2ZH :: GHC.Prim.Addr#) (sc_s36j :: GHC.Prim.Int#) - [Arity 4 Str: DmdType LLSL] Foreign.C.UTF8.$wfromUTF8Ptr = \ (ww_s2ZD :: GHC.Prim.Int#) (ww1_s2ZH :: GHC.Prim.Addr#) (w_s2ZJ :: GHC.Base.String) (w1_s2ZK :: GHC.Prim.State# GHC.Prim.RealWorld) - $wfromUTF8Ptr calls itself and $s$wfromUTF8Ptr, but $s$wfromUTF8Ptr only every calls itself, so $wfromUTF8Ptr could be considered the wrapper, and $s$wfromUTF8Ptr the worker, I guess. AFAICT it's a transformation of the various cases in fromUTF8Ptr: | x = 0x7F - fromUTF8Ptr (bytes-1) p (chr x:acc) | x = 0xBF bytes == 0 - error fromUTF8Ptr: ... | x = 0xBF - fromUTF8Ptr (bytes-1) p acc | otherwise - do c - readUTF8Char x bytes p fromUTF8Ptr (bytes-1) p (c:acc) The first case, x = 0x7F, results in a call to $s$wfromUTF8Ptr. The third case, x = 0xBF, results in a call to $wfromUTF8Ptr. The last case, otherwise, results in a call to $s$wfromUTF8Ptr. The calls to $s$wfromUTF8Ptr pass the newly constructed Char and the rest of the String separately, and they are cons'ed in $s$wfromUTF8Ptr. Not sure what benefit this gives... I don't know what transformation causes this, but it was a bit of a surprise. Alistair ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
{- Arity: 4 Strictness: LSSL -} Right. Unboxed args are always given the annotation L. So that function is strict in that pointer arg, but GHC is choosing not to unbox it. I'm not sure why that's the case. I thought maybe it was because I hadn't said -funbox-strict-fields, but it didn't change when I did. Is there some semantic advantage to bang-patterns, or is it just a syntactic convenience? It's syntactic convenience. I've noticed differences between strictness guards and bang-patterns with GHC. This is a problem, I think, because bang-patterns are GHC only, and I wanted to keep the code portable. This strictness guard: readUTF8Char :: Int - Int - Ptr Word8 - IO Char readUTF8Char x offset p | () !x !offset !p !False = undefined | otherwise = ... where x ! y = seq x y results in this simplifier output: [Arity 3 Str: DmdType LSS] $wreadUTF8Char_r38I = \ (ww_s30B :: GHC.Prim.Int#) (w_s30D :: GHC.Base.Int) (w1_s30E :: GHC.Ptr.Ptr GHC.Word.Word8) - However, if I change this to: {-# OPTIONS_GHC -fbang-patterns #-} ... readUTF8Char :: Int - Int - Ptr Word8 - IO Char readUTF8Char !x !offset !p | otherwise = then I get this simpifier output: [Arity 3 Str: DmdType LLL] $wreadUTF8Char_r38n = \ (ww_s2Zk :: GHC.Prim.Int#) (ww1_s2Zo :: GHC.Prim.Int#) (ww2_s2Zs :: GHC.Prim.Addr#) - Also, with bang-patterns I've noticed that fromUTF8Ptr transforms into two functions, which contain very similar code: [Arity 5] Foreign.C.UTF8.$s$wfromUTF8Ptr = \ (acc_X151 :: GHC.Base.String) (new_s_a2GQ :: GHC.Prim.State# GHC.Prim.RealWorld) (a87_a2GR :: GHC.Base.Char) (ww_s2ZH :: GHC.Prim.Addr#) (sc_s36j :: GHC.Prim.Int#) - [Arity 4 Str: DmdType LLSL] Foreign.C.UTF8.$wfromUTF8Ptr = \ (ww_s2ZD :: GHC.Prim.Int#) (ww1_s2ZH :: GHC.Prim.Addr#) (w_s2ZJ :: GHC.Base.String) (w1_s2ZK :: GHC.Prim.State# GHC.Prim.RealWorld) - $wfromUTF8Ptr calls itself and $s$wfromUTF8Ptr, but $s$wfromUTF8Ptr only every calls itself, so $wfromUTF8Ptr could be considered the wrapper, and $s$wfromUTF8Ptr the worker, I guess. AFAICT it's a transformation of the various cases in fromUTF8Ptr: | x = 0x7F - fromUTF8Ptr (bytes-1) p (chr x:acc) | x = 0xBF bytes == 0 - error fromUTF8Ptr: ... | x = 0xBF - fromUTF8Ptr (bytes-1) p acc | otherwise - do c - readUTF8Char x bytes p fromUTF8Ptr (bytes-1) p (c:acc) The first case, x = 0x7F, results in a call to $s$wfromUTF8Ptr. The third case, x = 0xBF, results in a call to $wfromUTF8Ptr. The last case, otherwise, results in a call to $s$wfromUTF8Ptr. The calls to $s$wfromUTF8Ptr pass the newly constructed Char and the rest of the String separately, and they are cons'ed in $s$wfromUTF8Ptr. Not sure what benefit this gives... I don't know what transformation causes this, but it was a bit of a surprise. Alistair ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
On Mon, 2007-06-04 at 09:43 +0100, Alistair Bayley wrote: After some expriments with the simplifier, I think I have a portable version of a direct-from-buffer decoder which seems to perform nearly as well as one written directly against GHC primitive unboxed functions. I'm wondering if there's anything further I can do to improve performance. The portable unboxed version is within about 15% of the unboxed version in terms of time and allocation. Well done. Changes I made: - added strictness annotations in the form of a strictness guards that are always False - unrolled loops (they were always short loops anyway, with a maximum of 3 or 4 iterations) - replaced shiftL with multiplication, because multiplication unboxes, while shiftL doesn't. Some things I've noticed in the simplifier output: - the shiftL call hasn't unboxed or inlined into a call to uncheckedShiftL#, which I would prefer. Would this be possible if we added unchecked versions of the shiftL/R functions to Data.Bits? I think this was discussed before. Check the archives. I don't think there was any resolution however. As I recall there was some attempt to get the ordinary shiftL/R functions to inline and eliminate the bounds checks when the shifts were constants less than the bounds. - Ptrs don't get unboxed. Why is this? Some IO monad thing? Got any more detail? - the chr function tests that its Int argument is less than 1114111, before constructing the Char. It'd be nice to avoid this test. Yeah. In Data.ByteString.Char8 we invent this w2c c2w functions to avoid the test. There should probably be a standard version of this unchecked conversion. - why does this code: | x = 0xF7 = remaining 3 (bAND x 0x07) xs | otherwise = err x turn into this i.e. the = turns into two identical case-branches, using eqword# and ltword#, rather than one case-branch using leword# ? I thought it might be to do with the instance declaration not giving a method for = but letting it default to being defined in terms of and ==, however it's hard to say what's really going on, since the instance declaration is just: data Word = W# Word# deriving (Eq, Ord) I have no idea how ghc derives Eq and Ord instances for a type when it contains unboxed components, which cannot themselves be members of a type class. BTW, what's the difference between the indexXxxxOffAddr# and readXxxxOffAddr# functions in GHC.Prim? AFAICT they are equivalent, except that the read* functions take an extra State# s parameter. Presumably this is to thread the IO monad's RealWorld value through, to create some sort of data dependency between the functions (and so to ensure ordered evaluation?) Right. So it'd only be safe to use the index ones on immutable arrays because there's no way to enforce sequencing with respect to array writes when using the index version. Duncan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
On 04/06/07, Duncan Coutts [EMAIL PROTECTED] wrote: On Mon, 2007-06-04 at 09:43 +0100, Alistair Bayley wrote: After some experiments with the simplifier, ... The portable unboxed version is within about 15% of the unboxed version in terms of time and allocation. Well done. Of course, that might be saying more about the performance of the unboxed version... Yeah. In Data.ByteString.Char8 we invent this w2c c2w functions to avoid the test. There should probably be a standard version of this unchecked conversion. Bulat suggested unsafeChr from GHC.Exts, but I can't see this. I guess I could roll my own; after all it's just (C# (chr# x)). BTW, what's the difference between the indexXxxxOffAddr# and readXxxxOffAddr# functions in GHC.Prim? Right. So it'd only be safe to use the index ones on immutable arrays because there's no way to enforce sequencing with respect to array writes when using the index version. In this case I'm reading from a CString buffer, which is (hopefully) not changing during the function invocation, and never written to by my code. So presumably it'd be pretty safe to use the index- functions. - Ptrs don't get unboxed. Why is this? Some IO monad thing? Got any more detail? OK. readUTF8Char's transformation starts with this: $wreadUTF8Char_r3de = \ (ww_s33v :: GHC.Prim.Int#) (w_s33x :: GHC.Ptr.Ptr GHC.Word.Word8) - If we expect it to unbox, I'd expect the Ptr to become Addr#. Later, this (w_s33x) gets unboxed just before it's used: case w_s33x of wild6_a2JM { GHC.Ptr.Ptr a_a2JO - case GHC.Prim.readWord8OffAddr# @ GHC.Prim.RealWorld a_a2JO 1 s_a2Jf readUTF8Char is called by fromUTF8Ptr, where there's a little Ptr arithmetic. The Ptr argument to fromUTF8Ptr is unboxed, offset is added, and the result is reboxed so that it can be consumed by readUTF8Char. All a bit unnecessary, I think e.g. Foreign.C.UTF8.$wfromUTF8Ptr = ... let { p'_s38N [Just D(T)] :: GHC.Ptr.Ptr GHC.Word.Word8 [Str: DmdType] p'_s38N = __scc {fromUTF8Ptr main:Foreign.C.UTF8 !} case w_s33J of wild11_a2DW { GHC.Ptr.Ptr addr_a2DY - GHC.Ptr.Ptr @ GHC.Word.Word8 (GHC.Prim.plusAddr# addr_a2DY ww_s33H) } } in ... I'd prefer the Ptr arg to fromUTF8Ptr to also be unboxed, so that the primitive plusAddr# can be used directly on it before it's passed to readUTF8Char. Perhaps instead I could push this Ptr arithmetic down to readUTF8Char, and pass it the constant Ptr to the start of the buffer, and the offset into it, rather than a Ptr to the current position. Alistair ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
On Mon, 2007-06-04 at 13:12 +0100, Alistair Bayley wrote: BTW, what's the difference between the indexXxxxOffAddr# and readXxxxOffAddr# functions in GHC.Prim? Right. So it'd only be safe to use the index ones on immutable arrays because there's no way to enforce sequencing with respect to array writes when using the index version. In this case I'm reading from a CString buffer, which is (hopefully) not changing during the function invocation, and never written to by my code. So presumably it'd be pretty safe to use the index- functions. Yes. - Ptrs don't get unboxed. Why is this? Some IO monad thing? Got any more detail? OK. readUTF8Char's transformation starts with this: $wreadUTF8Char_r3de = \ (ww_s33v :: GHC.Prim.Int#) (w_s33x :: GHC.Ptr.Ptr GHC.Word.Word8) - If we expect it to unbox, I'd expect the Ptr to become Addr#. Later, this (w_s33x) gets unboxed just before it's used: case w_s33x of wild6_a2JM { GHC.Ptr.Ptr a_a2JO - case GHC.Prim.readWord8OffAddr# @ GHC.Prim.RealWorld a_a2JO 1 s_a2Jf readUTF8Char is called by fromUTF8Ptr, where there's a little Ptr arithmetic. The Ptr argument to fromUTF8Ptr is unboxed, offset is added, and the result is reboxed so that it can be consumed by readUTF8Char. All a bit unnecessary, I think e.g. Are you sure fromUTF8Ptr is strict in its ptr arg? Try with a ! pattern on that arg. You'll need -fbang-patterns. That translates into the seq False trick that oy're already using elsewhere. Experimenting by adding ! patterns is much quicker and easier however. Once you've got the right set of strictness annotations you can go back to using the more portable, but ugly seq False trick. You can also get ghc to tell you what strictness it inferred for your functions. It's shown in the .hi file. Use ghc --show-iface UTF8.hi. I think the UL syntax for describing the strictness is described in the GHC manual somewhere (or perhaps it's on the GHC wiki). Duncan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Optimising UTF8-CString - String marshaling, plus comments on withCStringLen/peekCStringLen
On Wed, 2007-05-23 at 10:45 +0100, Alistair Bayley wrote: Hello cafe, D'ya fancy an optimisation exercise? In Takusen we currently marshal UTF8-encoded CStrings by first turning the CString into [word8], and then running this through a [Word8] - String UTF8 decoder. We thought it would be more space-efficient (and hopefully faster) to marshal directly from the CString buffer, rather than use an intermediate list. We assumed it would be most space-efficient by working backwards from the end of the CString buffer, like the peekArray/peekArray0 functions in Foreign.Marshal.Array. So I implemented it and benchmarked against the original UTF8 marshaling function, which simply converts CString - [Word] - String. And to my surprise, the [Word8] - String solution seems to be faster, and uses less memory, than the function which creates the String directly from the CString buffer. Now I appreciate that GHC's optimisations are quite effective (presumably deforestation should take credit here), but I thought I'd ask the haskell-cafe optimiser if we could do better with the direct-from-buffer function. I'm loath to start eyeballing GHC Core, but if needs must... If you want to look at some existing optimised UTF8 encoding/decoding code then take a look at the code used in GHC: http://darcs.haskell.org/ghc/compiler/utils/Encoding.hs utf8DecodeString and utf8EncodeString. They're both fairly low level, dealing with pointers to existing buffers. In particular for the utf8EncodeString you need to have allocated a buffer of the right size already. You can use utf8EncodedLength to find that if you don't already know it. Also, utf8DecodeString assumes that the end of the string has sentinel bytes so it might not be directly suitable for your example. Duncan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe