Re: [Haskell-cafe] Re: Detecting system endianness

2008-12-23 Thread wren ng thornton

Maurício wrote:

>>> But why would you want that? I understand the only
>>> situation when talking about number of bytes
>>> makes sense is when you are using Foreign and
>>> Ptr. (...)
>>
>> Because I'm using both Ptr and Foreign? ;)
>>
>> See my recent announcement for bytestring-trie. One of the
>> optimizations I'm working on is to read off a full natural word at a
>> time, (...)
>
> I see, you mean the size of a machine word, not of Data.Word.


AFAIK, Data.Word.Word is defined to be the same size as Prelude.Int 
(which it isn't on GHC 6.8.2 on Intel OS X: 32 bits vs. 31 bits), and Int is 
defined to be at least 31 bits but can be more. My interpretation of this 
is that Int and Word will generally be implemented by the architecture's 
natural word size in order to optimize performance, much like C's int 
and unsigned int but with a better definition of the allowed sizes. This 
seems to be supported by the existence of the fixed-size variants Word8, 
Word16, Word32...


So yeah, I do mean the machine word, but I think Word is intended to 
serve as a proxy for it. Maybe I'm wrong, but provided that Word contains 
(or can be persuaded to contain) a whole number of Word8s, and that 
operations on Word are cheaper than the analogous sequence of operations 
on the Word8 representation, that's good enough for my needs.
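
For what it's worth, Foreign.Storable.sizeOf gives a quick way to check 
that assumption on whatever platform you happen to be on. A minimal 
sketch; the name bytesPerWord is purely illustrative:

import Data.Word (Word)
import Foreign.Storable (sizeOf)

-- How many bytes one Word occupies on this platform
-- (typically 4 on 32-bit and 8 on 64-bit systems).
bytesPerWord :: Int
bytesPerWord = sizeOf (undefined :: Word)

main :: IO ()
main = putStrLn ("Word is " ++ show bytesPerWord ++ " bytes wide")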


--
Live well,
~wren


Re: [Haskell-cafe] Re: Detecting system endianness

2008-12-23 Thread John Meacham
On Tue, Dec 23, 2008 at 07:44:14PM -0500, wren ng thornton wrote:
> AFAIK, Data.Word.Word is defined to be the same size as Prelude.Int
> (which it isn't on GHC 6.8.2 on Intel OS X: 32 bits vs. 31 bits), and Int is
> defined to be at least 31 bits but can be more. My interpretation of this
> is that Int and Word will generally be implemented by the architecture's
> natural word size in order to optimize performance, much like C's int
> and unsigned int but with a better definition of the allowed sizes. This
> seems to be supported by the existence of the fixed-size variants Word8,
> Word16, Word32...

Of course, "natural word size" can mean the natural pointer size or the
natural int size, which are different on many architectures. So you want
to be careful about which one you mean.

> So yeah, I do mean the machine word, but I think Word is intended to
> serve as a proxy for it. Maybe I'm wrong, but provided that Word contains
> (or can be persuaded to contain) a whole number of Word8s, and that
> operations on Word are cheaper than the analogous sequence of operations
> on the Word8 representation, that's good enough for my needs.

If you want to find out the 'natural' sizes, then look at the 'CInt',
'Ptr', and 'FunPtr' types, which follow the C 'int', 'void *', and 'void
(*fn)()' types, so they will conform to the ABI of the underlying
architecture/operating system.

If you just want a type guaranteed to be able to hold a pointer or an
integer, use 'IntPtr' or 'WordPtr', which are provided for exactly that
case.
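
Something like the following sketch prints all of those side by side, so
you can see which ones agree on a given box (nothing here beyond
Foreign.C.Types, Foreign.Ptr, and Foreign.Storable):

import Foreign.C.Types (CInt)
import Foreign.Ptr (Ptr, FunPtr, IntPtr, WordPtr)
import Foreign.Storable (sizeOf)

main :: IO ()
main = do
  -- C int: often 4 bytes even on 64-bit (LP64) platforms
  putStrLn ("CInt:    " ++ show (sizeOf (undefined :: CInt)))
  -- data and function pointers: the platform's pointer width
  putStrLn ("Ptr ():  " ++ show (sizeOf (undefined :: Ptr ())))
  putStrLn ("FunPtr:  " ++ show (sizeOf (undefined :: FunPtr ())))
  -- integral types guaranteed to be wide enough to hold a pointer
  putStrLn ("IntPtr:  " ++ show (sizeOf (undefined :: IntPtr)))
  putStrLn ("WordPtr: " ++ show (sizeOf (undefined :: WordPtr)))

On a typical x86-64/LP64 system you'd expect CInt to come out as 4 and
the rest as 8; on 32-bit x86, all of them come out as 4.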

John

-- 
John Meacham - ⑆repetae.net⑆john⑈


Re: [Haskell-cafe] Re: Detecting system endianness

2008-12-19 Thread wren ng thornton

Maurício wrote:

> But why would you want that? I understand the only
> situation when talking about number of bytes
> makes sense is when you are using Foreign and
> Ptr. Besides that, you can only guess the amount
> of memory you need to deal with your data (taking
> laziness, GC etc. into account).


Because I'm using both Ptr and Foreign? ;)

See my recent announcement for bytestring-trie. One of the optimizations 
I'm working on is to read off a full natural word at a time, instead of 
just one byte. To do this properly I need to detect the word size so 
that I don't accidentally read garbage off the end of the ByteString 
when there's less than a natural word left.
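
In code that might look something like the sketch below. peekWordAt is 
just a name I'm making up here, not anything in bytestring-trie, and it 
punts on alignment entirely (some architectures would need an unaligned 
read at that spot):

import qualified Data.ByteString as BS
import qualified Data.ByteString.Unsafe as BSU
import Data.Word (Word)
import Foreign.Storable (peekByteOff, sizeOf)

-- Peek a whole Word at byte offset i, but only when a full word's worth
-- of bytes remains; otherwise signal that the caller should fall back to
-- byte-at-a-time reads.
peekWordAt :: BS.ByteString -> Int -> IO (Maybe Word)
peekWordAt bs i
  | i >= 0 && i + sizeOf (undefined :: Word) <= BS.length bs =
      BSU.unsafeUseAsCString bs $ \cstr -> do
        w <- peekByteOff cstr i
        return (Just w)
  | otherwise = return Nothing

The guard is the whole point: once fewer than sizeOf (undefined :: Word) 
bytes remain, you drop back to reading Word8s one at a time.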


Detecting endianness is similar because it determines how to interpret 
that word as if it were an array of bytes, which is needed to get the 
correct behavior when interpreting the word as a bit-vector for trieing. 
That is, if you only read the first two bytes on a big-endian machine, 
then you're skipping the 4/6/? bytes which are actually at the beginning 
of the bytestring.
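
The runtime check itself is pretty small. A sketch (isLittleEndian is 
just an illustrative name):

import Data.Word (Word32, Word8)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peek, poke)
import System.IO.Unsafe (unsafePerformIO)

-- True when the least significant byte of a Word32 sits at the lowest
-- address, i.e. on little-endian hardware such as x86.
isLittleEndian :: Bool
isLittleEndian = unsafePerformIO $
  alloca $ \p -> do
    poke p (0x01020304 :: Word32)
    b <- peek (castPtr p :: Ptr Word8)
    return (b == 0x04)
{-# NOINLINE isLittleEndian #-}

On x86 this comes out True; on a big-endian machine such as PowerPC it 
comes out False.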


I'm not sure how important the physical endianness of bytes within a word 
is. For IntMap a common case is to get large contiguous chunks of keys, 
so logical big-endian trieing improves performance over logical 
little-endian. I'm not sure how common large contiguous chunks of 
bytestring keys are, though. Reading a word and then changing the 
physical endianness of its bytes seems expensive.


--
Live well,
~wren