Re: [Haskell-cafe] converting prefixes of CString - String

2011-04-26 Thread Malcolm Wallace

On 25 Apr 2011, at 08:16, Eric Stansifer wrote:

> Let 'c2h' convert CStrings to Haskell Strings, and 'h2c' convert
> Haskell Strings to CStrings.  (If I understand correctly, c2h . h2c
> === id, but h2c . c2h is not the identity on all inputs;

That is correct.  CStrings are 8-bits, and Haskell Strings are 32-bits.  
Converting from Haskell to C loses information, unless you use a multi-byte 
encoding on the C side (for instance, UTF8).
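As a sketch of the multi-byte option, assuming a GHC whose base library provides GHC.Foreign (present in more recent versions of base), a string can be marshalled to C bytes and back with an explicit UTF-8 encoding. The helper names roundTripUtf8 and utf8Length are hypothetical, and the test string is arbitrary.

```haskell
import qualified GHC.Foreign as GHC
import System.IO (utf8)

-- Hypothetical helper: h2c followed by c2h, with UTF-8 on the C side.
-- withCStringLen marshals to temporary bytes; peekCStringLen decodes them.
roundTripUtf8 :: String -> IO String
roundTripUtf8 s = GHC.withCStringLen utf8 s (GHC.peekCStringLen utf8)

-- Hypothetical helper: number of C bytes the UTF-8 encoding of s occupies.
utf8Length :: String -> IO Int
utf8Length s = GHC.withCStringLen utf8 s (return . snd)

main :: IO ()
main = do
  let s = "caf\233 \955x"      -- 7 characters, two of them non-ASCII
  s' <- roundTripUtf8 s
  print (s' == s)              -- True: nothing is lost
  utf8Length s >>= print       -- 9: the two non-ASCII characters take 2 bytes each
```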

> or perhaps c2h is not defined for all CStrings.

Rather, h2c is not necessarily well-defined for all Haskell Strings.  In 
particular, the marshalling functions in Foreign.C.String simply truncate any 
character that does not fit in one byte to its lowest byte.
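The truncation can be seen with the per-character casts castCharToCChar and castCCharToChar from Foreign.C.String, which narrow a Char to its lowest byte and widen a byte back as Latin-1. This is a minimal sketch; the helper name narrow is hypothetical. In more recent versions of base, newCString and peekCString are documented to use the locale's encoding instead, while the CAString variants keep this 8-bit behaviour.

```haskell
import Data.Char (ord)
import Foreign.C.String (castCCharToChar, castCharToCChar)

-- Hypothetical helper: narrow a Char to a C char and widen it back.
-- castCharToCChar keeps only the lowest 8 bits of the code point.
narrow :: Char -> Char
narrow = castCCharToChar . castCharToCChar

main :: IO ()
main = do
  -- U+03BB and U+01BB share the low byte 0xBB, so both become '\187'.
  print (map narrow "A\955\443")
  -- Every character keeps exactly its code point modulo 256.
  print (all (\c -> ord (narrow c) == ord c `mod` 256) ['\0' .. '\1000'])
```

Distinct Haskell characters collapse onto the same C char, which is why information is lost in the Haskell-to-C direction.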

I suggest you look at the utf8-string package, for instance 
Codec.Binary.UTF8.String.{encode,decode}, which convert Haskell strings to/from 
a list of Word8, which can then be transferred via the FFI to wherever you like.
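For illustration only, here is a hand-rolled stand-in for the encode/decode pair mentioned above. It does not use the utf8-string package itself, and the names encodeChar, encodeUtf8 and decodeUtf8 are hypothetical. Each Char becomes one to four Word8 bytes; decoding assumes well-formed input.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one Char as one to four UTF-8 bytes.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    top k = fromIntegral (n `shiftR` k)                       -- lead-byte payload
    cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)  -- 10xxxxxx byte

-- Hypothetical stand-in for utf8-string's encode.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar

-- Hypothetical stand-in for utf8-string's decode; assumes well-formed UTF-8.
decodeUtf8 :: [Word8] -> String
decodeUtf8 [] = []
decodeUtf8 (b : bs)
  | b < 0x80  = chr (fromIntegral b) : decodeUtf8 bs
  | b < 0xE0  = multi 1 (b .&. 0x1F)
  | b < 0xF0  = multi 2 (b .&. 0x0F)
  | otherwise = multi 3 (b .&. 0x07)
  where
    -- Combine the lead byte's payload with k continuation bytes.
    multi k lead =
      let (conts, rest) = splitAt k bs
          n = foldl (\acc x -> acc `shiftL` 6 .|. fromIntegral (x .&. 0x3F))
                    (fromIntegral lead) conts
      in chr n : decodeUtf8 rest

main :: IO ()
main = do
  let s = "caf\233 \955x"
  print (encodeUtf8 s)                   -- [99,97,102,195,169,32,206,187,120]
  print (decodeUtf8 (encodeUtf8 s) == s) -- True
```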

Regards,
Malcolm

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] converting prefixes of CString - String

2011-04-26 Thread Eric Stansifer
>> Let 'c2h' convert CStrings to Haskell Strings, and 'h2c' convert
>> Haskell Strings to CStrings.  (If I understand correctly, c2h . h2c
>> === id, but h2c . c2h is not the identity on all inputs;
>
> That is correct.  CStrings are 8-bits, and Haskell Strings are 32-bits.
> Converting from Haskell to C loses information, unless you use a multi-byte
> encoding on the C side (for instance, UTF8).

So actually I am incorrect, and h2c . c2h is the identity but c2h . h2c is not?

> I suggest you look at the utf8-string package, for instance
> Codec.Binary.UTF8.String.{encode,decode}, which convert Haskell strings
> to/from a list of Word8, which can then be transferred via the FFI to
> wherever you like.

This package was very helpful;  I looked at the source to see how the
utf8 encoding was done.  It looks as if the functionality I want is
technically feasible but not implemented yet;  it shouldn't be too
much trouble to implement it myself, by imitating the existing
'decode' function but changing its behavior when it runs out of input
in the middle of a utf8-character.  Also key is that the property
s1 ++ s2 == decode (encode s1) ++ decode (encode s2)
holds.
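A minimal sketch of the decoder described above, assuming the input is a prefix of valid UTF-8. decodePrefix (a hypothetical name) decodes every complete character and, when the bytes run out in the middle of a character, returns that partial tail instead of mis-decoding it. The encoder is a hand-rolled stand-in for utf8-string's encode, not the package itself.

```haskell
import Data.Bits ((.&.), (.|.), shiftL, shiftR)
import Data.Char (chr, ord)
import Data.List (isPrefixOf)
import Data.Word (Word8)

-- Stand-in UTF-8 encoder (one to four bytes per Char).
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap enc
  where
    enc c
      | n < 0x80    = [fromIntegral n]
      | n < 0x800   = [0xC0 .|. top 6, cont 0]
      | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
      | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
      where
        n = ord c
        top k = fromIntegral (n `shiftR` k)
        cont k = 0x80 .|. (fromIntegral (n `shiftR` k) .&. 0x3F)

-- Hypothetical decoder: decode all complete characters and hand back the
-- bytes of a trailing, incomplete character (assumes a valid UTF-8 prefix).
decodePrefix :: [Word8] -> (String, [Word8])
decodePrefix [] = ([], [])
decodePrefix bytes@(b : bs)
  | length conts < k = ([], bytes)          -- ran out mid-character
  | otherwise        = (chr n : cs, leftover)
  where
    -- Number of continuation bytes announced by the lead byte b.
    k | b < 0x80  = 0
      | b < 0xE0  = 1
      | b < 0xF0  = 2
      | otherwise = 3
    (conts, rest) = splitAt k bs
    lead = fromIntegral (b .&. ([0x7F, 0x1F, 0x0F, 0x07] !! k))
    n = foldl (\acc x -> acc `shiftL` 6 .|. fromIntegral (x .&. 0x3F)) lead conts
    (cs, leftover) = decodePrefix rest

main :: IO ()
main = do
  let s = "caf\233 \955x"
      bytes = encodeUtf8 s
  -- Cutting after the lead byte of the e-acute keeps that byte as leftover.
  print (decodePrefix (take 4 bytes))
  -- Cutting the byte stream anywhere still yields a prefix of s.
  print (all (\i -> fst (decodePrefix (take i bytes)) `isPrefixOf` s)
             [0 .. length bytes])
  -- The property above, with s split into s1 = "caf" and s2 = "\233 \955x".
  print (fst (decodePrefix (encodeUtf8 "caf"))
           ++ fst (decodePrefix (encodeUtf8 "\233 \955x")) == s)
```

Because the result tuple is built lazily, the first component can also be consumed from an infinite byte list.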

Thanks,
Eric



Re: [Haskell-cafe] converting prefixes of CString - String

2011-04-26 Thread Malcolm Wallace

On 26 Apr 2011, at 13:31, Eric Stansifer wrote:

>>> Let 'c2h' convert CStrings to Haskell Strings, and 'h2c' convert
>>> Haskell Strings to CStrings.  (If I understand correctly, c2h . h2c
>>> === id, but h2c . c2h is not the identity on all inputs;
>>
>> That is correct.  CStrings are 8-bits, and Haskell Strings are 32-bits.
>> Converting from Haskell to C loses information, unless you use a multi-byte
>> encoding on the C side (for instance, UTF8).
>
> So actually I am incorrect, and h2c . c2h is the identity but c2h . h2c is
> not?

Ah, my bad.  In reading the composition from right to left, I inadvertently 
read h2c and c2h from right to left as well!  So, starting from C, converting 
to Haskell, and back to C is the identity, yes.  Starting from Haskell, no.
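To make the direction concrete, here is a minimal sketch that models c2h and h2c character by character with Foreign.C.String's Latin-1 casts. Modelling them this way is an assumption standing in for the truncating marshalling described earlier in the thread. All 256 C chars survive C to Haskell to C, while Haskell characters above '\255' do not survive Haskell to C to Haskell.

```haskell
import Foreign.C.String (castCCharToChar, castCharToCChar)
import Foreign.C.Types (CChar)

-- Assumed per-character models of c2h and h2c (Latin-1 casts).
c2h :: CChar -> Char
c2h = castCCharToChar

h2c :: Char -> CChar
h2c = castCharToCChar

main :: IO ()
main = do
  -- Starting from C: every one of the 256 C chars round-trips.
  print (all (\c -> h2c (c2h c) == c) [minBound .. maxBound])
  -- Starting from Haskell: only characters above '\255' fail.
  print (filter (\ch -> c2h (h2c ch) /= ch) "a\233\955")
```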

Regards,
Malcolm



[Haskell-cafe] converting prefixes of CString - String

2011-04-25 Thread Eric Stansifer
I have been reading Foreign.C.String but it does not seem to provide
the functionality I was looking for.

Let 'c2h' convert CStrings to Haskell Strings, and 'h2c' convert
Haskell Strings to CStrings.  (If I understand correctly, c2h . h2c
=== id, but h2c . c2h is not the identity on all inputs;  or perhaps
c2h is not defined for all CStrings.  Probably this is all locale
dependent.)

I have an infinite Haskell String transferred byte-wise over a
network;  I would like to convert some prefix of the bytes received
into a prefix of the String I started with.  However, if I understand
correctly, if s is a Haskell String it is not necessarily true that
c2h (take n (h2c s)) is a prefix of s for all n.  So I have two
questions:

Given a CString of the form cs = take n (h2c s), how do I know
whether c2h cs is a prefix of s or not?  Is there a way to recognize
whether a CString is valid as opposed to truncated in the middle of
a code point, or is this impossible?  Better yet, given a CString cs
= take n (h2c s), is there a way to find the maximal prefix cs' of cs
such that c2h cs' is a prefix of s?
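If the bytes are UTF-8 (an assumption; the question leaves the encoding open), a truncated final character can be recognised: every lead byte announces how many continuation bytes (10xxxxxx) follow, so it is enough to look at most four bytes back from the end. Here is a minimal sketch, assuming the input is a prefix of valid UTF-8. completePrefixLength is a hypothetical name; under these assumptions c2h cs is a prefix of s exactly when it returns the full length of cs, and otherwise it gives the length of the maximal usable prefix cs'.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Continuation bytes have the bit pattern 10xxxxxx.
isCont :: Word8 -> Bool
isCont b = b .&. 0xC0 == 0x80

-- Continuation bytes announced by a lead byte.
contCount :: Word8 -> Int
contCount b
  | b < 0x80  = 0
  | b < 0xE0  = 1
  | b < 0xF0  = 2
  | otherwise = 3

-- Hypothetical helper: length of the longest prefix of the input that ends
-- on a character boundary (assumes the input is a prefix of valid UTF-8).
completePrefixLength :: [Word8] -> Int
completePrefixLength bytes =
  case rest of
    lead : _ | contCount lead > t -> len - t - 1  -- drop the partial character
    _                             -> len          -- ends on a boundary
  where
    len = length bytes
    -- Walk back over at most three continuation bytes to the lead byte.
    (trailing, rest) = span isCont (take 4 (reverse bytes))
    t = length trailing

main :: IO ()
main = do
  let bytes = [99, 97, 102, 0xC3, 0xA9] :: [Word8]  -- UTF-8 for "caf\233"
  mapM_ (\i -> print (i, completePrefixLength (take i bytes))) [0 .. length bytes]
```

For the cut after the lead byte 0xC3 this prints (4,3): the partial character is excluded, and only the three complete bytes are handed to the decoder.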

If s == s1 ++ s2, is it necessarily true that s == (c2h (h2c s1)) ++
(c2h (h2c s2))?  If so, then I can perform my conversion a bit at a
time, otherwise I'd need to start from the beginning of the cstring
each time I receive additional data.
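With UTF-8 (again an assumption about the encoding), each Char is encoded independently, so encode (s1 ++ s2) == encode s1 ++ encode s2, and converting a bit at a time works as long as an incomplete trailing character is carried over into the next chunk. Here is a sketch of such an incremental loop. The names decodePrefix and decodeChunks are hypothetical and not taken from any library.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical decoder: decode the complete UTF-8 characters and return the
-- bytes of an incomplete trailing character (assumes a valid UTF-8 prefix).
decodePrefix :: [Word8] -> (String, [Word8])
decodePrefix [] = ([], [])
decodePrefix bytes@(b : bs)
  | length conts < k = ([], bytes)
  | otherwise        = (chr n : cs, leftover)
  where
    k | b < 0x80 = 0 | b < 0xE0 = 1 | b < 0xF0 = 2 | otherwise = 3
    (conts, rest) = splitAt k bs
    lead = fromIntegral (b .&. ([0x7F, 0x1F, 0x0F, 0x07] !! k))
    n = foldl (\acc x -> acc `shiftL` 6 .|. fromIntegral (x .&. 0x3F)) lead conts
    (cs, leftover) = decodePrefix rest

-- Hypothetical streaming decoder: decode byte chunks (e.g. network reads),
-- carrying a partial character from one chunk into the next. It is lazy,
-- so it also works on an infinite list of chunks.
decodeChunks :: [[Word8]] -> String
decodeChunks = go []
  where
    go carry [] = fst (decodePrefix carry)   -- an incomplete tail is dropped
    go carry (chunk : chunks) =
      let (done, carry') = decodePrefix (carry ++ chunk)
      in done ++ go carry' chunks

main :: IO ()
main = do
  let s = "caf\233 \955x"
      bytes = [99, 97, 102, 195, 169, 32, 206, 187, 120] :: [Word8]  -- UTF-8 of s
  -- Splitting the bytes at any point gives the same decoded string.
  print (all (\i -> let (a, b) = splitAt i bytes in decodeChunks [a, b] == s)
             [0 .. length bytes])
  -- Chunk boundaries fall inside both multi-byte characters here.
  print (decodeChunks [[99, 97, 102, 195], [169, 32, 206], [187, 120]])
```

Only the carried bytes (at most three) are re-examined for each chunk, so there is no need to restart from the beginning of the received data.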

In practice, I think my solution will come down to restricting my
program to only using the lower 128 characters, but I'd like to know
how to handle this problem in full generality.
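Restricting to the lower 128 characters does remove the problem: with the Latin-1 casts (and also in UTF-8) each ASCII character is exactly one byte, so every byte prefix converts back to a prefix of the original. Here is a small check; prefixesRoundTrip is a hypothetical helper built on Foreign.C.String's castCharToCChar and castCCharToChar.

```haskell
import Data.Char (isAscii)
import Data.List (isPrefixOf)
import Foreign.C.String (castCCharToChar, castCharToCChar)

-- Hypothetical check: does every byte prefix of h2c s convert back (c2h)
-- to a prefix of s? Uses the one-byte-per-Char Latin-1 casts.
prefixesRoundTrip :: String -> Bool
prefixesRoundTrip s =
  and [ map castCCharToChar (take i cs) `isPrefixOf` s | i <- [0 .. length cs] ]
  where
    cs = map castCharToCChar s

main :: IO ()
main = do
  print (all isAscii "hello", prefixesRoundTrip "hello")  -- (True,True)
  print (prefixesRoundTrip "h\955llo")                     -- False: the lambda is narrowed
```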

Thanks,
Eric
