RE: Marshalling Haskell String <-> UTF-8
On 02 September 2004 11:36, Ross Paterson wrote:
> On Wed, Sep 01, 2004 at 04:39:30PM -0700, John Meacham wrote:
>> I should mention I have a new version of the CWString library in
>> development that conforms to the new FFI spec and works on all posixy
>> systems, not just those that have unicode wchar_t's like my first
>> posting.
>>
>> It is not quite ready for release, but if there is a strong need I
>> can package it up nicely.
>
> The most useful packaging would be as a patch against the HEAD version
> of fptools/libraries/base/Foreign/C/String.hs
>
> There may also be a difficulty in that you may require hsc2hs but I
> think Simon wants to keep it out of low-level modules (?).

Yes, using hsc2hs in libraries/base is problematic for bootstrapping reasons - it adds inconvenient extra steps to the process of bootstrapping GHC on a new platform, so please avoid it. Outside of the libraries required for building GHC, hsc2hs is fair game.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
Re: Marshalling Haskell String <-> UTF-8
On Wed, Sep 01, 2004 at 04:39:30PM -0700, John Meacham wrote:
> I should mention I have a new version of the CWString library in
> development that conforms to the new FFI spec and works on all posixy
> systems, not just those that have unicode wchar_t's like my first
> posting.
>
> It is not quite ready for release, but if there is a strong need I can
> package it up nicely.

The most useful packaging would be as a patch against the HEAD version of fptools/libraries/base/Foreign/C/String.hs

There may also be a difficulty in that you may require hsc2hs but I think Simon wants to keep it out of low-level modules (?).
Re: Marshalling Haskell String <-> UTF-8
On Wed, Sep 01, 2004 at 11:13:23AM +0100, Ross Paterson wrote:
> On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
> > I want to call a foreign C function that takes a UTF-8 encoded string as one
> > of its arguments (and there's also a version of the function that receives
> > UTF-16). Can someone point me to documentation or examples of how this would
> > be done? AFAICT (reading the FFI spec) marshalling a String to a CString is
> > locale-dependent, whereas I know that I want UTF-8/16.
>
> The locale-dependent marshalling of CString described by the FFI spec
> isn't yet implemented in the library. There is some code by John Meacham
> including UTF-8 conversion at

I should mention I have a new version of the CWString library in development that conforms to the new FFI spec and works on all posixy systems, not just those that have unicode wchar_t's like my first posting.

It is not quite ready for release, but if there is a strong need I can package it up nicely.

John

--
John Meacham - ⑆repetae.net⑆john⑈
RE: Marshalling Haskell String <-> UTF-8
> From: George Russell [mailto:[EMAIL PROTECTED]
>
> http://www.haskell.org//pipermail/glasgow-haskell-users/2004-April/006564.html

Thanks George, this looks useful. There are some things I want to clarify...

module UTF8(
   toUTF8,
      -- :: String -> String
      -- Converts a String (whose characters must all have codes <2^31) into
      -- its UTF8 representation.
   fromUTF8WE,
      -- :: Monad m => String -> m String
      -- Converts a UTF8 representation of a String back into the String,
      -- catching all possible format errors.

Does toUTF8 return a String whose Chars are all code-points < 256, which, when converted to bytes, will represent a UTF-8 string?

Likewise, does fromUTF8WE expect a String whose Chars are all code-points < 256 i.e. they are the result of saying "chr n" for each byte in the UTF-8 stream?

> From: Simon Marlow [mailto:[EMAIL PROTECTED]
>
> In any case, none of this allows you to specify a UTF-8 conversion.

Are there plans to add UTF-8 (and UTF-16) conversion functions to the libraries? I imagine they would be useful...

> Your best bet is to marshal it yourself. We're a bit behind in this
> area: 6.2.x doesn't have CAString and CWString, and CString is just
> char*. The HEAD has CAString and CWString, and will hopefully follow
> the FFI spec by the time we release 6.4 (we still have to do the
> locale encoding/decoding between CString and String, IIRC).

Again, I want to clarify some things...
- will any encoding/decoding be performed by the peekCString/withCString/newCString(Len) functions? i.e. if I want to avoid encoding/decoding (because I know my string is already in UTF-8) then I simply have to avoid using these functions?
- can I still declare the foreign functions with CString types without worrying that encoding/decoding might be attempted?
- when the CAString functions are available then I should use them, but for now I will have to write something that uses castCharToCChar/castCCharToChar + peekArray0/pokeArray0.

Thanks,
Alistair.
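The last point above (castCharToCChar/castCCharToChar plus the array marshalling functions) could be sketched roughly as follows. This is only an illustration, assuming GHC's base library; the helper names withByteString0 and peekByteString0 are hypothetical and do not come from the thread. The String is assumed to hold one byte per Char (all code points < 256), for example the output of a UTF-8 encoder.

```haskell
import Foreign.C.String (CString, castCharToCChar, castCCharToChar)
import Foreign.Marshal.Array (peekArray0, withArray0)

-- | Hypothetical helper: hand a String of byte-valued Chars to a C
-- function as a NUL-terminated char*, with no locale conversion.
-- castCharToCChar keeps only the low 8 bits of each Char.
withByteString0 :: String -> (CString -> IO a) -> IO a
withByteString0 s = withArray0 (castCharToCChar '\0') (map castCharToCChar s)

-- | Hypothetical helper: read a NUL-terminated char* back as a String
-- of byte-valued Chars, again without any decoding.
peekByteString0 :: CString -> IO String
peekByteString0 p =
  fmap (map castCCharToChar) (peekArray0 (castCharToCChar '\0') p)
```

Since the buffer is NUL-terminated, a byte of value 0 inside the input would cut the string short on the C side; a length-carrying variant would be needed for such data.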
Re: Marshalling Haskell String <-> UTF-8
I have implemented code to do this which I think is better than John Meacham's, because it
(a) handles all UTF8 sequences (up to 6 bytes);
(b) checks for errors as UTF8 decoders are supposed to do;
(c) lets you determine if there is an error without having to seq the entire list.

Here is a link:

http://www.haskell.org//pipermail/glasgow-haskell-users/2004-April/006564.html
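For readers who want a feel for the kind of checks point (b) refers to, here is a minimal decoder sketch. It is not the code behind the link above: the name fromUTF8 is hypothetical, it works on [Word8] rather than String, it accepts only sequences of up to 4 bytes (the range allowed by RFC 3629, rather than the older 6-byte form), and because it returns Maybe it has to inspect the whole input before reporting success, so it does not have property (c).

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Hypothetical strict decoder: Nothing on any malformed input.
fromUTF8 :: [Word8] -> Maybe String
fromUTF8 [] = Just []
fromUTF8 (b : bs)
  | b < 0x80  = fmap (chr (fromIntegral b) :) (fromUTF8 bs)
  | b < 0xC0  = Nothing                        -- stray continuation byte
  | b < 0xE0  = multi 1 0x80    (b .&. 0x1F)   -- 2-byte sequence
  | b < 0xF0  = multi 2 0x800   (b .&. 0x0F)   -- 3-byte sequence
  | b < 0xF8  = multi 3 0x10000 (b .&. 0x07)   -- 4-byte sequence
  | otherwise = Nothing                        -- 5/6-byte forms rejected
  where
    -- n continuation bytes expected; lo is the smallest code point
    -- that legitimately needs this sequence length (overlong check).
    multi :: Int -> Int -> Word8 -> Maybe String
    multi n lo top =
      let (cs, rest) = splitAt n bs
          complete   = length cs == n && all isCont cs
          cp         = foldl step (fromIntegral top) cs
          surrogate  = cp >= 0xD800 && cp <= 0xDFFF
      in if complete && cp >= lo && cp <= 0x10FFFF && not surrogate
           then fmap (chr cp :) (fromUTF8 rest)
           else Nothing

    step :: Int -> Word8 -> Int
    step acc c = (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)

    isCont :: Word8 -> Bool
    isCont c = c .&. 0xC0 == 0x80
```

The checks cover truncated sequences, stray continuation bytes, overlong encodings, surrogates and values above U+10FFFF.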
Re: Marshalling Haskell String <-> UTF-8
On Wed, Sep 01, 2004 at 11:13:23AM +0100, Ross Paterson wrote:
> On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
> > I want to call a foreign C function that takes a UTF-8 encoded string
> > as one of its arguments (and there's also a version of the function
> > that receives UTF-16). Can someone point me to documentation or
> > examples of how this would be done? AFAICT (reading the FFI spec)
> > marshalling a String to a CString is locale-dependent, whereas I know
> > that I want UTF-8/16.
>
> The locale-dependent marshalling of CString described by the FFI spec
> isn't yet implemented in the library. There is some code by John Meacham
> including UTF-8 conversion at
>
> http://www.haskell.org/pipermail/ffi/2003-August/001355.html

You could also look at the darcs source code, as darcs uses UTF8 to store file names.

--
David Roundy
http://www.abridgegame.org/darcs
Re: Marshalling Haskell String <-> UTF-8
On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
> I want to call a foreign C function that takes a UTF-8 encoded string as one
> of its arguments (and there's also a version of the function that receives
> UTF-16). Can someone point me to documentation or examples of how this would
> be done? AFAICT (reading the FFI spec) marshalling a String to a CString is
> locale-dependent, whereas I know that I want UTF-8/16.

The locale-dependent marshalling of CString described by the FFI spec isn't yet implemented in the library. There is some code by John Meacham including UTF-8 conversion at

http://www.haskell.org/pipermail/ffi/2003-August/001355.html

> Can I use the UTF-16 functions directly with CWStrings? (I'm not sure
> exactly what wchar_t is, as it's apparently dependent on the locale at
> compile-time, and could be 8, 16, or 32 bits).

Under Windows, CWString uses the UTF-16 encoding. On systems that define __STDC_ISO_10646__ (e.g. glibc as used under Linux) it uses UTF-32. (This is in the CVS version that will become 6.4, not the current release.)
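Because the width of wchar_t differs between platforms, a C function that is specified to take UTF-16 can also be fed explicit 16-bit code units instead of going through CWString. The following is a rough sketch under that assumption; the names wcharBits, toUTF16 and withUTF16 are hypothetical, and a base library that provides CWchar is assumed.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)
import Foreign.C.Types (CWchar)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

-- | Width of the platform's wchar_t in bits (hypothetical helper).
wcharBits :: Int
wcharBits = 8 * sizeOf (undefined :: CWchar)

-- | Encode a String as UTF-16 code units. Code points above U+FFFF
-- become a surrogate pair; lone surrogate Chars are passed through
-- unchanged, which a stricter encoder would reject.
toUTF16 :: String -> [Word16]
toUTF16 = concatMap enc
  where
    enc :: Char -> [Word16]
    enc c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]
      where
        n = ord c
        m = n - 0x10000

-- | Pass a 0-terminated UTF-16 buffer to an action, independent of
-- how wide wchar_t happens to be.
withUTF16 :: String -> (Ptr Word16 -> IO a) -> IO a
withUTF16 s = withArray0 0 (toUTF16 s)
```

On a platform where wcharBits is 16 the buffer layout matches what CWString is described as using under Windows; where wchar_t is 32 bits, the explicit Word16 buffer is the only one of the two that is UTF-16.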
RE: Marshalling Haskell String <-> UTF-8
On 01 September 2004 10:16, Bayley, Alistair wrote:
> I want to call a foreign C function that takes a UTF-8 encoded string
> as one of its arguments (and there's also a version of the function
> that receives UTF-16). Can someone point me to documentation or
> examples of how this would be done? AFAICT (reading the FFI spec)
> marshalling a String to a CString is locale-dependent, whereas I know
> that I want UTF-8/16.
>
> Also, if a C function returns a UTF-8 (or UTF-16) encoded string, how
> do I marshall this reliably into a Haskell String?
>
> Can I use the UTF-16 functions directly with CWStrings? (I'm not sure
> exactly what wchar_t is, as it's apparently dependent on the locale at
> compile-time, and could be 8, 16, or 32 bits).

Your best bet is to marshal it yourself. We're a bit behind in this area: 6.2.x doesn't have CAString and CWString, and CString is just char*. The HEAD has CAString and CWString, and will hopefully follow the FFI spec by the time we release 6.4 (we still have to do the locale encoding/decoding between CString and String, IIRC).

In any case, none of this allows you to specify a UTF-8 conversion.

wchar_t varies from platform to platform: on Windows it is 16 bits, on Linux with glibc it is 32 bits, for example. CWString is only useful for talking to C interfaces that are expressed in terms of wchar_t.

Cheers,
Simon
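One way to read "marshal it yourself" is to encode the String to UTF-8 bytes in Haskell and hand the C function a NUL-terminated byte buffer, so that no locale-dependent conversion is involved. The sketch below assumes that reading. The names encodeChar, withUTF8 and utf8Length are hypothetical, and the standard C strlen stands in for the real foreign function, which the thread does not name.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.C.Types (CSize(..))
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (Ptr)

-- Stand-in for the real C function that expects UTF-8 input.
foreign import ccall unsafe "string.h strlen"
  c_strlen :: Ptr Word8 -> IO CSize

-- | Encode one Char as 1 to 4 UTF-8 bytes. Surrogate code points are
-- not rejected here; a stricter encoder would refuse them.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Payload bits that go into the leading byte.
    lead :: Int -> Word8
    lead k = fromIntegral (n `shiftR` k)
    -- One continuation byte: 10xxxxxx with six payload bits.
    cont :: Int -> Word8
    cont k = 0x80 .|. fromIntegral ((n `shiftR` k) .&. 0x3F)

-- | Run an action on a NUL-terminated UTF-8 copy of the String.
withUTF8 :: String -> (Ptr Word8 -> IO a) -> IO a
withUTF8 s = withArray0 0 (concatMap encodeChar s)

-- | Number of bytes the C side sees for the given String.
utf8Length :: String -> IO Int
utf8Length s = fmap fromIntegral (withUTF8 s c_strlen)
```

The buffer only lives for the duration of the callback, so a C function that keeps the pointer would need a malloc-based variant instead.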