Re: Marshalling Haskell String - UTF-8

2004-09-02 Thread Ross Paterson
On Wed, Sep 01, 2004 at 04:39:30PM -0700, John Meacham wrote:
> I should mention I have a new version of the CWString library in
> development that conforms to the new FFI spec and works on all posixy
> systems, not just those that have unicode wchar_t's like my first
> posting.
>
> It is not quite ready for release, but if there is a strong need I can
> package it up nicely.

The most useful packaging would be as a patch against the HEAD version
of fptools/libraries/base/Foreign/C/String.hs

There may also be a difficulty in that you may require hsc2hs but I
think Simon wants to keep it out of low-level modules (?).
___
Glasgow-haskell-users mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: Marshalling Haskell String - UTF-8

2004-09-02 Thread Simon Marlow
On 02 September 2004 11:36, Ross Paterson wrote:

> On Wed, Sep 01, 2004 at 04:39:30PM -0700, John Meacham wrote:
>> I should mention I have a new version of the CWString library in
>> development that conforms to the new FFI spec and works on all posixy
>> systems, not just those that have unicode wchar_t's like my first
>> posting.
>>
>> It is not quite ready for release, but if there is a strong need I
>> can package it up nicely.
>
> The most useful packaging would be as a patch against the HEAD version
> of fptools/libraries/base/Foreign/C/String.hs
>
> There may also be a difficulty in that you may require hsc2hs but I
> think Simon wants to keep it out of low-level modules (?).

Yes, using hsc2hs in libraries/base is problematic for bootstrapping
reasons - it adds inconvenient extra steps to the process of
bootstrapping GHC on a new platform, so please avoid it.  Outside of the
libraries required for building GHC, hsc2hs is fair game.

Cheers,
Simon


Marshalling Haskell String - UTF-8

2004-09-01 Thread Bayley, Alistair
I want to call a foreign C function that takes a UTF-8 encoded string as one
of its arguments (and there's also a version of the function that receives
UTF-16). Can someone point me to documentation or examples of how this would
be done? AFAICT (reading the FFI spec) marshalling a String to a CString is
locale-dependent, whereas I know that I want UTF-8/16.

Also, if a C function returns a UTF-8 (or UTF-16) encoded string, how do I
marshall this reliably into a Haskell String?

Can I use the UTF-16 functions directly with CWStrings? (I'm not sure
exactly what wchar_t is, as it's apparently dependent on the locale at
compile-time, and could be 8, 16, or 32 bits).

Thanks,
Alistair.



RE: Marshalling Haskell String - UTF-8

2004-09-01 Thread Simon Marlow
On 01 September 2004 10:16, Bayley, Alistair wrote:

> I want to call a foreign C function that takes a UTF-8 encoded string
> as one of its arguments (and there's also a version of the function
> that receives UTF-16). Can someone point me to documentation or
> examples of how this would be done? AFAICT (reading the FFI spec)
> marshalling a String to a CString is locale-dependent, whereas I know
> that I want UTF-8/16.
>
> Also, if a C function returns a UTF-8 (or UTF-16) encoded string, how
> do I marshall this reliably into a Haskell String?
>
> Can I use the UTF-16 functions directly with CWStrings? (I'm not sure
> exactly what wchar_t is, as it's apparently dependent on the locale at
> compile-time, and could be 8, 16, or 32 bits).

Your best bet is to marshal it yourself.  We're a bit behind in this
area: 6.2.x doesn't have CAString and CWString, and CString is just
char*.  The HEAD has CAString and CWString, and will hopefully follow
the FFI spec by the time we release 6.4 (we still have to do the locale
encoding/decoding between CString and String, IIRC).

In any case, none of this allows you to specify a UTF-8 conversion.

wchar_t varies from platform to platform: on Windows it is 16 bits, on
Linux with glibc it is 32 bits, for example.  CWString is only useful
for talking to C interfaces that are expressed in terms of wchar_t.
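
As a minimal sketch of "marshal it yourself" (this is not code from GHC's
libraries; the names encodeChar, encodeUTF8 and withUTF8String are
hypothetical), one could encode the String to UTF-8 bytes by hand and pass C
a NUL-terminated byte array, so that no locale conversion is involved:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (castPtr)

-- | Encode one Char as UTF-8 bytes (1 to 4 bytes, up to U+10FFFF).
-- Assumption: the input contains no surrogate code points
-- (U+D800..U+DFFF); they are not rejected here.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading-byte payload: the bits above position s.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits starting at position s.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- | Encode a whole String as UTF-8 bytes.
encodeUTF8 :: String -> [Word8]
encodeUTF8 = concatMap encodeChar

-- | Run an action with a temporary NUL-terminated UTF-8 copy of the String.
-- The pointer is only valid inside the action.
withUTF8String :: String -> (CString -> IO a) -> IO a
withUTF8String s act = withArray0 0 (encodeUTF8 s) (act . castPtr)
```

A foreign import declared with a CString argument could then be called as
withUTF8String s c_function. Note that a '\0' character inside the String
would end the string early on the C side.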

Cheers,
Simon


Re: Marshalling Haskell String - UTF-8

2004-09-01 Thread Ross Paterson
On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
> I want to call a foreign C function that takes a UTF-8 encoded string as one
> of its arguments (and there's also a version of the function that receives
> UTF-16). Can someone point me to documentation or examples of how this would
> be done? AFAICT (reading the FFI spec) marshalling a String to a CString is
> locale-dependent, whereas I know that I want UTF-8/16.

The locale-dependent marshalling of CString described by the FFI spec
isn't yet implemented in the library.  There is some code by John Meacham
including UTF-8 conversion at

http://www.haskell.org/pipermail/ffi/2003-August/001355.html

> Can I use the UTF-16 functions directly with CWStrings? (I'm not sure
> exactly what wchar_t is, as it's apparently dependent on the locale at
> compile-time, and could be 8, 16, or 32 bits).

Under Windows, CWString uses the UTF-16 encoding.  On systems that define
__STDC_ISO_10646__ (e.g. glibc as used under Linux) it uses UTF-32.
(This is in the CVS version that will become 6.4, not the current release.)
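
Because the width of wchar_t differs between platforms, a caller who needs
UTF-16 specifically may prefer not to depend on CWString at all. The
following is a minimal sketch under that assumption, for a library version
that provides CWchar; wcharBits, encodeUTF16 and withUTF16String are
hypothetical names, not library functions:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)
import Foreign.C.Types (CWchar)
import Foreign.Marshal.Array (withArray0)
import Foreign.Ptr (Ptr)
import Foreign.Storable (sizeOf)

-- | Width of the platform's wchar_t in bits, as seen through the FFI.
wcharBits :: Int
wcharBits = 8 * sizeOf (undefined :: CWchar)

-- | Encode a String as UTF-16 code units, using surrogate pairs for
-- code points above U+FFFF.
encodeUTF16 :: String -> [Word16]
encodeUTF16 = concatMap enc
  where
    enc c
      | n < 0x10000 = [fromIntegral n]
      | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                      , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
      where
        n = ord c
        m = n - 0x10000

-- | Run an action with a temporary NUL-terminated UTF-16 copy of the String.
-- The pointer is only valid inside the action.
withUTF16String :: String -> (Ptr Word16 -> IO a) -> IO a
withUTF16String s = withArray0 0 (encodeUTF16 s)
```

Declaring the foreign function with a Ptr Word16 argument keeps the encoding
fixed at UTF-16 whatever wcharBits turns out to be on the host.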


Re: Marshalling Haskell String - UTF-8

2004-09-01 Thread David Roundy
On Wed, Sep 01, 2004 at 11:13:23AM +0100, Ross Paterson wrote:
> On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
>> I want to call a foreign C function that takes a UTF-8 encoded string
>> as one of its arguments (and there's also a version of the function
>> that receives UTF-16). Can someone point me to documentation or
>> examples of how this would be done? AFAICT (reading the FFI spec)
>> marshalling a String to a CString is locale-dependent, whereas I know
>> that I want UTF-8/16.
>
> The locale-dependent marshalling of CString described by the FFI spec
> isn't yet implemented in the library.  There is some code by John Meacham
> including UTF-8 conversion at
>
>   http://www.haskell.org/pipermail/ffi/2003-August/001355.html

You could also look at the darcs source code, as darcs uses UTF8 to store
file names.
-- 
David Roundy
http://www.abridgegame.org/darcs


Re: Marshalling Haskell String - UTF-8

2004-09-01 Thread George Russell
I have implemented code to do this which I think is better than
John Meacham's, because it (a) handles all UTF8 sequences
(up to 6 bytes); (b) checks for errors as UTF8 decoders are
supposed to do; (c) lets you determine if there is an error
without having to seq the entire list.  Here is a link:
   http://www.haskell.org//pipermail/glasgow-haskell-users/2004-April/006564.html


RE: Marshalling Haskell String - UTF-8

2004-09-01 Thread Bayley, Alistair
> From: George Russell [mailto:[EMAIL PROTECTED]]
>
> http://www.haskell.org//pipermail/glasgow-haskell-users/2004-April/006564.html


Thanks George, this looks useful.

There are some things I want to clarify...

module UTF8(
   toUTF8,
      -- :: String -> String
      -- Converts a String (whose characters must all have codes < 2^31) into
      -- its UTF8 representation.
   fromUTF8WE,
      -- :: Monad m => String -> m String
      -- Converts a UTF8 representation of a String back into the String,
      -- catching all possible format errors.

Does toUTF8 return a String whose Chars are all code-points < 256, which,
when converted to bytes, will represent a UTF-8 string?

Likewise, does fromUTF8WE expect a String whose Chars are all code-points
< 256, i.e. they are the result of saying chr n for each byte in the UTF-8
stream?
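
To make the representation being asked about concrete, here is a minimal
sketch of the decoding direction, assuming the convention that each Char of
the input holds one byte (code < 256). It is not George Russell's UTF8
module: decodeUTF8 is a hypothetical name, it returns Maybe instead of an
arbitrary Monad, it accepts only sequences of up to 4 bytes (code points up
to U+10FFFF) rather than the 6-byte forms, and it inspects the whole input
before reporting success.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Decode a String of byte-valued Chars as UTF-8.
-- Returns Nothing on any malformed, overlong, surrogate or
-- out-of-range sequence, or if a Char is not a byte (>= 256).
decodeUTF8 :: String -> Maybe String
decodeUTF8 [] = Just []
decodeUTF8 (c:cs)
  | b < 0x80  = (c :) <$> decodeUTF8 cs       -- plain ASCII
  | b < 0xC0  = Nothing                       -- stray continuation byte
  | b < 0xE0  = multi 1 (b .&. 0x1F) 0x80     -- 2-byte sequence
  | b < 0xF0  = multi 2 (b .&. 0x0F) 0x800    -- 3-byte sequence
  | b < 0xF8  = multi 3 (b .&. 0x07) 0x10000  -- 4-byte sequence
  | otherwise = Nothing                       -- invalid lead byte or non-byte Char
  where
    b = ord c
    isCont x = ord x >= 0x80 && ord x < 0xC0
    -- k continuation bytes follow; minVal rejects overlong encodings.
    multi k lead minVal
      | length conts /= k || not (all isCont conts) = Nothing
      | n < minVal || n > 0x10FFFF                  = Nothing
      | n >= 0xD800 && n <= 0xDFFF                  = Nothing
      | otherwise = (chr n :) <$> decodeUTF8 rest
      where
        (conts, rest) = splitAt k cs
        n = foldl (\acc x -> (acc `shiftL` 6) .|. (ord x .&. 0x3F)) lead conts
```

Under this convention the two-Char input "\xC3\xA9" decodes to the single
Char '\xE9', while the overlong form "\xC0\x80" is rejected.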



> From: Simon Marlow [mailto:[EMAIL PROTECTED]]
>
> In any case, none of this allows you to specify a UTF-8 conversion.

Are there plans to add UTF-8 (and UTF-16) conversion functions to the
libraries? I imagine they would be useful...


> Your best bet is to marshal it yourself.  We're a bit behind in this
> area: 6.2.x doesn't have CAString and CWString, and CString is just
> char*.  The HEAD has CAString and CWString, and will hopefully follow
> the FFI spec by the time we release 6.4 (we still have to do the
> locale encoding/decoding between CString and String, IIRC).

Again, I want to clarify some things...

 - Will any encoding/decoding be performed by the
   peekCString/withCString/newCString(Len) functions? i.e. if I want to avoid
   encoding/decoding (because I know my string is already in UTF-8) then I
   simply have to avoid using these functions?

 - Can I still declare the foreign functions with CString types without
   worrying that encoding/decoding might be attempted?

 - When the CAString functions are available then I should use them, but for
   now I will have to write something that uses castCharToCChar/castCCharToChar
   + peekArray0/pokeArray0.
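
A minimal sketch of that last approach, assuming the String already holds
one byte per Char (for example the output of a UTF-8 encoder) so that no
conversion is wanted; withRawBytes and peekRawBytes are hypothetical names,
and withArray0 is used in place of a manual pokeArray0:

```haskell
import Foreign.C.String (CString, castCharToCChar, castCCharToChar)
import Foreign.Marshal.Array (withArray0, peekArray0)

-- | Pass a String to C byte for byte, with a terminating NUL and no
-- encoding step. Chars >= 256 are truncated by castCharToCChar, so the
-- caller is assumed to have encoded the text already.
withRawBytes :: String -> (CString -> IO a) -> IO a
withRawBytes s = withArray0 0 (map castCharToCChar s)

-- | Read a NUL-terminated C string back as one Char per byte,
-- again without any decoding.
peekRawBytes :: CString -> IO String
peekRawBytes p = fmap (map castCCharToChar) (peekArray0 0 p)
```

The result of peekRawBytes would still need a UTF-8 decoder applied to it
before being treated as Unicode text.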


Thanks,
Alistair.



Re: Marshalling Haskell String - UTF-8

2004-09-01 Thread John Meacham
On Wed, Sep 01, 2004 at 11:13:23AM +0100, Ross Paterson wrote:
> On Wed, Sep 01, 2004 at 10:16:23AM +0100, Bayley, Alistair wrote:
>> I want to call a foreign C function that takes a UTF-8 encoded string as one
>> of its arguments (and there's also a version of the function that receives
>> UTF-16). Can someone point me to documentation or examples of how this would
>> be done? AFAICT (reading the FFI spec) marshalling a String to a CString is
>> locale-dependent, whereas I know that I want UTF-8/16.
>
> The locale-dependent marshalling of CString described by the FFI spec
> isn't yet implemented in the library.  There is some code by John Meacham
> including UTF-8 conversion at

I should mention I have a new version of the CWString library in
development that conforms to the new FFI spec and works on all posixy
systems, not just those that have unicode wchar_t's like my first
posting.  

It is not quite ready for release, but if there is a strong need I can
package it up nicely. 
John
-- 
John Meacham - repetae.netjohn 