UTF-8 encode/decode

2004-04-27 Thread George Russell
I have implemented UTF8-encode/decode. Unlike the code someone has already posted it handles all UTF8 sequences, including those longer than 3 bytes. It also catches all illegal UTF8 sequences (such as characters encoded with a longer sequence than necessary). Here is the code.
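
The code attached to the original message is not reproduced in this listing. As a rough illustration of the checks described above, here is a minimal decoder sketch. It is not George Russell's implementation: the names `decodeUtf8` and `multi` are hypothetical, and it assumes the RFC 3629 form of UTF-8 (at most 4 bytes per character, nothing above U+10FFFF, no surrogates). It returns `Nothing` for overlong, truncated, or otherwise illegal sequences.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- | Decode UTF-8 bytes, returning Nothing on any illegal sequence.
-- Hypothetical name; a sketch, not the code posted in the thread.
decodeUtf8 :: [Word8] -> Maybe String
decodeUtf8 [] = Just []
decodeUtf8 (b : bs)
  | b < 0x80           = (chr (fromIntegral b) :) <$> decodeUtf8 bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F) 0x80     -- 2-byte sequence
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F) 0x800    -- 3-byte sequence
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07) 0x10000  -- 4-byte sequence
  | otherwise          = Nothing  -- stray continuation byte or invalid lead
  where
    -- n: number of continuation bytes expected after the lead byte
    -- leadBits: payload bits taken from the lead byte
    -- minCode: smallest code point that legitimately needs this length
    multi n leadBits minCode =
      case splitAt n bs of
        (conts, rest)
          | length conts == n, all isCont conts ->
              let code = foldl step (fromIntegral leadBits) conts
              in if code < minCode          -- overlong encoding
                    || code > 0x10FFFF      -- beyond the Unicode range
                    || isSurrogate code     -- surrogates are not characters
                   then Nothing
                   else (chr code :) <$> decodeUtf8 rest
          | otherwise -> Nothing            -- truncated or malformed tail
    isCont w = w .&. 0xC0 == 0x80
    step acc w = (acc `shiftL` 6) .|. fromIntegral (w .&. 0x3F)
    isSurrogate code = code >= 0xD800 && code <= 0xDFFF
```

For example, `decodeUtf8 [0xC3, 0xA9]` yields `Just "\233"`, while the overlong encoding `[0xC0, 0x80]` is rejected because its decoded value falls below the minimum for a 2-byte sequence.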

Re: UTF-8 encode/decode

2004-04-27 Thread David Brown
On Tue, Apr 27, 2004 at 10:55:57AM +0200, George Russell wrote: I have implemented UTF8-encode/decode. Unlike the code someone has already posted it handles all UTF8 sequences, including those longer than 3 bytes. It also catches all illegal UTF8 sequences (such as characters encoded with a

Re: UTF-8 encode/decode

2004-04-27 Thread George Russell
David Brown wrote (snipped): What license is your code covered under? As it stands now, it is an informative example, but cannot be used by anybody. As author, I am quite happy for it to be used and modified by other people for non-commercial purposes. As far as I know my employers wouldn't any

UTF-8 encode/decode libraries.

2004-04-26 Thread David Brown
I am writing some utilities to deal with UTF-8 encoded text files (not source). Currently, I'm just reading in the UTF-8 directly, and things work reasonably well: since my parse tokens are ASCII, they are easy to parse. However, the character type seems perfectly happy with larger values for

Re: UTF-8 encode/decode libraries.

2004-04-26 Thread Duncan Coutts
On Mon, 2004-04-26 at 18:49, David Brown wrote: Is anyone aware of any Haskell libraries for doing UTF-8 decoding and encoding? If not, I'll write something simple. The gtk2hs library uses the following functions internally. Credit to Axel Simon I believe unless he swiped them from somewhere

Re: UTF-8 encode/decode libraries.

2004-04-26 Thread Sven Panne
Duncan Coutts wrote: On Mon, 2004-04-26 at 18:49, David Brown wrote: [...] toUTF :: String -> String Hmmm, String -> [Word8] would be nicer... fromUTF :: String -> String ... and here: [Word8] -> String or [Word8] -> Maybe String. Furthermore, UTF-8 is not restricted to a maximum of 3 bytes per
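
To make the suggested signature concrete, here is a minimal encoder sketch of type `String -> [Word8]`. It is not the gtk2hs code referred to above: the names `encodeUtf8` and `encodeChar` are hypothetical, and it assumes the RFC 3629 layout of 1 to 4 bytes per character. It does not reject surrogate code points, which a stricter version probably should.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a String as a list of UTF-8 bytes.
-- Hypothetical name; a sketch, not the gtk2hs toUTF function.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (encodeChar . ord)

-- | Encode a single code point, choosing the shortest sequence.
-- Note: surrogates (U+D800..U+DFFF) are encoded as-is, not rejected.
encodeChar :: Int -> [Word8]
encodeChar c
  | c < 0x80    = [fromIntegral c]                           -- 1 byte (ASCII)
  | c < 0x800   = [lead 0xC0 6, cont 0]                      -- 2 bytes
  | c < 0x10000 = [lead 0xE0 12, cont 6, cont 0]             -- 3 bytes
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]    -- 4 bytes
  where
    -- Lead byte: length tag plus the top payload bits.
    lead tag n = fromIntegral (tag .|. (c `shiftR` n))
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont n     = fromIntegral (0x80 .|. ((c `shiftR` n) .&. 0x3F))
```

Returning `[Word8]` rather than `String` keeps encoded bytes and decoded characters apart in the types, which seems to be the point of the suggestion above.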

Re: UTF-8 encode/decode libraries.

2004-04-26 Thread David Brown
On Mon, Apr 26, 2004 at 08:33:38PM +0200, Sven Panne wrote: Duncan Coutts wrote: On Mon, 2004-04-26 at 18:49, David Brown wrote: [...] toUTF :: String -> String Hmmm, String -> [Word8] would be nicer... fromUTF :: String -> String ... and here: [Word8] -> String or [Word8] -> Maybe String.