Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-09 Thread Duncan Coutts
On Thu, 2007-02-08 at 17:01 -0800, John Meacham wrote:
> On Tue, Feb 06, 2007 at 03:16:17PM +0900, shelarcy wrote:
> > I'm afraid that fantasy is broken again: the idea, once trusted by people in Europe and America, that UCS-2 without surrogate pairs covers all languages.
> UCS-2 is a disaster in every way.
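For readers following the UCS-2 point above, here is a minimal sketch (not taken from any message in this thread) of why a single 16-bit unit cannot hold every code point. UTF-16 represents code points above U+FFFF as a surrogate pair, and UCS-2 has no such mechanism. The function name utf16_code_units is hypothetical and chosen only for illustration.

```python
def utf16_code_units(code_point: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode code point.

    Hypothetical helper for illustration; not part of Data.CompactString.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable on their own")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit unit, identical to UCS-2.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # lead (high) surrogate
    low = 0xDC00 | (offset & 0x3FF)   # trail (low) surrogate
    return [high, low]
```

Under this sketch, U+1D11E (MUSICAL SYMBOL G CLEF) needs the two units 0xD834 0xDD1E, so software that treats every 16-bit unit as a complete character will mishandle it.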

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-09 Thread Deborah Goldsmith
On Thu, 2007-02-08 at 17:01 -0800, John Meacham wrote:
> UCS-2 is a disaster in every way. someone had to say it. :)
UCS-2 has been deprecated for many years.
> everything should be ascii, utf8 or ucs-4 or migrating to it.
UCS-4 has also been deprecated for many years. The main forms of

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-08 Thread John Meacham
On Tue, Feb 06, 2007 at 03:16:17PM +0900, shelarcy wrote:
> I'm afraid that fantasy is broken again: the idea, once trusted by people in Europe and America, that UCS-2 without surrogate pairs covers all languages.
UCS-2 is a disaster in every way. someone had to say it. :) everything should be ascii, utf8

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-08 Thread John Meacham
On Mon, Feb 05, 2007 at 01:14:26PM +0100, Twan van Laarhoven wrote: The reason for inventing my own encoding is that it is easier to use and takes less space than UTF-8. The only advantage UTF-8 has is that it can be read and written directly. I guess this is a trade off, faster

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread Chris Kuklewicz
Twan van Laarhoven wrote: Hello all, I would like to announce my attempt at making a Unicode version of Data.ByteString. The library is named Data.CompactString to avoid conflict with other (Fast)PackedString libraries. The library uses a variable length encoding (1 to 3 bytes) of Chars

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread Twan van Laarhoven
Chris Kuklewicz wrote: Can I be among the first to ask that any Unicode variant of ByteString use a recognized encoding? snip In reading all the poke/peek function I did not see anything that your tag bits accomplish that the tag bits in utf-8 do not, except that you want to write only a

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread shelarcy
Hello Twan, On Mon, 05 Feb 2007 08:46:35 +0900, Twan van Laarhoven [EMAIL PROTECTED] wrote: I would like to announce my attempt at making a Unicode version of Data.ByteString. The library is named Data.CompactString to avoid conflict with other (Fast)PackedString libraries. How about add

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread Chris Kuklewicz
shelarcy wrote: Hello Twan, On Mon, 05 Feb 2007 08:46:35 +0900, Twan van Laarhoven [EMAIL PROTECTED] wrote: I would like to announce my attempt at making a Unicode version of Data.ByteString. The library is named Data.CompactString to avoid conflict with other (Fast)PackedString

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread Alistair Bayley
On 05/02/07, Chris Kuklewicz [EMAIL PROTECTED] wrote:
> shelarcy wrote:
> > Many Haskell UTF-8 libraries don't support encodings over 3 bytes.
> UTF-8 uses 1, 2, 3, or 4 bytes. Anything that does not support 4 bytes does not support UTF-8.
Well, some of them are probably a bit dated; they likely

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread David Menendez
Alistair Bayley writes:
> On 05/02/07, Chris Kuklewicz [EMAIL PROTECTED] wrote:
> > UTF-8 is a 4 byte encoding. There is no valid UTF-8 5 or 6 byte encoding.
> Chris is right here, in that Takusen's decoder is incorrect w.r.t. the standard, in allowing up to 6 bytes to encode a single char.
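The byte-length rule discussed above can be sketched as follows. This is an illustration based on the current UTF-8 definition (RFC 3629), not code from Takusen or Data.CompactString, and the name utf8_length is hypothetical. It shows that every valid code point fits in 1 to 4 bytes, so a strict decoder has no reason to accept 5 or 6 byte sequences.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not valid in UTF-8")
    if code_point <= 0x7F:
        return 1   # ASCII
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3   # rest of the Basic Multilingual Plane
    return 4       # supplementary planes, up to U+10FFFF


if __name__ == "__main__":
    # Cross-check against Python's built-in encoder for a few samples,
    # including U+20000 (CJK Extension B) and U+1D11E (a musical symbol).
    for cp in (0x41, 0xE9, 0x20AC, 0x20000, 0x1D11E, 0x10FFFF):
        assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

The characters mentioned elsewhere in the thread (CJK Unified Ideographs Extension B, musical symbols) all fall in the 4 byte case, which is why a library limited to 3 bytes cannot represent them.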

Re: [Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-05 Thread shelarcy
On Tue, 06 Feb 2007 00:25:45 +0900, Chris Kuklewicz [EMAIL PROTECTED] wrote: UTF-8 also uses 4 to 6 byte encodings now. CJK Unified Ideographs Extension B, Tai Xuan Jing Symbol and Music Symbol, etc ... use 4 byte encoding. Looking at several sources, it seems you are incorrect. Haskell

[Haskell] ANNOUNCE: Data.CompactString 0.1 - my attempt at a Unicode ByteString

2007-02-04 Thread Twan van Laarhoven
Hello all, I would like to announce my attempt at making a Unicode version of Data.ByteString. The library is named Data.CompactString to avoid conflict with other (Fast)PackedString libraries. The library uses a variable length encoding (1 to 3 bytes) of Chars into Word8s, which are then