Re: [Haskell-cafe] Valid Haskell characters

2008-08-25 Thread Deborah Goldsmith
You can't determine Unicode character properties by analyzing the names of the characters. Read chapter 4 of the standard:
http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf
and get the property values here:
http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
It sounds like

Re: [Haskell-cafe] Re: Valid Haskell characters

2008-08-25 Thread Deborah Goldsmith
Sm = Symbol, math
Sc = Symbol, currency
Sk = Symbol, modifier
So = Symbol, other
Zs = Separator, space
Zl = Separator, line
Zp = Separator, paragraph
Cc = Other, control
Cf = Other, format
Cs = Other, surrogate
Co = Other, private use
Cn = Other, not assigned (including noncharacters)
Deborah
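For readers who want to check these categories from Haskell, here is a minimal sketch using `generalCategory` from `Data.Char` in base. The `abbrev` helper is hypothetical (it is not part of any library) and only covers the abbreviations listed above; the results depend on the Unicode version your GHC ships with.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical helper: map base's GeneralCategory constructors to the
-- two-letter Unicode abbreviations listed above. Categories outside
-- that list fall through to the constructor name.
abbrev :: GeneralCategory -> String
abbrev MathSymbol         = "Sm"
abbrev CurrencySymbol     = "Sc"
abbrev ModifierSymbol     = "Sk"
abbrev OtherSymbol        = "So"
abbrev Space              = "Zs"
abbrev LineSeparator      = "Zl"
abbrev ParagraphSeparator = "Zp"
abbrev Control            = "Cc"
abbrev Format             = "Cf"
abbrev Surrogate          = "Cs"
abbrev PrivateUse         = "Co"
abbrev NotAssigned        = "Cn"
abbrev other              = show other

main :: IO ()
main =
  -- Print the category abbreviation for a few sample characters.
  mapM_
    (\c -> putStrLn (show c ++ " -> " ++ abbrev (generalCategory c)))
    "+$^ \n"
```

Because the lookup goes through the character database rather than character names, it follows the advice given earlier in the thread.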

Re: [Haskell-cafe] Valid Haskell characters

2008-08-25 Thread Deborah Goldsmith
All characters with general category Lu have the property Uppercase, but the converse is not true. Deborah On Aug 25, 2008, at 8:27 PM, Richard A. O'Keefe wrote: On 26 Aug 2008, at 1:31 pm, Deborah Goldsmith wrote: You can't determine Unicode character properties by analyzing the names

Re: [Haskell-cafe] ANN: Topkata

2008-06-14 Thread Deborah Goldsmith
On Jun 14, 2008, at 1:06 PM, Don Stewart wrote: tom.davie: In the mean time -- who knows enough to make ghc target ARM, and get this to link against the iPhone libraries? This would be quite a coup if it could be made to run there! I'd be interested. We should start a wiki page for

Re: readline problems building GHC on Mac OS X (was: Re: [Haskell-cafe] Re: ANNOUNCE: GHC version 6.8.2)

2007-12-21 Thread Deborah Goldsmith
On Dec 21, 2007, at 3:40 PM, Thorkil Naur wrote: 1. Which readline do we use? GNU readline, of course. As opposed to the readline installed as /usr/include/readline/*.h and /usr/lib/libreadline.dylib on our PPC Mac OS X machines which are said to be (and can even be observed to be)

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Deborah Goldsmith
On Oct 2, 2007, at 5:11 AM, ChrisK wrote: Deborah Goldsmith wrote: UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and is what appears in the APIs for all of them. UTF-16 is also what's stored in the volume catalog on Mac disks. UTF-8 is only used in BSD APIs

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Deborah Goldsmith
On Oct 2, 2007, at 8:44 AM, Jonathan Cast wrote: I would like to, again, strongly argue against sacrificing compatibility with Linux/BSD/etc. for the sake of compatibility with OS X or Windows. FFI bindings have to convert data formats in any case; Haskell shouldn't gratuitously break Linux

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-02 Thread Deborah Goldsmith
On Oct 2, 2007, at 3:01 PM, Twan van Laarhoven wrote: Lots of people wrote: I want a UTF-8 bikeshed! No, I want a UTF-16 bikeshed! What the heck does it matter what encoding the library uses internally? I expect the interface to be something like (from my own CompactString library):

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-10-01 Thread Deborah Goldsmith
Sorry for the long delay, work has been really busy... On Sep 27, 2007, at 12:25 PM, Aaron Denney wrote: On 2007-09-27, Aaron Denney wrote: Well, not so much. As Duncan mentioned, it's a matter of what the most common case is. UTF-16 is effectively fixed-width for the

Re: [Haskell-cafe] Re: PROPOSAL: New efficient Unicode string library.

2007-09-26 Thread Deborah Goldsmith
On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote: UTF-16 has no advantage over UTF-8 in this respect, because of surrogate pairs and combining characters. Good point. Well, not so much. As Duncan mentioned, it's a matter of what the most common case is. UTF-16 is effectively fixed-width
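To make the surrogate-pair point concrete, here is a small sketch of how a single code point maps to UTF-16 code units. The name `toUtf16` is hypothetical and not taken from the proposal or any library under discussion; it only illustrates that characters in the Basic Multilingual Plane take one 16-bit unit, while characters above U+FFFF take two.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word16)

-- Hypothetical helper: encode one Char as UTF-16 code units.
-- Code points below U+10000 fit in a single unit; anything above is
-- split into a high and a low surrogate. A Char that is itself in the
-- surrogate range is passed through unchanged, with no validation.
toUtf16 :: Char -> [Word16]
toUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise =
      let m = n - 0x10000
       in [ fromIntegral (0xD800 + (m `shiftR` 10)) -- high surrogate
          , fromIntegral (0xDC00 + (m .&. 0x3FF))   -- low surrogate
          ]
  where
    n = ord c

main :: IO ()
main =
  -- 'a' and U+20AC are BMP characters; U+1D11E lies outside the BMP.
  print (map toUtf16 "a\x20AC\x1D11E")
```

Only the last character in the sample string needs a pair, which is the "common case is fixed-width" argument made in the thread.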

Re: [Haskell-cafe] PROPOSAL: New efficient Unicode string library.

2007-09-25 Thread Deborah Goldsmith
I'll look over the proposal more carefully when I get time, but the most important issue is to not let the storage type leak into the interface. From an implementation point of view, UTF-16 is the most efficient representation for processing Unicode. It's the native Unicode