You can't determine Unicode character properties by analyzing the
names of the characters.
Read chapter 4 of the standard:
http://www.unicode.org/versions/Unicode5.0.0/ch04.pdf
and get the property values here:
http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt
It sounds like
Sm = Symbol, math
Sc = Symbol, currency
Sk = Symbol, modifier
So = Symbol, other
Zs = Separator, space
Zl = Separator, line
Zp = Separator, paragraph
Cc = Other, control
Cf = Other, format
Cs = Other, surrogate
Co = Other, private use
Cn = Other, not assigned (including noncharacters)
Deborah
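The two-letter general category codes listed above can be looked up programmatically instead of guessed from character names. The following is a minimal sketch, assuming Python's standard `unicodedata` module (not something the original message used). The sample characters and the helper name `general_category` are chosen here purely for illustration.

```python
import unicodedata

# Sample characters chosen for illustration; they are not from the
# original message. Each maps to the general category it is expected to have.
SAMPLES = {
    "+": "Sm",       # PLUS SIGN: Symbol, math
    "$": "Sc",       # DOLLAR SIGN: Symbol, currency
    "^": "Sk",       # CIRCUMFLEX ACCENT: Symbol, modifier
    "\u00a9": "So",  # COPYRIGHT SIGN: Symbol, other
    " ": "Zs",       # SPACE: Separator, space
    "\u2028": "Zl",  # LINE SEPARATOR: Separator, line
    "\u2029": "Zp",  # PARAGRAPH SEPARATOR: Separator, paragraph
    "\n": "Cc",      # LINE FEED: Other, control
    "\u200e": "Cf",  # LEFT-TO-RIGHT MARK: Other, format
    "\ud800": "Cs",  # lone surrogate code point: Other, surrogate
    "\ue000": "Co",  # first private-use code point: Other, private use
    "\uffff": "Cn",  # noncharacter: Other, not assigned
}

def general_category(ch):
    """Return the two-letter general category of a single character."""
    return unicodedata.category(ch)

for ch, expected in SAMPLES.items():
    print(f"U+{ord(ch):04X} category={general_category(ch)} expected={expected}")
```

The categories come from whichever version of the Unicode Character Database ships with the Python build, so unassigned code points may be reported differently across versions.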
All characters with general category Lu have the property Uppercase,
but the converse is not true.
Deborah
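To make the Lu versus Uppercase distinction concrete: the derived Uppercase property is defined as general category Lu plus the characters listed as Other_Uppercase. The sketch below assumes Python's `unicodedata` module and hard-codes a small excerpt of the Other_Uppercase ranges. Both the excerpt and the helper name `has_uppercase_property` are illustrative assumptions, not a complete implementation of the property.

```python
import unicodedata

# Partial excerpt of Other_Uppercase ranges, hard-coded here as an
# assumption for illustration; the full list lives in the Unicode
# Character Database and is longer.
OTHER_UPPERCASE_EXCERPT = [
    (0x2160, 0x216F),  # ROMAN NUMERAL ONE .. ROMAN NUMERAL ONE THOUSAND
    (0x24B6, 0x24CF),  # CIRCLED LATIN CAPITAL LETTER A .. Z
]

def has_uppercase_property(ch):
    """Approximate the derived Uppercase property: Lu + Other_Uppercase."""
    if unicodedata.category(ch) == "Lu":
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in OTHER_UPPERCASE_EXCERPT)

# "A" is Lu; the Roman numeral and circled letter are Uppercase without
# being Lu; "a" is neither.
for ch in ["A", "\u2160", "\u24b6", "a"]:
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)} "
          f"Uppercase={has_uppercase_property(ch)}")
```

U+2160 has general category Nl and U+24B6 has So, yet both carry the Uppercase property, which is the "converse is not true" case.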
On Aug 25, 2008, at 8:27 PM, Richard A. O'Keefe wrote:
On 26 Aug 2008, at 1:31 pm, Deborah Goldsmith wrote:
You can't determine Unicode character properties by analyzing the names
On Jun 14, 2008, at 1:06 PM, Don Stewart wrote:
tom.davie:
In the mean time -- who knows enough to make ghc target ARM, and get
this to link against the iPhone libraries? This would be quite a coup
if it could be made to run there!
I'd be interested. We should start a wiki page for
On Dec 21, 2007, at 3:40 PM, Thorkil Naur wrote:
1. Which readline do we use? GNU readline, of course. As opposed to the
readline installed as /usr/include/readline/*.h and
/usr/lib/libreadline.dylib on our PPC Mac OS X machines which are said
to be (and can even be observed to be)
On Oct 2, 2007, at 5:11 AM, ChrisK wrote:
Deborah Goldsmith wrote:
UTF-16 is the native encoding used for Cocoa, Java, ICU, and Carbon, and
is what appears in the APIs for all of them. UTF-16 is also what's
stored in the volume catalog on Mac disks. UTF-8 is only used in BSD APIs
On Oct 2, 2007, at 8:44 AM, Jonathan Cast wrote:
I would like to, again, strongly argue against sacrificing compatibility
with Linux/BSD/etc. for the sake of compatibility with OS X or Windows.
FFI bindings have to convert data formats in any case; Haskell shouldn't
gratuitously break Linux
On Oct 2, 2007, at 3:01 PM, Twan van Laarhoven wrote:
Lots of people wrote:
I want a UTF-8 bikeshed!
No, I want a UTF-16 bikeshed!
What the heck does it matter what encoding the library uses
internally? I expect the interface to be something like (from my own
CompactString library):
Sorry for the long delay, work has been really busy...
On Sep 27, 2007, at 12:25 PM, Aaron Denney wrote:
On 2007-09-27, Aaron Denney wrote:
Well, not so much. As Duncan mentioned, it's a matter of what the most
common case is. UTF-16 is effectively fixed-width for the
On Sep 26, 2007, at 11:06 AM, Aaron Denney wrote:
UTF-16 has no advantage over UTF-8 in this respect, because of surrogate
pairs and combining characters.
Good point.
Well, not so much. As Duncan mentioned, it's a matter of what the most
common case is. UTF-16 is effectively fixed-width
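The surrogate-pair and combining-character point in the exchange above can be illustrated by counting code units. This is a minimal sketch, assuming Python's built-in codecs; the helper names and sample strings are chosen here for illustration and do not come from the thread.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def utf8_code_units(text):
    """Number of 8-bit code units (bytes) needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))

# Sample strings chosen for illustration.
samples = {
    "BMP letter": "A",
    "supplementary character": "\U0001D11E",  # MUSICAL SYMBOL G CLEF
    "base + combining mark": "e\u0301",       # e + COMBINING ACUTE ACCENT
}

for label, text in samples.items():
    print(f"{label}: code points={len(text)}, "
          f"UTF-16 units={utf16_code_units(text)}, "
          f"UTF-8 units={utf8_code_units(text)}")
```

A character in the Basic Multilingual Plane takes one UTF-16 code unit, which is the common case the reply refers to. The supplementary character needs a surrogate pair (two units), and the accented "e" is two code points in any encoding, so neither encoding gives one unit per user-perceived character in general.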
I'll look over the proposal more carefully when I get time, but the
most important issue is to not let the storage type leak into the
interface.
From an implementation point of view, UTF-16 is the most efficient
representation for processing Unicode. It's the native Unicode