Re: incorrect non-ascii letters in web archive

2005-03-29 Thread Henry Spencer
, yes, but not surprising. Henry Spencer [EMAIL PROTECTED] -- Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/linux-utf8/

Re: mbstoupper or utf8toupper

2005-01-11 Thread Henry Spencer
, as the Unicode spec points out, you really want to do case-insensitive comparisons by first mapping characters to equivalence classes -- not by mapping to a particular case and assuming that all equivalent letters will map to the same character. Henry
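
The point about equivalence classes can be illustrated with a small sketch. It uses Python's built-in `str.casefold()` as a stand-in for the Unicode case-folding step; the helper names are hypothetical and are not from the thread. Full caseless matching as the Unicode standard describes it also involves normalization, which this sketch leaves out.

```python
# Sketch: compare by case-folded form instead of mapping to one chosen case.
# Helper names are illustrative only.

def equal_via_lower(a: str, b: str) -> bool:
    """Naive comparison: map both sides to lowercase and compare."""
    return a.lower() == b.lower()


def equal_via_casefold(a: str, b: str) -> bool:
    """Comparison via Unicode case folding (equivalence classes)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # German sharp s: lower() leaves it alone, casefold() maps it to "ss".
    print(equal_via_lower("Straße", "STRASSE"))
    print(equal_via_casefold("Straße", "STRASSE"))
    # Greek final sigma and ordinary sigma are distinct lowercase letters
    # that fold to the same character.
    print(equal_via_lower("\u03c2", "\u03c3"))
    print(equal_via_casefold("\u03c2", "\u03c3"))
```

Mapping to lowercase treats the pairs above as different, while folding treats them as equal, which is the failure mode the message warns about.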

Re: mbstoupper or utf8toupper

2005-01-10 Thread Henry Spencer
to this confusion? I think you've overlooked the fact that *now* he is talking about tolower(), not towlower(). tolower() deals with char, not wchar_t, and in most char character sets, there is no dotless i. Henry Spencer
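
As a rough illustration of why the dotless i rarely matters for single-byte `char` handling, the sketch below checks which 8-bit character sets can represent U+0131 at all. Python codecs stand in for C locale character sets here, and the helper name is hypothetical.

```python
# Sketch: which single-byte character sets contain the dotless i (U+0131)?
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def fits_in_one_byte(ch: str, encoding: str) -> bool:
    """Return True if `ch` is representable as exactly one byte in `encoding`."""
    try:
        return len(ch.encode(encoding)) == 1
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for enc in ("ascii", "latin-1", "iso-8859-9"):
        print(enc, fits_in_one_byte(DOTLESS_I, enc))
```

Of the three sets tried, only the Turkish-oriented ISO 8859-9 has the character, so a byte-oriented `tolower()` in the other two never has to consider it.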

Re: mbstoupper or utf8toupper

2004-12-31 Thread Henry Spencer
. It would help if you could supply more context: why do you want to do this, as part of what? Henry Spencer

Re: questions with combining characters [was: Unicode: endpoint of evolution of encodings?]

2004-11-17 Thread Henry Spencer
thing. Henry Spencer

Re: questions with combining characters [was: Unicode: endpoint of evolution of encodings?]

2004-11-17 Thread Henry Spencer
will differ considerably from the keystrokes needed to enter it. It's a protocol of some kind, and somebody needs to define how that protocol works and what its backspace operation does. Unicode assigns no semantics to codes 8 and 127. Henry

Re: Canonical Mode Input Processing with multi-byte character sets

2004-02-24 Thread Henry Spencer
you might want editing, whether it be a terminal emulator or an X server. So the problem has user-land solutions in principle; I'm not saying it would be particularly simple... Henry Spencer

Re: Canonical Mode Input Processing with multi-byte character sets

2004-02-24 Thread Henry Spencer
with any of the languages where these issues get serious. But the potential is there.) Henry Spencer

Re: Canonical Mode Input Processing with multi-byte character sets

2004-02-23 Thread Henry Spencer
efficiency issue any more. Henry Spencer

Re: Perl unicode weirdness.

2004-02-06 Thread Henry Spencer
of the error is possible, or (c) it's important to preserve the original data even if it is malformed. Henry Spencer
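
For case (c), one way to keep malformed bytes intact while still working with decoded text is sketched below. It uses Python's `surrogateescape` error handler, not anything from the Perl discussion itself, and the function names are hypothetical.

```python
# Sketch: decode possibly malformed UTF-8 without losing the original bytes.
# Undecodable bytes are smuggled through as lone surrogates and restored
# on re-encoding.

def decode_preserving(raw: bytes) -> str:
    """Decode UTF-8, mapping each invalid byte to a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_preserving(text: str) -> bytes:
    """Inverse of decode_preserving: restore the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = b"caf\xe9 \xff ok"  # not valid UTF-8
    text = decode_preserving(raw)
    print(encode_preserving(text) == raw)
```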

Re: Perl unicode weirdness.

2004-02-05 Thread Henry Spencer
.) Henry Spencer

Re: Perl unicode weirdness.

2004-02-03 Thread Henry Spencer
, but the helper function is only part of an implementation and should not be mislabeled as being the whole thing. Henry Spencer

Re: Perl unicode weirdness.

2004-02-03 Thread Henry Spencer
.) Henry Spencer

Re: Perl unicode weirdness.

2004-02-02 Thread Henry Spencer
to differentiate between kilobits and kilobytes with kb and kB. Changing hyphens and case doesn't make distinctions or avoid confusion. Yes, it would be better to call the more general encoding, say, UTF-P. Henry Spencer

Re: Perl unicode weirdness.

2004-02-02 Thread Henry Spencer
, again to avoid confusion. Call it UTF-P, or UTF-8P, or UTF-9, but not utf8, please. Henry Spencer

Re: Perl unicode weirdness.

2004-02-02 Thread Henry Spencer
+FEFF is now discouraged. Henry Spencer

Re: Binary transparency lost in UTF-8 tools

2003-07-06 Thread Henry Spencer
!). Arguably, the same issue arises for things like U+, which are guaranteed to be non-characters and hence should never appear in input. Henry Spencer
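
The code point named in the message is truncated in this archive listing, so the sketch below does not try to restore it. It only shows how a tool might test for the noncharacters that the Unicode standard defines: U+FDD0..U+FDEF plus the last two code points of every plane. The function name is hypothetical.

```python
# Sketch: detect Unicode noncharacters in input.
# Noncharacters are U+FDD0..U+FDEF and every code point whose low 16 bits
# are FFFE or FFFF (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF).

def is_noncharacter(cp: int) -> bool:
    """Return True if code point `cp` is a Unicode noncharacter."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def contains_noncharacter(text: str) -> bool:
    """Return True if any character of `text` is a noncharacter."""
    return any(is_noncharacter(ord(ch)) for ch in text)


if __name__ == "__main__":
    print(contains_noncharacter("plain text"))
    print(contains_noncharacter("bad\uffffinput"))
```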

Re: Strings in a programming language

2003-07-04 Thread Henry Spencer
techniques, in particular, are available only if the desired sequences are exactly known in advance. Henry Spencer

Re: Wide character APIs

2003-07-03 Thread Henry Spencer
. Henry Spencer

RE: Revision of UTF-8 history in draft-yergeau-rfc2279bis-05.txt

2003-06-14 Thread Henry Spencer
. Henry Spencer

Re: diacritic marks for Latin alphabet (Re: supporting XIM)

2003-04-01 Thread Henry Spencer
all it needs are encoded as precomposed... As I understand it, the usual written forms of Vietnamese explicitly need multiple marks per letter; there are no precomposed forms for that. Henry Spencer

Re: NUL-transparent Java-UTF-8

2002-12-22 Thread Henry Spencer
and reversibly. Henry Spencer

Re: character encoding diagram

2002-12-20 Thread Henry Spencer
not ASCII-compatible... It might be worth mention, because Java's not the only thing using it. It's actually quite convenient to be able to make applications NUL-transparent without having to recode all the string operations. Henry
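
A minimal sketch of the NUL-transparency idea: U+0000 is written as the two-byte sequence C0 80, so the encoded data never contains a zero byte and can pass through NUL-terminated string routines. The function names are hypothetical, and the sketch covers only the NUL handling; Java's modified UTF-8 also treats supplementary characters differently, which is not shown here.

```python
# Sketch: NUL-transparent variant of UTF-8 (NUL handling only).

def encode_nul_transparent(text: str) -> bytes:
    """Encode as UTF-8, but write U+0000 as the overlong pair C0 80."""
    # In valid UTF-8 a zero byte can only be U+0000, so a plain byte
    # replacement is safe.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_transparent(data: bytes) -> str:
    """Inverse of encode_nul_transparent."""
    # The byte C0 never occurs in valid UTF-8, so C0 80 is unambiguous.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


if __name__ == "__main__":
    encoded = encode_nul_transparent("a\x00b")
    print(encoded)
    print(b"\x00" in encoded)
    print(decode_nul_transparent(encoded) == "a\x00b")
```

The result is no longer strict UTF-8, since C0 80 is an overlong form that a conforming decoder rejects.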

Re: UTF-8 gnroff mangles up syntax samples

2002-12-07 Thread Henry Spencer
boldface as the verbatim font with verbatim processing, that will go a long way toward doing the right thing. Bold does not see much other use in traditional manpages. Henry Spencer

Re: UTF-8 wakeup call

2002-12-07 Thread Henry Spencer
. Henry Spencer

Re: filename and normalization

2002-12-05 Thread Henry Spencer
font should perhaps lead to warning messages for the author. Hmm, yes, that's probably the best approach. Henry Spencer

Re: filename and normalization (was gcc identifiers)

2002-12-04 Thread Henry Spencer
. Henry Spencer

Re: filename and normalization

2002-12-04 Thread Henry Spencer
On Wed, 4 Dec 2002, Glenn Maynard wrote: When --help is printed, I want to see two hyphens, not a dash. You probably want to see two minus signs, not two hyphens... Henry Spencer

Re: Linux Console in UTF-8 - current state

2002-09-11 Thread Henry Spencer
On 10 Sep 2002, H. Peter Anvin wrote: The only sane way to deal with this is to do a console daemon in userspace... As the re-invention of X proceeds apace... :-) Henry Spencer

Re: world of utf-8

2002-08-20 Thread Henry Spencer
raw 8-bit in *headers*, even as a future direction. For the moment, and probably for a long time to come, mail headers have to use RFC 2047 encodings. Henry Spencer
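
For readers unfamiliar with RFC 2047, the sketch below shows what such an encoding looks like in practice, using Python's standard `email.header` module. The sample subject text and the helper names are made up for illustration.

```python
# Sketch: carry non-ASCII header text as an RFC 2047 encoded-word,
# so the header itself stays pure ASCII.
from email.header import Header, decode_header, make_header


def encode_subject(text: str) -> str:
    """Return `text` as an ASCII-only RFC 2047 header value."""
    return Header(text, charset="utf-8").encode()


def decode_subject(encoded: str) -> str:
    """Decode an RFC 2047 header value back to a Unicode string."""
    return str(make_header(decode_header(encoded)))


if __name__ == "__main__":
    wire = encode_subject("Grüße aus Köln")
    print(wire)  # an "=?utf-8?...?=" encoded-word, ASCII only
    print(decode_subject(wire))
```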

Re: Paper size

2002-05-02 Thread Henry Spencer
is using right now means that existing notebooks, binders, shelves, etc. do not suddenly become unusable, as they often do if the new size is *larger* in one or both dimensions. Henry Spencer

Re: Paper size

2002-05-02 Thread Henry Spencer
politics of standardization here, not about right and wrong. Henry Spencer

Re: Paper size

2002-05-02 Thread Henry Spencer
changes and additions. Quite apart from any *technical* merit that has, it means that the existing design's current vendors have to make changes too; this helps sell the new standard to people who will have to retool completely for it. Henry

Re: Paper size

2002-05-02 Thread Henry Spencer
exactly the same proportions as the original sheet. Henry Spencer

Re: brocken bar and UCS keyboard

2002-02-21 Thread Henry Spencer
. Henry Spencer

Re: brocken bar and UCS keyboard

2002-02-21 Thread Henry Spencer
to believe that there is no distinction to be made, that hyphen is proper for all purposes. Henry Spencer

Re: [I18n]Re: Li18nux Locale Name Guideline Public Review

2002-01-21 Thread Henry Spencer
. Henry Spencer

Re: [I18n]Re: Li18nux Locale Name Guideline Public Review

2002-01-21 Thread Henry Spencer
, people will use setenv... For small values of people. :-) Only the experts will. The experts presumably can get the case of a locale name right. Henry Spencer

Re: Fraktur

2002-01-11 Thread Henry Spencer
. :-( Henry Spencer

Re: Unicode, character ambiguities

2002-01-09 Thread Henry Spencer
at least a strong historical presence in Latin-alphabet texts, are unreadable to a lot of Latin-alphabet users, and were nevertheless unified. Henry Spencer

Re: Unicode, character ambiguities

2002-01-09 Thread Henry Spencer
and scholars -- people who *did* expect to have to deal with it on a day to day basis -- were involved in the design and implementation of Han unification. This was not some hideous Western plot foisted on Japan from abroad. Henry Spencer

Re: Unicode, character ambiguities

2002-01-09 Thread Henry Spencer
until afterward.) Henry Spencer

Re: Unicode, character ambiguities

2002-01-09 Thread Henry Spencer
on the comments of one person who clearly has strong opinions on the matter himself. Henry Spencer

Re: Free availability of ISO/IEC standards

2002-01-03 Thread Henry Spencer
small companies building IT hardware. It's not as easy for a small company to get into the hardware business as it used to be, but it is still feasible; moreover, encouraging this is important. Henry Spencer

Re: German keyboard apostrophe mix-up

2001-07-12 Thread Henry Spencer
. Henry Spencer

Re: character properties

2000-09-29 Thread Henry Spencer
nd tolower() was redefined to do that. (A stupid move -- they should have changed the name when they changed the behavior.) Henry Spencer

Re: utf-8 encoding scheme

2000-07-27 Thread Henry Spencer
not be the best way to do that. Henry Spencer

Re: utf-8 encoding scheme

2000-07-13 Thread Henry Spencer
value is unknown or unrepresentable in Unicode". That is, it marks the place where something untranslatable used to be. Henry Spencer
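
The placeholder role described here matches U+FFFD REPLACEMENT CHARACTER. As a hedged illustration, the sketch below converts bytes from a legacy character set and lets the decoder substitute U+FFFD for a byte that has no Unicode mapping. The function name and sample input are hypothetical; byte 0x81 is unassigned in Python's `cp1252` codec.

```python
# Sketch: mark untranslatable input with U+FFFD during conversion to Unicode.

def to_unicode_lossy(data: bytes, encoding: str) -> str:
    """Decode `data`, replacing anything untranslatable with U+FFFD."""
    return data.decode(encoding, errors="replace")


if __name__ == "__main__":
    converted = to_unicode_lossy(b"A\x81B", "cp1252")
    print(converted == "A\ufffdB")
```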

Re: utf-8 encoding scheme

2000-07-12 Thread Henry Spencer
of a decoder see no advantage from this behavior, since they are canonicalizing anyway. Henry Spencer