Re: Small Latin Letter m with Macron

2003-01-16 Thread Kenneth Whistler
Christoph Päper asked: I recently learned in news:de.etc.sprache.deutsch that there has been a tradition (in handwritten text more than in print) of writing mm as only one m with a macron above. I can't find any such character in Unicode, just U+1E3F and U+1E41. You could of course build
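Editorial sketch: the snippet breaks off at "You could of course build", so the following is only an assumption about where the reply was heading, namely a combining sequence of m plus U+0304 COMBINING MACRON. It uses Python's standard `unicodedata` module; the helper name `describe` is made up for illustration.

```python
import unicodedata

# Base letter m followed by U+0304 COMBINING MACRON (assumed construction).
m_macron = "m\u0304"

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# NFC can only compose when a precomposed character exists; for m with
# macron there is none, so the sequence is expected to stay two code points.
nfc = unicodedata.normalize("NFC", m_macron)

print(describe(m_macron))
print("code points after NFC:", len(nfc))

# The two precomposed m's mentioned in the question carry other marks.
print(describe("\u1e3f\u1e41"))
```

The last line prints the names of U+1E3F and U+1E41, which shows why neither fits the macron convention.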

Re: U+2047 double question mark collation

2003-01-15 Thread Kenneth Whistler
Vadim, I have a problem with creating collation key for U+2047 (double question mark). Explicit collation keys for this symbol is absent in allkeys.txt. allkeys.txt in the current version of the Unicode Collation Algorithm is based on the Unicode *3.1* repertoire. This can be seen in the

RE: h in Greek epigraphy

2002-12-20 Thread Kenneth Whistler
BTW, the introductory sentence on page 360 of TUS 3 seems strange. It says that IPA includes basic Latin letters and a number of Latin letters from other blocks and then puts four Greek letters in the list! Should this be changed to something like IPA includes basic Latin letters and a

Re: h in Greek epigraphy

2002-12-18 Thread Kenneth Whistler
My first answer to my correspondent was just use Roman h. That would be my suggestion, too. It is available now -- it matches current practice, and requires no further action. A program that was sorting text, or trying to determine what script a word was written in, would get confused by

RE: Precomposed Tibetan

2002-12-17 Thread Kenneth Whistler
Peter Lofting asked: Presumably the present proposal of 900+ stacks is a maturation of the same system. And the claim for universality is based on it being able to typeset everything they have published to date. It is based on the Founders system software, as Michael mentioned. The

RE: Precomposed Tibetan

2002-12-17 Thread Kenneth Whistler
Marco commented: Another key point, IMHO, is verifying the following claim contained in the proposal document: Tibetan BrdaRten characters are structure-stable characters widely used in education, publication, classics documentation including Tibetan medicine. The electronic data

Re: Localized names of character ranges

2002-12-03 Thread Kenneth Whistler
Doug, seconding a suggestion by Marco, wrote: I agree that a multilingual Unicode glossary should be assembled (possibly as a volunteer project) and officially endorsed by the Unicode Consortium, so users and vendors will be on common terminological ground. In general, I favor such an

Re: Default properties for PUA characters???

2002-12-02 Thread Kenneth Whistler
Christian Wittern asked: Leaving aside the red light that flashed in my head on the notion of the W3C recommending PUA (for interchange?), I was wondering about the notion of PUA characters being by Unicode defaults treated as ideographs. Is there a canonical reference for this? Just

Re: mixed-script writing systems

2002-11-26 Thread Kenneth Whistler
Dean Snyder asked: ... What it comes down to is the fact that for historic scripts in particular, there are no defined criteria that would enable us to simply *discover* the right answer regarding the identity of scripts. To a certain extent, the encoding committees need to make arbitrary

Re: ISO 10646, Unicode The FAQ (Bengali Khanda Ta)

2002-11-21 Thread Kenneth Whistler
Rick investigated, and came up with: In a specific case, Andy asked about Khanda Ta, and pointed to a WG2 resolution that contradicts the Unicode FAQ on the same topic. I looked up a paper listing an action item as follows, taken from document

Re: Lowercase numerals

2002-11-20 Thread Kenneth Whistler
Doug Ewell answered: Thomas Lotze thomas dot lotze at uni dash jena dot de wrote: Why is it that while there are both uppercase and lowercase roman numerals in the Unicode character set (in the Number Forms range), no lowercase arabic numerals (old-style or text figures) are encoded? If

Re: mixed-script writing systems

2002-11-18 Thread Kenneth Whistler
Andrew West wrote: On Mon, 18 Nov 2002 02:34:18 -0800 (PST), Kenneth Whistler wrote: In point of fact, people for centuries have been borrowing back and forth between Latin, Greek, and Cyrillic in particular, so that in some respects LGC is a kind of metascript and should be treated

Re: The result of the Plane 14 tag characters review

2002-11-18 Thread Kenneth Whistler
James Kass said: How do these differences apply to Unicode plain text and the Plane 14 tags? For example, it was noted that the ideographic full stop is centered in Chinese text but sits on the baseline (and isn't centered) in Japanese text. This claim about ideographic periods is untrue.

Re: The result of the Plane 14 tag characters review

2002-11-18 Thread Kenneth Whistler
Michael Everson asked: At 13:37 -0800 2002-11-18, Kenneth Whistler wrote: Go to any Japanese newspaper. There is no required change of typographic style when Chinese names and placenames are mentioned in the context of Japanese articles about China. Go to any Chinese newspaper

Re: The result of the Plane 14 tag characters review

2002-11-18 Thread Kenneth Whistler
This is completely comparable to the fact that my local English-language newspaper doesn't need a German language tag to write Gerhard Schroeder. How about a multilingual newspaper? What of a multilingual newspaper? Take a hypothetical instance of a German/English newspaper, which

Re: mixed-script writing systems

2002-11-15 Thread Kenneth Whistler
So, the question is this: Should we say that this writing system is completely Latin (keeping the norm that orthographic writing systems use a single script) and apply the principle of unification -- across languages but not across scripts -- to imply that we need to encode new characters,

Re: The result of the plane 14 tag characters review.

2002-11-12 Thread Kenneth Whistler
William Overington asked: As the Unicode Consortium invited public comments on the possible deprecation of plane 14 tag characters, will the Unicode Consortium be making a prompt public statement of the result of the review as soon as the present meeting of the Unicode Technical Committee is

Re: In defense of Plane 14 language tags (long)

2002-11-12 Thread Kenneth Whistler
David Hopwood said: Note that if deprecation implies no longer treating these characters as ignorables, It would not. The only character *property* implication that deprecation of Plane 14 language tags (or any other characters) would have is the requirement that they gain the Deprecated

Re: Names for UTF-8 with and without BOM

2002-11-01 Thread Kenneth Whistler
Perhaps it is time to think of three other words starting with B, O, M that make a better explanation.) Bollixed Operational Muddle ;-) --Ken

RE: New Charakter Proposal

2002-10-30 Thread Kenneth Whistler
Dominikus Scherkl replied to Markus: My other suggestion (and the main reason to call the proposed character source failure indicator symbol (SFIS)) was intended especially for malformed UTF-8 input that has overlong encodings. This is a special, custom form of error handling - why
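Editorial sketch: one way to see what "overlong" means in practice, assuming a decoder that rejects ill-formed input as Python's built-in UTF-8 codec does. The helper name `is_well_formed_utf8` is hypothetical, and this is not the SFIS proposal itself.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under Python's strict decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xC0 0xAF is an overlong two-byte form of U+002F SOLIDUS ("/").
overlong_slash = b"\xc0\xaf"

print(is_well_formed_utf8(b"/"))             # shortest form
print(is_well_formed_utf8(overlong_slash))   # overlong form

# Generic error handling substitutes U+FFFD REPLACEMENT CHARACTER
# rather than a dedicated failure-indicator symbol.
print(ascii(overlong_slash.decode("utf-8", errors="replace")))
```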

RE: Character identities

2002-10-29 Thread Kenneth Whistler
Michael asked: My eyes have glazed over reading this discussion. What am I being asked to agree with? Here's the executive summary for those without the time to plow through the longer exchange: Marco: It is o.k. (in a German-specific context) to display an umlaut as a macron (or a

Re: Character identities

2002-10-28 Thread Kenneth Whistler
Hm, what if I want to make, say, snow capped Devanagari glyphs for my hiking company in Nepal? Shouldn't I assign them to Unicode code points? That's what Private Use code positions are for. -- Michael Everson * * Everson Typography * * http://www.evertype.com Um, Michael, I think

Re: Origin of the term i18n

2002-10-14 Thread Kenneth Whistler
Raymond Mercier asked: Isn't i18n rather off-list ? Neither Sarasvati nor the self-styled list police have objected. While historical origin discussions are OT, they do seem to have an interested following on the Unicode list. Perhaps more to the point, Unicode implementations are all about

Re: Origin of the term i18n

2002-10-11 Thread Kenneth Whistler
Sorry to appear the curmudgeon, but ^^ recte: c8n --K1n

Re: Origin of the term i18n

2002-10-11 Thread Kenneth Whistler
Mark, Mark, I am curious why you find this term so distasteful? Is it the algorithm itself or just a general objection to acronyms and the like? Or something else entirely? I find this particular way of forming abbreviations particularly ugly and obscure. It is also usually unnecessary;

Re: Historians- what is origin of i18n, l10n, etc.?

2002-10-10 Thread Kenneth Whistler
W0e n3r u2d t1e g1d-a3l, g3y a1d o5e a10n i18n, h5r! What I don't understand, since these a10n's are in such widespread use among programmers and character encoders, is why they don't use h9l, as in i12n, lan, and gbn? --K1n BTW, these aan's are not only o5e, they are also o4e, but
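Editorial sketch of the abbreviation scheme being joked about here: keep the first and last letters and replace everything between them with a count. The function name `numeronym` and the `base` parameter are invented for illustration; the hexadecimal variant follows the "i12n, lan, and gbn" suggestion in the post.

```python
def numeronym(word: str, base: int = 10) -> str:
    """Abbreviate word as first letter + count of interior letters + last letter.

    base=16 writes the count in lowercase hexadecimal instead of decimal.
    """
    if len(word) < 3:
        return word  # nothing to elide
    count = len(word) - 2
    digits = format(count, "x") if base == 16 else str(count)
    return f"{word[0]}{digits}{word[-1]}"

for word in ("internationalization", "localization", "globalization"):
    print(word, numeronym(word), numeronym(word, base=16))

print(numeronym("Ken"), numeronym("hexadecimal"))
```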

Re: ISO 8859-11 (Thai) cross-mapping table

2002-10-07 Thread Kenneth Whistler
Elliotte Harold asked: The Unicode data files at http://www.unicode.org/Public/MAPPINGS/ISO8859/ do not include a mapping for ISO-8859-11, Thai. Is there any particular reason for this? Just that nobody got around to submitting and posting one. Since there was a lot of discussion about

Re: Sporadic Unicode revisited

2002-10-02 Thread Kenneth Whistler
Keld responded: On Wed, Oct 02, 2002 at 02:47:42PM -0400, John Cowan wrote: Mark Davis scripsit: Those mnemonics in (http://www.faqs.org/rfcs/rfc1345.html) are pretty useless in practice, as well as being misnamed. From Websters: assisting or intended to assist memory. So what

Re: Sporadic Unicode revisited

2002-10-02 Thread Kenneth Whistler
John Cowan responded to Rick: (BTW, I agree with Mark about those ISO 14755 [recte: RFC 1345] abbreviations... They aren't very mnemonic. Many people have the charts available, so there is no great advantage to using mnemonics over simply using numbers or palettes.) They are easy

Re: Permission to reproduce?

2002-10-01 Thread Kenneth Whistler
Martin Kochanski asked: I want to post a Cardbox database on our Web site (Cardbox is the database that we sell) that contains a list of all Unicode characters: hexadecimal code, decimal code, character, and character name (eg. GREEK CAPITAL LETTER OMEGA WITH TONOS). The first

Pound and Lira (was: Re: The Currency Symbol of China)

2002-09-30 Thread Kenneth Whistler
Marco Cimarosti scripsit: The same should be true for the £ sign. But unluckily, for some obscure reason, Unicode thinks that currencies called pound should have one bar and be encoded with U+00A3, while currencies called lira should have two bars and be encoded with U+20A4.

RE: The Currency Symbol of China

2002-09-30 Thread Kenneth Whistler
Barry Caplan wrote [further morphing this thread]: I also think (but I could be wrong) that ye is not one of the characters in the famous Buddhist poem that uses each of the kana once and only once, and establishes a de facto sorting order by virtue of being the only such poem. OTOH, I

Re: glyph selection for Unicode in browsers

2002-09-26 Thread Kenneth Whistler
Tex, 3) The language information used to be derived dubiously from code page and is missing with Unicode, and architecture needs to accommodate a better model for bringing language to font selection. The archetypal situation is for CJK, and in particular J, where language choice correlates

Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-26 Thread Kenneth Whistler
William Overington asked: While on the topic, how would the following sequence be displayed please? U+0074 U+0361 U+0073 ZWJ U+0307 Just like: U+0074 U+0361 U+0073 U+0307 The sequence U+0073, ZWJ, U+0307 could request a ligature of the s and the dot-above, but since it is unlikely that

Re: Keys. (derives from Re: Sequences of combining characters.)

2002-09-26 Thread Kenneth Whistler
Peter responded: A document would contain a sequence such as follows. U+2604 U+0302 U+20E3 12001 U+2460 London U+2604 U+0302 U+20E2 You could just as easily have used <S C=12001>London</S> or <S C=12001 P1=London/> or even: <cometcircumflex messageId=12001>London</cometcircumflex> if

Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-20 Thread Kenneth Whistler
Charles Cox suggested: Might there be a case for defining an invisible combining enclosing mark (ICEM), which is otherwise identical to the enclosing circle? Then, if I've understood the conventions correctly the sequence: U+0074 U+034F U+0073 ICEM U+0311 U+0307 would give ts with a

Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-20 Thread Kenneth Whistler
Peter said: This stuff *can* all be handled with appropriately designed ligations in fonts, so there are options for display: U+0074, U+0361, U+0073, U+0307 == maps via ligation table to: {t-s-tie-ligature-with-dot-above} glyph I would consider this an anomalous rendering.

Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-18 Thread Kenneth Whistler
William Overington asked: In the discussion about romanization of Cyrillic ligatures I asked how one expresses in Unicode the ts ligature with a dot above. Regarding Ken's response to the Byzantine legal codes matter, it would appear possible that the way that the ts ligature with a dot

Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-18 Thread Kenneth Whistler
The ALA-LC conventions are not the only alternatives available for representation of Abkhaz and/or Khanty/Mansi data in romanization. In fact, you can find such data on the web using alternative romanizations. So it isn't as if the current gap in figuring out precisely how, in Unicode, to

Re: French or German Unicode Names??

2002-09-17 Thread Kenneth Whistler
Ms. Hughes, ISO/IEC 10646-1:2000, which is exactly correlated with the Unicode Standard, Version 3.0, is available in French. You can purchase a copy from ISO: http://www.iso.ch/ (Go to the ISO Store section of the site and search for the ISO number 10646.) I don't know of any German

UTF-8 (was Re: Mercury News: Hawaiian on a Mac)

2002-09-05 Thread Kenneth Whistler
Markus Scherer responded: Stefan Persson wrote: This links to a different page on the same server: http://www.cl.cam.ac.uk/~mgk25/unicode.html That page contains a strange UTF-8 table: ... The last two byte sequences are invalid. Markus Kuhn's page shows the original ISO
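Editorial sketch: the table itself is elided in the snippet, so this assumes the "last two byte sequences" are the five- and six-byte forms from the original ISO definition of UTF-8. Under that assumption, Python's strict decoder can be used to check that such sequences are rejected and that the highest code point needs only four bytes.

```python
# Hypothetical sample sequences using the old 5- and 6-byte lead bytes.
five_byte = bytes([0xF8, 0x88, 0x80, 0x80, 0x80])
six_byte = bytes([0xFC, 0x84, 0x80, 0x80, 0x80, 0x80])

def decodes(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The largest Unicode code point, U+10FFFF, fits in four bytes.
longest = "\U0010FFFF".encode("utf-8")

print(len(longest), longest.hex())
print(decodes(five_byte), decodes(six_byte))
```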

Re: various stroked characters

2002-09-05 Thread Kenneth Whistler
Peter, Here's my take on your questions. The less clear cases involve b, d and g. 1) Lower case b with a horizontal stroke through the bowl (hereafter b-stroke-bowl) is used in some phonetic traditions for voiced bilabial fricative (beta, in IPA). The annotation for U+0180 (b with a

RE: Double Macrons on gh...

2002-08-30 Thread Kenneth Whistler
Robert Wheelock asked: Recently, I read some messages saying that there are 3 new double-wide overstruck accents proposed for Unicode: Umm. Well, they aren't double-wide and they aren't overstruck, and their names are not: 035D: double-wide breve 035E: double-wide macron 035F:

Re: Revised proposal for Missing character glyph

2002-08-26 Thread Kenneth Whistler
[Resend of a response which got eaten by the Unicode email during the system maintenance last week. Carl already responded to me on this, but others may not have seen what he was responding to. --Ken] Proposed unknown and missing character representation. This would be an alternate to method

Re: The Unicode Technical Committee meeting in Redmond, Washington State, USA.

2002-08-26 Thread Kenneth Whistler
William Overington inquired: As many readers may know, the Unicode Technical Committee was due to start a four day meeting yesterday at the Redmond, Washington State, USA campus of Microsoft, that is, on 20 August 2002. Here in England I am interested to know of what is happening and to

Re: The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)

2002-08-15 Thread Kenneth Whistler
An interesting point for consideration is as to whether the following sequence is permitted in interchanged documents. U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB That is, the annotated text is an object replacement character and the annotation is a caption for a

Re: Furigana

2002-08-14 Thread Kenneth Whistler
Doug (and Michael also): What if I *want* to design an annotation-aware rendering mechanism? Suppose I read Section 13.6 and decide that, instead of just throwing the annotation characters away, I should attempt to display them directly above (and smaller than) the normal text, the way

Re: Scripts in Unicode 4.0

2002-08-14 Thread Kenneth Whistler
John Hudson mused: Love the HOT BEVERAGE character, but where's the TALL LOWFAT SOYMILK MOCHA FRAPPUCCINO? Come on guys, there's enough blank spaces in that block for the entire Starbucks beverage menu, especially if you treat things like EXTRA FOAM as a combining character. Well,

Re: An idea for keeping U+FFFC usable. (spins off from Re: Furigana)

2002-08-14 Thread Kenneth Whistler
William Overington teased us all unmercifully with: It occurs to me that it is possible to introduce a convention, either as a matter included in the Unicode specification, or as just a known about thing, that if one has a plain text Unicode file with a file name that has some particular

Re: The mystery of Edwin U+1E9A

2002-08-14 Thread Kenneth Whistler
John Cowan asked: Where does this strange beast come from? Semitic transliteration practice, if I recall correctly. Its name is LATIN SMALL LETTER A WITH RIGHT HALF RING, and the right half ring is indeed above the a. We don't have a RIGHT HALF RING ABOVE combining mark, so it only gets

Re: Discrepancy between Names List Code Charts?

2002-08-14 Thread Kenneth Whistler
This is my first posting to this list so please be gentle with me! *pounces and begins to play with the little furry creature (gently)* Can someone help me with this confusion as I am unsure how I should structure these WITH CEDILLA characters in fonts I'm working on. See TUS 3.0, pp.

Re: Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-13 Thread Kenneth Whistler
James Kass asked: Please note that both the UTC and WG2 have approved a new set of combining double accents: U+035D COMBINING DOUBLE BREVE U+035E COMBINING DOUBLE MACRON U+035F COMBINING DOUBLE LOW LINE <snip> Now, the question is, how long will it take for the fonts and

Re: Furigana

2002-08-13 Thread Kenneth Whistler
I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application for markup. --Ken

Re: Furigana

2002-08-13 Thread Kenneth Whistler
Michael, At 14:16 -0700 2002-08-13, Kenneth Whistler wrote: I want to be able to send a Blissymbol string with a gloss in English or Swedish attached. Nothing to do with Japanese whatsoever. Basically, as for all things annotational or interlineating, this is an excellent application

Re: Furigana

2002-08-13 Thread Kenneth Whistler
Michael Everson (in training as a curmudgeon) harrumpfed ;-) The Japanese national body was very clear about this, and was opposed to these going into the standard unless such clarifications were made, to ensure that these were not intended for plain text interchange of furigana (or other

Re: Furigana

2002-08-13 Thread Kenneth Whistler
Tex asked: But does the standard address their removal by receivers (or intermediaries) , and does removing them include removing the contained annotation? Yes and yes. p. 326: On input, a plain text receiver should either preserve all characters

Re: Is U+0140 (l with middle dot) ever used?

2002-08-12 Thread Kenneth Whistler
Keld responded: On Fri, Aug 09, 2002 at 11:44:40PM +0100, Anto'nio Martins-Tuva'lkin wrote: Hm. But middle dot is not also a letter symbol. It's also used as a bullet, a tab filling, even a box-drawing char. Shouldn't Unicode provide a way to separate this duality? · has

Re: Furigana

2002-08-12 Thread Kenneth Whistler
Michael asked: At 12:11 -0700 2002-08-08, Kenneth Whistler wrote: Ah, but read the caveats carefully. The Unicode interlinear annotation characters are *not* intended for interchange, unlike the HTML4 ruby tag. See TUS 3.0, p. 326. They are, essentially, internal-use anchor points

Double Macrons on gh (was Re: Tildes on Vowels)

2002-08-12 Thread Kenneth Whistler
A propos of this long thread about display of combining macrons in Middle English, morphing from tildes on vowels: In Mozilla 2002072104, Windows XP, I get perfectly good overlines on yagh (now). I'd be interested in seeing how it looked with the combining macra. Please note that both

Re: Taboo Variants

2002-08-09 Thread Kenneth Whistler
Lest everyone go scrabbling off the deep end and drown on this particular thread, I would like to point out the following facts: U+2FDF IDEOGRAPHIC TABOO VARIATION INDICATOR was accepted by the UTC on April 30, 2002. However, when the proposal was taken into WG2 it met a wall of opposition led

Re: Furigana

2002-08-08 Thread Kenneth Whistler
Stefan wrote: Many Japanese word processors already have that capability. HTML4 has ruby tag exactly for that purpose. And Unicode has characters for that purpose, too. Unicode: U+FFF9 kanji U+FFFA furigana U+FFFB HTML4: <RUBY><RD> kanji </RD><RT> furigana </RT></RUBY>

Compatibility and Politics (was Re: Digraphs as Distinct Logical Units)

2002-08-08 Thread Kenneth Whistler
Roozbeh asked: Expecting the compatibility decompositions to serve this purpose effectively is overvaluing what they can actually do. I would love to hear your opinion about what compatibility decompositions *are* for, then. I feel a little confused here. They are helpful annotations to

Re: Digraphs as Distinct Logical Units

2002-08-02 Thread Kenneth Whistler
At 04:48 PM 02-08-02, Kenneth Whistler wrote: ... and some extreme case orthographies are known that employ up to *hepta*graphs! Ooo, I want one! Do you have any examples, Ken? If I recall correctly, that one was a technical orthography of Nama -- but I can't track down an online

Re: Missing character glyph- example

2002-08-01 Thread Kenneth Whistler
As a clarification, here is a sample web page: http://www.cardbox.com/missing.htm The requirement is to be able to display the first paragraph of the page in such a way that it makes sense in its reference to the text on the rest of the page. The character after the word this: in

Re: Missing character glyph

2002-07-31 Thread Kenneth Whistler
Asmus wrote: At 08:40 PM 7/30/02 -0700, Doug Ewell wrote: a code-point that has no character assigned to it (and is not likely to get one), e. g. U+03A2 No code point is safe. True enough. But then I figure Plane 13 characters like U+DEAD1 are pretty unlikely to be assigned to a
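Editorial sketch: a quick check of the two code points mentioned, using Python's `unicodedata`. The plane is the code point shifted right by 16 bits, and general category Cn means "unassigned" in whatever Unicode version the running Python ships with. The helper name `plane` is hypothetical.

```python
import unicodedata

def plane(code_point: int) -> int:
    """Return the plane number (0..16) that a code point belongs to."""
    return code_point >> 16

for cp in (0x03A2, 0xDEAD1):
    category = unicodedata.category(chr(cp))
    print(f"U+{cp:04X}: plane {plane(cp)}, category {category}")
```

As the thread says, a Cn result today is no promise about future versions.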

Re: REALLY *not* Tamil - changing scripts (long)

2002-07-29 Thread Kenneth Whistler
It's *much* easier -- and, in the long term, safer -- for them to select from the extensive inventory of characters available in Unicode and to avoid using ASCII punctuation characters with redefined word-building semantics. I don't get what you are saying here, why should people be

Re: REALLY *not* Tamil - changing scripts (long)

2002-07-29 Thread Kenneth Whistler
Keld wrote: In Linux, *Which* Linux? :-) Caldera OpenLinux, Corel Linux, Debian GNU/Linux, Elfstone Linux, Libranet Linux, Linux-Mandrake, Phat Linux, Red Hat Linux, Slackware Linux, Stampede GNU/Linux, Storm Linux, SuSE Linux, or TurboLinux? Or for that matter another dozen international

Re: (long) Making orthographies computer-ready (was *not* Telephoning Tamil)

2002-07-29 Thread Kenneth Whistler
One that occurs to me might be the Khoisan languages of Africa, which I believe commonly use ! (U+0021) for a click sound. This is almost exactly the same problem you are describing for Tongva. U+01C3 LATIN LETTER RETROFLEX CLICK (General Category Lo) was encoded precisely for this. It is
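Editorial sketch: why the choice between U+0021 and U+01C3 matters to software. It compares the general categories with Python's `unicodedata` and shows the effect on a simple word tokenizer. The sample word is illustrative only.

```python
import re
import unicodedata

EXCLAMATION = "\u0021"   # punctuation
CLICK = "\u01c3"         # LATIN LETTER RETROFLEX CLICK

for ch in (EXCLAMATION, CLICK):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

def words(text: str) -> list:
    """Split text into runs of word characters."""
    return re.findall(r"\w+", text)

# With the letter, the click stays inside the word; with the punctuation
# mark, the tokenizer drops it.
print(words(CLICK + "kung"))
print(words(EXCLAMATION + "kung"))
```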

God's and devil's details (was: Re: Unicode certification - quote correction and attribution)

2002-07-26 Thread Kenneth Whistler
[Tex Texin] Actually, (or so I have heard) it is God dwells in the details of our work, I have seen it attributed to Einstein, more generally to shakers, and others. So Ludwig might have been quoting others. [Ken Whistler] And the devil is in the details. Looking a bit at your

Re: God's and devil's details (was: Re: Unicode certification - quote correction and attribution)

2002-07-26 Thread Kenneth Whistler
The correct Einsteinian German appears to be: Der liebe Gott steckt im Detail (cf. http://www.benecke.com/einsteinprogramm.html) (and there are German alternatives such as Gott lebt im Detail) and the satanic alternate is: Der Teufel liegt im Detail (very common, actually, but maybe just

Re: Abstract character?

2002-07-23 Thread Kenneth Whistler
Following up on several responses on this thread. Mark Davis said: A small correction to Ken's message: The Unicode scalar value definitionally excludes D800..DFFF, which are only code unit values used in UTF-16, and which are not code points associated with any
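Editorial sketch of the distinction Mark Davis draws: a scalar value is any code point in 0..10FFFF except the surrogate range D800..DFFF. The function name `is_scalar_value` is made up for illustration.

```python
def is_scalar_value(code_point: int) -> bool:
    """Return True for code points that are Unicode scalar values."""
    in_codespace = 0 <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_codespace and not is_surrogate

for cp in (0x0041, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(f"{cp:06X}", is_scalar_value(cp))
```

Python's own UTF-8 encoder follows the same rule and refuses to encode a lone surrogate such as `chr(0xD800)`.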

Re: Abstract character?

2002-07-22 Thread Kenneth Whistler
Lars Marius Garshol asked: I'm trying to find out what an abstract character is. I've been looking at chapter 3 of Unicode 3.0, without really achieving enlightenment. The term Unicode scalar value (apparently synonymous with code point) seems clear. It is the identifying number assigned

Re: ISO/IEC 10646 versus Unicode

2002-07-18 Thread Kenneth Whistler
Marion Gunn wrote: The immediate attraction and great advantage of Unicode’s vision was its simplicity/focus: after an unsteady and argumentative start, its founders committed Unicode to the IMPLEMENTATION of 10646, and became very specific (loud) about not calling it a STANDARD (note to

Re: Basic question: types of diacritics marks

2002-07-18 Thread Kenneth Whistler
Adam asked: I have a very basic question. What would be the implementation differences of diacritics marks in a font? For example, we'd consider: U+00B4 acute accent U+02CA modifier letter acute accent U+0301 combining acute accent What are the common recommendations regarding the
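Editorial sketch: the character properties are one place where the three acute accents differ, and they can be listed with Python's `unicodedata`. This only reports properties; it does not cover the font-design recommendations Adam asks about.

```python
import unicodedata

ACUTES = ("\u00b4", "\u02ca", "\u0301")

def properties(ch: str) -> tuple:
    """Return (name, general category, canonical combining class) for ch."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

for ch in ACUTES:
    name, category, ccc = properties(ch)
    print(f"U+{ord(ch):04X}  {category}  ccc={ccc:<3}  {name}")
```

Only U+0301 is a nonspacing mark (Mn) with a nonzero combining class, so it is the only one of the three that attaches to a preceding base character.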

What Unicode Is (was RE: Inappropriate Proposals FAQ)

2002-07-12 Thread Kenneth Whistler
Suzanne responded: Maybe Unicode is more of a shared set of rules that apply to low level data structures surrounding text and its algorithms than a character set. Sounds like the start of a philosophical debate. If Unicode is described as a set of rules, we'll be in a world of

Re: Hmm, this evolved into an editorial when I wasn't looking :) was: RE: Inappropriate Proposals FAQ

2002-07-12 Thread Kenneth Whistler
Barry Caplan wrote: At 01:27 PM 7/11/2002 -0400, Suzanne M. Topping wrote: Unicode is a character set. Period. Each character has numerous properties in Unicode, whereas they generally don't in legacy character sets. Each character, or some characters? For all intents and

RE: Saying characters out loud (derives from hash, pound, octothorpe?)

2002-07-11 Thread Kenneth Whistler
Joe sent around a classic version of Waka waka bang splat, but my favorite is a slightly pared-down version set to music for a four-part round, lyrics by Fred Bremmer and Steve Kroese, music by Melissa D. Binde: http://www.roundsing.org/music/waka-waka.html where you can listen to it in its

Re: *Why* are precomposed characters required for backward compatibility?

2002-07-11 Thread Kenneth Whistler
Dan Oscarsson said: NFD should not be an extension of ASCII. There are several spacing accents in ASCII that should be decomposed just like the spacing accents in ISO 8859-1 are decomposed. All or none spacing accents should be decomposed. In addition to the usage clarifications made by
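Editorial sketch: Ken's reply is cut off here, so this only checks how Python's `unicodedata.normalize` treats the spacing accents in question. As far as the Unicode character data goes, the ISO 8859-1 spacing accents have compatibility decompositions only, so NFD leaves them alone and NFKD rewrites them, while the ASCII ones are untouched by either form. The helper `changed_by` is hypothetical.

```python
import unicodedata

SPACING_ACCENTS = {
    "ASCII": "^`",
    "ISO 8859-1": "\u00a8\u00af\u00b4\u00b8",
}

def changed_by(form: str, ch: str) -> bool:
    """Return True if normalizing ch to the given form alters it."""
    return unicodedata.normalize(form, ch) != ch

for charset, chars in SPACING_ACCENTS.items():
    for ch in chars:
        print(
            f"{charset:<10} U+{ord(ch):04X}",
            "NFD changes it:", changed_by("NFD", ch),
            "NFKD changes it:", changed_by("NFKD", ch),
        )
```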

Re: Variant selectors in Mongolian

2002-07-10 Thread Kenneth Whistler
Martin Heijdra asked: The statement For example, in languages employing the Mongolian script, sometimes a specific variant range of glyphs is needed for a specific textual purpose for which the range of generic glyphs is considered inappropriate could be taken to mean this solution.

Re: Definition of character: Exegesis of SC2 nomenclature

2002-07-10 Thread Kenneth Whistler
Martin Kochanski waxed exuberantly: I mention this because Unicode is the opposite of Procrustean. There is no finer antidote to gloom and cynicism than leafing through the Unicode Standard. In what other computing book could you find a phrase such as In good Latvian typography? Or:

Re: Variant selectors in Mongolian

2002-07-10 Thread Kenneth Whistler
John Hudson wrote: Mongolian variants *are* very confusing, and I'm not sure what the best way to describe them is. Part of the problem is that there is still some tension in the UTC regarding just how to define the effect of the variation selectors. Position A: A variation selector

Strange resemblances and weird sisters

2002-07-10 Thread Kenneth Whistler
Then there is the oft-cited Character Most Resembling a Line Break: MALAYALAM LETTER UU (U+0D0A) Then in Extension B there are many, many weird and wonderful candidates for strangest CJK characters. Some of my personal favorites include: U+26B99 U+20137 U+20572 U+2069C U+2696E With such

RE: Phaistos in ConScript

2002-07-09 Thread Kenneth Whistler
Michael, Ken. Thanks for your response. Hmm. I think I detect the invisible ironic smiley there. Thanks for broadcasting my private, poke-in-the-ribs response to you and Marco back to the public list. ;-) As I said, the original might (assuming a syllabic structure and assigning random

Re: Phaistos Disk

2002-07-09 Thread Kenneth Whistler
Michael, At 10:58 -0400 2002-07-05, Patrick Rourke wrote: There is also the question of what kind of text it represents: is it a prose text, is it a catalogue of items (the other Aegean scripts tempt one to suspect this), each item represented by an ideograph, etc.? Well if you look

Definition of character: Exegesis of SC2 nomenclature

2002-07-09 Thread Kenneth Whistler
One possibly interesting thing derived from the threads from hell is the notion that the definition of character offered in the various ISO JTC1/SC2 character encoding standards and TR's such as the Character-Glyph Model (TR 15825) may be leading people astray about what is appropriate to encode

Re: *Why* are precomposed characters required for backward compatibility?

2002-07-09 Thread Kenneth Whistler
David Hopwood wrote: Marco Cimarosti wrote: BTW, they always sold me that precomposed accented letters exist in Unicode only because of backward compatibility with existing standards. I don't get that argument. It is not difficult to round-trip convert between NFD and a non-Unicode

Re: Ending the Overington [debate]

2002-07-09 Thread Kenneth Whistler
David Hopwood responded to Michael Everson: people just keep saying that markup exists, as if the very existence of XML in some way precludes single code point colour codes and single code point formatting codes and so on. Yes, that is right. That is entirely right. No it isn't.

Re: Multiple encodings for 1 character

2002-07-08 Thread Kenneth Whistler
Theodore wrote: What is going to be done about the confusion generated from having multiple ways to encode the same character? For example, for filenames, OSX will encode an accented Roman letter one way, while for filenames Windows will encode it the other way. These kinds of
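Editorial sketch: the usual remedy for two spellings of the same accented letter is to normalize both sides before comparing. The example assumes one filename arrives precomposed and the other decomposed; the filenames and the helper `same_name` are invented for illustration.

```python
import unicodedata

precomposed = "caf\u00e9.txt"    # e-acute as a single code point
decomposed = "cafe\u0301.txt"    # e followed by U+0301 COMBINING ACUTE ACCENT

def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == decomposed)           # raw code point comparison
print(same_name(precomposed, decomposed))  # comparison after normalization
print(len(precomposed), len(decomposed))
```

Normalizing to NFD on both sides would work equally well; what matters is that both names go through the same form.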

Re: What's the difference between a composite and a combining sequence?

2002-07-08 Thread Kenneth Whistler
Theodore, http://www.unicode.org/unicode/reports/tr15/ mentions both composites and combining sequences. But it doesn't tell us the difference. I know what a combining sequence is. If I didn't know what a composite was, I'd guess it was the same thing as a combining sequence. See TUS

Re: FW: Inappropriate Proposals FAQ

2002-07-03 Thread Kenneth Whistler
Suzanne, Can people from the review committee give me some hard and fast rules for when something is thrown out? As Michael Everson indicated, the answer to this is probably not. However, perhaps the most important thing for serious script proposers to do, to see if what they are concerned

Re: (long) Re: Chromatic font research

2002-07-02 Thread Kenneth Whistler
[*groans in the audience*] I know, I know -- another contribution in the endless thread... In re: The Respectfully Experiment I used it as evidence that ideas about what should not be included in Unicode can change over a period of time as new scientific evidence is discovered. Having

Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

2002-07-01 Thread Kenneth Whistler
James Kass said: One problem with TR28 is that it is worded so that it appears to be in addition to earlier guidelines. It is. The way this works is as follows: The original decision about the ZWJ as request for ligation was documented in the Unicode 3.0.1 update notice. That documentation

Re: Chromatic font research

2002-06-26 Thread Kenneth Whistler
Philipp said: The most obvious and simple example for glyph colours with semantic meaning that I can think of appears to be encoding characters for national flags (something that might even be considered proposable). As *characters*? Why? What is this bug that people catch, which induces

Re: Hexadecimal characters.

2002-06-20 Thread Kenneth Whistler
At 03:03 AM 6/20/02 -0400, Tom Finch wrote: I wish to propose sixteen consecutive digits for the purpose of displaying hexadecimal values. [...] Has this been considered? [David Starner] I seem to recall that it has. The problem is, they're just new copies of old characters. An

Re: Chess symbols, ZWJ, Opentype and holly type ornaments.

2002-06-20 Thread Kenneth Whistler
In view of the fact that some people are unwilling to let my ideas be discussed in this forum upon their academic merit but simply use an ad hominem attack almost every time I post (before many people can have the chance to sit down and, if they wish, have a serious read of my ideas), when

Re: Hexadecimal characters.

2002-06-20 Thread Kenneth Whistler
Tom Finch said: Hmm, so representing Devanagari digits is more important than hexadecimal, which is used almost more than decimal on the web? I think you may be misconstruing the purpose of the character encoding here. If I want to represent the hexadecimal numbers 0x60DB 0x618A in email

Re: Chess symbols, ZWJ, Opentype and holly type ornaments.

2002-06-20 Thread Kenneth Whistler
IOW, brevity's wit's soul. Well-spoken, dear Polonius. But better to Adorn the soul of wit so briefly put to us. My liege, and madam, to expostulate What majesty should be, what duty is, Why day is day, night is night, and time is time. Were nothing but to waste night, day, and time.

Re: Q: How many enumerated characters in Unicode?

2002-06-04 Thread Kenneth Whistler
Adam asked: How many characters does the current version of the Unicode Standard enumerate? 95,156. BTW: I think this information would be useful if it were always included in the summary of each revision. Agreed. The total was listed in Unicode 3.1 (94,140), and you could get the

Re: Fixed position combining classes (Was: Combining class for Thai characters)

2002-06-03 Thread Kenneth Whistler
Peter, On 06/02/2002 05:40:05 AM Samphan Raruenrom wrote: My opinion is that they should have been simplified, but that setting the bulk of them to 0 was a mistake and creates some significant problems (which go a step beyond the questions you raise here). Can you elaborate on this?

RE: How is UTF8, UTF16 and UTF32 encoded?

2002-05-31 Thread Kenneth Whistler
Rick Cameron asked: The Unicode Standard 2.0 had a table in Appendix A that is, I think, just what you're asking for. I can't find this table in the online version of TUS 3.0 (it's not very useful that the online index gives page numbers, when there's no way to map a page number to the
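
As a rough illustration of the subject line (not a reproduction of the Appendix A table mentioned above), the following sketch uses Python's built-in codecs to show how one code point comes out in each encoding form. The function name `encoded_forms` and the two sample characters are assumptions chosen for the example:

```python
def encoded_forms(ch):
    """Return the hex bytes of one character in UTF-8, UTF-16 and UTF-32.

    Big-endian variants are used so that no byte order mark is emitted.
    """
    return {
        "UTF-8": ch.encode("utf-8").hex(),
        "UTF-16": ch.encode("utf-16-be").hex(),
        "UTF-32": ch.encode("utf-32-be").hex(),
    }


if __name__ == "__main__":
    # U+20AC is in the Basic Multilingual Plane; U+10400 needs a surrogate
    # pair in UTF-16 and four bytes in UTF-8.
    for ch in ("\u20ac", "\U00010400"):
        print("U+%04X" % ord(ch), encoded_forms(ch))
```

For U+10400 the UTF-16 form is the surrogate pair D801 DC00, while the UTF-32 form is simply the code point value padded to four bytes.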
