Re: UNICODE version of _T(x) macro

2010-11-22 Thread Kenneth Whistler
Somya asked: I have unicode C application. I am using the following macro to define my string to 2 byte width characters. #ifdef UNICODE #define _T(x) L##x But I see that GCC compiler maps 'L' to wchar_t, which is 4 byte on Linux. I have used -fshort-wchar option on Linux but I

RE: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Kenneth Whistler
FA47 is a compatibility character, and would have a compatibility mapping. Faulty syllogism. FA47 is a CJK Compatibility character, which means it was encoded for compatibility purposes -- in this case to cover the round-trip mapping needed for JIS X 0213. However, it has a *canonical*
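The point made in this message can be checked against the Unicode Character Database. The following is a minimal sketch, assuming Python's standard `unicodedata` module (whose data version depends on the Python build). A compatibility mapping would carry a tag such as `<compat>`, while a canonical mapping is a bare list of code points, so canonical normalization (NFD) already replaces U+FA47.

```python
import unicodedata

cp = "\uFA47"  # CJK COMPATIBILITY IDEOGRAPH-FA47

# Compatibility mappings are tagged (e.g. "<compat> ..."); canonical ones are not.
decomp = unicodedata.decomposition(cp)
is_canonical = bool(decomp) and not decomp.startswith("<")

# Canonical mappings are applied by NFD, so the character does not survive it.
nfd = unicodedata.normalize("NFD", cp)

print(decomp, is_canonical, [f"U+{ord(c):04X}" for c in nfd])
```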

CJK Compatibility Gotchas (was: Re: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Kenneth Whistler
Asmus replied: On 11/15/2010 2:24 PM, Kenneth Whistler wrote: FA47 is a compatibility character, and would have a compatibility mapping. Faulty syllogism. Formally correct answer but only because of something of a design flaw in Unicode. When the type of mapping was decided

Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

2010-11-10 Thread Kenneth Whistler
Mark Davis wrote: What are also tricky are the 'almost' supersets, where there are only a few different characters. Those definitely cause problems because the difference in data is almost undetectable. For example, Mark is referring to cases such as ISO 8859-1 and 8859-15. Those share all
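To illustrate how small the difference between these two charsets is, here is a short sketch that assumes Python's built-in `iso8859-1` and `iso8859-15` codecs. It lists every byte value that decodes to a different character in the two encodings.

```python
# Collect the byte values that decode differently in the two charsets.
diffs = []
for b in range(0x100):
    raw = bytes([b])
    latin1 = raw.decode("iso8859-1")
    latin9 = raw.decode("iso8859-15")
    if latin1 != latin9:
        diffs.append((b, latin1, latin9))

for b, l1, l9 in diffs:
    print(f"0x{b:02X}: U+{ord(l1):04X} vs U+{ord(l9):04X}")
```

Only a handful of positions differ (for example 0xA4, which is the currency sign in 8859-1 and the euro sign in 8859-15), which is why mislabelled data is so hard to detect.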

Re: IDNA2008 Contextual rules clarification

2010-10-29 Thread Kenneth Whistler
Nagesh Chigurupati asked: I have a question regarding some of the contextual rules in RFC5892. For example the contextual rule in appendix A.4 Greek Lower Numeral Sign (U+0375), states the following: If Script(After(cp)) .eq. Greek Then True; If the Greek Lower Numeral Sign (U+0375) is

RE: Is there any unambiguous vowel length mark code point for classicists?

2010-10-27 Thread Kenneth Whistler
Gy. Dobner asked: But my original question was not how to encode a combining macron in one more possible way but how to encode a length mark that would display as something _visually_ _distinguishable_ _from_ _a_ _macron_ (because the macron is functionally ambiguous and hence unsuitable for

Re: Creative people on Twitter

2010-10-14 Thread Kenneth Whistler
What is the position regarding the 32-bit code point space above U+10FFFF please? Does the Unicode Consortium and/or ISO or indeed anyone else make any claims upon it? Yes, the claim is that if you use it, you're generating invalid Unicode. Don't do it, don't contemplate it,
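As a small illustration of the limit under discussion: the Unicode codespace ends at U+10FFFF, and conformant implementations reject anything above it. The sketch below uses a hypothetical helper, `is_code_point`, and Python's built-in `chr()`, which enforces the same bound.

```python
MAX_CODE_POINT = 0x10FFFF  # last code point in the Unicode codespace

def is_code_point(value: int) -> bool:
    """Hypothetical helper: True if value lies inside the Unicode codespace."""
    return 0 <= value <= MAX_CODE_POINT

# Python's chr() refuses values beyond the codespace.
try:
    chr(0x110000)
    rejected = False
except ValueError:
    rejected = True

print(is_code_point(0x10FFFF), is_code_point(0x110000), rejected)
```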

Re: Irrational numeric values in TUS

2010-10-12 Thread Kenneth Whistler
Karl Williamson asked: The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? No. Primarily it is because the Unicode Standard is a *character* encoding standard, and not a standard for

Re: Irrational numeric values in TUS

2010-10-12 Thread Kenneth Whistler
Asmus, I'm curious if any thought was given to this, and what code points I'm missing in my analysis. U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN SMALL LETTER E), also used for Euler's number. See also U+2147. Now you are confusing Euler's constant - also depicted with

Re: [OT]: a strange language name abbreviation (was: How to encode reversed section sign?)

2010-08-06 Thread Kenneth Whistler
Exploring the dictionary with the search engine (which is operational since today morning ...) I discovered two occurrences of an unexplained abbreviation which refers to a language in which silvir means silver and ses means six. The name of the language is abbreviated as Kimr. Any ideas

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-05 Thread Kenneth Whistler
I am thinking of where a poet might specify an ending version of a glyph at the end of the last word on some lines, yet not on others, for poetic effect. I think that it would be good if one could specify that in plain text. Why can't a poet find a poetic means of doing that, instead of

Re: Signage

2010-08-04 Thread Kenneth Whistler
But an approach that abstracts the name, then tries to re-imagine a representation from scratch is, in my view, very much misguided. Recall that many of the emojis 1) have changed glyphs quite a lot from the source glyphs, and 2) are to quite an extent defined from the *source*

Re: UTS#10 (UCA) 7.1.3 Implicit Weights, Unassigned and Other Code Points

2010-08-04 Thread Kenneth Whistler
That statement is incorrect. The UCA currently specifies that ill-formed code unit sequences and *noncharacters* are mapped to [....], but unassigned code points are not. This is exactly equivalent: if you use strength level 3, they are both [...], ... You have

Re: Results of public Review Issues (in particular #121)

2010-08-03 Thread Kenneth Whistler
Martin, In a discussion about a new protocol, there was some issue about how to replace illegal bytes in UTF-8 with U+FFFD. That let me remember that there was once a Public Review Issue about this, and that as a result, I added something to the Ruby (programming language) codebase. I
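For readers unfamiliar with the practice being discussed, here is a minimal sketch of replacing an illegal byte with U+FFFD, assuming Python's built-in UTF-8 decoder with `errors="replace"`. It deliberately uses a single stray byte; how many U+FFFD characters to emit for longer ill-formed sequences is a policy question, and this sketch makes no claim about what the Public Review Issue recommended.

```python
# 0xFF can never appear in well-formed UTF-8.
data = b"ab\xffcd"

# errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for the bad byte.
text = data.decode("utf-8", errors="replace")

print([f"U+{ord(c):04X}" for c in text])
```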

Re: UTS#10 (UCA) 7.1.3 Implicit Weights, Unassigned and Other Code Points

2010-08-02 Thread Kenneth Whistler
Philippe Verdy said: Implicit weights for unassigned code points and other characters that are NOT ill-formed are suboptimal, as noted in the proposed update. To follow up on Mark's response on this thread... It should take into account their existing default properties, notably : [ long

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-30 Thread Kenneth Whistler
Frédéric Grosshans asked: Why did you chose the fleur words ? The question discussed about the accent do not seem to arise here. I was struck by the issues about space, hyphen (or lack thereof) and alternate spellings that could be illustrated by that stretch of topics, so used that as the

Re: [ISO15924] Typo for Egyptian_Hierog(l)yphs

2010-07-29 Thread Kenneth Whistler
Philippe Verdy noted: Everywhere below, the Unicode property value alias is missing an 'l'. - In HTML table 1: Egyp 050 Egyptian hieroglyphs hiéroglyphes égyptiens Egyptian_Hierogyphs 2009-06-01 etc. These errors in the tables have been corrected by the Registration

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread Kenneth Whistler
C. E. Whitehead said: I've not gone through many character charts though so I can't really speak as an expert as you all can; sorry I've not gotten to more; I will try to ... For people who wish to pursue this issue further, the relevant information is neatly summarized in the extracted

Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Kenneth Whistler
Karl Williamson asked: Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does? They are U+2107 and U+210E respectively. Because U+210E PLANCK CONSTANT is, to quote the standard, simply a mathematical italic h. It serves as the filler for the gap in the run of
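The description of U+210E as "simply a mathematical italic h" is reflected in its decomposition data. A small sketch, assuming Python's standard `unicodedata` module (which does not expose the Math property itself, so only the mapping is shown):

```python
import unicodedata

planck = "\u210E"  # PLANCK CONSTANT

# The "<font>" tag marks a font variant of an ordinary letter.
decomp = unicodedata.decomposition(planck)

# Compatibility normalization folds it back to the plain letter.
folded = unicodedata.normalize("NFKC", planck)

print(unicodedata.name(planck), "|", decomp, "|", folded)
```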

Re: VS characters, default ignorable property and text search and collation

2010-07-26 Thread Kenneth Whistler
Sharma asked: I have a question about VS characters and the default ignorable property. TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable. Ch 5.21 states that default ignorable characters are to be ignored in rendering (except in specialized modes which show hidden

Re: Indian Rupee Sign (U+20B9) proposal

2010-07-22 Thread Kenneth Whistler
On this date, Unicode had received proposals for same purpose from non-insiders too -- as you know this is true because India is a nation of over a billion populations. I have seen no other proposals to encode the character, submitted either to the UTC or to WG2. Actually, there has

Re: Pau Cin Hau scripts proposal : confusive N3865 and better older N3781

2010-07-20 Thread Kenneth Whistler
Philippe Verdy said: A side note about this preliminary proposal for allocating blocks in the SMP for the two Pau Cin Hau scripts (including one for the large logographic script, with 1050 signs): http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3865.pdf (authored by Anshuman Pandey, in MIT) If

Re: Bengali Script

2010-07-13 Thread Kenneth Whistler
So what do we do with all these names? Can't we ask Mark to use a lottery to pick one and go from there? ... So whaddya say, Mark? Have a go at the roulette wheel? Ladies and gentlemen... step right up and place your bets!! Bengali, Bangla, Bengalese, Bangladeshi, Bengalian, Bengalish,

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy said: If we don't limit the backwards reordering, then all accents in the full sentences will be reordered, so this is the final word that will drive the order. not only this is incorrect, I understand that you think that the ordering should be done word-by-word, with the

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy wrote: Kenneth Whistler k...@sybase.com wrote: Huh? That is just preprocessing to delete portions of strings before calculating keys. If you want to do so, be my guest, but building in arbitrary rules of content suppression into the UCA algorithm itself is a non-starter

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-07 Thread Kenneth Whistler
[ snipping all the word breaking discussion, which I am not going to comment on ... ] CE Whitehead said: I collate as follows (note that i' is equivalent to i with accent grave): (EXAMPLE 1 -- my sort) di Silva, Fred, di Silva, John di Si'lva, Fred di Si'lva, John Disilva, Fred

Re: Keying emoji characters using an ordinary keyboard (from Re: ASCII emoji in iOS4)

2010-06-30 Thread Kenneth Whistler
William Overington asked: Will the Unicode Standard version 6.0 include mention of the unification of characters from the emoji set used in mobile telephones with earlier Unicode characters, also including a list of those characters of the emoji set that have been unified and where to

Euro Sign in 8859-15 (was: Re: Indian Rupee Sign to be chosen today)

2010-06-25 Thread Kenneth Whistler
On Fri, 25 Jun 2010, I wrote Even in the year 2010, the euro sign (¤) doesn't work reliably. in both the Unicode list and in the newsgroup de.test. unicode.org shows a euro sign: http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html groups.google.com shows a currency

Re: Latin Script

2010-06-16 Thread Kenneth Whistler
John - If I define a symbol (variable or constant) named ɸ and some user types 'φ' or 'ϕ' instead, it won't match. Can you please post the names for the other two, i.e., 'φ' or 'ϕ' ? John was referring to: U+0278 LATIN SMALL LETTER PHI U+03C6 GREEK SMALL LETTER PHI U+03D5 GREEK PHI
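The mismatch described here is easy to reproduce. The sketch below, assuming Python's standard `unicodedata` module, prints the three characters' names and shows that compatibility normalization (NFKC) folds only U+03D5 onto U+03C6, while the Latin phi stays distinct.

```python
import unicodedata

phis = ["\u0278", "\u03C6", "\u03D5"]

for ch in phis:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFKC maps the Greek phi symbol to the ordinary Greek small phi,
# but leaves LATIN SMALL LETTER PHI untouched.
folded = [unicodedata.normalize("NFKC", ch) for ch in phis]
print([f"U+{ord(ch):04X}" for ch in folded])
```

An identifier comparison that normalizes with NFKC would therefore treat two of the three as the same symbol, but still not all three.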

Re: Writing a proposal for an unusual script: SignWriting

2010-06-11 Thread Kenneth Whistler
Steve, All of this writing can be encoded using 1280 code points. I have a 12-bit encoding with bi-directional conversion with UTF-8 working for planes 1, 15, or 16. A minor point, but I suggest you not use bi-directional in that context. Bidirectional is a term of art in Unicode

Re: Hexadecimal digits

2010-06-04 Thread Kenneth Whistler
On Friday 04 June 2010 08:51:05 am Otto Stolz wrote: In any case, you have to know the base of every number you are going to parse. This stems from the fact that the same digits are used for all number systems. Luke-Jr replied: But you first need to know if it is a number or a word.

Re: Hexadecimal digits

2010-06-04 Thread Kenneth Whistler
But again, I'm not talking about programming. My four year old can grasp tonal just as well as she could decimal had I been teaching that. Now if I were using the a-f notation, she would be (reasonably) confused as to why *some* numbers are unique, but *other* numbers are also letters.

RE: Greek letter LAMDA?

2010-06-02 Thread Kenneth Whistler
Note that as of 1993, the only LAMDA or LAMBDA characters in the standard were: 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER LAMBDA;;;03BB; 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER LAMBDA;;039B;;039B 019B;LATIN SMALL LETTER LAMBDA WITH
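The names quoted from the data file can still be looked up today. A minimal sketch, assuming Python's standard `unicodedata` module:

```python
import unicodedata

# The two Greek letters use the LAMDA spelling; the Latin letter uses LAMBDA.
for cp in (0x039B, 0x03BB, 0x019B):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

greek_names = [unicodedata.name(chr(cp)) for cp in (0x039B, 0x03BB)]
latin_name = unicodedata.name(chr(0x019B))
```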

Tengwar and Cirth (was: Re: A question about user areas)

2010-06-02 Thread Kenneth Whistler
I'm not sure how much longer we should continue to wait for Tengwar and Cirth. Three words: Squeaky wheel -- grease. Don't expect this to just happen. The corporate members of the Unicode Consortium are mostly concerned about economically significant sets of characters that impact their

Re: Greek letter LAMDA?

2010-06-01 Thread Kenneth Whistler
John Dlugosz asked: Why does the code chart call the plain Greek letter (upper and lower case) LAMDA rather than LAMBDA? Because ISO 8859-7 called it LAMDA rather than LAMBDA. Note that Unicode 1.0 called it LAMBDA, but synchronization of names for Unicode 1.1 (in 1993) was towards ISO

Re: Greek letter LAMDA?

2010-06-01 Thread Kenneth Whistler
Robert Abel noted: It seems U+019B is the only instance where lambda is used. All other instances use lamda. So it seems the slip-up is the other way around, whatever the initial reasoning for using lamda was. It was not a slip-up. It was deliberate at the time (1993). Note that as of

RE: Greek letter LAMDA?

2010-06-01 Thread Kenneth Whistler
Why not? I thought the names of some things have changed between versions, and other database items have changed substantially. See Name Stability on the Unicode Character Encoding Stability Policy page: http://www.unicode.org/policies/stability_policy.html --Ken Names sometimes don't

RE: Roundtripping in Unicode

2004-12-14 Thread Kenneth Whistler
Lars said: According to UTC, you need to keep processing the UNIX filenames as BINARY data. And, also according to UTC, any UTF-8 function is allowed to reject invalid sequences. Basically, you are not supposed to use strcpy to process filenames. This is a very misleading set of statements.

RE: Roundtripping in Unicode

2004-12-13 Thread Kenneth Whistler
Lars Kristan stated: I said, the choice is yours. My proposal does not prevent you from doing it your way. You don't need to change anything and it will still work the way it worked before. OK? I just want 128 codepoints so I can make my own choice. You have them: U+EE80..U+EEFF, which are

Re: US-ASCII (was: Re: Invalid UTF-8 sequences)

2004-12-10 Thread Kenneth Whistler
If any criticism was present, it referred to the redundant US- prefix in US-ASCII, not to Unicode, and even that wasn't really criticism, just my lack of understanding /why/. In addition to Doug's historical clarification, you need to understand this as a perfectly normal linguistic

Re: US-ASCII (was: Re: Invalid UTF-8 sequences)

2004-12-10 Thread Kenneth Whistler
Tim Greenwood asked: ... a perfectly normal linguistic process of attributive disambiguation of a term which had grown ambiguous in usage. Is that like the 'Please RSVP' that I see all too often? Or should that not be excused? *grins* Well, technically, that is not a case of

Re: Please RSVP... (was: US-ASCII)

2004-12-10 Thread Kenneth Whistler
Philippe, RSVP is a French acronym for Répondez, s'il vous plait. Yes, we know that. But it is also a reanalyzed English verb which means reply to a message (or invitation). That it has been morphologically reanalyzed is demonstrated by the fact that it takes regular English verb endings, as

Re: Roadmapped scripts

2004-12-09 Thread Kenneth Whistler
Peter Kirk noted: I was reviewing the Roadmap for the SMP (http://www.unicode.org/roadmaps/smp/), in comparison with the list of proposed new scripts, and found a few anomalies. Hittite (Anatolian) Hieroglyphs/Luvian is listed as a proposed new script, with a draft proposal, but seems

Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Kenneth Whistler
John Cowan responded: Storage of UNIX filenames on Windows databases, for example, ^^ O.k., I just quoted this back from the original email, but it really is a complete misconception of the issue for databases. Windows databases is a

Re: Nicest UTF

2004-12-08 Thread Kenneth Whistler
Marcin asked: The general trouble is that numeric character references can only encode individual code points By design. rather than graphemes (is this a correct term for a non-combining code point with a sequence of combining code points?). No. The correct term is combining character

RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Kenneth Whistler
Lars responded: ... Whatever the solutions for representation of corrupt data bytes or uninterpreted data bytes on conversion to Unicode may be, that is irrelevant to the concerns on whether an application is using UTF-8 or UTF-16 or UTF-32. The important fact is that if you have an

Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Kenneth Whistler
Philippe stated, and I need to correct: UTF-24 already exists as an encoding form (it is identical to UTF-32), if you just consider that encoding forms just need to be able to represent a valid code range within a single code unit. This is false. Unicode encoding forms exist by virtue of

Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Kenneth Whistler
Philippe continued: As if Unicode had to be bound on architectural constraints such as the requirement of representing code units (which are architectural for a system) only as 16-bit or 32-bit units, Yes, it does. By definition. In the standard. ignoring the fact that technologies do

RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Kenneth Whistler
Lars, I'm going to step in here, because this argument seems to be generating more heat than light. I never said it doesn't violate any existing rules. Stating that it does, doesn't help a bit. Rules can be changed. I ask you to step back and try to see the big picture. First, I'm going to

Re: Relationship between Unicode and 10646

2004-11-30 Thread Kenneth Whistler
John Cowan clarified the JTC1 process: The result of a no vote is that the process loops until all such votes are resolved. All comments on a formal JTC1 ballot receive a *disposition*. As far as possible, that disposition is done by committee consensus, which usually means, in practice, the

Re: Relationship between Unicode and 10646

2004-11-30 Thread Kenneth Whistler
Peter, This was in fact my question: will the amendment be passed automatically if there is a majority in favour, or does it go back for further discussion until a consensus is reached? You have clarified that the latter is true. And I am glad to hear it. The relevant applicable clauses

Re: CGJ , RLM

2004-11-29 Thread Kenneth Whistler
Otto Stolz asked: In German, however, a ligature must not span a syllable break. How should I code plain text, w.r.t. hyphenation and ligatures? - Huf + ZWNJ + lattich - Huf + SYH + lattich - Huf + SYH + ZWNJ + lattich - Huf + ZWNJ + SYH + lattich You should code it as: Huflattich

Dutch malarkey (was: Re: (base as a combing char))

2004-11-29 Thread Kenneth Whistler
Philippe Verdy responded to John Cowan: From: John Cowan [EMAIL PROTECTED] the need to encode Dutch ij as a single character, which is neither necessary nor practical. (U+0132 and U+0133 are encoded for compatibility only.) In cases where ij is a digraph in Dutch text, i+ZWNJ+j will be

Re: CGJ , RLM

2004-11-29 Thread Kenneth Whistler
Mark Davis said (in reference to a long set of comments by Philippe Verdy on this thread): The statements below are incorrect And Philippe asked: Which statements? My message is mostly a read as a question, not as an affirmation... And I will attempt the fact-finding... CGJ is a

Re: Ideograph?!?

2004-11-29 Thread Kenneth Whistler
Michael Norton (a.k.a. Flarn) asked: What's an ideograph? Also, what's a radical? Are they the same thing? No, they aren't. In the Unicode context, the simplest answer is that an ideograph or a CJK ideograph is simply to be taken as a synonym for a Chinese character. A radical is one of a

Re: No Invisible Character - NBSP at the start of a word

2004-11-29 Thread Kenneth Whistler
John Hudson responded to Jony Rosenne: The idea that the position of such text on a page -- as a marginal note -- somehow demotes it from being text, is particularly nonsensical. I think you two (Jony and John) are talking at cross-purposes on this particular point. The *content* of

RE: Ideograph?!?

2004-11-29 Thread Kenneth Whistler
Allen Haaheim provided some further detailed clarification: Note that Han characters are logographic, not ideographic. That is, they are graphemes that represent words (or at least morphemes), not ideas. This correctly states the situation for the normal case for Chinese characters used

RE: Question on Canonical equivilance

2004-11-24 Thread Kenneth Whistler
Tim Greenwood asked: All of the spacing combining marks (general category Mc) except musical symbols have a canonical combining class of 0. So, for example 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on the right)
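The property values mentioned in the question can be inspected directly. The following sketch assumes Python's standard `unicodedata` module. Because both vowel signs have canonical combining class 0, canonical reordering never swaps them, so the two possible orders are not canonically equivalent.

```python
import unicodedata

ka, ee, aa = "\u0B95", "\u0BC7", "\u0BBE"

for ch in (ee, aa):
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.combining(ch))

# With ccc=0 on both marks, NFD leaves each sequence in the order it was typed.
order1 = unicodedata.normalize("NFD", ka + ee + aa)
order2 = unicodedata.normalize("NFD", ka + aa + ee)
print(order1 == order2)
```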

Re: My Querry

2004-11-23 Thread Kenneth Whistler
Harshal Trivedi asked: How can i make sure that UTF-8 format string has terminated while encoding it, as compared to C program string which ends with '\0' (NULL) character? You don't need to do anything special at all when using UTF-8 in C programs, as far as string termination goes. UTF-8
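The reason no special handling is needed is that UTF-8 never produces a zero byte except for U+0000 itself. A small sketch of that property, written in Python rather than C for brevity; the helper name `utf8_contains_nul` and the sample characters are illustrative assumptions.

```python
samples = ["A", "\u00E9", "\u0915", "\u6F22", "\U0001F600"]

def utf8_contains_nul(text: str) -> bool:
    """Hypothetical helper: True if the UTF-8 form of text has a 0x00 byte."""
    return 0 in text.encode("utf-8")

# None of the multi-byte encodings contain 0x00 ...
print([utf8_contains_nul(s) for s in samples])

# ... only U+0000 itself encodes to a zero byte.
print(utf8_contains_nul("\u0000"))
```

A C string holding UTF-8 therefore still ends at the first `'\0'`, exactly as an ASCII string does.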

About Encoding Theory (was: Re: Again not about Phoenician)

2004-11-08 Thread Kenneth Whistler
Peter Kirk suggested: I am suggesting that the best way to get the job done properly is to lay the conceptual foundation properly first, instead of trying to build a structure on a foundation which doesn't match... Part of the problem that I think some people are having here, including Peter,

Re: not font designers?

2004-11-03 Thread Kenneth Whistler
Elaine Keown asked: Supposedly this list has 600 people. Just of curiosity, how many of you are NOT font designers? And since a number of people are declaring their backgrounds, I'll chime in, too. ;-) I am not a font designer, although I have designed fonts (many years ago) for

Re: Public Review Issues Update

2004-10-21 Thread Kenneth Whistler
Theo, Further following up from what Mark Davis responded... Mark Davis wrote: All comments are reviewed at the next UTC meeting. Due to the volume, we don't reply to each and every one what the disposition was. If actions were taken, they are recorded in the minutes of the meetings.

Re: June Hebrew ?

2004-10-15 Thread Kenneth Whistler
Elaine, [Feel free to forward this on to the Hebrew lists you copied on your original inquiry, if you think it appropriate.] Peter Constable replied on the Unicode list: Which items? There were three at the June meeting: - atnah hafukh - lower dot and nun hafukha - qamats qatan

Re: outside decomposed, inside precomposed

2004-10-13 Thread Kenneth Whistler
Jon Hanna wrote: imported UTF-8 sequences like [U+0065][U+0303] e, tilde get remapped internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE. Is this kind of behavior what one would expect? That's conformant, if it causes problems with any other process (including other
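The remapping described in the question is what normalization to NFC does. A minimal sketch, assuming Python's standard `unicodedata` module:

```python
import unicodedata

decomposed = "e\u0303"  # LATIN SMALL LETTER E + COMBINING TILDE

# NFC composes the pair into the single precomposed character.
composed = unicodedata.normalize("NFC", decomposed)
print(f"U+{ord(composed):04X}", unicodedata.name(composed))

# NFD reverses it, so the two forms are canonically equivalent.
round_trip = unicodedata.normalize("NFD", composed)
```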

Re: Sample of german -burg abbreviature

2004-10-01 Thread Kenneth Whistler
At 06:04 PM 9/30/2004, Michael Everson wrote: see no reason given for us not to unify the handwritten symbol we have seen with BREVE ABOVE. and Asmus responded: Functionally, the symbol is not a breve. Visually, the sample does not look like a standard breve, and the font resource

RE: Saudi-Arabian Copyright sign

2004-09-21 Thread Kenneth Whistler
Kent wrote: Kenneth Whistler wrote: Second, there is the question of cursive joining for Arabic. I don't know anything in the Unicode Standard that states that a combining enclosing mark breaks cursive ligation. It stands to reason that it *should*, but I don't know anything

RE: Saudi-Arabian Copyright sign

2004-09-20 Thread Kenneth Whistler
Asmus responded: It's a simple combining character. Even if you can't do arbitrary circles around characters, you can take one character sequence and map it to the glyph in a font. Systems that can't do even that need to be fixed. In other words, you would like to treat this as a mandatory

Re: Unicode Shorthand?

2004-09-20 Thread Kenneth Whistler
Michael Everson responded to Christopher Fynn's question: At 13:46 +0100 2004-09-19, Christopher Fynn wrote: So, am I right in assuming that were someone put together a decent proposal for one or more shorthand scripts, there is no particular reason in principle why it would be rejected?

Unicode shorthand? Background

2004-09-20 Thread Kenneth Whistler
Incidentally, for those interested, the website of the National Court Reporters Association has a brief history of shorthand (skewed of course to the English language-based developments): http://www.ncraonline.org/about/history/shorthand.shtml A summary of the development of the Stenograph

Re: Unibook 4.0.1 available

2004-09-17 Thread Kenneth Whistler
Philippe waxed lyrical about the advantages of platform-independent development: Isn't Java hiding most of these platform details, by providing unified support for platform-specific look and feel? Aren't there now many PLAF and themes manager available with automatic default selection of the

Re: Historic scripts for Albanian: Elsaban and Beitha Kukju

2004-09-16 Thread Kenneth Whistler
Philippe asked: http://www.omniglot.com/writing/albanian.htm shows two historic scripts that have been used to write Albanian (Shqip): - the Elsaban script in the 18th century, which looks like Old Greek for the language Tosk variant. However there are lots of unique letter forms, and

Re: Japanese pitch accent representations

2004-09-07 Thread Kenneth Whistler
On 05/09/2004 18:27, John Cowan wrote: The following links show L-shaped marks, apparently combining characters, that indicate the change-of-pitch position in Japanese words written in romaji. Are these novel characters, or can they be identified with existing Unicode characters? Are

Re: Umlaut and Tréma, was: Variation selectors and vowel marks

2004-07-14 Thread Kenneth Whistler
Peter Kirk wrote: At 11:02 AM 7/13/2004, Peter Kirk wrote: I was surprised to see that WG2 has accepted a proposal made by the US National Body to use CGJ to distinguish between Umlaut and Tréma in German bibliographic data. And Asmus responded: You raise some interesting

Re: Umlaut and Tréma, was: Variation selectors and vowel marks

2004-07-14 Thread Kenneth Whistler
Peter Kirk continued: I did read it, but it didn't deal with the issue I was concerned about, of multiple combining marks. And I was concerned about that issue because that was the major concern expressed in the earlier discussion on variation selectors, and presented as the decisive

Re: Changing UCA primary weights (bad idea)

2004-07-09 Thread Kenneth Whistler
Subject: Re: Changing UCA primarly weights (bad idea) Correcting the subject, just because it bugs me... You are certainly right that this is not a slam-dunk; there are reasons for and against it. And

Re: Impotance of diacritics (was: Looking for transcription ...)

2004-07-09 Thread Kenneth Whistler
Subject: Impotance of diacritics (was: Looking for transcription ...) ^ It's a good thing this discussion of the impotence of diacritics from bushmanush didn't also mention \/|å.G4ä, and talked about *tran*scription, instead of *pre*scription, or my spam filter would

Re: Diacritic and similar foldings and spam filtering

2004-07-08 Thread Kenneth Whistler
Peter Kirk said: I made a serious point, not apparently made in the UTR draft, that diacritic folding may be useful for spam filtering and similar applications including finding misleading URIs. This seems like a reasonable point to make and to add to the discussion of folding in UTR #30.

Name of Greek block (was: Re: Greek tonos and oxia)

2004-06-30 Thread Kenneth Whistler
the versions in the main Greek and Coptic block (or has it been officially renamed just Greek?) No, the block name won't be changed, in part because changing block names is another destabilization in the standard that really serves nobody well, but mostly because the existing 14 Coptic letters

Re: what combining diacritical mark suits d and l with stroke ?

2004-06-29 Thread Kenneth Whistler
I like to use the decomposed version of Unicode characters Đ, đ, Ł and ł (U+0110, U+0111, U+0141 and U+0142). For example, d followed by a combining_diacritical_mark should generate đ (d with stroke). What combining_diacritical_mark should be used for this case? As Michael and Clark

Re: lines 05-08, version 4.7 of Roadmap to BMP and 'Hebrew extensions'

2004-06-29 Thread Kenneth Whistler
Elaine asked: Quotes below from the SMP .pdf---I can't put the three quotes below together intelligibly. Do the quotes mean that the Linear B syllabary and Old Italic and Ugaritic are already in permanent locations in the SMP, or do they mean something else? You should start with the

Re: Greek tonos and oxia

2004-06-29 Thread Kenneth Whistler
I have a (hopefully) short question about polytonic Greek support. Does anyone know what the idea was behind encoding Greek vowel+acute combinations (without aspirates, etc.) twice: first in the Basic Greek section as vowel+tonos, for the second time in the Extended Greek section as
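The duplicate encoding asked about here shows up in normalization. In this minimal sketch, assuming Python's standard `unicodedata` module, the Extended Greek character with oxia is canonically equivalent to the Basic Greek character with tonos, so NFC replaces one with the other.

```python
import unicodedata

with_tonos = "\u03AC"  # in the basic Greek block
with_oxia = "\u1F71"   # in the Greek Extended block

print(unicodedata.name(with_tonos))
print(unicodedata.name(with_oxia))

# The oxia form has a singleton canonical mapping to the tonos form.
normalized = unicodedata.normalize("NFC", with_oxia)
print(normalized == with_tonos)
```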

RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Kenneth Whistler
Peter Constable wrote, Don't forget canonical equivalence (I forgot about this as well): the double-width diacritics have a combining class of 234 rather than 230. This means that 0251 0361 0302 028A is canonically equivalent to 0251 0302 0361 028A. Therefore, the first (for better or
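The equivalence stated here follows from canonical ordering, which sorts adjacent combining marks by ascending combining class. A short sketch, assuming Python's standard `unicodedata` module:

```python
import unicodedata

seq_a = "\u0251\u0361\u0302\u028A"  # double diacritic first
seq_b = "\u0251\u0302\u0361\u028A"  # circumflex first

# U+0361 has class 234 and U+0302 has class 230, so 0302 sorts first.
print(unicodedata.combining("\u0361"), unicodedata.combining("\u0302"))

nfd_a = unicodedata.normalize("NFD", seq_a)
nfd_b = unicodedata.normalize("NFD", seq_b)
print(nfd_a == nfd_b)
```

Both inputs normalize to the second order, so a renderer cannot rely on the typed order to decide which mark sits closer to the base.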

Re: Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread Kenneth Whistler
On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote: Despite the oft-mentioned cutesy Hong Kong race horse names, idiosyncratic invented Han ideographs are a negligible component of the encoded CJK repertoire. In my opinion there are thousands, possibly tens of thousands, of

Re: Bantu click letters

2004-06-10 Thread Kenneth Whistler
Michael, And now you are answering arguments with irrelevancies. But the argument in this particular case hinges on a particular, nonce set of characters. You use nonce very easily. Nonce: Occurring, used, or made only once or for a special occasion. You can, of course, quibble that this

Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

2004-06-04 Thread Kenneth Whistler
Peter, There is no consensus that this Phoenician proposal is necessary. I and others have also put forward several mediating positions e.g. separate encoding with compatibility decompositions Which was rejected by Ken for good technical reasons. I don't remember any technical reasons,

Re: Game pieces proposal

2004-06-01 Thread Kenneth Whistler
António noted: Dunno about the others, but spanish play cards suit symbols are clearly style variations of U+2660, U+2663, U+2665 and U+2666. (BTW, I'm right assuming that U+2660, U+2663, U+2665 and U+2666 are the actual suit symbols, while U+2661, U+2662, U+2664 and U+2667 are just

RE: PH technical issues (was RE: Why Fraktur is irrelevant

2004-05-28 Thread Kenneth Whistler
Peter Constable responded to Peter Kirk: From: Peter Kirk [mailto:[EMAIL PROTECTED] Sent: Friday, May 28, 2004 1:40 PM Well, I understood the semantic content of a text to be the meaning of the words... [Kirk continuing, to provide more context... , not the indication of which

Re: Palaeo-Hebrew, Phoenician, and Unicode (Phoenician Unicode proposal)

2004-05-26 Thread Kenneth Whistler
Dean Snyder parried (and missed): James Kass wrote at 4:37 PM on Wednesday, May 26, 2004: Shemayah Phillips of ebionite.org It has some differences in representing Hebrew because square script has more characters (e.g., shin/sin) than Palaeo. Not a relevant argument - Spanish has more

Re: Response to Everson Phoenician and why June 7?

2004-05-25 Thread Kenneth Whistler
Peter, There is no consensus that this Phoenician proposal is necessary. I and others have also put forward several mediating positions e.g. separate encoding with compatibility decompositions Which was rejected by Ken for good technical reasons. I don't remember any technical

Re: VISCII (was: Re: [BULK] - Re: MCW encoding of Hebrew)

2004-05-25 Thread Kenneth Whistler
John Cowan asked: Doug Ewell scripsit: So is [VIQR] a 7-bit encoding, or a scheme layered on top of ASCII? It's a scheme layered on top of ASCII And what is KOI-7? A true 7-bit encoding for Russian, in which Cyrillic letters (small and capital respectively) were encoded in

Re: Multiple Writing Directions in One Script

2004-05-25 Thread Kenneth Whistler
Archaic Greek could be written right-to-left, left-to-right, or boustrophedon. I'm asking for technical advice as to how such variability in writing direction streams in the same script can be, and should be, handled in Unicode, and how it should be dealt with in a Unicode proposal. TUS

Re: Glyph Stance

2004-05-25 Thread Kenneth Whistler
Dean Snyder asked: Archaic Greek exhibits variable glyph stance, that is, glyphs can be flipped horizontally or even vertically, usually dependent upon the direction of the writing stream. How should variable glyph stance for the same characters in the same script be dealt with in Unicode

Re: Proposal to encode dominoes and other game symbols

2004-05-25 Thread Kenneth Whistler
John Hudson asked: I would like to know what the presumed purpose of U+2616 and U+2617 is. In Unicode? To map to JIS X 0213. You need to ask the JSC what *their* intent was in adding these two characters to the Japanese standard. Not so. Both sides have four generals: two 'gold' and two

Language Tagging in Plain Text (was: Re: Response to blah blah blah)

2004-05-24 Thread Kenneth Whistler
[EMAIL PROTECTED] (James Kass) writes: And we use language tagging in plain text how? I seem to remember the Japanese asking that. It wasn't the Japanese that asked for it. And I seem to remember Unicode encoding the Plane 14 tags for that. Plane 14 language tags were encoded to

Re: MCW encoding of Hebrew (was RE: Response to Everson Ph and why Jun 7? fervor)

2004-05-24 Thread Kenneth Whistler
Philippe asked: In fact, any existing MCW/ASCII-encoded file of Hebrew text is, in fact, also MCW/Unicode-encoded since the representation of Basic Latin characters at the character encoding form and character encoding scheme levels is exactly the same for ASCII as it is for Unicode:
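
The byte-level identity described in the snippet above can be illustrated with a short sketch. It assumes the Unicode encoding scheme in question is UTF-8, which the truncated snippet does not state; the variable names are illustrative only.

```python
# All 128 Basic Latin code points (U+0000..U+007F) as one string.
basic_latin = "".join(chr(cp) for cp in range(0x80))

# Encode the same text as ASCII and as UTF-8 (UTF-8 is an assumption here).
ascii_bytes = basic_latin.encode("ascii")
utf8_bytes = basic_latin.encode("utf-8")

# If the two byte strings match, an ASCII file is byte-for-byte a valid
# UTF-8 file containing the same characters.
same_bytes = ascii_bytes == utf8_bytes
print(len(ascii_bytes), len(utf8_bytes), same_bytes)
```

Under that assumption, a transliteration stored as plain ASCII needs no conversion step to be read as UTF-8 text.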

Re: Zip vs. Non Zipped and ISO 15924 draft fixes

2004-05-21 Thread Kenneth Whistler
Doug asked: I'm sure this is a dumb question, but why would there be any pages in non-Unicode charsets on the Unicode Web site? Legacy, just as for many sites. The question is whether it makes sense to go back to older, archived material and: a. delete it, because it is in Latin-1 or CP

Re: Compatibility equivalents, was: Qamats Qatan

2004-05-21 Thread Kenneth Whistler
Peter Kirk suggested: Similarly, I suppose, with the proposed Phoenician script: each character could be given a compatibility decomposition to the equivalent Hebrew letter. This implies automatic interleaved collation. Now, while I don't expect Michael Everson to jump at this suggestion,

Re: Response to Everson Phoenician and why June 7?

2004-05-21 Thread Kenneth Whistler
Dean continued: Or (making the missed point explicit): I attempted to bring this thread back on track yesterday, but since it seems to have veered off into the ditch again, we may as well spin our wheels some more, I guess. :-( If the UTC did consider the potential for large numbers of users

Re: Response to Everson Phoenician and why June 7?

2004-05-20 Thread Kenneth Whistler
Patrick said: In this case, I think it's important to be picky because there are no current Unicoding practices for Phoenician. You may mean that the Unicode book does not document how Phoenician (or Paleo-Hebrew) may be encoded. This is not to say that no one is using Unicode to encode

Variation Sequences as Substitute for Fonts or for Encoding a Script (Was... Phoenician ...)

2004-05-20 Thread Kenneth Whistler
Ernest indicated: Whether using variation sequences to separate Phoenician from Square Hebrew would be daft would depend upon a number of factors. How often would both glyph repertoires appear in the same document? How frequently would non-Square Hebrew glyphs be used? How important
