Re: UNICODE version of _T(x) macro

2010-11-22 Thread Kenneth Whistler
Somya asked: > I have unicode C application. I am using the following macro > to define my string > to 2 byte width characters. > > #ifdef UNICODE > #define _T(x) L##x > > But I see that GCC compiler maps 'L' to wchar_t, which is 4 byte on Linux. I > have used -fshort-wchar option > on Linux
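
The size difference Somya describes comes down to code units. With the default 4-byte wchar_t on Linux, wide literals hold 32-bit units. With -fshort-wchar they hold 16-bit units, typically UTF-16, where supplementary-plane characters need a surrogate pair. The minimal Python sketch below only counts code units in each encoding form; it does not model the C compiler, and the helper name `code_units` is ours.

```python
def code_units(text: str) -> dict:
    """Count code units of `text` in UTF-16 (2-byte units) and UTF-32 (4-byte units)."""
    return {
        "utf-16": len(text.encode("utf-16-le")) // 2,  # -le variant: no BOM added
        "utf-32": len(text.encode("utf-32-le")) // 4,
    }

# BMP characters take one unit in either form; U+1D11E MUSICAL SYMBOL G CLEF
# needs a surrogate pair (two units) in UTF-16.
for s in ["A", "\u00e9", "\U0001D11E"]:
    print(f"U+{ord(s):04X}", code_units(s))
```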

CJK Compatibility Gotchas (was: Re: Application that displays CJK text in Normalization Form D)

2010-11-15 Thread Kenneth Whistler
Asmus replied: > On 11/15/2010 2:24 PM, Kenneth Whistler wrote: > >> FA47 is a "compatibility character", and would have a > >> compatibility mapping. > > Faulty syllogism. > > Formally correct answer but only because of something of a design flaw

RE: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Kenneth Whistler
> FA47 is a "compatibility character", and would have a compatibility mapping. Faulty syllogism. FA47 is a CJK Compatibility character, which means it was encoded for compatibility purposes -- in this case to cover the round-trip mapping needed for JIS X 0213. However, it has a *canonical* deco
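
The point about FA47 can be checked directly: its decomposition carries no `<tag>` prefix, so it is canonical, and because it is a singleton every normalization form, NFC included, replaces the character. A minimal sketch using Python's standard `unicodedata` module; the helper name `normalization_forms` is ours.

```python
import unicodedata

def normalization_forms(ch: str) -> dict:
    """Return each normalization form of `ch` as a list of U+XXXX labels."""
    return {
        form: [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, ch)]
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

fa47 = "\uFA47"  # CJK COMPATIBILITY IDEOGRAPH-FA47
# No "<compat>"-style tag in the decomposition field: the mapping is canonical.
print(unicodedata.decomposition(fa47))
print(normalization_forms(fa47))
```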

Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

2010-11-10 Thread Kenneth Whistler
Mark Davis wrote: > What are also tricky are the 'almost' supersets, where there are only a few > different characters. Those definitely cause problems because the difference > in data is almost undetectable. For example, Mark is referring to cases such as ISO 8859-1 and 8859-15. Those share all

Re: IDNA2008 Contextual rules clarification

2010-10-29 Thread Kenneth Whistler
Nagesh Chigurupati asked: > I have a question regarding some of the contextual rules in RFC5892. For > example the contextual rule in appendix A.4 Greek Lower Numeral Sign > (U+0375), states the following: > > If Script(After(cp)) .eq. Greek Then True; > > If the Greek Lower Numeral Sign (U+037
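
A sketch of how the RFC 5892 rule for U+0375 GREEK LOWER NUMERAL SIGN might be evaluated. Two loud assumptions: the Greek test below covers only the basic Greek letters (a real implementation must use Scripts.txt from the UCD), and when U+0375 is the last code point of the label there is no After(cp), so the rule is treated as staying False. The function names are hypothetical.

```python
# Illustrative subset of Script=Greek: basic capital and small letters only.
# A conforming IDNA2008 implementation must use the full Scripts.txt data.
GREEK_SUBSET = [(0x0391, 0x03A1), (0x03A3, 0x03A9), (0x03B1, 0x03C9)]

def is_greek(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in GREEK_SUBSET)

def lower_numeral_sign_ok(label: str, i: int) -> bool:
    """Contextual rule for U+0375: starts False, becomes True only if the
    following code point is Greek. Assumption: no following code point -> False."""
    if label[i] != "\u0375":
        raise ValueError("rule applies only to U+0375")
    if i + 1 >= len(label):
        return False
    return is_greek(label[i + 1])

print(lower_numeral_sign_ok("\u0375\u03b1", 0))  # followed by GREEK SMALL LETTER ALPHA
print(lower_numeral_sign_ok("\u0375a", 0))       # followed by Latin "a"
```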

RE: Is there any unambiguous vowel length mark code point for classicists?

2010-10-27 Thread Kenneth Whistler
Gy. Dobner asked: > But my original question was not how to encode a combining macron in one > more possible way but how to encode a length mark that would display as > something _visually_ _distinguishable_ _from_ _a_ _macron_ (because the > macron is functionally ambiguous and hence unsuitable f

Re: Creative people on Twitter

2010-10-14 Thread Kenneth Whistler
> > What is the position regarding the 32-bit code point space > > above U+10FFFF please? > > Does the Unicode Consortium and/or ISO or indeed anyone else > > make any claims upon it? > Yes, the claim is that if you use it, you're generating invalid Unicode. > > Don't do it, don't contempla
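
Anything beyond U+10FFFF is outside the Unicode codespace, and surrogate code points are not scalar values either. A minimal Python check (the name `is_scalar_value` is ours); Python's own `chr()` enforces the same ceiling.

```python
def is_scalar_value(cp: int) -> bool:
    """True for Unicode scalar values: U+0000..U+10FFFF minus surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

# chr() refuses to create a code point above U+10FFFF.
try:
    chr(0x110000)
except ValueError as err:
    print("rejected:", err)
```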

Re: Irrational numeric values in TUS

2010-10-12 Thread Kenneth Whistler
Asmus, > >> I'm curious if any thought was given to this, and what code points I'm > >> missing in my analysis. > > U+1D452 MATHEMATICAL ITALIC SMALL E (or merely U+0065 LATIN > > SMALL LETTER E), also used for Euler's number. See also U+2147. > > Now you are confusing Euler's constant - also dep

Re: Irrational numeric values in TUS

2010-10-12 Thread Kenneth Whistler
Karl Williamson asked: > The Unicode standard only gives numeric values to rational numbers. Is > the reason for this merely because of the difficulty of representing > irrational ones? No. Primarily it is because the Unicode Standard is a *character* encoding standard, and not a standard for

Re: [OT]: a strange language name abbreviation (was: How to encode reversed section sign?)

2010-08-06 Thread Kenneth Whistler
> Exploring the dictionary with the search engine (which is operational > since today morning ...) I discovered two occurences of an unexplained > abbreviation which refers to a language in which "silvir" means > "silver" and "ses" means "six". The name of the language is > abbreviated as "Kimr."

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-05 Thread Kenneth Whistler
> I am thinking of where a poet might specify an ending version > of a glyph at the end of the last word on some lines, yet not > on others, for poetic effect. I think that it would be good > if one could specify that in plain text. Why can't a poet find a poetic means of doing that, instead o

Re: UTS#10 (UCA) 7.1.3 Implicit Weights, Unassigned and Other Code Points

2010-08-04 Thread Kenneth Whistler
> > That statement is incorrect. The UCA currently specifies that > > ill-formed code unit sequences and *noncharacters* are mapped > > to [....], but unassigned code points are not. > > This is exactly equivalent: if you use strength level 3, they are > both [...], ...

Re: Signage

2010-08-04 Thread Kenneth Whistler
> > But an approach that abstracts the name, then tries to re-imagine a > > representation from scratch is, in my view, very much misguided. > > Recall that many of the emojis 1) have changed glyphs quite a lot from > the source glyphs, and 2) are to quite an extent defined from the *source* > (J

Re: Results of public Review Issues (in particular #121)

2010-08-03 Thread Kenneth Whistler
Martin, > In a discussion about a new protocol, there was some issue about how to > replace illegal bytes in UTF-8 with U+FFFD. That let me remember that > there was once a Public Review Issue about this, and that as a result, I > added something to the Ruby (programming language) codebase. I t
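
As an illustration of replacing ill-formed UTF-8 with U+FFFD (not the Ruby code Martin mentions), CPython's UTF-8 decoder with `errors="replace"` emits one U+FFFD per maximal ill-formed subpart. To our understanding, that is the practice the standard recommends. The wrapper name is ours.

```python
def decode_with_replacement(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for ill-formed sequences (CPython behavior)."""
    return data.decode("utf-8", errors="replace")

samples = [
    b"\xe2\x82A",     # truncated 3-byte sequence, then "A": one U+FFFD
    b"\xc0\xafA",     # C0 never starts a sequence, AF is a stray trail byte: two
    b"\xf0\x9f\x98",  # truncated 4-byte sequence at end of input: one
]
for raw in samples:
    print(raw, "->", ascii(decode_with_replacement(raw)))
```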

Re: UTS#10 (UCA) 7.1.3 Implicit Weights, Unassigned and Other Code Points

2010-08-02 Thread Kenneth Whistler
Philippe Verdy said: > Implicit weights for unassigned code points and other characters that > are NOT ill-formed are suboptimal, as noted in the proposed update. To follow up on Mark's response on this thread... > > It should take into account their existing default properties, notably : [ lo
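
For reference, a sketch of the implicit-weight computation as UCA specified it at the time: a base of 0xFB40 for unified ideographs in the CJK Unified Ideographs and CJK Compatibility Ideographs blocks, 0xFB80 for other unified ideographs (e.g. Extensions A and B), and 0xFBC0 for everything else, including unassigned code points. Later UCA versions add further special ranges, so treat this as a period sketch. The function name is ours, and level-4 weights are omitted.

```python
def implicit_weights(cp: int, base: int = 0xFBC0):
    """Return the two implicit collation elements for cp as
    (primary, secondary, tertiary) triples: [.AAAA.0020.0002][.BBBB.0000.0000]."""
    aaaa = base + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return (aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)

for cp, base in [(0x4E00, 0xFB40), (0x3400, 0xFB80), (0x0378, 0xFBC0)]:
    first, second = implicit_weights(cp, base)
    print(f"U+{cp:04X}", [f"{w:04X}" for w in first], [f"{w:04X}" for w in second])
```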

Re: CSUR Tonal

2010-07-30 Thread Kenneth Whistler
Luke asked: > Given this scenario, is it proper to encode perhaps one set of TONAL MODIFIER > LETTER SMALL _ suitable for use, No. > are we stuck using these mismatching existing > encodings, No, although if I were representing this data, that is probably what I would use. > or perhaps some

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-30 Thread Kenneth Whistler
Frédéric Grosshans asked: > Why did you chose the "fleur" words ? The question discussed about the > accent do not seem to arise here. I was struck by the issues about space, hyphen (or lack thereof) and alternate spellings that could be illustrated by that stretch of topics, so used that as the

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-29 Thread Kenneth Whistler
A couple of weeks ago, in this thread Philippe Verdy said: > Breaking on words, even if it requirs a very modest buffering, > will significantly improve the processing time, > because each word in the long texts will be scanned only > once, and all the rest will occur within the small and > co

Re: [ISO15924] Typo for Egyptian_Hierog(l)yphs

2010-07-29 Thread Kenneth Whistler
Philippe Verdy noted: > > Everywhere below, the Unicode property value alias is missing an 'l'. > > - In HTML table 1: > Egyp 050 Egyptian hieroglyphs hiéroglyphes égyptiens Egyptian_Hierogyphs 2009-06-01 etc. These errors in the tables have been corrected by the Registration Aut

Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Kenneth Whistler
Karl Williamson asked: > Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT > does? > They are U+2107 and U+210E respectively. Because U+210E PLANCK CONSTANT is, to quote the standard, "simply a mathematical italic h". It serves as the filler for the gap in the run of m
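
The "gap in the run" can be made concrete: Mathematical Italic small letters start at U+1D44E, but the slot for italic h (U+1D455) is left empty because U+210E PLANCK CONSTANT already serves that role. A minimal sketch; the function name is ours.

```python
import unicodedata

MATH_ITALIC_SMALL_A = 0x1D44E  # MATHEMATICAL ITALIC SMALL A

def math_italic_small(letter: str) -> str:
    """Map a-z to Mathematical Italic small letters; h uses U+210E."""
    if letter == "h":
        return "\u210E"  # PLANCK CONSTANT fills the hole at U+1D455
    return chr(MATH_ITALIC_SMALL_A + ord(letter) - ord("a"))

for c in "ghi":
    m = math_italic_small(c)
    # NFKC folds each styled letter back to plain ASCII.
    print(c, f"U+{ord(m):04X}", unicodedata.name(m), unicodedata.normalize("NFKC", m))
```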

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-27 Thread Kenneth Whistler
C. E. Whitehead said: > I've not gone through many character charts though so I can't > really speak as an expert as you all can; sorry I've not gotten > to more; I will try to ... For people who wish to pursue this issue further, the relevant information is neatly summarized in the extracted p

Re: VS characters, default ignorable property and text search and collation

2010-07-26 Thread Kenneth Whistler
Sharma asked: > I have a question about VS characters and the default ignorable property. > > TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable. > Ch 5.21 states that default ignorable characters are to be ignored in > rendering (except in specialized modes which show hidd

Re: Indian Rupee Sign (U+20B9) proposal

2010-07-22 Thread Kenneth Whistler
> > On this date, Unicode had received proposals for same purpose > > form non-insiders too -- as you know this is true because India > > is a nation of over a billion populations. > > I have seen no other proposals to encode the character, submitted > either to the UTC or to WG2. Actually, t

Re: Pau Cin Hau scripts proposal : confusive N3865 and better older N3781

2010-07-20 Thread Kenneth Whistler
Philippe Verdy said: > A side note about this preliminary proposal for allocating blocks in > the SMP for the two Pau Cin Hau scripts (including one for the large > "logographic" script, with 1050 signs): > > http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3865.pdf > > (authored by Anshuman Pandey, in MI

Re: Arab Ma[r]ks

2010-07-14 Thread Kenneth Whistler
Arno Schmitt noted: > The marks in the Arabic bloc are not well organized; A well-known fact that has resulted from the prior legacy for Arabic encoding brought into Unicode, followed by twenty years of incremental encoding of additional marks, as evidence has been brought to bear and proposals f

Re: Bengali Script

2010-07-13 Thread Kenneth Whistler
> So what do we do with all these names? > Can't we ask Mark to use a lottery to pick one and go from there? ... So whaddya say, Mark? Have a go at the roulette wheel? Ladies and gentlemen... step right up and place your bets!! Bengali, Bangla, Bengalese, Bangladeshi, Bengalian, Bengalish, Beng

Re: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy said: > A basic word-breaker using inly the space separator would marvelously > improve the speed of French sorting even if backwards ordering occurs, > just because it would significantly improve the data locality in the > implementation and would considerably reduce the reallocati

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy wrote: > "Kenneth Whistler" wrote: > > Huh? That is just preprocessing to delete portions of strings > > before calculating keys. If you want to do so, be my guest, > > but building in arbitrary rules of content suppression into > > the

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-12 Thread Kenneth Whistler
Philippe Verdy said: > If we don't limit the backwards reordering, then all accents in the > full sentences will be reordered, so this is the final word that will > drive the order. not only this is incorrect, I understand that you think that the ordering should be done word-by-word, with the Fre

RE: UTS#10 (collation) : French backwards level 2, and word-breakers.

2010-07-07 Thread Kenneth Whistler
[ snipping all the word breaking discussion, which I am not going to comment on ... ] CE Whitehead said: > I collate as follows (note that i' is equivalent to i with accent grave): > > (EXAMPLE 1 -- my sort) > di Silva, Fred, > di Silva, John > di Si'lva, Fred > di Si'lva, John > Disilva, Fr

Re: Keying emoji characters using an ordinary keyboard (from Re: "ASCII" emoji in iOS4)

2010-06-30 Thread Kenneth Whistler
William Overington asked: > Will the Unicode Standard version 6.0 include mention of > the unification of characters from the emoji set used in > mobile telephones with earlier Unicode characters, also > including a list of those characters of the emoji set > that have been unified and where t

Euro Sign in 8859-15 (was: Re: Indian Rupee Sign to be chosen today)

2010-06-25 Thread Kenneth Whistler
> On Fri, 25 Jun 2010, I wrote > > > Even in the year 2010, the euro sign (¤) doesn't work reliably. > > in both the Unicode list and in the newsgroup de.test. > > unicode.org shows a euro sign: > http://www.unicode.org/mail-arch/unicode-ml/y2010-m06/0372.html > > groups.google.com shows a cur

Re: Refining the idea for the SignWriting proposal

2010-06-21 Thread Kenneth Whistler
A small aside on one suggestion by Philippe Verdy: > This also suggests a new separate general category for the abstract > symbols/traits encoded for such complex scripts, instead of assigning > them in "gc=Lo" or defining them as unrelated symbols in "gc=S*" : > possibly "gc=Lx" ? That would run

Re: Latin Script

2010-06-16 Thread Kenneth Whistler
> John -> If I define a symbol (variable or constant) named ɸ and some > user types 'φ' or 'ϕ' instead, it won't match. > > Can you please post the names for the other two, i.e., 'φ' or 'ϕ' ? John was referring to: U+0278 LATIN SMALL LETTER PHI U+03C6 GREEK SMALL LETTER PHI U+03D5 GREEK P
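
To see why the three phis behave differently in identifier matching, here is a hypothetical identifier key (NFKC followed by case folding; the name `fold_identifier` is ours, not from any specification). It unifies the two Greek forms, since U+03D5 has a compatibility mapping to U+03C6, but leaves the Latin phi U+0278 distinct.

```python
import unicodedata

def fold_identifier(name: str) -> str:
    """Hypothetical identifier key: NFKC normalization, then case folding."""
    return unicodedata.normalize("NFKC", name).casefold()

for ch in ("\u0278", "\u03C6", "\u03D5"):
    key = fold_identifier(ch)
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", f"U+{ord(key):04X}")
```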

Re: Writing a proposal for an unusual script: SignWriting

2010-06-11 Thread Kenneth Whistler
Steve, > All of this writing can be encoded using 1280 code points. I > have a 12-bit encoding with bi-directional conversion with UTF-8 working > for planes 1, 15, or 16. A minor point, but I suggest you not use "bi-directional" in that context. "Bidirectional" is a term of art in Unicode ch

Re: Hexadecimal digits

2010-06-04 Thread Kenneth Whistler
> But again, I'm not talking about programming. My four year old can grasp > tonal > just as well as she could decimal had I been teaching that. Now if I were > using the a-f notation, she would be (reasonably) confused as to why *some* > numbers are unique, but *other* numbers are also letter

Re: Hexadecimal digits

2010-06-04 Thread Kenneth Whistler
> On Friday 04 June 2010 08:51:05 am Otto Stolz wrote: > > In any case, you have to know the base of every number > > you are going to parse. This stems from the fact that > > the same digits are used for all number systems. Luke-Jr replied: > > But you first need to know if it is a number or a

Tengwar and Cirth (was: Re: A question about "user areas")

2010-06-02 Thread Kenneth Whistler
> I'm not sure how much longer we should continue to wait for Tengwar and > Cirth. Three words: Squeaky wheel -- grease. Don't expect this to "just happen". The corporate members of the Unicode Consortium are mostly concerned about economically significant sets of characters that impact their b

RE: Greek letter "LAMDA"?

2010-06-02 Thread Kenneth Whistler
> > Note that as of 1993, the only "LAMDA" or "LAMBDA" characters > > in the standard were: > > > > 039B;GREEK CAPITAL LETTER LAMDA;Lu;0;L;N;GREEK CAPITAL LETTER > > LAMBDA;;;03BB; > > 03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;N;GREEK SMALL LETTER > > LAMBDA;;039B;;039B > > 019B;LATIN SMALL LE
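
The "LAMBDA" spelling in lines like these lives in the Unicode 1.0 name field. A minimal parser for the 15-field UnicodeData.txt layout; the field labels are our own shorthand, and the sample line is written out in full-field form for illustration.

```python
FIELDS = ["code", "name", "gc", "ccc", "bidi", "decomposition",
          "decimal", "digit", "numeric", "mirrored", "unicode_1_name",
          "iso_comment", "upper", "lower", "title"]

def parse_ucd_line(line: str) -> dict:
    """Split one UnicodeData.txt line into its 15 semicolon-separated fields."""
    values = line.split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

rec = parse_ucd_line(
    "03BB;GREEK SMALL LETTER LAMDA;Ll;0;L;;;;;N;GREEK SMALL LETTER LAMBDA;;039B;;039B")
print(rec["name"], "/", rec["unicode_1_name"])
```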

RE: Greek letter "LAMDA"?

2010-06-01 Thread Kenneth Whistler
> Why not? I thought the names of some things have changed > between versions, and other database items have changed substantially. See "Name Stability" on the Unicode Character Encoding Stability Policy page: http://www.unicode.org/policies/stability_policy.html --Ken > > Names sometimes d

Re: Greek letter "LAMDA"?

2010-06-01 Thread Kenneth Whistler
Robert Abel noted: > It seems U+019B is the only instance where "lambda" is used. All other > instances use "lamda". So it seems the slip-up is the other way around, > whatever the initial reasoning for using "lamda" was. It was not a slip-up. It was deliberate at the time (1993). Note that as

Re: Greek letter "LAMDA"?

2010-06-01 Thread Kenneth Whistler
John Dlugosz asked: > Why does the code chart call the plain Greek letter (upper and > lower case) "LAMDA" rather than "LAMBDA"? Because ISO 8859-7 called it "LAMDA" rather than "LAMBDA". Note that Unicode 1.0 called it "LAMBDA", but synchronization of names for Unicode 1.1 (in 1993) was towar

Re: Roundtripping in Unicode

2004-12-14 Thread Kenneth Whistler
Marcin Kowalczyk noted: > Unicode has the following property. Consider sequences of valid > Unicode characters: from the range U+0000..U+10FFFF, excluding > non-characters (i.e. U+nFFFE and U+nFFFF for n from 0 to 0x10 and > U+FDD0..U+FDEF) and surrogates. Any such sequence can be encoded > in any
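
The set Marcin describes (code points up to U+10FFFF, minus surrogates and the 66 noncharacters) can be written down directly, and a sample drawn from it round-trips through the three encoding forms. A minimal sketch; the names `is_noncharacter`, `is_candidate`, and `roundtrips` are ours.

```python
def is_noncharacter(cp: int) -> bool:
    """Noncharacters: U+FDD0..U+FDEF plus U+nFFFE and U+nFFFF for n = 0..0x10."""
    return 0xFDD0 <= cp <= 0xFDEF or (0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE)

def is_candidate(cp: int) -> bool:
    """In range, not a surrogate, not a noncharacter."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF) and not is_noncharacter(cp)

def roundtrips(text: str) -> bool:
    """True if text survives encode/decode through UTF-8, UTF-16 and UTF-32."""
    return all(text.encode(enc).decode(enc) == text for enc in ("utf-8", "utf-16", "utf-32"))

sample = "".join(chr(cp) for cp in (0x41, 0xE9, 0x20AC, 0x1D11E, 0x10FFFD) if is_candidate(cp))
print(len(sample), roundtrips(sample))
```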

Re: Validity and properties of U+FFFD (was RE: Roundtripping in Unicode)

2004-12-14 Thread Kenneth Whistler
Lars asked: > BTW, what are the properties of U+FFFD? In English please, do not point me > to the standard. ?! It has the general category of "Symbol Other" [gc=So]. > Like, can it be a part of an identifier, It does not have the ID_Start or the ID_Continue property, which you could determin
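
The properties Ken lists can be queried from Python's `unicodedata`. For the identifier question, Python's `str.isidentifier()` is based on XID_Start/XID_Continue, a close relative of the ID_Start/ID_Continue properties mentioned, so it serves here only as a proxy.

```python
import unicodedata

FFFD = "\uFFFD"
props = {
    "name": unicodedata.name(FFFD),
    "general_category": unicodedata.category(FFFD),  # "So" = Symbol, Other
    "bidi_class": unicodedata.bidirectional(FFFD),
    # Proxy for ID_Continue: Python identifiers use XID_Start/XID_Continue.
    "allowed_in_python_identifier": ("a" + FFFD).isidentifier(),
}
print(props)
```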

RE: Roundtripping in Unicode

2004-12-14 Thread Kenneth Whistler
Lars said: > According to UTC, you need to keep processing > the UNIX filenames as BINARY data. And, also according to UTC, any UTF-8 > function is allowed to reject invalid sequences. Basically, you are not > supposed to use strcpy to process filenames. This is a very misleading set of statement

RE: Roundtripping in Unicode

2004-12-13 Thread Kenneth Whistler
Lars Kristan stated: > I said, the choice is yours. My proposal does not prevent you from doing it > your way. You don't need to change anything and it will still work the way > it worked before. OK? I just want 128 codepoints so I can make my own > choice. You have them: U+EE80..U+EEFF, which a

Re: Please RSVP... (was: US-ASCII)

2004-12-10 Thread Kenneth Whistler
Philippe, > RSVP is a French acronym for "Répondez, s'il vous plait". Yes, we know that. But it is also a reanalyzed English verb which means "reply to a message (or invitation)". That it has been morphologically reanalyzed is demonstrated by the fact that it takes regular English verb endings, a

Re: US-ASCII (was: Re: Invalid UTF-8 sequences)

2004-12-10 Thread Kenneth Whistler
Tim Greenwood asked: > > ... a perfectly normal linguistic process of > > attributive disambiguation of a term which had grown ambiguous > > in usage. > > Is that like the 'Please RSVP' that I see all too often? Or should > that not be excused? *grins* Well, technically, that is not a case of at

Re: US-ASCII (was: Re: Invalid UTF-8 sequences)

2004-12-10 Thread Kenneth Whistler
> If any > criticism was present, it referred to the redundant "US-" prefix in > "US-ASCII", not to Unicode, and even that wasn't really criticism, just my > lack of understanding /why/. In addition to Doug's historical clarification, you need to understand this as a perfectly normal linguistic

Re: Roadmapped scripts

2004-12-09 Thread Kenneth Whistler
Peter Kirk noted: > I was reviewing the Roadmap for the SMP > (http://www.unicode.org/roadmaps/smp/), in comparison with the list of > proposed new scripts, and found a few anomalies. > > "Hittite (Anatolian) Hieroglyphs/Luvian" is listed as a proposed new > script, with a draft proposal, but

RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Kenneth Whistler
Lars responded: > > ... Whatever the solutions > > for representation of corrupt data bytes or uninterpreted data > > bytes on conversion to Unicode may be, that is irrelevant to the > > concerns on whether an application is using UTF-8 or UTF-16 > > or UTF-32. > The important fact is that if you

Re: Nicest UTF

2004-12-08 Thread Kenneth Whistler
Marcin asked: > The general trouble is that numeric character references can only > encode individual code points By design. > rather than graphemes (is this a correct > term for a non-combining code point with a sequence of combining code > points?). No. The correct term is "combining characte
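
Numeric character references address one code point each, and a sequence of them simply spells out a combining character sequence. A minimal sketch with Python's `html.unescape`:

```python
import html
import unicodedata

# Two NCRs, one per code point: U+0065 followed by U+0301 COMBINING ACUTE ACCENT.
decoded = html.unescape("&#x65;&#x301;")
print([f"U+{ord(c):04X}" for c in decoded])
# Together they form one combining character sequence, which NFC composes.
print(f"U+{ord(unicodedata.normalize('NFC', decoded)):04X}")  # U+00E9
```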

Re: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-08 Thread Kenneth Whistler
John Cowan responded: > > Storage of UNIX filenames on Windows databases, for example, ^^ O.k., I just quoted this back from the original email, but it really is a complete misconception of the issue for databases. "Windows databases" is a misn

RE: Invalid UTF-8 sequences (was: Re: Nicest UTF)

2004-12-07 Thread Kenneth Whistler
Lars, I'm going to step in here, because this argument seems to be generating more heat than light. > I never said it doesn't violate any existing rules. Stating that it does, > doesn't help a bit. Rules can be changed. > I ask you to step back and try to see the big picture. First, I'm going

Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Kenneth Whistler
Philippe continued: > As if Unicode had to be bound on > architectural constraints such as the requirement of representing code units > (which are architectural for a system) only as 16-bit or 32-bit units, Yes, it does. By definition. In the standard. > ignoring the fact that technologies do

Re: Nicest UTF.. UTF-9, UTF-36, UTF-80, UTF-64, ...

2004-12-07 Thread Kenneth Whistler
Philippe stated, and I need to correct: > UTF-24 already exists as an encoding form (it is identical to UTF-32), if > you just consider that encoding forms just need to be able to represent a > valid code range within a single code unit. This is false. Unicode encoding forms exist by virtue of

Re: Relationship between Unicode and 10646

2004-11-30 Thread Kenneth Whistler
Peter, > This was in fact my question: will the amendment be > passed automatically if there is a majority in favour, or does it go > back for further discussion until a consensus is reached? You have > clarified that the latter is true. And I am glad to hear it. The relevant applicable clause

Re: Relationship between Unicode and 10646

2004-11-30 Thread Kenneth Whistler
John Cowan clarified the JTC1 process: > The result of a > "no" vote is that the process loops until all such votes are resolved. All comments on a formal JTC1 ballot receive a *disposition*. As far as possible, that disposition is done by committee consensus, which usually means, in practice, th

RE: Ideograph?!?

2004-11-29 Thread Kenneth Whistler
Allen Haaheim provided some further detailed clarification: > Note that Han characters are logographic, not ideographic. That is, > they are graphemes that represent words (or at least morphemes), > not ideas. This correctly states the situation for the normal case for Chinese characters used w

Re: No Invisible Character - NBSP at the start of a word

2004-11-29 Thread Kenneth Whistler
John Hudson responded to Jony Rosenne: > The idea that the position of such text on a page -- as a marginal > note -- somehow demotes > it from being text, is particularly nonsensical. I think you two (Jony and John) are talking at cross-purposes on this particular point. The *content* of marg

Re: Ideograph?!?

2004-11-29 Thread Kenneth Whistler
Michael Norton (a.k.a. Flarn) asked: > What's an ideograph? Also, what's a radical? > Are they the same thing? No, they aren't. In the Unicode context, the simplest answer is that an "ideograph" or a "CJK ideograph" is simply to be taken as a synonym for "a Chinese character". A "radical" is on

Re: CGJ , RLM

2004-11-29 Thread Kenneth Whistler
Mark Davis said (in reference to a long set of comments by Philippe Verdy on this thread): > The statements below are incorrect And Philippe asked: > Which "statements"? My message is mostly a read as a question, not as an > affirmation... And I will attempt the fact-finding... > CGJ is a com

Dutch malarkey (was: Re: (base as a combing char))

2004-11-29 Thread Kenneth Whistler
Philippe Verdy responded to John Cowan: > From: "John Cowan" <[EMAIL PROTECTED]> > > the need to encode Dutch > > ij as a single character, which is neither necessary nor practical. > > (U+0132 and U+0133 are encoded for compatibility only.) In cases where > > ij is a digraph in Dutch text, i+ZWN

Re: CGJ , RLM

2004-11-29 Thread Kenneth Whistler
Otto Stolz asked: > In German, however, a ligature must not span a syllable break. > How should I code plain text, w.r.t. hyphenation and ligatures? > - "Huf" + ZWNJ + "lattich" > - "Huf" + SYH + "lattich" > - "Huf" + SYH + ZWNJ + "lattich" > - "Huf" + ZWNJ + SYH + "lattich" You should code it as

RE: Question on Canonical equivilance

2004-11-24 Thread Kenneth Whistler
Tim Greenwood asked: > > All of the spacing combining marks (general category Mc) except > > musical symbols have a canonical combining class of 0. So, for example > > > > 0B95 (TAMIL LETTER KA) 0BC7 (TAMIL VOWEL SIGN EE - stands to the left > > of the consonant) 0BBE (TAMIL VOWEL SIGN AA - on th
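
The Tamil example can be checked directly. Because the spacing marks have combining class 0, normalization never reorders them. However, U+0BCB TAMIL VOWEL SIGN OO canonically decomposes to EE + AA, so the two spellings are canonically equivalent. A minimal sketch:

```python
import unicodedata

sequence = "\u0B95\u0BC7\u0BBE"  # KA + VOWEL SIGN EE + VOWEL SIGN AA
precomposed = "\u0B95\u0BCB"     # KA + VOWEL SIGN OO

print([unicodedata.combining(c) for c in sequence])  # all class 0: no reordering
print(unicodedata.decomposition("\u0BCB"))           # EE + AA
print(unicodedata.normalize("NFC", sequence) == precomposed)
```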

Re: My Querry

2004-11-23 Thread Kenneth Whistler
Harshal Trivedi asked: > How can i make sure that UTF-8 format string has terminated while > encoding it, as compared to C program string which ends with '\0' > (NULL) character? You don't need to do anything special at all when using UTF-8 in C programs, as far as string termination goes. UTF-8
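
The reason NUL termination keeps working is that UTF-8 produces a 0x00 byte only for U+0000 itself, and every byte of a multi-byte sequence is 0x80 or above. A brute-force Python check over the whole codespace (the function name is ours):

```python
def utf8_nul_free(limit: int = 0x110000) -> bool:
    """Check that no code point other than U+0000 encodes to a 0x00 byte,
    and that all bytes of multi-byte sequences are >= 0x80."""
    for cp in range(1, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates cannot be encoded in UTF-8
        encoded = chr(cp).encode("utf-8")
        if 0 in encoded:
            return False
        if len(encoded) > 1 and min(encoded) < 0x80:
            return False
    return True

print(utf8_nul_free())
```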

About Encoding Theory (was: Re: Again not about Phoenician)

2004-11-08 Thread Kenneth Whistler
Peter Kirk suggested: > I am suggesting that the best way to get the job done properly is to lay > the conceptual foundation properly first, instead of trying to build a > structure on a foundation which doesn't match... Part of the problem that I think some people are having here, including Pete

Re: not font designers?

2004-11-03 Thread Kenneth Whistler
Elaine Keown asked: > Supposedly this list has >600 people. > > Just of curiosity, how many of you are NOT font > designers? And since a number of people are declaring their backgrounds, I'll chime in, too. ;-) I am not a font designer, although I have designed fonts (many years ago) for ling

Re: Public Review Issues Update

2004-10-21 Thread Kenneth Whistler
Theo, Further following up from what Mark Davis responded... > Mark Davis wrote: > > All comments are reviewed at the next UTC meeting. Due to the volume, we > > don't reply to each and every one what the disposition was. If actions were > > taken, they are recorded in the minutes of the meetings

Re: June Hebrew ?

2004-10-15 Thread Kenneth Whistler
Elaine, [Feel free to forward this on to the Hebrew lists you copied on your original inquiry, if you think it appropriate.] > Peter Constable replied on the Unicode list: > >Which items? There were three at the June meeting: > >- atnah hafukh > >- lower dot and nun hafukha > >- qamats qatan

Re: outside decomposed, inside precomposed

2004-10-13 Thread Kenneth Whistler
> Jon Hanna wrote: > > >>imported UTF-8 sequences like [U+0065][U+0303] get > >>remapped > >>internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE. > >> > >>Is this kind of behavior what one would expect? > >> > >> > > > >That's conformant, if it causes problems with any other process (in
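
The behavior in question is NFC composition. Because the mapping is canonical, the original decomposed sequence can always be recovered with NFD. A minimal sketch:

```python
import unicodedata

imported = "\u0065\u0303"  # e + COMBINING TILDE
stored = unicodedata.normalize("NFC", imported)
print(f"U+{ord(stored):04X}", unicodedata.name(stored))
# Canonical equivalence: NFD restores the original sequence exactly.
print(unicodedata.normalize("NFD", stored) == imported)
```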

Re: Sample of german -burg abbreviature

2004-10-01 Thread Kenneth Whistler
> At 06:04 PM 9/30/2004, Michael Everson wrote: > > see no reason given for us not to unify the handwritten symbol we have > > seen with BREVE ABOVE. and Asmus responded: > Functionally, the symbol is not a breve. Visually, the sample does not look > like a standard breve, and the font resou

Re: Saudi-Arabian Copyright sign

2004-09-22 Thread Kenneth Whistler
Jonathan Coxhead asked: > >>Then could/should we use the sequence <200C, 062D, 20DD, 200C>? > > > > > > You *could* use that sequence, and if your rendering implementation > > were sophisticated enough, it *might* render what you were > > expecting. > > So here's my question ... > > If

Re: Saudi-Arabian Copyright sign

2004-09-22 Thread Kenneth Whistler
Antoine asked: > On Tuesday, September 21st, 2004 18:50 Kenneth Whistler wrote: > > > > With this change in place, it seems to me that the case is > > quite clear *for* separate encoding of any circled Arabic > > letter used as a symbol. If the sequence <062D

RE: Saudi-Arabian Copyright sign

2004-09-21 Thread Kenneth Whistler
Kent wrote: > Kenneth Whistler wrote: > > > Second, there is the question of cursive joining for Arabic. > > I don't know anything in the Unicode Standard that states that > > a combining enclosing mark breaks cursive ligation. It stands > > to reason that it

Unicode & shorthand? Background

2004-09-20 Thread Kenneth Whistler
Incidentally, for those interested, the website of the National Court Reporters Association has a brief history of shorthand (skewed of course to the English language-based developments): http://www.ncraonline.org/about/history/shorthand.shtml A summary of the development of the Stenograph machin

Re: Unicode & Shorthand?

2004-09-20 Thread Kenneth Whistler
>> There is no specific allocation > > for Gregg or Pitman or any other particular system, but > > 11E00..11FFF is currently blocked out for shorthands, simply > > as a placeholder to indicate that we know such systems > > exist and that somebody might bring forth a proposal and > > that if success

Re: Unicode & Shorthand?

2004-09-20 Thread Kenneth Whistler
Michael Everson responded to Christopher Fynn's question: > At 13:46 +0100 2004-09-19, Christopher Fynn wrote: > > >So, am I right in assuming that were someone put together a decent > >proposal for one or more shorthand scripts, there is no particular > >reason in principle why it would be rej

RE: Saudi-Arabian Copyright sign

2004-09-20 Thread Kenneth Whistler
Asmus responded: > >It's a simple combining character. Even if you can't do arbitrary circles > >around characters, you can take one character sequence and map it to the > >glyph in a font. Systems that can't do even that need to be fixed. > > In other words, you would like to treat this as a man

Re: Unibook 4.0.1 available

2004-09-17 Thread Kenneth Whistler
Philippe waxed lyrical about the advantages of platform-independent development: > Isn't Java hiding most of these platform details, by providing unified > support for platform-specific look and feel? Aren't there now many PLAF and > themes manager available with automatic default selection of t

Re: Historic scripts for Albanian: Elsaban and Beitha Kukju

2004-09-16 Thread Kenneth Whistler
Philippe asked: > http://www.omniglot.com/writing/albanian.htm > shows two historic scripts that have been used to write Albanian (Shqip): > - the Elsaban script in the 18th century, which looks like Old Greek for the > language Tosk variant. However there are lots of unique letter forms, and >

Re: Japanese pitch accent representations

2004-09-07 Thread Kenneth Whistler
> On 05/09/2004 18:27, John Cowan wrote: > > >The following links show L-shaped marks, apparently combining > >characters, that indicate the change-of-pitch position in Japanese > >words written in romaji. Are these novel characters, or can they > >be identified with existing Unicode characters?

Re: Umlaut and Tréma, was: Variation selectors and vowel marks

2004-07-14 Thread Kenneth Whistler
> >>One > >>such situation is Holam Male which never takes an additional combining > >>mark*. So why can't we represent it as ? > >> > >> > > > >Because the UTC has ruled out as interpretable sequences. > > > > > > Is there a better reason than "because we say so"? You don't have to >

Re: Umlaut and Tréma, was: Variation selectors and vowel marks

2004-07-14 Thread Kenneth Whistler
Peter Kirk continued: > I did read it, but it didn't deal with the issue I was concerned about, > of multiple combining marks. And I was concerned about that issue > because that was the major concern expressed in the earlier discussion > on variation selectors, and presented as the decisive re

Re: Umlaut and Tréma, was: Variation selectors and vowel marks

2004-07-14 Thread Kenneth Whistler
Peter Kirk wrote: > > At 11:02 AM 7/13/2004, Peter Kirk wrote: > > > >> I was surprised to see that WG2 has accepted a proposal made by the > >> US National Body to use CGJ to distinguish between Umlaut and Tréma > >> in German bibliographic data. And Asmus responded: > > You raise some intere

Re: Impotance of diacritics (was: Looking for transcription ...)

2004-07-09 Thread Kenneth Whistler
> Subject: Impotance of diacritics (was: Looking for transcription ...) ^ It's a good thing this discussion of the impotence of diacritics from bushmanush didn't also mention \/|å.G4ä, and talked about *tran*scription, instead of *pre*scription, or my spam filter would certainl

Re: Changing UCA primary weights (bad idea)

2004-07-09 Thread Kenneth Whistler
> Subject: Re: Changing UCA primarly weights (bad idea) Correcting the subject, just because it bugs me... > You are certainly right that this is not a slam-dunk; there are reasons for > and against it. A

Re: Diacritic and similar foldings and spam filtering

2004-07-08 Thread Kenneth Whistler
Peter Kirk said: > I made a serious point, not apparently made in the UTR draft, that > diacritic folding may be useful for spam filtering and similar > applications including finding misleading URIs. This seems like a reasonable point to make and to add to the discussion of folding in UTR #30
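A rough Python approximation of diacritic folding of the kind discussed for UTR #30, assuming a simple "decompose to NFD, then drop nonspacing marks" rule. This is a simplification for illustration, not the draft's actual folding definition; `fold_diacritics` is a hypothetical helper name.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Rough diacritic folding: decompose, then drop combining marks (category Mn).

    Characters with no canonical decomposition (e.g. U+0141 Ł, U+0111 đ)
    pass through unchanged, so this is only an approximation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(fold_diacritics("Crème brûlée"))  # Creme brulee
print(fold_diacritics("Łódź"))          # Łodz
```

For spam filtering or spoofed-URI checks, a matcher could compare folded forms so that accented look-alike spellings collapse onto the plain word; the Ł example shows where a real folding table would need extra entries.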

Name of Greek block (was: Re: Greek tonos and oxia)

2004-06-30 Thread Kenneth Whistler
> the versions in the main Greek and > Coptic block (or has it been officially renamed just "Greek"?) No, the block name won't be changed, in part because changing block names is another destabilization in the standard that really serves nobody well, but mostly because the existing 14 Coptic lett

Re: Greek tonos and oxia

2004-06-29 Thread Kenneth Whistler
> I have a (hopefully) short question about "polytonic" Greek support. > Does anyone know what the idea was behind encoding Greek vowel+acute > combinations (without aspirates, etc.) twice: first in the Basic > Greek section as vowel+tonos, for the second time in the Extended > Greek section as vow
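A small Python check (an illustration, not part of the original reply) of how the two encodings relate in the Unicode data: U+1F71 GREEK SMALL LETTER ALPHA WITH OXIA has a singleton canonical decomposition to U+03AC GREEK SMALL LETTER ALPHA WITH TONOS, so normalization maps the oxia form onto the tonos form. The constant names are hypothetical.

```python
import unicodedata

TONOS = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (Greek and Coptic block)
OXIA = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (Greek Extended block)

print(unicodedata.decomposition(OXIA))                        # 03AC
print(unicodedata.normalize("NFC", OXIA) == TONOS)            # True
print(unicodedata.normalize("NFD", TONOS) == "\u03b1\u0301")  # True: alpha + acute
```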

Re: lines 05-08, version 4.7 of Roadmap to BMP and 'Hebrew extensions'

2004-06-29 Thread Kenneth Whistler
Elain asked: > Quotes below from the SMP .pdf---I can't put the three > quotes below together intelligibly. > > Do the quotes mean that the Linear B syllabary and Old > Italic and Ugaritic are already in permanent locations > in the SMP, or do they mean something else? You should start with th

Re: what combining diacritical mark suits d and l with stroke ?

2004-06-29 Thread Kenneth Whistler
> I like to use the decomposed version of Unicode characters Đ, đ, Ł and > ł (U+0110, U+0111, U+0141 and U+0142). > For example, d followed by a combining_diacritical_mark should generate > đ (d with stroke). > > What combining_diacritical_mark should be used for this case ? As Michael and Clark
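The reply above is truncated, so this Python sketch only shows the underlying data fact: none of these four letters has a decomposition mapping, so no base + combining-mark sequence is canonically equivalent to them. U+0335 COMBINING SHORT STROKE OVERLAY is used here purely as an example mark; the variable names are hypothetical.

```python
import unicodedata

STROKED = "\u0110\u0111\u0141\u0142"  # Đ đ Ł ł

for ch in STROKED:
    # An empty decomposition string means no decomposition mapping at all.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)!r}")

# A stroke overlay can be appended to d, but normalization never turns it into U+0111.
seq = "d\u0335"  # d + COMBINING SHORT STROKE OVERLAY
print(unicodedata.normalize("NFC", seq) == "\u0111")  # False
```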

Re: Medieval CJK race-horse names (was Re: Bantu click letters )

2004-06-11 Thread Kenneth Whistler
> On Jun 11, 2004, at 6:44 AM, Andrew C. West wrote: > > > Despite the oft-mentioned cutesy Hong Kong race horse names, > > idiosyncratic > > invented Han ideographs are a negligible component of the encoded CJK > > repertoire. In my opinion there are thousands, possibly tens of > > thousands, o

RE: Rendering of sequences containing double diacritic (was Re: Bantu click letters)

2004-06-11 Thread Kenneth Whistler
> Peter Constable wrote, > > > Don't forget canonical equivalence (I forgot about this as well): the > > double-width diacritics have a combining class of 234 rather than 230. > > This means that 0251 0361 0302 028A is canonically equivalent to 0251 > > 0302 0361 028A. Therefore, the first (for be
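The canonical-equivalence point quoted above can be checked directly with Python's `unicodedata` (an illustration added here, not from the thread): U+0361 COMBINING DOUBLE INVERTED BREVE has combining class 234 and U+0302 COMBINING CIRCUMFLEX ACCENT has 230, so canonical ordering moves the circumflex ahead of the double diacritic. The names `a` and `b` are arbitrary.

```python
import unicodedata

a = "\u0251\u0361\u0302\u028a"  # ɑ + DOUBLE INVERTED BREVE + CIRCUMFLEX + ʊ
b = "\u0251\u0302\u0361\u028a"  # ɑ + CIRCUMFLEX + DOUBLE INVERTED BREVE + ʊ

print(unicodedata.combining("\u0361"), unicodedata.combining("\u0302"))  # 234 230

# Canonical ordering sorts adjacent nonzero combining classes in ascending
# order, so both spellings decompose to the same sequence (b).
print(unicodedata.normalize("NFD", a) == b)  # True
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
```

In practice this means a renderer cannot rely on the order the marks were typed in; it sees whichever order normalization produces.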

Re: Bantu click letters

2004-06-10 Thread Kenneth Whistler
Michael, And now you are answering arguments with irrelevancies. > >But the argument in this particular case hinges on a particular, > >nonce set of characters. > > You use "nonce" very easily. Nonce: Occurring, used, or made only once or for a special occasion. You can, of course, quibble tha

Re: Bantu click letters

2004-06-10 Thread Kenneth Whistler
> > Simply because some images appear in some > > documents does not mean that they automatically should be > > represented as encoded > > characters. > > These aren't images. They're clearly letters; they occur in running texts and > represent > the sounds of a spoken language. Well, I agree

Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

2004-06-04 Thread Kenneth Whistler
Peter, > There is no consensus that this Phoenician proposal is necessary. I > and others have also put forward several mediating positions e.g. > separate encoding with compatibility decompositions > > >>>Which was rejected by Ken for good technical reasons. > >>> > >>I don't r

Re: Game pieces proposal

2004-06-01 Thread Kenneth Whistler
António noted: > Dunno about the others, but Spanish play cards suit symbols are > clearly "style" variations of U+2660, U+2663, U+2665 and U+2666. > > (BTW, I'm right assuming that U+2660, U+2663, U+2665 and U+2666 are the > "actual" suit symbols, while U+2661, U+2662, U+2664 and U+2667 are > jus
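For reference, a short Python lookup (an illustration, not part of the reply) of the character names in this range. The names distinguish BLACK from WHITE forms of each suit; whether either set counts as the "actual" suit symbols is the question being discussed, and the code does not settle it. `SUITS` is a hypothetical name.

```python
import unicodedata

# Map each code point in U+2660..U+2667 to its formal character name.
SUITS = {cp: unicodedata.name(chr(cp)) for cp in range(0x2660, 0x2668)}

for cp, name in SUITS.items():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```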

Re: Proposal to encode dominoes and other game symbols

2004-06-01 Thread Kenneth Whistler
Ted Hopp responded: > On Tuesday, May 25, 2004 5:23 AM, Michael Everson wrote: > > >At what point is it more practical to say 'use a graphic'? > > > > When they are just pictures of things. Not when they are coherent > > sets of things with structure, used by people for well over a century > > to

RE: PH technical issues (was RE: Why Fraktur is irrelevant)

2004-05-28 Thread Kenneth Whistler
Peter Constable responded to Peter Kirk: > > From: Peter Kirk [mailto:[EMAIL PROTECTED] > > Sent: Friday, May 28, 2004 1:40 PM > > > > Well, I understood the semantic content of a text to be the meaning of > > the words... [Kirk continuing, to provide more context... > > , not the indication o
