Re: Greek letter "LAMDA"?

2010-06-01 Thread Asmus Freytag
On 6/1/2010 4:14 PM, Mark Crispin wrote: Is it really necessary to have this sort of pedagogical discussions on the Unicode list? "Is this character name misspelled?" "Is Unicode a for-profit company?" "Who owns the Unicode font?" etc. etc. Perhaps we need to have a unicode-qu...@u

Re: Greek letter "LAMDA"?

2010-06-01 Thread Asmus Freytag
On 6/1/2010 6:04 PM, Mark Crispin wrote: I don't think that the unicode list should be used for the type of questions that have polluted it recently. That list unicode@unicode.org is open for general questions. It has no formal standing as far as the business of the Consortium is concerned, and

Re: Least used parts of BMP.

2010-06-01 Thread Asmus Freytag
On 6/1/2010 8:04 PM, Kannan Goundan wrote: I'm trying to come up with a compact encoding for Unicode strings for data serialization purposes. The goals are fast read/write and small size. Why not use SCSU? You get the small size and the encoder/decoder aren't that complicated. You get the

Re: Greek letter "LAMDA"?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 11:46 AM, Jonathan Rosenne wrote: Although this mail was not addressed to me, I did read it. Sue me. The terms of use for the Unicode mail list essentially state that these types of boilerplate are null and void as far as Unicode is concerned. You will find the following in h

Re: Greek letter "LAMDA"?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 3:28 PM, John Dlugosz wrote: If anyone can “null and void” it, I wonder why companies bother to put such things in people’s outgoing mail. I would have thought they could come up with a proper netiquette version, but they just don’t care. These things are bogus, because they ge

Re: Least used parts of BMP.

2010-06-02 Thread Asmus Freytag
SCSU is a pass-through for ASCII, plus it handles the common mix of ASCII plus 96 local characters (Latin-1, Greek, Cyrillic, Thai, etc) really fast. Go look at the sample code. If you take that as starting point for optimization, I think you'll be fine.

Re: Least used parts of BMP.

2010-06-04 Thread Asmus Freytag
On 6/4/2010 8:34 AM, Mark Davis ☕ wrote: In a compression format, that doesn't matter; you can't expect random access, nor many of the other features of UTF-8. The minimal expectation for these kinds of simple compression is that when you write a string with a particular /write/ method, and th

Re: Questionable lines on LineBreakTest.txt

2010-06-07 Thread Asmus Freytag
On 6/7/2010 4:26 PM, Masaaki Shibata wrote: I'm studying the UAX #14 (5.2.0) and testing my code against LineBreakTest.txt. And I found some test cases on this text file seem to be contradictory to the rules on the document. For example, LB25 explicitly prohibits breaking between CP and PO, whil

Re: Tamil u,uu matra consonants - Orthographic variation

2010-06-09 Thread Asmus Freytag
Can we stop double posting on the Unicode and Unicore lists? People on the unicode list cannot reply to people on the other list, and vice versa (unless they happen to be members of both lists). Thanks. A./

Re: Writing a proposal for an unusual script: SignWriting

2010-06-14 Thread Asmus Freytag
On 6/14/2010 1:18 PM, Mark E. Shoulson wrote: On 06/14/2010 02:15 PM, Asmus Freytag wrote: On 6/14/2010 9:21 AM, Stephen Slevinski wrote: Plain text SignWriting should be able to write actual sign language, such as "hello world." You could equally well insist that it should be p

Re: Latin Script

2010-06-17 Thread Asmus Freytag
On 6/17/2010 7:24 PM, Tulasi wrote: What is equivalent ISO/IEC ISO/IEC what? There are hundreds of ISO/IEC standards, of which dozens are character encoding standards. for "U+0278 LATIN SMALL LETTER PHI (ɸ)"? Or do Unicode & ISO/IEC use different number & name for same letter/symbol? ISO

Re: Generic Base Letter

2010-06-26 Thread Asmus Freytag
On 6/26/2010 8:03 AM, Otto Stolz wrote: Hi Vincent Setterholm, you have been asking: What I'd like to see is a code point for a generic base character You could try U+25CC DOTTED CIRCLE, though the reference glyph for this character is larger than the dotted circles used to attach the various

Re: Indian Rupee Sign to be chosen today

2010-06-26 Thread Asmus Freytag
On 6/26/2010 5:41 PM, Doug Ewell wrote: Regarding the inability to distinguish 8859-15 heuristically from 8859-1, I understand the problem when there are no tags or other hints, or for cases like Windows-1252 text declared to be 8859-1, but it seems unlikely to me that there is much text enc
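
Editorial sketch (not part of the original message): one reason ISO 8859-1 and ISO 8859-15 are hard to tell apart heuristically is that every byte value decodes successfully in both, and only a handful of positions differ. The snippet below assumes Python's standard `iso8859-1` and `iso8859-15` codecs; the function name `differing_bytes` is made up for illustration.

```python
def differing_bytes():
    """Map each byte value to its (8859-1, 8859-15) decodings where they differ."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("iso8859-1")
        latin9 = raw.decode("iso8859-15")
        if latin1 != latin9:
            diffs[b] = (latin1, latin9)
    return diffs


if __name__ == "__main__":
    for byte, (old, new) in sorted(differing_bytes().items()):
        print(f"0x{byte:02X}: {old!r} in 8859-1, {new!r} in 8859-15")
```

A detector with no tags to go on therefore has to rely on how plausible those few characters are in context, for example a currency sign versus a euro sign at 0xA4.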

Re: Generic Base Letter

2010-06-27 Thread Asmus Freytag
The one argument that I find convincing is that too many implementations seem set to disallow generic combination, relying instead on fixed tables of known/permissible combinations. In that situation, a formally adopted character with the clearly stated semantic of "is expected to actually ren

Re: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

2010-06-28 Thread Asmus Freytag
On 6/28/2010 11:38 AM, Mark Davis ☕ wrote: The problem with slavishly following the charset parameter is that it is often incorrect. However, the charset parameter is a signal into the character detection module, so if the charset is correctly supplied from the message then the results of the d

Re: Latin Script

2010-06-28 Thread Asmus Freytag
I'd like to second Mark. There is a lot of information in the Standard, including the UAXs, and the Unicode Character Database that would help answer your questions. The volunteers associated with the Unicode effort have worked hard putting all that information together - so use it, instead o

Re: charset parameter in Google Groups

2010-07-01 Thread Asmus Freytag
On 7/1/2010 11:29 AM, John Burger wrote: Andreas Prilop wrote: The problem with slavishly following the charset parameter is that it is often incorrect. I wonder how you could draw such a conclusion. In order to make such a statement, there must be some other (god-given?) parameter, which is

Re: charset parameter in Google Groups

2010-07-07 Thread Asmus Freytag
Andreas, I think we all realize your frustration with well-meaning software. Because tags can be wrong for no fault of the human originating the document, I fully understand that Google might want to attempt to improve the user experience in such situations. The problem is that doing so shoul

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-24 Thread Asmus Freytag
On 7/24/2010 3:00 PM, Bill Poser wrote: On Sat, Jul 24, 2010 at 1:00 PM, Michael Everson wrote: Digits can be scattered randomly about the code space and it wouldn't make any difference. Having written a library for performing conversions between Unicode strings and numbers, I disag

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
The short answer to Karl's question is that there will not be an absolute guarantee. The long answer is that, partly for the reasons he's mentioned, this won't be a practical problem. A. Most of the living scripts that are in wide use have been encoded, including whatever digits are in use.

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
On 7/25/2010 6:05 PM, Martin J. Dürst wrote: On 2010/07/26 4:37, Asmus Freytag wrote: PPS: a very hypothetical tough case would be a script where letters serve both as letters and as decimal place-value digits, and with modern living practice. Well, there actually is such a script, namely

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-26 Thread Asmus Freytag
On 7/26/2010 6:55 AM, John Burger wrote: Mark Davis ☕ wrote: From just a quick scan, it appears that they are currently all contiguous within their respective groups. If we were to impose a stability policy, it would be a constraint on the general_category: we would not assign general_categor

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-26 Thread Asmus Freytag
On 7/26/2010 12:13 PM, Mark Davis ☕ wrote: I agree that having it stated at point of use is useful - and we do that in other cases covered by stability clauses; but we can only state it IF we have the corresponding stability policy. Mark, The statement in your "but" clause really isn't correct

Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Asmus Freytag
On 7/27/2010 3:02 PM, Kenneth Whistler wrote: Karl Williamson asked: Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does? They are U+2107 and U+210E respectively. Because U+210E PLANCK CONSTANT is, to quote the standard, "simply a mathematical

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 2:02 AM, Kent Karlsson wrote: Den 2010-07-28 09.50, skrev "Jukka K. Korpela" : André Szabolcs Szelp wrote: Generally, for the decimal point . (U+002E FULLSTOP) and , (U+002C COMMA) is used in the SI world. However, earlier conventions could use different notation, such a

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 9:30 AM, André Szabolcs Szelp wrote: You really all say, that general property Sk (DOT ABOVE) rather than Po (FULL STOP, COMMA, MIDDLE DOT) (compared with all other decimal point characters) can not cause any problems ever in certain algorithms? No, we say that this is equivalent to
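
Editorial sketch (not part of the original message): the General_Category values being compared here can be read from the Unicode Character Database, for instance through Python's standard `unicodedata` module. The helper name `categories` is an assumption for illustration only.

```python
import unicodedata

# Candidate decimal-separator characters mentioned in the thread.
CANDIDATES = ["\u002E", "\u002C", "\u00B7", "\u02D9"]


def categories(chars):
    """Return {character name: General_Category} for the given characters."""
    return {unicodedata.name(c): unicodedata.category(c) for c in chars}


if __name__ == "__main__":
    for name, cat in categories(CANDIDATES).items():
        print(f"{name}: {cat}")
```

An algorithm that keys on Po alone would treat DOT ABOVE differently from the other three, which is the concern raised in the quoted question.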

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 10:09 AM, Murray Sargent wrote: Contextual rendering is getting to be more common thanks to adoption of OpenType features. For example, both MS Publisher 2010 and MS Word 2010 support various contextually dependent OpenType features at the user's discretion. The choice of glyph for U+

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-28 Thread Asmus Freytag
On 7/28/2010 9:33 PM, karl williamson wrote: The digits (一、 二、三、四、五、六、七、八、九、〇) are used both as letters and as decimal place-value digits, and they are scattered widely the same characters are also used as elements in a system that doesn't use place-value, but uses special characters to show

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-28 Thread Asmus Freytag
On 7/28/2010 10:13 PM, Martin J. Dürst wrote: Sequences of numeric Kanji are also used in names and word-plays, and as sequences of individual small numbers. But the same applies to our digits. A very simple example is to use them as a ruler in plain text: 1 2 3

Re: Plain text

2010-07-28 Thread Asmus Freytag
On 7/28/2010 9:32 PM, Doug Ewell wrote: Murray Sargent wrote: It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And due to the complexity of Unicode, even U

Re: Digit/letter variants in the "same" unified script (was: stability policy on numeric type = decimal)

2010-07-29 Thread Asmus Freytag
Having Nd be limited to characters that a) are used in decimal radix numbers b) are part of a complete, ordered sequence 0..9 would make this property regular enough to serve implementers. You could script the creation of relevant data for your implementation based on that property. *Exceptions
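
Editorial sketch (not part of the original message): a way one might "script the creation of relevant data" from the Nd property, using Python's standard `unicodedata` module. It collects every Nd character whose decimal value is 0 and checks that the nine following code points continue the sequence. The function names are hypothetical, and the result depends on the Unicode version bundled with the Python build.

```python
import unicodedata


def digit_zeros():
    """Code points that are Nd with decimal value 0 (the start of each digit run)."""
    zeros = []
    for cp in range(0x110000):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch) == 0:
            zeros.append(cp)
    return zeros


def run_is_regular(zero_cp):
    """True if zero_cp .. zero_cp+9 are all Nd with decimal values 0..9 in order."""
    for i in range(10):
        ch = chr(zero_cp + i)
        if unicodedata.category(ch) != "Nd":
            return False
        if unicodedata.decimal(ch, None) != i:
            return False
    return True


def to_int(text):
    """Convert a string of Nd characters (from any script) to an integer."""
    value = 0
    for ch in text:
        value = value * 10 + unicodedata.decimal(ch)
    return value
```

With such a table, a digit's value is just its offset from the zero of its run, which is what makes a regular Nd property convenient for implementers.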

Re: High dot/dot above punctuation?

2010-07-31 Thread Asmus Freytag
On 7/29/2010 1:15 AM, Khaled Hosny wrote: I don't buy in Unicode idea of encoding different sets of decimal digits separately, they are all different graphical presentations of the same thing. Two observations: 1) During rendering, everything turns into a graphical representation.

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-04 Thread Asmus Freytag
On 8/2/2010 5:04 PM, Karl Pentzlin wrote: I have compiled a draft proposal: Proposal to add Variation Sequences for Latin and Cyrillic letters The draft can be downloaded at: http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB). The final proposal is intended to be submitted

Re: Standard fallback characters (was: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-04 Thread Asmus Freytag
On 8/4/2010 1:30 PM, verdy_p wrote: "Asmus Freytag" wrote: The Fraktur problem is one where one typestyle requires additional information (e.g. when to select long s) that is not required for rendering the same text in another typestyle. If it is indeed desirable (and possible)

Re: Re: Standard fallback characters (was: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-04 Thread Asmus Freytag
Philippe, Text typeset in Fraktur contains more information than text typeset in Antiqua. That means, there are some places where there are some (mild) ambiguities in representation in the Antiqua version. Not enough to bother a human reader who can use deep context to read the text correctly,

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-05 Thread Asmus Freytag
On 8/5/2010 3:47 AM, William_J_G Overington wrote: On Wednesday 4 August 2010, Asmus Freytag wrote: However, there's no need to add variation sequences to select an *ambiguous* form. Those sequences should be removed from the proposal. Are you here talking about such thin

Re: Accessing alternate glyphs from plain text (from Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-06 Thread Asmus Freytag
On 8/6/2010 2:03 AM, William_J_G Overington wrote: On Thursday, 5 August 2010, Kenneth Whistler wrote: I am thinking of where a poet might specify an ending version of a glyph at the end of the last word on some lines, yet not on others, for poetic effect. I think that it would be good i

Re: A simpler definition of the Bidi Algorithm

2010-09-10 Thread Asmus Freytag
The first discussions that led to the current formulation of the bidi algorithm easily go back 20 years by now. There's some value in not re-stating a specification - even if a new formulation could be found to be 100% equivalent. That value lies in the fact that any reader can tell, by simpl

Re: 00B7 vs. 2027

2010-09-18 Thread Asmus Freytag
On 9/18/2010 8:36 AM, abysta wrote: Hello. I need a dot to separate words into syllables. What should I use, 00B7 or 2027, and why? 2027 is explicitly intended to be used to show syllables as is done in dictionaries. You don't make it explicit in your query, but it sounds like that is the

Re: 00B7 vs. 2027

2010-09-18 Thread Asmus Freytag
On 9/18/2010 10:56 AM, Lorna Priest wrote: U+00B7 MIDDLE DOT is semantically ambiguous and has (partly therefore) varying renderings, and it might be used as a replacement for U+2027 if the latter cannot be used reliably. What about using U+02D1 - half triangular colon? Why not use the

Re: statistics

2010-10-11 Thread Asmus Freytag
On 10/11/2010 9:49 PM, Janusz S. "Bień" wrote: On Mon, 11 Oct 2010 announceme...@unicode.org wrote: The newly finalized Unicode Version 6.0 adds 2,088 characters, What is the current total? Are other statistic informations available somewhere? The announcement gives a link to click throug

Re: Irrational numeric values in TUS

2010-10-12 Thread Asmus Freytag
Ken, some comments, and a few suggestions near the end. On 10/12/2010 4:56 PM, Kenneth Whistler wrote: Karl Williamson asked: The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? No. Pr

Re: [unicode] Telugu Unicode Encoding Review

2010-10-16 Thread Asmus Freytag
On 10/16/2010 10:38 AM, suzuki toshiya wrote: Hi, I've never heard any comments about the reservation of the codepoints to making the code chart structure similar among multiple script, no positive, no negative. So your comment is interesting. Could you tell me more about what kind of disadvantag

Re: A simpler definition of the Bidi Algorithm

2010-10-17 Thread Asmus Freytag
On 10/17/2010 7:01 AM, Michael D. Adams wrote: This is something that not even the C++ and Java reference implementations do (though it appears that the C++ implementation of the W rules was originally derived from a regular expression as it uses state tables, but if so it is undocumented). (Wh

Re: A simpler definition of the Bidi Algorithm

2010-10-17 Thread Asmus Freytag
On 10/17/2010 10:59 AM, Michael D. Adams wrote: "The biggest challenge was not in creating those tables, but in understanding the nuances of the rules, by the way." Two questions so I can understand better. First, by nuances do you mean the nuances of how the rules interact (which I think woul

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-04 Thread Asmus Freytag
On 11/4/2010 5:46 PM, Doug Ewell wrote: Markus Scherer wrote: While processing 16-bit Unicode text which is not assumed to be well-formed UTF-16, you can treat (decode) an unpaired surrogate as a mostly-inert surrogate code point. However, you cannot unambiguously encode a surrogate code poin

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Asmus Freytag
On 11/5/2010 7:02 AM, Doug Ewell wrote: Asmus Freytag wrote: I'm probably missing something here, but I don't agree that it's OK for a consumer of UTF-16 to accept an unpaired surrogate without throwing an error, or converting it to U+FFFD, or otherwise raising a fuss. Unpaired
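
Editorial sketch (not part of the original message): one of the options named in the quoted text, converting each unpaired surrogate to U+FFFD, can be illustrated on a list of 16-bit code units. The function name `repair_utf16` and the choice to also return the offending positions are assumptions for illustration; this is not the utility discussed in the thread.

```python
REPLACEMENT = 0xFFFD


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def repair_utf16(units):
    """Return (repaired_units, positions) where positions index unpaired surrogates."""
    out, bad = [], []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed pair: copy both code units unchanged.
            out.extend(units[i:i + 2])
            i += 2
            continue
        if is_high(unit) or is_low(unit):
            # Lone surrogate: report it and substitute U+FFFD.
            bad.append(i)
            out.append(REPLACEMENT)
        else:
            out.append(unit)
        i += 1
    return out, bad
```

Returning the positions lets a caller choose between reporting, repairing, or rejecting the input, which are the alternatives debated in this thread.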

Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

2010-11-10 Thread Asmus Freytag
If you want to get that point across to a general audience, you could use a more colloquial term, albeit one that itself derives from mathematics. Text that can be completely expressed in ASCII fits into something (ASCII) that works as a "lowest common denominator" of a large number of char

Re: Application that displays CJK text in Normalization Form D

2010-11-14 Thread Asmus Freytag
On 11/14/2010 12:57 PM, Doug Ewell wrote: Jim Monty wrote: Japanese kana (the "J" in "CJK") and Korean syllables (the "K" in "CJK") both have different normalization forms. What do ideographs have to do with anything? I didn't mention ideographs; you did. The term "CJK" is often used to ref
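
Editorial sketch (not part of the original message): the point that kana and Hangul syllables change under normalization can be checked with Python's standard `unicodedata.normalize`. The helper name `nfd_codepoints` and the two sample characters are chosen here for illustration.

```python
import unicodedata


def nfd_codepoints(text):
    """Return the code points of text after conversion to Normalization Form D."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    print(nfd_codepoints("\u304C"))  # HIRAGANA LETTER GA
    print(nfd_codepoints("\uD55C"))  # HANGUL SYLLABLE HAN
```

The voiced kana splits into a base letter plus a combining voiced sound mark, and the Hangul syllable splits into its conjoining jamo.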

Re: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Asmus Freytag
On 11/15/2010 2:24 PM, Kenneth Whistler wrote: FA47 is a "compatibility character", and would have a compatibility mapping. Faulty syllogism. Formally correct answer but only because of something of a design flaw in Unicode. When the type of mapping was decided on, people didn't fully expect
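
Editorial sketch (not part of the original message): despite its name, U+FA47 carries a canonical (singleton) decomposition rather than a compatibility one, so even NFD and NFC replace it. This can be checked with Python's standard `unicodedata` module; the helper name `mapping_kind` is hypothetical.

```python
import unicodedata


def mapping_kind(ch):
    """Classify a character's decomposition as 'none', 'canonical' or 'compatibility'."""
    decomp = unicodedata.decomposition(ch)
    if not decomp:
        return "none"
    # Compatibility mappings are tagged, e.g. "<compat> 0020 0308".
    return "compatibility" if decomp.startswith("<") else "canonical"


if __name__ == "__main__":
    ch = "\uFA47"
    print(mapping_kind(ch), unicodedata.decomposition(ch))
    print(f"U+{ord(unicodedata.normalize('NFD', ch)):04X}")
```

For contrast, a character such as U+FB01 LATIN SMALL LIGATURE FI has a tagged compatibility mapping and is left alone by NFD.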

Re: CJK Compatibility Gotchas (was: Re: Application that displays CJK text in Normalization Form D)

2010-11-15 Thread Asmus Freytag
On 11/15/2010 5:43 PM, Kenneth Whistler wrote: Perhaps someone would like to make a detailed proposal to the UTC for how to fix the text and charts?;-) Ken, having shown yourself the master of detail in your reply, I think you've appointed yourself. A round of applause for Ken! See how eas

Re: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Asmus Freytag
On 11/18/2010 8:04 AM, Peter Constable wrote: From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of André Szabolcs Szelp AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in vi

Re: Are Latin and Cyrillic essentially the same script?

2010-11-19 Thread Asmus Freytag
On 11/18/2010 11:15 PM, Peter Constable wrote: If you'd like a precedent, here's one: Yes, I think discussion of precedents is important - it leads to the formulation of encoding principles that can then (hopefully) result in more consistency in future encoding efforts. Let me add the cavea

Re: Are Latin and Cyrillic essentially the same script?

2010-11-22 Thread Asmus Freytag
On 11/22/2010 4:15 AM, Michael Everson wrote: It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system, Yes there are. Sorting multilingual text including Greek and IPA tra

Re: UNICODE version of _T(x) macro

2010-11-22 Thread Asmus Freytag
On 11/22/2010 10:18 AM, Phillips, Addison wrote: sowmya satyanarayana wrote: Taking this, what is the best way to define _T(x) macro of UNICODE version, so that my strings will always be 2 byte wide character? Unicode characters aren't always 2 bytes wide. Characters with values of U+1
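
Editorial sketch (not part of the original message): the remark that Unicode characters are not always two bytes wide can be illustrated by counting UTF-16 code units. The sketch uses Python rather than the C macro under discussion, and the names `utf16_units` and `surrogate_pair` are made up for illustration.

```python
def utf16_units(ch):
    """Number of 16-bit code units needed to encode one character in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def surrogate_pair(code_point):
    """Split a supplementary code point (U+10000 and above) into its surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    print(utf16_units("A"))
    print(utf16_units("\U0001D11E"))
    print([hex(u) for u in surrogate_pair(0x1D11E)])
```

Code that assumes one 16-bit unit per character will therefore miscount or split any text containing supplementary characters.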

Re: UNICODE version of _T(x) macro

2010-11-22 Thread Asmus Freytag
On 11/22/2010 11:08 AM, Asmus Freytag wrote: depending on whether some global compile time flat (usually UNICODE or _UNICODE) is set or not. recte: flag.

Re: UNICODE version of _T(x) macro

2010-11-23 Thread Asmus Freytag
On 11/23/2010 1:58 AM, sowmya satyanarayana wrote: This what I am actually looking for. My ODBC application supports UTF-16, which is 2 byte width characters. This application is completely oriented around using _T(x) macro as Asmus Freytag figured out. Yeah, it's nice when you c

Re: Latin IPA letter a

2011-06-28 Thread Asmus Freytag
On 6/28/2011 1:51 AM, Michael Everson wrote: On 28 Jun 2011, at 09:28, Jean-François Colson wrote: In Times New Roman, which is the default font for MS Word (probably the best known word processor), the letters “a” and “ɑ” are indistinguishable in italics. That is a fault of the font. No, t

Re: Unifon

2011-06-28 Thread Asmus Freytag
On 6/28/2011 1:40 AM, Andreas Stötzner wrote: Am 28.06.2011 um 09:43 schrieb Jean-François Colson: I’m interested in Unifon (http://www.unifon.org). That’s a phonemic alphabet for English which is used to teach reading. Although it has been encoded in the ConScript Unicode Registry as a new s

Re: Typo in bidi reference implementation

2011-07-01 Thread Asmus Freytag
On 7/1/2011 12:06 AM, Peter Krefting wrote: Hi! On line 65 of http://www.unicode.org/Public/PROGRAMS/BidiReferenceCpp/bidi.cpp (version 26) the word "utility" is spelled as "uitlity" (line 80 has the correct spelling). Not that it matters much, just something we noticed. If it's in a comme

Re: unicode Digest V12 #108

2011-07-02 Thread Asmus Freytag
On 7/2/2011 8:59 AM, Philippe Verdy wrote: 2011/7/2 Andrew Miller: The "ng" in Llangollen is not the digram "ng" but two separate letters (unlike the "ll" in the name which is the digram). Why not simply using a soft hyphen between "n" and "g" in this case ? Soft hyphens are normally recognized

Re: unicode Digest V12 #108

2011-07-05 Thread Asmus Freytag
On 7/3/2011 6:31 AM, Philippe Verdy wrote: Regarding the previous comment about the Danish "aa", Sorry, most of that discussion missed the mark. "Modern" Danish can have "AA" for two reasons. Accidental occurrence, as in "dataanalyse" which is composed of two words which just happens to put

Re: unicode Digest V12 #108

2011-07-06 Thread Asmus Freytag
On 7/6/2011 12:16 AM, Jukka K. Korpela wrote: Allowing word division just to say that some characters do not constitute a digraph (or trigraph…) is not practical e.g. when the text has otherwise no word divisions, for one reason or another, or when the particular word division point is typograp

Re: Proposed Update UAXes for Unicode 6.1

2011-07-07 Thread Asmus Freytag
On 7/7/2011 8:42 PM, Karl Williamson wrote: On 07/07/2011 02:33 PM, announceme...@unicode.org wrote: Proposed updates for most Unicode Standard Annexes for Version 6.1 of the Unicode Standard have been posted for public review. Many of the documents appear to have no current modifications to

Re: Unicode 7.0 goals and ++

2011-07-11 Thread Asmus Freytag
On 7/11/2011 11:57 AM, Ken Whistler wrote: On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote: For the long term, I suggest Unicode should aim for this: That kind of terminological purity isn't going to occur. ... The Unicode Consortium has a glossary of terms: ... But the Unicode Sta

Definition of character

2011-07-12 Thread Asmus Freytag
Jukka, reminding everyone of the definition of "technical term" as opposed to a word in everyday language isn't helping address the underlying issue. Everyone is familiar with this distinction. You note that there's a bit of a truism that underlies the definition of character and character e

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 1:08 AM, Karl Pentzlin wrote: In WG2 N4085 "Further proposed additions to ISO/IEC 10646 and comments to other proposals" (2011‐05‐25), the German NB had requested re WG2 N4022 "Proposal to add Wingdings and Webdings Symbols" besides other points: "Also, in doing this work, othe

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 9:03 AM, Doug Ewell wrote: Andrew West replied to Michael Everson: I think that having encoded symbols for control characters (which we already have for some of them) is no bad thing, and the argument about "too many characters" is not compelling, as there are only some dozens of

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 2:23 AM, Karl Pentzlin wrote: Am Freitag, 15. Juli 2011 um 10:58 schrieb Asmus Freytag: AF> ... There appear to be a large number of symbols for which a AF> Unicode equivalent can be identified with great certainty - AF> and beyond that there seem to be characters for w

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 10:26 AM, Michael Everson wrote: What I see is a certain unreasonability reflecting a certain conservatism. Text about the Standard is important, and should be representable in an interchangeable way. Here { } is a Right to left override character. character. I want to talk about

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 11:05 AM, Doug Ewell wrote: What I see is a certain unreasonability reflecting a certain conservatism. Text about the Standard is important, and should be representable in an interchangeable way. Here { } is a Right to left override character. character. I want to talk about it in

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 11:36 AM, Michael Everson wrote: However, I agree with Asmus that in the context of the Wingdings-type symbols these characters should not be considered. They should be considered as a whole on their own. Thank you Michael. To reiterate and restate (so it can be read out of cont

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 2:18 PM, Michael Everson wrote: As for the others, those are chart glyphs for the ZWNJ and the ZWJ. There is no need to encode *characters* for chart glyphs. That's your assertion. Some other people have a different view, and think that there *is* a need to encode *characters* for

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 10:48 PM, Doug Ewell wrote: I apologize for the unintended content-free post. It's my phone's fault. -- My dog ate the homework - 2011? :) A./

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-16 Thread Asmus Freytag
On 7/16/2011 1:53 AM, Michael Everson wrote: On 16 Jul 2011, at 04:37, Asmus Freytag wrote: It's not a matter of competing "views". There's a well-defined process for adding characters to the standard. It starts by documenting usage. Yes, Asmus, and when one wants to d

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-16 Thread Asmus Freytag
Karl, I've published similar "surveys" in the past, where the object was to get feedback on the desirability of further action. I stick by my recommendation in favor of keeping "raw data" out of the document registry and of doing the committee a favor by "adding value" in form of a sifting or

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-17 Thread Asmus Freytag
On 7/17/2011 2:47 AM, Petr Tomasek wrote: On Sun, Jul 17, 2011 at 10:14:55AM +0100, Julian Bradfield wrote: Wouldn't it be more economical to encode a single UNICODE ESCAPE CHARACTER which forces the following character to be interpreted as a printable glyph rather than any control function? I

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-17 Thread Asmus Freytag
On 7/17/2011 12:19 PM, Doug Ewell wrote: Asmus wrote: The reason is, of course, because these codes would *reinterpret* existing characters. You could argue that Variation Selectors do the same, but they are carefully constructed so that they can be safely ignored. Variation selectors don'

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-17 Thread Asmus Freytag
On 7/17/2011 12:19 PM, Philippe Verdy wrote: 2011/7/17 Asmus Freytag: On 7/17/2011 2:35 AM, Michael Everson wrote: ... invisible and stateful control characters are more expensive than ordinary graphic symbols. In this case, the expense is so much higher as to rule out such an idea from the

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-19 Thread Asmus Freytag
On 7/19/2011 7:18 PM, John W Kennedy wrote: On Jul 19, 2011, at 9:20 PM, Peter Constable wrote: So you want to be able to discuss NBSP (say) in plain text. You can already do that; in fact, you have multiple ways that everybody here will have no difficulty understanding: "NBSP" "no-break spa

Re: Greek Characters Duplicated as Latin

2011-08-14 Thread Asmus Freytag
On 8/14/2011 1:39 PM, Richard Wordingham wrote: U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore included as U+00B5. It normally precedes a Latin-script letter, and therefore it actually makes sense to treat it as a Latin-script character, and possibly give it a different shape i

Re: Anything from the Symbol font to add along with W*dings?

2011-08-14 Thread Asmus Freytag
On 8/14/2011 12:51 PM, Jukka K. Korpela wrote: 14.8.2011 17:51, Doug Ewell wrote: This sounds like Jukka expects browsers to analyze the glyph assigned in the font to the code position for 'a' and decline to display it if it doesn't look enough like an 'a' (rejecting, for example, Greek 'α'). I

Re: Sanskrit nasalized L

2011-08-16 Thread Asmus Freytag
On 8/16/2011 1:57 AM, Andrew West wrote: On 16 August 2011 02:59, Richard Wordingham wrote: All I've got to go on is the penultimate sentence in TUS 6.0 Section 10.2 - 'Rarely, stacks are seen that contain more than one such consonant-vowel combination in a vertical arrangement'.

Re: Non-standard Tibetan stacks (was Re: Sanskrit nasalized L)

2011-08-16 Thread Asmus Freytag
On 8/16/2011 3:32 PM, Andrew West wrote: On 16 August 2011 18:19, Asmus Freytag wrote: "These stacks are highly unusual and are considered beyond the scope of plain text rendering. They may be handled by higher-level mechanisms". The question is: have any such "mechanisms"

Re: What are the present criteria...

2011-08-18 Thread Asmus Freytag
On 8/18/2011 7:29 AM, Doug Ewell wrote: Karl Pentzlin wrote: The quoted indicators for benefit were part of a concern of the German NB regarding the Wingding/Webding proposals. The concern expressed in WG2 N4085 is that some characters proposed there conform neither to the policy statements by

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Asmus Freytag
On 8/19/2011 2:35 PM, Jukka K. Korpela wrote: 20.8.2011 0:07, Doug Ewell wrote: Of course, 2.1 billion characters is also overkill, but the advent of UTF-16 was how we ended up with 17 planes. And now we think that a little over a million is enough for everyone, just as they thought in the l

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Asmus Freytag
On 8/19/2011 3:24 PM, Ken Whistler wrote: On 8/19/2011 2:07 PM, Doug Ewell wrote: Technically, I think 10646 was always limited to 32,768 planes so that one could always address a code point with a 32-bit signed integer (a nod to the Java fans). Well, yes, but it didn't really have anything to
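
For readers following the arithmetic in this exchange, here is a rough sketch of the numbers involved. It assumes the usual 65,536 code points per plane; the names `CODE_POINTS_PER_PLANE` and `code_space_size` are made up for illustration and do not come from the thread.

```python
# Illustrative arithmetic only; the names here are hypothetical.
CODE_POINTS_PER_PLANE = 0x10000  # 65,536 code points in each plane
INT32_MAX = 2**31 - 1            # largest value of a signed 32-bit integer


def code_space_size(planes: int) -> int:
    """Return the number of code points covered by the given number of planes."""
    return planes * CODE_POINTS_PER_PLANE


# 32,768 planes: the limit mentioned for ISO/IEC 10646 in the message above.
full_10646 = code_space_size(32768)
# 17 planes: the range reachable through UTF-16.
utf16_range = code_space_size(17)

print(full_10646)                   # total code points across 32,768 planes
print(full_10646 - 1 <= INT32_MAX)  # does the highest code point fit in int32?
print(utf16_range)                  # total code points across 17 planes
```

With 32,768 planes the code space is exactly 2^31 code points (the "2.1 billion" figure), so the highest code point still fits in a signed 32-bit integer, while 17 planes give 1,114,112 code points, the "little over a million".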

Re: RTL PUA?

2011-08-20 Thread Asmus Freytag
On 8/20/2011 6:44 PM, Doug Ewell wrote: Would that really be a better default? I thought the main RTL needs for the PUA would be for unencoded scripts, not for even more Arabic letters. (How many more are there anyway?) In any case, either 'R' or 'AL' as the Plane 16 default would be an improv

Re: RTL PUA?

2011-08-21 Thread Asmus Freytag
On 8/21/2011 3:31 PM, Richard Wordingham wrote: On Sun, 21 Aug 2011 11:00:26 -0600 "Doug Ewell" wrote: I think as soon as we start talking about this many scenarios, we are no longer talking about what the *default* bidi class of the PUA (or some part of it) should be. Instead, we are talking

Re: RTL PUA?

2011-08-21 Thread Asmus Freytag
On 8/21/2011 7:34 PM, Doug Ewell wrote: So what you are asking about is a directional control character that would assign subsequent characters a BC of 'AL', right? You don't want to call this a LANGUAGE MARK or anything else that implies language identification, because of the existence of "r

Re: Implement BIDI algorithm by line

2011-08-22 Thread Asmus Freytag
Huh? What context is this in? On 8/22/2011 11:18 AM, CE Whitehead wrote: Hi. I think many line breaks within paragraphs are soft line breaks but that embedding levels have to be taken into account when deciding the width of the glyphs; that's as near as I can tell. Here is the description o

Re: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-23 Thread Asmus Freytag
On 8/23/2011 7:22 AM, Doug Ewell wrote: Of all applications, a word processor or DTP application would want to know more about the properties of characters than just whether they are RTL. Line breaking, word breaking, and case mapping come to mind. I would think the format used by standard UCD

Re: Code pages and Unicode

2011-08-23 Thread Asmus Freytag
On 8/23/2011 12:00 PM, Richard Wordingham wrote: On Mon, 22 Aug 2011 16:18:56 -0700 Ken Whistler wrote: How about Clause 12.5 of ISO/IEC 10646: <001B, 0025, 0040> You "escape" out of UTF-16 to ISO 2022, and then you can do whatever the heck you want, including exchange and processing of comp

Re: Code pages and Unicode

2011-08-24 Thread Asmus Freytag
On 8/24/2011 7:45 PM, Richard Wordingham wrote: Which earlier coding system supported Welsh? (I'm thinking of 'W WITH CIRCUMFLEX', U+0174 and U+0175.) How was the use of the canonical decompositions incompatible with the character encodings of legacy systems? Latin-1 has the same codes as ISO

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-26 Thread Asmus Freytag
I agree with Ken that Philippe's suggestion of conflating the annotations for mathematical use with formal Unicode name aliases is a non-starter. The former exist to help mathematicians identify symbols in Unicode, when they know their name from entity lists. The latter are designed to allow pr

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-26 Thread Asmus Freytag
On 8/26/2011 10:09 PM, Philippe Verdy wrote: 2011/8/27 Asmus Freytag: I agree with Ken that Philippe's suggestion of conflating the annotations for mathematical use with formal Unicode name aliases is a non-starter. Yes but why then adding ISO 6429 alias names ? What makes ISO 6429 a b

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-26 Thread Asmus Freytag
On 8/26/2011 7:52 PM, Benjamin M Scarborough wrote: Are name aliases exempted from the normal character naming conventions? I ask because four of the entries have words that begin with numbers. 008E;SINGLE-SHIFT 2;control 008F;SINGLE-SHIFT 3;control 0091;PRIVATE USE 1;control 0092;PRIVATE USE 2
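
The check described in this message can be sketched roughly as follows. It uses only the three entries quoted in full above and assumes the semicolon-separated `code;alias;status` layout they show; the helper names `parse_entry` and `words_starting_with_digit` are hypothetical, and treating a "word" as a space-separated token is a simplification rather than the standard's own definition.

```python
# Entries copied from the message above; the truncated fourth entry is omitted.
SAMPLE_ENTRIES = [
    "008E;SINGLE-SHIFT 2;control",
    "008F;SINGLE-SHIFT 3;control",
    "0091;PRIVATE USE 1;control",
]


def parse_entry(line: str) -> tuple[int, str, str]:
    """Split one 'code;alias;status' line into (code point, alias, status)."""
    code, alias, status = line.split(";")
    return int(code, 16), alias, status


def words_starting_with_digit(alias: str) -> list[str]:
    """Return the space-separated words of an alias that begin with a digit."""
    return [word for word in alias.split(" ") if word[:1].isdigit()]


for line in SAMPLE_ENTRIES:
    code_point, alias, status = parse_entry(line)
    flagged = words_starting_with_digit(alias)
    print(f"U+{code_point:04X} {alias!r} ({status}): {flagged}")
```

Each of the three sample aliases ends in a word that is a bare digit, which is the pattern the question is about.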

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Asmus Freytag
On 8/27/2011 1:31 AM, Andrew West wrote: On 27 August 2011 09:25, Andrew West wrote: On 27 August 2011 03:52, Benjamin M Scarborough wrote: Are name aliases exempted from the normal character naming conventions? I ask because four of the entries have words that begin with numbers. 008E;SIN

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Asmus Freytag
On 8/28/2011 9:46 PM, Doug Ewell wrote: Philippe Verdy wrote: If there are other mappings to do with other standards, and those standards must be only informative, we already have the "/MAPPINGS" directory beside the "/UNIDATA" directory where the UCD belongs too. But in general, with the exc

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Asmus Freytag
On 8/28/2011 6:43 PM, Philippe Verdy wrote: 2011/8/27 Asmus Freytag: I also think that the status field "iso6429" is badly named. It should be "control", and what is named control should be "control-alternate", or perhaps, both of these groups should become simply
