Re: Solidus variations

2011-10-07 Thread Asmus Freytag
Murray's work comes from the desire to represent mathematical equations faithfully, based nearly entirely on the semantics of the operators and having those operators be represented as Unicode characters. One solution that he uses is the use of redundant parens. Parens can be supplied to

Re: Continue: Glaring mistake in nomenclature , should it have been Assamese ?

2011-09-14 Thread Asmus Freytag
On 9/14/2011 11:14 AM, Michael Everson wrote: At this point, I think I have to make a plea: Sarasvati, spare us. +1

Re: Need for Level Direction Mark

2011-09-13 Thread Asmus Freytag
On 9/13/2011 6:01 AM, Philippe Verdy wrote: Unfortunately, adding controls would imply the creation of new Bidi classes for them (and forgetting the stability policy about them, which was published too soon before solving evident problems). The first part is correct, and giving up stability to

Re: ligature usage - WAS: How do we find out what assigned code points aren't normally used in text?

2011-09-11 Thread Asmus Freytag
On 9/9/2011 8:12 PM, Stephan Stiller wrote: Dear Martin, Thanks for alerting me to the issue of causal direction of aesthetic preference - it's been on my mind, but your reply helps me sort out some details. When I first encountered text (outside of the German language locale) with ample

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-09-01 Thread Asmus Freytag
On 8/31/2011 11:25 PM, Philippe Verdy wrote: 2011/9/1 Karl Williamson pub...@khwilliamson.com: But now that I'm an UTC member, I hope I will hear these cases earlier... Congratulations! Does it justify so many new aliases at the same time ? No. I'm firmly with you, I support the

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Asmus Freytag
On 8/28/2011 9:46 PM, Doug Ewell wrote: Philippe Verdy wrote: If there are other mappings to do with other standards, and those standards must be only informative, we already have the /MAPPINGS directory beside the /UNIDATA directory where the UCD belongs too. But in general, with the

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-28 Thread Asmus Freytag
On 8/28/2011 6:43 PM, Philippe Verdy wrote: 2011/8/27 Asmus Freytag asm...@ix.netcom.com: I also think that the status field iso6429 is badly named. It should be control, and what is named control should be control-alternate, or perhaps, both of these groups should become simply control. I think

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Asmus Freytag
On 8/26/2011 10:09 PM, Philippe Verdy wrote: 2011/8/27 Asmus Freytag asm...@ix.netcom.com: I agree with Ken that Philippe's suggestion of conflating the annotations for mathematical use with formal Unicode name aliases is a non-starter. Yes but why then adding ISO 6429 alias names ? What makes

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Asmus Freytag
On 8/26/2011 7:52 PM, Benjamin M Scarborough wrote: Are name aliases exempted from the normal character naming conventions? I ask because four of the entries have words that begin with numbers. 008E;SINGLE-SHIFT 2;control 008F;SINGLE-SHIFT 3;control 0091;PRIVATE USE 1;control 0092;PRIVATE USE
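
The entries quoted in the message above use a semicolon-separated layout (code point; alias; type). A minimal Python sketch, assuming only that three-field layout and using the three entries that are quoted in full, shows how one might flag aliases containing a word that begins with a digit. The helper names `parse_alias_line` and `words_starting_with_digit` are made up for illustration and are not part of any Unicode tool.

```python
import re

# The three entries quoted in full in the message; the fourth is cut off
# in the archive snippet and is deliberately not reconstructed here.
ENTRIES = [
    "008E;SINGLE-SHIFT 2;control",
    "008F;SINGLE-SHIFT 3;control",
    "0091;PRIVATE USE 1;control",
]


def parse_alias_line(line):
    """Split one 'code point;alias;type' line into (int, str, str)."""
    code_point, alias, kind = line.strip().split(";")
    return int(code_point, 16), alias, kind


def words_starting_with_digit(alias):
    """Return the words of an alias (split on space or hyphen) that start with a digit."""
    return [w for w in re.split(r"[ -]", alias) if w and w[0].isdigit()]


if __name__ == "__main__":
    for entry in ENTRIES:
        cp, alias, kind = parse_alias_line(entry)
        print(f"U+{cp:04X} {alias!r} ({kind}): {words_starting_with_digit(alias)}")
```

Each of the three aliases contains exactly one such word, which is the pattern the question is about.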

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-27 Thread Asmus Freytag
On 8/27/2011 1:31 AM, Andrew West wrote: On 27 August 2011 09:25, Andrew West andrewcw...@gmail.com wrote: On 27 August 2011 03:52, Benjamin M Scarborough benjamin.scarboro...@utdallas.edu wrote: Are name aliases exempted from the normal character naming conventions? I ask because four of

Re: PRI #202: Extensions to NameAliases.txt for Unicode 6.1.0

2011-08-26 Thread Asmus Freytag
I agree with Ken that Philippe's suggestion of conflating the annotations for mathematical use with formal Unicode name aliases is a non-starter. The former exist to help mathematicians identify symbols in Unicode, when they know their name from entity lists. The latter are designed to allow

Re: Code pages and Unicode

2011-08-25 Thread Asmus Freytag
On 8/24/2011 7:45 PM, Richard Wordingham wrote: Which earlier coding system supported Welsh? (I'm thinking of 'W WITH CIRCUMFLEX', U+0174 and U+0175.) How was the use of the canonical decompositions incompatible with the character encodings of legacy systems? Latin-1 has the same codes as

Re: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

2011-08-23 Thread Asmus Freytag
On 8/23/2011 7:22 AM, Doug Ewell wrote: Of all applications, a word processor or DTP application would want to know more about the properties of characters than just whether they are RTL. Line breaking, word breaking, and case mapping come to mind. I would think the format used by standard UCD

Re: Code pages and Unicode

2011-08-23 Thread Asmus Freytag
On 8/23/2011 12:00 PM, Richard Wordingham wrote: On Mon, 22 Aug 2011 16:18:56 -0700 Ken Whistler k...@sybase.com wrote: How about Clause 12.5 of ISO/IEC 10646: 001B, 0025, 0040 You escape out of UTF-16 to ISO 2022, and then you can do whatever the heck you want, including exchange and

Re: RTL PUA?

2011-08-22 Thread Asmus Freytag
On 8/21/2011 7:34 PM, Doug Ewell wrote: So what you are asking about is a directional control character that would assign subsequent characters a BC of 'AL', right? You don't want to call this a LANGUAGE MARK or anything else that implies language identification, because of the existence of

Re: Implement BIDI algorithm by line

2011-08-22 Thread Asmus Freytag
Huh? What context is this in? On 8/22/2011 11:18 AM, CE Whitehead wrote: Hi. I think many line breaks within paragraphs are soft line breaks but that embedding levels have to be taken into account when deciding the width of the glyphs; that's as near as I can tell. Here is the description

Re: RTL PUA?

2011-08-21 Thread Asmus Freytag
On 8/21/2011 3:31 PM, Richard Wordingham wrote: On Sun, 21 Aug 2011 11:00:26 -0600 Doug Ewell d...@ewellic.org wrote: I think as soon as we start talking about this many scenarios, we are no longer talking about what the *default* bidi class of the PUA (or some part of it) should be. Instead,

Re: RTL PUA?

2011-08-20 Thread Asmus Freytag
On 8/20/2011 6:44 PM, Doug Ewell wrote: Would that really be a better default? I thought the main RTL needs for the PUA would be for unencoded scripts, not for even more Arabic letters. (How many more are there anyway?) In any case, either 'R' or 'AL' as the Plane 16 default would be an

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Asmus Freytag
On 8/19/2011 2:35 PM, Jukka K. Korpela wrote: 20.8.2011 0:07, Doug Ewell wrote: Of course, 2.1 billion characters is also overkill, but the advent of UTF-16 was how we ended up with 17 planes. And now we think that a little over a million is enough for everyone, just as they thought in the
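
The link between UTF-16 and the 17 planes mentioned above is plain arithmetic. The following Python sketch (variable names are illustrative) derives the plane count and the "little over a million" figure from the sizes of the two surrogate ranges.

```python
# Surrogate code unit ranges used by UTF-16.
HIGH_SURROGATES = range(0xD800, 0xDC00)  # 1024 values
LOW_SURROGATES = range(0xDC00, 0xE000)   # 1024 values

PLANE_SIZE = 0x10000  # 65,536 code points per plane

# Every (high, low) pair addresses one supplementary code point.
supplementary_code_points = len(HIGH_SURROGATES) * len(LOW_SURROGATES)

# Those pairs cover 16 planes; the BMP is the 17th.
supplementary_planes = supplementary_code_points // PLANE_SIZE
total_planes = 1 + supplementary_planes

total_code_points = total_planes * PLANE_SIZE
max_code_point = total_code_points - 1

if __name__ == "__main__":
    print(supplementary_code_points)  # 1048576
    print(total_planes)               # 17
    print(total_code_points)          # 1114112
    print(hex(max_code_point))        # 0x10ffff
```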

Re: Code pages and Unicode (wasn't really: RE: Endangered Alphabets)

2011-08-19 Thread Asmus Freytag
On 8/19/2011 3:24 PM, Ken Whistler wrote: On 8/19/2011 2:07 PM, Doug Ewell wrote: Technically, I think 10646 was always limited to 32,768 planes so that one could always address a code point with a 32-bit signed integer (a nod to the Java fans). Well, yes, but it didn't really have anything

Re: What are the present criteria...

2011-08-18 Thread Asmus Freytag
On 8/18/2011 7:29 AM, Doug Ewell wrote: Karl Pentzlin karl dash pentzlin at acssoft dot de wrote: The quoted indicators for benefit were part of a concern of the German NB regarding the Wingding/Webding proposals. The concern expressed in WG2 N4085 is that some characters proposed there

Re: Sanskrit nasalized L

2011-08-16 Thread Asmus Freytag
On 8/16/2011 1:57 AM, Andrew West wrote: On 16 August 2011 02:59, Richard Wordingham richard.wording...@ntlworld.com wrote: All I've got to go on is the penultimate sentence in TUS 6.0 Section 10.2 - 'Rarely, stacks are seen that contain more than one such consonant-vowel combination in a

Re: Non-standard Tibetan stacks (was Re: Sanskrit nasalized L)

2011-08-16 Thread Asmus Freytag
On 8/16/2011 3:32 PM, Andrew West wrote: On 16 August 2011 18:19, Asmus Freytag asm...@ix.netcom.com wrote: These stacks are highly unusual and are considered beyond the scope of plain text rendering. They may be handled by higher-level mechanisms. The question is: have any such mechanisms

Re: Greek Characters Duplicated as Latin

2011-08-14 Thread Asmus Freytag
On 8/14/2011 1:39 PM, Richard Wordingham wrote: U+00B5 MICRO SIGN is an ISO-8859-1 character, and was therefore included as U+00B5. It normally precedes a Latin-script letter, and therefore it actually makes sense to treat it as a Latin-script character, and possibly give it a different shape
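
As background to the point about U+00B5, here is a small Python sketch using the standard `unicodedata` module. It only illustrates how the character data relates MICRO SIGN to the Greek letter mu; it says nothing about which script the character should be treated as belonging to, and the results depend on the Unicode data shipped with the interpreter.

```python
import unicodedata

MICRO_SIGN = "\u00b5"
GREEK_MU = "\u03bc"

# The two characters are distinct code points with distinct names.
micro_name = unicodedata.name(MICRO_SIGN)
mu_name = unicodedata.name(GREEK_MU)

# MICRO SIGN has a compatibility (not canonical) decomposition to Greek mu,
# so NFC leaves it alone while NFKC replaces it.
nfc_result = unicodedata.normalize("NFC", MICRO_SIGN)
nfkc_result = unicodedata.normalize("NFKC", MICRO_SIGN)

if __name__ == "__main__":
    print(micro_name, "/", mu_name)
    print(nfc_result == MICRO_SIGN, nfkc_result == GREEK_MU)
```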

Re: Anything from the Symbol font to add along with W*dings?

2011-08-14 Thread Asmus Freytag
On 8/14/2011 12:51 PM, Jukka K. Korpela wrote: 14.8.2011 17:51, Doug Ewell wrote: This sounds like Jukka expects browsers to analyze the glyph assigned in the font to the code position for 'a' and decline to display it if it doesn't look enough like an 'a' (rejecting, for example, Greek 'α').

Re: ZWNBSP vs. WJ (was: How is NBH (U0083) Implemented?)

2011-08-05 Thread Asmus Freytag (w)
The ambiguity of an initial FEFF was not desirable, but this discussion shows that certain things can't be so easily fixed by adding characters at a later stage. The more time elapsed between encoding of the ambiguous character and the later fix the more software, the more data, and the more

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-17 Thread Asmus Freytag
On 7/17/2011 2:47 AM, Petr Tomasek wrote: On Sun, Jul 17, 2011 at 10:14:55AM +0100, Julian Bradfield wrote: Wouldn't it be more economical to encode a single UNICODE ESCAPE CHARACTER which forces the following character to be interpreted as a printable glyph rather than any control function?

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-17 Thread Asmus Freytag
On 7/17/2011 12:19 PM, Doug Ewell wrote: Asmus wrote: The reason is, of course, because these codes would *reinterpret* existing characters. You could argue that Variation Selectors do the same, but they are carefully constructed so that they can be safely ignored. Variation selectors

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-17 Thread Asmus Freytag
On 7/17/2011 12:19 PM, Philippe Verdy wrote: 2011/7/17 Asmus Freytag asm...@ix.netcom.com: On 7/17/2011 2:35 AM, Michael Everson wrote: ... invisible and stateful control characters are more expensive than ordinary graphic symbols. In this case, the expense is so much higher as to rule out

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-16 Thread Asmus Freytag
On 7/15/2011 10:48 PM, Doug Ewell wrote: I apologize for the unintended content-free post. It's my phone's fault. -- My dog ate the homework - 2011? :) A./

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-16 Thread Asmus Freytag
On 7/16/2011 1:53 AM, Michael Everson wrote: On 16 Jul 2011, at 04:37, Asmus Freytag wrote: It's not a matter of competing views. There's a well-defined process for adding characters to the standard. It starts by documenting usage. Yes, Asmus, and when one wants to do that, one writes

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-16 Thread Asmus Freytag
Karl, I've published similar surveys in the past, where the object was to get feedback on the desirability of further action. I stick by my recommendation in favor of keeping raw data out of the document registry and of doing the committee a favor by adding value in form of a sifting or

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 1:08 AM, Karl Pentzlin wrote: In WG2 N4085 Further proposed additions to ISO/IEC 10646 and comments to other proposals (2011‐05‐25), the German NB had requested re WG2 N4022 Proposal to add Wingdings and Webdings Symbols besides other points: Also, in doing this work, other

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 9:03 AM, Doug Ewell wrote: Andrew West andrewcwest at gmail dot com replied to Michael Everson: I think that having encoded symbols for control characters (which we already have for some of them) is no bad thing, and the argument about too many characters is not compelling, as

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 2:23 AM, Karl Pentzlin wrote: Am Freitag, 15. Juli 2011 um 10:58 schrieb Asmus Freytag: AF ... There appear to be a large number of symbols for which a AF Unicode equivalent can be identified with great certainty - AF and beyond that there seem to be characters for which such AF

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 10:26 AM, Michael Everson wrote: What I see is a certain unreasonability reflecting a certain conservatism. Text about the Standard is important, and should be representable in an interchangeable way. Here { } is a Right to left override character. character. I want to talk about

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 11:05 AM, Doug Ewell wrote: What I see is a certain unreasonability reflecting a certain conservatism. Text about the Standard is important, and should be representable in an interchangeable way. Here { } is a Right to left override character. character. I want to talk about it

Re: Quick survey of Apple symbol fonts (in context of the Wingding/Webding proposal)

2011-07-15 Thread Asmus Freytag
On 7/15/2011 11:36 AM, Michael Everson wrote: However, I agree with Asmus that in the context of the Wingdings-type symbols these characters should not be considered. They should be considered as a whole on their own. Thank you Michael. To reiterate and restate (so it can be read out of

Definition of character

2011-07-12 Thread Asmus Freytag
Jukka, reminding everyone of the definition of technical term as opposed to a word in everyday language isn't helping address the underlying issue. Everyone is familiar with this distinction. You note that there's a bit of a truism that underlies the definition of character and character

Re: Unicode 7.0 goals and ++

2011-07-11 Thread Asmus Freytag
On 7/11/2011 11:57 AM, Ken Whistler wrote: On 7/10/2011 4:58 PM, Ernest van den Boogaard wrote: For the long term, I suggest Unicode should aim for this: That kind of terminological purity isn't going to occur. ... The Unicode Consortium has a glossary of terms: ... But the Unicode

Re: Proposed Update UAXes for Unicode 6.1

2011-07-07 Thread Asmus Freytag
On 7/7/2011 8:42 PM, Karl Williamson wrote: On 07/07/2011 02:33 PM, announceme...@unicode.org wrote: Proposed updates for most Unicode Standard Annexes for Version 6.1 of the Unicode Standard have been posted for public review. Many of the documents appear to have no current modifications to

Re: unicode Digest V12 #108

2011-07-06 Thread Asmus Freytag
On 7/3/2011 6:31 AM, Philippe Verdy wrote: Regarding the previous comment about the Danish aa, Sorry, most of that discussion missed the mark. Modern Danish can have AA for two reasons. Accidental occurrence, as in dataanalyse which is composed of two words which just happens to put two A

Re: unicode Digest V12 #108

2011-07-06 Thread Asmus Freytag
On 7/6/2011 12:16 AM, Jukka K. Korpela wrote: Allowing word division just to say that some characters do not constitute a digraph (or trigraph…) is not practical e.g. when the text has otherwise no word divisions, for one reason or another, or when the particular word division point is

Re: unicode Digest V12 #108

2011-07-02 Thread Asmus Freytag
On 7/2/2011 8:59 AM, Philippe Verdy wrote: 2011/7/2 Andrew Miller a.j.mil...@bcs.org.uk: The ng in Llangollen is not the digram ng but two separate letters (unlike the ll in the name which is the digram). Why not simply using a soft hyphen between n and g in this case ? Soft hyphens are

Re: Typo in bidi reference implementation

2011-07-01 Thread Asmus Freytag
On 7/1/2011 12:06 AM, Peter Krefting wrote: Hi! On line 65 of http://www.unicode.org/Public/PROGRAMS/BidiReferenceCpp/bidi.cpp (version 26) the word utility is spelled as uitlity (line 80 has the correct spelling). Not that it matters much, just something we noticed. If it's in a comment,

Re: Latin IPA letter a

2011-06-28 Thread Asmus Freytag
On 6/28/2011 1:51 AM, Michael Everson wrote: On 28 Jun 2011, at 09:28, Jean-François Colson wrote: In Times New Roman, which is the default font for MS Word (probably the best known word processor), the letters “a” and “ɑ” are indistinguishable in italics. That is a fault of the font. No,

Re: Unifon

2011-06-28 Thread Asmus Freytag
On 6/28/2011 1:40 AM, Andreas Stötzner wrote: Am 28.06.2011 um 09:43 schrieb Jean-François Colson: I’m interested in Unifon (http://www.unifon.org). That’s a phonemic alphabet for English which is used to teach reading. Although it has been encoded in the ConScript Unicode Registry as a new

Re: UNICODE version of _T(x) macro

2010-11-23 Thread Asmus Freytag
On 11/23/2010 1:58 AM, sowmya satyanarayana wrote: This what I am actually looking for. My ODBC application supports UTF-16, which is 2 byte width characters. This application is completely oriented around using _T(x) macro as Asmus Freytag figured out. Yeah, it's nice when you can do

Re: Are Latin and Cyrillic essentially the same script?

2010-11-22 Thread Asmus Freytag
On 11/22/2010 4:15 AM, Michael Everson wrote: It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system, Yes there are. Sorting multilingual text including Greek and IPA

Re: UNICODE version of _T(x) macro

2010-11-22 Thread Asmus Freytag
On 11/22/2010 10:18 AM, Phillips, Addison wrote: sowmya satyanarayana sowmya underscore satyanarayana at yahoo dot com wrote: Taking this, what is the best way to define _T(x) macro of UNICODE version, so that my strings will always be 2 byte wide character? Unicode characters aren't always
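
The question above assumes that every character occupies exactly two bytes in UTF-16. A short Python sketch (the helper name `utf16_byte_length` is made up for illustration) shows that this holds for characters in the Basic Multilingual Plane but not for supplementary characters, which take a surrogate pair.

```python
def utf16_byte_length(text):
    """Number of bytes needed to encode text as UTF-16 without a BOM."""
    return len(text.encode("utf-16-le"))


BMP_CHAR = "A"                    # U+0041, inside the BMP
SUPPLEMENTARY_CHAR = "\U0001F600"  # U+1F600, outside the BMP

if __name__ == "__main__":
    print(utf16_byte_length(BMP_CHAR))            # one 16-bit code unit
    print(utf16_byte_length(SUPPLEMENTARY_CHAR))  # two 16-bit code units
```

Code that sizes buffers or indexes strings by "one character = two bytes" therefore needs to count code units, not characters.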

Re: UNICODE version of _T(x) macro

2010-11-22 Thread Asmus Freytag
On 11/22/2010 11:08 AM, Asmus Freytag wrote: depending on whether some global compile time flat (usually UNICODE or _UNICODE) is set or not. recte: flag.

Re: Are Latin and Cyrillic essentially the same script?

2010-11-19 Thread Asmus Freytag
On 11/18/2010 11:15 PM, Peter Constable wrote: If you'd like a precedent, here's one: Yes, I think discussion of precedents is important - it leads to the formulation of encoding principles that can then (hopefully) result in more consistency in future encoding efforts. Let me add the

Re: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Asmus Freytag
On 11/18/2010 8:04 AM, Peter Constable wrote: From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of André Szabolcs Szelp AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in

Re: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Asmus Freytag
On 11/15/2010 2:24 PM, Kenneth Whistler wrote: FA47 is a compatibility character, and would have a compatibility mapping. Faulty syllogism. Formally correct answer but only because of something of a design flaw in Unicode. When the type of mapping was decided on, people didn't fully expect
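
For readers who want to check the mapping type of U+FFA47's neighbour U+FA47 themselves, the following Python sketch queries the standard `unicodedata` module. It is only an illustration of what the character data reports; the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

COMPAT_IDEOGRAPH = "\ufa47"
UNIFIED_IDEOGRAPH = "\u6f22"

# A compatibility mapping would carry a tag such as "<compat>" in this field;
# an untagged value is a canonical mapping.
decomposition_field = unicodedata.decomposition(COMPAT_IDEOGRAPH)

# Because the mapping is canonical, even NFC and NFD replace the character.
nfc_result = unicodedata.normalize("NFC", COMPAT_IDEOGRAPH)
nfd_result = unicodedata.normalize("NFD", COMPAT_IDEOGRAPH)

if __name__ == "__main__":
    print(repr(decomposition_field))
    print(nfc_result == UNIFIED_IDEOGRAPH, nfd_result == UNIFIED_IDEOGRAPH)
```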

Re: CJK Compatibility Gotchas (was: Re: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Asmus Freytag
On 11/15/2010 5:43 PM, Kenneth Whistler wrote: Perhaps someone would like to make a detailed proposal to the UTC for how to fix the text and charts?;-) Ken, having shown yourself the master of detail in your reply, I think you've appointed yourself. A round of applause for Ken! See how

Re: Application that displays CJK text in Normalization Form D

2010-11-14 Thread Asmus Freytag
On 11/14/2010 12:57 PM, Doug Ewell wrote: Jim Monty jim dot monty at yahoo dot com wrote: Japanese kana (the J in CJK) and Korean syllables (the K in CJK) both have different normalization forms. What do ideographs have to do with anything? I didn't mention ideographs; you did. The term CJK
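
The claim that kana and Hangul syllables have distinct normalization forms can be checked with a short Python sketch using the standard `unicodedata` module. The two sample characters are chosen here for illustration and do not come from the thread.

```python
import unicodedata


def nfd_code_points(text):
    """Return the code points of text after normalization to NFD."""
    return [ord(ch) for ch in unicodedata.normalize("NFD", text)]


HIRAGANA_GA = "\u304c"    # HIRAGANA LETTER GA
HANGUL_HAN = "\ud55c"     # HANGUL SYLLABLE HAN

if __name__ == "__main__":
    # GA decomposes to KA plus the combining voiced sound mark.
    print([f"U+{cp:04X}" for cp in nfd_code_points(HIRAGANA_GA)])
    # HAN decomposes to its three conjoining jamo.
    print([f"U+{cp:04X}" for cp in nfd_code_points(HANGUL_HAN)])
    # Normalizing back to NFC restores the precomposed forms.
    print(unicodedata.normalize("NFC", unicodedata.normalize("NFD", HANGUL_HAN)) == HANGUL_HAN)
```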

Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

2010-11-10 Thread Asmus Freytag
If you want to get that point across to a general audience, you could use a more colloquial term, albeit one that itself derives from mathematics. Text that can be completely expressed in ASCII fits into something (ASCII) that works as a lowest common denominator of a large number of

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Asmus Freytag
On 11/4/2010 5:46 PM, Doug Ewell wrote: Markus Scherer wrote: While processing 16-bit Unicode text which is not assumed to be well-formed UTF-16, you can treat (decode) an unpaired surrogate as a mostly-inert surrogate code point. However, you cannot unambiguously encode a surrogate code

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Asmus Freytag
On 11/5/2010 7:02 AM, Doug Ewell wrote: Asmus Freytag asmusf at ix dot netcom dot com wrote: I'm probably missing something here, but I don't agree that it's OK for a consumer of UTF-16 to accept an unpaired surrogate without throwing an error, or converting it to U+FFFD, or otherwise raising
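
One of the options mentioned above, converting unpaired surrogates to U+FFFD, can be sketched in a few lines of Python. This is a minimal illustration operating on a list of 16-bit code units; the function name `repair_surrogates` is made up here and is not the utility discussed in the thread.

```python
REPLACEMENT = 0xFFFD


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def repair_surrogates(units):
    """Return a copy of units with every unpaired surrogate replaced by U+FFFD."""
    repaired = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed pair: keep both code units.
            repaired.extend((unit, units[i + 1]))
            i += 2
        elif is_high(unit) or is_low(unit):
            # Lone high or lone low surrogate.
            repaired.append(REPLACEMENT)
            i += 1
        else:
            repaired.append(unit)
            i += 1
    return repaired


if __name__ == "__main__":
    sample = [0x0041, 0xD83D, 0xDE00, 0xD800, 0x0042, 0xDC00]
    print([f"{u:04X}" for u in repair_surrogates(sample)])
```

A reporting tool would record the index of each replaced unit instead of, or in addition to, substituting U+FFFD.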

Re: A simpler definition of the Bidi Algorithm

2010-10-17 Thread Asmus Freytag
On 10/17/2010 7:01 AM, Michael D. Adams wrote: This is something that not even the C++ and Java reference implementations do (though it appears that the C++ implementation of the W rules was originally derived from a regular expression as it uses state tables, but if so it is undocumented).

Re: A simpler definition of the Bidi Algorithm

2010-10-17 Thread Asmus Freytag
On 10/17/2010 10:59 AM, Michael D. Adams wrote: The biggest challenge was not in creating those tables, but in understanding the nuances of the rules, by the way. Two questions so I can understand better. First, by nuances do you mean the nuances of how the rules interact (which I think would

Re: [unicode] Telugu Unicode Encoding Review

2010-10-16 Thread Asmus Freytag
On 10/16/2010 10:38 AM, suzuki toshiya wrote: Hi, I've never heard any comments about the reservation of the codepoints to making the code chart structure similar among multiple script, no positive, no negative. So your comment is interesting. Could you tell me more about what kind of

Re: statistics

2010-10-12 Thread Asmus Freytag
On 10/11/2010 9:49 PM, Janusz S. Bień wrote: On Mon, 11 Oct 2010 announceme...@unicode.org wrote: The newly finalized Unicode Version 6.0 adds 2,088 characters, What is the current total? Are other statistic informations available somewhere? The announcement gives a link to click

Re: Irrational numeric values in TUS

2010-10-12 Thread Asmus Freytag
Ken, some comments, and a few suggestions near the end. On 10/12/2010 4:56 PM, Kenneth Whistler wrote: Karl Williamson asked: The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? No.

Re: 00B7 vs. 2027

2010-09-18 Thread Asmus Freytag
On 9/18/2010 8:36 AM, abysta wrote: Hello. I need a dot to separate words into syllables. What should I use, 00B7 or 2027, and why? 2027 is explicitly intended to be used to show syllables as is done in dictionaries. You don't make it explicit in your query, but it sounds like that is
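
A minimal Python sketch of the usage described above: joining syllables with U+2027. The sample word and the helper name `show_syllables` are illustrative only.

```python
import unicodedata

HYPHENATION_POINT = "\u2027"
MIDDLE_DOT = "\u00b7"


def show_syllables(syllables):
    """Join syllables with U+2027, as dictionaries do to mark syllable breaks."""
    return HYPHENATION_POINT.join(syllables)


if __name__ == "__main__":
    print(unicodedata.name(HYPHENATION_POINT))
    print(unicodedata.name(MIDDLE_DOT))
    print(show_syllables(["dic", "tion", "ar", "y"]))
```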

Re: 00B7 vs. 2027

2010-09-18 Thread Asmus Freytag
On 9/18/2010 10:56 AM, Lorna Priest wrote: U+00B7 MIDDLE DOT is semantically ambiguous and has (partly therefore) varying renderings, and it might be used as a replacement for U+2027 if the latter cannot be used reliably. What about using U+02D1 - half triangular colon? Why not use

Re: A simpler definition of the Bidi Algorithm

2010-09-10 Thread Asmus Freytag
The first discussions that led to the current formulation of the bidi algorithm easily go back 20 years by now. There's some value in not re-stating a specification - even if a new formulation could be found to be 100% equivalent. That value lies in the fact that any reader can tell, by

Re: Accessing alternate glyphs from plain text (from Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-06 Thread Asmus Freytag
On 8/6/2010 2:03 AM, William_J_G Overington wrote: On Thursday, 5 August 2010, Kenneth Whistler k...@sybase.com wrote: I am thinking of where a poet might specify an ending version of a glyph at the end of the last word on some lines, yet not on others, for poetic effect. I think that it

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-05 Thread Asmus Freytag
On 8/5/2010 3:47 AM, William_J_G Overington wrote: On Wednesday 4 August 2010, Asmus Freytag asm...@ix.netcom.com wrote: However, there's no need to add variation sequences to select an *ambiguous* form. Those sequences should be removed from the proposal. Are you here talking about

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-04 Thread Asmus Freytag
On 8/2/2010 5:04 PM, Karl Pentzlin wrote: I have compiled a draft proposal: Proposal to add Variation Sequences for Latin and Cyrillic letters The draft can be downloaded at: http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB). The final proposal is intended to be submitted

Re: Standard fallback characters (was: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-04 Thread Asmus Freytag
On 8/4/2010 1:30 PM, verdy_p wrote: Asmus Freytag wrote: The Fraktur problem is one where one typestyle requires additional information (e.g. when to select long s) that is not required for rendering the same text in another typestyle. If it is indeed desirable (and possible) to create

Re: Re: Standard fallback characters (was: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-04 Thread Asmus Freytag
Philippe, Text typeset in Fraktur contains more information than text typeset in Antiqua. That means, there are some places where there are some (mild) ambiguities in representation in the Antiqua version. Not enough to bother a human reader who can use deep context to read the text correctly,

Re: Plain text

2010-07-29 Thread Asmus Freytag
On 7/28/2010 9:32 PM, Doug Ewell wrote: Murray Sargent murrays at exchange dot microsoft dot com wrote: It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 2:02 AM, Kent Karlsson wrote: Den 2010-07-28 09.50, skrev Jukka K. Korpela jkorp...@cs.tut.fi: André Szabolcs Szelp wrote: Generally, for the decimal point . (U+002E FULLSTOP) and , (U+002C COMMA) is used in the SI world. However, earlier conventions could use different

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 10:09 AM, Murray Sargent wrote: Contextual rendering is getting to be more common thanks to adoption of OpenType features. For example, both MS Publisher 2010 and MS Word 2010 support various contextually dependent OpenType features at the user's discretion. The choice of glyph for

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-28 Thread Asmus Freytag
On 7/28/2010 10:13 PM, Martin J. Dürst wrote: Sequences of numeric Kanji are also used in names and word-plays, and as sequences of individual small numbers. But the same applies to our digits. A very simple example is to use them as a ruler in plain text: 1 2 3

Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Asmus Freytag
On 7/27/2010 3:02 PM, Kenneth Whistler wrote: Karl Williamson asked: Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does? They are U+2107 and U+210E respectively. Because U+210E PLANCK CONSTANT is, to quote the standard, simply a mathematical

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-26 Thread Asmus Freytag
On 7/26/2010 12:13 PM, Mark Davis ☕ wrote: I agree that having it stated at point of use is useful - and we do that in other cases covered by stability clauses; but we can only state it IF we have the corresponding stability policy. Mark, The statement in your but clause really isn't correct.

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
The short answer to Karl's question is that there will not be an absolute guarantee. The long answer is that, partly for the reasons he's mentioned, this won't be a practical problem. A. Most of the living scripts that are in wide use have been encoded, including whatever digits are in use.

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
On 7/25/2010 6:05 PM, Martin J. Dürst wrote: On 2010/07/26 4:37, Asmus Freytag wrote: PPS: a very hypothetical tough case would be a script where letters serve both as letters and as decimal place-value digits, and with modern living practice. Well, there actually is such a script, namely

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-24 Thread Asmus Freytag
On 7/24/2010 3:00 PM, Bill Poser wrote: On Sat, Jul 24, 2010 at 1:00 PM, Michael Everson ever...@evertype.com wrote: Digits can be scattered randomly about the code space and it wouldn't make any difference. Having written a library for performing conversions between Unicode strings
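
The point that digits need not sit at any particular place in the code space can be illustrated with Python's standard `unicodedata` module, which looks up the decimal value of each character individually. This is a sketch only; the function name `decimal_string_to_int` is made up and is unrelated to the library mentioned in the message.

```python
import unicodedata


def decimal_string_to_int(text):
    """Convert a string of decimal digits from any script to an int.

    Raises ValueError if the string is empty or contains a character
    that has no decimal digit value.
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        value = value * 10 + unicodedata.decimal(ch)  # raises ValueError for non-digits
    return value


if __name__ == "__main__":
    print(decimal_string_to_int("2010"))          # ASCII digits
    print(decimal_string_to_int("\u0663\u0664"))  # ARABIC-INDIC DIGIT THREE, FOUR
    print(decimal_string_to_int("\u096b"))        # DEVANAGARI DIGIT FIVE
```

The lookup is per character, so the conversion does not rely on the ten digits of a script being contiguous or in order.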

Re: charset parameter in Google Groups

2010-07-07 Thread Asmus Freytag
Andreas, I think we all realize your frustration with well-meaning software. Because tags can be wrong for no fault of the human originating the document, I fully understand that Google might want to attempt to improve the user experience in such situations. The problem is that doing so

Re: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

2010-06-28 Thread Asmus Freytag
On 6/28/2010 11:38 AM, Mark Davis ☕ wrote: The problem with slavishly following the charset parameter is that it is often incorrect. However, the charset parameter is a signal into the character detection module, so if the charset is correctly supplied from the message then the results of the

Re: Latin Script

2010-06-28 Thread Asmus Freytag
I'd like to second Mark. There is a lot of information in the Standard, including the UAXs, and the Unicode Character Database that would help answer your questions. The volunteers associated with the Unicode effort have worked hard putting all that information together - so use it, instead

Re: Generic Base Letter

2010-06-27 Thread Asmus Freytag
The one argument that I find convincing is that too many implementations seem set to disallow generic combination, relying instead on fixed tables of known/permissible combinations. In that situation, a formally adopted character with the clearly stated semantic of is expected to actually

Re: Indian Rupee Sign to be chosen today

2010-06-26 Thread Asmus Freytag
On 6/26/2010 5:41 PM, Doug Ewell wrote: Regarding the inability to distinguish 8859-15 heuristically from 8859-1, I understand the problem when there are no tags or other hints, or for cases like Windows-1252 text declared to be 8859-1, but it seems unlikely to me that there is much text

Re: Latin Script

2010-06-17 Thread Asmus Freytag
On 6/17/2010 7:24 PM, Tulasi wrote: What is equivalent ISO/IEC ISO/IEC what? There are hundreds of ISO/IEC standards, of which dozens are character encoding standards. for U+0278 LATIN SMALL LETTER PHI (ɸ)? Or do Unicode ISO/IEC use different number name for same letter/symbol?

Re: Writing a proposal for an unusual script: SignWriting

2010-06-14 Thread Asmus Freytag
On 6/14/2010 1:18 PM, Mark E. Shoulson wrote: On 06/14/2010 02:15 PM, Asmus Freytag wrote: On 6/14/2010 9:21 AM, Stephen Slevinski wrote: Plain text SignWriting should be able to write actual sign language, such as hello world. You could equally well insist that it should be possible

Re: Tamil u,uu matra consonants - Orthographic variation

2010-06-09 Thread Asmus Freytag
Can we stop double posting on the Unicode and Unicore lists? People on the Unicode list cannot reply to people on the other list, and vice versa (unless they happen to be members of both lists). Thanks. A./

Re: Questionable lines on LineBreakTest.txt

2010-06-07 Thread Asmus Freytag
On 6/7/2010 4:26 PM, Masaaki Shibata wrote: I'm studying the UAX #14 (5.2.0) and testing my code against LineBreakTest.txt. And I found some test cases on this text file seem to be contradictory to the rules on the document. For example, LB25 explicitly prohibits breaking between CP and PO,

Re: Least used parts of BMP.

2010-06-04 Thread Asmus Freytag
On 6/4/2010 8:34 AM, Mark Davis ☕ wrote: In a compression format, that doesn't matter; you can't expect random access, nor many of the other features of UTF-8. The minimal expectation for these kinds of simple compression is that when you write a string with a particular /write/ method, and

Re: Greek letter LAMDA?

2010-06-02 Thread Asmus Freytag
On 6/1/2010 6:04 PM, Mark Crispin wrote: I don't think that the unicode list should be used for the type of questions that have polluted it recently. That list unicode@unicode.org is open for general questions. It has no formal standing as far as the business of the Consortium is concerned, and

Re: Least used parts of BMP.

2010-06-02 Thread Asmus Freytag
On 6/1/2010 8:04 PM, Kannan Goundan wrote: I'm trying to come up with a compact encoding for Unicode strings for data serialization purposes. The goals are fast read/write and small size. Why not use SCSU? You get the small size and the encoder/decoder aren't that complicated. You get the

Re: Greek letter LAMDA?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 11:46 AM, Jonathan Rosenne wrote: Although this mail was not addressed to me, I did read it. Sue me. The terms of use for the Unicode mail list essentially state that these types of boilerplate are null and void as far as Unicode is concerned. You will find the following in

Re: Greek letter LAMDA?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 3:28 PM, John Dlugosz wrote: If anyone can “null and void” it, I wonder why companies bother to put such things in people’s outgoing mail. I would have thought they could come up with a proper net-etiquite version, but they just don’t care. These things are bogus, because they

Re: Least used parts of BMP.

2010-06-02 Thread Asmus Freytag
SCSU is a pass-through for ASCII, plus it handles the common mix of ASCII plus 96 local characters (Latin-1, Greek, Cyrillic, Thai, etc.) really fast. Go look at the sample code. If you take that as a starting point for optimization, I think you'll be fine.
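
To make the pass-through point concrete, here is a minimal sketch of SCSU's single-byte mode in Python. It is not the sample code referred to above and it is not a complete encoder: it assumes only the eight default dynamic windows from UTS #6, never redefines a window, and rejects anything it cannot reach that way. The function name scsu_encode_simple is made up for illustration.

```python
# Default start offsets of the eight dynamic windows (UTS #6 initial state).
# Each window covers 128 code points starting at its offset.
WINDOWS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SC0 = 0x10  # tag byte SC0 + n selects dynamic window n


def scsu_encode_simple(text):
    """Encode text in SCSU single-byte mode using only the default windows.

    Hypothetical helper for illustration: raises ValueError for any
    character that is neither plain ASCII nor inside a default window.
    """
    out = bytearray()
    active = 0  # window 0 (Latin-1 Supplement) is active at the start
    for ch in text:
        cp = ord(ch)
        # Printable ASCII plus NUL, TAB, LF and CR pass through unchanged.
        if 0x20 <= cp <= 0x7E or cp in (0x00, 0x09, 0x0A, 0x0D):
            out.append(cp)
            continue
        base = WINDOWS[active]
        if not (base <= cp < base + 0x80):
            # Switch to the first default window that contains cp.
            for n, candidate in enumerate(WINDOWS):
                if candidate <= cp < candidate + 0x80:
                    out.append(SC0 + n)
                    active = n
                    break
            else:
                raise ValueError(f"U+{cp:04X} needs a feature this sketch omits")
        # One byte in 0x80..0xFF: offset of cp within the active window.
        out.append(0x80 + cp - WINDOWS[active])
    return bytes(out)


if __name__ == "__main__":
    print(scsu_encode_simple("hello").hex(" "))
    print(scsu_encode_simple("Москва").hex(" "))
```

Under these assumptions ASCII text comes out byte-for-byte identical, and a run of Cyrillic costs one window-switch byte followed by one byte per letter. A real encoder would also need window redefinition, quoting of single characters, Unicode mode and supplementary code points.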

Re: Greek letter LAMDA?

2010-06-01 Thread Asmus Freytag
On 6/1/2010 1:37 PM, John Dlugosz wrote: Why does the code chart call the plain Greek letter (upper and lower case) “LAMDA” rather than “LAMBDA”? The latter is used in other places where a glyph is based on the lambda, e.g. “U+019B LATIN SMALL LETTER LAMBDA WITH STROKE” Names sometimes

Re: Greek letter LAMDA?

2010-06-01 Thread Asmus Freytag
On 6/1/2010 4:14 PM, Mark Crispin wrote: Is it really necessary to have this sort of pedagogical discussions on the Unicode list? Is this character name misspelled? Is Unicode a for-profit company? Who owns the Unicode font? etc. etc. Perhaps we need to have a

Re: Unicode Inc

2010-05-31 Thread Asmus Freytag
On 5/31/2010 12:33 PM, Tulasi wrote: Thanks Mark for posting the links! My posting was based on http://www.unicode.org/consortium/directors.html where in the bottom it said Unicode Inc. Looks like the elected members from consortium http://www.unicode.org/consortium/consort.html forms Unicode

Re: IS UNICODE a STANDRAD ?

2010-05-31 Thread Asmus Freytag
On 5/31/2010 2:12 PM, V. M. Kumaraswamy wrote: Hello all, Just a clarification an UNICODE. Is UNICODE a STANDRAD Yes, Unicode (The Unicode Standard), is indeed a standard. And no, the use of ALL CAPS is discouraged. The proper spelling is Unicode. that needs to be followed by all
