Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 11:17 PM 7/17/2004, John Cowan wrote: Peter Kirk scripsit: But I think the best thing to do is to drop *all* Hebrew combining marks; the result of this is valid unpointed Hebrew. I agree. OK, in my last message I was confused; this was Peter's suggestion and Jony had seconded it. I take it

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 05:28 AM 7/18/2004, Peter Kirk wrote: I can see that there might be cases when the Hebrew folding should be invoked without other scripts being affected. But I think that anyone applying a general accent or diacritic folding would expect this to include all Hebrew (and Arabic, Syriac etc)

Re: Folding algorithm and canonical equivalence

2004-07-18 Thread Asmus Freytag
At 05:25 AM 7/18/2004, Peter Kirk wrote: I accept that there might be some script-specific cases in which particular accents should not be removed. The breve in Cyrillic i kratkoe might be an example; but then this might be rather too language-specific as well. But these should be clearly

RE: Folding algorithm and canonical equivalence

2004-07-19 Thread Asmus Freytag
At 07:53 PM 7/18/2004, Jony Rosenne wrote: By this logic, I cannot see why you lump Latin/Greek/Cyrillic together. Latin/Greek/Cyrillic share the fact that for searches you may want to remove accents, but, except for very unusual circumstances, it's not a good idea to transform text permanently.
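The distinction drawn in this thread, folding marks away for searching while leaving the stored text untouched, can be sketched in a few lines of Python. This is only an illustration and not the folding specified in any Unicode report: `fold_marks` and its `protect` parameter are hypothetical names, and the rule "drop every character of General_Category Mn after canonical decomposition" is an assumed, deliberately crude one.

```python
import unicodedata


def fold_marks(text, protect=frozenset()):
    """Return a search key with nonspacing marks removed (hypothetical helper).

    Characters listed in `protect` are kept intact, which models the
    script- or language-specific exceptions raised in the thread.
    """
    out = []
    # Compose first so that protected precomposed letters can be recognised.
    for ch in unicodedata.normalize("NFC", text):
        if ch in protect:
            out.append(ch)
            continue
        # Decompose the single character and drop its nonspacing marks (Mn).
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.category(part) != "Mn":
                out.append(part)
    return "".join(out)


pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"  # pointed Hebrew word
print(fold_marks(pointed))                       # unpointed letters only
print(fold_marks("caf\u00e9"))                   # Latin accent removed
print(fold_marks("\u0439", protect={"\u0439"}))  # Cyrillic short i kept as is
```

The key is meant to be computed on the fly for matching; the original string is never overwritten, which is the point made above about not transforming text permanently.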

Re: Back to the subject: Folding algorithm and canonical equivalence

2004-07-19 Thread Asmus Freytag
At 01:56 PM 7/19/2004, Mark Davis wrote: You did point out an oversight; Asmus and I have been working on the issue. Mark As Mark wrote, your point is taken and we've taken that on board. However, we won't try to *edit* text on the list; that's why we are not engaging in a long discussion on the

Re: Processing of default ignorable code points

2004-08-05 Thread Asmus Freytag
At 11:11 AM 8/5/2004, Peter Kirk wrote: In TUS 4.0 Section 5.3, p.111, the following is stated of default ignorable code points: These characters are also ignored except with respect to specific, defined processes; for example, ZERO WIDTH NON-JOINER is ignored in collation. ... For more

Re: Microsoft Unicode Article Review

2004-08-06 Thread Asmus Freytag
At 10:04 AM 8/6/2004, Marcin 'Qrczak' Kowalczyk wrote: I don't like perpetuating the myth that Unicode is a 16-bit encoding and UCS-2 can represent all Unicode characters Neither do I. I've replied to John offline with extensive comments. He's on a reasonably tight deadline, so he probably

Re: markup on combining characters

2004-09-08 Thread Asmus Freytag
At 12:49 AM 9/8/2004, Philippe Verdy wrote: And still no decision if this invisible base character will be added or not. It's just a public review for now, Well, hold your horses for a bit here. If something's out of review, there won't be a decision until the review is over. Anything that has

RE: [BULK] - Re: markup on combining characters

2004-09-08 Thread Asmus Freytag
At 05:53 PM 9/8/2004, Mike Ayers wrote: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Asmus Freytag Sent: Wednesday, September 08, 2004 2:25 PM If something's out of review, there won't be a decision until the review is over. I'm sorry, but I can't make sense

RE: [BULK] - Re: markup on combining characters

2004-09-09 Thread Asmus Freytag
At 11:06 AM 9/9/2004, Mike Ayers wrote: how to color parts of characters, is out of scope. A given diacritic can be a part of a character's glyph, but _combining characters_ are characters, not merely part of characters. Therefore, in principle, the question of how to encode text streams that

Re: TR29 Word Break awkwardness

2004-09-14 Thread Asmus Freytag
It might be worth stepping back and asking the question: What is the purpose of publishing word-breaking behavior as part of the Unicode Standard? The answer to this question is neither easy nor obvious. Part of the problem is that what constitutes a 'word' is subject to tailoring. In certain

Re: Questions about diacritics

2004-09-15 Thread Asmus Freytag
At 05:21 PM 9/14/2004, António Martins-Tuválkin wrote: On 2004.09.14, 17:06, Jörg Knappen [EMAIL PROTECTED] wrote: My classic for this situation is the German -burg abbreviature often seen in cartography: It is -bg. with breve between b and g. Why not U+0062 U+035D U+0067? I guess that the

Unibook 4.0.1 available

2004-09-15 Thread Asmus Freytag
I've updated Unibook to version 4.0.1. The latest version reads more property files and can display some of the new 4.0.1 properties. There are new ways to combine properties. As before, you can cut and paste either the character code or the character name of a selected character to the clipboard.

Re: Unibook 4.0.1 available

2004-09-16 Thread Asmus Freytag
At 04:32 AM 9/16/2004, Marion Gunn wrote: Lovely browser. Is it possible to obtain a Mac-friendly version? mg I'm glad you like it. I've been told previously by experienced Mac users that it runs fine with Virtual PC on the Mac. A./ PS: To those of you who downloaded 4.0.1 already, the zip

Re: Unibook 4.0.1 available

2004-09-17 Thread Asmus Freytag
Just a note of thanks to many of you who have sent me useful feedback. I also found out that my update to the archive had gone awry, but am happy to point out now that http://www.unicode.org/unibook/Unibook-4.0.1.zip is finally the correct version. A./

RE: Saudi-Arabian Copyright sign

2004-09-19 Thread Asmus Freytag
At 11:37 AM 9/19/2004, D. Starner wrote: To: [EMAIL PROTECTED] Subject: RE: Saudi-Arabian Copyright sign Jorg Knappen writes: On Sun, 19 Sep 2004, Jon Hanna wrote: Looks like {U+062D, U+20DD} Yes, it does look like that. But it forms a separate entity, just like its precedents COPYRIGHT SIGN

RE: Saudi-Arabian Copyright sign

2004-09-20 Thread Asmus Freytag
At 06:09 PM 9/19/2004, D. Starner wrote: Asmus Freytag writes: Given the nature of the symbol in question, I would personally see no reason to object to encoding it - especially given the current and projected lack of availability of other alternatives. It's a simple combining character

RE: Saudi-Arabian Copyright sign

2004-09-20 Thread Asmus Freytag
At 10:32 AM 9/20/2004, you wrote: Michael Everson schrieb: I would like to see a range of samples in several publications published in several languages and more than one country. That would make a stronger case for it. I'll be able to dig out some more samples from other books published by

Re: Saudi-Arabian Copyright sign

2004-09-20 Thread Asmus Freytag
At 11:50 AM 9/20/2004, Eric Muller wrote: But the real obstacle for a generative approach is QA: if as a font vendor you want to ensure some level of quality, then it is hard to avoid human work essentially proportional to the number of base+mark *combinations* you claim to support. If you

Re: Saudi-Arabian Copyright sign

2004-09-21 Thread Asmus Freytag
At 10:55 PM 9/20/2004, Doug Ewell wrote: Jörg Knappen knappen at uni dash mainz dot de wrote: I see a precedent in Unicode to treat Copyright-like sign differently from simple encircled letters: Unicode takes precautions not to encode the same character twice. Therefore, superscript digits 2

Re: Named sequences, was: Saudi-Arabian Copyright sign

2004-09-21 Thread Asmus Freytag
At 03:58 AM 9/21/2004, Peter Kirk wrote: On 20/09/2004 19:21, Asmus Freytag wrote: ... PS for named sequences: See: http://www.unicode.org/reports/tr34 Draft Data: http://www.unicode.org/Public/4.1-Update/NamedCompositeEntities-4.1.0d4.txt (the last part of the file name may change

Re: bit notation in ISO-8859-x is wrong

2004-10-10 Thread Asmus Freytag
At 07:22 PM 10/10/2004, James Kass wrote: Most people begin counting with one, although in the recent past some computer technicians have begun to begin counting with zero. This is bound to cause problems and discrepancies. Not counting from zero leads to weird situations at times, such as the

Re: outside decomposed, inside precomposed

2004-10-13 Thread Asmus Freytag
At 01:42 PM 10/13/2004, Eric Muller wrote: It has interesting consequences: e.g. U+2FA1C is canonically equivalent to U+9F3B, so the BMP is not closed under canonical equivalence, so no conformant system could make its repertoire exactly the BMP. We should have thought of that sooner - what a
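The equivalence quoted above can be checked with Python's standard `unicodedata` module. This is a small sketch; `canonically_equivalent` is a hypothetical helper name, and the result depends on the Unicode data shipped with the interpreter.

```python
import unicodedata

compat = "\U0002FA1C"  # CJK compatibility ideograph, outside the BMP
unified = "\u9F3B"     # the BMP ideograph it decomposes to


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(canonically_equivalent(compat, unified))
# Normalizing the supplementary character lands inside the BMP:
print(hex(ord(unicodedata.normalize("NFC", compat))))
```

A repertoire limited to the BMP would therefore contain U+9F3B while excluding a string that is canonically equivalent to it.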

Re: Sample of german -burg abbreviature

2004-09-30 Thread Asmus Freytag
Otto Stolz wrote: As has been said before, in this thread (by Jörg Knappen, IIRC), the little bow in the -burg abbreviation stems from the u stripped together with the r. In German handwriting it used to be common to place a mark above the letter 'u', to distinguish it from 'n'. When I first saw

Re: Sample of german -burg abbreviature

2004-09-30 Thread Asmus Freytag
At 06:04 PM 9/30/2004, Michael Everson wrote: see no reason given for us not to unify the handwritten symbol we have seen with BREVE ABOVE. In the environment described, apparently bg is taken as an abbreviation for berg, and b˜g (with breve) is being used as an abbreviation for burg. The

test

2004-11-01 Thread Asmus Freytag
please ignore. A./

Re: basic-hebrew RtL-space ?

2004-11-01 Thread Asmus Freytag
At 09:48 PM 11/1/2004, Doug Ewell wrote: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Visual entry should never be used. It was used for some legacy encodings to render text on devices that don't implement the Bidi algorithm and can only render text as LTR. Nobody enters RTL text

Re: not font designers?

2004-11-07 Thread Asmus Freytag
At 08:29 PM 11/6/2004, Doug Ewell wrote: You've received nine public responses: one from a genuine font designer, two (or more, depending on interpretation) from people who have designed fonts at some time but don't identify themselves as font designers, and the remainder from people who

Re: The aim of Unicode

2004-11-08 Thread Asmus Freytag
At 05:14 PM 11/8/2004, Michael Everson wrote: At 00:47 +0000 2004-11-09, Peter Kirk wrote: The aim of Unicode standardisation is surely to define a single and unambiguous representation of text. Not at all, not in the least bit. It's to provide encoding for the world's writing systems. It is

Re: Opinions on this Java URL?

2004-11-14 Thread Asmus Freytag
At 10:21 AM 11/14/2004, Doug Ewell wrote: Throughout all of this, I had completely missed the fact that the Tech Note for CESU-8 had been upgraded to a Tech Report, two and a half years ago, in fact. Perhaps I was in denial. Anyway, that ... invalidates many of my comments... Noted. CESU-8 is

Re: Opinions on this Java URL?

2004-11-14 Thread Asmus Freytag
At 10:01 PM 11/14/2004, Doug Ewell wrote: Asmus Freytag asmusf at ix dot netcom dot com wrote: There are some UTF-8/UTF-16 interoperability aspects that are addressed by CESU-8. These concerns are real, and affect multi- component architectures that must interchange data across component
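For readers unfamiliar with the scheme under discussion: CESU-8 represents a supplementary character as its two UTF-16 surrogates, each written as a three-byte sequence, whereas UTF-8 uses a single four-byte sequence. The sketch below illustrates that difference; `cesu8_encode` is a hypothetical helper written for this note, not an API from Java or any library.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (illustrative sketch only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair ...
            cp -= 0x10000
            pair = (0xD800 | (cp >> 10), 0xDC00 | (cp & 0x3FF))
            # ... and encode each surrogate as a three-byte sequence.
            for unit in pair:
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


sample = "\U00010400"
print(sample.encode("utf-8").hex())  # four bytes in UTF-8
print(cesu8_encode(sample).hex())    # six bytes in CESU-8
```

For text restricted to the BMP the two encodings produce identical bytes; they differ only for supplementary characters.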

Re: Opinions on this Java URL?

2004-11-16 Thread Asmus Freytag
At 01:45 PM 11/15/2004, Philippe Verdy wrote: Deprecated does not mean that it is not used. This interface remains accessible when working with internal class file format. I don't understand however why the storage format of the string constants pool was not changed when the class format was

Re: Unicode HTML, download

2004-11-21 Thread Asmus Freytag
At 11:10 AM 11/21/2004, Doug Ewell wrote: Actually, of course, the only way to *guarantee* that readers will see the right glyphs is to chuck HTML altogether and create a PDF file. And that's a task that needs to be approached with some care as well. The UTC and WG2 constantly get PDF documents

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Asmus Freytag
At 04:36 AM 11/24/2004, Peter Kirk wrote: I understand that the proposed INVISIBLE CHARACTER was rejected at the recent UTC meeting. I presume that the intention is that NBSP should be used instead. At the moment, NBSP is the only sanctioned base character without 'ink'. There are cases of words

Re: My Querry

2004-11-24 Thread Asmus Freytag
At 04:23 PM 11/23/2004, Chris Jacobs wrote: Now, this implies that UTF-8 does interpret U+0000 as an ASCII NULL control char. This is incompatible with using it as a string terminator. Except that it's up to you how to interpret the C0 control codes in Unicode. You can do it according to ISO 6429
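The tension described here is easy to reproduce: UTF-8 encodes the NUL character as a single zero byte, so any reader that treats a zero byte as end-of-string sees a shorter text. A minimal sketch, in which the split on the first zero byte merely imitates a C-style reader:

```python
data = "a\u0000b".encode("utf-8")
print(data)  # the NUL is an ordinary one-byte character in UTF-8

# Imitate a reader that stops at the first zero byte.
c_view = data.split(b"\x00", 1)[0]
print(c_view)
```

Whether the zero byte is data or a terminator is a decision of the protocol or API carrying the bytes, not of the encoding form.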

Re: No Invisible Character - NBSP at the start of a word

2004-11-24 Thread Asmus Freytag
At 04:53 PM 11/24/2004, Peter Kirk wrote: On 24/11/2004 22:23, Peter Kirk wrote: On 24/11/2004 22:00, Asmus Freytag wrote: ... The sequence SPACE NBSP *does* not allow a break after the SPACE under the line breaking rules we publish in UAX#14. I tried to change does not into *does* and missed

Re: Misuse of 8th bit [Was: My Querry]

2004-11-26 Thread Asmus Freytag
The fact is, once you dedicate the top bits in a pipe to some purposes, you've narrowed the width of the pipe. That's what happened to those systems that implemented a 7-bit pipe for ASCII by using the top bit for other purposes. And everybody seems to agree that when you serialize such an
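A short sketch of what a 7-bit pipe does to 8-bit data. The helper name `strip_top_bit` is hypothetical and stands in for any channel that reuses or discards the top bit of each byte.

```python
def strip_top_bit(data):
    """Simulate a channel that clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


original = "\u00e9".encode("utf-8")  # two bytes, both with the top bit set
damaged = strip_top_bit(original)
print(original, damaged)

# Pure ASCII survives because it never uses the top bit.
print(strip_top_bit(b"plain ASCII"))
```

The damaged bytes are still valid ASCII, so nothing downstream can tell that information was lost.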

Re: No Invisible Character - NBSP at the start of a word

2004-11-27 Thread Asmus Freytag
At 04:23 PM 11/26/2004, Peter Kirk wrote: As I understand it (and I asked for confirmation of this but have not received it), according to the current version of UAX #14 there is no break opportunity between SPACE and NBSP, because rule LB11b precedes rule LB12, although there is a note Many

Re: CGJ , RLM

2004-11-27 Thread Asmus Freytag
At 11:13 AM 11/26/2004, Philippe Verdy wrote: Note however that the ZWJ prohibits breaking, although in French there's a possible hyphenation at the first occurrence, where it is also a syllable break, but not at the second occurrence, which falls in the middle of the second syllable. None of the

Re: Relationship between Unicode and 10646

2004-11-27 Thread Asmus Freytag
At 01:26 PM 11/27/2004, Philippe Verdy wrote: But it's true that the United States have delegated several times their official international representation to the Unicode Consortium, acting on behalf of the US government for some decisions or some limited domains (this is valid because Unicode

Re: Ligatures

2004-11-27 Thread Asmus Freytag
At 07:44 PM 11/27/2004, Doug Ewell wrote: The problem, as Addison pointed out, is that if you use these forms in text, most searching and sorting operations will fail to recognize them. That's not the only problem. In some languages other ligatures, such as fj might be as commonly needed as fi -

Re: No Invisible Character - NBSP at the start of a word

2004-11-27 Thread Asmus Freytag
At 04:58 PM 11/27/2004, John Hudson wrote: Mark E. Shoulson wrote: Well, that's the difference under discussion. The plain text would seem to be either the qere or the ketiv (but not the combined blended form), since each of those is somewhat sensible. Is there some place in the standard where

Re: No Invisible Character - NBSP at the start of a word

2004-11-28 Thread Asmus Freytag
At 10:10 AM 11/28/2004, Peter Kirk wrote: And I will remember not to implement the official standard whenever I come across such a note, but rather to avoid mis-applied conservatism by following everyone else in breaking the standard. I would have phrased it as: ... in following everyone else in

Re: CGJ , RLM

2004-11-29 Thread Asmus Freytag
Wachs-tube (growth tube) Not the common reading of this. However, a growth tube or growing tube might be an implement in some specialized context. But note that such compounds might also be formed with 'Wuchs-', perhaps even preferentially so. Therefore, reading 'Wachs-' as wax, as Otto

Re: Ideograph?!?

2004-11-29 Thread Asmus Freytag
At 02:14 PM 11/29/2004, Kenneth Whistler wrote: By the way, Google is your friend. If you want to get information about such things, googling for it is a good way to start. I suggest reading: http://encyclopedia.thefreedictionary.com/Chinese%20writing%20system As Richard Cook has pointed out, the

Re: Nicest UTF

2004-12-03 Thread Asmus Freytag
At 09:56 PM 12/2/2004, Doug Ewell wrote: I use ... and UTF-32 for most internal processing that I write myself. Let people say UTF-32 is wasteful if they want; I don't tend to store huge amounts of text in memory at once, so the overhead is much less important than one code unit per character.
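The trade-off mentioned here, more storage in exchange for exactly one code unit per character, can be made concrete by counting code units. A small sketch; `code_units` is a hypothetical helper name.

```python
def code_units(text):
    """Count code units needed for text in each encoding form."""
    return {
        "utf-8": len(text.encode("utf-8")),            # 8-bit units
        "utf-16": len(text.encode("utf-16-le")) // 2,  # 16-bit units
        "utf-32": len(text.encode("utf-32-le")) // 4,  # 32-bit units
    }


# One ASCII letter, one Latin-1 letter, one CJK ideograph,
# and one supplementary character: four code points in total.
sample = "a\u00e9\u4e2d\U00010400"
print(code_units(sample))
```

Only in UTF-32 does the number of code units equal the number of code points, which is what makes indexing trivial there.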

RE: No Invisible Character - NBSP at the start of a word

2004-12-07 Thread Asmus Freytag
At 11:52 PM 12/6/2004, Jony Rosenne wrote: In chapter 8, regarding Hebrew, the standard says: Positioning. Marks may combine with vowels and other points, and there are complex typographic rules for positioning these combinations. I understand that this sentence should be regarded as being

Re: [hebrew] Re: proposals I wrote (and also, didn't write)

2004-12-07 Thread Asmus Freytag
At 09:50 PM 12/6/2004, John Hudson wrote: I don't know. I try to avoid politics, if possible. The significance of what I'm saying is that you have made a good start in your proposal, that it has some shortcomings, and that I hope to be able to help put something more complete together. It

Re: US-ASCII (was: Re: Invalid UTF-8 sequences)

2004-12-10 Thread Asmus Freytag
At 12:50 PM 12/10/2004, Kenneth Whistler wrote: Tim Greenwood asked: ... a perfectly normal linguistic process of attributive disambiguation of a term which had grown ambiguous in usage. Is that like the 'Please RSVP' that I see all too often? Or should that not be excused? *grins* Well,

Re: Danda disunification (was Re: New Public Review Issue posted)

2004-12-23 Thread Asmus Freytag
At 04:32 PM 12/23/2004, James Kass wrote: Public Review Issue # 59 concerning danda and double danda doesn't mention the Limbu script specifically. The double danda, at least, is used in the Limbu script. See the exhibit on page 12 of N2410.PDF. It's also listed in the Limbu punctuation shown on

Re: Unicode Inc

2010-05-31 Thread Asmus Freytag
On 5/31/2010 12:33 PM, Tulasi wrote: Thanks Mark for posting the links! My posting was based on http://www.unicode.org/consortium/directors.html where in the bottom it said Unicode Inc. Looks like the elected members from consortium http://www.unicode.org/consortium/consort.html forms Unicode

Re: IS UNICODE a STANDRAD ?

2010-05-31 Thread Asmus Freytag
On 5/31/2010 2:12 PM, V. M. Kumaraswamy wrote: Hello all, Just a clarification an UNICODE. Is UNICODE a STANDRAD Yes, Unicode (The Unicode Standard), is indeed a standard. And no, the use of ALL CAPS is discouraged. The proper spelling is Unicode. that needs to be followed by all

Re: Greek letter LAMDA?

2010-06-01 Thread Asmus Freytag
On 6/1/2010 1:37 PM, John Dlugosz wrote: Why does the code chart call the plain Greek letter (upper and lower case) “LAMDA” rather than “LAMBDA”? The latter is used in other places where a glyph is based on the lambda, e.g. “U+019B LATIN SMALL LETTER LAMBDA WITH STROKE” Names sometimes
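The two spellings in the question can be seen directly in the character names exposed by Python's `unicodedata` module. A minimal check, assuming the interpreter ships a reasonably recent Unicode Character Database:

```python
import unicodedata

print(unicodedata.name("\u03bb"))  # the plain Greek letter
print(unicodedata.name("\u019b"))  # the Latin letter derived from it

# Lookup by name has to use the spelling that was actually standardized.
print(unicodedata.lookup("GREEK SMALL LETTER LAMDA") == "\u03bb")
```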

Re: Greek letter LAMDA?

2010-06-01 Thread Asmus Freytag
On 6/1/2010 4:14 PM, Mark Crispin wrote: Is it really necessary to have this sort of pedagogical discussions on the Unicode list? Is this character name misspelled? Is Unicode a for-profit company? Who owns the Unicode font? etc. etc. Perhaps we need to have a

Re: Greek letter LAMDA?

2010-06-02 Thread Asmus Freytag
On 6/1/2010 6:04 PM, Mark Crispin wrote: I don't think that the unicode list should be used for the type of questions that have polluted it recently. That list unicode@unicode.org is open for general questions. It has no formal standing as far as the business of the Consortium is concerned, and

Re: Least used parts of BMP.

2010-06-02 Thread Asmus Freytag
On 6/1/2010 8:04 PM, Kannan Goundan wrote: I'm trying to come up with a compact encoding for Unicode strings for data serialization purposes. The goals are fast read/write and small size. Why not use SCSU? You get the small size and the encoder/decoder aren't that complicated. You get the

Re: Greek letter LAMDA?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 11:46 AM, Jonathan Rosenne wrote: Although this mail was not addressed to me, I did read it. Sue me. The terms of use for the Unicode mail list essentially state that these types of boilerplate are null and void as far as Unicode is concerned. You will find the following in

Re: Greek letter LAMDA?

2010-06-02 Thread Asmus Freytag
On 6/2/2010 3:28 PM, John Dlugosz wrote: If anyone can “null and void” it, I wonder why companies bother to put such things in people’s outgoing mail. I would have thought they could come up with a proper netiquette version, but they just don’t care. These things are bogus, because they

Re: Least used parts of BMP.

2010-06-02 Thread Asmus Freytag
SCSU is a pass-through for ASCII, plus it handles the common mix of ASCII plus 96 local characters (Latin-1, Greek, Cyrillic, Thai, etc) really fast. Go look at the sample code. If you take that as starting point for optimization, I think you'll be fine.
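To make the pass-through claim concrete, here is a sketch of the one case that needs no state changes at all. It assumes, as I understand the SCSU specification, that the encoder starts in single-byte mode with a default dynamic window at U+0080, so printable ASCII and Latin-1 text each take one byte per character. The function name is hypothetical, and everything that would require window switching or Unicode mode is deliberately rejected rather than implemented.

```python
# C0 controls that single-byte mode passes through unchanged;
# the remaining C0 values are used as SCSU tags.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def scsu_initial_state_encode(text):
    """Encode text that fits SCSU's initial state (illustrative subset)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in PASS_THROUGH_CONTROLS or 0x20 <= cp <= 0x7F:
            out.append(cp)  # ASCII passes through as itself
        elif 0x80 <= cp <= 0xFF:
            out.append(cp)  # 0x80 + offset into the assumed window at U+0080
        else:
            raise ValueError("needs window switching: U+%04X" % cp)
    return bytes(out)


print(scsu_initial_state_encode("plain ASCII"))
print(scsu_initial_state_encode("na\u00efve"))
```

A full encoder adds the tag bytes for selecting and defining windows; the sample code referred to above is the place to look for that part.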

Re: Least used parts of BMP.

2010-06-04 Thread Asmus Freytag
On 6/4/2010 8:34 AM, Mark Davis ☕ wrote: In a compression format, that doesn't matter; you can't expect random access, nor many of the other features of UTF-8. The minimal expectation for these kinds of simple compression is that when you write a string with a particular /write/ method, and

Re: Questionable lines on LineBreakTest.txt

2010-06-07 Thread Asmus Freytag
On 6/7/2010 4:26 PM, Masaaki Shibata wrote: I'm studying the UAX #14 (5.2.0) and testing my code against LineBreakTest.txt. And I found some test cases on this text file seem to be contradictory to the rules on the document. For example, LB25 explicitly prohibits breaking between CP and PO,

Re: Tamil u,uu matra consonants - Orthographic variation

2010-06-09 Thread Asmus Freytag
Can we stop double posting on the Unicode and Unicore lists? People on the unicode list cannot reply to people on the other list, and vice versa (unless they happen to be members of both lists). Thanks. A./

Re: Writing a proposal for an unusual script: SignWriting

2010-06-14 Thread Asmus Freytag
On 6/14/2010 1:18 PM, Mark E. Shoulson wrote: On 06/14/2010 02:15 PM, Asmus Freytag wrote: On 6/14/2010 9:21 AM, Stephen Slevinski wrote: Plain text SignWriting should be able to write actual sign language, such as hello world. You could equally well insist that it should be possible

Re: Latin Script

2010-06-17 Thread Asmus Freytag
On 6/17/2010 7:24 PM, Tulasi wrote: What is equivalent ISO/IEC ISO/IEC what? There are hundreds of ISO/IEC standards, of which dozens are character encoding standards. for U+0278 LATIN SMALL LETTER PHI (ɸ)? Or do Unicode ISO/IEC use different number name for same letter/symbol?

Re: Indian Rupee Sign to be chosen today

2010-06-26 Thread Asmus Freytag
On 6/26/2010 5:41 PM, Doug Ewell wrote: Regarding the inability to distinguish 8859-15 heuristically from 8859-1, I understand the problem when there are no tags or other hints, or for cases like Windows-1252 text declared to be 8859-1, but it seems unlikely to me that there is much text

Re: Generic Base Letter

2010-06-27 Thread Asmus Freytag
The one argument that I find convincing is that too many implementations seem set to disallow generic combination, relying instead on fixed tables of known/permissible combinations. In that situation, a formally adopted character with the clearly stated semantic of is expected to actually

Re: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

2010-06-28 Thread Asmus Freytag
On 6/28/2010 11:38 AM, Mark Davis ☕ wrote: The problem with slavishly following the charset parameter is that it is often incorrect. However, the charset parameter is a signal into the character detection module, so if the charset is correctly supplied from the message then the results of the

Re: Latin Script

2010-06-28 Thread Asmus Freytag
I'd like to second Mark. There is a lot of information in the Standard, including the UAXs, and the Unicode Character Database that would help answer your questions. The volunteers associated with the Unicode effort have worked hard putting all that information together - so use it, instead

Re: charset parameter in Google Groups

2010-07-07 Thread Asmus Freytag
Andreas, I think we all realize your frustration with well-meaning software. Because tags can be wrong for no fault of the human originating the document, I fully understand that Google might want to attempt to improve the user experience in such situations. The problem is that doing so

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-24 Thread Asmus Freytag
On 7/24/2010 3:00 PM, Bill Poser wrote: On Sat, Jul 24, 2010 at 1:00 PM, Michael Everson ever...@evertype.com wrote: Digits can be scattered randomly about the code space and it wouldn't make any difference. Having written a library for performing conversions between Unicode strings
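The claim that digit placement in the code space need not matter holds when the conversion is driven by character properties rather than by arithmetic on code points. A sketch using the decimal digit value exposed by Python's `unicodedata` module; `parse_decimal` is a hypothetical helper, not the library mentioned in the message.

```python
import unicodedata


def parse_decimal(text):
    """Convert a run of decimal digits from any script to an int."""
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-digit characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value


print(parse_decimal("42"))                  # ASCII digits
print(parse_decimal("\u0967\u0968\u0969"))  # Devanagari digits one, two, three
print(parse_decimal("\u0661\u0662"))        # Arabic-Indic digits one, two
```

Code that instead computes a value as "code point minus the script's zero" relies on each set of digits being contiguous and in order, which is the assumption the proposed stability policy is about.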

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
The short answer to Karl's question is that there will not be an absolute guarantee. The long answer is that, partly for the reasons he's mentioned, this won't be a practical problem. A. Most of the living scripts that are in wide use have been encoded, including whatever digits are in use.

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-25 Thread Asmus Freytag
On 7/25/2010 6:05 PM, Martin J. Dürst wrote: On 2010/07/26 4:37, Asmus Freytag wrote: PPS: a very hypothetical tough case would be a script where letters serve both as letters and as decimal place-value digits, and with modern living practice. Well, there actually is such a script, namely

Re: ? Reasonable to propose stability policy on numeric type = decimal

2010-07-26 Thread Asmus Freytag
On 7/26/2010 12:13 PM, Mark Davis ☕ wrote: I agree that having it stated at point of use is useful - and we do that in other cases covered by stability clauses; but we can only state it IF we have the corresponding stability policy. Mark, The statement in your but clause really isn't correct.

Re: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does?

2010-07-27 Thread Asmus Freytag
On 7/27/2010 3:02 PM, Kenneth Whistler wrote: Karl Williamson asked: Subject: Why does EULER CONSTANT not have math property and PLANCK CONSTANT does? They are U+2107 and U+210E respectively. Because U+210E PLANCK CONSTANT is, to quote the standard, simply a mathematical

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 2:02 AM, Kent Karlsson wrote: Den 2010-07-28 09.50, skrev Jukka K. Korpela jkorp...@cs.tut.fi: André Szabolcs Szelp wrote: Generally, for the decimal point . (U+002E FULLSTOP) and , (U+002C COMMA) is used in the SI world. However, earlier conventions could use different

Re: High dot/dot above punctuation?

2010-07-28 Thread Asmus Freytag
On 7/28/2010 10:09 AM, Murray Sargent wrote: Contextual rendering is getting to be more common thanks to adoption of OpenType features. For example, both MS Publisher 2010 and MS Word 2010 support various contextually dependent OpenType features at the user's discretion. The choice of glyph for

Re: Reasonable to propose stability policy on numeric type = decimal

2010-07-28 Thread Asmus Freytag
On 7/28/2010 10:13 PM, Martin J. Dürst wrote: Sequences of numeric Kanji are also used in names and word-plays, and as sequences of individual small numbers. But the same applies to our digits. A very simple example is to use them as a ruler in plain text: 1 2 3

Re: Plain text

2010-07-29 Thread Asmus Freytag
On 7/28/2010 9:32 PM, Doug Ewell wrote: Murray Sargent murrays at exchange dot microsoft dot com wrote: It's worth remembering that plain text is a format that was introduced due to the limitations of early computers. Books have always been rendered with at least some degree of rich text. And

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-04 Thread Asmus Freytag
On 8/2/2010 5:04 PM, Karl Pentzlin wrote: I have compiled a draft proposal: Proposal to add Variation Sequences for Latin and Cyrillic letters The draft can be downloaded at: http://www.pentzlin.com/Variation-Sequences-Latin-Cyrillic2.pdf (4.3 MB). The final proposal is intended to be submitted

Re: Standard fallback characters (was: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-04 Thread Asmus Freytag
On 8/4/2010 1:30 PM, verdy_p wrote: Asmus Freytag wrote: The Fraktur problem is one where one typestyle requires additional information (e.g. when to select long s) that is not required for rendering the same text in another typestyle. If it is indeed desirable (and possible) to create

Re: Re: Standard fallback characters (was: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-04 Thread Asmus Freytag
Philippe, Text typeset in Fraktur contains more information than text typeset in Antiqua. That means, there are some places where there are some (mild) ambiguities in representation in the Antiqua version. Not enough to bother a human reader who can use deep context to read the text correctly,

Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

2010-08-05 Thread Asmus Freytag
On 8/5/2010 3:47 AM, William_J_G Overington wrote: On Wednesday 4 August 2010, Asmus Freytag asm...@ix.netcom.com wrote: However, there's no need to add variation sequences to select an *ambiguous* form. Those sequences should be removed from the proposal. Are you here talking about

Re: Accessing alternate glyphs from plain text (from Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

2010-08-06 Thread Asmus Freytag
On 8/6/2010 2:03 AM, William_J_G Overington wrote: On Thursday, 5 August 2010, Kenneth Whistler k...@sybase.com wrote: I am thinking of where a poet might specify an ending version of a glyph at the end of the last word on some lines, yet not on others, for poetic effect. I think that it

Re: A simpler definition of the Bidi Algorithm

2010-09-10 Thread Asmus Freytag
The first discussions that led to the current formulation of the bidi algorithm easily go back 20 years by now. There's some value in not re-stating a specification - even if a new formulation could be found to be 100% equivalent. That value lies in the fact that any reader can tell, by

Re: 00B7 vs. 2027

2010-09-18 Thread Asmus Freytag
On 9/18/2010 8:36 AM, abysta wrote: Hello. I need a dot to separate words into syllables. What should I use, 00B7 or 2027, and why? 2027 is explicitly intended to be used to show syllables as is done in dictionaries. You don't make it explicit in your query, but it sounds like that is

Re: 00B7 vs. 2027

2010-09-18 Thread Asmus Freytag
On 9/18/2010 10:56 AM, Lorna Priest wrote: U+00B7 MIDDLE DOT is semantically ambiguous and has (partly therefore) varying renderings, and it might be used as a replacement for U+2027 if the latter cannot be used reliably. What about using U+02D1 - half triangular colon? Why not use

Re: statistics

2010-10-12 Thread Asmus Freytag
On 10/11/2010 9:49 PM, Janusz S. Bień wrote: On Mon, 11 Oct 2010 announceme...@unicode.org wrote: The newly finalized Unicode Version 6.0 adds 2,088 characters, What is the current total? Are other statistics available somewhere? The announcement gives a link to click

Re: Irrational numeric values in TUS

2010-10-12 Thread Asmus Freytag
Ken, some comments, and a few suggestions near the end. On 10/12/2010 4:56 PM, Kenneth Whistler wrote: Karl Williamson asked: The Unicode standard only gives numeric values to rational numbers. Is the reason for this merely because of the difficulty of representing irrational ones? No.

Re: [unicode] Telugu Unicode Encoding Review

2010-10-16 Thread Asmus Freytag
On 10/16/2010 10:38 AM, suzuki toshiya wrote: Hi, I've never heard any comments, positive or negative, about reserving code points to make the code chart structure similar among multiple scripts. So your comment is interesting. Could you tell me more about what kind of

Re: A simpler definition of the Bidi Algorithm

2010-10-17 Thread Asmus Freytag
On 10/17/2010 7:01 AM, Michael D. Adams wrote: This is something that not even the C++ and Java reference implementations do (though it appears that the C++ implementation of the W rules was originally derived from a regular expression as it uses state tables, but if so it is undocumented).

Re: A simpler definition of the Bidi Algorithm

2010-10-17 Thread Asmus Freytag
On 10/17/2010 10:59 AM, Michael D. Adams wrote: The biggest challenge was not in creating those tables, but in understanding the nuances of the rules, by the way. Two questions so I can understand better. First, by nuances do you mean the nuances of how the rules interact (which I think would

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Asmus Freytag
On 11/4/2010 5:46 PM, Doug Ewell wrote: Markus Scherer wrote: While processing 16-bit Unicode text which is not assumed to be well-formed UTF-16, you can treat (decode) an unpaired surrogate as a mostly-inert surrogate code point. However, you cannot unambiguously encode a surrogate code
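
As an illustration of the kind of check this thread is about, here is a minimal sketch in Python. It assumes the text is available as a list of 16-bit code units; the function names find_unpaired_surrogates and repair are hypothetical, and replacing with U+FFFD is only one of the policies mentioned in the discussion, not a recommendation from any of the posters.

```python
def find_unpaired_surrogates(units):
    """Return the indices of surrogate code units that have no partner.

    `units` is a sequence of 16-bit integers that is not assumed to be
    well-formed UTF-16.
    """
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: well-formed only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not consumed by a preceding high one.
            bad.append(i)
        i += 1
    return bad


def repair(units):
    """Replace every unpaired surrogate with U+FFFD (one possible policy)."""
    bad = set(find_unpaired_surrogates(units))
    return [0xFFFD if i in bad else u for i, u in enumerate(units)]
```

Reporting the indices separately from repairing them leaves the caller free to raise an error, substitute U+FFFD, or pass the code units through unchanged.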

Re: Utility to report and repair broken surrogate pairs in UTF-16 text

2010-11-05 Thread Asmus Freytag
On 11/5/2010 7:02 AM, Doug Ewell wrote: Asmus Freytag <asmusf at ix dot netcom dot com> wrote: I'm probably missing something here, but I don't agree that it's OK for a consumer of UTF-16 to accept an unpaired surrogate without throwing an error, or converting it to U+FFFD, or otherwise raising

Re: Is there a term for strictly-just-this-encoding-and-not-really-that-encoding?

2010-11-10 Thread Asmus Freytag
If you want to get that point across to a general audience, you could use a more colloquial term, albeit one that itself derives from mathematics. Text that can be completely expressed in ASCII fits into something (ASCII) that works as a lowest common denominator of a large number of

Re: Application that displays CJK text in Normalization Form D

2010-11-14 Thread Asmus Freytag
On 11/14/2010 12:57 PM, Doug Ewell wrote: Jim Monty <jim dot monty at yahoo dot com> wrote: Japanese kana (the J in CJK) and Korean syllables (the K in CJK) both have different normalization forms. What do ideographs have to do with anything? I didn't mention ideographs; you did. The term CJK
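
The point about kana and Hangul having distinct normalization forms can be checked with a short Python sketch using the standard unicodedata module. The helper name nfd_code_points and the three sample characters are chosen here for illustration only and do not come from the thread.

```python
import unicodedata


def nfd_code_points(text):
    """Return the code points of `text` after conversion to NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(ch):04X}" for ch in decomposed]


# Voiced kana decomposes into base kana plus combining voiced sound mark.
kana = nfd_code_points("\u304C")      # HIRAGANA LETTER GA
# A precomposed Hangul syllable decomposes into its conjoining jamo.
hangul = nfd_code_points("\uD55C")    # HANGUL SYLLABLE HAN
# An ordinary unified ideograph has no decomposition and is unchanged.
ideograph = nfd_code_points("\u6F22")
```

Only the kana and the Hangul syllable change under NFD; the unified ideograph comes back as the same single code point.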

Re: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Asmus Freytag
On 11/15/2010 2:24 PM, Kenneth Whistler wrote: FA47 is a compatibility character, and would have a compatibility mapping. Faulty syllogism. Formally correct answer but only because of something of a design flaw in Unicode. When the type of mapping was decided on, people didn't fully expect

Re: CJK Compatibility Gotchas (was: Re: Application that displays CJK text in Normalization Form D

2010-11-15 Thread Asmus Freytag
On 11/15/2010 5:43 PM, Kenneth Whistler wrote: Perhaps someone would like to make a detailed proposal to the UTC for how to fix the text and charts?;-) Ken, having shown yourself the master of detail in your reply, I think you've appointed yourself. A round of applause for Ken! See how

Re: Are Latin and Cyrillic essentially the same script?

2010-11-18 Thread Asmus Freytag
On 11/18/2010 8:04 AM, Peter Constable wrote: From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On Behalf Of André Szabolcs Szelp AFAIR the reservations of WG2 concerning the encoding of Jangalif Latin Ь/ь as a new character were not in view of Cyrillic Ь/ь, but rather in

Re: Are Latin and Cyrillic essentially the same script?

2010-11-19 Thread Asmus Freytag
On 11/18/2010 11:15 PM, Peter Constable wrote: If you'd like a precedent, here's one: Yes, I think discussion of precedents is important - it leads to the formulation of encoding principles that can then (hopefully) result in more consistency in future encoding efforts. Let me add the

Re: Are Latin and Cyrillic essentially the same script?

2010-11-22 Thread Asmus Freytag
On 11/22/2010 4:15 AM, Michael Everson wrote: It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system, Yes there are. Sorting multilingual text including Greek and IPA
