Unicode 4.0 Poster

2003-11-24 Thread Mark Davis
I remembered that I had done something with making a Unicode Poster some time ago. Dusted it off, and posted the results. Voila, every Unicode character in 4.0: http://www.macchiato.com/unicode/UnicodeChart.zip Columns: 256, Rows: 410 all unassigned rows are skipped (with double line showing

Re: Request - convert ISCII to Unicode

2003-11-21 Thread Mark Davis
Unfortunately, charset names -- including IANA names -- are in general not well-defined, in the sense that - one can access a mapping table to/from Unicode/10646 for them - that mapping table is guaranteed to represent what a vendor actually does in conversion APIs. Thus, what we base our aliases

Proposed Successor to RFC 3066 (language tags)

2003-11-19 Thread Mark Davis
Addison and I have been working on a proposed successor to RFC 3066 (language tags), which should be of interest to many people on this list. http://www.ietf.org/internet-drafts/draft-langtags-phillips-davis-01.txt Feedback is welcome. Mark Note: we submitted a PDF version at the same time,

Re: Ternary search trees for Unicode dictionaries

2003-11-18 Thread Mark Davis
- Original Message - From: Theodore H. Smith [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tue, 2003 Nov 18 08:42 Subject: Re: Ternary search trees for Unicode dictionaries Hi Mark, Your tries are nice, however they are being used for single unicode characters

Re: Ternary search trees for Unicode dictionaries

2003-11-17 Thread Mark Davis
We tend to use tries, which have very good performance characteristics. See bits of unicode on my site: www.macchiato.com. Mark __ http://www.macchiato.com - Original Message - From: Theodore H. Smith [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Mon,
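A minimal sketch of the kind of trie mentioned here, assuming a simple two-stage table keyed by code point (the thread notes these tries are used for single characters). The block size, the function names, and the choice of the General Category as the stored value are illustrative assumptions, not the ICU implementation:

```python
import unicodedata

BLOCK_SIZE = 256  # illustrative block size; an assumption, not ICU's layout


def build_two_stage(values, block_size=BLOCK_SIZE):
    """Compress a per-code-point table into (index, blocks), sharing identical blocks."""
    index, blocks, seen = [], [], {}
    for start in range(0, len(values), block_size):
        block = tuple(values[start:start + block_size])
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, cp, block_size=BLOCK_SIZE):
    """Two array reads: pick the block via the index, then the value within the block."""
    return blocks[index[cp // block_size]][cp % block_size]


# Build a small table of General Category values for the first 0x3000 code points.
LIMIT = 0x3000
categories = [unicodedata.category(chr(cp)) for cp in range(LIMIT)]
index, blocks = build_two_stage(categories)
```

Lookup cost does not depend on how many code points are stored, and identical blocks (for example long unassigned runs) are stored only once.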

Re: compatibility characters (in XML context)

2003-11-14 Thread Mark Davis
Phillipe, instead of trying to sound authoritative by making up a whole-cloth definition -- one that is completely and utterly wrong -- and thereby confuse and mislead a beginner, you should either be silent or simply point the person to the Unicode glossary:

Re: ZWJ, ZWNJ, CGJ and combination

2003-11-10 Thread Mark Davis
Message - From: Kent Karlsson [EMAIL PROTECTED] To: 'Peter Kirk' [EMAIL PROTECTED]; 'Mark Davis' [EMAIL PROTECTED] Cc: 'Unicode List' [EMAIL PROTECTED]; 'Roozbeh Pournader' [EMAIL PROTECTED] Sent: Mon, 2003 Nov 10 03:01 Subject: RE: ZWJ, ZWNJ, CGJ and combination ... I would see this use

Re: Hexadecimal digits?

2003-11-10 Thread Mark Davis
I agree -- this is pointless. The UTC has discussed this before, and I don't think there is any chance that the UTC would add either: (a) made-up hexadecimal digits that differ in shape from A-F, or (b) glyphic clones of A-F that were hexadecimal digits. Mark __

Re: ZWJ, ZWNJ, CGJ and combination

2003-11-09 Thread Mark Davis
] To: Mark Davis [EMAIL PROTECTED] Cc: Unicode List [EMAIL PROTECTED] Sent: Sun, 2003 Nov 09 09:19 Subject: Re: ZWJ, ZWNJ, CGJ and combination On 08/11/2003 17:09, Mark Davis wrote: I agree with the first part of your analysis. By the phrase requesting ligation of combining characters

Re: ZWJ, ZWNJ, CGJ and combination

2003-11-08 Thread Mark Davis
The UTC just approved a clarification of the base character definition, as follows: D13a Graphic character: a character with the General Categories of Letter (L), Combining Mark (M), Number (N), Punctuation (P), Symbol (S), or Space Separator (Zs). Graphic characters specifically exclude
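The visible part of D13a can be sketched as a category test. The function name is hypothetical, the excerpt is cut off before the exclusions are listed, and the result depends on the Unicode version bundled with the Python interpreter:

```python
import unicodedata


def is_graphic(ch):
    """True if ch is Letter, Mark, Number, Punctuation, Symbol, or Space Separator."""
    cat = unicodedata.category(ch)
    return cat[0] in "LMNPS" or cat == "Zs"
```

Under this reading, SPACE (Zs) and combining marks count as graphic, while format characters such as U+200B (Cf) and control characters (Cc) do not.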

Re: ZWJ, ZWNJ, CGJ and combination

2003-11-08 Thread Mark Davis
(followup) And for checking character properties without having to delve into the UCD data files, try the ICU Demo at: http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?ch=200B Mark __ http://www.macchiato.com - Original Message - From: Peter Kirk

Re: ZWJ, ZWNJ, CGJ and combination

2003-11-08 Thread Mark Davis
- Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Unicode List [EMAIL PROTECTED] Sent: Sat, 2003 Nov 08 16:09 Subject: Re: ZWJ, ZWNJ, CGJ and combination On 08/11/2003 15:52, Mark Davis wrote: The UTC just approved a clarification of the base

Re: [hebrew] Re: ZWJ, ZWNJ, CGJ and combination

2003-11-08 Thread Mark Davis
You are stating many things as if they were facts, when they are simply not true. You should verify them against the definitions before stating them in such a 'definitive' way. Examples: - VS1 is a combining character, and not a base character.

Re: Merging combining classes, was: New contribution N2676

2003-10-27 Thread Mark Davis
Thank you for the interesting thoughts. As I understand your suggestion, and bearing in mind that dagesh (and the rare rafe) are also consonant modifiers, you are effectively suggesting an order (already normalised): consonant dagesh rafe shin/sin-dot CGJ right-meteg CGJ vowel accent CGJ

Re: transliteration in java

2003-10-25 Thread Mark Davis
Check out ICU4J (http://oss.software.ibm.com/icu4j/). There is a demo of transliteration at http://oss.software.ibm.com/cgi-bin/icu/tr. For Cyrillic, we currently only do an ISO-based transliteration, but you can do your own custom ones. (The demo will store custom rules that people have

Re: [OT] RE: GDP by language

2003-10-23 Thread Mark Davis
I want to caution people that the chart should *not* be taken as an exact guide. The percentage of language speakers within a country, and the percent of GDP ascribable to those language speakers are all pretty fuzzy. In addition, I had excluded countries that were at or below 0.05% of world GDP,

Re: GDP by language

2003-10-22 Thread Mark Davis
__ http://www.macchiato.com - Original Message - From: Marco Cimarosti [EMAIL PROTECTED] To: 'Mark Davis' [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wed, 2003 Oct 22 02:17 Subject: RE: GDP by language Mark Davis wrote: BTW, some time ago I had

GDP by language

2003-10-21 Thread Mark Davis
BTW, some time ago I had generated a pie chart of world GDP divided up by language. Someone on this list asked for a copy, so I posted it here in case others might find it interesting: http://www.macchiato.com/economy/GDP_PPP_by_language.pdf Mark __

Re: GDP by language

2003-10-21 Thread Mark Davis
It is PPP. (You get a very different pie chart with other measures of GDP, of course). Mark __ http://www.macchiato.com - Original Message - From: Patrick Andries [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED

Re: GDP by language

2003-10-21 Thread Mark Davis
multilingual populations. Still, I think it is close enough to get a useful overall picture. Mark __ http://www.macchiato.com - Original Message - From: John Cowan [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent

Re: Klingons and their allies - Beyond 17 planes

2003-10-21 Thread Mark Davis
I can even read Mark Davis' signature - that is, it appears correctly, I'd love to know what it means! shiSyAdicchetparAjayam shiSyAt from the student icchet one should desire parAjayam defeat A teacher should wish to be defeated by his own student in scholarship I got this from

Re: GDP by language

2003-10-21 Thread Mark Davis
I don't think it is quite that simple. Look at India, for example. Mark __ http://www.macchiato.com - Original Message - From: John Cowan [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tue, 2003 Oct 21 12:36 Subject: Re

IAB positions with respect ISO royalties, CS

2003-09-29 Thread Mark Davis
With respect to the issues we raised in http://www.unicode.org/consortium/utc-positions.html, the IAB has taken the following positions: http://www.iab.org/documents/correspondance/2003-09-25-iso-cs-code.html http://www.iab.org/documents/correspondance/2003-09-23-isocodes.html

Re: [OT?] ICU training offerings anyone?

2003-08-26 Thread Mark Davis
IBM doesn't currently offer regular public ICU training. We do provide overviews of ICU at the Unicode conferences (and will do so again at the upcoming Atlanta GA (USA) meeting). If enough people at that conference are interested, we may also be able to hold an ad hoc session there. If

minor update to UAX #29

2003-08-23 Thread Mark Davis
There is a minor update to http://www.unicode.org/reports/tr29/tr29-5.html to use the new UCD property. Mark __ http://www.macchiato.com Eppur si muove

Re: Proposed Draft UTR #31 - Syntax Characters

2003-08-22 Thread Mark Davis
The purpose of the Pattern Syntax characters is *not* to list everything that is a symbol or punctuation mark. That exists independently. Think of them as operators in the engine syntax, as ? or * are used today in Perl, or as +, -, /, * could be used in math expressions. The goal is to have a

Re: Proposed Draft UTR #31 - Syntax Characters

2003-08-22 Thread Mark Davis
Technical Report issues would be fine. I think #1 is worth considering. For #2, see other message to Peter Kirk. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Marco Cimarosti [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent:

UCA 4.0.0d5

2003-08-22 Thread Mark Davis
There is a new version of the default Unicode collation element table at: http://www.unicode.org/reports/tr10/allkeys-4.0.0d5.txt with corresponding charts at http://www.unicode.org/charts/collation/beta/ Mark __ http://www.macchiato.com Eppur si muove

Re: Please reduce chatty posts to Unicode list

2003-08-21 Thread Mark Davis
Agreed. Maybe we could have an [EMAIL PROTECTED] just so that people can shift their conversations over there when they depart from discussions of Unicode. Then people can discuss conventions for the price of a metric pint of beer with hexadecimal euro number formats to their heart's content, and

Re: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-21 Thread Mark Davis
be on those groups anyway. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Matitiahu Allouche [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, August 21, 2003 00:55

Re: Proposed Draft UTR #31 - Syntax Characters

2003-08-21 Thread Mark Davis
There is one open issue I'd like to draw people's attention to: whether to have a narrow or broader approach to the whitespace in a pattern environment. The narrower definition would be: 0009..000D ; Pattern_White_Space # CHARACTER TABULATION..CARRIAGE RETURN (CR) 0020 ; Pattern_White_Space
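A sketch of a membership test covering only the two entries that survive in this excerpt (U+0009..U+000D and U+0020). The excerpt is truncated, so the full narrow definition may list more code points; the names below are hypothetical:

```python
# Only the entries visible in the excerpt; the truncated list may continue.
NARROW_VISIBLE = frozenset(range(0x0009, 0x000D + 1)) | {0x0020}


def is_pattern_white_space_visible(ch):
    """True if ch is one of the pattern whitespace code points shown in the excerpt."""
    return ord(ch) in NARROW_VISIBLE
```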

Re: Proposed Draft UTR #31 - Syntax Characters

2003-08-21 Thread Mark Davis
I suspect your distinction is a bit too subtle to be useful. Having, for example, a RLM only have an effect when adjacent to a space in a regular expression would be pretty prone to error; especially since the character would be invisible. The reason for allowing LRM and RLM is to be able to make

Re: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-21 Thread Mark Davis
, then raised to the bidi list once there is more consensus. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Matitiahu Allouche [EMAIL PROTECTED]; [EMAIL PROTECTED

Re: Proposed Draft UTR #31 - Syntax Characters

2003-08-21 Thread Mark Davis
Remember, this is *not* when using the pattern to parse, this is in constructing the pattern itself. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Ben Dougall [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL

Re: [Way OT] Beer measurements

2003-08-21 Thread Mark Davis
Could the [Way OT] discussion be moved to an egroup or other forum? Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, August 21, 2003 13:00

Re: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-19 Thread Mark Davis
or not? Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Matitiahu Allouche [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Tuesday, August 19, 2003 01:21 Subject

Re: [Way OT] Beer measurements (was: Re: Handwritten EURO sign)

2003-08-19 Thread Mark Davis
Yes, I am sick and tired of dealing with this horrible non-decimal measurement system the US has for time: the number of units per other unit varies all across the board: 60..61 : 1, 60 : 1, 24 : 1, 28..31 : 1, 12 : 1, 365..366 : 1 -- awful. At least with inches, feet, and miles, the number of feet

Re: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-19 Thread Mark Davis
://www.unicode.org/reports/tr10/tr10-10.html for more information. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Matitiahu Allouche [EMAIL PROTECTED]; [EMAIL PROTECTED

Re: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-19 Thread Mark Davis
! Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Matitiahu Allouche [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Joan Wardell

Re: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-18 Thread Mark Davis
] To: Mark Davis [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Monday, August 18, 2003 10:08 Subject: [bidi] Re: Unicode Collation Algorithm: 4.0 Update (beta) I have submitted the following text on the Unicode Reporting form. This report relates to the collation

Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-17 Thread Mark Davis
There are also beta collation charts in: http://www.unicode.org/charts/collation/beta/ Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Rick McGowan [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Friday, August 15, 2003 19:27

Re: [indic] Re: Unicode Collation Algorithm: 4.0 Update (beta)

2003-08-17 Thread Mark Davis
comments below. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Michael (michka) Kaplan [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Sunday, August 17

Re: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Mark Davis
Peter, in XML you really don't want to use attributes for any general text; there are too many restrictions on the content. For example, we never put translatable text into them. Attributes should really be treated more like sequences of symbols, with a constrained syntax. This is also not in

Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-14 Thread Mark Davis
Moreover, as I wrote before, the wording in that one paragraph in 3.0 is not clearly stated, but it is clear from a reading of the rest of the standard -- with numerous examples -- and from the UCD 3.0 properties, that space *is not* a format character, and *is* a suitable base for combining

Re: Questions on ZWNBS - for line initial holam plus alef

2003-08-14 Thread Mark Davis
Some of this seems to be in reference to an earlier contention that Text Boundaries (inc. Lines) break between the space and the non-spacing mark. I think this was attributed to Phillipe. [This may not be true: I don't actually read his email, because the information content per line falls below

Re: Assume everything on this list is ignored

2003-08-14 Thread Mark Davis
Questions [at http://www.unicode.org/faq/]. (Which I did). Now, if it is true, as Mark Davis suggests, that the Frequently Asked Questions list at http://www.unicode.org/faq/ is unrelated to this list, then: (1) This should be made clear on the consortium's web page (http://www.unicode.org

Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-14 Thread Mark Davis
Where did you get the notion that space is not a base character? And base characters include those that are not control or format characters. Space is neither one. The standard specifically states in a number of places that to exhibit a combining mark in isolation you use a space (or NBSP). Mark
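As a small illustration of the point about exhibiting a combining mark in isolation, the sketch below places a mark on SPACE (or NBSP). The helper name is hypothetical, and the category check relies on whatever Unicode version ships with the Python interpreter:

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00a0"


def isolated_mark(mark, base=SPACE):
    """Return base + mark, so the mark is shown on a space rather than on a letter."""
    if unicodedata.category(mark)[0] != "M":
        raise ValueError("expected a combining mark")
    return base + mark


acute_alone = isolated_mark("\u0301")           # SPACE + COMBINING ACUTE ACCENT
acute_on_nbsp = isolated_mark("\u0301", NBSP)   # NBSP + COMBINING ACUTE ACCENT
```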

Assume everything on this list is ignored (was Re: Newbie Question - what are all those duplicated characters FO R?)

2003-08-11 Thread Mark Davis
I repeat again. Nothing on this list has any guarantee that it will be seen by anyone in the UTC. If you want to submit a FAQ question that's great -- and I strongly encourage it. But please use: http://www.unicode.org/reporting.html to make sure it is tracked. The same goes for comments from

Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-10 Thread Mark Davis
As for oe-ligature, the French representative to WG3 (or its predecessor) said that France could live without it. Even worse; the story I heard was that the committee had planned from the start to have Œ and œ in positions D7 and F7, but that late in the process the representative from France

Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

2003-08-08 Thread Mark Davis
- Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Unicode List [EMAIL PROTECTED] Sent: Tuesday, August 05, 2003 14:50 Subject: Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...) On 05/08/2003 14:40, Mark Davis wrote: Where

Re: Questions on ZWNBS

2003-08-04 Thread Mark Davis
The ZWSP and Word Joiner (plus ZWNBSP in its discouraged usage) are targeted specifically at encouraging or avoiding *line break*. Their names may be misleading; people intending to use them for any other function should carefully read the sections of the Unicode Standard that discuss their usage.

Please use other list (was Re: Hebrew Vav Holam)

2003-08-01 Thread Mark Davis
I would remind the people interested in Hebrew issues that a list has been set up for their benefit, and recommend that they use it. Cf. Darling Unicadetti... By popular demand, considering the deluge of Biblical Hebrew issues cropping up recently on the Unicode list, I have created a new

Re: Please use other list (was Re: Hebrew Vav Holam)

2003-08-01 Thread Mark Davis
ok, np Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: John Cowan [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Peter Kirk [EMAIL PROTECTED]; Ted Hopp [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Friday, August 01

Deterministic Sort

2003-08-01 Thread Mark Davis
Various people have demonstrated a certain amount of confusion around the notion of a deterministic sort vs a deterministic comparison. This is an important issue for Unicode sorting and string comparison, so I put together some material into a tech note and passed it by the editorial committee.

Re: From [b-hebrew] Variant forms of vav with holem

2003-07-30 Thread Mark Davis
This depends on who you mean by we. It's not just you and me, Ted. If in discussions on this list a consensus is reached that this is the best way to go, then we have the top people in Unicode behind us and We should make sure that you all understand that this email list is an open discussion

Process (was Re: From [b-hebrew] Variant forms of vav with holem)

2003-07-30 Thread Mark Davis
muove - Original Message - From: Peter Kirk [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED] Cc: Ted Hopp [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, July 30, 2003 18:19 Subject: Re: From [b-hebrew] Variant forms of vav with holem On 30/07/2003 17:04, Mark Davis wrote: We

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-28 Thread Mark Davis
Changing the canonical order is not going to happen. If you want to read about the problems that that would cause, there has been plenty written about it on this list if you consult the archives. Mark __ http://www.macchiato.com Eppur si muove - Original

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-24 Thread Mark Davis
Peter, Effectively we'd be looking at some amendment to the normalization algorithms to insert CGJ in certain enumerated contexts. The standard normalization forms (NFC, NFD, NFKC, NFKD) will certainly not change in this regard. On the other hand, it would be possible to add additional

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Mark Davis
Peter, This all depends on whether the UTC approves, at the upcoming meeting in August, the proposal to extend the use of CGJ to allow for inclusion within sequences of combining marks in order to prevent reordering of those marks. Of course, it could be used right now for that purpose, in the
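The reordering effect under discussion can be sketched with Python's `unicodedata` module. The thread concerns Hebrew marks; the Latin marks used here (U+0301, combining class 230, and U+0323, combining class 220) are stand-ins chosen only because their classes differ, and the behaviour shown is that of the Unicode data bundled with the interpreter, not a statement of what the UTC decided:

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, combining class 0
ACUTE = "\u0301"      # combining class 230
DOT_BELOW = "\u0323"  # combining class 220

plain = "a" + ACUTE + DOT_BELOW
guarded = "a" + ACUTE + CGJ + DOT_BELOW

# Canonical ordering sorts adjacent marks by combining class, so the
# lower class (220) moves in front of the higher class (230).
nfd_plain = unicodedata.normalize("NFD", plain)

# A class-0 character between the marks blocks that reordering.
nfd_guarded = unicodedata.normalize("NFD", guarded)
```

The normalization forms themselves are unchanged; the CGJ simply sits between the marks so the canonical ordering step never sees them as adjacent.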

Re: Yerushala(y)im - or Biblical Hebrew

2003-07-23 Thread Mark Davis
Exactly. See http://www.unicode.org/faq/normalization.html#8, for example. (Note: the last FAQ would change if the UTC accepts the proposal for usage of CGJ.) Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Peter Kirk [EMAIL

Proposed Draft UTR #31: Identifier and Pattern Syntax

2003-07-18 Thread Mark Davis
There is a new proposed draft TR available for public comment on http://www.unicode.org/reports/tr31/. This document describes specifications for recommended defaults for the use of Unicode in the definitions of identifiers and in pattern-based syntax. It incorporates the Identifier section of

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-14 Thread Mark Davis
and Azeri, was: Accented ij ligatures) On Monday, July 14, 2003 5:34 AM, Mark Davis [EMAIL PROTECTED] wrote: ... Of course Java already includes some parts of ICU, but other things are in ICU4J are difficult now to integrate in Java, simply because IBM forgot to modularize ICU so

Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)

2003-07-13 Thread Mark Davis
... Of course Java already includes some parts of ICU, but other things are in ICU4J are difficult now to integrate in Java, simply because IBM forgot to modularize ICU so that it can be integrated slowly. Accepting ICU4J as part of the core is a big decision choice, because ICU4J is quite

Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

2003-06-26 Thread Mark Davis
Another consequence is that it separates the sequence into two combining sequences, not one. Don't know if this is a serious problem, especially since we are concerned with a limited domain with non-modern usage, but I wanted to mention it. Mark __

Re: Biblical Hebrew

2003-06-26 Thread Mark Davis
1. I agree with Ken about the current lack of precedent for Cfs before combining marks. Interestingly, we do have a proposal to do just that, in http://www.unicode.org/review/pr-9.pdf However, note that the whole purpose of putting the Cf after the Ra is to separate it from the halant, so

Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Mark Davis
Michael, that is like saying move the bloody character or remove the bloody character. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Michael Everson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, June 25, 2003

Re: Major Defect in Combining Classes of Tibetan Vowels

2003-06-25 Thread Mark Davis
this was the case Someone might misread your statement. We did not change the combining classes for Hebrew. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Michael (michka) Kaplan [EMAIL PROTECTED] To: [EMAIL PROTECTED];

Re: conformance for unicode 2.x?

2003-06-06 Thread Mark Davis
If you start on http://www.unicode.org/ and click on Start Here, you'll get to a page about the Unicode Standard. In the left-hand column, clicking on Versions of the Unicode Standard will get you to http://www.unicode.org/standard/versions/. In the left-hand column you will see the different

Re: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK

2003-06-06 Thread Mark Davis
__ http://www.macchiato.com Eppur si muove - Original Message - From: Mount, Rob (Robert F) [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, June 05, 2003 11:57 Subject: RE: Classification of U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK Thanks

Re: conformance for unicode 2.x?

2003-06-06 Thread Mark Davis
for 2.x are archived? If so, is that a good idea? Barry At 11:09 AM 6/5/2003 -0700, Mark Davis wrote: If you start on http://www.unicode.org/ and click on Start Here, you'll get to a page about the Unicode Standard. In the left-hand column, clicking on Versions of the Unicode Standard

Re: Encoding converion through JDBC

2003-06-05 Thread Mark Davis
A few items: I agree with your main point, which is that UCS-2 is, for all practical purposes, just a repertoire subset of UTF-16; the code units and bit-width are the same. Some Java classes that assume that the char arithmetic will automatically roll after 16 bits are wrong. The JVM spec
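To make the shared 16-bit code unit concrete, here is a small sketch that lists the UTF-16 code units of a string. The helper name is hypothetical; the only point is that a BMP character takes one unit while a supplementary character takes a surrogate pair:

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_units = utf16_code_units("A")                    # one 16-bit unit
supplementary_units = utf16_code_units("\U00010400")  # two units: a surrogate pair
```

Code that assumes one 16-bit unit per character handles the first case correctly and miscounts in the second.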

Re: Not snazzy (was: New Unicode Savvy Logo)

2003-05-30 Thread Mark Davis
Rick posted a message recently he intended as a personal contribution, but it may have been interpreted as an official statement. Here is some clarification of what he wrote. 1. His point about compliance and conformance was intended to indicate that using the savvy logo would only indicate that

Re: Dutch IJ, again

2003-05-27 Thread Mark Davis
Well, I don't know who told you, but WORD JOINER only affects linebreak behavior, not intercharacter spacing. Mark __ http://www.macchiato.com Eppur si muove - Original Message - From: Anto'nio Martins-Tuva'lkin [EMAIL PROTECTED] To: [EMAIL PROTECTED]

Re: javascript and unicode

2003-05-27 Thread Mark Davis
One minor correction: However, it's true that ECMAScript will allow you to create invalid Unicode strings. More precisely, ECMAScript (and other systems) will allow you to create 16-bit Unicode strings that are not UTF-16. See Section 2.7 in http://www.unicode.org/book/preview/ch02.pdf. Mark
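The distinction between a 16-bit string and well-formed UTF-16 can be sketched as a check on a sequence of code units. This is an illustrative checker with a hypothetical name, written in Python rather than ECMAScript; it flags any surrogate that is not part of a high/low pair:

```python
def is_well_formed_utf16(units):
    """True if every surrogate in units belongs to a high-then-low surrogate pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True
```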

Re: Devanagari Glottal Stop

2003-04-05 Thread Mark Davis
Can you respond back to them with the information as to the languages involved? Mark ( ) [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Michael Everson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent:

Locales vs. Language codes

2003-04-04 Thread Mark Davis
By the way, a few people have been discussing possible solutions to some of the problems with language codes (and their relation to locales), which may be of interest to some people here. The discussion has just been switched to http://www.alvestrand.no/mailman/listinfo/ietf-languages, which has

Unicode 4.0 slides and ICU slides

2003-04-02 Thread Mark Davis
Some people asked me where they could see copies of my Unicode 4.0 slides and ICU Overview slides from the Prague conference. I posted them on my website, at http://www.macchiato.com/. Once Steven gets back, we'll post copies of the LDML slides (and the ICU Overview slides) on the ICU site. Mark

Re: Inherited-script characters

2003-03-28 Thread Mark Davis
-0799 - Original Message - From: Doug Ewell [EMAIL PROTECTED] To: Unicode Mailing List [EMAIL PROTECTED]; Mark Davis [EMAIL PROTECTED] Sent: Friday, March 28, 2003 17:31 Subject: Inherited-script characters Last December, Mark Davis indicated that a passage similar to the following would

Re: Unicode Technical Report #22

2003-03-20 Thread Mark Davis
Claude Tardif [EMAIL PROTECTED] To: Mark Davis/Cupertino/[EMAIL PROTECTED] cc: [EMAIL

Re: Re. and Rs. currency sign

2003-03-19 Thread Mark Davis
He keeps them all ;-) Mark () [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Michael Everson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, March 19, 2003 12:23 Subject: Re: Re. and Rs.

Re: Custom fonts (was: Tolkien wanta-be)

2003-03-18 Thread Mark Davis
I do say that if a webpage has U+E000 defined as banana and I have it defined as apple, that then their range U+E000-U+F8FF is a different PUA, belonging to a different extension of unicode than my range U+E000-U+F8FF It is *not* a different PUA. The PUA is defined to be simply a range of code
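Since the point is that the PUA is one fixed range whatever meaning a page assigns to it, a range test is all that is needed to recognise it. The sketch uses the U+E000-U+F8FF range quoted in the message; the names are hypothetical:

```python
import unicodedata

BMP_PUA = range(0xE000, 0xF8FF + 1)  # the range quoted in the message


def in_bmp_pua(ch):
    """True if ch lies in the BMP Private Use Area, regardless of any private meaning."""
    return ord(ch) in BMP_PUA
```

Whether a given page treats U+E000 as "banana" or "apple" is a private agreement layered on top; the test above gives the same answer for both.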

4.0.0 Beta Unicode Standard Annexes

2003-03-18 Thread Mark Davis
The following UAXes have beta drafts available for review. These documents are updated to the latest UTC decisions for Unicode 4.0.0. (#14 will have a few more changes soon to account for UTC decisions made in the last meeting.) Each document has a Modifications section that describes the latest

Re: Normalisation and Greek characters

2003-03-15 Thread Mark Davis
If you have questions as to particular normalizations, I'd suggest looking at the normalization charts on the Unicode website. Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: David J. Perry [EMAIL

Re: geometric shapes

2003-03-13 Thread Mark Davis
This might be worth writing a Technical Note to start with; see http://www.unicode.org/notes/ Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Frank da Cruz [EMAIL PROTECTED] To: Pim Blokland

Deterministic Sorting (was Re: ZWNJ Persian Collation)

2003-03-13 Thread Mark Davis
I want to point out two things. 1. UCA provides a mechanism for producing a deterministic sort (there called semi-stable). See step 3.10 (http://www.unicode.org/reports/tr10/#Step_3). 2. A deterministic sort is actually not needed very often; people confuse it with a stable sort. See
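A rough sketch of the distinction, assuming `str.casefold()` as a stand-in for a real collation key (it is not the UCA). A stable sort keeps equal-keyed items in input order, so its output depends on the input; a deterministic comparison adds a final tie-breaker so that distinct strings never compare equal:

```python
def collation_key(s):
    """Stand-in for a real collation key: ignores case only (an assumption)."""
    return s.casefold()


def deterministic_key(s):
    """Collation key plus the string itself as a final code point tie-breaker."""
    return (collation_key(s), s)


words_a = ["b", "A", "a"]
words_b = ["b", "a", "A"]

# Python's sort is stable: ties keep their input order, so results differ.
stable_a = sorted(words_a, key=collation_key)
stable_b = sorted(words_b, key=collation_key)

# With the tie-breaker, the result no longer depends on input order.
det_a = sorted(words_a, key=deterministic_key)
det_b = sorted(words_b, key=deterministic_key)
```

The tie-breaker makes every key longer, which is part of why a deterministic comparison is worth avoiding when a stable sort already gives the behaviour that is actually wanted.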

Re: Deterministic Sorting (was Re: ZWNJ Persian Collation)

2003-03-13 Thread Mark Davis
Well, maybe 3 things ;-) Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Mark Davis [EMAIL PROTECTED] To: Markus Scherer [EMAIL PROTECTED]; unicode [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent

Re: FAQ entry (was: Looking for information on the UnicodeData file)

2003-03-11 Thread Mark Davis
No. One cannot make such a black and white statement (correctly, at least). The OED does use Cæsar, for example. While most people would consider it slightly old-fashioned to use that form, it is done. Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148

Re: Encoding: Unicode Quarterly Newsletter

2003-03-10 Thread Mark Davis
In the interests of internationalization, I suppose I should point out that the weight of the Unicode 4.0 book, while 9 Lbs in the US, will be 4.1 kg everywhere else in the world. In the interests of precision: - The weight would be 9 lb anywhere on the earth. - The *mass* would be 4.1 kg,

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-03 Thread Mark Davis
- Original Message - From: Asmus Freytag [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; Kent Karlsson [EMAIL PROTECTED]; 'Michael (michka) Kaplan' [EMAIL PROTECTED] Cc: 'Yung-Fong Tang' [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Sunday, March 02, 2003 21:10 Subject: Re: UTF-8 Error Handling

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-03 Thread Mark Davis
that they can do more complex processing. Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Asmus Freytag [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; Kent Karlsson [EMAIL PROTECTED]; 'Michael

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-03 Thread Mark Davis
to. Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message - From: Asmus Freytag [EMAIL PROTECTED] To: Mark Davis [EMAIL PROTECTED]; Kent Karlsson [EMAIL PROTECTED]; 'Michael (michka) Kaplan' [EMAIL PROTECTED] Cc

Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

2003-03-02 Thread Mark Davis
I agree with Kent that it is somewhat less robust to simply remove ill-formed sequences, since it removes any indication that the data was corrupted. Either better to signal an error, or insert some other indication like a REPLACEMENT CHARACTER or SUB at that point. (And in my reading, C12a does
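The three options mentioned (silently removing, signalling an error, inserting a REPLACEMENT CHARACTER) map onto the error handlers of Python's UTF-8 decoder. The byte string is an arbitrary example containing one ill-formed byte:

```python
data = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8

# Removing the ill-formed sequence leaves no trace that the data was corrupted.
dropped = data.decode("utf-8", errors="ignore")

# Substituting U+FFFD keeps a visible marker at the point of corruption.
replaced = data.decode("utf-8", errors="replace")

# The default handler signals an error instead of producing output.
try:
    data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```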

Re: Unicode 4.0 BETA available for review

2003-02-27 Thread Mark Davis
And also RFC is FREE of charge but not Unicode standard itself. The Unicode Standard *is* free of charge; the entire text is posted on www.unicode.org. Mark [EMAIL PROTECTED] IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193 (408) 256-3148 fax: (408) 256-0799 - Original Message -

Re: CJK Unified Ideographs Range

2003-02-20 Thread Mark Davis
We should remember that blocks do not necessarily contain a consistent set of characters. See http://www.unicode.org/reports/tr18/#Character_Blocks. If we really need space for characters, then we can allocate them in 'related' blocks. (We also do not guarantee that block boundaries are

Re: LATIN LETTER N WITH DIAERESIS?

2003-01-28 Thread Mark Davis
I have a chart at http://www.macchiato.com/unicode/composition_chart.html that makes it pretty easy to find all those odd precomposed characters. Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: Curtis Clark [EMAIL

Pinyin Readings (was Re: CJK fonts)

2002-12-11 Thread Mark Davis
John, we've communicated a number of errors in the pinyin readings on previous occasions. Since you said you were going to be looking at the Mandarin readings, I just dumped a complete file of what we are currently using so that you can look at it. (Since it is rather large for email, I stored it

Re: Script of U+0951 .. U+0954

2002-12-09 Thread Mark Davis
] To: Unicode Mailing List [EMAIL PROTECTED] Cc: Mark Davis [EMAIL PROTECTED] Sent: Saturday, December 07, 2002 09:15 Subject: Re: Script of U+0951 .. U+0954 There were some errors in my suggested update to Scripts.txt. A correction has been posted. Sorry about that. Mark Davis mark dot davis at jtcsv

Re: [OT] HAIKU computer talk

2002-12-05 Thread Mark Davis
Those are fun -- if you like them I have a link to a site full on www.macchiato.com (I also put up Hu's on First, if you haven't seen it). Mark __ http://www.macchiato.com ► “Eppur si muove” ◄ - Original Message - From: [EMAIL PROTECTED] To: [EMAIL

Re: Script of U+0951 .. U+0954

2002-12-05 Thread Mark Davis
with MS people (and not only me, but also Pothana's designer), MS answered that the Unicode standard seemed to imply that these accents apply to Devanagari script only. That is incorrect; all non-spacing marks should inherit the script of their base character. We need to make this clear in

Re: Default properties for PUA characters???

2002-12-03 Thread Mark Davis
Ken is correct: the default properties are somewhat different for ideographs than for PUAs. In addition, PUAs are a special case compared to other characters; implementations are free, within very broad limits, to change the default properties associated with a PUA code point to whatever is
