Re: Case mapping errors?

2000-06-22 Thread Mark Davis
These characters are purely coded for compatibility. Unicode does not distinguish letters by the abbreviations that they happen to be used in. There is no difference in semantics between the "g" in "go" vs. the "g" in "12g", nor between the "Å" in "Århus" vs. the "Å" in "15Å", nor -- for that

English as she is spoke

2000-06-22 Thread mark . davis
, these data nevertheless have the risk always run by corruption. Mark ___ Mark Davis, IBM Center for Java Technology, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http://maps.yahoo.com/py/maps.py?Pyt=Tmapaddr=10275+N.+De+Anzacsz=95014

Re: C # character model

2000-06-28 Thread Mark Davis
Almost all international functions (upper-, lower-, titlecasing, case folding, drawing, measuring, collation, transliteration, grapheme-, word-, linebreaks, etc.) should take *strings* in the API, NOT single code-points. Single code-point APIs almost always malfunction once you get outside of
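
A small illustration of the point about string-based APIs, as a sketch only. Python's built-in `str.upper` stands in here for a string-in, string-out case mapping; the helper `upper_per_code_point` is hypothetical and just imitates what a code-point-to-code-point API would be forced to do.

```python
# Sketch: why case mapping wants strings, not single code points.
# str.upper() is string-based; upper_per_code_point() is a made-up helper
# that mimics an API limited to one code point in, one code point out.

def upper_per_code_point(text: str) -> str:
    out = []
    for ch in text:
        mapped = ch.upper()
        # A single-code-point API cannot return two characters, so the
        # best it can do is hand back the input unchanged.
        out.append(mapped if len(mapped) == 1 else ch)
    return "".join(out)

word = "stra\u00dfe"                      # German "strasse" written with sharp s
string_result = word.upper()              # full mapping, result grows by one
per_char_result = upper_per_code_point(word)

print(string_result)
print(per_char_result)
```

The string-based call can lengthen the text (sharp s becomes two letters), which a per-code-point signature has no way to express.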

Re: Mixing languages on a Web site

2000-06-30 Thread Mark Davis
This is very much like how we did the multilingual content in http://www.unicode.org/unicode/standard/WhatIsUnicode.html, which currently has English, French, German, Italian, Russian, and Arabic; with more to follow. Mark Herman Ranes wrote: [EMAIL PROTECTED] skreiv: I am mixing

Re: Plane 14 tags and SCSU

2000-07-02 Thread Mark Davis
In Asmus's defense, there are fewer recipients that will understand SCSU right now, so one needs to be a bit more careful about slinging it around. On the other hand, for anything outside of plain English, it is quite a handy mechanism for interchanging Unicode text, so it can reduce memory

Re: Any other Italians on Unicode List? (was RE: French annotated Cha

2000-07-06 Thread Mark Davis
Such lists of translations for the glossary terms in Unicode would be quite useful. If these are produced, be sure to request their addition to Useful Resources on the Unicode site. Mark Antoine Leca wrote: Patrick Andries wrote: - Original Message - From: "Marco Piovanelli"

Re: Names of planes, and request for sneak preview

2000-07-11 Thread Mark Davis
We haven't used the notion of Planes and Groups. These actually derived, as far as I can remember from early days in L2, from later-discarded mechanisms that would let you swap in planes into the BMP. Thus it was important to distinguish these levels. Planes and Groups are themselves not

Re: Names of planes, and request for sneak preview

2000-07-11 Thread Mark Davis
I ALY FND ANMs HRD2 DL WTH. WD PFR NML WDS. Michael Everson wrote: Ar 07:53 -0800 2000-07-11, scríobh John H. Jenkins: At the same time, it would be nice to have a Unicodally correct way of referring to planes 1 and 2, since there is an important boundary between them. Just use the

Unicode site redesign

2000-07-19 Thread Mark Davis
We are pleased to announce that the Unicode web site has been redesigned to improve navigation and usability. Our new look features a more accessible layout and color scheme, with related links in the side bar on most pages to help you learn about other information available on the site. Longer

Re: Unicode FAQ addendum

2000-07-20 Thread Mark Davis
Narrowing in on it, with one amendment. UTF-8 code units are 8 bits, so we can't say that. Mark Becker, Joseph wrote: | C1 says "A process shall interpret Unicode code values as 16-bit | quantities." DE I think the focus here was supposed to be on the fact that Unicode code DE values are

Re: Unicode in VFAT file system

2000-07-21 Thread Mark Davis
Unicode has changed and evolved over the years. At this point, UCS-2 is a funny beast, because it shares precisely the same encoding space as UTF-16. That is, in code units there is absolutely no difference between them. The only real difference is whether you interpret the code units in the
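
To make the "same code units, different interpretation" point concrete, here is a rough sketch. It assumes the difference in question is whether surrogate code units D800-DFFF are paired up; the function names are invented for this example.

```python
import struct

# Sketch: one sequence of 16-bit code units, counted two ways.
# Assumption: "interpreting as UTF-16" means pairing a lead surrogate
# (D800-DBFF) with a following trail surrogate (DC00-DFFF).

def utf16_code_units(text: str) -> list:
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def count_as_ucs2(units: list) -> int:
    # Every 16-bit unit is treated as one item.
    return len(units)

def count_as_utf16(units: list) -> int:
    count = 0
    i = 0
    while i < len(units):
        is_lead = 0xD800 <= units[i] <= 0xDBFF
        has_trail = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        i += 2 if (is_lead and has_trail) else 1
        count += 1
    return count

units = utf16_code_units("a\U00010123")
print([hex(u) for u in units], count_as_ucs2(units), count_as_utf16(units))
```

The stored units are identical in both readings; only the count of items they are taken to represent changes.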

Re: Signature for SCSU

2000-07-21 Thread Mark Davis
Because of its usage, ZWNBSP is extremely unlikely at the start of a file, but that doesn't mean it can't occur. A question mark is also extremely unlikely, as are many other characters. However, they can occur. Unicode doesn't forbid any sequence of characters from occurring. Stripping, say,
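
As a hedged illustration of why blind stripping changes the text: Python's `utf-8-sig` codec removes one leading U+FEFF, while the plain `utf-8` codec keeps it. This only shows the behaviour of those two codecs and is not a statement about what any SCSU signature handling should do.

```python
# Sketch: a leading U+FEFF survives plain UTF-8 decoding but is dropped
# by the signature-aware codec, so the two results differ.

ZWNBSP = "\ufeff"

leading = (ZWNBSP + "abc").encode("utf-8")
kept = leading.decode("utf-8")          # character preserved
stripped = leading.decode("utf-8-sig")  # leading U+FEFF removed

# In the middle of the text the character is preserved by both codecs.
interior = ("ab" + ZWNBSP + "c").encode("utf-8")
interior_plain = interior.decode("utf-8")
interior_sig = interior.decode("utf-8-sig")

print(len(kept), len(stripped), interior_plain == interior_sig)
```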

Re: Oracle and Surrogate Pairs

2000-07-25 Thread Mark Davis
You could define a UTF that mapped scalar values below to the same as UTF-8, and values above to a 6 byte value. It would *not* be UTF-8, but it can be well defined. If you look below D29 -- p. 46 at the first full paragraph -- you find that for round tripping, UTFs are required to map

Re: Making Unicode characters

2000-07-26 Thread Mark Davis
If you just want one or two characters, I have a chart webpage on my site (www.macchiato.com). You type in the code number and ENTER, and it presents a chart of 128 characters, with that character in green. Copy and paste, and here it is. 女 [Visible if your mailer handles UTF-8] Mark [EMAIL

Re: What a difference a glyph makes...

2000-07-26 Thread Mark Davis
Interestingly for tax forms, the fallback mapping for many Windows encodings has Lira (₤) converting up to pound (£), cf. http://oss.software.ibm.com/icu/charset/CharMaps-HTML/windows-1252-2000.html. There are some other interesting fallbacks there... Mark [EMAIL PROTECTED] wrote: Mark Davis

Re: Display problems

2000-07-29 Thread Mark Davis
m/products/jdk/1.1/docs/guide/intl/fontprop.html] on how to edit them to add new fonts. (This may take some patience: the description is not exactly straightforward.) Edward Cherlin wrote: At 6:41 AM -0800 7/25/2000, Mark Davis wrote: The issue of how to get Java to display Unicode character

press release

2000-07-30 Thread Mark Davis
BTW, saw the following press release from Peoplesoft PeopleSoft Implements Unicode link to http://checkers.peoplesoft.com/events.nsf/07dd07bae4e2a86b8825666700767bbf/f59d0dfabda3a051882569190047a690?OpenDocument

Re: Addition of remaining two Maltese Characters to Unicode

2000-08-01 Thread Mark Davis
We do not currently have a character that would serve the purpose being discussed. The functions of the ZWNBSP and ZWSP are to forbid/allow linebreak, which is orthogonal to the issue of whether two characters form a grapheme. Although graphemes shouldn't linebreak, not every pair of letters

Re: Question

2000-08-02 Thread Mark Davis
Indic support is in IBM's JDK, I believe in 1.3. Mark Vinit Bhatt wrote: Hi Addison, Thanks for really descriptive and explanatory email. It helped me a lot in grasping basics of Unicode and Internationalization. I also got good link from the site you gave me. That is :-

Re: FW: Date Controls

2000-08-18 Thread Mark Davis
Before people get either excited or dismayed by these two drafts, one should note that they are simply drafts: it is by no means assured that they will ever be approved, or used if approved. Mark Keld Jørn Simonsen wrote: On date and time formatting: The forthcoming ISO TR 14652 can

Unicode 3.0.1 beta period ending

2000-08-18 Thread Mark Davis
The Unicode 3.0.1 beta period is closing on August 25. We encourage everyone who uses the Unicode Character Database files to download and examine these files in detail. In particular, some files have recently been added to this beta as directed by the Unicode Technical Committee at its 84th

Re: CVTUTF.C bug question

2000-08-23 Thread Mark Davis
Thanks. That code does need to be fixed, once we get the time. Oliver Steinau wrote: I have a question concerning the CVTUTF.C file that is on the CD in the Unicode 3.0 book. There's a piece of code which I don't think is correct... Function ConvertUTF8toUTF16 contains the following piece

Re: IUC 17 Related Announcement

2000-09-01 Thread Mark Davis
If some noble soul volunteers to act as a sports reporter, I'm sure we can work up something. It's probably a bit much to web-cam it, but that may come in the future. Mark Otto Stolz wrote: On Thu, 31 Aug 2000 17:31:49 (GMT-0800), Sarasvati has written: As part of the reception on Thursday,

Unicode 3.0.1 Released

2000-09-01 Thread Mark Davis
Unicode 3.0.1 has been released. This version is described on http://www.unicode.org/unicode/standard/versions/Unicode3.0.1.html, and is linked from the Unicode home page. Here is a short excerpt from that page: Unicode 3.0.1 is an update version of Unicode 3.0. It does not contain

Re: Unicode 3.0.1 Released

2000-09-02 Thread Mark Davis
, and there is a note explaining them. It might not go into enough detail -- we can supply that in the FAQ: http://www.unicode.org/unicode/faq/casemap_charprop.html Mark John Cowan wrote: On Fri, 1 Sep 2000, Mark Davis wrote: Unicode 3.0.1 has been released. This version is described on http

Re: Surrogate support in *ML?

2000-09-07 Thread Mark Davis
In HTML or XML you always use the code point (e.g. UTF-32), not a series of code units (UTF-8 or UTF-16). Thus you would use: &#x10123; not &#xD800;&#xDD23; from UTF-16 nor &#xF0;&#x90;&#x84;&#xA3; from UTF-8 Mark Brendan Murray/DUB/Lotus wrote: How can one encode a surrogate character as an
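
The three spellings can be checked mechanically. The sketch below uses Python's `html.unescape` purely as a convenient stand-in for an HTML parser; the variable names are made up for the example.

```python
import html

# Sketch: a numeric character reference names the code point itself.
target = "\U00010123"

from_code_point = html.unescape("&#x10123;")

# The same character expressed as code units in each encoding form.
utf16_units = target.encode("utf-16-be").hex()
utf8_units = target.encode("utf-8").hex()

# Writing the code units as references does not produce the character.
from_utf16_units = html.unescape("&#xD800;&#xDD23;")
from_utf8_units = html.unescape("&#xF0;&#x90;&#x84;&#xA3;")

print(from_code_point == target, utf16_units, utf8_units)
```

Only the reference built from the code point round-trips; the other two yield some other text, which is the mistake the post warns against.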

[Fwd: Unicode Conversions]

2000-09-07 Thread Mark Davis
Mark Davis wrote: Hello all, I have been trying to input unicode from a browser and store it in a database. The problem is the different encodings used to represent the unicode. The input text is in the UTF-8 format. I have read on the Microsoft support site that SQL Server 7.0

Re: Unicode on a non-Unicode web page

2000-09-08 Thread Mark Davis
Take a look at the Unicode FAQ on the web, at www.unicode.org "Gary P. Grosso" wrote: Hi Unicoders, I am working on software to emit HTML in the encoding and character set of the user's choice, from SGML/XML documents which can contain any Plane 1 Unicode character. The question is what

Re: Surrogate support in *ML?

2000-09-08 Thread Mark Davis
Good point. In the past, I have used "surrogate characters" to refer to the characters encoded above FFFF, and surrogate code units to refer to the UTF-16 units D800-DFFF. However, I think that leads to confusion. Nobody has come up with a good term for all characters above FFFF. "Plane 1-16

Re: surrogate terminology

2000-09-13 Thread Mark Davis
Not all code points are assigned (or even assignable) to characters. U+xxxx is used to refer to code points, which range from 0 to 10FFFF. Of these code points, some are assigned to characters (including regular characters, control characters, format characters, and private use characters
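
A quick way to see the distinction between code points and assigned characters, sketched with Python's `unicodedata` module. The sample code points are chosen for this example and are not taken from the post.

```python
import unicodedata

# Sketch: every integer 0..10FFFF is a code point, but the general
# category shows what, if anything, is assigned there.
MAX_CODE_POINT = 0x10FFFF

def is_code_point(value: int) -> bool:
    return 0 <= value <= MAX_CODE_POINT

samples = {
    "regular letter": 0x0041,   # LATIN CAPITAL LETTER A
    "control": 0x0009,          # TAB
    "format": 0x200D,           # ZERO WIDTH JOINER
    "private use": 0xE000,
    "noncharacter": 0xFFFF,     # can never be assigned to a character
}

categories = {name: unicodedata.category(chr(cp)) for name, cp in samples.items()}
print(categories)
```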

Re: Tagging orthographic systems (was: (iso639.186) the

2000-09-15 Thread Mark Davis
I share the concern about combinatorial explosions. Look at Spanish, Arabic or English, for example: http://oss.software.ibm.com/developerworks/opensource/icu/localeexplorer/ I agree that de-*-sp1996 makes more sense. For us, the variant should go before the country only if the variant is -- in

Re: Ligatured characters

2000-09-15 Thread Mark Davis
I'd like to remind everyone to look at the latest version of the Unicode Standard, especially when looking at fine points. To cite Unicode 3.0.1 (http://www.unicode.org/unicode/standard/versions/Unicode3.0.1.html) "Section 13.2 Controlling Ligatures, page 318: the text is superseded by the

Re: [idn] nameprep forbidden characters

2000-09-17 Thread Mark Davis
I am curious why you feel so strongly that the Hebrew points should be ignored in domain names. Prima facie, it seems that there is little harm in treating them no differently from other characters. What problem would arise if the domain was ABC.COM and I could not get it by typing AB*C.COM?

Re: [idn] nameprep forbidden characters

2000-09-17 Thread Mark Davis
the last sentence. I had thought that the vowel marks were used to get the exact pronunciation. If that is not true, it may be part of my misunderstanding of the situation. Jony -Original Message- From: Mark Davis [mailto:[EMAIL PROTECTED]] Sent: Sunday, September 17, 2000 7:58 PM

Re: http://www.unicode.org/unicode/standard/standard.html

2000-09-18 Thread Mark Davis
", in TUC 3.0, p. 318. Am 2000-09-15 um 14:40 UCT hat Mark Davis geschrieben: I'd like to remind everyone to look at the latest version of the Unicode Standard, especially when looking at fine points. To cite Unicode 3.0.1 (http://www.unicode.org/unicode/standard/versions/Unicode3.0

Re: [idn] nameprep forbidden characters

2000-09-19 Thread Mark Davis
UCA (#10) already handles that. You will get a "fuzzy" compare if you mask off less important weights, and you will get a much better ordering than binary compare as well. Mark Hart, Edwin F. wrote: Is there a need for a "fuzzy" comparison where names with and without points in Hebrew? Is

Re: [idn] nameprep forbidden characters

2000-09-19 Thread Mark Davis
er scripts such as Arabic? Mark Davis replied UCA (#10) already handles that. You will get a "fuzzy" compare if you mask off less important weights, and you will get a much better ordering than binary compare as well. But then, why does the W3 Consortium want to *forbid* some Unic

Re: TATAP = TATAR

2000-09-19 Thread Mark Davis
If those can be confirmed, then the SpecialCasing file should be modified to add them. Could you verify this in time for the next UTC? Mark Cathy Wissink wrote: I believe Azeri also uses the dotless i/dotted i Turkish-style casing. Cathy -Original Message- From: Carl W. Brown

Re: New Name Registry Using Unicode

2000-10-02 Thread Mark Davis
There are a number of similarities between this XNS and IDN, so http://www.ietf.org/internet-drafts/draft-ietf-idn-nameprep-00.txt would be worth reading. On locales: using them is dangerous for matching. The only reason to add locale is if it were to make a difference which letters match. But

Re: lag time in Unicode implementations in OS, etc?

2000-10-03 Thread Mark Davis
It would be more accurate to say that it does not support all of Unicode 3.0. Just using the phrase "doesn't support 3.0" suggests that it is not compliant. A product can be compliant to a particular version of Unicode while only supporting a subset of the characters. Even compliant products

Re: lag time in Unicode implementations in OS, etc?

2000-10-03 Thread Mark Davis
If there are specific areas where the BIDI algorithm has flaws, that should be communicated to the UTC bidi subcommittee, ideally with a proposal to fix the problem. Mark - Original Message - From: "Michael (michka) Kaplan" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent:

Re: help me !!!

2000-10-03 Thread Mark Davis
Please take a look at www.unicode.org - Original Message - From: "Karambir Rohilla" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, October 03, 2000 21:17 Subject: help me !!! hello Please help me anyone waht is UTF8 UTF16 ? regard karambir singh

What is Unicode?

2000-10-04 Thread Mark Davis
Thanks to the industriousness of volunteer translators and to Magda and Julie's editorial work, we have many more translations of "What is Unicode" on www.unicode.org (all in UTF-8, of course). Check out http://www.unicode.org/unicode/standard/WhatIsUnicode.html. If you have problems displaying

Re: UTF-8 and UTF-16

2000-10-05 Thread Mark Davis
UTF-8, UTF-16, and UTF-32 all support exactly the same character repertoire. Please look at www.unicode.org, on the front page is a link to the FAQs. Mark - Original Message - From: "George Zeigler" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Thursday, October 05, 2000

Re: Correct definition for an isLatin1() function

2000-10-05 Thread Mark Davis
For the purpose specified, isLatin1 should just test for <= 0xFF. After all, one would not want to exclude TAB, CR or LF ☺ Mark - Original Message - From: "John Cowan" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Thursday, October 05, 2000 10:33 Subject: Re: Correct
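
A minimal sketch of such a test, taking the condition to be "every code point is at most 0xFF". The name `is_latin1` mirrors the thread subject; the rest is an assumption made for illustration.

```python
# Sketch: a string is representable in Latin-1 when no code point
# exceeds 0xFF. Controls such as TAB, CR and LF deliberately pass.

def is_latin1(text: str) -> bool:
    return all(ord(ch) <= 0xFF for ch in text)

print(is_latin1("caf\u00e9\t\r\n"))   # accented letter plus TAB, CR, LF
print(is_latin1("\u0100"))            # first code point past the range
```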

Re: utf-8 != latin-1

2000-10-17 Thread Mark Davis
One of the main features of XML is that it has quite strict rules about how to handle errors. The goal, I believe, is to ensure that we are not awash in malformed files that have no clear interpretation. And this is clearly an error: the acceptable code points are quite clearly stated:

Re: [Very OT] Japanese economy failing -- it's the Japanese language and

2000-10-20 Thread Mark Davis
Zumindest die Hälfte der Namen im Lande kann so oder auch so ausgesprochen werden - je nachdem, wie es der Namensträger wünscht. Much the same in America; you very often don't know how someone's last name is pronounced (or spelt): Stein = shtyn? styn? steen? - Original Message - From:

Re: Fonts that support the ORNL rendering of Tamil?

2000-10-31 Thread Mark Davis
Can someone write up a description of the proposed change, with the attendant glyphs? There is a UTC meeting next week in San Diego, so now's the time. Mark - Original Message - From: "Antoine Leca" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, October 31, 2000

Re: Normative vs Informative

2000-11-06 Thread Mark Davis
3:15 Subject: Re: Normative vs Informative Ar 00:04 -0700 2000-10-26, scríobh Mark Davis: I am leery of using normative your way unless we find strong evidence of this. Well, that's just wrong, Mark. (Sorry, it's beat-up Mark day I guess.) Ken explained Normative and Infor

Re: National Languages Support in Windows

2000-11-10 Thread Mark Davis
ICU has a list of these. If you take a look at http://oss.software.ibm.com/icu/charset/CharMaps-HTML/windows-1252-2000.html , for example, you will see some other interesting cases. Mark - Original Message - From: "Michael (michka) Kaplan" [EMAIL PROTECTED] To: "Unicode List" [EMAIL

Re: Devanagari question

2000-11-13 Thread Mark Davis
The Unicode Standard does define the rendering of such combinations, which is in the absence of any other information to stack outwards. Implementations that can't do that will either overstrike, or use some other fallback rendering. A sophisticated rendering will use positioning such as control

Re: [idn] Javascript code charts, unicode converter, show-characters

2000-11-15 Thread Mark Davis
programmatically, the program is wrong. Mark - Original Message - From: J. William Semich To: Rick H Wesson ; Mark Davis Cc: Unicore ; Unicode ; [EMAIL PROTECTED] ; w3c-i18n-ig Sent: Wednesday, November 15, 2000 09:32 Subject: Re: [idn] Javascript code charts

Re: [idn] Javascript code charts, unicode converter, show-characters

2000-11-16 Thread Mark Davis
That agrees with the results I get on http://www.macchiato.com/unicode/convert.html. Mark - Original Message - From: J. William Semich To: Mark Davis ; Rick H Wesson Cc: Unicore ; Unicode ; w3c-i18n-ig Sent: Wednesday, November 15, 2000 22:46 Subject: Re: [idn

Re: string vs. char [was Re: Java and Unicode]

2000-11-16 Thread Mark Davis
We have found that it works pretty well to have a uchar32 datatype, with uchar16 storage in strings. In ICU (C version) we use macros for efficient access; in ICU (C++) version we use method calls, and for ICU (Java version) we have a set of utility static methods (since we can't add to the Java
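
The idea of 32-bit values on the API surface with 16-bit storage underneath can be sketched as follows. This is not ICU's code; `code_point_at` is a hypothetical helper, and a Python list of integers stands in for a uchar16 array.

```python
# Sketch: storage is a sequence of 16-bit code units, but callers read
# whole code points (32-bit values). Names here are illustrative only.

def code_point_at(units: list, index: int) -> tuple:
    """Return (code_point, next_index) for the item starting at index."""
    lead = units[index]
    if 0xD800 <= lead <= 0xDBFF and index + 1 < len(units):
        trail = units[index + 1]
        if 0xDC00 <= trail <= 0xDFFF:
            cp = 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
            return cp, index + 2
    # Anything else, including an unpaired surrogate, is returned as is.
    return lead, index + 1

storage = [0x0041, 0xD800, 0xDD23]   # "A" followed by U+10123
decoded = []
i = 0
while i < len(storage):
    cp, i = code_point_at(storage, i)
    decoded.append(cp)

print([hex(cp) for cp in decoded])
```

Callers never see half of a pair, which is the practical benefit of keeping the access layer separate from the storage layout.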

Re: Greek Prosgegrammeni

2000-11-19 Thread Mark Davis
I haven't had time to read this list recently, so here is a somewhat belated response. But, even if you do so, we are left with a "wrong" canonical decomposition: 1FBC;GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI;Lt;0;L;0391 0345;;;;N;;;;1FB3; According to James' statement (which is not

Re: string vs. char [was Re: Java and Unicode]

2000-11-20 Thread Mark Davis
The UTC will be using the terms "supplementary code points", "supplementary characters" and "supplementary planes". The term it is "deprecating with extreme prejudice" is "surrogate characters". See http://www.unicode.org/glossary/ for more information. Mark - Original Message - From:

Re: Unicode Case Mappings UTR #21

2000-11-29 Thread Mark Davis
These are good points. TR 21 deliberately does not specify the language conventions for using titlecase, which as you note will change the effect of its use (see http://www.unicode.org/unicode/reports/tr21/#TitlecaseCaveats). Most products will have some smarts, but also leave it up to the user

UTF-8 Corrigendum, new Glossary

2000-11-29 Thread Mark Davis
We would like to call two items to people's attention. 1. The Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, and clarified some of the conformance clauses. For more information, see
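
For readers who want to see what a non-shortest form looks like, here is a small sketch. It relies on Python's strict UTF-8 decoder, which rejects such sequences; `is_well_formed_utf8` is a name invented for the example.

```python
# Sketch: "/" (U+002F) has exactly one well-formed UTF-8 encoding.
# The longer byte sequences below carry the same bits but are not the
# shortest form, and a conformant decoder must not accept them.

def is_well_formed_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

shortest = b"\x2f"            # the one valid encoding of U+002F
overlong_2 = b"\xc0\xaf"      # same value padded to two bytes
overlong_3 = b"\xe0\x80\xaf"  # same value padded to three bytes

for candidate in (shortest, overlong_2, overlong_3):
    print(candidate.hex(), is_well_formed_utf8(candidate))
```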

Re: UTF-8 Corrigendum, new Glossary

2000-11-30 Thread Mark Davis
- From: "G. Adam Stanislav" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Wednesday, November 29, 2000 22:42 Subject: Re: UTF-8 Corrigendum, new Glossary At 21:08 29-11-2000 -0800, Mark Davis wrote: 1. The Unicode Technical Committee has modified the d

Re: sequences and stuff

2000-11-30 Thread Mark Davis
The soft hyphen is not sufficient, since in other languages the case where two letters must be distinguished in collation may not fall on a syllable boundary, or allow hyphenation between them. The UTC looked at all the possible existing boundary-control characters; none of them really work for

Re: display problems on browser

2000-12-01 Thread Mark Davis
Have you tried looking at the Unicode home page, at "Display Problems", or the FAQ "Unicode on the Web"? - Original Message - From: "sreekant" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Thursday, November 30, 2000 22:27 Subject: display problems on browser hi, I am

Re: Transcriptions of Unicode

2000-12-01 Thread Mark Davis
OTECTED] Cc: "Unicode List" [EMAIL PROTECTED] Sent: Friday, December 01, 2000 2:30 PM Subject: Re: Transcriptions of "Unicode" Sad to report, my browser (Netscape 4.7) shows the Yiddish as Daw-key-nu-ye (It's left to right not rtl...) I am using the Monotype Andal

Re: Transcriptions of Unicode

2000-12-02 Thread Mark Davis
"Unicode List" [EMAIL PROTECTED] Sent: Friday, December 01, 2000 22:46 Subject: Re: Transcriptions of "Unicode" Cool. Now if you also add LANG attributes, Mozilla/Netscape 6 will use the fonts that have been set up for those languages. E.g.: span lang="ja" title="

Re: Transcriptions of Unicode

2000-12-02 Thread Mark Davis
ill use the fonts that have been set up for those languages. E.g.: span lang="ja" title="Japanese".../span Erik Mark Davis wrote: Done. From: "Michael (michka) Kaplan" [EMAIL PROTECTED] I would suggest adding a span title="{insert lang name

Re: Transcriptions of Unicode

2000-12-04 Thread Mark Davis
=== Globalization Engineering Consulting Services On Sat, 2 Dec 2000, Mark Davis wrote: Won't Mozilla pick fonts based on character code? The only ones in the list that couldn't be deduced from that would be the Yiddish and the Chinese. Mark - Original Message --

TR22

2000-12-04 Thread Mark Davis
As per the instructions of the Unicode Technical Committee, TR#22: Character Mapping Markup Language (CharMapML) has been advanced from draft TR to full TR. See http://www.unicode.org/unicode/reports/tr22/ for more information. Note: The UTC intends to continue development of this TR to also

Re: displaying Unicode text (was Re: Transcriptions of Unicode)

2000-12-07 Thread Mark Davis
isplaying Unicode text (was Re: Transcriptions of "Unicode") Mark Davis wrote: Let's take an example. - The page is UTF-8. - It contains a mixture of German, dingbats and Hindi text. - My locale is de_DE. From your description, it sounds like Mozilla works as follows: - The local

Re: Transcriptions of Unicode

2000-12-14 Thread Mark Davis
ember 12, 2000 09:01 Subject: Re: Transcriptions of Unicode Ar 07:11 -0800 2000-12-12, scríobh Mark Davis: ARMENIAN BULGARIAN CHEROKEE ETHIOPIC GREEK GUJARATI GURMUKHI INUKTITUT OGHAM RUNIC RUSSIAN SINHALA UCAS See http://www.egt.ie/standards/iso10646/pdf/junikod.pdf Michael Everson ** E

Re: Transcriptions of Unicode

2000-12-14 Thread Mark Davis
That matches what I have on http://www.macchiato.com/unicode/Unicode_transcriptions.html, right? (circle?) Mark - Original Message - From: "Michael (michka) Kaplan" [EMAIL PROTECTED] To: "Mark Davis" [EMAIL PROTECTED]; "Unicode List" [EMAIL PROTECTED] Sen

Re: (SC22WG20.3292) 14651 draft table updated

2001-01-02 Thread Mark Davis
rom Maurice Bauhahn, but have some outstanding questions that need to be resolved before attempting to roll the results into the table. The resolution of Khmer sorting should also shed some light on what to do with Myanmar, which shares a number of structural similarities with Khmer. Some s

Re: GBK, HZ and EUC-TW

2001-01-08 Thread Mark Davis
In specific cases you may use one character conversion mapping instead of two, but you should be very careful about that. See http://www.unicode.org/unicode/reports/tr22/, especially "1.2.1 Best-Fit Mappings" Mark - Original Message - From: "Lars Marius Garshol" [EMAIL PROTECTED] To:

Re: Reverse Bidi Algorithm

2001-01-08 Thread Mark Davis
ICU offers a reverse BIDI algorithm. (http://oss.software.ibm.com/icu/) Mark - Original Message - From: "Roozbeh Pournader" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Cc: "Behdad Esfahbod" [EMAIL PROTECTED] Sent: Monday, January 08, 2001 20:12 Subject: Reverse Bidi Algorithm

Re: Transcriptions of Unicode

2001-01-12 Thread Mark Davis
nal Message - From: "Marco Cimarosti" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Friday, January 12, 2001 03:11 Subject: Re: Transcriptions of "Unicode" Hallo everybody! I don't fully agree with Mark Davis' API transcription of "Unic

Re: Transcriptions of Unicode

2001-01-12 Thread Mark Davis
Thanks for your detailed note; I'll have to think it over. ... But there's another inconsistency in the transcription: the vowels in the first ("u-") and third ("-code") syllable are both phonemically long. Either you put the length mark on both (recommended for *phonetic* transcription), or

Re: UNICODE application on IBM Mainframe

2001-01-17 Thread Mark Davis
Unicode is always serialized in a UTF: UTF-8, UTF-16*, or UTF-32*. The definition of each of these is invariant across systems: in UTF-8 an 'a' is always stored as 0x61. There is a special UTF for use on EBCDIC systems. Check out the technical reports and FAQs on www.unicode.org. Mark -
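
The invariance is easy to check. In the sketch below, the codec names with explicit byte order are Python's; reading the asterisks in the post as "big-endian and little-endian variants" is an assumption.

```python
# Sketch: the letter 'a' serialized in each encoding form. The bytes
# depend only on the form and byte order, never on the host system.

encodings = ["utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"]
serialized = {name: "a".encode(name).hex() for name in encodings}

for name, hexbytes in serialized.items():
    print(name, hexbytes)
```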

Re: A real bug in bidi

2001-01-17 Thread Mark Davis
Yes, I have already proposed an agenda item for the next UTC, to get this fix into 3.1. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http://maps.yahoo.com/py/maps.py?Pyt=Tmapaddr=10275+N.+De+Anzacsz=95014 Roozbeh Pournader [EMAIL

Re: PDUTR #27: Unicode 3.1

2001-01-22 Thread Mark Davis
BTW, we have settled on a term for characters with code points above FFFF. See http://www.unicode.org/glossary/#supplementary_character http://www.unicode.org/glossary/#supplementary_code_point Mark - Original Message - From: "David Starner" [EMAIL PROTECTED] To: "Unicode List"

Re: Time Intervals

2001-01-22 Thread Mark Davis
This appears to have bounced the first time I sent it. - Original Message - From: "Mark Davis" [EMAIL PROTECTED] To: "Unicore" [EMAIL PROTECTED]; "Unicode" [EMAIL PROTECTED] Sent: Monday, January 22, 2001 08:04 Subject: Time Intervals After a reque

Re: Unicode 3.1: IDS and ZW(N)J

2001-01-24 Thread Mark Davis
It doesn't add any value to insert joiners. Just add the IDS itself to the font table. Mark - Original Message - From: "John Cowan" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Wednesday, January 24, 2001 11:21 Subject: Re: Unicode 3.1: IDS and ZW(N)J John Jenkins

Re: Benefits of Unicode

2001-01-29 Thread Mark Davis
Title: Unicode Benefits Allows for multilingual documents using any or all the languages you desire. Invoice or ticketing applications can print native language names. *"multilingual documents" are rare -- as most people understand the term 'documents'. What more people care about is that

Re: Unicode 3.1: UTF-8

2001-02-01 Thread Mark Davis
This is not an omission. This issue was debated at great length in the Unicode technical committee, and the precise wording was agreed to by the committee. Mark - Original Message - From: "John Cowan" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Wednesday, January 31,

Re: Property error for U+2118?

2001-02-01 Thread Mark Davis
John, It's interesting how we find ways to get around rules that bother us This is a misrepresentation. The symbol was always intended to be the Weierstrass elliptic function. It was misnamed, and is thus annotated with the correct information. Nobody is winking. ... If I had read the

Re: Time Intervals

2001-02-01 Thread Mark Davis
te format conversion routine and noticed that ICU has no week based year support. Fortunately I don't think my client needs it. Carl -Original Message----- From: Mark Davis [mailto:[EMAIL PROTECTED]] Sent: Thursday, January 25, 2001 9:18 PM To: Carl W. Brown; Unicode List Subject: Re: Time I

Re: Property error for U+2118?

2001-02-01 Thread Mark Davis
Did you not receive the GIF in the original message? Mark - Original Message - From: "David Starner" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Thursday, February 01, 2001 11:01 Subject: Re: Property error for U+2118? On Thu, Feb 01, 2001 at 10

Re: [OT] Unicode-compatible SQL?

2001-02-05 Thread Mark Davis
The topic came up in a UTC meeting some time ago, a "UTF-8S". The motivation was for performance (having a form that reproduces the binary order of UTF-16). We have yet to see a formal proposal for this, though. Mark - Original Message - From: "J M Sykes" [EMAIL PROTECTED] To: "Unicode

Re: Surrogate space in Unicode

2001-02-06 Thread Mark Davis
It is the set of code points that can be addressed using surrogate code points. For more information, see the glossary at www.unicode.org. Mark - Original Message - From: "nikita k" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Tuesday, February 06, 2001 01:51 Subject:

Proposed Update: UAX #19: UTF-32

2001-02-07 Thread Mark Davis
as a Unicode Standard Annex. However, it has not undergone final editorial review: it is not a stable document and may not be used as reference material nor cited as a normative reference from another document. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED

Re: The normalization form of the result of a dyadic operation.

2001-02-09 Thread Mark Davis
The whole principle of tagging individual strings with NF* is a bit odd to me; not sure I like it. The K forms in particular are really a folding operation, much like casing. I would not expect to find a model where someone tagged every string in a database with its Case, and then had some
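
The comparison of the K forms to case folding can be illustrated with Python's `unicodedata.normalize` and `str.casefold`. The sample characters are picked for this sketch and do not come from the post.

```python
import unicodedata

# Sketch: the compatibility (K) forms discard distinctions much as case
# folding does, while the canonical forms leave the same text alone.

ligature = "\ufb01"                     # LATIN SMALL LIGATURE FI

nfc = unicodedata.normalize("NFC", ligature)
nfkc = unicodedata.normalize("NFKC", ligature)

# Case folding for comparison: another lossy, many-to-one mapping.
folded = "Stra\u00dfe".casefold()

print(repr(nfc), repr(nfkc), repr(folded))
```

In both cases the original spelling cannot be recovered from the result, which is why treating these as foldings applied on demand, rather than as a tag stored with every string, is a natural reading.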

Re: Unicode collation algorithm - Khmer/Cambodian

2001-02-10 Thread Mark Davis
I have not been following this discussion up until now. Typically the issue with syllables is like that with word-sorting. With word sorting, no matter what is in the second word, any difference in the first word swamps it. Example: ab xyz ghi abc def ghi In many cases, UCA does handle syllabic

Re: extracting words

2001-02-11 Thread Mark Davis
Word break is *very* different than linebreak; see Chapter 5 of TUS, and the Linebreak TR. For linebreak the only tricky language is Thai, since it requires a dictionary lookup (much like hyphenation in English). Java (and ICU) supply linebreak mechanisms as a part of the standard API. They also

Re: extracting words

2001-02-11 Thread Mark Davis
Please read TUS Chapter 5 and the Linebreak TR before proceeding, as I recommended in my last message. The Unicode standard is online, as is the TR. Both can be found by going to www.unicode.org, and selecting the right topic. The TR in particular discusses the recommended approach to line break

Re: Unicode collation algorithm - interpretation]

2001-02-11 Thread Mark Davis
I agree with Tex that the algorithm is small, if implemented in the straightforward way. I also agree with his #1, #2, and #3. I will add two things: 1. Where performance is important, and where people start adding options (e.g. uppercase lowercase vs. the reverse), the implementation of

Re: Korean linebreking and UTR14(was Re: extracting words)

2001-02-12 Thread Mark Davis
On Sun, 11 Feb 2001, Mark Davis wrote: MD Please read TUS Chapter 5 and the Linebreak TR before proceeding, as I MD recommended in my last message. The Unicode standard is online, as is the MD TR. Both can be found by going to www.unicode.org, and selecting the right MD topic. The TR in

Re: Korean linebreking and UTR14(was Re: extracting words)

2001-02-13 Thread Mark Davis
" [EMAIL PROTECTED] Sent: Monday, February 12, 2001 20:30 Subject: Re: Korean linebreking and UTR14(was Re: extracting words) On Mon, 12 Feb 2001, Mark Davis wrote: Thank you for your answer. Asmus Freytag is the one to talk to; he can look into this. Do you think I should contact him

Unicode Transcriptions

2001-02-15 Thread Mark Davis
I am still missing Bopomofo, Khmer, Mongolian, Myanmar, Sinhala, Syriac, Thaana on http://www.macchiato.com/unicode/Unicode_transcriptions.html If anyone could supply one of these, I would appreciate it. Also, Ken suggested that the Bopomofo should be a Bopomofo transcription of the Chinese

Collation

2001-02-22 Thread Mark Davis
For those interested in collation, we have a new version of the ICU collation design document on http://oss.software.ibm.com/icu/develop/collation/. Feedback is welcome. Mark ___ Mark Davis, IBM GCoC, Cupertino (408) 777-5850 [fax: 5891], [EMAIL PROTECTED], [EMAIL PROTECTED] http

Re: An Aburdly Brief Introduction to Unicode (was Re: Perception ...)

2001-02-23 Thread Mark Davis
many comments - Original Message - From: "Tom Lord" [EMAIL PROTECTED] To: "Unicode List" [EMAIL PROTECTED] Sent: Wednesday, February 21, 2001 21:15 Subject: An Aburdly Brief Introduction to Unicode (was Re: Perception ...) We've seen several posts about the perception that Unicode is

Re: An Aburdly Brief Introduction to Unicode (was Re: Perception ...)

2001-02-23 Thread Mark Davis
Message - From: "John Cowan" [EMAIL PROTECTED] To: "Mark Davis" [EMAIL PROTECTED] Cc: "Unicode List" [EMAIL PROTECTED] Sent: Friday, February 23, 2001 08:21 Subject: Re: An Aburdly Brief Introduction to Unicode (was Re: Perception ...) Mark Davis wrote: A _code_po

Re: An Aburdly Brief Introduction to Unicode (was Re: Perception ...)

2001-02-24 Thread Mark Davis
Ken has done a nice job of fleshing out the issues. I would add a bit to that. The glossary entry for "abstract character", as he points out, was inherited from 10646. "Abstract Character. A unit of information used for the organization, control, or representation of textual data. (See

Re: collation sequences (was Klingon silliness)

2001-02-27 Thread Mark Davis
You can use the same collation sequence for two languages, even if they use different sets of letters, as long as they don't *conflict*. For example, you can't have Swedish and German with the same sequence, since they differ in how they deal with a-umlaut. If there are any words x and y, both in
