RE: Greek questions, on- and off-topic
> My Greek textbook has acute, grave, and circumflex (called by those names), but I'm not sure what these correspond to in the Greek and Greek Extended blocks (there seem to be many more diacriticals than those). Is there an on-line guide somewhere?

There are in fact other diacritics used in Greek in addition to the three accents:

- Dieresis or dialytica (also used in modern spelling)
- Spiritus asper (romanized with an "h") and spiritus lenis
- Subscript iota (to show an unpronounced etymological "i")
- Macron and breve (only used in grammar books and dictionaries)
- Apostrophe (admitting it can be called a diacritic)
- and something else that I am forgetting, probably...

To know which Unicode code points should be used for these diacritics, the handiest thing is to look up the canonical decompositions in the UnicodeData.txt database, both in the basic Greek block (U+03xx) and in the extended block (U+1Fxx). The canonical decomposition field is the data just after the 5th semicolon on each line.

_ Marco
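The lookup Marco describes can be sketched in a few lines of Python. This is only an illustration: the sample record is one line copied from UnicodeData.txt, and the helper names `canonical_decomposition` and `describe` are made up for this sketch. Fields that begin with a `<tag>` are compatibility mappings rather than canonical ones, so the sketch treats them as having no canonical decomposition.

```python
import unicodedata

# One record copied from UnicodeData.txt; fields are separated by semicolons.
SAMPLE_LINE = "1F00;GREEK SMALL LETTER ALPHA WITH PSILI;Ll;0;L;03B1 0313;;;;N;;;1F08;;1F08"

def canonical_decomposition(line):
    """Return the code points in the decomposition field (just after the 5th semicolon).

    Hypothetical helper: returns an empty list when the field is empty or
    holds a compatibility mapping (marked by a leading <tag>).
    """
    field = line.strip().split(";")[5]
    if not field or field.startswith("<"):
        return []
    return [int(cp, 16) for cp in field.split()]

def describe(code_points):
    """Format each code point as 'U+XXXX NAME' using Python's built-in Unicode data."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in code_points]

for entry in describe(canonical_decomposition(SAMPLE_LINE)):
    print(entry)
```

The same information is available without the file through `unicodedata.decomposition()`, which returns the raw field as a string for a given character.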
Re: Greek questions, on- and off-topic
Here's a listing of the Unicode names (which are the modern Greek names, I believe) for diacriticals in the Extended Greek range and the analogous English *common* names of the Greek accents:

acute = oxia
grave = varia
circumflex = perispomeni
iota subscript = ypogegrammeni
smooth breathing = psili
rough breathing = dasia
diaeresis = dialytika

"Tonos" is the Greek word for accent. The letters with "tonos" in the basic Greek block are called that because all accented Greek characters in modern Greek script use the same accent - and that is the acute.

The following diacriticals are not used in typeset Greek text, but only in dictionaries and other books where learners need to be given the length of alpha, iota, and upsilon (omicron and epsilon are of course always short, and omega and eta are of course always long, so one would never need the macron or breve over the other four vowels, even in dictionary listings):

macron = macron
breve = vrachy

The basic Greek block also includes letters that are not used in Classical Attic (Stigma, Digamma, Qoppa, Sampi, Yot), except that some are used as numerals, and a number of characters that are only used in Coptic (post-hieroglyphic Egyptian: Dei, Shei, Fei, Khei, Hori, Shima) and are derived from the demotic Egyptian script. Also do not use the "symbol" versions of Greek letters.

Ano teleia is the semicolon (a raised dot). I imagine that the capitals with diaeresis are there for text that's in all capitals but is accented.

Note that ancient, biblical and Byzantine Greek all use the polytonic version of the script, and modern Greek uses the monotonic (in effect, only uses the acute accent).

I've been working for some time on an online resource for using Unicode with ancient Greek, but it's not yet in finished form.

It is VERY important to follow Marco Cimarosti's suggestion to look at the normalization forms chart.
Patrick Rourke [EMAIL PROTECTED]

- Original Message -
From: "Marco Cimarosti" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Tuesday, January 23, 2001 3:10 AM
Subject: RE: Greek questions, on- and off-topic
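The point about "tonos" and "oxia" both being the acute, and the advice to consult the normalization data, can be checked with a short Python sketch. It relies on the Unicode data bundled with a current Python interpreter, which is an assumption: the data files discussed in this thread (Unicode 3.0/3.1) may differ in detail. The helper name `code_points` is invented for the example.

```python
import unicodedata

alpha_oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (Greek Extended block)
alpha_tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (basic Greek block)

def code_points(text):
    """Hypothetical helper: list a string's code points as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

for label, char in (("oxia", alpha_oxia), ("tonos", alpha_tonos)):
    print(label, "name:", unicodedata.name(char))
    print(label, "NFD: ", code_points(unicodedata.normalize("NFD", char)))
    print(label, "NFC: ", code_points(unicodedata.normalize("NFC", char)))

# Canonically equivalent strings become identical after normalization.
equivalent = (unicodedata.normalize("NFC", alpha_oxia)
              == unicodedata.normalize("NFC", alpha_tonos))
print("canonically equivalent:", equivalent)
```

With that data, both characters decompose to alpha followed by the combining acute accent, so text that mixes the two forms compares equal only after normalization.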
Re: What about musical notation ?
Hello,

I think Mr. Garres means the western musical notation invented in the 1200s, which is very widely, if not universally, used today.

Unicode 3.0 actually already has at least 2 older forms of musical notation in the main Hebrew block and somewhere in the Arabic block---they are signs for chanting liturgically. These symbols are at least 1100 years old.

Elaine Keown

"Erik Garrés" wrote:
> I would like to know, why the symbols used for music are not listed on UNICODE ?
RE: PDUTR #27: Unicode 3.1
On 01/22/2001 01:11:42 PM Kenneth Whistler wrote:

> I agree that Mark Davis' discussion covers many of the tricks to make things small *and* fast when dealing with Unicode tables. However, you can start out with relatively simple approaches and still get excellent performance in both memory and speed. For example, I recently extended my own Sybase Unicode library implementation...
>
> I'm sure the dedicated bit-twiddlers could improve my table size...

Does anyone have compact implementations that are open-source (or otherwise share-able)?

- Peter

---
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]
Re: UNICODE application on IBM Mainframe
I would like to add one item to this discussion: Recently, someone from the IBM S/390 group told me that they had decided to store and use Unicode on S/390 as UTF-8/16/32. They will not use UTF-EBCDIC. I am not aware of anyone inside or outside of IBM who does use UTF-EBCDIC. (There is another EBCDIC-friendly proposal out there in IBM that also does not seem to have been adopted.) If the IMS DB is now updated to use Unicode, then it is probably as UTF-16, right? markus
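For readers unfamiliar with the three encoding forms named above, here is a small Python sketch showing how one character is serialized in each. The EBCDIC comparison uses code page 037 purely as an illustrative assumption: it is an ordinary single-byte EBCDIC code page, not UTF-EBCDIC, and nothing in this thread says which code pages the S/390 applications use. The helper `encoded_hex` is made up for the example.

```python
greek_alpha = "\u0391"  # GREEK CAPITAL LETTER ALPHA

def encoded_hex(text, codec):
    """Hypothetical helper: encode text and return the bytes as hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode(codec))

# The same character under the three Unicode encoding forms (big-endian).
for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    print(f"{codec:<10} {encoded_hex(greek_alpha, codec)}")

# For contrast: Latin "A" in ASCII-compatible UTF-8 and in EBCDIC cp037.
# cp037 is chosen only for illustration; it is NOT UTF-EBCDIC.
print(f"{'A utf-8':<10} {encoded_hex('A', 'utf-8')}")
print(f"{'A cp037':<10} {encoded_hex('A', 'cp037')}")
```

The contrast in the last two lines is the reason EBCDIC-friendly transformations were proposed at all: byte values that mean "A" in ASCII-based encodings mean something else to EBCDIC software.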
Re: PDUTR #27: Unicode 3.1
ICU stores most UnicodeData.txt properties in its uprops.dat, currently some 23kB (Unicode 3.0). This does not include character names, which are in unames.dat, currently some 83kB. There is currently a bug about wrong properties for the last 1k chars in planes 15 and 16 (I will try to fix this before ICU 1.8), but otherwise it works fine for all of Unicode.

It's open source: http://oss.software.ibm.com/developerworks/opensource/icu/ubrowse?k=10

markus

[EMAIL PROTECTED] wrote:
> Does anyone have compact implementations that are open-source (or otherwise share-able)?
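As a rough illustration of how property tables can be kept compact, here is a generic two-stage lookup table in Python. This is a textbook technique and is not claimed to be the layout of ICU's uprops.dat, which this thread does not describe. The block size of 256 and the names `build_two_stage` and `lookup` are arbitrary choices for the sketch, and the property values come from Python's bundled Unicode data rather than Unicode 3.0.

```python
import unicodedata

BLOCK = 256  # code points per block; an arbitrary choice for this sketch

def build_two_stage(prop, limit=0x110000):
    """Build (index, blocks) for a per-character property function.

    index[n] gives the position in `blocks` of the block covering code
    points n*BLOCK .. n*BLOCK+BLOCK-1. Identical blocks are stored once.
    """
    index, blocks, seen = [], [], {}
    for start in range(0, limit, BLOCK):
        block = tuple(prop(chr(cp)) for cp in range(start, start + BLOCK))
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks

def lookup(index, blocks, cp):
    """Two array accesses: first the block index, then the entry in the block."""
    return blocks[index[cp // BLOCK]][cp % BLOCK]

index, blocks = build_two_stage(unicodedata.category)
print(len(index), "index entries,", len(blocks), "distinct blocks")
print("U+03B1 general category:", lookup(index, blocks, 0x03B1))
```

Because large stretches of the code space (unassigned ranges, private use, CJK ideographs) share one property value, most blocks collapse into a handful of shared copies, which is where the space saving comes from.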
Re: UNICODE application on IBM Mainframe
The IMS DB supports UTF-16. Actually, you can store anything you want in an IMS DB - if you want to provide all your own transaction management. IMS provides transaction management for UTF-16, just not through any 3270-based applications.

Lisa

Markus Scherer [EMAIL PROTECTED] on 01/23/2001 10:18:35 AM
To: "Unicode List" [EMAIL PROTECTED]
cc:
Subject: Re: UNICODE application on IBM Mainframe
Re: PDUTR #27: Unicode 3.1
Thanks for the info.

Peter

On 01/23/2001 12:56:45 PM Markus Scherer wrote:
> [...]
Re: What about musical notation ?
I read the code chart that has been approved (but not yet released) and, without detracting from Perry Roland's excellent work, I think it has a deficiency.

Speaking strictly about the notes: the approved convention focuses on drawing a score, but it gives no meaning to the position each note occupies on the staff. A quarter note on "Fa" is not the same as a quarter note on "La".

One way to improve it would be to assign position characters (in the manner of superscripts and subscripts), so that a note and its position combine into a single graphic representation that carries meaning according to its place on the staff.

[ASCII sketch: two notes (@) at different positions on a staff]

Why improve it? To be able to store music (not just symbols) compactly in electronic media, so that electronic players can "speak" what was written in musical language, in the same way that software already exists that speaks what is written in a given language.
Thanks for your time and attention,
Erik Garrés
Chemistry in Chinese (CJK)
The chemical elements of the periodic table are missing in the context of Chinese characters. Because Chinese has no alphabet in which to spell them, the elements are needed as graphic representations.

I raise this because, when I was in high school, I saw a chemistry book that reproduced a page from a Chinese chemistry text as an example of how chemical notation is written the same way in every language. I do not see the elements listed in Unicode, nor were they mentioned for the next revision.

Thanks for your time and attention,
Erik Garrés