RE: Greek questions, on- and off-topic

2001-01-23 Thread Marco Cimarosti

 My Greek textbook has acute, grave, and circumflex (called by those names),
 but I'm not sure what these correspond to in the Greek and Greek Extended
 blocks (there seem to be many more diacriticals than those).
 Is there an on-line guide somewhere?

There are in fact other diacritics used in Greek in addition to the three
accents:

- Diaeresis or dialytika (also used in modern spelling)
- Spiritus asper, or rough breathing (romanized with an "h"), and spiritus levis, or smooth breathing
- Iota subscript (to show an unpronounced etymological "i")
- Macron and breve (only used in grammar books and dictionaries)
- Apostrophe (if it can be called a diacritic)
- and something else that I am forgetting, probably...

To know which Unicode code points should be used for these diacritics, the
handiest thing is to look up the canonical decompositions in the
UnicodeData.txt database, both in the basic Greek block (U+03xx) and in the
extended block (U+1Fxx). The canonical decomposition field is the data just
after the 5th semicolon on each line.
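Marco's recipe can be sketched in Python. This is a minimal illustration, not part of the original post: the sample line is written in the UnicodeData.txt format, and the recursive helper `expand` (a name chosen here) relies on Python's `unicodedata` module, which bundles the same data, to follow the decompositions down to a base letter plus combining marks.

```python
import unicodedata

# A line in UnicodeData.txt format; field 5 (after the 5th semicolon)
# holds the canonical decomposition.
LINE = "1F04;GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA;Ll;0;L;1F00 0301;;;;N;;;1F0C;;1F0C"


def decomposition_field(line: str) -> str:
    """Return the decomposition field of one UnicodeData.txt line."""
    return line.split(";")[5]


def expand(ch: str) -> list:
    """Follow canonical decompositions recursively.

    Compatibility mappings (tagged like "<compat>") are deliberately not followed.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return [ch]
    parts = []
    for hexcode in mapping.split():
        parts.extend(expand(chr(int(hexcode, 16))))
    return parts


print(decomposition_field(LINE))  # 1F00 0301
for c in expand("\u1f04"):
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```

Note that 1F04 points to 1F00, which in turn points to alpha plus COMBINING COMMA ABOVE, so a single lookup is not always enough; the chain has to be followed.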

_ Marco




Re: Greek questions, on- and off-topic

2001-01-23 Thread Patrick T. Rourke

Here's a listing of the Unicode names (which are the modern Greek names, I
believe) for diacriticals in the Greek Extended block and the analogous
English *common* names of the Greek accents:

acute = oxia
grave = varia
circumflex = perispomeni
iota subscript = ypogegrammeni
smooth breathing = psili
rough breathing = dasia
diaeresis = dialytika
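To see which combining characters these names correspond to, the sketch below (an illustration using Python's `unicodedata` module; the `MARKS` dictionary is just a helper defined here) prints the code point and Unicode name of each mark, then fully decomposes one heavily marked Greek Extended letter.

```python
import unicodedata

# Combining marks that precomposed polytonic letters decompose to.
MARKS = {
    "oxia (acute)": "\u0301",
    "varia (grave)": "\u0300",
    "perispomeni (circumflex)": "\u0342",
    "ypogegrammeni (iota subscript)": "\u0345",
    "psili (smooth breathing)": "\u0313",
    "dasia (rough breathing)": "\u0314",
    "dialytika (diaeresis)": "\u0308",
}

for label, mark in MARKS.items():
    print(f"{label:32} U+{ord(mark):04X} {unicodedata.name(mark)}")

# Alpha with psili, perispomeni and ypogegrammeni, fully decomposed (NFD).
for c in unicodedata.normalize("NFD", "\u1f86"):
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```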

"Tonos" is the Greek word for accent.  The letters with "tonos" in the basic
Greek block are called that because all accented Greek characters in modern
Greek script use the same accent - and that is the acute.
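The tonos/acute equivalence is visible in the data itself. A small check with Python's `unicodedata` (an illustration, not from the post): the Greek Extended letter with oxia has a singleton canonical decomposition to the basic-block letter with tonos, so normalization treats the two as the same text.

```python
import unicodedata

tonos = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (basic Greek block)
oxia = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (Greek Extended block)

# U+1F71 decomposes to exactly one character, U+03AC ...
print(unicodedata.decomposition(oxia))  # 03AC
# ... so NFC maps the oxia form onto the tonos form,
print(unicodedata.normalize("NFC", oxia) == tonos)  # True
# and both fully decompose to alpha + COMBINING ACUTE ACCENT.
print(unicodedata.normalize("NFD", tonos) == unicodedata.normalize("NFD", oxia))  # True
```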

The following diacriticals are not used in typeset Greek text, but only in
dictionaries and other books where learners need to be given the length of
alpha, iota, and upsilon (omicron and epsilon are of course always short,
and omega and eta are of course always long, so one would never need the
macron or breve over the other four vowels, even in dictionary listings):

macron = macron
breve = vrachy
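Because only alpha, iota and upsilon need length marks, only they have precomposed macron and vrachy forms in Greek Extended. A sketch using Python's `unicodedata` to compose a base letter with U+0304 COMBINING MACRON or U+0306 COMBINING BREVE under NFC (illustrative only):

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON
VRACHY = "\u0306"  # COMBINING BREVE (the "vrachy" of the Greek Extended names)

# Alpha, iota and upsilon: the vowels whose length varies.
for base in "\u03b1\u03b9\u03c5":
    for mark in (MACRON, VRACHY):
        composed = unicodedata.normalize("NFC", base + mark)
        print(f"U+{ord(composed):04X} {unicodedata.name(composed)}")

# Epsilon is always short, so there is no precomposed form: NFC keeps two code points.
print(len(unicodedata.normalize("NFC", "\u03b5" + MACRON)))  # 2
```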

The basic Greek block also includes letters that are not used in Classical
Attic (Stigma, Digamma, Qoppa, Sampi, Yot), though some of them are used as
numerals. It also includes a number of characters that are used only in
Coptic (post-hieroglyphic Egyptian: Dei, Shei, Fei, Khei, Hori, Shima),
which are derived from the demotic Egyptian script.  Also, do not use the
"symbol" versions of Greek letters (such as GREEK BETA SYMBOL) in running text.
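One way to see that the "symbol" versions are variants rather than ordinary letters: their decompositions are tagged `<compat>`, so NFKC folds them back to the plain letters while NFC leaves them unchanged. A quick Python check (illustrative only):

```python
import unicodedata

# BETA, THETA, PHI and PI SYMBOL from the basic Greek block.
for sym in "\u03d0\u03d1\u03d5\u03d6":
    print(f"U+{ord(sym):04X} {unicodedata.name(sym):24} {unicodedata.decomposition(sym)}")

# NFKC maps the symbol onto the ordinary letter; NFC does not touch it.
print(unicodedata.normalize("NFKC", "\u03d0") == "\u03b2")  # True
print(unicodedata.normalize("NFC", "\u03d0") == "\u03d0")   # True
```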

Ano teleia is the Greek semicolon (a raised dot). I imagine that the capitals with
diaeresis are there for text that's in all capitals but is accented.

Note that ancient, biblical and Byzantine Greek all use the polytonic
version of the script, and modern Greek uses the monotonic (in effect, only
uses the acute accent).

I've been working for some time on an online resource for using Unicode with
ancient Greek, but it's not yet in finished form.

It is VERY important to follow Marco Cimarosti's suggestion to look at the
normalization forms chart.

Patrick Rourke
[EMAIL PROTECTED]








Re: What about musical notation ?

2001-01-23 Thread Elaine Keown

Hello,

I think Mr. Garrés means the Western musical notation invented in the 1200s, which is 
very widely, if not universally, used today.

Unicode 3.0 actually already has at least two older forms of musical notation, in the 
main Hebrew block and somewhere in the Arabic block: they are signs for liturgical 
chanting. These symbols are at least 1100 years old.

Elaine Keown

 "Erik Garrés" wrote: 
 I would like to know, why the symbols used for music are not listed on 
 UNICODE ? 




RE: PDUTR #27: Unicode 3.1

2001-01-23 Thread Peter_Constable


On 01/22/2001 01:11:42 PM Kenneth Whistler wrote:

I agree that Mark Davis' discussion covers many of the tricks to make things
small *and* fast when dealing with Unicode tables.

However, you can start out with relatively simple approaches and still
get excellent performance in both memory and speed. For example, I
recently extended my own Sybase Unicode library implementation...

I'm sure the dedicated bit-twiddlers could improve my table size...

Does anyone have compact implementations that are open-source (or otherwise
share-able)?
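For readers wondering what a "relatively simple approach" might look like: one common technique is a two-stage lookup table, where code points are split into fixed-size blocks and identical blocks are stored only once. The Python sketch below is a generic illustration, not Ken's Sybase implementation or ICU's format; the block size of 128, the BMP-only range, and storing General Category strings directly (a real table would store small integer codes) are assumptions made to keep it short.

```python
import unicodedata

BLOCK_SHIFT = 7                  # 128 code points per block (arbitrary choice)
BLOCK_SIZE = 1 << BLOCK_SHIFT
LIMIT = 0x10000                  # BMP only, to keep the sketch quick


def build_two_stage(prop, limit=LIMIT):
    """Build (index, data) with prop(chr(cp)) == data[index[cp >> SHIFT] + (cp & MASK)]."""
    index = []   # stage 1: one offset into data per block
    data = []    # stage 2: concatenated unique blocks
    seen = {}    # block contents -> offset, so duplicate blocks are shared
    for start in range(0, limit, BLOCK_SIZE):
        block = tuple(prop(chr(cp)) for cp in range(start, start + BLOCK_SIZE))
        offset = seen.get(block)
        if offset is None:
            offset = len(data)
            seen[block] = offset
            data.extend(block)
        index.append(offset)
    return index, data


def lookup(index, data, cp):
    """Two array reads: block offset from stage 1, then the value from stage 2."""
    return data[index[cp >> BLOCK_SHIFT] + (cp & (BLOCK_SIZE - 1))]


index, data = build_two_stage(unicodedata.category)
print(len(index), "index entries,", len(data), "data entries instead of", LIMIT)
print(lookup(index, data, 0x03B1))  # Ll
```

Lookup costs a shift, a mask and two array reads; the saving comes from long uniform runs (CJK ideographs, private use, surrogates) collapsing into a few shared blocks.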



- Peter


---
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: [EMAIL PROTECTED]





Re: UNICODE application on IBM Mainframe

2001-01-23 Thread Markus Scherer

I would like to add one item to this discussion:

Recently, someone from the IBM S/390 group told me that they had decided to store and 
use Unicode on S/390 as UTF-8/16/32.
They will not use UTF-EBCDIC. I am not aware of anyone inside or outside of IBM who 
does use UTF-EBCDIC. (There is another EBCDIC-friendly proposal out there in IBM that 
also does not seem to have been adopted.)

If the IMS DB is now updated to use Unicode, then it is probably as UTF-16, right?
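For reference, here is what the three encoding forms look like for a short Greek word. This is a generic Python illustration and says nothing about how S/390 or IMS actually store data.

```python
text = "\u03bb\u03cc\u03b3\u03bf\u03c2"  # "λόγος": five code points, all in the BMP

# Encode the same text in each Unicode encoding form (big-endian, no BOM).
for form in ("utf-8", "utf-16-be", "utf-32-be"):
    encoded = text.encode(form)
    print(f"{form:10} {len(encoded):2} bytes  {encoded.hex(' ')}")
```

Each Greek letter takes two bytes in UTF-8 and in UTF-16, and four in UTF-32.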

markus



Re: PDUTR #27: Unicode 3.1

2001-01-23 Thread Markus Scherer

ICU stores most UnicodeData.txt properties in its uprops.dat, currently some
23kB (Unicode 3.0). This does not include character names, which are in
unames.dat, currently some 83kB.

There is currently a bug about wrong properties for the last 1k characters in
planes 15 and 16 (I will try to fix this before ICU 1.8), but otherwise it
works fine for all of Unicode.

It's open source.

http://oss.software.ibm.com/developerworks/opensource/icu/ubrowse?k=10

markus

[EMAIL PROTECTED] wrote:
 Does anyone have compact implementations that are open-source (or otherwise
 share-able)?



Re: UNICODE application on IBM Mainframe

2001-01-23 Thread Lisa Moore


The IMS DB supports UTF-16.  Actually, you can store anything you want in
an IMS DB - if you want to provide all your own transaction management.
IMS provides transaction management for UTF-16, just not through any
3270-based applications.

Lisa








Re: PDUTR #27: Unicode 3.1

2001-01-23 Thread Peter_Constable


Thanks for the info.


Peter








Re: What about musical notation ?

2001-01-23 Thread Erik Garrés


***
* ENGLISH VERSION *
***
I read the approved (but not yet released) code, and, with all my admiration
for Perry Roland's excellent work, I think it has a deficiency:
-Speaking strictly about the notes: the approved convention focuses on
"drawing" music, but it does not give any meaning to the position of each
note on the staff. In other words, a quarter note on "Fa" (F) does NOT have
the same value (meaning) as one on "La" (A). One way to improve it would be
to assign characters for position (similar to superscripts and subscripts)
so that a note and its position form a single graphic representation, but
one with meaning (determined by the position on the staff).
[ASCII sketch: notes drawn at different positions on a staff]

Why improve it? To be able to store music (not just symbols) compactly in
electronic media, so that electronic players could "speak" what is written
in musical language, the same way some existing software speaks text
written in a given language.
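Erik's point, that the same note shape means different pitches at different staff positions, can be illustrated with a hypothetical encoding that stores a duration plus a staff position instead of a picture. The `StaffNote` type and the position convention below are assumptions for illustration only (treble clef, position 0 = bottom line, no accidentals or key signatures); this is not part of any approved Unicode proposal.

```python
from dataclasses import dataclass

# Solfege names in diatonic order, starting from Do (C).
SOLFEGE = ["Do", "Re", "Mi", "Fa", "Sol", "La", "Si"]
# Hypothetical convention: position 0 is the bottom line of a treble-clef
# staff (Mi, i.e. E4); each step up moves by one line or one space.
TREBLE_BOTTOM_LINE = 2   # index of "Mi" in SOLFEGE
TREBLE_BOTTOM_OCTAVE = 4


@dataclass(frozen=True)
class StaffNote:
    duration: str   # e.g. "quarter" (negra)
    position: int   # lines/spaces above the bottom staff line (may be negative)


def pitch(note: StaffNote) -> str:
    """Turn a staff position into a solfege name plus octave number."""
    steps = TREBLE_BOTTOM_LINE + note.position
    return f"{SOLFEGE[steps % 7]}{TREBLE_BOTTOM_OCTAVE + steps // 7}"


fa = StaffNote("quarter", 1)   # first space
la = StaffNote("quarter", 3)   # second space
print(pitch(fa), pitch(la))  # Fa4 La4
```

The same "quarter" shape yields different pitches once its position is part of the data, which is the distinction Erik says a purely graphical encoding loses.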

Thanks for your time and attention,
Erik Garrés







Chemistry in Chinese (CJK)

2001-01-23 Thread Erik Garrés


* ENGLISH VERSION  *


The elements of the periodic table (chemistry) are missing. They are
especially needed for Chinese, because Chinese has no alphabet with which
to write them, so the element symbols are needed as graphic characters.
I bring this up because I remember that in high school my chemistry book
showed a page from a Chinese chemistry book as an example of the worldwide
use of chemical notation: the elements are written the same way in any
language. I don't see them listed in Unicode, nor are they mentioned for
the next revision.

Thanks for your time and attention,
Erik Garrés
