Latvian palatalised consonants

2000-10-17 Thread Herman Ranes

Is the use of the existing precomposed characters in the Latin
Extended-A block considered 'right' for encoding Latvian palatal
consonants, or is it considered 'wrong' so that I will have to use
composites with U+0326 'Combining comma below' instead?

I am aware that many use those precomposed cedilla characters, but
nevertheless it does not look Latvian to me...

Romanian did get its precomposed letters - can one expect that
precedent to apply to Latvian as well?

:-)

--
Herman Ranes
Høgskolen i Sør-Trøndelag, Avdeling for teknologi,
Institutt for elektroteknikk
N-7004 Trondheim, NOREG
Telefon   +47 73559606
Telefaks  +47 73559581
[EMAIL PROTECTED]
http://www.hist.no/~herman/



Re: Latvian palatalised consonants

2000-10-17 Thread Jukka . Korpela

On Tue, 17 Oct 2000, Herman Ranes wrote:

 Is the use of the existing precomposed characters in the Latin
 Extended-A block considered 'right' for encoding Latvian palatal
 consonants, or is it considered 'wrong' so that I will have to use
 composites with U+0326 'Combining comma below' instead?
 
 I am aware that many use those precomposed cedilla characters, but
 nevertheless it does not look Latvian to me...
 
 Romanian did get its precomposed letters - can one expect that
 precedent to apply to Latvian as well?

As far as I know, there is an official decision by the Romanian
Standards Institute to regard certain Romanian characters as
containing comma below and not cedilla, making ISO 8859-2 (originally
designed to cover Romanian too) inadequate for writing Romanian.
There is a committee draft for ISO 8859-16 intended to solve this problem:
http://www.egt.ie/standards/iso8859/cd8859-16-en.pdf

What is the official position on the nature of the diacritic mark
we're discussing, in Latvia? Unofficial documents, like
http://www.geocities.com/tuksnesis/valoda/diacrtic.html
seem to call it "cedilla" - and display glyphs where it is clearly
comma-like in appearance.

_If_ there were an official statement saying that it's a comma and not
a cedilla, then one _might_ refer to the Romanian case as a precedent.
But then the problem would arise whether one really needs to make
a distinction between comma and cedilla. The problem with s and t
with comma or cedilla was that they are also used outside Romanian.
The Unicode attitude, expressed in the description of Latin Extended-A,
http://www.unicode.org/charts/PDF/U0100.pdf is somewhat confusing.
For example, U+015F LATIN SMALL LETTER S WITH CEDILLA is, according to it,
used in Turkish, Azerbaijani, Romanian, ..., but "a glyph variant
with comma below is preferred for Romanian"; on the other hand,
that "glyph variant" appears as U+0219 LATIN SMALL LETTER S WITH COMMA
BELOW, with the note "Romanian, when distinct comma below form is
required".
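
The character data for the two code points named above can be inspected directly. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is made up for this illustration and is not part of any library.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, canonical decomposition) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

# The two "s with a mark below" characters discussed in the message.
for ch in ("\u015F", "\u0219"):
    print(describe(ch))
```

The third field shows which combining mark each precomposed letter is canonically tied to: U+0327 COMBINING CEDILLA for U+015F and U+0326 COMBINING COMMA BELOW for U+0219.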

So are the characters you're referring to "Latvian only"?

Unfortunately, there doesn't seem to be any collection of information
that could be used as a reference concerning the use of letters in
different languages. The ISO 8859 series implicitly constitutes a
(very partial) reference, since those standards list the languages to
which a particular standard of the series is applicable. (See my
http://www.hut.fi/u/jkorpela/8859.html which summarizes coverage of
European languages by ISO 8859 alphabets.) Then there's the rather
detailed
http://www.eki.ee/itstandard/docs/draft-alvestrand-lang-char-03.txt
but it is old, and with a status of expired Internet-draft. And there
are some notes in the Unicode standard, but they are typically just
_examples_ of languages in which a character is used. And there's a nice
online database at http://www.eki.ee/letter/ which is based on various
sources.

For example, for U+0137 LATIN SMALL LETTER K WITH CEDILLA, all the
information available to me suggests that it is used in Latvian only,
with a glyph where the diacritic part is a comma below "k", not
connected to it in any cedilla-like manner. So what would be the
problem in using it? There _would_ be a problem if some other language
used the character so that the diacritic part is somehow cedilla-like.
(But even then, it might be regarded as something to be handled at
a higher protocol level, based on language information.)

So I don't think it's a problem; the only real problem appears to be
the _name_ which contains the word CEDILLA, but it's just a name,
and diacritics may vary in appearance anyway. (Consider how differently
an acute accent can be displayed.)
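
One practical consequence for the original question can be checked with a short sketch, assuming Python's standard unicodedata module (the function name compose is hypothetical). It normalizes "k" followed by each of the two combining marks to NFC and lists the resulting code points.

```python
import unicodedata

def compose(base, mark):
    """NFC-normalize base + combining mark and return the resulting code points."""
    composed = unicodedata.normalize("NFC", base + mark)
    return [f"U+{ord(c):04X}" for c in composed]

print(compose("k", "\u0327"))  # k + COMBINING CEDILLA
print(compose("k", "\u0326"))  # k + COMBINING COMMA BELOW
```

In the character data, U+0137 is canonically equivalent to k + U+0327, so the first sequence composes to the precomposed letter. The sequence k + U+0326 has no precomposed form and stays as two code points, which means the two encodings of a Latvian text would not compare equal after normalization.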

-- 
Yucca, http://www.hut.fi/u/jkorpela/ or http://yucca.hut.fi/yucca.html




Re: utf-8 != latin-1

2000-10-17 Thread Mark Davis

One of the main features of XML is that it has quite strict rules about how
to handle errors. The goal, I believe, is to ensure that we are not awash in
malformed files that have no clear interpretation.

And this is clearly an error: the acceptable code points are quite clearly
stated:

http://www.w3.org/TR/2000/REC-xml-20001006#dt-character

Converting an illegal UTF-8 sequence into a valid -- BUT WRONG -- sequence
of valid code points is clearly against the intent of this production rule.
XML could have taken the opposite tack -- that illegal code points and
illegal code unit sequences are to be ignored. But it didn't.
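
As an illustration of the strict behaviour described here, the following Python sketch rejects an ill-formed byte sequence instead of mapping it to some other code points. It uses the four bytes exactly as they appear in the quoted message below (AD 63 61 73); the function name is_valid_utf8 is an assumption made for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (strict decoding)."""
    try:
        data.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return False
    return True

# 0xAD is a continuation byte with no lead byte before it, so the sequence is ill-formed.
print(is_valid_utf8(bytes([0xAD, 0x63, 0x61, 0x73])))
print(is_valid_utf8("cas".encode("utf-8")))
```

A processor that follows the rule above would report an error in the first case rather than emit substitute code points.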

Mark

BTW, I have a simple browser-based UTF converter (in JavaScript) at
http://www.macchiato.com/unicode/charts.html (click on Converter). It lets
you convert back and forth between different UTFs, with various choices of
format. And it checks for illegal UTF-8 sequences!

- Original Message -
From: "Doug Ewell" [EMAIL PROTECTED]
To: "Unicode List" [EMAIL PROTECTED]
Sent: Friday, October 13, 2000 21:59
Subject: Re: utf-8 != latin-1


"Steven R. Loomis" [EMAIL PROTECTED] wrote:

 What happened was that the sequence  AD 63 61 73 was
 interpreted as U+E54E U+DC73..

Why?  As an illegal UTF-8 sequence, it shouldn't be interpreted as
anything.

John Cowan's "utf" perl script (which carries the appropriate
disclaimers about no error checking) converts that sequence to U+D94E
U+DC73, which seems a bit more reasonable -- at least it's a complete
surrogate pair.
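
The difference between the two results can be made concrete with a small sketch of the standard UTF-16 surrogate arithmetic. This is an illustration only, not the code of either converter mentioned above; the function names classify and combine are hypothetical.

```python
def classify(unit):
    """Classify a 16-bit code unit as a high surrogate, low surrogate, or neither."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    return "not a surrogate"

def combine(high, low):
    """Combine a high and a low surrogate into the supplementary code point they encode."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(classify(0xE54E), "/", classify(0xDC73))  # first result: unpaired low surrogate
print(classify(0xD94E), "/", classify(0xDC73))  # second result: a complete pair
print(f"U+{combine(0xD94E, 0xDC73):X}")
```

U+E54E lies in the private use area, so in the first result U+DC73 is an unpaired low surrogate, while U+D94E U+DC73 together encode a single supplementary code point.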

-Doug Ewell
 Fullerton, California







Re: Can anyone help me!!!

2000-10-17 Thread sanatan mohanty




Hi,

  Can't I use Unicode to generate and display the fonts in any browser,
irrespective of whether it supports Unicode, for example by writing a
plugin or something like that? A user whose browser doesn't support
Unicode would then need to install that plugin to access the web page.

 Would that be possible?

 
On Thu, 28 Sep 2000, Yung-Fong Tang wrote:

 Antoine Leca wrote:

  sanatan mohanty wrote:

   I have a project to make a web page that will be Unicode-enabled.

  Good.

   I can show Indian-language fonts.
   I can type those fonts on the web page itself, in text boxes.

  Ah! How do you do that?
  Or do you mean "would/should" instead?

   And it should work at least on Netscape and Windows
   Explorer, and at least Linux and Windows should support it.

  I am not aware that Netscape, even in version 6, is able to
  display Indian sentences encoded in Unicode (although it is
  able to display individual characters). The problem is in
  the rendering (displaying) of the conjuncts, and the reordering
  of the left-positioned matras.

 Does Netscape6 on Win2K have this problem? If so, can you put together
 a test page for us? We know there are problems when we try to select the
 conjuncts. However, since we use TextOutW, in theory TextOutW should
 handle conjuncts and handle the reordering of the left-positioned
 matras.

   So, can you people give me some brief ideas about keyboard mapping,

  Keyboard layout is unrelated to the problem.
  You can use whatever you want (or are comfortable with).

  However, you certainly need a Unicode-capable editor. Very few of
  them are Indian-enabled (Microsoft's are the best choice, but not
  the cheapest, particularly since they practically require Win2000).

   Unicode font setting,

  There are very few Indian "Unicode" fonts for the moment.
  And even fewer work with X11/Linux.

  In fact, I am not aware of any such font. Which is the main
  reason why I ask the questions above.

   display setting

  What do you mean by display setting?
  The display setting is on the client side. You are not going
  to have any form of control over this setting... (and no, I do not
  like browsing a web site and encountering a page that says
  "please, change over all your settings in order to browse my
  site"; actually, I often switch away).

  Antoine





RE: Can anyone help me!!!

2000-10-17 Thread Michael Jansson

Hi,

Writing a plugin would not be enough. There are quite a few issues
to deal with when rendering Indian text in a browser without
Unicode support (as you all know). I assume that you are looking for
a solution that works for more than just one browser on one platform?

Some browsers may support neither Unicode text encoding formats (e.g.
UTF-8) nor the rendering of 16-bit characters. They would also probably
not be able to deal with the complex character shaping, positioning
and text direction issues found in Indian and other languages. Some
browsers do not support downloading (partial) fonts yet, so these
browsers may not be able to show the text even if they did support
Unicode. There are other issues as well.
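
To make the shaping issue concrete, here is a small sketch, assuming Python's standard unicodedata module, that lists the stored (logical) order of two short Devanagari strings. Devanagari is used only as one example of an Indic script; the sample strings and the helper name code_points are chosen for this illustration.

```python
import unicodedata

def code_points(text):
    """List (code point, character name) pairs in stored (logical) order."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in text]

ki = "\u0915\u093F"          # consonant KA followed by vowel sign I
ksha = "\u0915\u094D\u0937"  # KA + VIRAMA + SSA, normally drawn as one conjunct

for sample in (ki, ksha):
    print(code_points(sample), len(sample.encode("utf-8")), "bytes in UTF-8")
```

The vowel sign I is stored after the consonant but is drawn to its left, and the three code points of the second string are normally drawn as a single conjunct glyph. A browser that simply draws one glyph per code point in stored order therefore gets both wrong, even if it decodes the UTF-8 correctly.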

These problems are not impossible to solve, but it is *very*
hard. We (at BorWare AB) are working on a product with which we intend
to support Unicode, CSS level 2 and font embedding on many platforms and
browsers. Specifically, it will support Indian Unicode fonts (OpenType
Layout) and non-Unicode Indian fonts (TT, T1, etc.) in IE 4.x, IE 5.x,
Nav 4.x, Nav 6.x, Op4 and WebTV on (non-Indian) Windows, Unix and Mac. It's
being beta tested right now and should be available sometime next year...

Regards,
- Michael





C Programming for Unicode

2000-10-17 Thread SoHee Kim


 Hi,

 I would like to modify an existing C application so that it supports
Unicode.
 Does anybody know of any references or samples that would help?
 Thanks.

 SoHee




Preliminary charts for Unicode 3.2 draft

2000-10-17 Thread Asmus Freytag

Preliminary character charts are now available for those characters that 
are proposed to go into Unicode 3.2 (and into AMD1 to ISO/IEC 
10646-1:2000). The majority of the proposed characters are mathematical 
symbols and arrows.


The new URL is:
http://www.unicode.org/charts/draftunicode32/

There is also a link to the draft charts from the Pipeline Table.
The link is in a new paragraph reading:

Charts of the characters proposed for addition in Unicode Version 3.2
are currently available for review. The charts provide preliminary
information only. Click here for the index.

The difference between these charts and other proposal documents is that the
new characters are shown in context with the existing characters, using the
standard charts and nameslist format. The file format is PDF.

The charts are made available to allow implementers to prepare products 
that will eventually support Unicode 3.2 when it is published. Please note 
the cautionary language and disclaimers that accompany these charts.

If you note any errors, omissions, inaccuracies etc. you may send your 
detailed comments to me.

A./



Re: C Programming for Unicode

2000-10-17 Thread Helena Shih

There are a few options, depending on what you mean by "supports unicode".  If
all you care about is code page conversion so your program can process
Unicode code points, glibc is freely available on many platforms,
http://www.gnu.org.

If your application requires more sophisticated Unicode support such as
collation and word breaking, take a look at ICU,
http://oss.software.ibm.com/icu.  It's also freely available for many
interesting environments.  Qt also provides a great set of features, again
for free.  A more complete list of internationalization libraries can be
found at http://www.unicode.org/unicode/onlinedat/products.html.  Some of
them are commercial products and some are not.







Re: C Programming for Unicode

2000-10-17 Thread Jungshik Shin

On Tue, 17 Oct 2000, Helena Shih wrote:

 There are a few options, depending on what you mean by "supports unicode".  If
 all you care about is code page conversion so your program can process
 Unicode code points, glibc is freely available on many platforms,
 http://www.gnu.org.

I'm afraid this rather understates what glibc can do
(among other things, glibc can do collation, like any other C library,
given appropriate locales). It would be a good description of iconv
and friends in glibc (or in any C library with an iconv that supports many
encodings). In case glibc is too big to install on the target platform, there's
also a standalone free (LGPLed) libiconv (developed by Bruno Haible)
that offers iconv(3) for a lot of encodings.
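
The kind of code page conversion that iconv performs can be sketched as follows. Note that this uses Python's built-in codecs as a stand-in and is not the iconv(3) C API; the function name convert and the sample bytes are assumptions for illustration.

```python
def convert(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Decode bytes from one encoding to Unicode text, then encode to another."""
    return data.decode(from_enc).encode(to_enc)

latin1 = b"H\xf8gskolen"  # "Høgskolen" in ISO 8859-1: one byte per character
utf8 = convert(latin1, "iso-8859-1", "utf-8")
print(utf8)               # the single byte F8 becomes the two bytes C3 B8
```

The same two-step shape (source encoding to Unicode, Unicode to target encoding) is what a C program would drive through an iconv conversion descriptor, with explicit buffer management added.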

  http://clisp.cons.org/~haible/packages-libiconv.html

Jungshik Shin




Re: C Programming for Unicode

2000-10-17 Thread Helena Shih

My apologies, I didn't realize glibc also supports the Unicode collation
algorithm.  If so, yes, my statement underestimated the support in glibc
quite a bit.

Sorry.






Re: Korean syllable decomposition (was: CJK combining components)

2000-10-17 Thread Jungshik Shin

On Tue, 17 Oct 2000 [EMAIL PROTECTED] wrote:

 So, do they have a table that says "This hangul syllable
 is made up of components X, Y, and Z"?) Maybe Unicode
 should have one.

Well, Unicode will never have one for dynamic glyph composition of Hangul
syllables ;-) because there are so many possibilities (how many different
sets of glyphs to use for initial consonants, medial vowels and final
consonants; the higher the quality you want, the more sets you need).
One example of such a table is, though, provided by the Hanterm (Korean xterm)
source code (http://elf.kaist.ac.kr/hanterm), which can make use of
both precomposed Hangul fonts (with only the 2350 syllables of KS X 1001)
and fonts made up of Jamos (10 sets of initial consonants, 4? sets of
medial vowels and 4? sets of final consonants) for on-the-fly composition
of glyphs (for all 11,172 modern syllables and thousands of archaic
syllables). Mozilla supports that, and you may find it interesting to go
through nsUnicodeToX11Johab.cpp (at www.mozilla.org, follow the link for the
source code and type in the file name).  Unix/X11 JDK used to allow this
kind of on-the-fly composition by simply editing the font.properties file
and providing a simple Java class to take care of dynamic composition,
but at least the Linux port of JDK 1.2 stopped working that way.
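
Separately from glyph composition, the character-level question ("which jamos make up this syllable?") needs no table for the 11,172 precomposed modern syllables, because the Unicode Standard defines their decomposition arithmetically. The sketch below follows that arithmetic; it says nothing about which glyph set a font such as Hanterm's would pick, and the function name decompose is chosen for this example.

```python
import unicodedata

# Constants from the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed modern syllables

def decompose(syllable):
    """Split one precomposed Hangul syllable into its conjoining jamos."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed modern Hangul syllable")
    jamos = chr(L_BASE + s_index // N_COUNT)               # initial consonant
    jamos += chr(V_BASE + (s_index % N_COUNT) // T_COUNT)  # medial vowel
    t_index = s_index % T_COUNT
    if t_index:                                            # final consonant, if any
        jamos += chr(T_BASE + t_index)
    return jamos

han = "\uD55C"
print([f"U+{ord(j):04X}" for j in decompose(han)])
print(decompose(han) == unicodedata.normalize("NFD", han))
```

The product 19 x 21 x 28 gives the 11,172 syllables mentioned above, and the result agrees with canonical decomposition (NFD) as implemented in the standard library.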



 When you make a Korean font, you only need to make
 the components and have a program combine them for
 you, correct?

It's not that simple, unfortunately.  In principle, that's possible,
but in reality it still needs a lot of manual intervention to get a
high-quality font.  (I'm not familiar with the way foundries in Korea make
Hangul fonts.) Anyway, if you look inside some of the TrueType fonts with
Hangul syllables, you'll find a lot of components (Jamos) that I presume
make up syllables, making use of facilities provided by TrueType for
the 'dynamic' composition(??).

Jungshik Shin





RE: C Programming for Unicode

2000-10-17 Thread Carl W. Brown

SoHee,

See
http://oss.software.ibm.com/developerworks/opensource/icu/project/index.html

This has a library of Unicode C APIs.  Most of the docs are for the C++ APIs
but if you look at the user guide
http://oss.software.ibm.com/developerworks/opensource/icu/project/userguide/index.html
you can see examples of the C API.

Carl
