Re: Tamil glyphs

2000-09-07 Thread Michael (michka) Kaplan
From: "Antoine Leca" <[EMAIL PROTECTED]> > Sorry, what is "the other form"? As I see things, in Tamil Nadu the current > use is write NNAI exactly the same as, for example, KAI (that is, without > the "elephant-trunk" form that TUS appears to require). There are two forms, with and without the "

Re: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion too

2000-09-07 Thread Michael (michka) Kaplan
Heh heh heh... that's the one I was going to suggest for the UTF-8 conversions... :-) I would strongly recommend you use StrConv for the non-UTF-8 ones though, because it is roughly twice as fast (the slowness of Declare statements in VB). Although I do have a test version that uses Matt Curland's

RE: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion

2000-09-07 Thread Mikko Lahti
Mostly to convert files. I think I could do it with MLANG but I was hoping that there already are some tools to do it (or a VB wrapper since my C is not that good ;>). I was looking at your web site (excellent, by th

Re: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion too

2000-09-07 Thread Michael (michka) Kaplan
I do not think there is one, actually. Are you looking to convert files or strings you send via the command line? michka - Original Message - From: "Mikko Lahti" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Thursday, September 07, 2000 6:31 PM Subject: Win32: Comman

Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion too

2000-09-07 Thread Mikko Lahti
Are there any Win32 command line or batch ANSI to Unicode conversion tools out there? Desired conversions are: - Windows-1252 to UTF-8 - Windows-1252 to UTF-16 - UTF-8 to Windows-1252 - UTF-16 to Windows-1252 - U
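
As a rough illustration of the conversions listed in the question above (not a tool mentioned anywhere in this thread), the following Python sketch transcodes between Windows-1252, UTF-8 and UTF-16. The function names `transcode` and `transcode_file` are hypothetical and exist only for this example; `cp1252`, `utf-8` and `utf-16` are Python's standard codec names.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from the src encoding and re-encode them as dst.

    Both steps use strict error handling, so undecodable input or a
    character that dst cannot represent raises a UnicodeError instead
    of being silently replaced.
    """
    return data.decode(src).encode(dst)


def transcode_file(src_path: str, dst_path: str, src: str, dst: str) -> None:
    """Convert a whole file (hypothetical helper; reads it into memory)."""
    with open(src_path, "rb") as f:
        raw = f.read()
    with open(dst_path, "wb") as f:
        f.write(transcode(raw, src, dst))


if __name__ == "__main__":
    # Windows-1252 byte 0xE9 is U+00E9, which is C3 A9 in UTF-8.
    print(transcode(b"caf\xe9", "cp1252", "utf-8"))
    # The plain "utf-16" codec prepends a byte order mark; use
    # "utf-16-le" or "utf-16-be" to fix the byte order without one.
    print(transcode(b"caf\xe9", "cp1252", "utf-16-le"))
```

Going from UTF-8 or UTF-16 back to Windows-1252 can fail, because most Unicode characters have no Windows-1252 equivalent; this sketch lets the resulting `UnicodeEncodeError` propagate rather than guessing at a substitution policy.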

RE: Unicode on a non-Unicode web page

2000-09-07 Thread Paul Deuter
Your question is essentially "How do I mix characters encoded in more than one character set on a single page?" A normal page has one document and that one document will expect characters to be encoded in the character set specified in the meta tag in the header. It is possible to have a compound

Re: Ligatured characters

2000-09-07 Thread James Kass
William Overington wrote: > However, suppose that > one is wishing to transcribe an eighteenth century printed book and one > wishes to preserve the information as to when a long s was used and when a > ligature such as "long s and t" were used. How should one encode the text > in unicode pleas

Re: Ligatured characters

2000-09-07 Thread John Cowan
William Overington wrote: > Suppose that one is producing a program, such as a Java applet, to display > pages of printed text, and one wishes to encode text that contains > ligatures. Unicode is a plain text encoding standard. Fonts can and should supply the ligatures which are appropriate to

Re: Tamil glyphs

2000-09-07 Thread Antoine Leca
Michael Kaplan wrote: > > To answer a question someone else posed, a ZWNBSP or a ZWJ will not work > here since the vowel reordering must happen, as well. They are two entirely > different but entirely valid forms of the same groups of letters. Agreed. > I guess one could claim that the probl

Re: Unicode on a non-Unicode web page

2000-09-07 Thread John Cowan
"Gary P. Grosso" wrote: > Netscape Communicator 4.6 doesn't. Versions of Netscape before 4.7 had this bug: character references greater than ÿ only worked if the transmission character set was UTF-8. > One way to look at this is: how do I use unicode as an > "escape" to include some isolated co

Re: Surrogate support in *ML?

2000-09-07 Thread John Cowan
Brendan Murray/DUB/Lotus wrote: > I know that XML explicitly excludes surrogates. XML, like any conformant Unicode process, excludes *unpaired* surrogates. Surrogate-pair characters may be used, though not in XML names, only in character data. They may appear as themselves or using character re

Ligatured characters

2000-09-07 Thread William Overington
Having had some experience with handset metal type, years ago, I remember that many founts had ligatured characters. Most founts had fi ff fl ffi ffl each provided as one piece of type. Some founts had ct and st ligatures as well, with a thin ornamental line connecting the top of the c or s and

Re: Unicode on a non-Unicode web page

2000-09-07 Thread Herman Ranes
"Gary P. Grosso" skreiv: > > Hi Unicoders, > > > Is there some way I can nudge Netscape's browser to display these? An amateur's explanation of what can be done to make the HTML code 'understandable' to Netscape Navigator 4, without actually encoding in UTF-8: -Meta tag the document as UTF

RE: Surrogate support in *ML?

2000-09-07 Thread Karlsson Kent - keka
> From: Brendan Murray/DUB/Lotus [mailto:[EMAIL PROTECTED]] ... > Karlsson Kent - keka <[EMAIL PROTECTED]> wrote: > > At the level of XML the number of bits is irrelevant. > > The "high and low surrogate" code points are excluded > > from being used as NCRs. A character (not UTF-16 code > > unit

Re: Surrogate support in *ML?

2000-09-07 Thread Brendan Murray/DUB/Lotus
Mark Davis <[EMAIL PROTECTED]> wrote > In HTML or XML you always use the code point (e.g. UTF-32), not a series of > code units (UTF-8 or UTF-16). Thus you would use: > > 𐄣 > > not �� from UTF-16 Thank you - that solves the conundrum. B

Re: Unicode on a non-Unicode web page

2000-09-07 Thread Michael (michka) Kaplan
NS 4.x is simply not very good at this sort of thing. The only real solution is to use an encoding that will support the characters, such as UTF-8. michka - Original Message - From: "Gary P. Grosso" <[EMAIL PROTECTED]> To: "Unicode List" <[EMAIL PROTECTED]> Sent: Thursday, September 07,

Re: Tamil glyphs

2000-09-07 Thread Antoine Leca
[EMAIL PROTECTED] wrote: > > I am gradually developing the impression that the spelling of modern Indic > languages occasionally needs old graphies (ligatures, etc.) in quotations > from "classical" sources. I understand your point, and I certainly can understand it. However, can we consider th

[Fwd: Unicode Conversions]

2000-09-07 Thread Mark Davis
Mark Davis wrote: > > > > > Hello all, > > I have been trying to input unicode from a browser and store it in a database. >The problem is the different encodings used to represent the unicode. > > The input text is in the UTF-8 format. I have read on the Microsoft support site >that SQL Serv

Unicode on a non-Unicode web page

2000-09-07 Thread Gary P. Grosso
Hi Unicoders, I am working on software to emit HTML in the encoding and character set of the user's choice, from SGML/XML documents which can contain any Plane 1 Unicode character. The question is what to do with characters outside the selected encoding. I thought I would use the "numeric" chara
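
The approach described in this question, falling back to numeric character references for characters outside the chosen encoding, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `encode_html_text` is hypothetical, and the sketch relies on the standard `xmlcharrefreplace` error handler of `str.encode`, which writes decimal references. It does not address the browser behaviour discussed in the replies.

```python
def encode_html_text(text: str, encoding: str) -> bytes:
    """Encode text for an HTML page served in the given encoding.

    Characters the target encoding cannot represent are written as
    decimal numeric character references (&#NNNN;) that name the
    Unicode code point. Markup-significant characters such as '&'
    and '<' are NOT escaped here; that is assumed to happen earlier.
    """
    return text.encode(encoding, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # U+00E9 exists in ISO 8859-1 and is written as a single byte;
    # U+0B95 (TAMIL LETTER KA) does not and becomes &#2965;.
    print(encode_html_text("caf\u00e9 \u0b95", "iso-8859-1"))
```

A character beyond the Basic Multilingual Plane is emitted as one reference to its code point, not as a pair of surrogate references.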

Re: Tamil glyphs

2000-09-07 Thread Michael (michka) Kaplan
To answer a question someone else posed, a ZWNBSP or a ZWJ will not work here since the vowel reordering must happen, as well. They are two entirely different but entirely valid forms of the same groups of letters. I guess one could claim that the problem is with the current block description, wh

Re: Tamil glyphs

2000-09-07 Thread Michael (michka) Kaplan
From: "Rick McGowan" <[EMAIL PROTECTED]> > > However, it cannot currently be handled by Unicode. You must choose the > > proper font to display NNA AI, NNNA AI, LA AI, or LLA AI. The Monotype font > > and Latha in Windows 2000 are the way that my client got both display types. > > I suppose if you

RE: Surrogate support in *ML?

2000-09-07 Thread Brendan Murray/DUB/Lotus
Karlsson Kent - keka <[EMAIL PROTECTED]> wrote: > At the level of XML the number of bits is irrelevant. > The "high and low surrogate" code points are excluded > from being used as NCRs. A character (not UTF-16 code > units) can be referenced by NCRs. See (XML) production 66 > (CharRef) and its

Re: Surrogate support in *ML?

2000-09-07 Thread Mark Davis
In HTML or XML you always use the code point (e.g. UTF-32), not a series of code units (UTF-8 or UTF-16). Thus you would use: 𐄣 not �� from UTF-16 nor 𐄣 from UTF-8 Mark Brendan Murray/DUB/Lotus wrote: > How can one encode a surrogate character as an entity in HTML/XML? Should > it be as t
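
To make the distinction between a code point and its code units concrete, here is a small Python sketch. The helper names `ncr`, `utf16_units` and `utf8_units` are hypothetical, and U+10123 is simply an example supplementary-plane code point chosen for this illustration.

```python
def ncr(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character.

    The reference names the code point itself, regardless of how the
    character would be serialized in UTF-8 or UTF-16.
    """
    return f"&#x{ord(ch):X};"


def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units (a surrogate pair above U+FFFF)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf8_units(ch: str) -> list[int]:
    """Return the UTF-8 code units (bytes) for the character."""
    return list(ch.encode("utf-8"))


if __name__ == "__main__":
    ch = "\U00010123"
    print(ncr(ch))                                # &#x10123;
    print([f"{u:04X}" for u in utf16_units(ch)])  # ['D800', 'DD23']
    print([f"{u:02X}" for u in utf8_units(ch)])   # ['F0', '90', '84', 'A3']
```

Only the first form belongs in a character reference; the surrogate values and the UTF-8 bytes are serialization details of particular encoding forms.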

Re: Plane 14 redux (was: Same language, two locales)

2000-09-07 Thread Michael Everson
I do not have the confidence which you do in the Ethnologue's taxonomy or in its freedom from error, Peter. The 50+ document requirement for ISO 639-2 is not unreasonable. Languages should be proposed for inclusion in ISO 639 wherever appropriate. Other languages can be proposed via RFC 1766. If

RE: Tamil glyphs

2000-09-07 Thread Marco . Cimarosti
Antoine Leca wrote: > Michael (michka) Kaplan wrote: > [...] > > The Monotype font and Latha in Windows 2000 are the way > that my client got > > both display types. > > I believe this is a rather special need that your client > have: as I understand, > he wants, at the same time, some renderi

Surrogate support in *ML?

2000-09-07 Thread Brendan Murray/DUB/Lotus
How can one encode a surrogate character as an entity in HTML/XML? Should it be as two separate characters or as one 32-bit value? In other words should it be: &#xABCD;&#xEFGH; or &#xABCDEFGH; Brendan

RE: Converter for BIG5

2000-09-07 Thread Chris Pratley
One option is Word2000. Just open the text file and the UI for trying different encodings should appear. If it doesn't you can force it to appear by setting on Tools/Options/General/Confirm conversions on open. Then when you open the file, choose the type to be Encoded text. You can t

Re: Tamil glyphs

2000-09-07 Thread Antoine Leca
Michael (michka) Kaplan wrote: > [About the representation vs. encoding of Tamil .naa] > > Actually, Apurva just did explain it and since she comes from a > typography background she did explain how the whole problem can be handled > via fonts. :-) > > However, it cannot currently be handled

Re: Converter for BIG5

2000-09-07 Thread Sean O Seaghdha
On 6 Sep 2000, at 21:25, viswanathan wrote on the subject "Converter for BIG5": One possibility out of many: NJStar Communicator 2.2 comes with a Universal Code Converter - look under the nán [Big5] «n [HZ] ~{DO~} menu. It does conversions between CJK encodings & Unicode (UCS2, UTF8, UTF7). For 32-