From: "Antoine Leca" <[EMAIL PROTECTED]>
> Sorry, what is "the other form"? As I see things, in Tamil Nadu the current
> use is to write NNAI exactly the same as, for example, KAI (that is, without
> the "elephant-trunk" form that TUS appears to require).
There are two forms, with and without the "
Heh heh heh... that's the one I was going to suggest for the UTF-8
conversions... :-)
I would strongly recommend you use StrConv for the non-UTF-8 ones though,
because it is roughly twice as fast (owing to the slowness of Declare
statements in VB).
Although I do have a test version that uses Matt Curland's
Title: RE: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion too
Mostly to convert files. I think I could do it with MLANG but I was hoping that there already are some tools to do it (or a VB wrapper since my C is not that good ;>).
I was looking at your web site (excellent, by th
I do not think there is one, actually.
Are you looking to convert files or strings you send via the command line?
michka
- Original Message -
From: "Mikko Lahti" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Thursday, September 07, 2000 6:31 PM
Subject: Win32: Comman
Title: Win32: Commandline/batch ANSI-UTF8-UTF16-UTF8-ANSI conversion tools
Are there any Win32 command line or batch ANSI to Unicode conversion tools out there?
Desired conversions are:
- Windows-1252 to UTF-8
- Windows-1252 to UTF-16
- UTF-8 to Windows-1252
- UTF-16 to Windows-1252
- U
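A minimal sketch of the kind of conversion asked for above, written in Python rather than VB or C. The function name `convert_file` and its parameters are hypothetical, chosen only for illustration; the encoding names (`cp1252`, `utf-8`, `utf-16`) are the standard Python codec names. It assumes the whole file fits in memory and that a failed conversion should raise an error rather than silently substitute characters.

```python
def convert_file(src_path, dst_path, src_encoding, dst_encoding):
    """Re-encode a text file from src_encoding to dst_encoding.

    Hypothetical helper for illustration only. With the default strict
    error handling, a character that the target encoding cannot
    represent (for example most non-Latin text going to cp1252)
    raises UnicodeEncodeError instead of being replaced.
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
```

For example, `convert_file("in.txt", "out.txt", "cp1252", "utf-8")` would cover the first conversion in the list. Note that Python's `utf-16` codec writes a byte order mark at the start of the output; `utf-16-le` or `utf-16-be` would omit it.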
Your question is essentially "How do I mix characters encoded in more than
one character set on a single page?"
A normal page has one document and that one document will expect characters
to be encoded in the character set specified in the meta tag in the header.
It is possible to have a compound
William Overington wrote:
> However, suppose that
> one is wishing to transcribe an eighteenth century printed book and one
> wishes to preserve the information as to when a long s was used and when a
> ligature such as "long s and t" was used. How should one encode the text
> in unicode pleas
William Overington wrote:
> Suppose that one is producing a program, such as a Java applet, to display
> pages of printed text, and one wishes to encode text that contains
> ligatures.
Unicode is a plain text encoding standard. Fonts can and should supply the
ligatures which are appropriate to
Michael Kaplan wrote:
>
> To answer a question someone else posed, a ZWNBSP or a ZWJ will not work
> here since the vowel reordering must happen, as well. They are two entirely
> different but entirely valid forms of the same groups of letters.
Agreed.
> I guess one could claim that the probl
"Gary P. Grosso" wrote:
> Netscape Communicator 4.6 doesn't.
Versions of Netscape before 4.7 had this bug: character references greater
than &#255; only worked if the transmission character set was UTF-8.
> One way to look at this is: how do I use unicode as an
> "escape" to include some isolated co
Brendan Murray/DUB/Lotus wrote:
> I know that XML explicitly excludes surrogates.
XML, like any conformant Unicode process, excludes *unpaired* surrogates.
Surrogate-pair characters may be used, though not in XML names, only
in character data. They may appear as themselves or using character
re
Having had some experience with handset metal type, years ago, I remember
that many founts had ligatured characters. Most founts had fi ff fl ffi ffl
each provided as one piece of type. Some founts had ct and st ligatures as
well, with a thin ornamental line connecting the top of the c or s and
"Gary P. Grosso" wrote:
>
> Hi Unicoders,
>
>
> Is there some way I can nudge Netscape's browser to display these?
An amateur's explanation of what can be done to make the HTML code
'understandable' to Netscape Navigator 4, without actually encoding in
UTF-8:
-Meta tag the document as UTF
> From: Brendan Murray/DUB/Lotus [mailto:[EMAIL PROTECTED]]
...
> Karlsson Kent - keka <[EMAIL PROTECTED]> wrote:
> > At the level of XML the number of bits is irrelevant.
> > The "high and low surrogate" code points are excluded
> > from being used as NCRs. A character (not UTF-16 code
> > unit
Mark Davis <[EMAIL PROTECTED]> wrote
> In HTML or XML you always use the code point (e.g. UTF-32), not a series of
> code units (UTF-8 or UTF-16). Thus you would use:
>
> &#x10123;
>
> not &#xD800;&#xDD23; from UTF-16
Thank you - that solves the conundrum.
B=
NS 4.x is simply not very good at this sort of thing. The only real solution
is to use an encoding that will support the characters, such as UTF-8.
michka
- Original Message -
From: "Gary P. Grosso" <[EMAIL PROTECTED]>
To: "Unicode List" <[EMAIL PROTECTED]>
Sent: Thursday, September 07,
[EMAIL PROTECTED] wrote:
>
> I am gradually developing the impression that the spelling of modern Indic
> languages occasionally needs old graphies (ligatures, etc.) in quotations
> from "classical" sources.
I understand your point, and I certainly can understand it.
However, can we consider th
Mark Davis wrote:
> >
>
> > Hello all,
> > I have been trying to input Unicode from a browser and store it in a database.
> > The problem is the different encodings used to represent the Unicode.
> > The input text is in the UTF-8 format. I have read on the Microsoft support site
> > that SQL Serv
Hi Unicoders,
I am working on software to emit HTML in the encoding
and character set of the user's choice, from SGML/XML
documents which can contain any Plane 1 Unicode character.
The question is what to do with characters outside the
selected encoding. I thought I would use the "numeric"
chara
To answer a question someone else posed, a ZWNBSP or a ZWJ will not work
here since the vowel reordering must happen, as well. They are two entirely
different but entirely valid forms of the same groups of letters.
I guess one could claim that the problem is with the current block
description, wh
From: "Rick McGowan" <[EMAIL PROTECTED]>
> > However, it cannot currently be handled by Unicode. You must choose the
> > proper font to display NNA AI, NNNA AI, LA AI, or LLA AI. The Monotype font
> > and Latha in Windows 2000 are the way that my client got both display types.
>
> I suppose if you
Karlsson Kent - keka <[EMAIL PROTECTED]> wrote:
> At the level of XML the number of bits is irrelevant.
> The "high and low surrogate" code points are excluded
> from being used as NCRs. A character (not UTF-16 code
> units) can be referenced by NCRs. See (XML) production 66
> (CharRef) and its
In HTML or XML you always use the code point (e.g. UTF-32), not a series of
code units (UTF-8 or UTF-16). Thus you would use:
&#x10123;
not &#xD800;&#xDD23; from UTF-16
nor &#xF0;&#x90;&#x84;&#xA3; from UTF-8
Mark
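The difference between a code point and its code units can be checked mechanically. The following is a small Python sketch, not part of the original message; the helper names `to_ncr` and `utf16_code_units` are hypothetical. It uses U+10123, a character outside the Basic Multilingual Plane, and shows the single numeric character reference built from the code point alongside the UTF-16 and UTF-8 code units that should not be used as references.

```python
def to_ncr(ch):
    """Return the hexadecimal numeric character reference for one code point."""
    return "&#x{:X};".format(ord(ch))


def utf16_code_units(ch):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


ch = "\U00010123"                     # one supplementary-plane character
ncr = to_ncr(ch)                      # "&#x10123;" (the form to use)
units16 = utf16_code_units(ch)        # [0xD800, 0xDD23], a surrogate pair
units8 = list(ch.encode("utf-8"))     # [0xF0, 0x90, 0x84, 0xA3]
```

Only `ncr` is a valid way to refer to the character in HTML or XML; the values in `units16` and `units8` are encoding-level details and each would be read as a separate (and wrong) character if written as its own reference.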
Brendan Murray/DUB/Lotus wrote:
> How can one encode a surrogate character as an entity in HTML/XML? Should
> it be as t
I do not have the confidence which you do in the Ethnologue's taxonomy or
in its freedom from error, Peter. The 50+ document requirement for ISO
639-2 is not unreasonable. Languages should be proposed for inclusion in
ISO 639 wherever appropriate. Other languages can be proposed via RFC 1766.
If
Antoine Leca wrote:
> Michael (michka) Kaplan wrote:
> [...]
> > The Monotype font and Latha in Windows 2000 are the way that my client got
> > both display types.
>
> I believe this is a rather special need that your client has: as I understand,
> he wants, at the same time, some renderi
How can one encode a surrogate character as an entity in HTML/XML? Should
it be as two separate characters or as one 32-bit value? In other words
should it be:
&#xABCD;&#xEFGH;
or
&#xABCDEFGH;
Brendan
One option is Word2000. Just open the text file and
the UI for trying different encodings should appear. If it doesn't, you can
force it to appear by turning on Tools/Options/General/Confirm conversions on
open. Then when you open the file, choose the type to be Encoded text.
You can t
Michael (michka) Kaplan wrote:
>
[About the representation vs. encoding of Tamil .naa]
>
> Actually, Apurva just did explain it and since she comes from a
> typography background she did explain how the whole problem can be handled
> via fonts. :-)
>
> However, it cannot currently be handled
On 6 Sep 2000, at 21:25, viswanathan wrote
on the subject "Converter for BIG5":
One possibility out of many:
NJStar Communicator 2.2 comes with a Universal Code Converter - look under the nán
[Big5] «n [HZ] ~{DO~} menu. It does conversions between CJK encodings & Unicode
(UCS2, UTF8, UTF7). For 32-
29 matches