Depends on what version of Windows you are on. Farsi is not officially
supported in all code points for cp1256. This one is supported in WinME,
Win2000, and WinXP. It maps to 0xED on cp1256 when it does map?
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
-
Gujarati does not require an IME at all; it is not a script with that huge
of a repertoire!
It does require a keyboard, though; have you looked into Windows XP for
this? It provides not only a keyboard but also an OpenType font and
collation data.
MichKa
Michael Kaplan
Trigeminal
The locale choice covers all of Unicode; the choice of 1033 just means that
the standard collation table is going to be used, with no specific
exceptions that many other languages require.
More info on collation in SQL Server can be found in the following white
paper (it discusses 7.0 as well):
From: [EMAIL PROTECTED]
Well, careful now. The language is English. You mean someone who uses the
script.
Yes, that is what I meant... I was referring to users of the script. Though
I suppose if they were going to try to tackle the original inner and outer
plates it would not be English but
From: William Overington [EMAIL PROTECTED]
Is there an official Unicode Consortium statement that states, for the
record, that the Unicode Consortium refuses to encode more ligatures and
precomposed characters please?
I think it is quite clearly stated that the ones that ARE present are
From: John H. Jenkins [EMAIL PROTECTED]
At 5:28 PM +0100 10/2/01, Michael Everson wrote:
The CSUR is maintained to support scripts of various kinds. Some of
those (Shavian, Deseret, Tengwar, Cirth) are expected to graduate
into Unicode.
And one of them already has!
And I am sure Apple
From: [EMAIL PROTECTED]
I still live in hopes that someone, John or someone else, will one
day send me a Deseret keyboard layout that is at least SLIGHTLY
standard (meaning more than one person has ever used it).
I need something I can download and read on a Windows machine.
Text or a GIF
From: Yung-Fong Tang [EMAIL PROTECTED]
Can anyone tell me where I can find an online version of the GB18030
standard (yes, I want the STANDARD itself, not someone's paper talking
about the standard)? Or could anyone tell me where to get a copy of the
standard?
You mean the original Chinese?
From: Yung-Fong Tang
Case mapping? You have no way to generate a mapping table for
case mapping without knowing the character, unless you already
define those characters as having no case or only one case.
Um, Unicode defines a behavior and even properties for unassigned code
points. If you choose not
From: Geoffrey Waigh [EMAIL PROTECTED]
It shouldn't require honest-to-goodness we-weren't-kidding
see-here's-one-defined-now characters
In many cases, it did.
for developers to slap themselves on the head
They did -- and they are slapping others around them, too.
and start developing
From: Suzanne M. Topping [EMAIL PROTECTED]
From: Michael Everson [mailto:[EMAIL PROTECTED]]
Three fonts walk into a bar. The barman, wiping a glass, shakes his
head and says to them: I'll have none of your type in here.
Gee, and I thought he was going to say:
Why the long face?
From: Ayers, Mike [EMAIL PROTECTED]
Analyze problem. Pick solution. In that order.
Wiser advice was ne'er spoken, on *this* topic at least.
I wonder if there is some way that a policy decision can be made to declare
a moratorium on the whole *My* UTF is better than *your* UTF for a while?
From: Tom Emerson [EMAIL PROTECTED]
But if I have a text string, and that string is encoded in UTF-16, and
I want to access Unicode character values, then I cannot index that
string in constant time.
To find character n I have to walk all of the 16-bit values in that
string accounting for
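A minimal sketch of the walk described in the quoted text, assuming the string is held as a list of 16-bit code units. The names code_point_at and utf16_units are illustrative only and do not come from the thread:

```python
def utf16_units(text):
    """Split a Python string into its UTF-16 code units (as integers)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def code_point_at(units, n):
    """Return the n-th code point (0-based) by walking the code units."""
    i = 0
    count = 0
    while i < len(units):
        unit = units[i]
        # A high surrogate followed by a low surrogate is one code point.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            step = 2
        else:
            cp = unit
            step = 1
        if count == n:
            return cp
        count += 1
        i += step
    raise IndexError("code point index out of range")
```

The loop is linear in the number of code units, which is the cost the poster is pointing at; an implementation that needs constant-time access would have to keep a side index or use a fixed-width form.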
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Why would UTF-16 be easier for internal processing than UTF-8?
Both are variable-length encodings.
Good straw man!
Working with UTF-16 is immensely easier than working with UTF-8. As I am
sure you know! :-)
MichKa
Michael Kaplan
From: Ayers, Mike [EMAIL PROTECTED]
From: John Cowan [mailto:[EMAIL PROTECTED]]
[EMAIL PROTECTED] scripsit:
Oops! One of two Unicode 101 mistakes I made in the
same day. Where was my brain?
Unicode Ate Your Brain, of course! (See my tutorial at
Orlando this year.)
Nah,
From: Carl W. Brown [EMAIL PROTECTED]
However, I do not understand the TSCII for Tamil. Unicode
provides the script separation that they want.
TSCII is mostly out of favor now (tamil.net being the main exception, and
that only because its webmaster hates all established standards for doing
This is the same problem that was discussed extensively for Tamil at TI2001
in Kuala Lumpur last month. Basically, it boils down to three problems:
1) Most of the people involved do not understand Unicode or how it works.
2) Most of the people involved expect natural language processing to be a
From: Marco Cimarosti [EMAIL PROTECTED]
Does renaming UTF-8S to CESU-8 fix all the issues that were
discussed on this mailing list at the beginning of last spring?
In my opinion (and the opinion of some others), no. But they do represent
the *attempt* to answer them.
Specifically:
- How
From: [EMAIL PROTECTED]
If Michka is referring to non-compliant CESU-8 parsers, I really
wouldn't care much because CESU-8 is supposed to live in its
own little private world. But if people start compromising their
UTF-8 parsers to accommodate CESU-8 adaptively, it would
be a great blow to
From: Mark Davis [EMAIL PROTECTED]
- A significant reason for CESU-8 garnering enough support was that its
introduction allows the definition of UTF-8 itself to be tightened, to
formally exclude the 3-byte surrogates both in reading and writing.
I do not see this as a valid argument at all
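For readers who want to see what "excluding the 3-byte surrogates" looks like in practice, here is a small sketch using Python's built-in strict UTF-8 decoder. This is only an illustration of a decoder that takes the tightened reading; it is not a claim about how any product discussed in the thread behaves, and is_strict_utf8 is a made-up helper name:

```python
def is_strict_utf8(data):
    """True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+10400 written as two 3-byte sequences, one per surrogate (CESU-8 style).
cesu_style = b"\xed\xa0\x81\xed\xb0\x80"
# The same code point in the four-byte UTF-8 form.
utf8_style = "\U00010400".encode("utf-8")
```

With these definitions, is_strict_utf8(cesu_style) is False and is_strict_utf8(utf8_style) is True.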
From: Carl W. Brown [EMAIL PROTECTED]
In actuality it would be difficult for IANA to deny a character set for any
official character set so the decision is actually up to the Unicode
committee.
I concur.
I don't believe that the idea of registering CESU-8 with IANA came from the
Unicode
From: John Cowan [EMAIL PROTECTED]
False.
IANA's registry is merely de facto: what they register is not in fact
encodings, but *names* of encodings. The charset name ISO646-DE is
legal as an XML encoding, but it would astonish me if any extant
XML parser supports it. (This is one of
From: Carl W. Brown [EMAIL PROTECTED]
It would seem to me that if you either have to change the UTF-8 code to
support CESU-8 or change the UTF-16 compare logic, then changing the UTF-16
logic to do code point order compares is a much more containable change with
a much lower processing
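A sketch of what a "code point order compare" on UTF-16 might look like, assuming strings are lists of 16-bit code units. It uses the well-known fix-up trick of rotating the surrogate range above the rest of the BMP before comparing; the function names are illustrative and nothing here is taken from a specific product:

```python
def utf16_units(text):
    """Split a Python string into its UTF-16 code units (as integers)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def _fixup(unit):
    """Remap a code unit so surrogates sort above U+E000..U+FFFF."""
    if unit >= 0xE000:
        return unit - 0x800      # E000..FFFF -> D800..F7FF
    if unit >= 0xD800:
        return unit + 0x2000     # D800..DFFF -> F800..FFFF
    return unit


def compare_code_point_order(a, b):
    """Return -1, 0 or 1 comparing two code-unit lists in code point order."""
    ka = [_fixup(u) for u in a]
    kb = [_fixup(u) for u in b]
    return (ka > kb) - (ka < kb)
```

A plain binary comparison puts U+10400 (D801 DC00) before U+FF5E, while the fixed-up comparison puts it after, matching what a UTF-8 or UTF-32 binary sort would give.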
Carl, Doug,
The issues you and Doug brought up were vigorously discussed. For the
decision, all I can say is that not everyone voted for it (which will be a
matter of public record once the preliminary minutes are posted).
D This section of the TR amazed me. In the Summary and
D elsewhere,
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
Thu, 13 Sep 2001 12:52:04 -0700, Asmus Freytag [EMAIL PROTECTED]
utf-8 cannot as readily be used as internal format.
It's as easy as UTF-16. Unless you want a broken implementation which
treats surrogates as pairs of characters. It's as
From: Ayers, Mike [EMAIL PROTECTED]
Not in the best mood, am I?
Well, you did forget the all important My encoding is better than your
encoding! at the end. :-)
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
Actually, you are mistaken.
The decision to encode the Angstrom sign had more to do with the fact that
it was encoded in many legacy encoding sets. There is no specific rule that
every unit sign must also be encoded. If you can use Unicode to properly
store and render what you need, then there is
More importantly, many speakers coming in later in the week are NOT YET in
San Jose -- not sure what effect this will have on things.
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
- Original Message -
From: Carl W. Brown
To: Mark Davis ; Unicode
Sent:
From: Keld Jørn Simonsen [EMAIL PROTECTED]
Real-life sorts, like MS Windows sorting or Linux sorting, actually adhere
to these Danish rules, once you have set up your machine for Danish.
And this is the *true* answer to the whole mess of attempting *multilingual*
sorts -- once the user
From: Mark Davis [EMAIL PROTECTED]
Michael, that isn't the point. There is a problem even when you stick to one
language.
That is, there are situations where two letters in a language, e.g. ch in
Slovak, are normally sorted as one. However, in some exceptional
circumstances those letters
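To make the "two letters sorted as one" case concrete, here is a rough sketch that treats ch as a single collation element placed after h. The alphabet below is a simplified assumption (unaccented letters only), and the sketch deliberately does not handle the exceptional cases the poster goes on to mention, where c and h are really two separate letters:

```python
# Simplified, assumed alphabet: basic Latin letters with "ch" placed after "h".
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def collation_key(word):
    """Build a sort key in which 'ch' counts as one letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Unknown characters sort after the whole alphabet.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key
```

Sorting ["chata", "cesta", "hora", "idea"] with this key gives cesta, hora, chata, idea, whereas a plain code point sort would put chata right after cesta.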
From: David Gallardo [EMAIL PROTECTED]
As a practical matter, you need to take the diacritics into account when
sorting, even in English where they (may or may not) have linguistic
significance, otherwise you'll get nondeterministic behaviour. In other
words, résumé and resume should fall
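One way to picture the point about determinism: compare first on the letters with diacritics stripped, then break ties on the full string, so that résumé and resume never compare as equal. This is only a sketch under that assumption, not a real collation algorithm, and strip_marks and sort_key are made-up names:

```python
import unicodedata


def strip_marks(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(text):
    """Primary key: base letters. Tie-breaker: the normalized original."""
    return (strip_marks(text), unicodedata.normalize("NFC", text))
```

With the tie-breaker in place the output order no longer depends on the input order; dropping the second element of the key would leave the relative order of résumé and resume up to whatever order they arrived in.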
The script setting will not really end up being used in this case. It is
present because it is a fundamental member of the LOGFONT structure, but it
is only used in cases where a device context will not be using Unicode and
needs an intelligent guess as to what code page to use for rendering.
/
- Original Message -
From: Tex Texin [EMAIL PROTECTED]
To: Michael (michka) Kaplan [EMAIL PROTECTED]
Cc: Unicoders [EMAIL PROTECTED]; Gary Clink [EMAIL PROTECTED]
Sent: Wednesday, September 05, 2001 12:06 PM
Subject: Re: Using Unicode fonts for plaintext display on windows 2000
Michael,
thanks
Well, my big guesses:
1) not using the right function to get the text in (using WM_SETTEXT via
SendMessageA, TextOutA, ExtTextOutA)
or
2) not creating the window via the right function (using CreateWindowExA)
Those are the only two ways that the script should affect things on Win9x.
Note that VB
From: David Starner [EMAIL PROTECTED]
Frankly, the attitude of Forget all the stuff that you have working;
just throw it all away and move to Unicode is not one that wins many
converts. Backward compatibility and the ability to interface with
other systems running different stuff is always
From: KUSANO Takayuki [EMAIL PROTECTED]
This is only a problem for people who do not want to use Unicode.
But, most people can't live without 'legacy' encodings, because
there are many documents, data in 'legacy' encodings and there are
still many applications/terminals that cannot
MSLU is documented in the Platform SDK.
BUT you are not going to get Unicode *functionality* from MSLU, from VB or
elsewhere; MSLU only gives you a wrapper layer (and it converts after that),
so the work you would do to make it callable from VB would not actually be
beneficial?
MichKa
Michael
This is not an NT issue so much as a Visual C++ CRT issue (the setlocale
function is implemented there, for what you are probably using). At present,
there is no support for this (take a look at the code if you need to know
why, it makes all kinds of assumptions like one byte per character that
This is only a problem for people who do not want to use Unicode. It is
certainly not Unicode's fault that the various [vendor-provided] versions of
standards are incomplete or that they conflict with each other.
Well, I suppose you could also blame Misha, for thinking that EUC-JP + NCRs
would
Well, clearly it's a hoax. The assimilated press has always been this way.
Kind of amusing, in its own way. But no, there is no Klingon Freedom League,
and speakers/attendees do not have to fear problems with protests in San
Jose surrounding the conference. :-)
MichKa
Michael Kaplan
Trigeminal
From: Carl W. Brown [EMAIL PROTECTED]
Microsoft now has a solution for you. You can add Unicode support to
Win95/98/Me http://www.microsoft.com/globaldev/Articles/mslu_announce.asp
Well, as wonderful as I think MSLU is (not that I am biased or anything) it
is not going to add Unicode support
From: Carl W. Brown [EMAIL PROTECTED]
I had presumed that he was able to get Unicode support on NT but not 95. I
did something like this for a VB 3.0 application by writing controls to
extend the language.
Indeed, you can get Unicode support on NT -- but that is *real* Unicode
support. The
From: Iman Saad [EMAIL PROTECTED]
I tried adding the following header in the section
of the cgi script that includes the html code, but that did not change
anything:
META HTTP-EQUIV=Content-Type CONTENT=text/html; CHARSET=UTF-8
If you look at the following link (all on one line)
From: "Adam Twardoch" [EMAIL PROTECTED]
I have just finished reading Ken Lunde's "CJKV Information Processing"
(O'Reilly, 1999). While I found much of the information contained in that
book highly helpful, I can't help the feeling that its structure might need
a slightly more systematic
Microsoft's Euro story can be seen at:
http://www.microsoft.com/europe/euro/
Specifically, the Windows info is at
http://microsoft.com/windows/euro.asp
There is no way to arbitrarily add code points to a Windows code page,
though. Either you have the patch or the newest file, or you do not. If
I have gotten roughly 100 of them, from various email addresses on my web
site.
michka
- Original Message -
From: Carl W. Brown [EMAIL PROTECTED]
To: Michael Everson [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, July 24, 2001 8:47 AM
Subject: RE: some kind of virus?
Michael,
From: jgo [EMAIL PROTECTED]
The following table defines the standard languages used by Microsoft.
This table was generated by the Unicode group for use with TrueType
and Unicode.
I don't see such a table via search from the Unicode site.
Is this just another M$ non-standard standard
From: Marc Durdin [EMAIL PROTECTED]
I must disagree with this statement. I know of quite a few changes to the
LCID list, some of which have caused me considerable pain in the past.
Any of them in winnt.h?
So, there are significant issues with Microsoft's LCIDs:
1. The tables are not
From: Marc Durdin [EMAIL PROTECTED]
I must disagree with this statement. I know of quite a few changes to the
LCID list, some of which have caused me considerable pain in the past.
Any of them in winnt.h?
Try Serbo-Croatian. Documents created with the old Cyrillic LCID
definitely would
From: Dennis L. Goyette Sr. [EMAIL PROTECTED]
Anybody have any idea of how to display Chinese characters in Windows menu
bars? All I get is parallel bars. thanks
This would mean that the font choice for menus is not one that will accept
Chinese characters and you need to change the
michka
the only book on internationalization in VB at
http://www.i18nWithVB.com/
- Original Message -
From: Michael Everson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, July 14, 2001 9:56 AM
Subject: Re: Is there Unicode mail out there?
At 09:49 -0700 2001-07-14, Mark
From: Michael Everson [EMAIL PROTECTED]
Then it's not standard and can't be relied upon. Pity.
Actually, it is a standard, as of HTML 4.0. All you need is a compliant
browser.
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
From: [EMAIL PROTECTED]
Can you read this? This is coming from Lotus Notes.
Yes, it looks like you are confused (all those question marks!)
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
If you mean under Windows, then the answer is that they return Unicode in
Unicode applications. Perhaps more details on the platform you are using?
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
- Original Message -
From: Adarsh [EMAIL PROTECTED]
To: [EMAIL
From: James Kass [EMAIL PROTECTED]
So, can Internet Explorer now display non-BMP characters ?! interrobang
Still not having any luck on Windows M.E. with Marco Cimarosti's java
charts. Any suggestions?
Thus far, I can make it work on Windows 2000 and Windows XP (with IE5.0,
5.5, and 6.0)
From: John H. Jenkins [EMAIL PROTECTED]
Has the UNIHAN.TXT file been updated to include radical-stroke data
for Plane Two characters?
Yes. Ever since Unicode 3.1 was released. (We still don't have an
Extension B font, however.)
There is one in Office XP's CHS and CHP language packs
From: Richard Cook [EMAIL PROTECTED]
This must be the Beijing Zhong Yi Electronics font ... I heard that
Microsoft was licensing it, but didn't imagine they'd release it so soon
...
The font vendor is listed as BDFX, and the copyright is for the Founder
Corporation. Further respondent sayeth
From: てんどうりゅうじ [EMAIL PROTECTED]
I mean. You take the radical of 水 (water) and add 7 strokes a certain way
to get 酒 (sake).
It was not there, alas.
Actually, you are mistaken; U+9152 does indeed represent the character you
wanted, else this (UTF-8 encoded!) message would not be able to
From: James Kass [EMAIL PROTECTED]
Perhaps he (てんどうりゅうじ) was lamenting the character's absence
in the Han Radical Index section under radical # 85.
If all the characters made from the water radical were listed
under that radical in the Han Radical Index (and so forth),
where would the
From: Michael Everson [EMAIL PROTECTED]
At 09:47 -0700 2001-07-08, Michael \(michka\) Kaplan wrote:
Perhaps a rule needs to be imposed about the amount of sake that should be
consumed before submitting a character proposal?
I've never had any trouble with beer.
Ah, but that would indicate
From: てんどうりゅうじ [EMAIL PROTECTED]
Perhaps he (縺ヲ繧薙←縺・j繧・≧縺・ was lamenting the character's
absence
in the Han Radical Index section under radical # 85.
Yes. It belongs there.
It's so sad that you do not have a UTF-8 compatible e-mail client. :-(
Come on. What ワープロばか (which probably most
Well, I cannot speak for PowerBuilder (my knowledge of it is very out of
date), but for both Netscape and MS SQL Server you may or may not be able to
support Indic scripts -- the deciding factor will be based on what version
of each product you are using.
Beyond that, I do not think that any one
From: [EMAIL PROTECTED]
What's bad is that work seems to get done on fictional scripts while there
are still millions of real people (some of whom even have access to
computers) who can't express texts of their natively-used languages with
Unicode because we don't have their scripts encoded.
From: John Cowan [EMAIL PROTECTED]
Just so, which means that the energy spent on invented scripts is nowise
taken away from the energy that could be spent on obscure-but-real
scripts.
Would that it were otherwise.
No one is arguing the FACTUAL basis for the above, but it is quite
reasonable
From: Michael Everson [EMAIL PROTECTED]
The editorial response to comments from national groups, in the
public archive of ISO 10646 stuff that you linked to at the start
of this message, included a complaint about Deseret from the German
Standards body, in that it was inappropriate for being
From: Kenneth Whistler [EMAIL PROTECTED]
I've been lurking on this discussion, but have to chime in here.
I do appreciate it, for what it's worth. The chime was very much in tune.
While fully recognizing the importance of Middle Earth to some people it is
difficult for me to get past the fact
From: Kenneth Whistler [EMAIL PROTECTED]
You can just call me a conscientious objector to having anyone who
subscribes to Vinyar Tengwar considering themselves to be among the
Númenoreans (a.k.a. the Dúnedain), who alone of all the races of Men knew
Elvish tongues. :-)
Aha! I see you
Hee hee - unless you're packing a guide to anime, you'll never find
'em anyway. らんま is Ranma, as in Ranma Saotome, and あかね is Akane, as in
Akane Tendo, the two main stars of Rumiko Takahashi's bizarre (if
monothematic) sex comedy Ranma 1/2.
Seeing this wonderful use of Unicode text in
From: Richard Cook [EMAIL PROTECTED]
now, I know of other phonemic alphabets for English ... e.g., I think
Ben Franklin invented one, ... and I have one of my own. Are any of
these slated for encoding too?
Fictional scripts have been, are, and will likely continue to be a constant
source of
From: John H. Jenkins [EMAIL PROTECTED]
FWIW, there is a small but non-zero Shavian user community, and a
number of fonts are available, some of them very pretty.
Of this I have no doubt -- but this was true of Klingon, also. g
I was expressing doubt that the majority of the community are:
From: John Cowan [EMAIL PROTECTED]
As for whether your script would be encoded, where it ends up vis-a-vis the
potential roadmap is more a side effect of who you know than anything else.
Smiley or not, someone might actually believe that, and it
isn't true. Michael Everson is more than
From: "Marcin 'Qrczak' Kowalczyk" [EMAIL PROTECTED]
It's a pity that UTF-16 doesn't encode characters up to U+F, such
that code points corresponding to lone surrogates can be encoded as
pairs of surrogates.
Unfortunately, we would then be stuck with what happens when two such
surrogate
From: [EMAIL PROTECTED]
Oh yeah, well, I can be more tongue-in-cheek than all of you. I've already
implemented it.
Doug, this is one of those things one should be ashamed of, like believing
in the April Fool's Day message about self serve encodings enough to have
put together a proposal to
From: [EMAIL PROTECTED]
I'm never ashamed of perfectly good code I've written to fulfill a humorous
requirement. I'm only ashamed of badly written code, or code that implements
a bad idea that someone else thinks is a good idea.
The latter is kind of the worry I had -- a long time ago I
From: Jianping Yang [EMAIL PROTECTED]
Carl W. Brown wrote:
If there are no surrogates in the database, is there any reason that I
cannot change the database from UTF8 to AL32UTF8?
You can change the database from UTF8 to AL32UTF8 in this case. Also you can
use Oracle database scanner to
From: [EMAIL PROTECTED]
Waiting until characters were assigned
outside the BMP to start working on the UCS-2 problem is like waiting until
2000-01-01 to start working on the Y2K problem.
It's actually a bit worse than this -- it's coming up with a solution to Y2K
problems that requires other
From: Youtie Effaight [EMAIL PROTECTED]
Well, Mister Constable. What's new about that? Looks to me
like e-Leven Digit Grrl just forgot to turn off her microphone
again... We're witnessing the spacey under-mumble of a quickly
crumbling mind. Maybe we'll get lucky and she'll burn up on
From: [EMAIL PROTECTED]
Can anyone give me a specific example of why Line Breaking or East Asian
Width properties aren't normative?
Why be more specific then there are a lot of people who think they might
possibly have made TOO MUCH normative and do not want to make things
unchangeable that
From: [EMAIL PROTECTED]
On 06/15/2001 06:29:51 PM Michael \(michka\) Kaplan wrote:
Why be more specific then there are a lot of people who think they might
possibly have made TOO MUCH normative and do not want to make things
unchangeable that might be in error or might need to change later
From: [EMAIL PROTECTED]
Out of curiosity, is there documentation on XCCS available anywhere?
Check out google.com: it will get about 120+ hits on the words XCCS
standard and several of them seem vaguely relevant. :-)
MichKa
Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/
From: Carl W. Brown [EMAIL PROTECTED]
I think that UTF-16x would be a better approach than UTF-8s. I am sure that
I have missed some issues; feel free to comment. In any case UTF-16s would
naturally be in Unicode code point order. It would be easy to transform to
UCS-2 for applications
From: Carl W. Brown [EMAIL PROTECTED]
At first I thought the same thing but I have changed my mind. There are
problems but the problems are with UTF-16 not UTF-8. I don't think that I
am the only one who thinks that UTF-8s will create more problems than it
fixes.
Worse yet they will also
From: Rick McGowan [EMAIL PROTECTED]
... asking for a lavicious license to be lecherously lazy
Parse error at lavicious. No such word appears in any English
dictionary I own, not even the OED.
Sorry, that was to be lascivious.
Glad someone is still parsing in this thread.
michka
(whoops, sent too soon!)
From: Carl W. Brown [EMAIL PROTECTED]
I am proposing that we fix UTF-16.
Are you formally proposing this? For the next UTC meeting? Without an actual
customer that is wanting it for an implementation I am pretty sure this will
be voted down pretty loudly.
michka
From: Carl W. Brown [EMAIL PROTECTED]
I am proposing that we fix UTF-16.
Are you formally proposing this? For the next UTC meeting?
michka
From: Jianping Yang [EMAIL PROTECTED]
If UTF-8S were to by some miracle be accepted by
the UTC, implementers will be put out and offended
for most of the next decade.
If it is, that is rule of law from UTC.
Very true.
devil's advocate And if they vote against it, will you do the
From: Jianping Yang [EMAIL PROTECTED]
Oracle is promoting and following the standard. Same as most other database
vendors, our database does not fully support supplementary characters in
Oracle 8i and Oracle 7. But as we see the need to support it, we extend this
support in Oracle 9i. So far,
From: Mark Davis [EMAIL PROTECTED]
UTF-8 was defined before UTF-16. At the time it was first defined, there
were no surrogates, so there was no special handling of the D800..DFFF code
points.
In other words, Oracle has an alternate solution here for 9i -- they can
simply explain that the old
From: "てんどうりゅうじ" [EMAIL PROTECTED]
A search engine regards the words "stone" and "STONE" as identical.
So why isn't いし treated the same as イシ? The difference can be
quite marked, such as レイプ versus れいぷ or such.
Well, there is nothing to stop
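As an illustration of the kind of folding the question asks about, a search layer could map katakana onto hiragana before comparing, much as it folds case. The sketch below relies on the fixed 0x60 offset between the two Unicode blocks; it is a simplified assumption and ignores half-width katakana, the prolonged sound mark, and similar details:

```python
def katakana_to_hiragana(text):
    """Fold full-width katakana (U+30A1..U+30F6) onto hiragana."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x30A1 <= cp <= 0x30F6:
            # The hiragana block sits exactly 0x60 below the katakana block.
            out.append(chr(cp - 0x60))
        else:
            out.append(ch)
    return "".join(out)


def kana_insensitive_equal(a, b):
    """Compare two strings after folding katakana to hiragana."""
    return katakana_to_hiragana(a) == katakana_to_hiragana(b)
```

Whether a given search engine should do this is a design decision, since, as the poster notes, the choice of script can carry meaning.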
We don't have Paul Clayton's e-mail address, but I assume you can forward
on, Magda?
SQL Server, ASP, and VB are all able to support UTF-16, which is a 16-bit
per code point encoding form. The term 16 bit character set is a bit
unclear in its meaning, what exactly Paul is looking for here would
From: Mark Davis [EMAIL PROTECTED]
2. Auto-detection does not particularly favor one side or the other.
UTF-8 and UTF-8s are strictly non-overlapping. If you ever encounter a
supplementary character expressed with two 3-byte values, you know you do
not have pure UTF-8. If you ever encounter
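The detection rule in the quoted point can be sketched as a byte scan: in otherwise well-formed input, the lead byte ED followed by a byte in A0..BF can only be a 3-byte encoded surrogate, so its presence means the data is not pure UTF-8. The helper name below is hypothetical, and the sketch assumes the input is otherwise well formed:

```python
def has_encoded_surrogates(data):
    """True if the bytes contain a 3-byte sequence encoding U+D800..U+DFFF."""
    return any(
        data[i] == 0xED and 0xA0 <= data[i + 1] <= 0xBF
        for i in range(len(data) - 1)
    )
```

For example, the six bytes ED A0 81 ED B0 80 (U+10400 as a surrogate pair) are flagged, while the four-byte form F0 90 90 80 and the ordinary character U+D7FF (ED 9F BF) are not.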
From: [EMAIL PROTECTED]
On 06/04/2001 02:10:35 AM Doug Ewell wrote:
While we are at it, here's another argument against the existence of both
UTF-8 and this new UTF-8s. Recently there was a discussion about the use of
the U+FEFF signature in UTF-8 files, with a fair number of Unicode
From: Misha Wolf [EMAIL PROTECTED]
Let's be careful with the word legal. The strange (per-)version of
UTF-8 which re-encodes UTF-16 is legal input as far as The Unicode
Standard is concerned. It is, however, totally illegal as far as the
IETF, the Internet, the W3C, the WWW, XML, and HTML
From: Marco Cimarosti [EMAIL PROTECTED]
No, please, let's not make waters more muddied than they already are. Let's
keep on calling Oracle's proposal UTF-8S, as there is no point in finding
a cuter name for it.
Fair enough.
Wrong point! Perhaps it will not hurt applications which read text
Simon,
Would you care to answer (officially) why exactly Oracle needs for anything
to be done here? Per the spec, it is not illegal for a process to interpret
5/6-byte supplementary characters; it is only illegal to emit them. It seems
that Oracle and everyone else is well covered with the
From: Jianping Yang [EMAIL PROTECTED]
As a matter of fact, the surrogate or supplementary
character was not defined in the past, so we could
live without Premise B in the past. But now the
supplementary character is defined and will soon be
supported, we have to bother with it.
Poor
From: G. Adam Stanislav [EMAIL PROTECTED]
At 13:11 22-05-2001 -0700, Carl W. Brown wrote:
There is no easy solution.
Yes, there is, though it is probably beyond the scope of this list.
Nevertheless, there is a very simple solution. It needs to be done
on the OS level: Create metafonts.
From: 11 digit boy [EMAIL PROTECTED]
I have worked with many terminal emulator systems that use
mono-spaced fonts. The first place you start having problems
is with script fonts like Arabic. With Indic languages you often
have to reorder characters before rendering
Um. How about having all
From: Graham Asher [EMAIL PROTECTED]
But I guess this is obvious. I just wanted to chime in with the view that a
single Unicode Font would be useful, and a whole lot better than some
people suggest.
As an implementer of rasterizers and text layout systems I can also state
that the problem