see the glyphs come out correctly.
As not even *reordering* is done, I guess that my Uniscribe DLL does not
support these scripts. Are they implemented in newer versions of Uniscribe?
If yes, where can I get it?
Thanks in advance for any help.
--
Marco Cimarosti
Hallo everybody! I received this in the mail, and I thought it could be of
interest for the Unicode mailing list:
Aragonese - Lo geno d'as normas ye que aiga tantas entre ras que se puede
eslexir.
Asturian - Lo bono de les normes ye qu'hai munches onde escoyer.
Basque - Arauen alderik onena da
Mike Ayers wrote:
Side 1 (print and cut out):
++---+---+--+
| U+ | yy zz |Cima's UTF-8 Magic | Hex= |
| U+007F | ! ! |Pocket Encoder | B-4 |
| YZ | . . | | |
Rick McGowan wrote:
I mistakenly thought Tifinagh was rtl.
That's OK. It has been, and sometimes still is, written right
to left, hence it was roadmapped in a right-to-left
allocation block. However, in modern usage, and in the
Moroccan national standard now being drafted, it
is
[\]{}
Anto'nio Martins-Tuva'lkin wrote:
On 2004.06.22, 16:20, Marco Cimarosti wrote:
You can also compose them with the normal letter followed by
character MODIFIER LETTER MACRON (code 02C9, decimal 713).
Oops! You mean U+0304 : COMBINING MACRON (decimal: 772).
Yes, right, sorry.
(Hey
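A minimal sketch of the difference between the two macron characters named above, assuming Python's standard unicodedata module. The variable names are illustrative only. Under NFC normalization, only the combining macron (U+0304) composes with the preceding letter; the modifier letter (U+02C9) stays a separate spacing character.

```python
import unicodedata

base = "a"
combining_macron = "\u0304"  # COMBINING MACRON (decimal 772)
modifier_macron = "\u02C9"   # MODIFIER LETTER MACRON (decimal 713)

# NFC composes a base letter with a following combining mark when a
# precomposed character exists; here that is U+0101.
composed = unicodedata.normalize("NFC", base + combining_macron)

# The modifier letter is a spacing character, so nothing is composed.
not_composed = unicodedata.normalize("NFC", base + modifier_macron)

print(len(composed), len(not_composed))
```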
Joe Speroni wrote:
I apologize for a simple question, but after a few hours of
research I don't seem to be able to find the characters needed.
Funny: I see them in my Windows Character Map utility at the first hit on
Page Down key...
I'm trying to scan a Latin text that uses a bar over the
Antoine Leca wrote:
The virus cannot have any knowledge of a language code. And
much less of the language used by its next victim...
It sends e-mails to addresses stolen from the previous victim's address
list, so it can analyze the top-level domain of these addresses (.it,
.fr, etc.).
It seems that even the virus industry is getting global!
F-Secure Virus Descriptions : NetSky.X
[...]
Netsky.X sends messages in several different languages: English, Swedish,
Finnish, Polish, Norwegian, Portuguese, Italian, French, German and possibly
the language of some small island called
Peter Kirk wrote:
mutlu etmek okumak belgili tanimlik belge.
...
This is Turkish, of a sort. The virus writers have presumably
confused .tc and .tk, as this Turkish is the first body listed
and .tc is the first domain listed.
Yes, and the translation was probably done translating word by
Gary P. Grosso wrote:
Judging by what we saw in the back of the Unicode 2.0 book,
we would tend to say that it is correct that (in an index)
21333 (0x5355) is sorting under 21313 (0x5341) instead of
20843 (0x516b). I am looking for some table of radicals
that I can show our customer to help
Rick McGowan wrote:
Unicode 4.0.1 has been released! [...]
The main new features in Unicode 4.0.1 are the following:
[...]
3. Unicode Character Database:
[...]
* Changed: general category of U+200B ZERO WIDTH SPACE
* Changed: bidi class of several characters
(If I am asking a
Michael Everson wrote:
What organization uses the ANARCHY SYMBOL? ;-)
The anarchist movement. Why are you winking?
Ciao.
Marco
Kenneth Whistler wrote:
Why is an Anarchist asking to standardize something?
Why not!? Can you elaborate on this? Myself, I am an anarchist sympathizer,
and I have been deeply interested in a character encoding standard for
nearly ten years now...
Anarchism is against imposing forms of
Jon Wilson wrote:
I disagree that the anarchy symbol is not a character used in the
representation of words. I can write a word beginning with A with
either a simple LATIN CAPITAL LETTER A, or with an Anarchy symbol, or
with an existing CIRCLED LATIN CAPITAL LETTER A.
You can also write an
Peter Kirk wrote:
Come to think of it, a not very large group of them with a
bit of money behind them could buy enough votes to outvote
the corporations and destroy Unicode -
Yes, right, interesting possibility! Not that much money either: a single
punk rock concert would probably raise
Curtis Clark wrote:
Are there any languages that use letters with diacriticals,
but *never* use the base letter without diacriticals?
AFAIK, Thaana is such a case.
Unlike Indic scripts, Thaana has no inherent vowel, so each consonant letter
always takes either a vowel mark or the sukuun (=
John Jenkins wrote:
Anybody understand what he means by there is unicode gamma of
characters but it is not complete?
I guess unicode gamma of characters is Italinglish for Unicode character
set.
(Italian gamma means repertoire, range, scale, set.)
_ Marco
Jon Hanna wrote:
I refuse to rename my UTF-81920!
Doug, Shlomi, there's a new one out there!
Jon, would you mind describing it?
_ Marco
Peter Kirk wrote:
This one also looks dangerous.
What do you mean by dangerous? This is an heuristic algorithm, so it is
only supposed to work always but only in some lucky cases.
If lucky cases average to, say, 20% or less then it is a bad and useless
algorithm; if they average to, say, 80% or
Jon Hanna wrote:
False positives can be caused by the use of U+ (which is
most often encoded as 0x00) which some applications do use
in text files.
I have never seen such a thing, can you give an example?
I can't imagine any use for a NULL in a file apart from terminating records or
strings
Peter Kirk wrote:
What do you mean by dangerous? This is an heuristic
algorithm, so it is only supposed to work always [...]
(I meant: it is not supposed to work always)
I would not consider an 80% algorithm to be very good -
depending on the circumstances etc. But if for example 20% of
my
Christopher Cullen wrote:
(2) The Unicode home page says: The Unicode Standard defines
codes for characters used in all the major languages [...]
mathematical symbols, technical symbols, [...].
I suggest that in an enterprise so universal and
cross-cultural as Unicode, the definition of what
Doug Ewell wrote:
In UTF-16 practically any sequence of bytes is valid, and since you
can't assume you know the language, you can't employ distribution
statistics. Twelve years ago, when most text was not Unicode and all
Unicode text was UTF-16, Microsoft documentation suggested a heuristic
Anto'nio Martins-Tuva'lkin wrote:
|O OoOO |
|O oOOO |
| OOo O O|
|OO oOO |
| O o OO|
|OO o|
|O Oo OO |
|O o|
| OOo OOO|
|O OoO |
| OOo OOO|
|O o OOO|
«\N5l#oVO7X7G»?
_ Marco
Hallvard B Furuseth wrote:
I need a function which converts Latin Unicode characters to
the closest equivalent ASCII characters, e.g. é -> e.
Before I reinvent the wheel, does any public domain or GPL
code for this already exist?
I don't know, sorry.
If not,
for the most part I expect I
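One possible approach to the conversion asked about above, offered as a sketch and not as the code the posters had in mind: decompose with NFKD, drop the combining marks, and discard whatever is still non-ASCII. The function name fold_to_ascii is hypothetical. Letters that have no decomposition (for example ø or ß) are simply dropped by this sketch, so a real converter would need an extra mapping table for them.

```python
import unicodedata

def fold_to_ascii(text: str) -> str:
    """Approximate Latin text in ASCII by stripping diacritics."""
    # NFKD splits "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (non-zero canonical combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII has no decomposition; it is discarded here.
    return stripped.encode("ascii", "ignore").decode("ascii")

print(fold_to_ascii("K\u00f6ln"))
```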
John Cowan wrote:
In the New York City subway system (of underground trains, that is,
not underground pedestrian tunnels!), this letter has been
consistently avoided since 1967, when the system of distinguishing trains
by letter or number was instituted. The only other letters never used are
Doug Ewell wrote:
I'll go farther than that. It's always bothered me that speakers of
European languages, including English but especially French, have seen
fit to rename the cities and internal subdivisions of other countries.
Rightly said!
There is reason to rename Colonia to Köln, Augusta
Michael Everson wrote:
At 11:04 +0100 2003-12-17, Marco Cimarosti wrote:
There is reason to rename Colonia to Köln, Augusta to
Augsburg,
Eboraco to York, Provincia to Provence, and so on.
Nicely said. Subtle irony tends to go over some
people's heads on this list though.
Especially
Philippe Verdy wrote:
#code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?;
# RIAL SIGN
fdfc;;;isolated 0631 06cc 0627 0644; # ??; ?; ?;
The Arial Unicode MS font does not have a glyph for the
Rial currency sign so I won't comment lots about it, even if
it's a special ligature of
Doug Ewell wrote:
This seems very misguided, if true. Alphabetical primacy can
hardly be considered an effective measure of the relative
power or importance of a nation.
[...]
Remember that in the time frame in question, the late '30s and early
'40s, three of the major world powers were
Tim Greenwood wrote:
In my interpretation of the C standard (which I am reading from
http://std.dkuug.dk/JTC1/SC22/WG14/www/docs/n843.pdf) UTF-8 is not a
valid wchar_t encoding if your execution character set contains
characters outside the C0 controls and Basic Latin range, and
UTF-16 is
Hmm. Now here's some C++ source code (syntax colored as
Philippe suggests, to imply that the text editor understands
C++ at least well enough to color it)
int n = wcslen(L"café");
(That's int n = wcslen(L"café"); for those without HTML email)
The L prefix on a string literal makes it a
I (Marco Cimarosti) wrote:
So, should n equal four or five?
Why not six?
^^^
Errata: seven.
If, in our C(++) compiler, type wchar_t is an alias for
char, and wide character strings are encoded in UTF-8,
and the é is decomposed, then n will be equal to 6
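The counts discussed in this thread can be checked with a short sketch. It uses Python, not C, purely for illustration, and it assumes that the length means either the number of code points or the number of UTF-8 bytes. The dictionary keys are descriptive labels, not standard terms.

```python
import unicodedata

word = "caf\u00e9"  # "café" written with the precomposed é (U+00E9)

nfc = unicodedata.normalize("NFC", word)  # c a f é
nfd = unicodedata.normalize("NFD", word)  # c a f e + U+0301

lengths = {
    "NFC code points": len(nfc),
    "NFD code points": len(nfd),
    "NFC UTF-8 bytes": len(nfc.encode("utf-8")),
    "NFD UTF-8 bytes": len(nfd.encode("utf-8")),
}

for label, value in lengths.items():
    print(label, value)
```

With a wchar_t that is an alias for char and UTF-8 wide strings, the byte counts are the relevant ones: 5 for the precomposed form and 6 for the decomposed form.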
Peter Kirk wrote:
So, should n equal four or five? The answer would appear to
depend on whether or not the source file was saved in NFC
or NFD format.
No, surely not. If the wcslen() function is fully Unicode
conformant, it should give the same output whatever the
canonically
[...]
some greedy investors turned it into a scam just for a quick buck (for
surely it will be quick!)
Sorry, I had to get that off my chest. Hopefully someone
with some pull in Ireland will read this and do something
about it :-)
Or simply flush Guinne$$ and drink Murphix. :-)
Ciao.
I was wondering: what exactly does GB-18030 certification consist of?
I guess that some tests are done on the software, but what exactly? Also, where
and who performs this certification? Does the Chinese government do it
directly, or is it out-sourced to external agencies? Does this have to be in
Pim Blokland wrote:
Not only that, but the process making the mistake of thinking it is
UTF-8 also makes the mistake of not generating an error for
encountering malformed byte sequences,
BTW, this process has a name: Internet Explorer.
AND of outputting the result as two 16-bit numbers
Peter Jacobi wrote:
IMHO this doesn't fit well actual Tamil use and raises a lot
of practical problems.
Either there must be an accepted list of these ligatures (but lists of
archaic usage tend to grow), or one is bound to put a preemptive ZWNJ
after every SHA VIRAMA in modern use, to
Peter Constable wrote:
Alternatives given were
(0BB8)(0BCD)(0BB1)(0BC0)
(0BB6)(0BCD)(0BB1)(0BC0) (if and when U+0BB6 becomes Unicode)
(0B9A)(0BBF)(0BB1)(0BC0)
Alternatives to what? The first and third sequence would have distinct
appearances (see attached file), and would constitute
the Unicode Consortium official,
stating whether or not I am allowed to re-distribute the above described
files in a commercial application?
Thank you in advance. Regards.
Marco Cimarosti
(S3, Italy, http://www.essetre.it)
Mark Davis wrote:
Marco, I certainly wouldn't draw that conclusion. This is not
the appropriate forum for a political or ethical discussion,
Of course. I just noticed that those numbers reflect a sad fact of life:
that rich people get more than poor people. As this fact is so obvious to
Mark Davis wrote:
BTW, some time ago I had generated a pie chart of world GDP
divided up by language.
Those quotients are immoral.
Of course, this immorality is not the fault of he who did the calculation:
the immorality is out there, and those infamous numbers are just an
arithmetical
Jill Ramonsky wrote:
[...] I've even invented (and used) some 8-bit encodings which
leave the whole of Latin-1 unchanged (apart from the C1s) and use C1
characters a bit like surrogate pairs to reach the rest.
Doug, are you listening? It seems there's a new clone of UTF:-)Z waiting for
Peter Kirk wrote:
Are we talking about a real non-Latin script, some kind of
syllabary or logographic script, for Swahili and other
Bantu languages? [...]
Or did someone not notice that Marco's comments were about
the word joke?
Indeed.
In the last few months, I have been relatively
Philippe Verdy wrote:
As Africa has been influenced by many foreign invasions,
there may in fact exist other scripts to represent this
language [...]
Yes: until a recent past, Swahili was also commonly written in the Arabic
alphabet.
_ Marco
Chris Jacobs wrote:
[...]
Nevertheless I think if Unicode don't want to decide how the
PUA is to be interpreted
Please take notice of this interpreted: I'll come back to this soon.
it should be at the very least provide a mechanism by which
an user of the PUA can specify which
John Cowan wrote:
You persist in misunderstanding. Suppose I came along and told you
I wanted to create a Unicode codepoint for each word in every language
on Earth. Would you blithely allocate me a 24-billion-codepoint
private space?
Why? 200 millions should be more than enough: that's
Jill Ramonsky wrote:
In my experience, there is a performance hit.
I had to write an API for my employer last year to handle
some aspects of Unicode. We normalised everything to NFD,
not NFC (but that's easier, not harder). Nonetheless, all
the string handling routines were not allowed to
Gautam Sengupta wrote:
--- Marco Cimarosti wrote:
OK but, then, your ZWJ becomes exactly what
Unicode's VIRAMA has always
been: [...]
You are absolutely right. I am suggesting that the
language-specific viramas be retained as
script-specific *explicit* viramas that never
disappear
Gautam Sengupta wrote:
Is there any reason (apart from trying to be
ISCII-conformant) why the Bangla word /ki/ what
cannot be encoded as [KA][ZWJ][I]? Do we really need
combining forms of vowels to encode Indian scripts?
Perhaps you are right that it *would* have been a cleaner design to have
Peter Kirk wrote:
I don't understand the specific issues here... But it does
seem a rather strange design principle that we should
expect a text to be displayed meaningfully even when the font
lacks the glyphs required for proper display.
The fact is that these glyphs are not necessarily
Gautam Sengupta wrote:
I am no programmer, but surely the rendering engine
could be tweaked to display a halant/hashant in the
aforementioned situations? I understand that it won't
happen *automatically* if we were to use ZWJ instead
of VIRAMA. But if you were to take the trouble to do
the
Doug Ewell wrote:
[...]
we'd all use UTF-336. Er?
If only I had a bit more spare time, Jill. You do NOT want to get me
started... :-)
Go for it, Doug! :-)
If I only had a bit of spare time myself, I'd be eager to run
bits-per-character statistics for UTF:-)336 in various
Jony Rosenne wrote:
I don't remember whether Hebrew Braille is written RTL or LTR.
Braille is always LTR, even for Hebrew and Arabic.
To be more precise, Braille is always LTR when you read it, but RTL when you
write it manually (because it is engraved on the back side of the paper,
using a
I (Marco Cimarosti) wrote:
Jony Rosenne wrote:
I don't remember whether Hebrew Braille is written RTL or LTR.
Braille is always LTR, even for Hebrew and Arabic.
Hwæt! I noticed only now that the Bidirectional Category of braille
characters is ON - Other neutrals!
AFAIK, that is completely
Jill Ramonsky wrote:
Hey - the public will just have to get used to it!
No, the public should not be bored with these technical details: in the user
manual, a book will still be a book. The fact that, in the source code
of the application, "book" means something else is of interest only to
Peter Kirk wrote:
For i% = 1 to Len(utf8string$)
c$ = Mid(utf8string$, i%, 1)
Process c$
Next i%
Such a loop would be more efficient in UTF-32 of course, but this is
still a real need for working with character counts.
If the string type and function of this Basic dialect is not
Elliotte Rusty Harold wrote:
A W3C XML Schema Language validator needs a character based API to
correctly implement the minLength and maxLength facets on xsd:string
As far as I understand, xsd:string is a list of Character-s, and a
Character is an integer which can hold any valid Unicode code
Doug Ewell wrote:
Depends on what processing you are talking about. Just to cite the
most obvious case, passing a non-ASCII, UTF-8 string to byte-oriented
strlen() will fail dramatically.
Why? The purpose of strlen() is counting the number of *bytes* needed to
store a certain string, and this
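To illustrate the point that strlen() counts bytes, here is a small sketch that mimics its behaviour in Python. The helper c_strlen is hypothetical and only imitates the C function by scanning for the first NUL byte. On a UTF-8 string it returns a perfectly usable storage size, which is simply not the same number as the character count.

```python
def c_strlen(buffer: bytes) -> int:
    """Mimic C strlen(): the number of bytes before the first NUL byte."""
    end = buffer.find(b"\x00")
    return len(buffer) if end == -1 else end

text = "na\u00efve"                       # "naïve": 5 characters
utf8 = text.encode("utf-8") + b"\x00"     # NUL-terminated, as in C

byte_length = c_strlen(utf8)              # bytes needed to store the string
char_length = len(text)                   # code points in the string

print(byte_length, char_length)
```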
Theodore H. Smith wrote:
Hi lists,
Hi, member.
I'm wondering how people tend to do their non-ascii string processing.
I think no one has been doing ASCII string processing for decades. :-) But I
guess you meant non-SBCS (single byte character set) string processing.
[...]
So, I'm
Stephane Bortzmeyer wrote:
On Mon, Oct 06, 2003 at 12:09:34PM +0200,
Marco Cimarosti [EMAIL PROTECTED] wrote
a message of 14 lines which said:
What strlen() cannot do is counting the number of
*characters* in a string.
But who cares? I can imagine very few situations where
someone
Stephane Bortzmeyer wrote:
OK. But the length in characters of a string is not
character semantics:
it's plain nonsense, IMHO.
I disagree.
Feel free.
But I still don't see any use in knowing how many characters are in an UTF-8
string, apart the use that I already mentioned: allocating a
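The buffer-allocation use mentioned in this thread can be sketched as follows, assuming well-formed UTF-8 input. Every code point contributes exactly one byte that is not a continuation byte (10xxxxxx), so counting those bytes gives the number of code points, and four times that number is the size of a UTF-32 buffer without a BOM. The function name count_code_points is illustrative.

```python
def count_code_points(utf8: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation bytes."""
    return sum(1 for byte in utf8 if (byte & 0xC0) != 0x80)

# Pound sign, space, euro sign, space, U+1D52C: 2 + 1 + 3 + 1 + 4 bytes.
data = "\u00a3 \u20ac \U0001D52C".encode("utf-8")

code_points = count_code_points(data)
utf32_buffer_size = 4 * code_points  # bytes needed for UTF-32 without a BOM

print(len(data), code_points, utf32_buffer_size)
```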
Edward H. Trager wrote:
But I still don't see any use in knowing how many
characters are in an UTF-8
string, apart the use that I already mentioned: allocating
a buffer for a
UTF-8 to UTF-32 conversion.
Well, I know a good use for it: a console or terminal-based
application which
This (Peter's) answer is, in my understanding, the nearest to the
truth.
He made the same assumption I did: you declared that your file was UTF-8 but
actually it wasn't. :-)
Here is the problem:
How do I make my keyboard which only produces 8-bit [...]
The keyboard has nothing to do with
[EMAIL PROTECTED] wrote (through Magda Danish):
[...]
Our problem is the representation of the £ sign (British
pound sign - U+00A3). When we type this character into our
pages and then set the character encoding in our pages to
Unicode (UTF-8) (either by setting it directly in the HTTP
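A sketch of one common cause of pound-sign trouble, which may or may not be the cause in this particular report: the bytes on disk and the declared encoding disagree. U+00A3 is the single byte A3 in Latin-1 but the two bytes C2 A3 in UTF-8, so reading one as the other either shows an extra character or fails outright.

```python
pound = "\u00a3"  # POUND SIGN

as_utf8 = pound.encode("utf-8")      # two bytes: C2 A3
as_latin1 = pound.encode("latin-1")  # one byte: A3

# UTF-8 bytes read as Latin-1 show an extra "Â" before the pound sign.
mojibake = as_utf8.decode("latin-1")

# A lone A3 byte is a continuation byte, so it is not valid UTF-8.
try:
    as_latin1.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(as_utf8.hex(), as_latin1.hex(), mojibake, latin1_is_valid_utf8)
```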
[EMAIL PROTECTED] wrote:
In a plain text environment, there is often a need to encode more than
just the plain character. A console, or terminal emulator, is such an
environment. Therefore I propose the following as a technical report
for internal encoding of unicode characters; with one
Michael Everson wrote:
At 08:33 -0700 2003-09-25, John Hudson wrote:
Unicode is an encoding standard for text on computers that allows
documents in any script and language to be entered, stored, edited
and exchanged.
blank stare from layman
Unicode is a code in which every letter of
Doug Ewell wrote:
[...]
(BTW, pet peeve: The word acronym should only be used to mean a
pronounceable WORD (nym) formed from the initials of other words.
Classic examples are scuba and radar. If you can figure
out how to pronounce qbcs, more power to you, but to me it's just
an
It seems that the IT world has a new acronym: QBCS. I understand that it
stands for quadra-byte character set, and I heard it used to refer to GB
13030.
My question is: is it just a fancy synonym for GB 13030 or can it also refer to
Unicode or other encodings?
Thanks in advance.
_ Marco
+ comma + space + identifier
bar).
Regards.
Marco Cimarosti ([EMAIL PROTECTED])
Feedback on UTR#31 (draft 1): Non-Latin Punctuation.
I suggest that a small set of non-Latin punctuation marks be added in class
Pattern_Syntax. Each one of the punctuation marks that I am suggesting to include
Dear Unicoders,
Does any company offer training on ICU programming? I am more interested in
courses located in Europe, but I'd also be glad to know about courses in
North America or elsewhere.
If you feel that this information is not appropriate for the public list,
please feel free to reply
Peter Kirk wrote:
Similarly, Hebrew geresh and gershayim look like quotation
marks and are used interchangeably in legacy encodings,
the same with maqaf and hyphen
- maqaf is very much the cultural equivalent of hyphen, and I
have seen recent discussion about whether the hyphen key on a
Peter Kirk wrote:
Well, the situation with Hebrew sof pasuq is almost identical to that
for Greek and Arabic question marks, except that it is functionally a
full stop not a question mark, so I can't see any reason other than
prejudice for omitting it from the list.
Well, I had a much
Peter Kirk wrote:
But the other way round is less of a problem. So I am suggesting that
for now we define all punctuation characters except for those with
specifically defined operator functions, also all undefined
characters, as giving a syntax error. This makes it possible
to define
Rick McGowan wrote:
the process as possible so that it can be considered
The draft is found at http://www.unicode.org/reports/tr31/
and feedback can be submitted as described there.
(Before submitting official feedback, I'd like to discuss my comments here.
BTW, which Type of Message should I
Jill Ramonsky wrote:
Damn. I guess you guys are all going to hate me for asking
this, but ...
what exactly is a mathematical space?
A compatibility space character used only in typesetting mathematics:
205F;MEDIUM MATHEMATICAL SPACE;Zs;0;WS;<compat> 0020;;;;N;;;;;
PS. I'm going to
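The properties quoted for U+205F can also be looked up programmatically. This sketch assumes Python's unicodedata module, whose data comes from whichever Unicode version the interpreter ships with; the dictionary keys are just labels.

```python
import unicodedata

ch = "\u205f"

properties = {
    "name": unicodedata.name(ch),
    "general category": unicodedata.category(ch),
    "bidi class": unicodedata.bidirectional(ch),
    "decomposition": unicodedata.decomposition(ch),
}

for label, value in properties.items():
    print(label, "=", value)
```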
Mark Davis wrote:
Technical Report issues would be fine.
I think #1 is worth considering. For #2, see other message to
Peter Kirk.
I agree with your statement: The purpose of the Pattern Syntax characters
is *not* to list everything that is a symbol or punctuation mark. But that
is what
Peter Kirk wrote:
[...] I guess English legs tended to be longer than Roman
ones.
Well, if by English you mean those Germanic barbarians who invaded
Britannia, I guess that the British mile existed way before they set their
feet on the island...
_ Marco
Doug Ewell wrote:
Shouldn't a pint of beer be administratively fixed at 500
mL, just as a fifth of liquor in America is now
officially 750 mL? Seems like a good task for an ISO
working group.
You could generalize it a bit: Alignment Of Metric And Imperial Units Whose
Difference Is So Small
Pim Blokland wrote:
It must be a really urgent need if one cares about those 3.28
metres...
4.28 actually.
Ooops.
But are you serious about lengthening the yard to be the same size
as the meter?
I was just joking...
Ha! Fat chance! You might as well suggest we abolish the yard
Anto'nio Martins-Tuva'lkin wrote:
On 2003.08.06, 11:12, Philippe Verdy [EMAIL PROTECTED] wrote:
the placement of the currency unit symbol or multiple is language
dependent, and the same local practices are used with the
euro, as the
one used for pre-euro currencies.
You mean that
Philippe Verdy wrote:
Excessive cross-posting to multiple newsgroups, forums and list
servers is considered bulk (and also opposed to the netiquette). As
this message is targeting a too large audience and out of topic, and
is also a commercial ad, I can say that bulk+unsolicited makes it
Rob Mount
Q1: Can a character be both alphabetic and diacritic?
I would say yes. My understanding of the Lm general category is: a
diacritic letter.
Q2: Is there a definitive answer as to whether this is an alphabetic
character?
Strictly speaking, as katakana and hiragana are not alphabets,
Chris Jacobs wrote:
Depends on how much text you need.
If it is just a few words then getting an unipad from
http://www.unipad.org/ would be enough.
You can copy and paste the chars from it.
If this is not enough, then have a look at
http://www.tavultesoft.com/keyman/
BTW, Unipad also
Philippe Verdy wrote:
However the interesting part of your question for discussion
in this list is:
- Which Unicode character should be used to encode the
spacing ring? (may conflict with the degree sign, or a
upscript small letter O)
- Should you use a Greek Gamma or a Latin Gamma, and a
[OOOPS! This works better if I set the proper MIME encoding... Sorry]
Philippe Verdy wrote:
This contrasts a lot with the Unicode codepoints assigned to
abstract characters, that are processable out of any
contextual stylesheet, font or markup system, where its only
semantic is in that
Philippe Verdy wrote:
This contrasts a lot with the Unicode codepoints assigned to
abstract characters, that are processable out of any
contextual stylesheet, font or markup system, where its only
semantic is in that case private use with no linguistic
semantic and no abstract character
Brian Doyle wrote:
on 5/29/03 9:15 AM, Marion Gunn at [EMAIL PROTECTED] wrote:
When a reference to using embryonic ISO 639-3 to
'legitimize' SIL's flawed
Ethnologue is let pass with no comment
Why is Ethnologue flawed?
And how is this more on-topic on a mailing list called Unicode
Philippe Verdy wrote:
Savvy is better understood in this context as aware, than
archaic or informal in your English-Italian dictionary.
No, archaic, American and informal are usage labels, not translations.
The translation is buon senso. (BTW, it is: Dizionario Garzanti di
inglese, Garzanti
Rick McGowan wrote:
2. It is unlikely that the Unicode *logo* itself (i.e. the thing at
http://www.unicode.org/webscripts/logo60s2.gif) will be incorporated
directly in any image that people are allowed to put on their
websites, because to put the Unicode logo on a product or whatever
Andrew C. West wrote:
I agree with Philippe on this one. A sensible, and easily
understandable, motto
like The world speaks Unicode would be much better. The
word savvy just
sends a shiver of embarrassment down my spine. Not only is
savvy not a word
that is probably high in the vocabulary
Doug Ewell wrote:
Drop everything and check out a kewl new Windows program available at:
http://users.adelphia.net/~dewell/mathtext.html
𝔬𝔱𝔣𝔩!
_ Marco
Stefan Persson wrote:
Let's say that I have two files, namely file1 and file2, in any Unicode
encoding, both starting with a BOM, and I compile them into
one by using
cat file1 file2 > file3
in Unix or
copy file1 + file2 file3
in MS-DOS, file3 will have the following contents:
BOM
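The effect described above, and one way around it, can be sketched as follows. The sketch assumes UTF-8 files only (the post speaks of any Unicode encoding), and the function concat_utf8 is hypothetical: it is not a replacement for cat or copy, just a BOM-aware merge that keeps the first BOM and drops the ones that would end up in the middle.

```python
import codecs

def concat_utf8(parts):
    """Concatenate UTF-8 byte strings, keeping only a leading BOM."""
    out = bytearray()
    for index, part in enumerate(parts):
        # A BOM at the start of any part after the first would land mid-file.
        if index > 0 and part.startswith(codecs.BOM_UTF8):
            part = part[len(codecs.BOM_UTF8):]
        out += part
    return bytes(out)

file1 = codecs.BOM_UTF8 + "one\n".encode("utf-8")
file2 = codecs.BOM_UTF8 + "two\n".encode("utf-8")

naive = file1 + file2                  # what byte-level concatenation produces
merged = concat_utf8([file1, file2])   # BOM-aware alternative

print(naive.count(codecs.BOM_UTF8), merged.count(codecs.BOM_UTF8))
```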
I (Marco Cimarosti) wrote:
As a minimum, option -v must know the semantics of NL and
LF control codes, of the digits, and of white space.
Sorry, I meant: option -n.
_ Marco
Kent Karlsson wrote:
I'm not going into the implementation part; just pointing out that
this issue is not something an operating system can ignore.
cat and cp can and shall ignore it. They are octet-level
file operations, attaching no semantics to the octets. Try iconv.
This byte-level
Kenneth Whistler wrote:
Dream on. The information needed exists in books and other
reference source in libraries, book shops, and other collections
across India -- and, for that matter, around the world. It is
merely a matter of collecting the relevant information and
distilling it into
askq1 askq1 wrote:
From: Pim Blokland [EMAIL PROTECTED]
However, you have said this is not what you want!
So what is it that you do want?
I want c/c++ code that will give me UTF8 byte sequence
representing a given code-point,
UTF16 16 bits sequence representing a given
code-point,
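The request above was for C/C++ code; as a language-neutral illustration of the same bit manipulation, here is a sketch in Python. The function names utf8_bytes and utf16_units are hypothetical and this is not the Unicode reference implementation. Surrogate code points and values above U+10FFFF are rejected.

```python
def _check(cp: int) -> None:
    # Valid scalar values: 0..10FFFF, excluding the surrogate range.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

def utf8_bytes(cp: int) -> bytes:
    """Return the UTF-8 byte sequence for one code point."""
    _check(cp)
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def utf16_units(cp: int) -> list:
    """Return the UTF-16 code units (one, or a surrogate pair) for one code point."""
    _check(cp)
    if cp < 0x10000:
        return [cp]
    offset = cp - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]

print(utf8_bytes(0x20AC).hex(), [hex(u) for u in utf16_units(0x1D52C)])
```

Decoding is the mirror image of these steps; a production converter would also have to reject overlong and truncated sequences when decoding.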
askq1 askq1 wrote:
I want c/c++ functions/routines that will convert Unicode to
UTF8/UTF16/UCS2 encodings and vice-versa. Can some-one point
me where can I get these code routines?
Unicode's reference implementation is here, but I don't know how much
up-to-date it is with some tiny changes in
Kenneth Whistler wrote:
[...]
Of course, further weight corrections need to be applied if reading
the standard *below* sea level or in a deep cave.
I hope it will not be considered pedantic to observe that the mass or weight
of a book does not change depending on whether someone is reading it or